🛡️ Data Talk: Train and test setting up ?

The Zimnat Insurance Assurance Challenge by #ZindiWeekendz

Start: May 22, 2020 · End: May 24, 2020 · Reveal: May 24, 2020

OussemaDS

Higher school of communication of tunis

Train and test setting up ?

Data · 22 May 2020, 17:08 · 22

Greeting everyone,

I'm having difficulty creating my train and test DataFrames when merging the files.

Can anyone help me, please?

Thank you

Discussion 22 answers

devnikhilmishra

Setting up train and test will be challenging for everyone, and each participant may do it differently. I believe the winner of the competition will be whoever solves the train and test setup most carefully. It's not easy for anyone right now, so keep your grey cells working.

22 May 2020, 17:11

OussemaDS

Higher school of communication of tunis

Thank you for your reply.

replied to devnikhilmishra · 22 May 2020, 17:32

Krishna_Priya

Lol. Zindi hackathons are no longer just model.fit(), model.predict()... xD

22 May 2020, 17:28

OussemaDS

Higher school of communication of tunis

If you don't have a meaningful reply, please don't reply. Thank you.

replied to Krishna_Priya · 22 May 2020, 17:35

Krishna_Priya

If you don't have a meaningful reply to my absurd reply, please don't reply. Thank you :)

replied to OussemaDS · 22 May 2020, 17:55

Engineer

Lol, you are so funny, @Krishna_Priya.

replied to Krishna_Priya · 22 May 2020, 21:00

chetan-ambi

@Krishna_Priya lol :)

replied to Krishna_Priya · 23 May 2020, 07:30

kratos

If the difficulty is about syntax and technique, here is what I did. I don't know if I'm right, though.

Start from train.csv:

import pandas

train = pandas.read_csv("train.csv")

# Rows labelled "?" are the ones to predict
test_set = train.loc[train["lapse"] == "?"]

# The remaining rows carry a known label
train_set = train.loc[train["lapse"] != "?"]

After this, merge train_set with the other files on policy_id.

If you spot any logical error, please point it out so I can correct my code.

If your question was about how to model the data, tell me which approach you have chosen and I will help you model it.

22 May 2020, 18:51

OussemaDS

Higher school of communication of tunis

Thank you for your response, brother. I have already followed the same steps, but when I merged the test set with the other files I got more rows than the sample submission file expects. This is why I'm stuck. I would be grateful for any help tackling this issue.
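
One common cause of this symptom is a one-to-many merge: if another file holds several rows per policy, each matching test row is repeated once per match. The sketch below reproduces the effect on made-up data and shows one way to collapse the other table to a single row per policy before merging. The frame `other`, the column `role`, the `n_members` aggregate, and all values are hypothetical and are not taken from the competition files.

```python
import pandas as pd

# Hypothetical stand-ins for the competition files
test_set = pd.DataFrame({"Policy ID": ["P1", "P2"], "lapse": ["?", "?"]})
other = pd.DataFrame({
    "Policy ID": ["P1", "P1", "P2"],   # P1 appears twice
    "role": ["self", "spouse", "self"],
})

# A plain left merge repeats P1 once per matching row in `other`
merged = test_set.merge(other, on="Policy ID", how="left")
print(len(test_set), len(merged))

# Count rows per policy in `other` to see which keys cause the growth
rows_per_policy = other["Policy ID"].value_counts()
repeated_keys = rows_per_policy[rows_per_policy > 1].index.tolist()

# Collapse `other` to one row per policy before merging
# (counting members is only an illustrative aggregation)
per_policy = other.groupby("Policy ID", as_index=False).agg(
    n_members=("role", "size")
)

# validate="one_to_one" raises an error if either side still has repeated keys
merged_one_row = test_set.merge(
    per_policy, on="Policy ID", how="left", validate="one_to_one"
)
```

Which aggregation makes sense depends on what the extra rows mean in the real data; the count used here is just a placeholder.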

replied to kratos · 22 May 2020, 19:03

Sahilkid

try drop_duplicates(subset='Policy ID')
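
For reference, a small sketch of what that call does, on made-up data (the frame and its values are hypothetical). By default `drop_duplicates` keeps the first row for each key and discards the rest, so the label that survives for a repeated policy depends on row order.

```python
import pandas as pd

# Hypothetical merged frame in which policy P1 is repeated
merged = pd.DataFrame({
    "Policy ID": ["P1", "P1", "P2"],
    "lapse": ["1", "?", "?"],
})

# keep="first" is the default: only the first row per Policy ID remains
deduped = merged.drop_duplicates(subset="Policy ID")

# Keeping the last row instead leaves a different label for P1
deduped_last = merged.drop_duplicates(subset="Policy ID", keep="last")
```

It may be worth checking which rows are being dropped before relying on this.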

replied to OussemaDS · 22 May 2020, 20:55

Engineer

Check the shape of the sample submission file and slice the test set to match. For example, if it has 43097 rows, do this to get the test set: test = test[:43097]
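
Slicing by position only gives the right rows if the first rows of the test frame happen to be the ones the submission expects, in the same order. A hedged alternative is to select rows by ID instead. The sketch assumes the sample submission has an ID column that matches `Policy ID`; the frames and the `feature` column are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins: the sample submission lists the IDs to predict
sample_submission = pd.DataFrame({"Policy ID": ["P3", "P1"], "lapse": [0, 0]})
test = pd.DataFrame({
    "Policy ID": ["P1", "P2", "P3"],
    "feature": [10, 20, 30],
})

# Left-merge on the ID column so the result follows the submission's order
# and contains exactly the IDs it lists; validate guards against repeated keys
aligned = sample_submission[["Policy ID"]].merge(
    test, on="Policy ID", how="left", validate="one_to_one"
)

# Positional slicing, by contrast, simply takes the first two rows
sliced = test[:2]
```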

replied to OussemaDS · 22 May 2020, 21:05

OussemaDS

Higher school of communication of tunis

I tried it, thank you, bro.

replied to Sahilkid · 22 May 2020, 21:15

OussemaDS

Higher school of communication of tunis

I tried drop_duplicates to tackle this problem, but the problem of having only one label persists.

replied to Engineer · 22 May 2020, 21:16

Engineer

What was the problem? Kindly share the error so I can help out.

replied to OussemaDS · 22 May 2020, 22:29

Gozie

Freelance

@Engineer I guess he meant that after he dropped duplicate Policy IDs and separated the test set from the train set, the train set has only one unique label (i.e. 1), as against two ('?' and '1') before the separation.
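
A quick way to confirm this situation is to count the labels before and after the split. The sketch below uses made-up data shaped like the description in this thread (labels '1' and '?'); the frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical train.csv contents: labels are "1" or "?"
train = pd.DataFrame({
    "Policy ID": ["P1", "P1", "P2", "P3"],
    "lapse": ["1", "1", "?", "?"],
})

# Label counts before any filtering
counts_before = train["lapse"].value_counts().to_dict()

# Split as described earlier in the thread
train_set = train.loc[train["lapse"] != "?"]
test_set = train.loc[train["lapse"] == "?"]

# After the split, only the known labels remain in train_set
labels_in_train = train_set["lapse"].unique().tolist()
```

If `labels_in_train` holds a single value, a classifier fit on `train_set` alone sees no examples of the other class.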

replied to Engineer · 22 May 2020, 22:46

OussemaDS

Higher school of communication of tunis

Yes, exactly, my friend. Should I fit my model on only one label?

replied to Gozie · 22 May 2020, 22:52

kratos

I think that has to do with the fact that a single policy can cover a son, wife and mother, as stated in the info. Read the competition info again; it should help you understand.

replied to OussemaDS · 22 May 2020, 23:41

Gozie

Freelance

I too am not quite sure of what to do in this case.

replied to OussemaDS · 22 May 2020, 23:44

Engineer

Yes! You'll have only a target value == 1. Go ahead and predict probabilities; it would work.

replied to Gozie · 23 May 2020, 11:33 (edited less than a minute later)

Gozie