The Zimnat Insurance Assurance Challenge by #ZindiWeekendz
$300 USD
Predict when an insurance policy will lapse in Zimbabwe
314 data scientists enrolled, 97 on the leaderboard
22 May—24 May
~60 hours
Setting up train and test?

Greetings everyone,

I'm having difficulty creating my train and test DataFrames when merging the files.

Can anyone help me, please?

Thank you

Setting up train and test will be different and challenging for everyone. I believe the winner of the competition will be the one who most carefully solves the challenge of setting up train and test. It's not easy for anyone right now, so keep your grey cells working.

lol. Zindi hackathons are no longer just model.predict() ... xD

If you don't have a meaningful reply, please don't reply. Thank you.

If you don't have a meaningful reply to my absurd reply, don't reply please. Thank you :)

If the difficulty is about syntax and technique, here is what I did. I don't know if I am right, though.

Start from train.csv

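import pandas as pd

train = pd.read_csv("train.csv")

# rows whose lapse label is unknown ("?") form the test set
test_set = train.loc[train["lapse"] == "?"]

# all remaining rows form the training set
train_set = train.loc[train["lapse"] != "?"]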

After this, merge train_set (and likewise test_set) with the other files on policy_id.
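The split-then-merge steps described above can be sketched roughly as follows. This is only an illustration on toy data: the `extra` table and its `premium` column are made up for the example and do not come from the competition files, and the `validate` argument is simply one way to get an early error if the second table has more than one row per policy_id.

```python
import pandas as pd

# Toy stand-in for train.csv (the real file has more columns).
train = pd.DataFrame({
    "policy_id": ["P1", "P2", "P3", "P4"],
    "lapse": ["1", "?", "1", "?"],
})

# Hypothetical second table keyed by policy_id (assumption for this sketch).
extra = pd.DataFrame({
    "policy_id": ["P1", "P2", "P3", "P4"],
    "premium": [10.0, 20.0, 30.0, 40.0],
})

# Rows with an unknown label become the test set, the rest the train set.
is_test = train["lapse"] == "?"
test_set = train.loc[is_test].copy()
train_set = train.loc[~is_test].copy()

# A left join keeps every row of the left table; validate raises a MergeError
# if policy_id is not unique in the right table, which would multiply rows.
train_merged = train_set.merge(extra, on="policy_id", how="left", validate="many_to_one")
test_merged = test_set.merge(extra, on="policy_id", how="left", validate="many_to_one")

print(train_merged.shape, test_merged.shape)
```

If the merge raises an error here, that is a hint that the other file has several rows per policy and needs to be summarised before merging.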

If you spot any logical error, please point it out so I can correct my code.

If your question was about how to model your data, tell me which approach you have chosen and I will help you model it.

Thank you for your response, brother. I have already done the same steps you did, but when I merged the test set with the others I obtained more rows than required by the sample submission file. This is why I am in trouble. I would be thankful if you could help me tackle this issue.

Check the shape of the sample submission file and truncate the test set to match. For example, if it has 43097 rows, do this to get the test set: test = test[:43097]

I tried drop_duplicates to tackle this problem, but the problem of having only one label persists.

What was the problem? Kindly share the error so I can help out.

@engineer I guess he meant that when he dropped duplicates of Policy ID and separated the test from the train, the train was left with only one unique label (i.e. 1), as against two ('?' and '1') before separation.

Yes, exactly, my friend. Should I fit my model on only one label?

I too am not quite sure of what to do in this case.

Yes! You'll have only a target value == 1. Go ahead and predict probabilities; it would work.

When I try to fit the model, it shows an error: [y_true contains only one label (1)]
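The message suggests that the target reaching the model or the metric contains only the value 1. A quick way to confirm this before fitting is to count the distinct labels that remain after the split. The sketch below uses toy data, not the competition files, and `single_class` is just a name chosen for this example.

```python
import pandas as pd

# Toy stand-in for train.csv with a repeated policy_id.
train = pd.DataFrame({
    "policy_id": ["P1", "P1", "P2", "P3"],
    "lapse": ["1", "1", "?", "1"],
})

# Keep labelled rows only, then drop repeated policies.
labelled = train.loc[train["lapse"] != "?"].drop_duplicates(subset="policy_id")

# Count how many distinct labels are left for training.
counts = labelled["lapse"].value_counts()
single_class = counts.size < 2

print(counts)
if single_class:
    print("Only one label left: a classifier or metric that needs two classes will fail.")
```

If only one label is left, the problem lies in how the training target is built rather than in the model call itself.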

Hi Oussema, What is the final shape of train data?

I think that has to do with the fact that a single policy can cover a son, wife and mother, as stated in the info. Read the competition info again; it should help you understand.
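If that is the cause, the extra rows come from a one-to-many merge: each policy row is repeated once per covered person. One possible way around it, sketched below on toy data, is to summarise the per-person table to one row per policy before merging. The `people` table and its `relation` and `age` columns are assumptions made for this example, not the real file layout.

```python
import pandas as pd

test_set = pd.DataFrame({"policy_id": ["P1", "P2"], "lapse": ["?", "?"]})

# Hypothetical table with one row per insured person (assumption).
people = pd.DataFrame({
    "policy_id": ["P1", "P1", "P1", "P2"],
    "relation": ["son", "wife", "mother", "wife"],
    "age": [12, 40, 70, 35],
})

# Naive merge: P1 is repeated three times, so the row count grows.
naive = test_set.merge(people, on="policy_id", how="left")

# Summarise to one row per policy first, then merge.
per_policy = people.groupby("policy_id", as_index=False).agg(
    n_people=("relation", "size"),
    mean_age=("age", "mean"),
)
fixed = test_set.merge(per_policy, on="policy_id", how="left", validate="one_to_one")

print(len(test_set), len(naive), len(fixed))
```

With this approach the merged test set keeps the same number of rows as before the merge, and no rows have to be cut off by position.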