Setting up train and test will be different and challenging for everyone. I believe the winner of the competition will be whoever most carefully solves the challenge of setting up train and test. It's not easy for anyone right now, so keep your grey cells working.
Thank you for your reply
lol. Zindi hackathons are no longer just model.fit(), model.predict()... xD
If you don't have a meaningful reply, please don't reply. Thank you.
If you don't have a meaningful reply to my absurd reply, please don't reply. Thank you :)
Lol, you are so funny @krishna_priya
@Krishna_Priya lol :)
If the difficulty is about the syntax and technique, here is what I did (I don't know if I am right, though):
Start from train.csv:
import pandas as pd
train = pd.read_csv("train.csv")
# "?" in the lapse column marks the rows to predict; the rest are labelled
test_set = train.loc[train["lapse"] == "?"]
train_set = train.loc[train["lapse"] != "?"]
After this, merge train_set with the other files on policy_id.
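A minimal sketch of the split-then-merge step on made-up toy frames. The column names `policy_id`, `lapse` and `member_age`, and the assumption that the other file holds one row per covered person, are for illustration only. It shows how a merge can return more rows than there are test policies: each policy row is repeated once per matching row in the other file.

```python
import pandas as pd

# Hypothetical toy data: one row per policy in train.csv
train = pd.DataFrame({
    "policy_id": ["P1", "P2", "P3"],
    "lapse": ["1", "?", "?"],
})

# Hypothetical "other" file: one row per covered person, so P2 appears twice
members = pd.DataFrame({
    "policy_id": ["P1", "P2", "P2", "P3"],
    "member_age": [40, 35, 10, 60],
})

# Split on the "?" placeholder, as in the steps above
test_set = train.loc[train["lapse"] == "?"]
train_set = train.loc[train["lapse"] != "?"]

# A left merge keeps every test policy but repeats it once per matching member row
merged_test = test_set.merge(members, on="policy_id", how="left")

print(len(test_set), len(merged_test))  # 2 3
```

Passing `validate="one_to_one"` to `merge` would raise a `MergeError` on this data, which is one way to catch the row growth early instead of discovering it at submission time.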
If you spot any logical error, please point it out so I can correct it in my code.
If your question was about how to model your data, tell me which approach you chose and I will help you model it.
Thank you for your response, brother. I have already done the same steps you did, but when I merged the test set with the other files I got more rows than the sample submission file requires. This is why I am in trouble. I would be thankful if you could help me tackle this issue.
Try drop_duplicates(subset='Policy ID').
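A sketch of what `drop_duplicates(subset=...)` does to a merged frame, using hypothetical toy data. It keeps the first row seen for each policy ID (`keep="first"` is the default) and silently discards the others, so whatever the dropped member rows contained is lost. Aggregating per policy with `groupby` is shown as an alternative worth considering; the `member_age` column is an assumption.

```python
import pandas as pd

# Hypothetical merged test frame where policy P2 appears twice
merged_test = pd.DataFrame({
    "Policy ID": ["P2", "P2", "P3"],
    "member_age": [35, 10, 60],
})

# Keep only the first row for each policy (keep="first" is the default)
deduped = merged_test.drop_duplicates(subset="Policy ID")

# Alternative: summarise the member rows per policy instead of discarding them
aggregated = merged_test.groupby("Policy ID", as_index=False).agg(
    n_members=("member_age", "count"),
    max_age=("member_age", "max"),
)

print(deduped["Policy ID"].tolist())       # ['P2', 'P3']
print(aggregated["n_members"].tolist())    # [2, 1]
```

Both results have one row per policy, but only the aggregated version still carries the fact that P2 covers two people.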
Check the shape of the sample submission file and slice the test set to match. For example, if it has 43097 rows, do this to get the test set: test = test[:43097]
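Slicing with `test[:43097]` only keeps the right rows if all the extra rows happen to sit at the end of the frame, which a merge does not guarantee. A sketch of a more defensive alternative, assuming the sample submission has an ID column with the same name as in the test set (the name `Policy ID`, the `Lapse` column and the toy data are assumptions): keep one row per ID, then line the rows up with the submission's IDs.

```python
import pandas as pd

# Hypothetical sample submission and a merged test frame with a repeated ID
sample_sub = pd.DataFrame({"Policy ID": ["P3", "P2"], "Lapse": [0.0, 0.0]})
test = pd.DataFrame({
    "Policy ID": ["P2", "P2", "P3"],
    "member_age": [35, 10, 60],
})

# One row per policy, ordered like the submission (a left merge keeps left order)
aligned = sample_sub[["Policy ID"]].merge(
    test.drop_duplicates(subset="Policy ID"),
    on="Policy ID",
    how="left",
    validate="one_to_one",  # raises if either side still has repeated IDs
)

# Fail loudly if the row counts still disagree
assert len(aligned) == len(sample_sub)
print(aligned["Policy ID"].tolist())  # ['P3', 'P2']
```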
I tried it, thank you bro.
I tried drop_duplicates to tackle this problem, but the problem of having only one label persists.
What was the problem? Kindly share the error so I can help out.
@engineer I guess he meant that when he dropped duplicates of Policy ID and separated the test from the train, the train now has only one unique label (i.e. 1) instead of two ('?' and '1') before the separation.
Yes, exactly, my friend. Should I fit my model on only one label?
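A small check that makes this situation visible: after removing the '?' rows, count the distinct values left in the target column before fitting anything. The toy values below are an assumption that mirrors what the thread reports (only '1' remains). A standard binary classifier such as scikit-learn's `LogisticRegression` refuses to fit when `y` contains a single class.

```python
import pandas as pd

# Hypothetical target column before the split: '?' marks test rows
train = pd.DataFrame({"lapse": ["1", "?", "1", "?", "1"]})

print(sorted(train["lapse"].unique()))  # ['1', '?'] -> two values before the split

# After removing the test rows, only one label is left
train_set = train.loc[train["lapse"] != "?"]
n_classes = train_set["lapse"].nunique()
print(n_classes)  # 1

if n_classes < 2:
    print("Only one class in the training target; a binary classifier cannot be fit on it as-is.")
```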
I think that has to do with the fact that a single policy can cover a son, wife and mother, as stated in the competition info. Read it again; it should help you understand.
I am not quite sure what to do in this case either.
Yes! You'll only have target value == 1; go ahead and predict probabilities, it should work.
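To check this explanation, count how many rows each policy ID has in the file you merge with. A sketch on hypothetical data (the column names `Policy ID` and `relation` are assumptions): any policy with more than one row will be duplicated by a merge.

```python
import pandas as pd

# Hypothetical member-level file: policy P2 covers a son, a wife and a mother
members = pd.DataFrame({
    "Policy ID": ["P1", "P2", "P2", "P2", "P3"],
    "relation": ["self", "son", "wife", "mother", "self"],
})

# Number of rows per policy; anything above 1 multiplies rows in a merge
rows_per_policy = members.groupby("Policy ID").size()
multi = rows_per_policy[rows_per_policy > 1]

print(multi.to_dict())  # {'P2': 3}
print(len(members), members["Policy ID"].nunique())  # 5 3
```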
Ok. Thank u.
When I try to fit the model, it shows an error: [y_true contains only one label (1)]
Hi Oussema, What is the final shape of train data?