The Zimnat Insurance Assurance Challenge by #ZindiWeekendz

Helping Africa
$300 USD
Challenge completed over 5 years ago
Prediction
295 joined
105 active
Start: May 22, 20
Close: May 24, 20
Reveal: May 24, 20
Higher school of communication of tunis
Setting up train and test?
Data · 22 May 2020, 17:08 · 22

Greetings everyone,

I'm having difficulty creating my train and test DataFrames when merging the files.

Can anyone help me, please?

Thank you

Discussion 22 answers

Setting up train and test will be different and challenging for everyone. I believe the winner in the competition will be one who most carefully solves the challenge of setting up train and test. It's not easy for anyone right now, so keep your grey cells thinking.

22 May 2020, 17:11
Higher school of communication of tunis

Thank you for your reply

Krishna_Priya

lol. Zindi hackathons are no longer just model.fit(), model.predict()... xD

22 May 2020, 17:28
Higher school of communication of tunis

If you don't have a meaningful reply, please don't reply. Thank you.

Krishna_Priya

If you don't have a meaningful reply to my absurd reply, don't reply please. Thank you :)

If the difficulty is about syntax and technique, here is what I did. I don't know if I am right, though.

Start from train.csv

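import pandas as pd

train = pd.read_csv("train.csv")

# Rows whose lapse value is "?" have no known outcome: these form the test set
test_set = train.loc[train["lapse"] == "?"]

# Rows with a known lapse value are used for training
train_set = train.loc[train["lapse"] != "?"]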

After this, you merge train_set with the other files based on policy_id.

If you spot any logical error, please point it out so I can correct my code.

If your question was about how to model your data, you need to tell me which approach you have chosen, and then I will help you model it.
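
The split-and-merge steps described above can be sketched as follows. This is only an illustration on toy data: the `policy_info` table, the `premium` column, and all values are assumptions, and the key is written as `Policy ID` to match the column name used later in the thread.

```python
import pandas as pd

# Toy stand-in for train.csv; column names and values are assumptions.
train = pd.DataFrame({
    "Policy ID": ["P1", "P2", "P3", "P4"],
    "lapse": ["1", "?", "1", "?"],
})

# Hypothetical policy-level table with exactly one row per policy.
policy_info = pd.DataFrame({
    "Policy ID": ["P1", "P2", "P3", "P4"],
    "premium": [100.0, 250.0, 80.0, 120.0],
})

# Rows marked "?" have no known outcome, so they form the test set.
test_set = train.loc[train["lapse"] == "?"]
train_set = train.loc[train["lapse"] != "?"]

# A left merge keeps every row of the left table. validate="one_to_one"
# raises MergeError if either side repeats a Policy ID.
train_merged = train_set.merge(
    policy_info, on="Policy ID", how="left", validate="one_to_one"
)
test_merged = test_set.merge(
    policy_info, on="Policy ID", how="left", validate="one_to_one"
)

print(train_merged.shape, test_merged.shape)
```

Using `validate` here is a design choice rather than something the thread prescribes: it turns a silent change in row count into an immediate error when a key is repeated.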

22 May 2020, 18:51
Higher school of communication of tunis

Thank you for your response, brother. I have already done the same steps you did, but when I merged the test set with the other files I got more rows than the sample submission file requires. This is why I am in trouble. I would be thankful if you could help me tackle this issue.

Try drop_duplicates(subset='Policy ID').

Check the shape of the sample submission file and slice the test set to match. For example, if it has 43097 rows, do this to get the test set: test = test[:43097]
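
A minimal sketch of why a merge can return more rows than the sample submission, and two ways to get back to one row per policy. The `details` table, its `amount` column, and all values are hypothetical; only `drop_duplicates(subset='Policy ID')` comes from the thread.

```python
import pandas as pd

# Toy test rows; names and values are assumptions.
test_set = pd.DataFrame({"Policy ID": ["P1", "P2", "P3"]})

# Hypothetical detail table with several rows per policy
# (for example, one row per person covered).
details = pd.DataFrame({
    "Policy ID": ["P1", "P1", "P2", "P3", "P3", "P3"],
    "amount": [10.0, 20.0, 5.0, 1.0, 2.0, 3.0],
})

# Merging directly repeats each policy once per matching detail row.
inflated = test_set.merge(details, on="Policy ID", how="left")
print(len(test_set), len(inflated))  # 3 6

# Option 1: keep only the first row per policy after merging.
deduped = inflated.drop_duplicates(subset="Policy ID")

# Option 2: collapse the detail table to one row per policy before merging.
per_policy = details.groupby("Policy ID", as_index=False).agg(
    total_amount=("amount", "sum"),
    n_rows=("amount", "count"),
)
aggregated = test_set.merge(per_policy, on="Policy ID", how="left")

# Either way, the row count matches the number of distinct policies.
assert len(deduped) == len(aggregated) == test_set["Policy ID"].nunique()
```

Slicing with `test[:43097]` also makes the shape match, but it keeps whichever rows happen to come first, so comparing the set of IDs against the sample submission is probably a safer check than comparing row counts alone.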

Higher school of communication of tunis

I tried it, thank you bro.

Higher school of communication of tunis

I tried drop_duplicates to tackle this problem, but the problem of having only one label persists.

What was the problem? Kindly share the error so I can help out.

@engineer I guess he meant that when he dropped duplicates of Policy ID and separated the test from the train, the train now has only one unique label (i.e. 1), as against two ('?' and '1') before the separation.
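
The situation described above can be checked quickly by counting the distinct label values before and after the split. This is a sketch on toy data; the column values are assumptions that mirror the '?' and '1' labels mentioned in the thread.

```python
import pandas as pd

# Toy stand-in for the lapse column; values are assumptions.
train = pd.DataFrame({
    "Policy ID": ["P1", "P2", "P3", "P4", "P5"],
    "lapse": ["1", "?", "1", "?", "?"],
})

# Before the split there are two distinct values: "1" and "?".
print(train["lapse"].value_counts().to_dict())

# After removing the "?" rows, only the "1" label is left.
train_set = train.loc[train["lapse"] != "?"]
n_labels = train_set["lapse"].nunique()
print(n_labels)

# A standard classifier needs at least two classes, so flag this early.
if n_labels < 2:
    print("Only one label in train_set; a second class is needed to fit a classifier.")
```

Where that second class should come from is not settled in the thread, so the check only surfaces the problem before model fitting rather than solving it.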

Higher school of communication of tunis

Yes, exactly, my friend. Should I fit my model on only one label?

I think that has to do with the fact that a single policy can cover a son, wife and mother, as stated in the info. Read the competition info again; it should help you understand.

I too am not quite sure of what to do in this case.

Yes! You'll have only a target value == 1. Go ahead and predict probabilities; it would work.

When I try to fit the model, it shows an error: [y_true contains only one label (1)]

Hi Oussema, what is the final shape of the train data?