The Zimnat Insurance Assurance Challenge by #ZindiWeekendz

Helping Africa
$300 USD
Challenge completed over 5 years ago
Prediction
295 joined
105 active
Start: May 22, 20
Close: May 24, 20
Reveal: May 24, 20
Difficulties sorting and merging the datasets
Help · 22 May 2020, 22:52 · 9

I am having trouble merging the datasets so that the final dataset I use to train my algorithm has only unique Policy IDs, just as in the training set. In the client dataset, for example, Policy IDs occur multiple times, and the same is true of the policy and payment datasets. Any tips? Thanks in advance :-)

Discussion 9 answers

Hey, try this:

df = df.drop_duplicates(subset='Policy ID')  # df is whichever table you are deduplicating

22 May 2020, 23:16
Upvotes 0

Thanks for your reply! However, a lot of information will then just be removed, right? And all that information could be of interest! Your command simply keeps a random row per Policy ID, right?
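A note on what that command keeps: by default pandas `drop_duplicates` uses `keep='first'`, so it retains the first row per Policy ID in the current row order rather than a random one. The sketch below uses a made-up client table (only the `Policy ID` column name comes from this thread; the other column and all values are hypothetical) to show which rows survive and which are lost.

```python
import pandas as pd

# Hypothetical toy client table; only the 'Policy ID' column name comes from the thread.
client = pd.DataFrame({
    "Policy ID": ["PID_1", "PID_1", "PID_2"],
    "Relationship": ["SELF", "SPOUSE", "SELF"],
})

# keep="first" is the pandas default: the first row per Policy ID survives.
deduped = client.drop_duplicates(subset="Policy ID", keep="first")

# The rows that were discarded, i.e. the information lost by deduplicating.
dropped = client[client.duplicated(subset="Policy ID", keep="first")]

print(deduped)
print(dropped)
```

If the discarded rows carry useful signal, aggregating per Policy ID before merging is probably the safer route.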

University of Zimbabwe

To merge two data frames in R, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).

# merge two data frames by ID
total <- merge(dataframeA, dataframeB, by = "ID")

# merge two data frames by ID and Country
total <- merge(dataframeA, dataframeB, by = c("ID", "Country"))

Professor Tapiwa Mandere

If you use pandas, first aggregate each table with .groupby('Policy ID').agg() so it has one row per policy, and then merge the datasets together on Policy ID.

Hi all,

I have a question: how do I get Lapse = 0, given that every Lapse value in train is 1?

Federal University of Technology Akure

import numpy as np

# Rows where both Lapse and Lapse Year are "?" get 0; all other rows get 1
train['Lapse'] = np.where((train['Lapse'] == "?") & (train['Lapse Year'] == "?"), 0, 1)

This should work
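A minimal sketch of that relabelling on made-up rows, assuming the missing entries were read in as the literal string "?" (as the snippet above implies). The `Lapse` and `Lapse Year` column names come from this thread; the Policy IDs and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical training rows; "?" marks policies with no recorded lapse.
train = pd.DataFrame({
    "Policy ID": ["PID_1", "PID_2", "PID_3"],
    "Lapse": ["1", "?", "1"],
    "Lapse Year": ["2018", "?", "2019"],
})

# A policy counts as not lapsed when both fields are "?".
not_lapsed = (train["Lapse"] == "?") & (train["Lapse Year"] == "?")

# 0 for not lapsed, 1 for everything else.
train["Lapse"] = np.where(not_lapsed, 0, 1)

print(train)
```

If the file was loaded with `na_values="?"`, the comparison would need `isna()` instead of `== "?"`.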

Thanks for your reply! Which agg option do you use, then? I guess you can't sum the sex variable or take its mean, for example. So what argument do you pass?

Use 'first' or 'last' for text variables.
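Putting the groupby and merge suggestions together, here is a rough sketch of one way it could look in pandas. Only `Policy ID` is a column name taken from this thread; `Amount Paid`, `Payment Mode`, the output feature names, and all values are hypothetical stand-ins, so adapt them to the real files.

```python
import pandas as pd

# Hypothetical tables: train has one row per policy, payments has many.
train = pd.DataFrame({"Policy ID": ["PID_1", "PID_2"]})
payments = pd.DataFrame({
    "Policy ID": ["PID_1", "PID_1", "PID_2"],
    "Amount Paid": [100.0, 50.0, 75.0],
    "Payment Mode": ["DEBIT", "CASH", "CASH"],
})

# Collapse payments to one row per Policy ID:
# numeric columns get sum/count, text columns get first/nunique.
payment_features = (
    payments.groupby("Policy ID")
    .agg(
        total_paid=("Amount Paid", "sum"),
        n_payments=("Amount Paid", "count"),
        first_mode=("Payment Mode", "first"),
        n_modes=("Payment Mode", "nunique"),
    )
    .reset_index()
)

# A left join keeps every training policy and keeps Policy ID unique.
merged = train.merge(payment_features, on="Policy ID", how="left")

print(merged)
```

For text columns, `nunique` or a count per category can retain more information than `first` or `last` alone, which discard every other value.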