I am having trouble merging the datasets in such a way that the final dataset I use to train my algorithm has only unique Policy IDs, just as in the training set. In the client dataset, for example, a Policy ID occurs multiple times, and the same holds for the policy and payment datasets. Any tips?? Thanks in advance :-)
Hey try this:
df.drop_duplicates(subset='Policy ID')  # df is your dataframe
Thanks for your reply! However, then a lot of information will just be removed, right? And all that information could be of interest! Your command simply keeps a random row per Policy ID, right?
To merge two data frames, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).
# merge two data frames by ID
total <- merge(dataframeA, dataframeB, by="ID")
# merge two data frames by ID and Country
total <- merge(dataframeA, dataframeB, by=c("ID","Country"))
Professor Tapiwa Mandere
If you use Pandas, use .groupby('Policy ID').agg() and then merge the datasets together on Policy ID
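A minimal sketch of that idea, assuming clients is the pandas DataFrame with repeated Policy IDs and train is the training table (both names are placeholders):

# collapse clients to one row per Policy ID (here just keeping the first value of each column)
clients_unique = clients.groupby('Policy ID', as_index=False).agg('first')
# merge onto train so the result still has one row per Policy ID
train_merged = train.merge(clients_unique, on='Policy ID', how='left')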
hi all,
I have a question: how do I get Lapse = 0, because all the Lapse values in train are 1?
import numpy as np
train['Lapse'] = np.where((train.Lapse == "?") & (train['Lapse Year'] == "?"), 0, 1)  # 0 if both are "?", else 1
This should work
thanks @temmyzeus
Thanks for your reply! Which agg option do you use then? You can't sum up the sex variable or take its mean, for example, I guess... So which argument do you use?
Use 'first' or 'last' for text variables
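For example, something like this (a rough sketch; payments, Sex, Premium and Payment Date are placeholder names for your payment table and its columns):

payments_unique = payments.groupby('Policy ID', as_index=False).agg({
    'Sex': 'first',         # text variable: keep the first value
    'Premium': 'mean',      # numeric variable: average over the duplicate rows
    'Payment Date': 'max',  # date: keep the most recent one
})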