I am having trouble merging the datasets in such a way that the final dataset I use to train my algorithm has only unique Policy IDs, just as in the training set. In the client dataset, for example, a Policy ID occurs multiple times, and the same holds for the policy and payment datasets. Any tips?? Thanks in advance :-)
Hey try this:
df.drop_duplicates(subset='Policy ID')  # df is your dataframe
Thanks for your reply! However, then a lot of information will just be removed, right? And all that information could be of interest! Your command simply keeps a random row per Policy ID, right?
To merge two data frames, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).
# merge two data frames by ID
total <- merge(dataframeA, dataframeB, by="ID")
# merge two data frames by ID and Country
total <- merge(dataframeA, dataframeB, by=c("ID","Country"))
Professor Tapiwa Mandere
If you use Pandas, use .groupby('Policy ID').agg() and then merge the datasets together on Policy ID
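A minimal sketch of that idea, assuming clients is the pandas DataFrame with repeated Policy IDs and train is the training table (both names are placeholders):

# collapse clients to one row per Policy ID (here just keeping the first value of each column)
clients_unique = clients.groupby('Policy ID', as_index=False).agg('first')
# merge onto train so the result still has one row per Policy ID
train_merged = train.merge(clients_unique, on='Policy ID', how='left')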
hi all,
I have a question: how do I get Lapse = 0, because all the Lapse values in train are 1?
import numpy as np
train['Lapse'] = np.where((train.Lapse == "?") & (train['Lapse Year'] == "?"), 0, 1)  # 0 if both are "?", else 1
This should work
thanks @temmyzeus
Thanks for your reply! Which agg option do you use then? You can't sum up the sex variable or take its mean, for example, I guess... So which argument do you use?
Use 'first' or 'last' for text variables
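For example, something like this (a rough sketch; payments, Sex, Premium and Payment Date are placeholder names for your payment table and its columns):

payments_unique = payments.groupby('Policy ID', as_index=False).agg({
    'Sex': 'first',         # text variable: keep the first value
    'Premium': 'mean',      # numeric variable: average over the duplicate rows
    'Payment Date': 'max',  # date: keep the most recent one
})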