I am having trouble merging the datasets so that the final dataset I want to train my algorithm on has only unique policy IDs, just like the training set. In the client dataset, for example, policy IDs occur multiple times, and the same goes for the policy and payment datasets. Any tips? Thanks in advance :-)
Hey, try this:
Thanks for your reply! But then a lot of information will just be removed, right? And all of that information could be of interest! As it stands, your command simply keeps a random row, right?
If you use pandas, aggregate each dataset with .groupby('Policy ID').agg(...) so that every policy ID appears only once, and then merge the datasets together on Policy ID.
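A minimal sketch of that approach, assuming toy `client` and `payment` tables. Only the `Policy ID` key comes from the thread; the columns `Age` and `Amount` and the chosen aggregations are hypothetical:

```python
import pandas as pd

# Hypothetical toy data: every column except 'Policy ID' is an assumption.
train = pd.DataFrame({"Policy ID": [1, 2, 3]})
client = pd.DataFrame({
    "Policy ID": [1, 1, 2, 3],
    "Age": [40, 12, 35, 60],
})
payment = pd.DataFrame({
    "Policy ID": [1, 2, 2, 2, 3],
    "Amount": [100.0, 50.0, 50.0, 25.0, 80.0],
})

# Collapse each table to one row per policy ID (named aggregation).
client_agg = (
    client.groupby("Policy ID")
          .agg(max_age=("Age", "max"), n_clients=("Age", "count"))
          .reset_index()
)
payment_agg = (
    payment.groupby("Policy ID")
           .agg(total_paid=("Amount", "sum"), n_payments=("Amount", "count"))
           .reset_index()
)

# Left-merge onto the training set; validate="one_to_one" raises if a key repeats.
merged = (
    train.merge(client_agg, on="Policy ID", how="left", validate="one_to_one")
         .merge(payment_agg, on="Policy ID", how="left", validate="one_to_one")
)
print(merged)
```

Aggregating into counts and sums (rather than dropping duplicate rows) keeps some of the information from the repeated rows. The `validate="one_to_one"` argument makes pandas raise a `MergeError` if either side still has duplicate keys, which is a cheap check that the result has exactly one row per policy.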
I have a question: how do I get Lapse = 0? All the Lapse values in train are 1.
import numpy as np
train['Lapse'] = np.where((train['Lapse'] == "?") & (train['Lapse Year'] == "?"), 0, 1)
This should work.
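To see what that line does, here is a small sketch on a hypothetical `train` frame where `"?"` marks a missing value. `np.where(cond, 0, 1)` evaluates the condition row by row and picks 0 where it is True and 1 elsewhere, so a row only gets Lapse = 0 when both columns are `"?"`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; '?' stands for a missing value.
train = pd.DataFrame({
    "Policy ID": [1, 2, 3],
    "Lapse": ["?", "1", "?"],
    "Lapse Year": ["?", "2017", "2018"],
})

# 0 only where both Lapse and Lapse Year are '?', otherwise 1.
train["Lapse"] = np.where((train["Lapse"] == "?") & (train["Lapse Year"] == "?"), 0, 1)
print(train["Lapse"].tolist())  # [0, 1, 1]
```

Note that the third row still gets 1 because its Lapse Year is filled in.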
Thanks for your reply! Which aggregation do you use then? I guess you can't sum or average a variable like sex, for example... so what argument do you pass?
Use 'first' or 'last' for text variables.
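A sketch of mixing aggregations per column by passing a dict to `.agg()`. The `client` table and its `Sex` and `Age` columns are hypothetical; text columns use `'first'` and a numeric column uses `'mean'`:

```python
import pandas as pd

# Hypothetical client table; 'Sex' and 'Age' are illustrative column names.
client = pd.DataFrame({
    "Policy ID": [1, 1, 2, 3, 3],
    "Sex": ["F", "M", "M", "F", "F"],
    "Age": [40, 38, 35, 60, 58],
})

# One aggregation per column: 'first' for text, a numeric summary for numbers.
client_agg = (
    client.groupby("Policy ID")
          .agg({"Sex": "first", "Age": "mean"})
          .reset_index()
)
print(client_agg)
```

`'first'` takes the first non-null value in each group in the current row order, so if you want, say, the most recent record, sort the frame (for example by a date column) before grouping.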