When I am only using the Users dataset the accuracy is high. But when I join other data sets the auc score is decreasing. Any insights would be appretitated.
You might want to do the feature engineering on appended columns from other data sets or do appropriate aggregation first and then combine the data sets. See what happens.
The trainset has years 1,2,3 but testset has future year 4 with most user ids needing predictions for month 1,2&3 year 4. This means we have a forecasting problem and need to populate the trainset with aggregated data from the other datasets for each month. The test set can then be extracted from the trainset using all data from year 3 month 12.
We then need to get the targets for the current month using the Target column of the next 3 months-train.groupby(userid).shift(-1/-2/-3). Then train a model for each target month1/2/3, and predict on user samples from year 3 month 12.
@basketball stars That usually means the extra datasets are adding noise or leakage, so try cleaning, feature-selecting, or engineering only the useful fields before joining to see if performance improves.
You might want to do the feature engineering on appended columns from other data sets or do appropriate aggregation first and then combine the data sets. See what happens.
Hi Hari,
The trainset has years 1,2,3 but testset has future year 4 with most user ids needing predictions for month 1,2&3 year 4. This means we have a forecasting problem and need to populate the trainset with aggregated data from the other datasets for each month. The test set can then be extracted from the trainset using all data from year 3 month 12.
We then need to get the targets for the current month using the Target column of the next 3 months-train.groupby(userid).shift(-1/-2/-3). Then train a model for each target month1/2/3, and predict on user samples from year 3 month 12.
@basketball stars That usually means the extra datasets are adding noise or leakage, so try cleaning, feature-selecting, or engineering only the useful fields before joining to see if performance improves.