
Zindi User Behaviour Birthday Challenge

Helping Africa
$3 000 USD
Completed (~4 years ago)
Prediction
871 joined
174 active
Start
Sep 24, 21
Close
Jan 23, 22
Reveal
Jan 23, 22
Datasets
Help · 27 Dec 2021, 17:53 · 3

When I use only the Users dataset, the accuracy is high. But when I join the other datasets, the AUC score decreases. Any insights would be appreciated.

Discussion 3 answers

You might want to do the feature engineering on the columns appended from the other datasets, or aggregate those datasets appropriately first and only then combine them with the Users data. See what happens.
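A minimal sketch of the "aggregate first, then combine" idea, assuming pandas. The tables and column names (`user_id`, `points`, `target`) are made up for illustration and are not the competition schema. It shows how a direct join on a one-to-many table repeats user rows, while aggregating to one row per user first keeps the Users table at its original size.

```python
import pandas as pd

# Hypothetical toy data: names are assumptions, not the real competition columns.
users = pd.DataFrame({"user_id": [1, 2, 3], "target": [1, 0, 1]})
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "points": [5, 10, 15, 2],
})

# Naive join: each user is repeated once per matching event row.
naive = users.merge(events, on="user_id", how="left")

# Aggregate the event table to one row per user before joining.
event_feats = (
    events.groupby("user_id")
    .agg(
        event_count=("points", "size"),
        points_sum=("points", "sum"),
        points_mean=("points", "mean"),
    )
    .reset_index()
)

# validate= raises an error if the join is not one row per user on both sides.
joined = users.merge(event_feats, on="user_id", how="left", validate="one_to_one")

# Users with no events get 0 for counts and sums; the mean is left as NaN.
count_cols = ["event_count", "points_sum"]
joined[count_cols] = joined[count_cols].fillna(0)
```

Comparing `len(naive)` with `len(users)` is a quick way to check whether a join has duplicated rows, which is one possible reason for a score drop after joining.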

27 Dec 2021, 18:03
Upvotes 0
21db

Hi Hari,

The train set covers years 1, 2 and 3, but the test set lies in the future: year 4, with most user IDs needing predictions for months 1, 2 and 3 of year 4. This means we have a forecasting problem, and we need to populate the train set with data from the other datasets aggregated per month. The test set can then be extracted from the train set by taking all rows from year 3, month 12.

We then need to get the targets for the current month from the Target column of the next 3 months, i.e. train.groupby(userid).shift(-1), .shift(-2) and .shift(-3). Then train one model per target month (1, 2 and 3) and predict on the user samples from year 3, month 12.
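The shifting step could look roughly like this in pandas. This is only a sketch on a made-up monthly panel. The column names (`user_id`, `year`, `month`, `target`) and the `target_h1`/`target_h2`/`target_h3` labels are assumptions, and it assumes every user has one row per month with no gaps, since `shift` counts rows rather than calendar months.

```python
import pandas as pd

# Hypothetical monthly panel: one row per user per month (names are assumptions).
panel = pd.DataFrame({
    "user_id": [1] * 5 + [2] * 5,
    "year": [3] * 10,
    "month": [8, 9, 10, 11, 12] * 2,
    "target": [0, 1, 0, 1, 1, 1, 0, 0, 1, 0],
})

# Rows must be in time order within each user before shifting.
panel = panel.sort_values(["user_id", "year", "month"]).reset_index(drop=True)

# For each horizon h, the label is the same user's target h months later.
horizons = (1, 2, 3)
for h in horizons:
    panel[f"target_h{h}"] = panel.groupby("user_id")["target"].shift(-h)

# Rows from the last observed month (year 3, month 12) have no future label;
# they are the samples to predict on.
is_last = (panel["year"] == 3) & (panel["month"] == 12)
predict_rows = panel[is_last]

# One training set per horizon: keep only rows whose future label exists.
train_sets = {h: panel.dropna(subset=[f"target_h{h}"]) for h in horizons}
```

Each entry of `train_sets` would then feed its own model, and all three models would predict on `predict_rows` to cover months 1, 2 and 3 of year 4.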

13 Jan 2022, 12:30
Upvotes 0

@basketball stars That usually means the extra datasets are adding noise or leakage, so try cleaning, feature-selecting, or engineering only the useful fields before joining to see if performance improves.

9 Dec 2025, 07:51
Upvotes 0