Thank you @ZINDI for the awesome competition! Big congrats to those who built models robust enough for the private leaderboard, and to those who flourished on the public LB with models tailored to those specific users!
I feel there were many possible ways to approach this competition. I think I tried most of them, and I would really like to hear your approach.
The approach behind most of my submissions topped out at around 0.9099 private and 0.91157 public:
I created intuitive features, such as months_left until the end of a competition, competition duration, cumsum features to capture a user's activity over time, and a calculation of how frequently a user made submissions in a month.
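A minimal sketch of how features like these might be computed with pandas. The table layout and every column name here (`user_id`, `month`, `n_submissions`, `comp_start_month`, `comp_end_month`) are assumptions made for illustration, not the competition's actual schema, and the linked notebook may compute them differently.

```python
import pandas as pd


def add_activity_features(subs: pd.DataFrame) -> pd.DataFrame:
    """Add per-user activity features to a user-month table.

    Assumed (hypothetical) columns: user_id, month (integer month index),
    n_submissions, comp_start_month, comp_end_month.
    """
    out = subs.sort_values(["user_id", "month"]).copy()
    # Competition duration and months left until the competition closes.
    out["comp_duration"] = out["comp_end_month"] - out["comp_start_month"]
    out["months_left"] = (out["comp_end_month"] - out["month"]).clip(lower=0)
    # Running total of a user's submissions up to and including this month.
    out["cum_submissions"] = out.groupby("user_id")["n_submissions"].cumsum()
    # Average submissions per month since the user's first observed month.
    first_month = out.groupby("user_id")["month"].transform("min")
    months_active = out["month"] - first_month + 1
    out["subs_per_month"] = out["cum_submissions"] / months_active
    return out


# Tiny made-up example: user "a" over three months, user "b" over one.
demo = pd.DataFrame({
    "user_id": ["a", "a", "a", "b"],
    "month": [1, 2, 3, 2],
    "n_submissions": [2, 0, 4, 5],
    "comp_start_month": [1, 1, 1, 2],
    "comp_end_month": [4, 4, 4, 3],
})
features = add_activity_features(demo)
```

Sorting by user and month before the cumulative sum keeps each running total in time order, so a row only reflects activity up to that month.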
When it came to modeling, I separated the users into groups of 3 for predicting the next month, and groups of 2 for predicting the following 2 months. Users were grouped based on when they joined Zindi. Grouping users allowed me to tune models to the max and still get a reliable CV. This gave me my final boost on the public leaderboard, from 0.907 to 0.911.
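A rough sketch of the grouping idea, reading "groups of 3" as three join-date groups (that reading is an assumption). The columns `join_date` and `active_next_month`, the helper names, and the placeholder "model" are all hypothetical; any real learner could be passed in as `fit_fn`.

```python
import pandas as pd


def assign_join_groups(users: pd.DataFrame, n_groups: int) -> pd.Series:
    """Label each user 0..n_groups-1 by join date, earliest joiners first."""
    # Rank first so ties in join_date cannot produce duplicate bin edges.
    rank = users["join_date"].rank(method="first")
    return pd.qcut(rank, q=n_groups, labels=False)


def fit_per_group(train: pd.DataFrame, group_col: str, fit_fn) -> dict:
    """Fit one model per group; fit_fn is any callable taking a DataFrame."""
    return {g: fit_fn(part) for g, part in train.groupby(group_col)}


# Made-up users with a hypothetical binary target.
users = pd.DataFrame({
    "user_id": list("abcdef"),
    "join_date": pd.to_datetime([
        "2018-03-01", "2019-01-15", "2019-11-02",
        "2020-02-20", "2020-06-05", "2020-09-30",
    ]),
    "active_next_month": [0, 0, 1, 1, 0, 1],
})
users["join_group"] = assign_join_groups(users, n_groups=3)

# Placeholder "model": the group's mean target rate. Swap in a real
# estimator's fit call here; this stand-in only shows the plumbing.
models = fit_per_group(
    users, "join_group", lambda part: part["active_next_month"].mean()
)
```

Because each group gets its own model, hyperparameters can be tuned per group and validated on users with a similar tenure, which is one plausible reason the CV stayed reliable.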
Training models only on year 3 data gave me the first boost, from 0.90 to 0.907. This made sense, as half of the users joined in year 3.
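Restricting the training window can be as simple as a row filter. The sketch below assumes a hypothetical `year` column on the training table; the helper name and the coverage check are illustrative additions, not taken from the notebook.

```python
import pandas as pd


def select_training_years(df: pd.DataFrame, years=(3,), year_col: str = "year"):
    """Keep only rows whose year is in `years` (default: year 3 only)."""
    return df[df[year_col].isin(list(years))].reset_index(drop=True)


# Made-up history spanning three years.
history = pd.DataFrame({
    "user_id": ["a", "a", "b", "c", "c"],
    "year": [1, 3, 3, 2, 3],
    "target": [0, 1, 0, 1, 1],
})
train_recent = select_training_years(history)

# Share of users that still appear in the reduced training set.
coverage = train_recent["user_id"].nunique() / history["user_id"].nunique()
```

Checking how many users survive the filter is a quick way to see whether dropping older years costs much coverage.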
I don't feel I found any magic and I'm really curious as to how you approached this problem, given all the data.
You can find my solution fully commented here:
https://www.kaggle.com/danielbruintjies/zindi-user-behaviour-5th-place-solution/notebook
Awesome approach @DanielBruintjies, thanks for sharing.
@Professor Thank you
Wow, this is enlightening.
Amazing! Congratulations and thanks for sharing your approach!
@100i Thanks
Your approach is truly magical, @DanielBruintjies.
Thanks for sharing.
Thank you @TRIUMPHANTPRINCE
Amazing approach @DanielBruintjies, kudos!
Thank you @Data-Man
Well done!
@flamethrower Thank you
Good one. Thanks for sharing.
Thank you @eat-sleep-ai-repeat