I built a clean baseline from scratch using CatBoost and SHAP that participants can use as a starting point. It achieved a final score of 10.05605957 on the Root Mean Squared Error (RMSE) metric, which put me in 12th place on the leaderboard.
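For anyone new to the metric, here is a minimal sketch of how RMSE is computed (lower is better). The function name `rmse` and the sample values are made up for illustration. This is not the competition's official scoring code and it is not taken from the baseline notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only, not competition data.
actual = [50.0, 40.0, 60.0]
predicted = [47.0, 44.0, 60.0]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.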
I hope that this baseline code will be helpful for the participants of this competition and that it will encourage more people to join the competition and share their solutions.
If you are interested in learning more about my approach and results, I made a video about it on my YouTube channel: https://www.youtube.com/watch?v=Wchnhp3HHDo
I have also made the code public on Kaggle: https://www.kaggle.com/code/tauilabdelilah/datadrive2030-early-learning-predictors-baseline?scriptVersionId=120130541
I would like to thank the @Zindi team and DataDrive2030 for organizing the competition and for giving me the opportunity to contribute to it.