
African Credit Scoring Challenge

Helping Africa
$5 000 USD
Completed (~1 year ago)
1959 joined
1022 active
Start: 29 Nov 2024
Close: 12 Jan 2025
Reveal: 13 Jan 2025
Mohamed_Elnageeb
University of Khartoum
32nd Solution notebook
Notebooks · 14 Jan 2025, 16:32 · 4

Hello Zindians! 👋

I’m glad to share my solution to this competition, which was actually my first competition in almost two years! It was a month full of fun, failures, and lots of learning.

You can check out the full code here: https://www.kaggle.com/code/thelastsmilodon/zindi-s-africa-credit-challenge-solution 🎉

My solution finished in 32nd place, but could have reached 16th had I trusted my CV more (a lesson well learned). My approach consisted of training four base models (XGBoost, CatBoost, LightGBM, and an MLP) and blending them using a stacking ensemble. No post-processing or "magic" was used.

  • What worked:
1. Feature engineering: this was probably the most impactful step. The original features had very low correlation with the target, but engineered features like repayment_ratio = Total_Amount_to_Repay / Total_Amount correlated strongly (0.64) with it.
2. Adding economic-indicator data
3. Hyperparameter optimisation with Optuna
4. Stratified group (by customer_id) K-fold CV
5. Feature selection (inspired by https://www.kaggle.com/code/prashant111/comprehensive-guide-on-feature-selection/notebook)

  • What didn't work:
1. Target encoding
2. SMOTE
3. SVM, TabNet, and other models
4. Post-processing (changing the 0.5 threshold worsened the score)
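For reference, item 1 above refers to replacing a categorical column with the mean target per category, fit on training data only. A minimal sketch; the `lender_id` column is hypothetical, not a column from the competition data.

```python
import pandas as pd

# Tiny training frame with one categorical feature and a binary target
train = pd.DataFrame({"lender_id": ["a", "a", "b", "b", "b"],
                      "target":    [1,   0,   0,   0,   1]})

# Mean target per category, computed on the training split only
means = train.groupby("lender_id")["target"].mean()
train["lender_id_te"] = train["lender_id"].map(means)
# Unseen categories at inference time need a fallback, e.g. the global mean
```

Target encoding is leakage-prone (the encoding is computed from the target itself), which is a common reason it fails to generalise without careful out-of-fold smoothing.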

Discussion (4 answers)

Congratulations for your good performance and thank you for sharing.

15 Jan 2025, 09:49
Upvotes 1
Mohamed_Elnageeb
University of Khartoum

Thanks, and congratulations to you too.

Would love to hear what worked for you too.

I did a lot of feature engineering (mostly financial ratios; I even divided the debt rate by the duration). I mostly worked with a random forest classifier, adjusting the probability threshold on the predict_proba results to capture more true positive cases. I didn't have time to tune several models and then perform blending/stacking. This competition was tough for me, but I'm still a beginner and I will continue to learn.
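The threshold adjustment this reply describes can be sketched as follows: fit a random forest, then sweep the `predict_proba` cutoff instead of using the default 0.5. Synthetic imbalanced data here; the real features were financial ratios as described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: the positive class is the rare one
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = rf.predict_proba(X_va)[:, 1]  # probability of the positive class

# Lowering the cutoff below 0.5 trades precision for recall on the rare class
best_f1, best_t = max(
    (f1_score(y_va, (proba >= t).astype(int)), t)
    for t in np.arange(0.1, 0.9, 0.05)
)
```

Whether this helps depends on the metric and on probability calibration; as noted in the original post, on this competition's private leaderboard it made things worse.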

Ecole Supérieure de la Statistique et de l'Analyse de l'Information

Congratulations to you! Unfortunately, the public/private leaderboard dilemma cost us dearly. Our biggest mistake was focusing too much on the public leaderboard, which resulted in a disappointing 106th place on the private leaderboard. What’s even more frustrating is that we had a submission capable of securing 31st place, but we overwrote it due to not saving our trials properly. This lack of organization ultimately cost us a much better rank.

16 Jan 2025, 03:17
Upvotes 0