
African Credit Scoring Challenge

Helping Africa
$5 000 USD
Completed (~1 year ago)
1959 joined
1022 active
Start: 29 Nov 2024
Close: 12 Jan 2025
Reveal: 13 Jan 2025
Mohamed_Elnageeb
University of Khartoum
32nd Solution notebook
Notebooks · 14 Jan 2025, 16:32 · 4

Hello Zindians! 👋

I’m glad to share my solution to this competition, which was actually my first competition in almost two years! It was a month full of fun, failures, and lots of learning.

You can check out the full code here: https://www.kaggle.com/code/thelastsmilodon/zindi-s-africa-credit-challenge-solution 🎉

My solution finished in 32nd place, but could have reached 16th had I trusted my CV more (a lesson well learned). My approach consisted of training four base models (XGBoost, CatBoost, LightGBM, and an MLP) and blending them using a stacking ensemble. No post-processing or "magic" was used.

  • What worked:
1. Feature engineering: this was probably the most impactful step. The original features had very low correlation with the target, but engineered features like repayment_ratio = Total_Amount_to_Repay / Total_Amount correlated strongly (0.64) with it.
2. Adding economic-indicator data
3. Hyperparameter optimisation with Optuna
4. Stratified group (by customer_id) K-fold CV
5. Feature selection (inspired by https://www.kaggle.com/code/prashant111/comprehensive-guide-on-feature-selection/notebook)

  • What didn't work:
1. Target encoding
2. SMOTE
3. SVM, TabNet, and other models
4. Post-processing (changing the 0.5 threshold worsened the score)
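For reference, item 1 above refers to replacing a categorical column with the mean target per category, fit on training data only. A minimal sketch; the `lender_id` column is hypothetical, not a column from the competition data.

```python
import pandas as pd

# Tiny training frame with one categorical feature and a binary target
train = pd.DataFrame({"lender_id": ["a", "a", "b", "b", "b"],
                      "target":    [1,   0,   0,   0,   1]})

# Mean target per category, computed on the training split only
means = train.groupby("lender_id")["target"].mean()
train["lender_id_te"] = train["lender_id"].map(means)
# Unseen categories at inference time need a fallback, e.g. the global mean
```

Target encoding is leakage-prone (the encoding is computed from the target itself), which is a common reason it fails to generalise without careful out-of-fold smoothing.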

Discussion (4 answers)

Congratulations for your good performance and thank you for sharing.

15 Jan 2025, 09:49
Upvotes 1
Mohamed_Elnageeb
University of Khartoum

Thanks, and congratulations to you too.

Would love to hear what worked for you too.

I did a lot of feature engineering (mostly financial ratios; I even divided the debt rate by the duration). I mostly worked with a random forest classifier, adjusting the probability threshold on the predict_proba results to capture more true positive cases. I didn't have time to tune several models and then perform blending/stacking. This competition was tough for me, but I'm still a beginner and I will continue to learn.
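The threshold adjustment this reply describes can be sketched as follows: fit a random forest, then sweep the `predict_proba` cutoff instead of using the default 0.5. Synthetic imbalanced data here; the real features were financial ratios as described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: the positive class is the rare one
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = rf.predict_proba(X_va)[:, 1]  # probability of the positive class

# Lowering the cutoff below 0.5 trades precision for recall on the rare class
best_f1, best_t = max(
    (f1_score(y_va, (proba >= t).astype(int)), t)
    for t in np.arange(0.1, 0.9, 0.05)
)
```

Whether this helps depends on the metric and on probability calibration; as noted in the original post, on this competition's private leaderboard it made things worse.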

Ecole Supérieure de la Statistique et de l'Analyse de l'Information

Congratulations to you! Unfortunately, the public/private leaderboard dilemma cost us dearly. Our biggest mistake was focusing too much on the public leaderboard, which resulted in a disappointing 106th place on the private leaderboard. What’s even more frustrating is that we had a submission capable of securing 31st place, but we overwrote it due to not saving our trials properly. This lack of organization ultimately cost us a much better rank.

16 Jan 2025, 03:17
Upvotes 0