data.org Financial Health Prediction Challenge

Helping Eswatini, Lesotho, Zimbabwe and Malawi
$1 500 USD
Under code review
Prediction
Machine Learning
1686 joined
898 active
Start: Dec 12, 25
Close: Mar 15, 26
Reveal: Mar 16, 26
J0NNY
🚀 [0.8955 Public LB] Full End-to-End Ensemble Pipeline & Solution Repository
14 Mar 2026, 12:40 · 5

Hi everyone,

I've just open-sourced my complete machine learning pipeline for this challenge, currently scoring 0.8955 on the public leaderboard.

The approach focuses heavily on mathematical rigor, handling structural zeros, and contextualizing the MSME data across different regional economies. Here are a few highlights from the pipeline:

  • Robust Feature Engineering: Custom composite metrics (financial access, informal access, and stress scores) and cross-interactions like stress_x_expense_ratio.
  • Country-Specific Context: Financial features are log-transformed and standardized using the mean/std of each business's operating country, alongside country-percentile rankings. Separate per-country models are also trained for countries with sufficient sample sizes.
  • Calibrated Ensemble: A soft voting classifier combining heavily tuned LightGBM, XGBoost, and HistGradientBoosting models, balanced with SMOTE and calibrated via Isotonic Regression.
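
To make the country-specific scaling concrete, here is a minimal sketch of the idea rather than the code from the repository. The function name `add_country_context`, the column names `country` and `monthly_turnover`, and the choice of `log1p` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_country_context(df, value_cols, country_col="country"):
    """Add per-country z-scores and percentile ranks of log-scaled columns.

    Hypothetical helper: names and output suffixes are illustrative only.
    """
    out = df.copy()
    for col in value_cols:
        # log1p maps structural zeros to exactly 0 and compresses heavy right tails.
        logged = np.log1p(out[col].clip(lower=0))
        by_country = logged.groupby(out[country_col])

        mean = by_country.transform("mean")
        # Population std (ddof=0); a zero std is replaced by 1 to avoid dividing by zero.
        std = by_country.transform(lambda s: s.std(ddof=0)).replace(0, 1.0)

        out[f"{col}_country_z"] = (logged - mean) / std
        # Rank within country, expressed as a fraction in (0, 1]; ties share the average rank.
        out[f"{col}_country_pct"] = by_country.rank(pct=True)
    return out


demo = pd.DataFrame(
    {
        "country": ["Eswatini", "Eswatini", "Lesotho", "Lesotho"],
        "monthly_turnover": [0.0, 1000.0, 50.0, 50.0],
    }
)
features = add_country_context(demo, ["monthly_turnover"])
print(features)
```

In a real pipeline the per-country mean/std would usually be computed on the training rows only and then reused for the test rows; this sketch skips that step for brevity.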

You can check out the full Google Colab notebook and preprocessing strategy here:

🔗 https://github.com/JONNY-ME/data.org-Financial-Health-Prediction

If you find the code or feature engineering strategies helpful for your own models, please consider dropping a ⭐ on the repo!

Good luck to everyone in the final stretch!

Discussion 5 answers

14 Mar 2026, 16:21
Upvotes 1
CodeJoe

Amazing @J0NNY ! Thank you big man! A star from me⭐.

14 Mar 2026, 18:16
Upvotes 1
J0NNY

Thank you @CodeJoe

Changing RANDOM_STATE makes a huge difference to the score. You can even break the 0.9 barrier just by changing it.

CodeJoe

Nice nice, thank you! You might as well ensemble several random states. Or even train per country. Null values reduce drastically when you do that.

J0NNY

Got it. Will try those. Thanks!