data.org Financial Health Prediction Challenge

Countries: Eswatini, Lesotho, Zimbabwe, Malawi
Prize: $1,500 USD
Status: Completed (3 months ago)
Skills you will learn: Prediction, Machine Learning
Participants: 1774 joined, 894 active

Timeline: Dec 12, 2025 to Mar 15, 2026 · Reveal: Mar 16, 2026

J0NNY

🚀 [0.8955 Public LB] Full End-to-End Ensemble Pipeline & Solution Repository

14 Mar 2026, 12:40 · 5

Hi everyone,

I've just open-sourced my complete machine learning pipeline for this challenge, currently scoring 0.8955 on the public leaderboard.

The approach focuses heavily on mathematical rigor, handling structural zeros, and contextualizing the MSME (micro, small, and medium enterprise) data across different regional economies. Here are a few highlights from the pipeline:

- Robust Feature Engineering: custom composite metrics (financial access, informal access, and stress scores) and cross-interactions such as stress_x_expense_ratio.
- Country-Specific Context: financial features are log-transformed and standardized using the mean and standard deviation of each business's operating country, alongside within-country percentile ranks. Separate segmented models are also trained for countries with sufficient sample sizes.
- Calibrated Ensemble: a soft-voting classifier combining heavily tuned LightGBM, XGBoost, and HistGradientBoosting models, with classes balanced via SMOTE and probabilities calibrated with isotonic regression.
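The country-relative scaling above can be sketched in pandas as follows. This is a minimal sketch, not the repository's code: the column names (`country`, `business_turnover`, `business_expenses`, `stress_score`) are hypothetical, amounts are assumed to be non-negative (negatives are clipped to zero before `log1p`), and defining `stress_x_expense_ratio` as stress score times expenses over turnover is an illustrative guess.

```python
import numpy as np
import pandas as pd


def add_country_relative_features(df, cols, country_col="country"):
    """Add within-country z-scores of log1p values and within-country percentile ranks."""
    out = df.copy()
    for col in cols:
        # log1p maps structural zeros to 0 instead of -inf; negatives are clipped (assumption)
        logged = np.log1p(out[col].clip(lower=0))
        grouped = logged.groupby(out[country_col])
        mean = grouped.transform("mean")
        std = grouped.transform("std")
        # Countries with a single row or no spread get std 1, so their z-score is 0
        safe_std = std.where(std > 0, 1.0)
        out[f"{col}_country_z"] = (logged - mean) / safe_std
        # Percentile rank within the same country (ties share the average rank)
        out[f"{col}_country_pct"] = out.groupby(country_col)[col].rank(pct=True)
    return out


# Hypothetical toy data standing in for the real MSME table
df = pd.DataFrame({
    "country": ["Eswatini", "Eswatini", "Eswatini", "Malawi", "Malawi"],
    "business_turnover": [0.0, 100.0, 1000.0, 50.0, 50.0],
    "business_expenses": [10.0, 40.0, 500.0, 25.0, 0.0],
    "stress_score": [2, 1, 0, 3, 1],
})
features = add_country_relative_features(df, ["business_turnover", "business_expenses"])

# Hypothetical interaction: stress score times expense ratio (undefined when turnover is 0)
expense_ratio = features["business_expenses"] / features["business_turnover"].replace(0, np.nan)
features["stress_x_expense_ratio"] = features["stress_score"] * expense_ratio
print(features.round(3))
```

Each financial column gains a `_country_z` column (z-score of its `log1p` value within the country) and a `_country_pct` column (within-country percentile rank), so the same turnover can read as high in one economy and average in another.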

You can check out the full Google Colab notebook and preprocessing strategy here:

🔗 https://github.com/JONNY-ME/data.org-Financial-Health-Prediction

If you find the code or feature engineering strategies helpful for your own models, please consider dropping a ⭐ on the repo!

Good luck to everyone in the final stretch!

Discussion (5 answers)

jyz

upvote

14 Mar 2026, 16:21

CodeJoe

Amazing, @J0NNY! Thank you, big man! A star from me ⭐.

14 Mar 2026, 18:16

J0NNY

Thank you, @CodeJoe!

Changing RANDOM_STATE makes a huge difference to the score. You can even break the 0.9 barrier just by changing it.

replied to CodeJoe · 14 Mar 2026, 19:42

CodeJoe

Nice, nice, thank you! You might as well ensemble several random states, or even train per country. Null values drop drastically when you do that.
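The two suggestions, averaging over several random states and training per country, fit together in one small helper. The sketch below is hypothetical and not taken from the repository: `make_model`, `seed_averaged_proba`, `per_country_proba`, the `min_rows=100` cutoff, the five seeds, and the random forest stand-in are all assumptions. Each country with enough rows and at least two classes gets its own seed-averaged model; smaller countries keep the prediction of a seed-averaged global model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def make_model(seed):
    # Hypothetical stand-in; plug in the real (calibrated) ensemble here
    return RandomForestClassifier(n_estimators=100, random_state=seed)


def seed_averaged_proba(X_fit, y_fit, X_pred, classes, seeds):
    """Average predict_proba over models that differ only in their random seed."""
    total = np.zeros((len(X_pred), len(classes)))
    for seed in seeds:
        model = make_model(seed).fit(X_fit, y_fit)
        # A country subset may lack a class: map its columns into the global order
        cols = np.searchsorted(classes, model.classes_)
        total[:, cols] += model.predict_proba(X_pred)
    return total / len(seeds)


def per_country_proba(train, test, features, target, country_col="country",
                      seeds=(0, 1, 2, 3, 4), min_rows=100):
    """Seed-averaged model per country, with a global model as the fallback."""
    classes = np.sort(train[target].unique())
    proba = seed_averaged_proba(train[features], train[target], test[features], classes, seeds)
    for country, test_part in test.groupby(country_col):
        train_part = train[train[country_col] == country]
        # Keep the global prediction for small or single-class countries
        if len(train_part) < min_rows or train_part[target].nunique() < 2:
            continue
        rows = test.index.get_indexer(test_part.index)
        proba[rows] = seed_averaged_proba(
            train_part[features], train_part[target], test_part[features], classes, seeds
        )
    return pd.DataFrame(proba, index=test.index, columns=classes)


# Synthetic placeholder data: four countries, two features, three ordered classes
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "country": rng.choice(["Eswatini", "Lesotho", "Malawi", "Zimbabwe"],
                          size=n, p=[0.4, 0.3, 0.2, 0.1]),
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
df["target"] = np.digitize(df["x1"] + 0.5 * df["x2"], [-0.5, 0.5])
train, test = df.iloc[:450], df.iloc[450:]

proba = per_country_proba(train, test, ["x1", "x2"], "target")
print(proba.head())
```

Averaging probabilities over seeds tends to damp the run-to-run swings that come from changing RANDOM_STATE, at the cost of training each model once per seed. Since the real data has missing values, the random forest stand-in would need imputed inputs on older scikit-learn releases.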

replied to J0NNY · 14 Mar 2026, 19:46

J0NNY

Got it. Will try those. Thanks!

replied to CodeJoe · 14 Mar 2026, 20:39
