📌 [0.90416 Public LB - Top 25 Solution] OOF F1-Macro Optimization & Ensembling
Hi everyone,
I've just pushed a major update to my open-source Colab pipeline for this challenge, boosting my score to 0.90416 on the public leaderboard.
While my initial approach relied heavily on custom feature engineering (composite stress/access scores and country-specific scaling), this new leap came from optimizing the post-processing and ensembling stages.
Here are the key techniques added in this update:
- OOF Probability Thresholding: Instead of a standard argmax on the ensemble's probability outputs, I used scipy.optimize.minimize (Nelder-Mead) on out-of-fold predictions to find optimal per-class multipliers. This directly optimizes the F1-macro metric (by minimizing its negative) rather than accuracy.
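A minimal sketch of this idea, with synthetic stand-ins for the real OOF probabilities and labels (the variable names and dummy data here are illustrative, not the repo's actual code):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n, k = 1000, 3
oof_proba = rng.dirichlet(np.ones(k), size=n)   # stand-in for real OOF probs
y_true = oof_proba.argmax(axis=1)               # stand-in labels

def neg_f1_macro(weights):
    # Scale each class column by its multiplier, take argmax,
    # and return the negative F1-macro so minimize() maximizes it.
    preds = (oof_proba * weights).argmax(axis=1)
    return -f1_score(y_true, preds, average="macro")

res = minimize(neg_f1_macro, x0=np.ones(k), method="Nelder-Mead")
best_weights = res.x
tuned_preds = (oof_proba * best_weights).argmax(axis=1)
```

The same `best_weights` found on OOF predictions are then applied to the test-set probabilities before the final argmax.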
- Unicode Text Canonicalization: Cleaned up several categorical columns using NFKC normalization to handle hidden characters and consolidate scattered "Don't know" variants.
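A sketch of what such cleaning can look like (the helper name and the specific "Don't know" variants are my assumptions, not taken from the repo):

```python
import unicodedata

def canonize(text: str) -> str:
    # NFKC folds compatibility characters (e.g. full-width forms,
    # non-breaking spaces) into their canonical equivalents.
    t = unicodedata.normalize("NFKC", str(text))
    # NFKC does not touch the curly apostrophe, so map it manually.
    t = t.replace("\u2019", "'").strip()
    # Consolidate scattered "Don't know" spellings into one category.
    if t.lower() in {"don't know", "dont know", "do not know"}:
        return "Don't know"
    return t
```

For example, `canonize("Don\u2019t\u00a0know")` collapses a curly apostrophe plus a non-breaking space into the single canonical label.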
- Multi-Seed Ordinal Ensembling: The final submission averages predictions from multiple random seeds (1, 78, 93, 225) across two pipeline variants. Because the target is ordinal (Low, Medium, High), the ensemble maps these to [0, 1, 2], calculates the mean across the seeds, and rounds to the nearest integer.
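The ordinal blend described above can be sketched as follows (the function name and the toy per-seed predictions are illustrative):

```python
import numpy as np

CLASSES = ["Low", "Medium", "High"]
TO_ORD = {c: i for i, c in enumerate(CLASSES)}

def blend_ordinal(seed_preds):
    """seed_preds: list of label lists, one per (seed, pipeline) run."""
    # Map labels to their ordinal codes 0/1/2.
    ords = np.array([[TO_ORD[p] for p in run] for run in seed_preds])
    # Average across runs, then round back to the nearest class index.
    mean = ords.mean(axis=0)
    rounded = np.clip(np.rint(mean), 0, len(CLASSES) - 1).astype(int)
    return [CLASSES[i] for i in rounded]

# Toy example with four runs (e.g. seeds 1, 78, 93, 225):
runs = [["Low", "High", "Medium"],
        ["Low", "Medium", "Medium"],
        ["Medium", "High", "High"],
        ["Low", "High", "Medium"]]
print(blend_ordinal(runs))  # -> ['Low', 'High', 'Medium']
```

Averaging ordinal codes instead of majority-voting lets a run that predicts an adjacent class (e.g. Medium vs. Low) pull the blend toward the boundary, which suits an ordered target.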
You can explore the threshold optimization code and the multi-seed blending script in the updated repository here:
🔗 https://github.com/JONNY-ME/data.org-Financial-Health-Prediction
If you find the F1-macro thresholding script or the feature engineering useful for your own models, please consider dropping a ⭐ on the repo!
Wow wow, amazing push!
Any update on the private score?