📌 [0.90416 Public LB - Top 25 Solution] OOF F1-Macro Optimization & Ensembling
Hi everyone,
I've just pushed a major update to my open-source Colab pipeline for this challenge, boosting my score to 0.90416 on the public leaderboard.
While my initial approach relied heavily on custom feature engineering (composite stress/access scores and country-specific scaling), this new leap came from optimizing the post-processing and ensembling stages.
Here are the key techniques added in this update:
- OOF Probability Thresholding: Instead of a standard argmax on the ensemble's probability outputs, I used scipy.optimize.minimize (Nelder-Mead) on out-of-fold predictions to find optimal per-class multipliers. This directly optimizes the F1-macro metric (by minimizing its negative) rather than accuracy.
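A minimal sketch of this idea, with synthetic stand-ins for the real OOF probabilities and labels (the variable names and dummy data here are illustrative, not the repo's actual code):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n, k = 1000, 3
oof_proba = rng.dirichlet(np.ones(k), size=n)   # stand-in for real OOF probs
y_true = oof_proba.argmax(axis=1)               # stand-in labels

def neg_f1_macro(weights):
    # Scale each class column by its multiplier, take argmax,
    # and return the negative F1-macro so minimize() maximizes it.
    preds = (oof_proba * weights).argmax(axis=1)
    return -f1_score(y_true, preds, average="macro")

res = minimize(neg_f1_macro, x0=np.ones(k), method="Nelder-Mead")
best_weights = res.x
tuned_preds = (oof_proba * best_weights).argmax(axis=1)
```

The same `best_weights` found on OOF predictions are then applied to the test-set probabilities before the final argmax.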
- Unicode Text Canonicalization: Cleaned up several categorical columns using NFKC normalization to handle hidden characters and consolidate scattered "Don't know" variants.
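A sketch of what such cleaning can look like (the helper name and the specific "Don't know" variants are my assumptions, not taken from the repo):

```python
import unicodedata

def canonize(text: str) -> str:
    # NFKC folds compatibility characters (e.g. full-width forms,
    # non-breaking spaces) into their canonical equivalents.
    t = unicodedata.normalize("NFKC", str(text))
    # NFKC does not touch the curly apostrophe, so map it manually.
    t = t.replace("\u2019", "'").strip()
    # Consolidate scattered "Don't know" spellings into one category.
    if t.lower() in {"don't know", "dont know", "do not know"}:
        return "Don't know"
    return t
```

For example, `canonize("Don\u2019t\u00a0know")` collapses a curly apostrophe plus a non-breaking space into the single canonical label.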
- Multi-Seed Ordinal Ensembling: The final submission averages predictions from multiple random seeds (1, 78, 93, 225) across two pipeline variants. Because the target is ordinal (Low, Medium, High), the ensemble maps these to [0, 1, 2], calculates the mean across the seeds, and rounds to the nearest integer.
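The ordinal blend described above can be sketched as follows (the function name and the toy per-seed predictions are illustrative):

```python
import numpy as np

CLASSES = ["Low", "Medium", "High"]
TO_ORD = {c: i for i, c in enumerate(CLASSES)}

def blend_ordinal(seed_preds):
    """seed_preds: list of label lists, one per (seed, pipeline) run."""
    # Map labels to their ordinal codes 0/1/2.
    ords = np.array([[TO_ORD[p] for p in run] for run in seed_preds])
    # Average across runs, then round back to the nearest class index.
    mean = ords.mean(axis=0)
    rounded = np.clip(np.rint(mean), 0, len(CLASSES) - 1).astype(int)
    return [CLASSES[i] for i in rounded]

# Toy example with four runs (e.g. seeds 1, 78, 93, 225):
runs = [["Low", "High", "Medium"],
        ["Low", "Medium", "Medium"],
        ["Medium", "High", "High"],
        ["Low", "High", "Medium"]]
print(blend_ordinal(runs))  # -> ['Low', 'High', 'Medium']
```

Averaging ordinal codes instead of majority-voting lets a run that predicts an adjacent class (e.g. Medium vs. Low) pull the blend toward the boundary, which suits an ordered target.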
You can explore the threshold optimization code and the multi-seed blending script in the updated repository here:
🔗 https://github.com/JONNY-ME/data.org-Financial-Health-Prediction
If you find the F1-macro thresholding script or the feature engineering useful for your own models, please consider dropping a ⭐ on the repo!
Wow wow, amazing push!
Any update on the private score?