
data.org Financial Health Prediction Challenge

Helping Eswatini, Lesotho, Zimbabwe, and Malawi
$1 500 USD
Under code review
Prediction
Machine Learning
1686 joined
898 active
Start: 12 Dec 2025
Close: 15 Mar 2026
Reveal: 16 Mar 2026
rogue_26
Sharing my Repo! (Focus on Pipeline Rigor & Model Interpretability)
19 Mar 2026, 16:44 · 2

I just made my final repository for the SME Financial Health challenge public: https://github.com/Julie-Montague/Financial_Health_Prediction_Challenge

I focused heavily on building a robust pipeline and cracking open the "black box" to understand the actual economic drivers of SME distress.

Key highlights in the repo:

  • Unsupervised Feature Engineering: Used K-Means & PCA to cluster SMEs into behavioral archetypes before feeding them to the tree models.
  • Weighted Ensembling: Built a soft-voting classifier (ExtraTrees, Random Forest) and used Optuna to tune how much weight each model's predicted probabilities receive.
  • Business Interpretability: Ran a targeted SHAP deep-dive to check that the model learned plausible economic logic (e.g., how digital inclusion and insurance access shield SMEs from distress).
  • Prioritizing Reliability over Thresholding: Manual threshold tuning is a common suggestion for highly imbalanced tasks, but I kept the default decision rule: predict the class with the highest probability (argmax). I wanted the output to stay unbiased and consistent for actual risk assessment.
  • Auditing Noisy Data with Cleanlab: Survey data is notoriously messy. I used Cleanlab to identify label noise: cases where a business's health score didn't match its actual financial behavior. Cleaning this noise improved the local validation score massively but did not boost the leaderboard score (which is often graded against that same noise). Even so, it let me build features that reflect real economic behavior rather than memorizing inconsistent data.
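
For readers who want a feel for the first two highlights and the argmax rule without opening the repo, here is a minimal sketch. It is not the code from the repository. The synthetic data, the number of PCA components and clusters, and the ensemble weights are all placeholder assumptions; the post says the real weights are tuned with Optuna, which is left out here to keep the example dependent on scikit-learn and NumPy only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced 3-class stand-in for the survey data (assumption:
# this is NOT the competition dataset).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the unsupervised steps on the training split only, to avoid leakage.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=3, random_state=0).fit(scaler.transform(X_train))
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(
    pca.transform(scaler.transform(X_train))
)


def add_cluster_features(X_raw):
    """Append PCA components and a one-hot cluster id to the raw features."""
    Z = pca.transform(scaler.transform(X_raw))
    cluster_onehot = np.eye(kmeans.n_clusters)[kmeans.predict(Z)]
    return np.hstack([X_raw, Z, cluster_onehot])


X_train_fe = add_cluster_features(X_train)
X_test_fe = add_cluster_features(X_test)

# Soft voting averages the models' class probabilities using the weights.
ensemble = VotingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    voting="soft",
    weights=[0.6, 0.4],  # hypothetical values; a tuner would search these
)
ensemble.fit(X_train_fe, y_train)

# Default decision rule: pick the most probable class, no custom thresholds.
proba = ensemble.predict_proba(X_test_fe)
pred = ensemble.classes_[np.argmax(proba, axis=1)]
```

With soft voting, `ensemble.predict` applies the same argmax to the weighted average of probabilities, so `pred` matches it. Threshold tuning would replace only that last line.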

I built this pipeline with modularity in mind, so I genuinely hope the core architecture can serve as a helpful guide for anyone's future projects.

I learned a huge amount from this competition and would love feedback on the code structure and implementation. Happy to answer any questions!

Discussion 2 answers
RP NGOMA COLLEGE

Thanks a lot,

May this guide prove helpful in the future.

20 Mar 2026, 07:37
Upvotes 0

Great! We should form a team and try the next competition together.

Follow me on GitHub (https://github.com/DemisoDaba)

20 Mar 2026, 08:52
Upvotes 0