I just made my final repository for the SME Financial Health challenge public: https://github.com/Julie-Montague/Financial_Health_Prediction_Challenge
I focused heavily on building a robust pipeline and cracking open the "black box" to understand the actual economic drivers of SME distress.
Key highlights in the repo:
- Unsupervised Feature Engineering: Used K-Means & PCA to cluster SMEs into behavioral archetypes before feeding them to the tree models.
- Weighted Ensembling: Built an Optuna-optimized soft voting classifier (ExtraTrees, Random Forest) so that the weight given to each model is tuned rather than fixed by hand.
- Business Interpretability: Ran a targeted SHAP deep-dive to check that the model learned actual macroeconomic logic (e.g., how digital inclusion and insurance access shield SMEs from distress).
- Prioritizing Reliability over Thresholding: While manual thresholding is a common suggestion for highly imbalanced tasks, I chose to keep the default decision rule, i.e. 'argmax' over the predicted class probabilities. I wanted to ensure the output remained unbiased and consistent for actual risk assessment.
- Auditing Noisy Data with Cleanlab: Survey data is notoriously messy. I used Cleanlab to identify "label noise": cases where a business's health score didn't match its actual financial behavior. Even though cleaning this noise improved the local validation score massively, it did not boost the leaderboard score (which is likely graded against that same noise). However, it allowed me to build features that reflect real economic behavior rather than memorizing inconsistent data.
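For anyone who wants to see how the weighted soft voting and the argmax decision fit together, here is a minimal NumPy sketch. It is not the code from the repo: the probability matrices, the three-class setup, and the weights are made-up illustrations (in the actual pipeline the weights come from the Optuna search), and `soft_vote` and `predict_argmax` are hypothetical helper names.

```python
import numpy as np


def soft_vote(probas, weights):
    """Weighted average of per-model class probabilities.

    probas:  array-like of shape (n_models, n_samples, n_classes)
    weights: array-like of shape (n_models,), one weight per model
    """
    probas = np.asarray(probas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # np.average normalizes by the sum of the weights, so each row
    # of the result still sums to 1.
    return np.average(probas, axis=0, weights=weights)


def predict_argmax(avg_proba):
    """Default decision rule: pick the most probable class, no custom thresholds."""
    return np.argmax(avg_proba, axis=1)


# Hypothetical class probabilities for 3 samples and 3 classes.
et_proba = [[0.6, 0.3, 0.1],
            [0.2, 0.5, 0.3],
            [0.1, 0.3, 0.6]]   # stand-in for ExtraTrees output
rf_proba = [[0.4, 0.4, 0.2],
            [0.1, 0.3, 0.6],
            [0.2, 0.2, 0.6]]   # stand-in for Random Forest output

# Two illustrative weightings (assumed values, not the tuned ones).
avg_a = soft_vote([et_proba, rf_proba], weights=[0.7, 0.3])
avg_b = soft_vote([et_proba, rf_proba], weights=[0.3, 0.7])

preds_a = predict_argmax(avg_a)
preds_b = predict_argmax(avg_b)

print(preds_a)  # second sample follows the ExtraTrees stand-in
print(preds_b)  # second sample follows the Random Forest stand-in
```

The second sample shows why the weights matter: the two models disagree on it, so the predicted class changes depending on which model is trusted more, while the decision rule itself stays a plain argmax.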
I built this pipeline with modularity in mind, so I genuinely hope the core architecture can serve as a helpful guide for anyone's future projects.
I learned a massive amount from this competition and would love any feedback on the code structure and implementation. Happy to answer any questions!
Thanks a lot,
May this guide prove helpful in the future.
Great. We should form a team and try the next competition together.
Follow me on GitHub (https://github.com/DemisoDaba)