Hi everyone,
I've just open-sourced my complete machine learning pipeline for this challenge, currently scoring 0.8955 on the public leaderboard.
The approach focuses heavily on mathematical rigor, handling structural zeros, and contextualizing the MSME data across different regional economies. Here are a few highlights from the pipeline:
You can check out the full Google Colab notebook and preprocessing strategy here:
🔗 https://github.com/JONNY-ME/data.org-Financial-Health-Prediction
If you find the code or feature engineering strategies helpful for your own models, please consider dropping a ⭐ on the repo!
Good luck to everyone in the final stretch!
Amazing, @J0NNY! Thank you, big man! A star from me ⭐.
Thank you @CodeJoe
Changing RANDOM_STATE makes a huge difference in the score. You can even break the 0.9 barrier just by changing it.
Nice, thank you! You might as well ensemble several random states, or even train per country. Null values drop drastically when you do that.
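For anyone who wants to try this, here is a minimal sketch of both ideas in plain Python. It is not taken from the repo: `seed_average`, `fit_per_group`, `noisy_model`, and the `country` and `y` field names are all hypothetical, and `noisy_model` only stands in for "train with random_state=seed and predict".

```python
import random
from collections import defaultdict
from statistics import fmean
from typing import Any, Callable, Dict, List, Sequence


def seed_average(
    fit_predict: Callable[[int], Sequence[float]],
    seeds: Sequence[int],
) -> List[float]:
    """Average per-row predictions from runs that differ only in their seed."""
    runs = [list(fit_predict(seed)) for seed in seeds]
    # zip(*runs) groups the predictions for the same row across all seeds.
    return [fmean(row) for row in zip(*runs)]


def fit_per_group(
    rows: Sequence[Dict[str, Any]],
    key: str,
    fit: Callable[[List[Dict[str, Any]]], Any],
) -> Dict[Any, Any]:
    """Fit one model per group (e.g. per country) and return {group: model}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {name: fit(members) for name, members in groups.items()}


# Hypothetical stand-in for a real model: a fixed signal plus
# seed-dependent noise, to mimic run-to-run variance.
SIGNAL = [0.2, 0.5, 0.8]


def noisy_model(seed: int) -> List[float]:
    rng = random.Random(seed)
    return [p + rng.uniform(-0.1, 0.1) for p in SIGNAL]


# Seed ensembling: average 20 runs instead of trusting a single seed.
averaged = seed_average(noisy_model, seeds=range(20))

# Per-country training: the "model" here is just the mean target per country.
rows = [
    {"country": "A", "y": 1.0},
    {"country": "A", "y": 0.0},
    {"country": "B", "y": 1.0},
]
models = fit_per_group(rows, key="country", fit=lambda m: fmean(r["y"] for r in m))
```

With a real model, `fit_predict` would train with the given seed and return predicted probabilities, and `fit` would train on one country's rows only. Averaging over seeds also makes the result depend less on any single lucky RANDOM_STATE.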
Got it. Will try those. Thanks!