Congrats to the winners.
It's an interesting one. Adjusting the predictions worked well for me here because the train and test data have different distributions. Most adjustments are made with the mean of the target in mind, but I made mine using the max after dropping the outlier.
SVD did most of the work for me in terms of feature engineering.
feat 1: round columns 3:end to 2 decimal places, then create 5 SVD features
feat 2: round columns 3:end to 1 decimal place, then create 4 SVD features
feat 3: multiply total households by all the percentage columns, then create 4 SVD features
feat 4: target encoding of lln_01, dw_01, psa_00, dw_07 and dw_08 after rounding them to 2 decimal places
In all, I ended up with 131 features after dropping features with no variance.
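A minimal sketch of the feature steps above, assuming scikit-learn's TruncatedSVD and VarianceThreshold on a pandas frame. How "columns 3:end" maps onto the real data, and whether the rounded or multiplied columns were also kept as features, is not stated, so the sketch takes the percentage columns as an explicit `pct_cols` list and keeps the raw and household-count columns as an assumption. The helpers `svd_features`, `build_features` and `target_encode`, and the synthetic data, are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_selection import VarianceThreshold


def svd_features(frame, n_components, prefix, seed=0):
    """Compress `frame` into `n_components` TruncatedSVD components (hypothetical helper)."""
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    comps = svd.fit_transform(frame.to_numpy(dtype=float))
    names = [f"{prefix}_{i}" for i in range(n_components)]
    return pd.DataFrame(comps, index=frame.index, columns=names)


def build_features(df, pct_cols, hh_col="total_households"):
    """Raw + household-count columns plus three SVD blocks; zero-variance columns dropped."""
    pct = df[pct_cols]
    counts = pct.mul(df[hh_col], axis=0).add_suffix("_hh")  # feat 3: percentage x households
    parts = [
        pct,
        counts,
        svd_features(pct.round(2), 5, "svd_r2"),  # feat 1: 2 decimals, 5 components
        svd_features(pct.round(1), 4, "svd_r1"),  # feat 2: 1 decimal, 4 components
        svd_features(counts, 4, "svd_hh"),        # feat 3: 4 components
    ]
    out = pd.concat(parts, axis=1)
    # threshold=0.0 keeps only columns whose variance is strictly positive
    keep = VarianceThreshold(threshold=0.0).fit(out).get_support()
    return out.loc[:, keep]


def target_encode(train, test, col, target, decimals=2):
    """feat 4: mean target per rounded value; unseen test keys fall back to the global mean."""
    key_tr = train[col].round(decimals)
    key_te = test[col].round(decimals)
    means = train[target].groupby(key_tr).mean()
    prior = train[target].mean()
    return key_tr.map(means), key_te.map(means).fillna(prior)


# Synthetic stand-in data (hypothetical column names)
rng = np.random.default_rng(0)
pct_cols = [f"p_{i}" for i in range(10)] + ["const_col"]
df = pd.DataFrame(rng.random((200, 10)), columns=pct_cols[:10])
df["const_col"] = 0.5  # zero variance, so it should be dropped
df["total_households"] = rng.integers(100, 5000, size=200)
df["target"] = rng.random(200) * 50

X = build_features(df, pct_cols)
tr_enc, te_enc = target_encode(df.iloc[:150], df.iloc[150:], "p_0", "target")
print(X.shape)
```

The encoding here is a plain in-sample mean; computing it out-of-fold is a common way to limit target leakage.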
Single LightGBM, 5-fold CV
Before adjustment: LB 3.765, Private 3.769
After adjustment: LB 3.710, Private 3.69
https://github.com/horlar1/Zindi-SA-Hack
Looking forward to the top solutions.
Thanks a lot for sharing the solutions.
Thank you so much for sharing. I think sharing solutions is something we should really encourage on Zindi so people can learn and improve.
I ended up trying two solutions: (1) regularized generalized linear models with PCA on polynomial features, and (2) multivariate adaptive regression splines (MARS).
You can find my solution on my Github at: https://github.com/marcusinthesky/Zindi-ZA-COVID19-Vulnerablity-Map
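A rough scikit-learn sketch of the first approach (polynomial features, then PCA, then a regularized linear model). ElasticNetCV as the regularized GLM, the degree, the number of components and the `l1_ratio` grid are all assumptions, not taken from the linked repo; `make_poly_pca_glm` is a hypothetical helper. The MARS variant is not shown because scikit-learn has no MARS estimator.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def make_poly_pca_glm(degree=2, n_components=10, seed=0):
    """Scale -> polynomial expansion -> rescale -> PCA -> elastic-net regression."""
    return make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),  # polynomial terms end up on very different scales
        PCA(n_components=n_components, random_state=seed),
        ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5),  # penalty strength picked by CV
    )


rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 3 * X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(0, 0.05, size=200)
model = make_poly_pca_glm().fit(X, y)
pred = model.predict(X[:5])
```

PCA here decorrelates the many overlapping polynomial terms before the penalized fit, which is one reason to pair the two.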
Thanks a lot. Please find my solution here.
Just a humble try at using CatBoost: only boilerplate code, no feature engineering. It finished in 79th place.
https://anindabitm.github.io/anindadslog/2020/04/06/Zindi_Hack.html
Thanks, Holar, great idea to use SVD for feature engineering given all the correlations.
Can you give a little bit more information about how you scaled the target?
Thanks.
I adjusted my predictions to look more like the target after removing the outlier. The target max was 54.8 and my prediction max was 52.
Multiplying the predictions by 1.04 (prediction * 1.04) moves them closer to the target max.
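The adjustment amounts to a single multiplicative factor. A minimal sketch, assuming "removing the outlier" means dropping the largest target value(s), since the post does not say how the outlier was identified; `max_ratio` and `adjust` are hypothetical helpers. With the numbers quoted, the exact ratio would be 54.8 / 52 ≈ 1.054, so the 1.04 used in the post stops a little short of it.

```python
import numpy as np


def max_ratio(target, pred, drop_top=1):
    """Target max (after dropping the `drop_top` largest values) divided by the prediction max."""
    target = np.sort(np.asarray(target, dtype=float))
    if drop_top > 0:
        target = target[:-drop_top]  # assumed outlier handling: drop the largest value(s)
    return target.max() / np.max(pred)


def adjust(pred, factor=1.04):
    """Scale every prediction by the same factor (1.04 is the value from the post)."""
    return np.asarray(pred, dtype=float) * factor


print(max_ratio([10.0, 20.0, 100.0], [5.0, 10.0]))  # 100 treated as the outlier, so 20 / 10 = 2.0
print(adjust([50.0, 52.0]))
```

A fixed factor like this only stretches the predictions; it does not change their ranking.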
Thanks for sharing everyone!
I've uploaded my solution here: https://github.com/Rendiere/zindi-sa-covid-19-vulnerability-hackathon
Thanks!
Here, you can find mine! https://colab.research.google.com/drive/1Lv0ecoSoTGun2kFSUSj6lcn2-Ike3XhO
Thanks a lot! Your solution is very straightforward and easy to understand.
Hi Holar! First of all, thank you for sharing your code with us. Some of us are new to ML, and we certainly need some tips and tricks from a guru like you. I would like to ask, if you don't mind, for your source code from data processing through to the implementation of the solution. Specifically, I would like to know more about the feature engineering and the use of SVD to generate the features.
And for others sharing code, please indicate the CV and LB scores for your solution. Thank you all.
https://github.com/horlar1/Zindi-SA-Hack
Thanks for sharing :)