South African COVID-19 Vulnerability Map
Can we infer important COVID-19 public health risk factors from outdated data?
Prize
\$300 USD
Time
Ended almost 3 years ago
Participants
178 active · 319 enrolled
My Solution
Notebooks · 6 Apr 2020, 06:48 · edited ~11 hours later · 11

Congrats to the winners.

It was an interesting one. Adjusting the predictions worked well for me here because the train and test data have different distributions. Most adjustments are made with the mean of the target in mind, but I made mine with the max, after dropping the outlier.

SVD did the major work for me in terms of feature engineering.

feat 1: round columns 3:end to 2 decimal places + create 5 SVD features

feat 2: round columns 3:end to 1 decimal place + create 4 SVD features

feat 3: multiply total households by all the percentages + create 4 SVD features

feat 4: target encoding of lln_01, dw_01, psa_00, dw_07, dw_08 after rounding to 2 decimal places
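For readers asking how the SVD features might be built, here is a minimal sketch of feats 1 to 3 using plain NumPy. It is not taken from the linked repository: the data is random stand-in data, the names `percentages`, `total_households` and `svd_features` are hypothetical, and whether the original code centers or scales the columns before the SVD is not stated in the post (this sketch does neither).

```python
import numpy as np

def svd_features(block, n_components):
    """Return the first n_components SVD scores (U * S) for a numeric block."""
    # Thin SVD: block = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(block, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 rows, 12 percentage columns, one household count
percentages = rng.random((200, 12))
total_households = rng.integers(100, 5000, size=200).astype(float)

feat1 = svd_features(np.round(percentages, 2), 5)                   # feat 1
feat2 = svd_features(np.round(percentages, 1), 4)                   # feat 2
feat3 = svd_features(percentages * total_households[:, None], 4)    # feat 3

# 5 + 4 + 4 = 13 new columns to append to the original features
svd_block = np.hstack([feat1, feat2, feat3])
print(svd_block.shape)
```

Because many of the percentage columns are strongly correlated, a handful of SVD components can summarize most of their shared variation, which is probably why so few components were enough.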

In all, I ended up with 131 features after dropping features with no variance.
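Feat 4 and the no-variance filter could be sketched roughly as below. This is an assumption-heavy illustration, not the author's code: the post does not say whether the target encoding was done out-of-fold, so the out-of-fold scheme here is a common leakage-avoiding variant that I have swapped in, and `target_encode_oof`, `drop_constant_columns` and the random data are all hypothetical.

```python
import numpy as np

def target_encode_oof(keys, target, n_folds=5, seed=0):
    """Encode each key as the mean target of that key, computed on the other folds."""
    keys = np.asarray(keys)
    target = np.asarray(target, dtype=float)
    encoded = np.empty(len(keys))
    # Assign every row to one of n_folds folds at random
    fold_of_row = np.random.default_rng(seed).permutation(len(keys)) % n_folds
    for fold in range(n_folds):
        held_out = fold_of_row == fold
        sums, counts = {}, {}
        for k, t in zip(keys[~held_out], target[~held_out]):
            sums[k] = sums.get(k, 0.0) + t
            counts[k] = counts.get(k, 0) + 1
        # Keys never seen outside the held-out fold fall back to the overall mean
        fallback = target[~held_out].mean()
        encoded[held_out] = [sums[k] / counts[k] if k in counts else fallback
                             for k in keys[held_out]]
    return encoded

def drop_constant_columns(X):
    """Drop columns whose values are all identical (no variance)."""
    keep = X.max(axis=0) != X.min(axis=0)
    return X[:, keep], keep

rng = np.random.default_rng(1)
raw = rng.random(300)                       # hypothetical stand-in for a column such as dw_01
target = 10 * raw + rng.normal(size=300)    # hypothetical target
te = target_encode_oof(np.round(raw, 2), target)  # round to 2 decimals, then encode

X = np.column_stack([te, raw, np.ones(300)])      # last column has no variance
X_kept, keep = drop_constant_columns(X)
print(X_kept.shape, keep)
```

Rounding before encoding matters here: it turns a continuous percentage into a limited set of repeated values, so each value has enough rows for its mean target to be meaningful.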

Single LightGBM, 5-fold.

Before adjustment: LB 3.765, Private 3.769

After adjustment: LB 3.710, Private 3.69

https://github.com/horlar1/Zindi-SA-Hack

Looking forward to the Top Solutions

Thanks a lot for sharing the solutions

Thank you so much for sharing. I think sharing solutions is something we should really encourage on Zindi so people can learn and improve.

I ended up trying two solutions: (1) regularized generalized linear models with PCA on polynomial features, and (2) multivariate adaptive regression splines.

You can find my solution on my Github at: https://github.com/marcusinthesky/Zindi-ZA-COVID19-Vulnerablity-Map

Thanks a lot. Please find my solution here.

Just a humble try at using CatBoost. Only boilerplate code, no feature engineering. Stood at 79th place.

Thanks Holar, great idea to use SVD for feature engineering given all the correlations.

Thanks.

I adjusted my predictions to look like my target after removing the outlier. The target max was 54.8 and mine was 52.

Multiplying the predictions by 1.04 moves them closer to the target max.
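The arithmetic behind that adjustment can be checked with a small sketch. The three numbers 54.8, 52 and 1.04 come from the post above; the other prediction values and the helper name `scale_predictions` are made up for illustration.

```python
import numpy as np

def scale_predictions(pred, factor):
    """Multiply every prediction by a constant factor."""
    return np.asarray(pred, dtype=float) * factor

target_max = 54.8   # from the post: target max after dropping the outlier
pred_max = 52.0     # from the post: max of the unadjusted predictions
factor = 1.04       # from the post

# Hypothetical predictions; only the max value is taken from the post
preds = np.array([3.0, 10.5, 27.0, pred_max])
adjusted = scale_predictions(preds, factor)

print(adjusted.max())          # about 54.08, closer to 54.8 than 52 was
print(target_max / pred_max)   # about 1.054, the factor that would match the max exactly
```

Note that 1.04 is a little smaller than the ratio of the two maxima, so the adjusted predictions move toward the target max without fully reaching it.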

Thanks for sharing everyone!

I've uploaded my solution here: https://github.com/Rendiere/zindi-sa-covid-19-vulnerability-hackathon

Thanks!

Here, you can find mine! https://colab.research.google.com/drive/1Lv0ecoSoTGun2kFSUSj6lcn2-Ike3XhO

Thanks a lot. Your solution is very straightforward and easy to understand.

Hi Holar! First of all, thank you for sharing your code with us. Some of us are new to ML, and we certainly need some tips and tricks from a guru like you. If you don't mind, could you share your source code, from data processing to the implementation of the solution? Specifically, I would like to know more about the feature engineering and the use of SVD to generate the features.

And for others sharing code, please indicate the CV and LB scores for your solution. Thank you all.