South African COVID-19 Vulnerability Map by #ZindiWeekendz
Can we infer important COVID-19 public health risk factors from outdated data?
$300 USD
My Solution
Notebooks · 6 Apr 2020, 06:48 · edited ~11 hours later · 11

Congrats to the winners.

It's an interesting one. Adjusting the predictions did well for me here because the train and test data have different distributions. Most adjustments are made with the mean of the target in mind, but I made mine with the max after dropping the outlier.

SVD did the major work for me in terms of feature engineering.

feat 1: round columns 3:end to 2 decimal places + create 5 SVD features

feat 2: round columns 3:end to 1 decimal place + create 4 SVD features

feat 3: multiply total households by all the percentages + create 4 SVD features

feat 4: target encoding of lln_01, dw_01, psa_00, dw_07, dw_08 after rounding to 2 decimal places
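A minimal sketch of how feats 1 to 3 could be built, assuming scikit-learn's `TruncatedSVD` is the SVD in question (the post does not say which implementation was used). The helper `add_svd_features`, the toy table, and the column names `pct_*` and `total_households` are hypothetical; only the rounding precision and the component counts come from the post.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD


def add_svd_features(df, cols, n_components, prefix, decimals=None, seed=0):
    """Optionally round `cols`, then append truncated-SVD components of them."""
    block = df[cols] if decimals is None else df[cols].round(decimals)
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    comps = svd.fit_transform(block.to_numpy())
    out = df.copy()
    for i in range(n_components):
        out[f"{prefix}_svd_{i}"] = comps[:, i]
    return out


# Toy stand-in for the competition table (all column names are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((200, 10)) * 100,
                   columns=[f"pct_{i}" for i in range(10)])
toy.insert(0, "total_households", rng.integers(100, 5000, size=200))
pct_cols = [c for c in toy.columns if c.startswith("pct_")]

# feat 1: round to 2 decimal places, 5 SVD components
feats = add_svd_features(toy, pct_cols, 5, "r2", decimals=2)
# feat 2: round to 1 decimal place, 4 SVD components
feats = add_svd_features(feats, pct_cols, 4, "r1", decimals=1)

# feat 3: percentages multiplied by household count, then 4 SVD components
counts = toy[pct_cols].mul(toy["total_households"], axis=0).add_suffix("_x_hh")
feats = pd.concat([feats, counts], axis=1)
feats = add_svd_features(feats, list(counts.columns), 4, "hh")
```

In a real run the SVD would probably be fit on train and test stacked together (or fit on train and applied to test) so that both share the same components; the post does not say which was done.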

In all, I ended up with 131 features after dropping features with no variance.
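Feat 4 and the zero-variance filter could look roughly like the sketch below. It assumes an out-of-fold mean encoding, which is a common way to limit target leakage; the post does not say how the encoding was computed. The helper `oof_target_encode`, the toy data, and the `constant` column are hypothetical; `lln_01` and `dw_01` are two of the columns named above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def oof_target_encode(train, col, target, n_splits=5, seed=0):
    """Out-of-fold mean of `target` for each distinct value of `col`."""
    encoded = pd.Series(np.nan, index=train.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(train):
        means = train.iloc[fit_idx].groupby(col)[target].mean()
        encoded.iloc[enc_idx] = train.iloc[enc_idx][col].map(means).to_numpy()
    # Values never seen in the fitting folds fall back to the global mean.
    return encoded.fillna(train[target].mean())


# Toy data: two of the named columns, one constant column, and a target.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "lln_01": rng.random(300),
    "dw_01": rng.random(300),
    "constant": 1.0,
    "target": rng.random(300) * 50,
})

for col in ["lln_01", "dw_01"]:
    rounded = train.assign(**{col: train[col].round(2)})  # round before encoding
    train[f"{col}_te"] = oof_target_encode(rounded, col, "target")

# Drop features with no variance (a single distinct value).
feature_cols = [c for c in train.columns if c != "target"]
keep = [c for c in feature_cols if train[c].nunique() > 1]
```

Rounding first turns a continuous percentage into a small set of bins, which is what makes a per-value target mean meaningful.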

Single LightGBM, 5 folds
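A rough sketch of a single-model 5-fold setup. To keep it dependent on scikit-learn only, it uses `GradientBoostingRegressor` as a stand-in for LightGBM; swapping in `lightgbm.LGBMRegressor` should give the setup described in the post. The function `cv_predict`, the synthetic data, and the choice of RMSE as the metric are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold


def cv_predict(X, y, X_test, n_splits=5, seed=0):
    """Fit one model per fold; return OOF and fold-averaged test predictions."""
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in kf.split(X):
        # Stand-in model; the post used LightGBM here.
        model = GradientBoostingRegressor(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict(X[valid_idx])
        test_pred += model.predict(X_test) / n_splits
    return oof, test_pred


# Synthetic regression data, only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.random((500, 8))
y = 20 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1, 500)
X_test = rng.random((100, 8))

oof, test_pred = cv_predict(X, y, X_test)
cv_rmse = float(np.sqrt(np.mean((oof - y) ** 2)))  # local CV score
```

The out-of-fold predictions give a local score to compare against the leaderboard, and the averaged test predictions are what would be submitted.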

Before adjustment: LB 3.765, Private 3.769

After adjustment: LB 3.710, Private 3.69

Looking forward to the Top Solutions

Discussion 11 answers

Thanks a lot for sharing the solutions

6 Apr 2020, 06:57

Thank you so much for sharing. I think sharing solutions is something we should really encourage on Zindi so people can learn and improve.

I ended up trying two solutions: 1. regularized generalized linear models with PCA on polynomial features, and 2. multivariate adaptive regression splines.
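For readers unfamiliar with the first approach, here is a minimal scikit-learn sketch of the general idea. It is not the code from this solution: `ElasticNet` is assumed as the regularized linear model, and the polynomial degree, component count, penalty settings, and synthetic data are all placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Expand features, standardize, compress with PCA, then fit a penalized model.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    PCA(n_components=10),
    ElasticNet(alpha=0.1, l1_ratio=0.5),
)

# Synthetic data with an interaction term, only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.random((300, 6))
y = 10 * X[:, 0] * X[:, 1] + 3 * X[:, 2] + rng.normal(0, 0.1, 300)

model.fit(X, y)
pred = model.predict(X)
```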

You can find my solution on my Github at:

6 Apr 2020, 07:08

Thanks a lot. Please find my solution here.

Just a humble try at using CatBoost. Only boilerplate code. No feature engineering. Stood at 79th place.

Thanks Holar, great idea to use SVD for feature engineering given all the correlations.

Can you give a little bit more information about how you scaled the target?

6 Apr 2020, 07:37


I adjusted my predictions to look like my target after removing the outlier. The target max was 54.8 and mine was 52.

prediction * 1.04 moves my prediction closer to the target max.
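The arithmetic can be sketched as follows. The three constants come from this reply; the `pred` array is made up purely to show the effect.

```python
import numpy as np

target_max = 54.8  # max of the training target after dropping the outlier
pred_max = 52.0    # max of the raw predictions
factor = 1.04      # multiplier applied to the predictions

# Multiplier that would make the two maxima match exactly (about 1.054).
full_ratio = target_max / pred_max

# Hypothetical predictions, not from the competition.
pred = np.array([3.2, 11.5, 27.9, 52.0])
adjusted = pred * factor  # largest value becomes 52 * 1.04 = 54.08
```

So 1.04 closes most of the gap to the target max without matching it exactly.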

Thanks for sharing everyone!

I've uploaded my solution here:

6 Apr 2020, 09:24


Here, you can find mine!

6 Apr 2020, 11:06

Thanks a lot. Your solution is very straightforward and easy to understand.

Hi Holar! First of all, thank you for sharing your approach with us. Some of us are new to ML, and we certainly need some tips and tricks from a guru like you. If you don't mind, could you share your source code, from data processing to the implementation of the solution? Specifically, I would like to know more about the feature engineering and the use of SVD to generate the features.

And for others sharing code, please indicate the CV and LB scores for your solution. Thank you all.

6 Apr 2020, 14:42