
South African COVID-19 Vulnerability Map by #ZindiWeekendz

Helping Africa
$300 USD
Challenge completed over 5 years ago
Prediction
319 joined
177 active
Start: Apr 03, 2020
Close: Apr 05, 2020
Reveal: Apr 05, 2020
Adversarial Validation
Help · 5 Apr 2020, 16:40 · 9

I'm guessing I'm not the only one who found that the train and test sets are very separable.

So I wanted to hear what you guys think could be a good way to use that information?

So far, I've tried weighting the samples in the training set by the probability of a sample being in the training set (LGBM, XGBoost etc. have functionality for this), but it doesn't seem to make much of an impact.
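
A rough sketch of that weighting idea (the `train`/`test` DataFrames, the `target` column and the parameters are placeholders, and this particular variant weights by the adversarial P(test); adjust to whatever weighting you prefer):

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import cross_val_predict

features = list(test.columns)                 # feature columns shared by train and test
X = pd.concat([train[features], test[features]], axis=0)
y = np.r_[np.zeros(len(train)), np.ones(len(test))].astype(int)   # 0 = train row, 1 = test row

# Adversarial classifier: how distinguishable are train and test rows?
adv = lgb.LGBMClassifier(n_estimators=200)
p_test = cross_val_predict(adv, X, y, cv=5, method="predict_proba")[:, 1]

# Weight each training row by how "test-like" the adversarial model thinks it is.
weights = p_test[: len(train)]
model = lgb.LGBMRegressor(n_estimators=500)
model.fit(train[features], train["target"], sample_weight=weights)
```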

I've also tried selecting the train samples with the lowest probability of belonging to the train set and using those for validation, but that also doesn't help with generalizing to the leaderboard.
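
Continuing the sketch above, the validation-split attempt looks roughly like this (it reuses `weights`, the adversarial P(test) for each training row, plus `train` and `features` from the previous snippet):

```python
# Hold out the training rows that look most like test data (i.e. lowest
# probability of belonging to the train set) and validate on them.
n_val = int(0.2 * len(train))                 # validation fraction is arbitrary here
order = np.argsort(weights)                   # ascending P(test)
trn_idx, val_idx = order[:-n_val], order[-n_val:]

X_trn, y_trn = train[features].iloc[trn_idx], train["target"].iloc[trn_idx]
X_val, y_val = train[features].iloc[val_idx], train["target"].iloc[val_idx]

model = lgb.LGBMRegressor(n_estimators=500)
model.fit(X_trn, y_trn, eval_set=[(X_val, y_val)])
```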

So... I'm stuck >_< - can't think of a way to use this information other than those.

Any ideas?

Would be great to revisit this after the close if you guys are reluctant to share your tactics.

Discussion · 9 answers
Raheem_Nasirudeen
The Polytechnic Ibadan

yes,

5 Apr 2020, 16:47
Upvotes 0
Tkay

Count me in for revisiting

5 Apr 2020, 17:04
Upvotes 0
Raheem_Nasirudeen
The Polytechnic Ibadan

Tactics and code will surely be open-sourced for this.

5 Apr 2020, 19:09
Upvotes 0

Like I said, I have researched the most common tactics and found they did not work as expected on this dataset. Hence the open "discussion".

Adversarial validation didn't work for me either. I then tried two approaches, neither of which worked as I expected. First, for each single model I did a bit of hyperparameter tuning and applied 5-fold cross-validation, training the models on the whole set of features in the first layer. An analysis of pairwise correlations between the out-of-fold predictions generated by the first-layer models showed LGBM had some diversity, being the least correlated with the other models. The final layer consists of LGBM and XGBoost, which take the predictions of the first-level models as their input features. After tuning their parameters, I took their weighted average as the final predictor. This gives an LB of 3.98.
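
To make the first approach concrete, here is a rough sketch of the stacking setup (not the exact code; `X_train`, `y_train`, `X_test`, the choice of first-layer models and the final blend weights are placeholders):

```python
import numpy as np
import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

cv = KFold(n_splits=5, shuffle=True, random_state=42)
first_layer = [
    lgb.LGBMRegressor(n_estimators=500),
    xgb.XGBRegressor(n_estimators=500),
    RandomForestRegressor(n_estimators=300),
]

# First layer: out-of-fold predictions on train, full-fit predictions on test.
oof = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=cv) for m in first_layer]
)
test_meta = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in first_layer]
)

# Diversity check: pairwise correlation between first-layer predictions.
print(np.corrcoef(oof.T))

# Second layer: LGBM and XGBoost take the first-layer predictions as features,
# and their outputs are combined with a weighted average (weights to be tuned).
l2_lgb = lgb.LGBMRegressor(n_estimators=200).fit(oof, y_train)
l2_xgb = xgb.XGBRegressor(n_estimators=200).fit(oof, y_train)
final = 0.5 * l2_lgb.predict(test_meta) + 0.5 * l2_xgb.predict(test_meta)
```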

Second approach: trained a single LGBM, AdaBoost and CatBoost and blended their scores, which gives an LB of 3.93.
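
And a sketch of the second approach, a simple blend of independently trained models (parameters again placeholders):

```python
import lightgbm as lgb
from catboost import CatBoostRegressor
from sklearn.ensemble import AdaBoostRegressor

models = [
    lgb.LGBMRegressor(n_estimators=500),
    AdaBoostRegressor(n_estimators=300),
    CatBoostRegressor(iterations=500, verbose=0),
]
preds = [m.fit(X_train, y_train).predict(X_test) for m in models]
blend = sum(preds) / len(preds)   # plain mean; the weights could also be tuned
```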

5 Apr 2020, 19:35
Upvotes 0

Will be happy to hear what the rest of you are doing.

5 Apr 2020, 19:38
Upvotes 0
Enigma
Obafemi Awolowo University, Ile-Ife

A single CatBoost model after manually tuning some parameters gave my best score so far, 3.87. For me it's quite difficult to improve the LB score. Very anxious to hear from the top guys when the hackathon is over.

Lone_Wolf
University of Ghana

Will the winning solutions be posted online this time?

12 Apr 2020, 12:06
Upvotes 0

Hey, I think most of the top solutions have been shared in the "My Solution" discussion thread and the top 3 were shared in their own separate threads.