
South African COVID-19 Vulnerability Map by #ZindiWeekendz

Helping Africa
$300 USD
Challenge completed over 5 years ago
Prediction
319 joined
177 active
Start: Apr 03, 2020
Close: Apr 05, 2020
Reveal: Apr 05, 2020
Adversarial Validation
Help · 5 Apr 2020, 16:40 · 9

I'm guessing I'm not the only one who found that the train and test sets are very separable.

So I wanted to hear what you guys think could be a good way to use that information?

So far, I've tried weighting the samples in the training set by the probability of a sample being in the training set (LGBM, XGBoost etc. have functionality for this), but it doesn't seem to make much of an impact.
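
A rough sketch of that weighting idea (the `train`/`test` DataFrames, the `target` column and the parameters are placeholders, and this particular variant weights by the adversarial P(test); adjust to whatever weighting you prefer):

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import cross_val_predict

features = list(test.columns)                 # feature columns shared by train and test
X = pd.concat([train[features], test[features]], axis=0)
y = np.r_[np.zeros(len(train)), np.ones(len(test))].astype(int)   # 0 = train row, 1 = test row

# Adversarial classifier: how distinguishable are train and test rows?
adv = lgb.LGBMClassifier(n_estimators=200)
p_test = cross_val_predict(adv, X, y, cv=5, method="predict_proba")[:, 1]

# Weight each training row by how "test-like" the adversarial model thinks it is.
weights = p_test[: len(train)]
model = lgb.LGBMRegressor(n_estimators=500)
model.fit(train[features], train["target"], sample_weight=weights)
```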

I've also tried selecting the train samples with the lowest probability of belonging to the train set and using those for validation, but that also doesn't help with generalizing to the leaderboard.
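
Continuing the sketch above, the validation-split attempt looks roughly like this (it reuses `weights`, the adversarial P(test) for each training row, plus `train` and `features` from the previous snippet):

```python
# Hold out the training rows that look most like test data (i.e. lowest
# probability of belonging to the train set) and validate on them.
n_val = int(0.2 * len(train))                 # validation fraction is arbitrary here
order = np.argsort(weights)                   # ascending P(test)
trn_idx, val_idx = order[:-n_val], order[-n_val:]

X_trn, y_trn = train[features].iloc[trn_idx], train["target"].iloc[trn_idx]
X_val, y_val = train[features].iloc[val_idx], train["target"].iloc[val_idx]

model = lgb.LGBMRegressor(n_estimators=500)
model.fit(X_trn, y_trn, eval_set=[(X_val, y_val)])
```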

So... I'm stuck >_< - can't think of a way to use this information other than those.

Any ideas?

Would be great to revisit this after the close if you guys are reluctant to share your tactics.

Discussion · 9 answers
Raheem_Nasirudeen
The Polytechnic Ibadan

yes,

5 Apr 2020, 16:47
Upvotes 0
Tkay

Count me in for revisiting

5 Apr 2020, 17:04
Upvotes 0
Raheem_Nasirudeen
The Polytechnic Ibadan

Tactics and code will surely be open-sourced for this.

5 Apr 2020, 19:09
Upvotes 0

Like I said, I have researched the most common tactics and found they did not work as expected on this dataset. Hence the open "discussion".

Adversarial validation didn't work for me either. I then tried two approaches, neither of which worked as I expected. First, for each single model I did a bit of hyperparameter tuning and applied 5-fold cross-validation, training the models on the whole set of features in the first layer. An analysis of pairwise correlations between the out-of-fold predictions generated by the first-layer models showed LGBM had some diversity, being the least correlated with the other models. The final layer consists of LGBM and XGBoost, which take the predictions of the first-level models as their input features. After tuning their parameters, I took their weighted average as the final predictor. This gives an LB of 3.98.
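
To make the first approach concrete, here is a rough sketch of the stacking setup (not the exact code; `X_train`, `y_train`, `X_test`, the choice of first-layer models and the final blend weights are placeholders):

```python
import numpy as np
import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

cv = KFold(n_splits=5, shuffle=True, random_state=42)
first_layer = [
    lgb.LGBMRegressor(n_estimators=500),
    xgb.XGBRegressor(n_estimators=500),
    RandomForestRegressor(n_estimators=300),
]

# First layer: out-of-fold predictions on train, full-fit predictions on test.
oof = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=cv) for m in first_layer]
)
test_meta = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in first_layer]
)

# Diversity check: pairwise correlation between first-layer predictions.
print(np.corrcoef(oof.T))

# Second layer: LGBM and XGBoost take the first-layer predictions as features,
# and their outputs are combined with a weighted average (weights to be tuned).
l2_lgb = lgb.LGBMRegressor(n_estimators=200).fit(oof, y_train)
l2_xgb = xgb.XGBRegressor(n_estimators=200).fit(oof, y_train)
final = 0.5 * l2_lgb.predict(test_meta) + 0.5 * l2_xgb.predict(test_meta)
```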

Second approach: trained a single LGBM, AdaBoost and CatBoost and blended their scores, which gives an LB of 3.93.
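
And a sketch of the second approach, a simple blend of independently trained models (parameters again placeholders):

```python
import lightgbm as lgb
from catboost import CatBoostRegressor
from sklearn.ensemble import AdaBoostRegressor

models = [
    lgb.LGBMRegressor(n_estimators=500),
    AdaBoostRegressor(n_estimators=300),
    CatBoostRegressor(iterations=500, verbose=0),
]
preds = [m.fit(X_train, y_train).predict(X_test) for m in models]
blend = sum(preds) / len(preds)   # plain mean; the weights could also be tuned
```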

5 Apr 2020, 19:35
Upvotes 0

Will be happy to hear what the rest of you are doing.

5 Apr 2020, 19:38
Upvotes 0
Enigma
Obafemi Awolowo University, Ile-Ife

A single CatBoost model after manually tuning some parameters gave my best score so far, 3.87. For me it's quite difficult to improve the LB score. Very anxious to hear from the top guys when the hackathon is over.

Lone_Wolf
University of Ghana

Will the winning solutions be posted online this time?

12 Apr 2020, 12:06
Upvotes 0

Hey, I think most of the top solutions have been shared in the "My Solution" discussion thread and the top 3 were shared in their own separate threads.