
Xente Fraud Detection Challenge

Helping Uganda
$4 500 USD
Completed (over 6 years ago)
Classification
2031 joined
545 active
Start: May 20, 19
Close: Sep 22, 19
Reveal: Sep 23, 19
Wisdom of the Crowd
Data · 16 Sep 2019, 21:32 · 1

There are just 6 days left in this competition, and I believe hearing other people's opinions (if not everyone's) could help us all build models that identify fraud well. I'm only a newbie.

1. Local CV Scores Versus Public Leaderboard Scores

There is a wide gap between my local CV scores and my public leaderboard scores. At times I get a CV score of 0.8+ but a public leaderboard score of 0.567+; this happens after I engineer several new features. What could be the reason? Is anyone else experiencing this? Can I sit back and hope for a jump on the private leaderboard, or definitely not?

2. StratifiedKFold Versus TimeSeriesSplit for Cross Validation

Which of these two CV techniques would work better, and with how many splits for this dataset? I am a bit skeptical of StratifiedKFold as my CV technique: if I split the training dataset (the one given by Zindi) into training and validation sets in an 80:20 ratio, my CV score may not reflect my model's actual score. The training set given by Zindi spans November 2018 to February 2019, while the test dataset spans March 2019 to April 2019, so there is no overlap between the two. Any ideas on this?
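To make the concern above concrete, here is a minimal sketch of a time-ordered (expanding-window) split, similar in spirit to what scikit-learn's TimeSeriesSplit does. It is not anyone's actual competition code: the function name `expanding_window_splits`, the daily toy dates, and the choice of 3 splits are assumptions for illustration only. The point is that every validation block lies strictly after its training rows, which mimics how the test period follows the training period.

```python
from datetime import date, timedelta


def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, valid_idx) pairs over rows sorted by time.

    Hypothetical helper: each fold trains on everything before a cut-off
    and validates on the block right after it, so validation rows are
    always later in time than training rows.
    """
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * fold_size
        valid_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, valid_end))


# Toy data (an assumption): one row per day from 1 Nov 2018 to 28 Feb 2019,
# already sorted by date. Real transactions would be sorted by timestamp first.
dates = [date(2018, 11, 1) + timedelta(days=i) for i in range(120)]

splits = list(expanding_window_splits(len(dates), n_splits=3))
for train_idx, valid_idx in splits:
    print(
        f"train up to {dates[train_idx[-1]]}, "
        f"validate {dates[valid_idx[0]]} to {dates[valid_idx[-1]]}"
    )
```

A shuffled or stratified split, by contrast, would put rows from every month into both the training and validation folds, so the model could be validated on transactions that happened before some of the ones it trained on. Whether that explains any particular CV versus leaderboard gap is something only an experiment on the real data can show.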

3. Feature Engineering

Merely building a simple model for this competition can earn a score of 0.66+ on the public leaderboard. After hyperparameter optimization, I reached my present score of 0.77+. But after feature engineering, my local CV score jumped while my public leaderboard score dropped sharply (to 0.567+). Can feature engineering actually cause one's public leaderboard score to drop like this? Is it fine to trust my local CV score? (Kagglers???)

4. Are there any other new ideas that one could try to implement in this competition?

Discussion 1 answer

One piece of advice: if you have carefully designed your CV strategy, then rely on it dearly. The only exception is when you create samples; in that case you shouldn't put much faith in CV scores.

16 Sep 2019, 21:54
Upvotes 0