
Alvin Smart Money Management Classification Challenge

Helping Kenya
$3 000 USD
Challenge completed ~3 years ago
Classification
497 joined
220 active
Start
Jun 22, 22
Close
Jul 24, 22
Reveal
Jul 24, 22
Downside of random split
Data · 30 Jun 2022, 04:31 · 2

The purpose of this competition is to predict the payment type for new transactions. The usual way to split a labelled set into train and test sets is by user or by time. That is not the case here: the split is random, so rows from the same user and the same time window land in both sets. The consequence is that we (as competitors) need to spend time on both modelling and exploiting the leak.
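For contrast, here is a minimal sketch of what a split by user could look like. It is not the organisers' code; the column name `user_id` and the helper `split_by_user` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_by_user(df, user_col="user_id", test_frac=0.2, seed=0):
    """Hold out whole users, so no user appears in both parts."""
    rng = np.random.default_rng(seed)
    users = df[user_col].drop_duplicates().to_numpy()
    n_test = max(1, int(round(len(users) * test_frac)))
    test_users = rng.choice(users, size=n_test, replace=False)
    in_test = df[user_col].isin(test_users)
    return df[~in_test], df[in_test]


# Toy data: 5 hypothetical users with 2 transactions each.
demo = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3", "u3", "u4", "u4", "u5", "u5"],
    "amount": [100, 100, 2900, 50, 20, 30, 500, 700, 10, 15],
})
train_part, test_part = split_by_user(demo)
```

With a split like this, a test row never has a labelled neighbour from the same user, so the trick described below would not work.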

To exploit the leak, simply: 1) concatenate the train, test, and extra sets; 2) sort by user and timestamp. Let's take an example.

ID_1I8XYBWK 2022-03-16 13:05:51.851102+00 TELKOM KENYA LIMITED Data & WiFi 100 2019-07-03 07:31:00+00 True

ID_1I8XYBWK 2022-03-16 13:08:19.703288+00 TELKOM KENYA LIMITED 100 2019-07-03 07:31:00+00 True

ID_1I8XYBWK 2022-03-16 13:08:19.703288+00 SAFARICOM HOME Data & WiFi 2900 2019-07-10 04:46:00+00 False

The first and last rows are in the train set; the middle row is in the test set. It is almost certain that the middle row's class is also Data & WiFi.
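The two steps can be sketched in pandas roughly as follows. This is only an illustration: the column names (`user_id`, `categorized_at`, `merchant`, `category`) and the helper `guess_from_neighbours` are my assumptions, not the competition's actual schema, and the timestamps are written with a `+00:00` offset for safe parsing.

```python
import pandas as pd


def guess_from_neighbours(train, test, user_col="user_id",
                          time_col="categorized_at", label_col="category"):
    """Give each test row the label of the previous labelled row of the
    same user, falling back to the next labelled row if there is none."""
    # Step 1: concatenate. The test set has no label column, so its
    # labels come out as missing values after the concat.
    combined = pd.concat(
        [train.assign(is_test=False), test.assign(is_test=True)],
        ignore_index=True,
    )
    # Step 2: sort by user, then timestamp.
    combined[time_col] = pd.to_datetime(combined[time_col], utc=True)
    combined = combined.sort_values([user_col, time_col])
    # Copy labels from neighbouring rows within each user.
    labels = combined.groupby(user_col)[label_col]
    combined["guess"] = labels.ffill().fillna(labels.bfill())
    return combined.loc[combined["is_test"], [user_col, time_col, "guess"]]


# The three rows from the example above (assumed column names).
train = pd.DataFrame({
    "user_id": ["ID_1I8XYBWK", "ID_1I8XYBWK"],
    "categorized_at": ["2022-03-16 13:05:51.851102+00:00",
                       "2022-03-16 13:08:19.703288+00:00"],
    "merchant": ["TELKOM KENYA LIMITED", "SAFARICOM HOME"],
    "category": ["Data & WiFi", "Data & WiFi"],
})
test = pd.DataFrame({
    "user_id": ["ID_1I8XYBWK"],
    "categorized_at": ["2022-03-16 13:08:19.703288+00:00"],
    "merchant": ["TELKOM KENYA LIMITED"],
})

result = guess_from_neighbours(train, test)
print(result["guess"].tolist())  # ['Data & WiFi']
```

The extra set can be appended to `train` before the call if it carries labels. In practice the neighbour's label is probably better used as a feature for the model than as the final prediction.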

Happy Zinding (similar to old days of Happy Kaggling).

Discussion 2 answers

@amyflorida626 what are your thoughts on this?

30 Jun 2022, 09:18
Upvotes 0

How did it improve your results?

1 Jul 2022, 18:28
Upvotes 0