Alvin Smart Money Management Classification Challenge
Can you classify purchases recorded on Alvin into different categories?
$3 000 USD
Financial Services
Downside of random split
Data · 30 Jun 2022, 04:31 · 2

The purpose of this competition is to predict the payment type for new transactions. The usual way to split a labeled set is to separate the train and test sets by user or by time. That is not the case here: the split appears to be random, so transactions from the same user and the same period end up in both sets. The consequence is that we (as competitors) need to spend time on both modelling and exploiting the leakage.
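For contrast, here is a minimal sketch of what a split by user could look like. This is only an illustration, not how the organisers built the dataset: the function name `assign_split`, the `test_percent` parameter, and the hashing scheme are all my own assumptions.

```python
import hashlib


def assign_split(user_id: str, test_percent: int = 20) -> str:
    """Send every transaction of a given user to the same side of the split.

    Hypothetical helper: hashing the user ID gives a stable bucket in 0..99,
    so the assignment does not depend on row order or on a random seed.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_percent else "train"


# Made-up transactions (user ID, merchant); only the first ID is from the post.
transactions = [
    ("ID_1I8XYBWK", "TELKOM KENYA LIMITED"),
    ("ID_1I8XYBWK", "SAFARICOM HOME"),
    ("ID_EXAMPLE_A", "SOME MERCHANT"),
    ("ID_EXAMPLE_B", "ANOTHER MERCHANT"),
]

# Collect the set of users that falls on each side.
users_by_split = {"train": set(), "test": set()}
for user_id, _merchant in transactions:
    users_by_split[assign_split(user_id)].add(user_id)
```

With a split like this no user can appear in both sets, so a test row never sits next to a labelled train row from the same user.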

To exploit the leak, simply 1) concatenate the train, test, and extra sets, then 2) sort by user and timestamp. Let's take an example.

ID_1I8XYBWK 2022-03-16 13:05:51.851102+00 TELKOM KENYA LIMITED Data & WiFi 100 2019-07-03 07:31:00+00 True

ID_1I8XYBWK 2022-03-16 13:08:19.703288+00 TELKOM KENYA LIMITED 100 2019-07-03 07:31:00+00 True

ID_1I8XYBWK 2022-03-16 13:08:19.703288+00 SAFARICOM HOME Data & WiFi 2900 2019-07-10 04:46:00+00 False

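The first and the last rows are in the train set. The middle one is in the test set, so its class is missing. It has the same user, merchant, amount, and purchase timestamp as the first row, so it is almost certain that its class is also Data & WiFi.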
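The example above can be sketched in plain Python. This is a rough sketch, not my exact pipeline: the column names (`user`, `merchant`, `amount`, `purchased_at`, `category`) and the helper `fill_from_duplicates` are hypothetical, and it only handles the simplest case where an unlabelled row exactly duplicates a labelled one.

```python
from itertools import groupby

# The three rows from the example, with hypothetical column names.
# category=None marks the test row whose label we want to recover.
rows = [
    {"user": "ID_1I8XYBWK", "merchant": "TELKOM KENYA LIMITED", "amount": 100,
     "purchased_at": "2019-07-03 07:31:00+00", "category": "Data & WiFi"},
    {"user": "ID_1I8XYBWK", "merchant": "TELKOM KENYA LIMITED", "amount": 100,
     "purchased_at": "2019-07-03 07:31:00+00", "category": None},
    {"user": "ID_1I8XYBWK", "merchant": "SAFARICOM HOME", "amount": 2900,
     "purchased_at": "2019-07-10 04:46:00+00", "category": "Data & WiFi"},
]


def duplicate_key(row):
    # Sort by user and timestamp first, then by the fields that identify a duplicate.
    return (row["user"], row["purchased_at"], row["merchant"], row["amount"])


def fill_from_duplicates(rows):
    """Copy the label onto unlabelled rows that duplicate a labelled row."""
    filled = []
    for _, group in groupby(sorted(rows, key=duplicate_key), key=duplicate_key):
        group = list(group)
        labels = {r["category"] for r in group if r["category"] is not None}
        for r in group:
            guess = r["category"]
            # Only trust the leak when the labelled duplicates all agree.
            if guess is None and len(labels) == 1:
                guess = next(iter(labels))
            filled.append({**r, "leak_guess": guess})
    return filled


result = fill_from_duplicates(rows)
```

Here the unlabelled TELKOM row picks up Data & WiFi from its labelled duplicate. Rows without a unanimous labelled duplicate keep `leak_guess` as `None` and still need a real model.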

Happy Zinding (like the old days of Happy Kaggling).

Discussion 2 answers

@amyflorida626 what are your thoughts on this?

30 Jun 2022, 09:18

How did it improve your results?

1 Jul 2022, 18:28