The purpose of this competition is to predict the payment type for new transactions. The usual way to split a labelled set is to separate the train and test sets by user or by time. That is not the case here: transactions from the same user, made at the same time, appear in both sets. The consequence is that we (as competitors) need to spend time on both modelling and exploiting the leak.
To exploit the leak, simply 1) concatenate the train, test, and extra sets, then 2) sort by user and timestamp. Let's take an example.
ID_1I8XYBWK 2022-03-16 13:05:51.851102+00 TELKOM KENYA LIMITED Data & WiFi 100 2019-07-03 07:31:00+00 True
ID_1I8XYBWK 2022-03-16 13:08:19.703288+00 TELKOM KENYA LIMITED 100 2019-07-03 07:31:00+00 True
ID_1I8XYBWK 2022-03-16 13:08:19.703288+00 SAFARICOM HOME Data & WiFi 2900 2019-07-10 04:46:00+00 False
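The first and the last rows are in the train set. The middle one is in the test set, so its class is missing. It has the same user, merchant, amount, and purchase time as the first row, so it is almost certain that its class is also Data & WiFi.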
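For anyone who wants to try it, here is a minimal pandas sketch of the two steps above. It is only an illustration, not a tested solution: the column names (`user_id`, `merchant`, `amount`, `purchased_at`, `category`) and the helper `leak_labels` are hypothetical, so rename them to match the real files. The matching rule (copy the label from any labelled row with the same user, merchant, amount, and purchase time) is my own assumption about what counts as "the same transaction".

```python
import pandas as pd

# Hypothetical column names; rename to match the competition files.
KEYS = ["user_id", "merchant", "amount", "purchased_at"]
LABEL = "category"


def leak_labels(train, test, extra=None):
    """Return the test rows with a 'leak_label' column copied from any
    labelled row sharing the same user, merchant, amount and purchase time."""
    frames = [train.assign(src="train"), test.assign(src="test")]
    if extra is not None:
        frames.append(extra.assign(src="extra"))

    # 1) Concatenate train, test and (optionally) the extra set.
    #    Rows without a label column simply get NaN for the label.
    df = pd.concat(frames, ignore_index=True, sort=False)

    # 2) Sort by user and timestamp so matching rows sit next to each other.
    df = df.sort_values(["user_id", "purchased_at"]).reset_index(drop=True)

    # Assumed matching rule: take the first non-null label inside each group
    # of rows with identical keys. Groups with no labelled row stay NaN.
    df["leak_label"] = df.groupby(KEYS)[LABEL].transform("first")

    return df[df["src"] == "test"].drop(columns=["src", LABEL])


# The three rows from the example above: two train rows and one test row.
train = pd.DataFrame({
    "user_id": ["ID_1I8XYBWK", "ID_1I8XYBWK"],
    "merchant": ["TELKOM KENYA LIMITED", "SAFARICOM HOME"],
    "category": ["Data & WiFi", "Data & WiFi"],
    "amount": [100, 2900],
    "purchased_at": ["2019-07-03 07:31:00+00", "2019-07-10 04:46:00+00"],
})
test = pd.DataFrame({
    "user_id": ["ID_1I8XYBWK"],
    "merchant": ["TELKOM KENYA LIMITED"],
    "amount": [100],
    "purchased_at": ["2019-07-03 07:31:00+00"],
})

result = leak_labels(train, test)
print(result)
```

Test rows with no labelled twin keep a missing `leak_label`, so those still need a model prediction; the leak only covers the rows it can match.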
Happy Zinding (similar to old days of Happy Kaggling).
@amyflorida626 what are your thoughts on this?
How did it improve your results?