Hello @sky_179,
Of all the given datasets, I only used user.csv. I applied the usual preprocessing. To deal with the class imbalance, I split the data into multiple balanced subsets. Then I trained LGBM on .
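For anyone wondering what "splitting into multiple balanced subsets" can look like, here is a minimal sketch. It assumes binary 0/1 labels and pairs all minority rows with disjoint, equally sized slices of the majority rows. The function name `balanced_subsets` and the toy labels are made up for illustration and are not the exact code used in the solution.

```python
import random

def balanced_subsets(labels, seed=0):
    """Return lists of row indices; each list holds all minority rows
    plus an equally sized, disjoint slice of the majority rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    majority = majority[:]          # copy before shuffling
    rng.shuffle(majority)
    k = len(minority)
    subsets = []
    # Leftover majority rows that cannot fill a full slice are dropped.
    for start in range(0, len(majority) - k + 1, k):
        subsets.append(sorted(minority + majority[start:start + k]))
    return subsets

# Toy labels (hypothetical): 3 positives, 10 negatives.
labels = [1] * 3 + [0] * 10
subsets = balanced_subsets(labels)
```

Each index list could then be used to train one model (for example a `lightgbm.LGBMClassifier`), with the predicted probabilities averaged over the models. How the models are combined is an assumption here.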
Don't you think there will be data leakage?
No. I'm only using Users.csv because I couldn't find any ID_User from Test.csv in the other datasets. So apart from User.csv, the other datasets contain no information about the users in Test.csv.
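The check described above can be reproduced roughly like this. It is only a sketch: it assumes the other datasets also have a column named `ID_User`, and the in-memory CSV contents are made-up stand-ins for the real files.

```python
import csv
import io

def shared_ids(test_file, other_file, key="ID_User"):
    """Return the set of IDs that appear in both CSV files."""
    test_ids = {row[key] for row in csv.DictReader(test_file)}
    other_ids = {row[key] for row in csv.DictReader(other_file)}
    return test_ids & other_ids

# Hypothetical stand-ins for Test.csv and one of the other datasets.
test_csv = io.StringIO("ID_User,age\nu1,30\nu2,41\n")
other_csv = io.StringIO("ID_User,amount\nu7,10\nu8,25\n")

overlap = shared_ids(test_csv, other_csv)
print(len(overlap))
```

With the real files, you would pass `open("Test.csv", newline="")` and the other dataset's file instead. An empty result means that dataset says nothing about the test users.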