Laduma Analytics Football League Winners Prediction Challenge
Can you predict the outcome of a football match based on historical data?
$2 000 USD
Winning Solutions
Notebooks · 4 Sep 2022, 23:50 · 1

As the competition comes to an end, could the top 10 please share their winning solutions or approaches for others to learn from? This was a very tough competition for me; I couldn't think of any way to reduce my logloss score, from feature engineering to modelling. Your solutions would be of great help, thanks.

My approach: I grouped by Game_ID to get the total stats for each unique Player_ID, that is, the total count of each action in the Actions column recorded for each unique Player_ID. I then renamed the players so that we had player_0 to player_9 for each home and away team in a unique Game_ID, and created a pivot table with Game_ID, home_team and away_team as indexes, each unique action such as 'Accurate Passes', 'Accurate crosses' and 'Accurate Keypasses' as columns, and each player_0 to player_10 as a multilevel under each unique Game_ID.

But since we aren't predicting during a match, using those stats for that exact match caused data leakage, so I grouped by each Home Team and Away Team and shifted by values of 1, 2, 3, -1, -2, -3. I then reshaped my data to have a timestep of 10 so as to feed it to an LSTM model. I tried bidirectional LSTMs, LSTM autoencoders to extract the most important features, and embeddings, but nothing seemed to work; I couldn't go below a logloss of 1.15.

My best scores were with a CatBoost model, after dropping duplicate values on Game_ID and sorting by date. Would love to see what techniques others used.
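For readers following along, here is a minimal sketch of the count-then-shift idea described above. It is not the poster's actual code. The toy rows, the `Team`, `Date` and `Action` column names, and the `add_lags` helper are all assumptions made for illustration, and the counts are aggregated per team rather than per player slot to keep it short. It uses `groupby(...).size().unstack(...)`, which gives the same table of action counts a pivot table would. When rows are sorted by date, a negative shift reads from later matches, so the sketch uses positive lags only.

```python
import pandas as pd

# Hypothetical toy event log: one row per recorded action.
rows = [
    # (Game_ID, Date, Team, Player_ID, Action)
    ("g1", "2022-01-01", "A", "p1", "Accurate Passes"),
    ("g1", "2022-01-01", "A", "p1", "Accurate Passes"),
    ("g1", "2022-01-01", "A", "p2", "Accurate crosses"),
    ("g1", "2022-01-01", "B", "p7", "Accurate Passes"),
    ("g2", "2022-01-08", "A", "p1", "Accurate Passes"),
    ("g2", "2022-01-08", "A", "p2", "Accurate crosses"),
    ("g2", "2022-01-08", "A", "p2", "Accurate crosses"),
    ("g3", "2022-01-15", "A", "p2", "Accurate Passes"),
    ("g3", "2022-01-15", "B", "p8", "Accurate crosses"),
    ("g3", "2022-01-15", "B", "p8", "Accurate crosses"),
]
events = pd.DataFrame(
    rows, columns=["Game_ID", "Date", "Team", "Player_ID", "Action"]
)
events["Date"] = pd.to_datetime(events["Date"])

# Count each action per team per game; one column per action type.
# Sorting the index orders each team's games chronologically.
counts = (
    events.groupby(["Team", "Date", "Game_ID", "Action"])
    .size()
    .unstack("Action", fill_value=0)
    .sort_index()
)


def add_lags(counts: pd.DataFrame, lags=(1, 2)) -> pd.DataFrame:
    """Return, for each game, the team's counts from its previous games.

    Shifting within each team means a row only ever sees that team's
    earlier matches, so same-match stats never reach the features.
    """
    by_team = counts.groupby(level="Team")
    lagged = [by_team.shift(k).add_suffix(f"_lag{k}") for k in lags]
    return pd.concat(lagged, axis=1)


features = add_lags(counts)
print(features)
```

A team's first game has no history, so its lag columns are NaN; those rows need to be dropped or imputed before modelling.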

Discussion 1 answer

You've put a lot of effort into this approach. Thank you for sharing.