Winning Solutions

Laduma Analytics Football League Winners Prediction Challenge

Helping Africa

$2 000 USD

732 joined · 154 active

Start: Jun 06, 2022 · End: Sep 04, 2022 · Reveal: Sep 04, 2022

Emms

Notebooks · 4 Sep 2022, 23:50

As the competition comes to an end, could the top 10 please share their winning solutions or approaches so others can learn from them? This was a very tough competition for me. From feature engineering to modelling, I couldn't find any way to reduce my log loss score. Your solutions would be a great help. Thanks,
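For context on the metric: multiclass log loss is the average of -ln(p) over all matches, where p is the probability the model gave to the result that actually happened. Below is a minimal pure-Python sketch. It assumes a three-way outcome (home win, draw, away win), and the label encoding is hypothetical. Under that assumption, a model that always predicts 1/3 for each outcome scores ln 3 ≈ 1.0986, which is a useful baseline for judging a score like 1.15.

```python
import math
from typing import Sequence


def multiclass_log_loss(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], eps: float = 1e-15
) -> float:
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Renormalise the row, then clip so log(0) can never occur.
        p = row[label] / sum(row)
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Hypothetical encoding: 0 = home win, 1 = draw, 2 = away win.
y = [0, 2, 1, 0]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * len(y)
print(round(multiclass_log_loss(y, uniform), 4))  # 1.0986 (= ln 3)
```

`sklearn.metrics.log_loss` computes the same averaged cross-entropy, so the sketch is mainly useful for checking a submission against the uniform baseline by hand.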

My approach: I grouped by Game_ID and Player_ID and counted how many times each action in the Actions column was recorded for each unique player. I then relabelled the players as player_0 to player_9 for the home and away team in each Game_ID. Next I built a pivot table with Game_ID, home_team and away_team as the index, each unique action (e.g. 'Accurate Passes', 'Accurate crosses', 'Accurate Keypasses') as the columns, and the player labels as an additional level under each Game_ID.

Since we aren't predicting during a match, using the stats from that same match caused data leakage. So I grouped by home team and by away team and shifted the stats by 1, 2, 3, -1, -2 and -3. I then reshaped the data into sequences of 10 timesteps to feed to an LSTM model. I tried bidirectional LSTMs, LSTM autoencoders to extract the most important features, and embeddings, but nothing seemed to work: I couldn't get below a log loss of 1.15. My best scores came from a CatBoost model, after dropping duplicate Game_ID rows and sorting by date. I would love to see what techniques others used.
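Here is a minimal pandas sketch of the feature steps described above: per-player action counts, player_N relabelling, the pivot, and per-team lags, run on a toy event log. The column names (Game_ID, Date, Team, Player_ID, Action) and the data are hypothetical, and the real dataset's schema may differ. To keep it short, the sketch builds one row per game and team rather than the home_team/away_team index. It also uses only positive lags, because in pandas `shift(-k)` within a team group takes values from that team's later games.

```python
import pandas as pd

# Toy event log: one row per recorded action. Column names and values are hypothetical.
events = pd.DataFrame({
    "Game_ID": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Date": pd.to_datetime(["2022-01-01"] * 4 + ["2022-01-08"] * 3 + ["2022-01-15"] * 2),
    "Team": ["A", "A", "A", "B", "A", "C", "C", "A", "B"],
    "Player_ID": [10, 10, 11, 20, 10, 30, 30, 11, 20],
    "Action": ["Accurate Passes", "Accurate crosses", "Accurate Passes", "Accurate Passes",
               "Accurate Keypasses", "Accurate Passes", "Accurate Passes",
               "Accurate crosses", "Accurate Passes"],
})

# 1) Count each action per player per game: one row per (game, team, player).
counts = (
    events.groupby(["Game_ID", "Date", "Team", "Player_ID", "Action"])
    .size()
    .unstack("Action", fill_value=0)
)

# 2) Relabel players within each game/team as player_0, player_1, ...
#    (ordering here follows Player_ID, which is arbitrary).
slot_no = counts.groupby(level=["Game_ID", "Team"]).cumcount()
counts["slot"] = "player_" + slot_no.astype(str)

# 3) Pivot to one row per (game, team) with (action, slot) column pairs, then flatten.
per_team_game = (
    counts.reset_index("Player_ID", drop=True)
    .set_index("slot", append=True)
    .unstack("slot", fill_value=0)
)
per_team_game.columns = [f"{action}__{slot}" for action, slot in per_team_game.columns]

# 4) Lags: each team's stats from its previous k games, in date order.
#    Positive shifts only; shift(-k) would pull in stats from the team's later games.
per_team_game = per_team_game.sort_index(level="Date")
by_team = per_team_game.groupby(level="Team")
lags = pd.concat(
    [by_team.shift(k).add_suffix(f"__lag{k}") for k in (1, 2, 3)],
    axis=1,
)

# Team A's features for game 3, built only from its games 1 and 2.
print(lags.loc[(3, pd.Timestamp("2022-01-15"), "A")].dropna())
```

The sketch leaves out joining these lag features back onto the home/away rows of each Game_ID and reshaping them into 10-step sequences for an LSTM.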

Ernest_P

You've put a lot of effort into this approach. Thank you for sharing.

5 Sep 2022, 22:03
