Catch up with seasoned data science competitor and Product Manager at Prevision.io, Mathurin Aché (mathurin), as he shares a few of his secrets to winning the AI4D Predict the Global Spread of COVID-19 challenge.
Hi Mathurin, please introduce yourself to the Zindi community.
My name is Mathurin Aché (mathurin), I am 39 years old, and I live in France.
Tell us a bit about your data science journey.
I have been a data scientist for 15 years. I am currently a Product Manager at Prevision.io, a publisher of machine learning software. I have taken part in more than 200 competitions on Kaggle (where I am ranked around 20th in the world) and on other data science platforms. See my profile here.
What do you like about competing on Zindi?
When I take part in data science contests, I have two objectives:
So I like that Zindi sets up competitions with tangible human objectives.
Tell us about the solution you built for the AI4D Predict the Global Spread of COVID-19 challenge.
I discovered the AI4D Predict the Global Spread of COVID-19 contest just two days before it closed. I had just taken part in a similar competition on Kaggle, with some differences:
Since the data span very different ranges, from a few deaths to several tens of thousands, it is common practice to model log(Fatalities) rather than the raw counts.
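As a minimal sketch of this transform (assuming NumPy, and using log1p rather than a plain log because daily counts can be zero), the forward and inverse mappings could look like this. The function names are hypothetical and not taken from the author's code.

```python
import numpy as np


def to_log_target(fatalities):
    """Map raw fatality counts to log space; log1p keeps zero counts finite."""
    return np.log1p(np.asarray(fatalities, dtype=float))


def from_log_target(log_values):
    """Invert to_log_target so predictions can be reported as counts."""
    return np.expm1(np.asarray(log_values, dtype=float))
```

Compressing the range this way keeps a country with tens of thousands of deaths from dominating the loss over a country with only a handful.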
In terms of external data, I used the metadata in the "country_codes.csv" file.
In terms of explanatory variables, I manually created lag features at 1, 3 and 7 days on a sliding window.
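A minimal pandas sketch of per-country lag features at 1, 3 and 7 days is shown below. The column names ("country", "date", "log_fatalities") are hypothetical, and grouping by country is an assumption so that one country's history never leaks into another's lags.

```python
import pandas as pd


def add_lag_features(df, lags=(1, 3, 7), group_col="country",
                     date_col="date", target_col="log_fatalities"):
    """Add one column per lag holding the target value `lag` days earlier."""
    # Sort so that shift() moves values forward in time within each country.
    df = df.sort_values([group_col, date_col]).copy()
    grouped = df.groupby(group_col)[target_col]
    for lag in lags:
        # The first `lag` rows of each country have no history and become NaN.
        df[f"{target_col}_lag_{lag}"] = grouped.shift(lag)
    return df
```

Because the lags only look backwards, the same function can be re-applied as new daily values (observed or predicted) are appended, which is one way to slide the window forward through the forecast horizon.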
In terms of the algorithm, I averaged the predictions of 6 XGBoost models. Each model was trained with sample weights equal to 1. / days ** WEIGHT_NORM, with WEIGHT_NORM between 0.15 and 0.3, and a DECAY of 0.99. I also varied the following parameters across the models: min_child_weight, eta, colsample_bytree, max_depth, subsample and NROUND.
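The weighting formula and the averaging step can be sketched with NumPy as follows. This assumes that "days" counts days back from the last training date starting at 1, so that recent observations get the largest weights; that interpretation, the function names and the default value are hypothetical. The interview does not say exactly how DECAY enters the weights, so it is left out here.

```python
import numpy as np


def sample_weights(days, weight_norm=0.2):
    """Weight each training row by 1 / days ** weight_norm.

    Assumption: `days` is the number of days before the last training date
    (1 = most recent), so weights shrink slowly for older rows.
    """
    days = np.asarray(days, dtype=float)
    return 1.0 / days ** weight_norm


def average_predictions(predictions):
    """Simple ensemble: element-wise mean of several models' predictions."""
    return np.mean(np.vstack(predictions), axis=0)
```

With XGBoost, such weights would typically be passed through xgboost.DMatrix(X, label=y, weight=w), with each of the models trained via xgboost.train(params, dtrain, num_boost_round=...) using its own variant of min_child_weight, eta, colsample_bytree, max_depth and subsample, before their outputs are combined with average_predictions.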
What do you think set your approach apart?
I was mainly inspired by the solutions proposed by the winners of the Kaggle COVID forecasting contest.