UmojaHack Africa brought more than 1,000 data science students from across Africa to the Zindi platform on March 21, 2020. Of the 383 data scientists from across the continent who signed up for the Xente Purchase Prediction Challenge, 40 made it onto the leaderboard. Only the best of the best made it to the top.
The goal of this challenge was to create a machine learning model that predicts what individuals will purchase next, and when, based on their purchase history. The resulting models and solutions will help Xente with targeted marketing and with catering better to their customers’ needs. For Xente, this may result in improved profitability and financial sustainability, while customers receive an app that is tailored to them.
The winners of this challenge are: Team Statistically Significant (Taru, CHNPAT005, IvanJericevich, ehsaan) from South Africa in 1st place, team system error (belaziz, klaimo, MJ_MAH) from Tunisia in 2nd place, and team CoviData (melkeor, karimcossentini), also from Tunisia, in 3rd place.
A special thank you to the 1st place winners for sharing some insights into how they succeeded in this challenge.
Name: Ivan Jericevich (1st place)
Zindi handle: Team Statistically Significant (Taru, CHNPAT005, IvanJericevich, ehsaan)
Where are you from? South Africa
Tell us a bit about yourself?
I am currently pursuing a Master's degree in statistics. My primary interests in the field are statistical finance and machine learning. I'm a keen learner, determined to improve myself in the field.
Tell us about the approach you took.
The Xente dataset was extremely challenging. Our first ideas were to use CatBoost, XGBoost or a multivariate D-type Hawkes process. However, these methods ran into problems, including the small number of covariates and the sparsity of the data.
We therefore fell back on two simple methods. We began with some EDA and realized that airtime data was the most purchased item for all individuals, so we tried a simple predictive model that assumed everyone who bought airtime in-sample would continue to buy the same airtime out-of-sample. The second method we tried was Association Rule Mining (ARM). Its results indicated that if someone purchased something, they would also buy airtime data; however, purchases were too frequent and too clustered around the top 3 most popular items. Although ARM produced reasonable results, it was again not obvious how a temporal factor could be included.
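To make the ARM observation concrete, here is a minimal sketch of how support, confidence and lift are computed for a single rule of the form "bought X, so also buys airtime". It is not the team's code: the function name `rule_metrics`, the basket layout (one set of items per customer) and the toy data are all assumptions made for illustration.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    baskets: list of sets, one set of purchased items per customer
             (a hypothetical layout, not the Xente schema).
    """
    n = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    with_consequent = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)

    support = with_both / n
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    # Lift compares the rule's confidence with how common the consequent is anyway.
    lift = confidence / (with_consequent / n) if with_consequent else 0.0
    return support, confidence, lift


# Hypothetical toy data in which every customer buys airtime.
toy_baskets = [
    {"airtime", "data"},
    {"airtime", "tv"},
    {"airtime"},
    {"airtime", "data", "tv"},
]

support, confidence, lift = rule_metrics(toy_baskets, "tv", "airtime")
print(support, confidence, lift)
```

On this toy data the rule has a confidence of 1.0 but a lift of 1.0: the rule is always right, yet it says nothing beyond "almost everyone buys airtime". That is one plausible reading of why rules clustered around the most popular items were of limited use here.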
By the time we got here, time was running out, so coding and calibrating a Hawkes process purely for airtime data (which is feasible in principle) was simply not possible in the time remaining. We therefore resorted to the next best thing: we found the median inter-arrival time between purchases for every customer and every item they purchased, and predicted based on that. This allowed us to include the user, item and temporal aspects in the model. That was our winning model.
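The median inter-arrival idea can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the winning code: the function name `predict_next_purchases`, the `(customer, item, date)` tuple layout, the day-level granularity and the rule "last purchase plus median gap" are all hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median


def predict_next_purchases(purchases):
    """Predict the next purchase date for each (customer, item) pair.

    purchases: iterable of (customer_id, item_id, purchase_date) tuples
               (a hypothetical layout, not the Xente schema).
    Pairs with fewer than two distinct purchase dates are skipped.
    """
    history = defaultdict(list)
    for customer, item, day in purchases:
        history[(customer, item)].append(day)

    predictions = {}
    for key, days in history.items():
        days = sorted(set(days))
        if len(days) < 2:
            continue  # a single purchase gives no gap to measure
        # Inter-arrival times in days between consecutive purchases.
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        predictions[key] = days[-1] + timedelta(days=median(gaps))
    return predictions


# Hypothetical toy data: gaps of 7, 7 and 14 days give a median of 7.
toy_purchases = [
    ("c1", "airtime", date(2020, 1, 1)),
    ("c1", "airtime", date(2020, 1, 8)),
    ("c1", "airtime", date(2020, 1, 15)),
    ("c1", "airtime", date(2020, 1, 29)),
    ("c2", "data", date(2020, 1, 5)),
]

print(predict_next_purchases(toy_purchases))
```

Using the median rather than the mean keeps one unusually long gap (the 14 days above) from pushing the prediction out, which suits sparse purchase histories.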
What were the things that made the difference for you that you think others can learn from?
For this particular dataset, classical methods were not helpful, since it was not obvious which method could cope with almost no covariates as well as the time component. We were therefore required to get creative with different experimental techniques. In the end, applying creative statistical methods was what made the difference.
What are the biggest areas of opportunity you see in AI in Africa over the next few years?
Currently I think that many businesses in Africa can benefit immensely from data-driven solutions. More specifically, the retail, finance and online industries in Africa have huge potential to benefit from AI.
What are you looking forward to most about the Zindi community?
This being my first data science competition, I look forward to attempting more competitions. Furthermore, in the short amount of time available I was able to learn a lot about the real-life applications of my studies. For this reason I would like to learn more by seeing what Zindi has to offer in the future.
This competition was hosted by XENTE.
What are your thoughts on our winners' feedback? Engage via the Discussion page or leave a comment on social media.