25 Jan 2019, 11:48

Meet the winners of the Social Media Challenge!

Get insights from challenge winners! We are pleased to present the second installment of our winners blog. A special thank you to the winners for their generous feedback.

Closing on 26 November 2018, the Social Media Challenge was the second Zindi challenge to close. This competition was sponsored by insight2impact (i2i), a resource centre supporting the use of data for decision-making, with a focus on financial and economic inclusion. i2i is hosted by FinMark Trust and Cenfri in South Africa and funded by the Bill and Melinda Gates Foundation in partnership with The Mastercard Foundation.

The challenge was to predict the number of retweets a tweet would receive. We pulled tweets from 38 major banks and mobile network operators across Africa. The challenge attracted over 200 data scientists from across the continent and around the world, of whom more than 28 made submissions and appeared on the leaderboard.

We are happy to introduce the top two winners of the competition: Mohamed Salam Jedidi of Tunisia and Aniruddha Raghavan of the United States!

Name: Mohamed Salam Jedidi

Zindi handle: mohamed_salam_jedidi

Where are you from? Tunisia

Tell us a bit about yourself.

I am an AI engineer and a graduate of the Higher School of Communications of Tunis (Sup’Com), Tunisia. Before graduating, I developed a great interest in machine learning and deep learning, which I pursued through multiple academic and personal projects, as well as internships and part-time jobs as a data scientist. My first full-time job, as a data scientist at InstaDeep, an African AI startup, gave me hands-on experience with industrial AI applications.

Tell us about the approach you took.

The winning solution was an ensemble of two models, LightGBM (LGBM) and XGBoost. Both algorithms are scalable and deliver high performance, but only with meaningful hand-crafted features (e.g. the number of hashtags in a tweet, the number of URLs in a tweet and the number of words in the text). Techniques such as TF-IDF and PCA helped me create about 20 additional meaningful features from the tweets.
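The hand-crafted features Mohamed lists are simple counts over each tweet. A minimal Python sketch, assuming raw tweet text as input and simple regular expressions for hashtags and URLs (the tweet JSON's entities field could be used instead). The function tweet_text_features and its feature names are hypothetical, not Mohamed's actual code:

```python
import re

# Illustrative patterns (an assumption): hashtags are "#" followed by word
# characters; URLs start with http:// or https:// and run to the next space.
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def tweet_text_features(text: str) -> dict:
    """Hypothetical count features: hashtags, URLs and words in a tweet."""
    return {
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_urls": len(URL_RE.findall(text)),
        # Whitespace-separated tokens, so hashtags and URLs count as words too.
        "n_words": len(text.split()),
    }


if __name__ == "__main__":
    sample = "Pay your bills with our app! #fintech #banking https://t.co/abc123"
    print(tweet_text_features(sample))
    # -> {'n_hashtags': 2, 'n_urls': 1, 'n_words': 9}
```

Features like these become columns in the training table fed to LightGBM and XGBoost alongside the TF-IDF/PCA features.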

What were the things that made the difference for you that you think others can learn from?

Analyzing and understanding the data allowed me to create meaningful features. Parameter tuning was also key to getting better results. Finally, averaging the models' predictions in the final solution improved my score drastically.
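The averaging step can be sketched as a weighted mean of the two models' per-tweet predictions. This assumes equal weights by default; the original solution's weights are not stated, and average_predictions and weight_a are hypothetical names:

```python
from typing import List, Sequence


def average_predictions(preds_a: Sequence[float], preds_b: Sequence[float],
                        weight_a: float = 0.5) -> List[float]:
    """Blend two models' predictions with a weighted arithmetic mean.

    weight_a = 0.5 (the default) is a plain average of the two models.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(preds_a, preds_b)]


if __name__ == "__main__":
    lgbm_preds = [10.0, 2.0, 0.0]  # e.g. predicted retweet counts from LightGBM
    xgb_preds = [14.0, 4.0, 1.0]   # e.g. predicted retweet counts from XGBoost
    print(average_predictions(lgbm_preds, xgb_preds))  # -> [12.0, 3.0, 0.5]
```

Averaging tends to help when the two models make partly uncorrelated errors, which is plausible for two gradient-boosting libraries with different tree-growing strategies.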

Name: Aniruddha Raghavan

Zindi handle: anirag

Where are you from? Originally from India, currently working in the USA

Tell us a bit about yourself.

I work on a data science team at a software company focusing on ERP solutions such as healthcare and CRM. I participate in data science competitions and hackathons to learn new techniques and approaches and to get exposure to a variety of datasets and problems.

Tell us about the approach you took.

I learned about this competition late and didn't have much time to spend on it, so I didn't experiment with many models or ensembles. My final model is a 5-fold LightGBM model run with 2 different seeds, with the results simply averaged. I also didn't tune any hyperparameters; I just used what had worked for me previously on similar datasets.

I really liked that the dataset was tweet JSON, and I learned a lot about tweet objects and their processing, which could be useful in many other problems. I decided to focus on feature engineering and came up with three main groups: user features, tweet features and others, such as time. Whenever I had time, I tried to come up with one more feature in one of these groups, and I crafted around 50-60 such features in total. Then I focused on the tweet text and added TF-IDF features (word model) and TF-IDF features (char model) with an n-gram range of 1-5. Using the text, I also built a topic model and added the 3 most dominant topics for each tweet. I tried adding word2vec features but didn't get any improvement.

I was not able to sync up my local CV score with the public leaderboard, so I had to rely on the public leaderboard score to measure improvement. I tried a neural network (NN) and was able to achieve a top 3 result, but LightGBM gave better results. If I had had more time, I would have tuned my models and focused on ensembles.
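In practice, character TF-IDF is usually computed with a library such as scikit-learn's TfidfVectorizer (analyzer="char", ngram_range=(1, 5)). The dependency-free sketch below only illustrates the idea. The smoothed idf, log((1 + N) / (1 + df)) + 1, and the L2 normalisation are assumptions chosen to resemble scikit-learn's defaults, and the function names are hypothetical:

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n_min: int = 1, n_max: int = 5) -> List[str]:
    """All lowercase character n-grams of length n_min..n_max (inclusive)."""
    text = text.lower()
    return [text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def char_tfidf(docs: List[str], n_min: int = 1,
               n_max: int = 5) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors (dict of n-gram -> weight), L2-normalised."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each n-gram once per doc.
    df = Counter(g for c in counts for g in c)
    idf = {g: math.log((1 + n_docs) / (1 + f)) + 1.0 for g, f in df.items()}
    vectors = []
    for c in counts:
        vec = {g: tf * idf[g] for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else vec)
    return vectors


if __name__ == "__main__":
    vecs = char_tfidf(["bank", "bank app", "mobile"])
    # Highest-weighted n-grams for the first document.
    print(sorted(vecs[0], key=vecs[0].get, reverse=True)[:3])
```

Character n-grams are robust to the misspellings, abbreviations and hashtags common in tweets, which may be why the char model helped where word2vec did not.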

What were the things that made the difference for you that you think others can learn from?

I am not sure what others came up with, but these are the things that improved my score:

-- Hand-crafted features such as user_activity (how active the user is, based on the number of statuses so far), user_reliability (the ratio of followers to friends) and tweet informativeness (how much information the tweet contains). These features gave me a really high score, near the top 5 on the public leaderboard.
-- TF-IDF char model features.
-- Topic modeling features. For example, one topic focused on service tweets, while another had words like apology, inconvenience, response, etc.
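The two user features can be sketched from the user object embedded in each tweet JSON, using the Twitter API v1.1 fields statuses_count, followers_count, friends_count and created_at. The exact definitions Aniruddha used are not given, so two assumptions are made here: user_activity is statuses per day of account age at the time of the tweet, and user_reliability adds 1 to the friends count to avoid division by zero. The function user_features is hypothetical:

```python
from datetime import datetime

# Twitter API v1.1 timestamp format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FMT = "%a %b %d %H:%M:%S %z %Y"


def user_features(tweet: dict) -> dict:
    """Hypothetical user_activity and user_reliability features for one tweet."""
    user = tweet["user"]
    tweeted_at = datetime.strptime(tweet["created_at"], TWITTER_TIME_FMT)
    joined_at = datetime.strptime(user["created_at"], TWITTER_TIME_FMT)
    # Account age in whole days when the tweet was posted (at least 1 day).
    age_days = max((tweeted_at - joined_at).days, 1)
    return {
        # Assumption: activity = statuses posted per day of account age.
        "user_activity": user["statuses_count"] / age_days,
        # Followers-to-friends ratio; the +1 avoids division by zero (assumption).
        "user_reliability": user["followers_count"] / (user["friends_count"] + 1),
    }


if __name__ == "__main__":
    tweet = {
        "created_at": "Mon Oct 01 12:00:00 +0000 2018",
        "user": {
            "created_at": "Sat Sep 01 12:00:00 +0000 2018",
            "statuses_count": 300,
            "followers_count": 999,
            "friends_count": 99,
        },
    }
    print(user_features(tweet))
    # -> {'user_activity': 10.0, 'user_reliability': 9.99}
```

Measuring account age at tweet time, rather than at data-collection time, keeps the feature consistent across tweets collected on different dates.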

A few final words of wisdom from our winners...

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

Mohamed: Logistics and transportation. The quality of transportation (land, rail, road, water or air) is crucial for a developed economy. Its impact extends to the efficiency of trade and commerce, Africa’s appeal to tourists and its population’s quality of life. AI could improve infrastructure strategies on the one hand and trip planning on the other, delivering cost-effective solutions for modern transportation. Such impactful projects are the core priority of InstaDeep, the startup I work for and the future African AI hub.

Aniruddha: Healthcare. I believe predictive analytics and AI have truly vast potential in healthcare, but their adoption lags in many countries.

What are you looking forward to most about the Zindi community?

Mohamed: Zindi has succeeded in creating an online space, through its platform, where Africans can meet, compete and share their knowledge. I expect Zindi to build bridges between African countries by organising offline events and to offer opportunities for African AI talent to collaborate on common projects in order to build, together, the Africa of tomorrow.

Aniruddha: I strongly believe in learning through community. I am happy to see Zindi bringing data scientists together to form a great community. I am also looking forward to working on problems pertaining to Africa, which will be interesting and exciting.

What are your thoughts on our winners' feedback? Engage via the Discussions page or leave a comment on social media.