6 Dec 2018, 17:18

Meet the winners of the Devex Sustainable Development Goals Challenge!

We’ve asked our winners a few questions to introduce themselves and highlight their particular approach to this challenge. Get insights, engage in discussion, and continue to learn from each other in the Zindi community! A special thank you to the winners for their generous feedback.

Closing on 13 November 2018, the Devex Sustainable Development Goals #3 challenge was the first Zindi challenge to close. This was a multi-label natural language processing challenge whose aim was to classify text content by the 27 indicators of the United Nations' Sustainable Development Goal #3. The competition was launched on 10 September 2018 and attracted over 200 data scientists from across the continent and around the world, of whom 50 made submissions and entered the leaderboard. Submissions were evaluated using Hamming loss: the fraction of individual label assignments a model gets wrong, averaged over all samples and labels.
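For context, here is a minimal, hedged sketch of how that metric behaves on a toy multi-label matrix shaped like this challenge's output (samples by 27 indicators); the data is invented for illustration:

```python
import numpy as np
from sklearn.metrics import hamming_loss

# Toy ground truth and predictions: 3 samples x 27 binary indicators.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(3, 27))
y_pred = y_true.copy()
y_pred[0, 0] ^= 1  # flip a single label assignment

# Hamming loss = wrong label assignments / total label assignments.
print(hamming_loss(y_true, y_pred))  # 1 / (3 * 27) ≈ 0.0123
```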

The competition was fierce to the very end. Olamide Oyediran took first place and Anshu Kumar took second, both with a loss of only 0.03837 (their scores were tied; the tie-breaker was the time of submission). Steve Oni came in third, hot on their heels. Without further ado, here are our Top 3!

Name: Olamide Oyediran

Zindi handle: Olamide

Where are you from? Oyo State, Nigeria

Tell us a bit about yourself.

My first degree was in Crop Protection and Environmental Biology from the University of Ibadan, Nigeria. I am a machine learning and artificial intelligence enthusiast; I started building my machine learning skills in 2017 and code primarily in Python. My interests include building predictive models, computer vision, and natural language processing.

Tell us about the approach you took.

My winning solution was an ensemble of 6 models. A simple average of two different models, or an ensemble of several, has often been shown to perform better than a single model. The Python scikit-learn library and LightGBM were used.

The train and test datasets (the Text column) were cleaned of HTML tags.

Sklearn's TfidfVectorizer was used with an n-gram range of (1, 3). The vectorizer was fitted on the train data, and the fitted vectorizer was then used to transform both the train and test datasets.

Each classifier was fitted on the transformed train data and used to predict class probabilities for the transformed test data.
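His description maps onto a short scikit-learn pipeline. The sketch below is a minimal reconstruction, not his exact code: the file names, column names, and logistic regression base model are illustrative assumptions (his actual ensemble combined six models, including LightGBM):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Hypothetical file and column names; assume HTML tags have already
# been stripped from the Text column as described above.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
label_cols = [c for c in train.columns if c not in ("ID", "Text")]

# Fit TF-IDF with 1- to 3-grams on the training text only, then use
# the fitted vectorizer to transform both sets.
vec = TfidfVectorizer(ngram_range=(1, 3))
X_train = vec.fit_transform(train["Text"])
X_test = vec.transform(test["Text"])

# One binary classifier per indicator; predict probabilities, not labels.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, train[label_cols].values)
probs = clf.predict_proba(X_test)  # shape (n_test, 27)
```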

What were the things that made the difference for you that you think others can learn from?

Predicting class probabilities for the test dataset for each of the 27 indicators, rather than predicting the classes themselves, made the difference. The class probabilities were used to ensemble the models.

K-fold cross-validation also made a difference. The training data is small, and K-fold ensures that, across the folds, the algorithm trains on all the data points.

Finally, ensembling models. A simple average of my two best models was good enough for 2nd place on the leaderboard.
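Putting those three points together, here is a hedged sketch of K-fold training plus a simple probability average for a single indicator column; model_a, model_b, and the variable names are hypothetical placeholders, not his exact ensemble:

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_test_probs(model, X, y, X_test, n_splits=5):
    """Average a model's test-set class probabilities over K folds."""
    probs = np.zeros(X_test.shape[0])
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, _ in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        probs += model.predict_proba(X_test)[:, 1] / n_splits
    return probs

# For one of the 27 indicators, a simple average of two models:
# blend = 0.5 * kfold_test_probs(model_a, X_train, y_col, X_test) \
#       + 0.5 * kfold_test_probs(model_b, X_train, y_col, X_test)
```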

Name: Anshu Kumar

Zindi handle: Akgeni

Where are you from? Bangalore, India

Tell us a bit about yourself.

I am a learner who wants to solve challenging problems by using AI in Food-Tech and Medical Imaging.

Tell us about the approach you took.

My approach was pretty straightforward: text features and an ensemble of different gradient-boosted tree models. I used different feature pipelines for different models, and finally took a weighted sum of the predictions.
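As an illustration of that final blending step, here is a minimal sketch; the toy data, the two model choices, and the 0.6/0.4 weights are all placeholder assumptions, not Anshu's actual pipeline:

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for one indicator's feature matrix and labels.
X, y = make_classification(n_samples=400, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two different gradient-boosted tree implementations on the same task.
m1 = LGBMClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
m2 = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Weighted sum of predicted probabilities; the weights would normally
# be tuned on a validation split.
blend = 0.6 * m1.predict_proba(X_te)[:, 1] + 0.4 * m2.predict_proba(X_te)[:, 1]
```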

What were the things that made the difference for you that you think others can learn from?

I guess deep learning approaches (RNNs) don't make much sense given the size of the data.

See whether your feature engineering is aligned with the models you are using.

Name: Steve Oni

Zindi handle: Steveoni

Where are you from? Lagos,Nigeria

Tell us a bit about yourself.

I am an undergraduate student of physics.

Tell us about the approach you took.

At the start of the competition I tried to use LSTMs, since they are good for text classification, but my score was stuck at 0.05, and the other deep learning models I tried didn't improve on it. Later I found out that one of the competitors used a CNN to achieve a high score.

So I tried a classical machine learning algorithm, a linear SVM. I got this insight from Jeremy Howard of fast.ai, who pointed to a paper showing that NBSVM can serve as a strong baseline for text classification; using this algorithm gave me my first 0.04. The main idea of the algorithm is to use Naive Bayes to weight the features the SVM sees.

It calculates the probability of 1s and 0s for each label and multiplies the TF-IDF features by this ratio. For the TF-IDF, I used 2-grams at first, then 3-grams.
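For context, the NBSVM trick from Wang and Manning's "Baselines and Bigrams" paper scales each TF-IDF feature by a Naive Bayes log-count ratio before fitting the linear SVM. Here is a minimal sketch for one binary indicator; the toy texts and labels are invented stand-ins for the competition data:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy stand-ins for the competition text and one binary indicator.
texts = ["clean water access", "maternal health clinic",
         "safe water supply", "vaccine coverage rates"]
y = np.array([1, 0, 1, 0])

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)

# Naive Bayes log-count ratio: how strongly each feature favours
# class 1 over class 0 (alpha smooths away zero counts).
alpha = 1.0
p = alpha + X[y == 1].sum(axis=0)
q = alpha + X[y == 0].sum(axis=0)
r = np.log((p / p.sum()) / (q / q.sum()))

# Scale the TF-IDF features by r, then fit a linear SVM on the result.
clf = LinearSVC().fit(X.multiply(r), y)
```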

But the score got stuck around 0.042, so I looked into ensembles, especially blended ensembles. Using four different models, the score improved to 0.040.

Since I had submission files with both high and low scores, I thought of it the way tree ensembles work: I checked the correlation between the files, and combining those with low correlation boosted the score to 0.03.
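A hedged sketch of that correlation check between submission files; the file names and column layout below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical submission files, each with an ID column followed by
# 27 probability columns (one per SDG 3 indicator).
files = ["nbsvm_submission.csv", "blend_submission.csv", "cnn_submission.csv"]
subs = [pd.read_csv(f) for f in files]

# Pairwise correlation of the flattened prediction columns.
preds = {f: s.drop(columns="ID").values.ravel() for f, s in zip(files, subs)}
print(pd.DataFrame(preds).corr())

# Average the two least-correlated submissions into a new one.
combined = subs[0].copy()
combined.iloc[:, 1:] = (subs[0].iloc[:, 1:] + subs[2].iloc[:, 1:]) / 2
combined.to_csv("combined_submission.csv", index=False)
```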

What were the things that made the difference for you that you think others can learn from?

The ensembles and the TF-IDF features.

A few final thoughts from our winners.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

Olamide: Agriculture and Healthcare. An example in agriculture is the Nuru app, developed by the International Institute of Tropical Agriculture, which uses artificial intelligence and machine learning to diagnose crop pests and diseases in real time. In healthcare, AI can be used for better diagnostics and detection, and also to improve healthcare delivery.

Anshu: Health Care Intelligence and Smart Logistics.

Stephen: The biggest opportunity is in agriculture, poverty alleviation, and optimizing public transport.

What are you looking forward to most about the Zindi community?

Olamide: I look forward to seeing a vibrant community of data scientists, collaborating to use AI to solve problems particular to Africa.

Anshu: Zindi is a great community. It would be nice if Zindi could get more data scientists on board to have even more intense and healthy competitions. Zindi points and cash prizes are good engagement. I also liked the discussions, especially Pawel_Morawiecki, who has been helping us get started with starter code.

Stephen: To be like Kaggle.

What are your thoughts on our winners' feedback? Engage via the Discussions page or leave a comment on social media.