7 Jan 2021, 11:20

Meet the winners of the GIZ AI4D Africa Language Challenge

Join GIZ AI4D Africa Language Challenge winners Ari Ramkilowan (South Africa), Lawrence Adu-Gyamfi (Ghana) and Joyce Nakatumba-Nabende (Uganda) as we talk about their winning African language datasets for machine learning.

Hi Ari, please introduce yourself to the Zindi community.

My name is Ari Ramkilowan (Iam_Ari). I'm yet another physicist-turned-data-scientist, and by day I do NLP research at Praekelt.com. When I'm not coding, I like cooking and all things sport. What I think makes a difference in my work, and what others might learn from, is showing my work to the people around me, getting their feedback, and making sure that I don’t work in a silo.

Tell us about your solution for the GIZ AI4D Africa Language Challenge.

The problem with using existing tools to align corpora is that you first have to familiarise yourself with the tool, and then convince yourself that it will work for your corpora. Even then, you sometimes end up with misalignments, which are detrimental to any attempt at neural machine translation. It is also hard to notice a misalignment if you don't understand the languages you're aligning. So instead, I used the document structure to hand-craft rules for aligning the text. I also chose to exclude pieces of text that did not align well. These were saved separately so that we can still try to align them using other approaches.
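To make the idea of structure-based alignment a little more concrete, here is a minimal, hypothetical sketch. Ari's actual rules aren't described in the interview, so everything below is an assumption: both documents are taken to share numbered section markers (like "1." or "2.1"), and three illustrative rules decide whether a section is kept. The section must exist in both documents, both sides must have the same number of lines, and each line pair's character lengths must be within a ratio of each other. Sections that fail any rule are set aside rather than forced into pairs, echoing the "save separately" step above. The function names and the `max_ratio` parameter are made up for this example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical structural marker: a line starting with a section number such as "3." or "2.1"
MARKER = re.compile(r"^\s*(\d+(?:\.\d+)*)\s*[.)]?\s+(.*)$")


def split_by_marker(text: str) -> Dict[str, List[str]]:
    """Group non-empty lines under the most recent section marker.
    Lines that appear before the first marker are ignored in this sketch."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = MARKER.match(line)
        if m:
            current = m.group(1)
            sections[current] = [m.group(2)]
        elif current is not None:
            sections[current].append(line)
    return sections


def length_ok(a: str, b: str, max_ratio: float) -> bool:
    """Illustrative rule: the longer line may be at most max_ratio times the shorter one."""
    la, lb = len(a), len(b)
    return max(la, lb) <= max_ratio * max(1, min(la, lb))


def align(src: str, tgt: str, max_ratio: float = 2.0):
    """Return (aligned line pairs, unaligned sections kept aside for later work)."""
    src_secs, tgt_secs = split_by_marker(src), split_by_marker(tgt)
    aligned: List[Tuple[str, str]] = []
    unaligned: List[Tuple[str, str, str]] = []  # (section key, source text, target text)
    keys = list(src_secs) + [k for k in tgt_secs if k not in src_secs]
    for key in keys:
        s_lines = src_secs.get(key, [])
        t_lines = tgt_secs.get(key, [])
        # Rules 1 and 2: section present on both sides with matching line counts
        if s_lines and t_lines and len(s_lines) == len(t_lines):
            pairs = list(zip(s_lines, t_lines))
            # Rule 3: every line pair passes the length-ratio check
            if all(length_ok(a, b, max_ratio) for a, b in pairs):
                aligned.extend(pairs)
                continue
        # Anything that fails a rule is set aside instead of being force-aligned
        unaligned.append((key, " ".join(s_lines), " ".join(t_lines)))
    return aligned, unaligned


src = "1. The sun rises.\n2. Birds sing.\nThey fly.\n3. Rain falls."
tgt = "1. Le soleil se lève.\n2. Les oiseaux chantent.\n3. Il pleut beaucoup aujourd'hui, vraiment beaucoup."
aligned, unaligned = align(src, tgt)
print(aligned)                           # [('The sun rises.', 'Le soleil se lève.')]
print([key for key, _, _ in unaligned])  # ['2', '3']
```

In this toy example, section 2 is set aside because the line counts differ, and section 3 because the lengths are too far apart. Rules like these are crude, but they are easy to inspect, which matters when you can't read one of the languages yourself.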

Hi Lawrence, please introduce yourself to the Zindi community.

I am Lawrence Adu-Gyamfi (GhanaNLP). Currently I spend a lot of my time working on GhanaNLP, where we work on NLP tasks related to local languages in Ghana. I was introduced to machine learning and AI during my Master's degree at UAB in Barcelona, which focused on modelling for science and engineering with a specialisation in data science. My Bachelor's degree was in Aerospace Engineering from KNUST in Ghana, and my growth in data science and machine learning was largely due to an internship at the Barcelona Supercomputing Centre, where we researched the application of machine learning to turbulence modelling.

Tell us about your solution for the GIZ AI4D Africa Language Challenge.

GhanaNLP's approach to collecting the data we presented was part of a wider effort to generate data for building our own machine translation models. We built a form, shared publicly on several social media platforms, that asked people to submit English sentences along with their Twi translations.

What were the things that made the difference for you that you think others can learn from?

As part of GhanaNLP’s efforts, there is a team dedicated specifically to data collection, storage and preprocessing. As a result, a lot of time and resources have gone into developing this data collection form and cleaning the data that was collected. The cleaning was done by multiple people to make the translations as accurate as possible.

What do you like about competing on Zindi?

I like that the platform centralises so many machine learning and artificial intelligence competitions, with a strong focus on contributing to African development. There is also a good mix of competitions meant for learning and those that offer prizes!

Hi Joyce, please introduce yourself to the Zindi community.

I am Dr. Joyce Nakatumba-Nabende (MUK_Luganda), a lecturer in the Department of Computer Science at Makerere University and the head of the Makerere Artificial Intelligence and Data Science Research Lab. I hold a PhD in Computer Science from Eindhoven University of Technology in the Netherlands. Currently, I am leading a team of researchers applying machine learning, computer vision and natural language processing techniques to solve problems in the areas of agriculture and health in the developing world.

Tell us about your solution for the GIZ AI4D Africa Language Challenge.

Our solution was a high-quality open source Luganda dataset consisting of both text and speech data. The text data includes a monolingual corpus of Luganda sentences and a parallel corpus of Luganda-English sentence pairs, collected from various online sources and through a collaboration with the Department of African Languages at Makerere University. The speech data consists of Luganda sentences released under a public domain (CC-0) license, paired with speech recordings of those sentences. Our approach was to collaboratively collect Luganda resources from several online sources with the support of the Department of African Languages.

What were the things that made the difference for you that you think others can learn from?

We collaborated with the Department of African Languages as we collected and verified the Luganda resources. We also focused on the downstream tasks for the data (i.e. speech recognition, sentiment analysis, machine translation) and this enabled us to appropriately collect and curate the Luganda data.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

One of the biggest opportunities is the growing interest within the African machine learning community, which paves the way for further investment to harness these skills and build AI capacity. The unique needs and challenges of the African context, across domains such as health, agriculture, transport, languages, education and public services, create opportunities for innovative solutions to these challenges.

This challenge was hosted in partnership with GIZ and the FAIR Forward initiative, and the Artificial Intelligence for Development Africa (AI4D-Africa) Network.

What are your thoughts on our winners' feedback? Engage via the Discussion page or leave a comment on social media.