DataDrive2030 Early Learning Predictors Challenge
Can you identify which features of an early learning programme predict better learning outcomes for children?
$3 000 USD
South Africa
Good for beginners

Data from multiple programmes and projects that used the ELOM tools were collated, covering 2019 to 2022. The different data sources and collection methods are described in a PDF in the download section.

The train set contains 8 665 children and the test set contains 3 600.

In this competition, we aim to use machine learning techniques to identify the features of early learning programmes that contribute to better learning outcomes in children. The task is to predict each child’s ELOM score and to identify the top 15 predictors for each child.
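Selecting the top 15 predictors for one child can be sketched as follows. This is a minimal illustration, not the organisers' method: it assumes you already have a per-child contribution score for every feature (for example from a feature-attribution technique of your choice), and the function name `top_predictors` and the feature names are hypothetical.

```python
from heapq import nlargest


def top_predictors(contributions, k=15):
    """Return the k feature names with the largest absolute contribution.

    `contributions` maps feature name -> signed contribution for ONE child.
    How those contributions are computed is left open (assumption).
    """
    return nlargest(k, contributions, key=lambda name: abs(contributions[name]))


# Hypothetical contributions for a single child (illustrative values only).
child_contributions = {"feature_a": 0.8, "feature_b": -1.2, "feature_c": 0.1}

# Rank by magnitude, so a strongly negative feature still counts as a predictor.
print(top_predictors(child_contributions, k=2))  # ['feature_b', 'feature_a']
```

Ranking by absolute value is a design choice: a feature that strongly lowers the predicted score is treated as just as informative as one that raises it.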

The final merged dataset consisted of 12 265 children across 2 217 facilities. Table X below provides a summary of the data included in the meta-dataset. The first column indicates the data source, and the remainder of the columns show the different types of tools or data collected and the number of children we have data for across these sets of variables. An “X” indicates that the data was not collected at all.

How to use Colab on Zindi

How to mount a drive on Colab

Definitions of the variables in test and train.
An example of what your submission file should look like. The order of the rows does not matter, but the child_id values must be correct.
Train contains the target. This is the dataset that you will use to train your model.
Test resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
Information on how the data was collected.
This is a starter notebook to help you make your first submission. If the file opens oddly, press Ctrl+S to save it to your downloads folder.
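Because row order in the submission does not matter but the child_id values must be correct, a quick check before uploading can help. The sketch below uses only the Python standard library. The `child_id` column name comes from the description above, while the `target` column, the ID values, and the function name `check_child_ids` are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def check_child_ids(submission_file, expected_ids):
    """Compare the child_id column of a submission with the expected IDs.

    Returns a report of missing, unexpected, and duplicated IDs.
    Row order is ignored on purpose.
    """
    reader = csv.DictReader(submission_file)
    counts = Counter(row["child_id"] for row in reader)
    expected = set(expected_ids)
    return {
        "missing": sorted(expected - set(counts)),
        "unexpected": sorted(set(counts) - expected),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
    }


# Hypothetical submission content; in practice, open your own CSV file instead.
example = io.StringIO("child_id,target\nID_2,50.1\nID_1,42.0\n")
report = check_child_ids(example, ["ID_1", "ID_2"])
print(report)  # {'missing': [], 'unexpected': [], 'duplicates': []}
```

In practice the expected IDs would be read from the test file, and an empty report means every child in test appears exactly once in the submission.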