
Womxn in Big Data South Africa: Female-Headed Households in South Africa

Helping South Africa
$5 000 USD
Challenge completed over 5 years ago
Prediction
1161 joined
204 active
Start: Nov 25, 2019
Close: Feb 23, 2020
Reveal: Feb 24, 2020
About

South Africa is divided into over 4,000 wards. The target indicator and the other predictive variables come from the 2011 census and have been aggregated across all the households within each ward, giving one value of each indicator per ward.

The target variable of interest is the percentage of households per ward that are both female-headed and earn an annual income that is below R19,600 (approximately $2,300 USD in 2011). For context, the poverty line in South Africa is defined as R1,183 per month per person and the average individual salary in South Africa is R20,860 per month.
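To make the target definition concrete, here is a minimal sketch of how a ward-level percentage of this kind could be derived from household-level records. The household records, field names, and ward codes are made up for illustration; the competition data already contains the aggregated value, and the organisers' exact processing is not described on this page.

```python
from collections import defaultdict

# Annual household income threshold (in rand) taken from the description above.
INCOME_THRESHOLD_RAND = 19_600

# Hypothetical household-level records (illustrative only, not competition data).
households = [
    {"ward": "W1", "female_headed": True, "annual_income": 12_000},
    {"ward": "W1", "female_headed": True, "annual_income": 45_000},
    {"ward": "W1", "female_headed": False, "annual_income": 9_000},
    {"ward": "W1", "female_headed": True, "annual_income": 18_000},
    {"ward": "W2", "female_headed": False, "annual_income": 30_000},
    {"ward": "W2", "female_headed": True, "annual_income": 5_000},
    {"ward": "W2", "female_headed": True, "annual_income": 25_000},
    {"ward": "W2", "female_headed": False, "annual_income": 15_000},
]


def ward_target_percentages(records):
    """Percentage of households per ward that are female-headed AND below the threshold."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        totals[record["ward"]] += 1
        if record["female_headed"] and record["annual_income"] < INCOME_THRESHOLD_RAND:
            hits[record["ward"]] += 1
    return {ward: 100.0 * hits[ward] / totals[ward] for ward in totals}


targets = ward_target_percentages(households)
```

Both conditions must hold for a household to count: a male-headed household below the threshold and a female-headed household above it are both excluded from the numerator.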

The objective of this challenge is to accurately model the target indicator using the predictive variables provided in the datasets. For the purposes of this competition, we have split the wards into a train and test set. You will train your model on the 2,822 wards in the train set and apply your model to the 1,013 wards in the test set.
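Because the target is hidden for the test wards, one common way to estimate model quality is to hold back part of the train set. The sketch below is an illustration under stated assumptions, not the organisers' procedure: it uses randomly generated stand-in target values instead of the real data, a hold-out share of 25% (close to the 1,013 / 3,835 test share), a predict-the-mean baseline, and root mean squared error as an assumed error measure, since the scoring metric is not stated in this section.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def holdout_split(values, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of the values and split it into (fit part, hold-out part)."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Stand-in target percentages for the 2,822 train wards (random, NOT real data).
rng = random.Random(42)
train_targets = [rng.uniform(0, 60) for _ in range(2822)]

fit_part, holdout_part = holdout_split(train_targets)

# Baseline: predict the mean of the fit part for every held-out ward.
baseline = sum(fit_part) / len(fit_part)
baseline_rmse = rmse(holdout_part, [baseline] * len(holdout_part))
```

Any real model should beat this baseline on the held-out wards before it is worth applying to the test set.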

Your model can be enhanced using GIS data provided by HERE Technologies (www.here.com).

Note: If you are not comfortable using GIS data, you can still build a good model and make a submission without using any GIS data.

Note the following about the variables:

  • "lat" and "lon" are the locations of the CENTER POINTS of the wards.
  • "ADM4_PCODE" is the ward code used to find the ward polygon in the shapefile available here.
  • "NL" is the nightlights value for the center point and surrounding area. The values come from the Global Radiance-Calibrated Nighttime Lights database here.
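As one example of what can be done with "lat" and "lon" without any GIS tooling, the sketch below derives a distance-to-nearest-city feature using the haversine formula. This is a feature idea, not something the organisers prescribe: the city list and its approximate coordinates are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates (assumed values, for illustration only).
CITY_CENTRES = {
    "Johannesburg": (-26.2041, 28.0473),
    "Cape Town": (-33.9249, 18.4241),
    "Durban": (-29.8587, 31.0218),
}


def distance_to_nearest_city_km(lat, lon):
    """Distance from a ward center point to the closest city in CITY_CENTRES."""
    return min(
        haversine_km(lat, lon, city_lat, city_lon)
        for city_lat, city_lon in CITY_CENTRES.values()
    )
```

Applied to each ward's center point, this gives one extra numeric column that a model can use alongside the census variables and "NL".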

To further improve your model, you can use the tools and data available from HERE Technologies. You can create a free account here.

How to get started with HERE:

HERE APIs: her.is/mea

HERE has a selection of over 20 location-based APIs that can be accessed using the link above. Sign-in is free under the Freemium plan, and the following resources can be used to further enhance your understanding of the APIs:

HERE XYZ: Map visualization

HERE XYZ enables you to build your own maps for free and visualize them in a way that suits you. (Please log in to HERE XYZ using the same credentials you’ve used to access the APIs, Mapcreator, etc.)

HERE Mapcreator: An online mapping platform where you can edit the map (roads, places, etc.) and access global mapping.

HERE WeGo App: This offline navigation app lets users download maps of different countries and navigate for free, using no data or airtime.

Mapillary - For access to Street Level Imagery please follow this link.

Files
Train contains the target. This is the dataset that you will use to train your model.
Full list of variables and their explanations.
This notebook will help you make your first submission to the leaderboard.
Test resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
This shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘target’ column containing your predictions. The order of the rows does not matter, but the ID values must be correct.
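The submission format described above can be sketched as follows. This is a minimal illustration that builds the CSV text in memory and checks that the IDs match regardless of row order. The ward IDs and predictions are placeholders, and the helper names are hypothetical; only the ‘ID’ and ‘target’ column names come from the description.

```python
import csv
import io


def build_submission(test_ids, predictions):
    """Return CSV text with an 'ID' column and a 'target' column."""
    if len(test_ids) != len(predictions):
        raise ValueError("need exactly one prediction per test ID")
    if len(set(test_ids)) != len(test_ids):
        raise ValueError("test IDs must be unique")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID", "target"])
    for ward_id, prediction in zip(test_ids, predictions):
        writer.writerow([ward_id, prediction])
    return buffer.getvalue()


def ids_match(submission_text, test_ids):
    """Check that the submission has exactly the expected IDs, in any order."""
    rows = list(csv.DictReader(io.StringIO(submission_text)))
    return sorted(row["ID"] for row in rows) == sorted(test_ids)


# Placeholder IDs and predictions (hypothetical, not taken from Test.csv).
example_ids = ["WARD_A", "WARD_B", "WARD_C"]
example_predictions = [12.5, 30.0, 7.25]

submission_csv = build_submission(example_ids, example_predictions)
```

In practice the IDs would be read from Test.csv and the resulting text written to a file before uploading.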