Mobile Money and Financial Inclusion in Tanzania Challenge
Cash and prizes worth $2,250 USD
Predict who is more likely to use mobile money or other financial services
534 data scientists enrolled, 163 on the leaderboard
Financial Services, Prediction, Structured
Tanzania
26 March 2019—1 July 2019

You are allowed to use only the datasets that are provided here by Zindi and any features extracted from the contextual layers data accessed from the Africa GeoPortal described below.

Financial Inclusion Survey Data

The main dataset contains demographic information on approximately 10,000 individuals across Tanzania, along with the financial services each of them uses. This data was extracted from the FSDT Finscope 2017 survey and prepared specifically for this challenge. More about the Finscope survey here.

The data have been split between training and test sets. The test set contains all information about each individual except for what types of financial services he or she uses.

Your goal is to accurately classify each individual into four mutually exclusive categories:

  1. No_financial_services: Individuals who do not use mobile money, do not save, do not have credit, and do not have insurance
  2. Other_only: Individuals who do not use mobile money, but do use at least one of the other financial services (savings, credit, insurance)
  3. Mm_only: Individuals who use mobile money only
  4. Mm_plus: Individuals who use mobile money and also use at least one of the other financial services (savings, credit, insurance)
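The four definitions above amount to a small decision rule. The following is a minimal sketch of that rule, assuming each service is recorded as a 0/1 flag and that "credit" corresponds to the borrowing field; the function name `classify` and the example rows are hypothetical and not part of the provided data.

```python
def classify(mobile_money, savings, borrowing, insurance):
    """Return the category label for one individual.

    Each argument is assumed to be a 0/1 (or False/True) flag;
    the actual encoding in the survey files may differ.
    """
    # "Other" financial services are savings, credit (borrowing) and insurance.
    uses_other = bool(savings) or bool(borrowing) or bool(insurance)
    if mobile_money:
        return "Mm_plus" if uses_other else "Mm_only"
    return "Other_only" if uses_other else "No_financial_services"


# Hypothetical rows: (mobile_money, savings, borrowing, insurance)
examples = [
    (0, 0, 0, 0),
    (0, 1, 0, 0),
    (1, 0, 0, 0),
    (1, 0, 1, 1),
]
labels = [classify(*row) for row in examples]
print(labels)
```

Because every combination of the four flags maps to exactly one label, the categories are mutually exclusive and cover every individual.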

Financial Access Map

This dataset is the geospatial mapping of all cash outlets in Tanzania in 2012. Cash outlets in this case included commercial banks, community banks, ATMs, microfinance institutions, mobile money agents, bus stations and post offices. This data was collected by FSDT. More about this dataset here.

ArcGIS & Africa GeoPortal

To enrich and validate your models with location and spatial context, you have access to a number of additional capabilities via Esri’s ArcGIS Technology and The Africa GeoPortal.

  • To create additional features for individuals in the survey dataset, you can map the location of the survey respondents using the GPS coordinates provided per individual (note that these are not exact GPS coordinates of the respondent for privacy reasons, but in the approximate region).
  • You can overlay the respondents’ locations with the Access Map dataset provided and with other contextual layers available on The Africa GeoPortal. These include background imagery, basemaps, regional demographic data and many more.
  • You can evaluate and validate incoming and outgoing spatial data by using ArcGIS Online via Africa GeoPortal.
  • You can leverage the power of ArcGIS capabilities and Jupyter Notebooks.

Here is some useful information on how to achieve the above:

  • To access the ArcGIS Technology, you can create a free account at www.africageoportal.com. Further instructions can be found here.
  • CSVs with location tags in them, such as addresses or coordinates, can be quickly displayed via drag & drop. Instructions are here.
  • Other useful datasets covering Tanzania, such as imagery and population data, can also be accessed for free via Africa GeoPortal. See here: http://www.africageoportal.com/pages/africa-living-atlas
  • For data scientists, integration between ArcGIS and Jupyter Notebooks may be of great value. More info here, tools here and set up here.
  • If you want ArcGIS desktop products, you can download a trial here. Trials last 21 days, but Esri has agreed to extend your trial period for the purposes of this competition. Please send an email to zindi@zindi.africa.
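For those working outside ArcGIS, a location feature can also be built directly from the coordinates. The following is a rough sketch, assuming respondent and cash outlet locations are available as (latitude, longitude) pairs in decimal degrees; the function names and the sample coordinates are made up for illustration and do not come from the provided files. It uses the haversine great-circle formula with a mean Earth radius of 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_outlet_km(respondent, outlets):
    """Distance from one respondent to the closest outlet."""
    lat, lon = respondent
    return min(haversine_km(lat, lon, o_lat, o_lon) for o_lat, o_lon in outlets)


def outlets_within_km(respondent, outlets, radius_km):
    """Number of outlets within radius_km of the respondent."""
    lat, lon = respondent
    return sum(
        1 for o_lat, o_lon in outlets
        if haversine_km(lat, lon, o_lat, o_lon) <= radius_km
    )


# Made-up coordinates, for illustration only.
respondent = (-6.80, 39.28)
outlets = [(-6.82, 39.27), (-6.17, 35.74), (-3.37, 36.68)]

print(round(nearest_outlet_km(respondent, outlets), 2))
print(outlets_within_km(respondent, outlets, radius_km=10))
```

Since the respondents' coordinates are only approximate, distances computed this way are best treated as coarse features rather than exact measurements. The brute-force search compares every respondent with every outlet, so a spatial index may be worth considering for large outlet lists.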

The files for download are:

  • train.csv is the dataset that you will use to train your model. This dataset includes 7,094 randomly selected individuals.
  • test.csv is the dataset to which you will apply your model to test how well it performs. Use your model and this dataset to predict which of the four categories each person most likely belongs to (no financial services, mobile money only, other services only, or both mobile money and other financial services). The test set contains 2,365 individuals. This dataset includes the same fields as train.csv except for the last FIVE columns. Note that the target is mobile_money_classification, which is a composite of mobile_money, savings, borrowing, and insurance.
  • sample_submission.csv is an example of what your submission file should look like. Note that this is a table of probabilities across the four categories (no financial services, mobile money only, other services only, or both mobile money and other financial services).
  • Variable Codebook provides definitions of the variables found in test.csv and train.csv.
  • FSDT_FinAccessMapping.zip provides GPS coordinates of all the "cash outlets" in Tanzania (in 2013), i.e. commercial banks, community banks, ATMs, microfinance institutions, mobile money agents, bus stations and post offices.
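As an illustration of the submission format described for sample_submission.csv, here is a minimal sketch that writes one row of four probabilities per individual. The column names, their order, and the `ID` header are assumptions; check them against the real sample_submission.csv before submitting. The helper names `normalise` and `write_submission` are hypothetical, and model scores are assumed to be non-negative.

```python
import csv
import io

# Assumed column names and order; verify against sample_submission.csv.
CATEGORIES = ["No_financial_services", "Other_only", "Mm_only", "Mm_plus"]


def normalise(scores):
    """Scale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = sum(scores)
    if total <= 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def write_submission(rows, handle):
    """Write (row_id, scores) pairs as a table of class probabilities."""
    writer = csv.writer(handle)
    writer.writerow(["ID"] + CATEGORIES)
    for row_id, scores in rows:
        writer.writerow([row_id] + [f"{p:.4f}" for p in normalise(scores)])


# Hypothetical model scores for two individuals.
buffer = io.StringIO()
write_submission([("id_1", [2, 1, 1, 0]), ("id_2", [0, 0, 0, 0])], buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("submission.csv", "w", newline="")` as `handle`.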