UNICEF Arm 2030 Vision #1: Flood Prediction in Malawi
$10,000 USD
Predict flood extent caused by storms in southern Malawi
1606 data scientists enrolled, 480 on the leaderboard
2 December 2019—17 May 2020

Southern Malawi experienced major flooding in 2015 and again in 2019, when Cyclone Idai struck. The approximate dates of impact are 13 January 2015 and 14 March 2019, respectively.

We have divided the map of southern Malawi into rectangles of approximately 1 sq km. Each rectangle has a unique ID and has been assigned a "target" value, which is the fraction of that rectangle (between 0 and 1) that was flooded in 2015.

For this competition, the training data is the flood extent in 2015 in southern Malawi. However, you are encouraged to source flood data for other nearby regions and other historic floods to train your model. (Just be sure to propose any new datasets that are not listed here to Zindi at zindi@zindi.africa for approval.)

The test data, used to measure the accuracy of your model, is the flood extent in southern Malawi in 2019.

Each unique rectangle also has some additional features that we have already extracted for you. Although we encourage you to add more yourself, these features are included as a starting point. They are:

  • Elevation. Mean elevation over the rectangle, based on this dataset in Google Earth Engine.
  • Dominant Land Cover Type. Most areas are predominantly grasslands, savannah or cropland. You can find the full list of land cover types here in the ‘LC_Type1 Class Table’.
  • Weekly Precipitation. Historical rainfall data for each rectangle, covering 18 weeks beginning 2 months before the flooding. Rainfall estimates come from this dataset in Google Earth Engine.
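
As one possible way to turn the 18 weekly rainfall values into model features, the sketch below summarises the series for a single rectangle. The function name, the choice of summaries and the example values are all illustrative assumptions; they are not part of the provided data.

```python
def summarise_rainfall(weekly_rain):
    """Summarise one rectangle's weekly rainfall series (hypothetical helper)."""
    if not weekly_rain:
        raise ValueError("expected at least one weekly value")
    total = sum(weekly_rain)
    peak = max(weekly_rain)
    return {
        "total": total,                          # rainfall over the whole window
        "peak": peak,                            # wettest single week
        "peak_week": weekly_rain.index(peak),    # 0-based position of that week
        "mean": total / len(weekly_rain),
    }


# Made-up values for 18 weeks, in whatever units the source dataset uses.
example = [0.0, 2.5, 10.0, 4.5, 30.0, 12.0] + [1.0] * 12
features = summarise_rainfall(example)
```

Summaries like these are only a starting point; whether they help depends on the model and on the other features used alongside them.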

Train.csv has the target variable for 2015, along with the above features (including rainfall for both the 2015 and 2019 flood events). The submission file should have the predicted target for 2019 and cover the same locations as Train. The X, Y coordinates given represent a rectangle 0.01 degrees on each side, centered on that X-Y location.
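
To make the geometry concrete, the following sketch derives the bounding box of a rectangle from its centre coordinates and estimates its size in kilometres. The function names, the example coordinates and the figure of roughly 111.32 km per degree of latitude are assumptions made for illustration, and X is assumed to be longitude and Y latitude.

```python
import math

KM_PER_DEGREE = 111.32  # rough length of one degree of latitude (approximation)


def cell_bounds(x, y, size_deg=0.01):
    """Return (min_x, min_y, max_x, max_y) for a cell centred on (x, y)."""
    half = size_deg / 2.0
    return (x - half, y - half, x + half, y + half)


def cell_size_km(y, size_deg=0.01):
    """Approximate (east-west, north-south) extent in km at latitude y."""
    north_south = size_deg * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(y))
    return (east_west, north_south)


# Illustrative centre point only; not taken from Train.csv.
bounds = cell_bounds(35.0, -16.0)
size = cell_size_km(-16.0)
```

At this latitude a 0.01 degree cell comes out at a little over 1 km on each side, which is consistent with the "approximately 1 km" description of the rectangles.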

The target is the fraction of the given rectangle that was flooded, expressed as a value between 0 and 1.
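
Because the target is bounded, a regression model can produce predictions slightly outside the valid range. A minimal sketch of clamping predictions into [0, 1] follows; the helper name and the example predictions are hypothetical.

```python
def clip_fraction(value):
    """Clamp a predicted flooded fraction into the valid range [0, 1]."""
    return min(1.0, max(0.0, value))


raw_predictions = [-0.03, 0.0, 0.42, 1.2]  # made-up model outputs
clipped = [clip_fraction(p) for p in raw_predictions]
```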

In addition to the features we have provided in the train and test CSV, you are free to extract additional datasets and features from the sites listed under "Additional datasets" below.

Think about features such as land cover, elevation and slope, soil properties etc. that will affect how water moves in the environment. You may also use data on weather and rainfall leading up to and during the flooding.

Note that you cannot use images to detect the actual flood extent in the test data. In other words, this is not a computer vision challenge for identifying actual flooding. Any solutions that use models to detect actual flood extent from actual flood images in southern Malawi in 2019 will be disqualified. However, you may use imagery from before the flood events (imagery must be from at least one month before the flooding) to extract features you think might be useful to your model.

Finally, you can also propose other publicly available datasets or data sources to us. We will review your proposals and add the approved ones to the official list of accepted datasets.

The files for download are:

  • Train.csv - has the target variable and other features for each rectangle for the flood in 2015. You will use this data to train your model.
  • Test.csv - there is no separate test file, as you will be using the same coordinates as the Train file.
  • SampleSubmission.csv - is an example of what your submission file should look like. The order of the rows does not matter, but the Square_ID values must be correct. (A "Square" here is one of the rectangles.)
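
Before submitting, it may help to check a submission against the IDs in Train.csv. The sketch below is one way to do that with the standard library. Only the Square_ID column name comes from the description above; the target column name used here ("target") and the helper name are assumptions, so check the header of SampleSubmission.csv for the real column name.

```python
import csv
import io


def check_submission(rows, expected_ids, id_col="Square_ID", target_col="target"):
    """Return a list of problems found in a submission (empty if none)."""
    expected = set(expected_ids)
    problems = []
    seen = set()
    for row in rows:
        square_id = row[id_col]
        if square_id in seen:
            problems.append(f"duplicate ID: {square_id}")
        seen.add(square_id)
        value = float(row[target_col])
        if not 0.0 <= value <= 1.0:
            problems.append(f"value out of range for {square_id}: {value}")
    for square_id in sorted(expected - seen):
        problems.append(f"missing ID: {square_id}")
    for square_id in sorted(seen - expected):
        problems.append(f"unexpected ID: {square_id}")
    return problems


# Tiny in-memory example with made-up IDs; row order does not matter.
sample = "Square_ID,target\ndef,0.0\nabc,0.25\n"
rows = list(csv.DictReader(io.StringIO(sample)))
problems = check_submission(rows, expected_ids={"abc", "def"})
```

In practice, `rows` would come from reading your own submission file with `csv.DictReader`, and `expected_ids` from the Square_ID column of Train.csv.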

To propose additional datasets, email zindi@zindi.africa. New datasets will not be accepted after 8 May 2020.

Additional datasets

Please note that you cannot use data from during or after the 2019 cyclone. If you are using rainfall data, you may use it for the 18 weeks beginning 2 months before the 2019 cyclone.
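
When pulling rainfall from another source, the window can be computed from the impact dates given earlier. The sketch below lists the start date of each of the 18 weeks. It assumes that "2 months" means two calendar months and that weeks are consecutive 7-day steps from that start date; the description does not specify either, and the helper names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def shift_months(d, months):
    """Shift a date by whole calendar months, clamping the day if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def rainfall_weeks(impact, months_before=2, n_weeks=18):
    """Return the start date of each weekly bin in the rainfall window."""
    start = shift_months(impact, -months_before)
    return [start + timedelta(weeks=i) for i in range(n_weeks)]


weeks_2015 = rainfall_weeks(date(2015, 1, 13))  # approximate 2015 impact date
weeks_2019 = rainfall_weeks(date(2019, 3, 14))  # approximate 2019 impact date
```

Under these assumptions the 2019 window starts on 14 January 2019 and the 2015 window on 13 November 2014. If the weekly bins in Train.csv are aligned differently, adjust the start date to match.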

Landsat imagery

Historic rainfall and temperature data

Malawi geospatial data

Soil data

Landcover data

Other data sites

Please document all datasets used.