UmojaHack #3: Hotspots (BEGINNER) by UmojaHack Africa
$400 USD
Predicting fires in the DRC
624 data scientists enrolled, 156 on the leaderboard
Conservation, Agriculture, Prediction, Structured
Africa
21 March 09:00 (8 hours)

We’ve split up the DRC into ~3800 equal areas, centered around the locations provided in the lat/lon columns. The target variable, burn_area, is the percentage of the area that has been burned in a given month. Due to the way it’s measured, there may be some overlap of burned areas for two successive months, and so total burned area over a time period isn’t necessarily equal to the sum of the ‘burn_area’ figures for all months. You are not permitted to use external data in this competition.

You do NOT need to use GIS data to solve this challenge.

To make the data easier for all participants to access, we have provided download links. We recommend you download the data before the challenge. The data is password protected, and we will share the password with all universities as well as on the livestream when the competition opens.

Folder codes will be shared on the day at 09:00 GMT on the University rep WhatsApp groups.

Data for download

  • Train.csv - is the dataset that you will use to train your model. This provides the details for 3821 area squares across the DRC for each month from 1 April 2000 to 1 December 2013.
  • Test.csv - is the dataset to which you will apply your model. This dataset contains the same variables as the train data except there is no target (‘burn_area’). This is what you are predicting. The test set covers 3821 area squares across the DRC for each month from 1 January 2014 to 31 December 2016.
  • SampleSubmission.csv - is an example of what your submission file should look like. The order of the rows does not matter, but the IDs must be correct. The IDs take the form of [area ID]_yyyy-mm-dd. There are 3821 area squares, each with a unique ID ranging from 0 to 3820.
  • VariableDescription.csv - Descriptions of the variables in the train and test set. You can find the sources of the variables in the links below.
  • StarterNotebook.ipynb - this starter notebook will help you make your first submission onto the leaderboard. Here is a link to the Google Colab Notebook.
  • StarterNotebook.R - this starter notebook will help you make your first submission onto the leaderboard.
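
The submission ID format described above, [area ID]_yyyy-mm-dd, can be sketched in a few lines of Python. This is only an illustration: the helper name make_submission_ids is hypothetical, and the sketch assumes each month is stamped with its first day (as the train dates "1 April 2000" and "1 December 2013" suggest). Check the IDs in SampleSubmission.csv before relying on it.

```python
from datetime import date


def make_submission_ids(n_areas, start, end):
    """Build IDs of the form '<area ID>_yyyy-mm-dd', one per area per month.

    Assumption: each month is represented by its first day. Verify this
    against SampleSubmission.csv.
    """
    ids = []
    year, month = start.year, start.month
    # Walk month by month from the start month to the end month, inclusive.
    while (year, month) <= (end.year, end.month):
        stamp = date(year, month, 1).isoformat()  # e.g. '2014-01-01'
        ids.extend(f"{area}_{stamp}" for area in range(n_areas))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ids


# 3821 areas (IDs 0 to 3820) over the test period, January 2014 to December 2016.
ids = make_submission_ids(3821, date(2014, 1, 1), date(2016, 12, 31))
print(len(ids), ids[0], ids[-1])
```

Comparing the set of IDs built this way with the ID column of SampleSubmission.csv is a quick check that a submission file has neither missing nor extra rows.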

The additional data included in the test and train files is as follows: