The objective of this challenge is to build an epidemiological model that predicts the spread of COVID-19 throughout the world. The target variable is the cumulative number of deaths caused by COVID-19 in each country by each date.
We have selected the cumulative number of fatalities rather than the number of reported infections as the target variable because the real number of infections is unknown and will perhaps never be known. The number of reported cases is understood to underestimate the true figure and to be heavily biased by the availability of tests, which varies from location to location and country to country.
We encourage participants to engage with the literature available on approaches and considerations when modelling the spread of diseases.
For this competition, we have used the publicly available data from the Coronavirus COVID-19 Global Cases dataset by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU), which is updated daily.
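As a rough illustration of how the target variable can be derived from this data, the sketch below reshapes a JHU CSSE style deaths time series into cumulative deaths per country per date. It assumes the wide layout used by the JHU CSSE global time-series files (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date in `m/d/yy` form). The function name and the sample rows are hypothetical and are not real figures.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Columns that describe the location rather than a date (assumed layout).
META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_cumulative_deaths(csv_text):
    """Return {(country, date): cumulative deaths}, summing over provinces."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = defaultdict(int)
    for row in reader:
        country = row["Country/Region"]
        for column, value in row.items():
            if column in META_COLUMNS or not value:
                continue
            day = datetime.strptime(column, "%m/%d/%y").date()
            totals[(country, day)] += int(value)
    return dict(totals)


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20
,Kenya,-0.02,37.9,0,1
Hubei,China,30.97,112.27,10,12
Beijing,China,40.18,116.41,1,1
"""

deaths = country_cumulative_deaths(SAMPLE)
```

Countries that are reported by province are summed into a single national total here, since the target is defined per country.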
For your reference:
For this competition, you can suggest other publicly available datasets to use in your model. Please post them on the discussion forum for approval. We will update this page to include new datasets as they are suggested. Because you are predicting the future, virtually any dataset will be allowed as long as everyone has equal access to it.
This challenge asks you to build a model that genuinely forecasts the future. Recognising that all of the data is publicly available and grows every day, we have structured this challenge a bit differently from other Zindi challenges:
The Public Leaderboard will be updated once a week with the most recent seven days of actual data, and all scores will be recalculated. While the competition is open, the Public Leaderboard will rank submissions by the accuracy score they achieve on those most recent seven days only. Once submissions close, the most recent Public Leaderboard will remain visible until the final close of the competition. At the final close, the Private Leaderboard will be revealed; it scores submissions only on data from the period between the close of submissions and the close of the competition. This will be the final ranking for the competition.
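To make the seven-day scoring window concrete, here is a minimal sketch of scoring predictions against only the most recent seven days of actual data. The text above does not name the accuracy metric, so mean absolute error is used purely as a stand-in assumption. The function name, the data layout (dictionaries keyed by country and date), and the numbers are all hypothetical.

```python
from datetime import date, timedelta


def seven_day_mae(predictions, actuals, last_day):
    """Mean absolute error over the seven days ending on last_day.

    predictions, actuals: {(country, date): cumulative deaths}.
    MAE is an assumed metric; the competition's own metric may differ.
    """
    window_start = last_day - timedelta(days=6)
    errors = [
        abs(predictions[key] - actual)
        for key, actual in actuals.items()
        if window_start <= key[1] <= last_day
    ]
    if not errors:
        raise ValueError("no actual data falls inside the scoring window")
    return sum(errors) / len(errors)


# Made-up example: ten days of data, of which only the last seven are scored.
start = date(2020, 4, 1)
actuals = {("Kenya", start + timedelta(days=i)): 10 + i for i in range(10)}
predictions = {
    # Off by 100 in the first three days (ignored), off by 2 afterwards.
    key: value + (100 if key[1] < date(2020, 4, 4) else 2)
    for key, value in actuals.items()
}

score = seven_day_mae(predictions, actuals, last_day=date(2020, 4, 10))
```

Errors on dates before the window do not affect the score, which mirrors the point that each weekly ranking reflects only the newest seven days.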
Files available for download
Other learning resources from the community:
Two relevant learning opportunities from the Johns Hopkins Bloomberg School of Public Health.
Additional datasets from the community:
Population density mapping from Facebook: