
Sustainable Development Goals (SDGs): Text Classification Challenge

Helping Africa
$1 000 USD
Challenge completed almost 7 years ago
Natural Language Processing
Classification
412 joined
50 active
Start: Sep 05, 18
Close: Nov 12, 18
Reveal: Nov 13, 18
About

The data has been split into a test and training set.

Notes on the data: The text contains some HTML mark-up. Dealing with this “dirtiness” of the text is part of the challenge. All the text is in English. The training data was manually classified by Devex analysts and subject-matter experts.
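One possible way to handle the HTML mark-up is to strip tags and decode entities before any tokenisation. The sketch below uses only Python's standard library; the helper name `strip_html` and the sample string are hypothetical, and the actual mark-up in the dataset may need extra handling.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; automatically.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return `raw` with tags removed, entities decoded, whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flushes any text still buffered by the parser
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample, not taken from the dataset.
cleaned = strip_html("<p>Malaria &amp; TB <b>programme</b></p>")
```

Joining text nodes with a space keeps words in adjacent tags apart, at the cost of splitting a word that is interrupted by an inline tag. Whether that trade-off suits this data is worth checking on real rows.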

Variables in devex_train.csv

  • ID: Unique ID of text to be classified
  • Type: The type or source of the text. (Contract=Names of contracts, News=Titles of news articles, Organization=Text about an organization, Open Opp=Description of an opportunity)
  • Text: text to be classified
  • Label_1 through Label_12: These columns are populated starting at Label_1 and filled incrementally until all relevant classifications are listed, to a maximum of 12 labels. The remaining label columns are left blank.
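Because the labels are spread across up to twelve columns, it can help to gather them into one list per row. The following is a minimal sketch using the standard `csv` module. It assumes the CSV header names match the variable names listed above exactly, and the two sample rows are invented for illustration.

```python
import csv
import io

# Column names assumed to match the variable list: Label_1 ... Label_12.
LABEL_COLUMNS = [f"Label_{i}" for i in range(1, 13)]


def row_labels(row: dict) -> list:
    """Collect the non-blank label values of one CSV row, in column order."""
    labels = []
    for col in LABEL_COLUMNS:
        value = (row.get(col) or "").strip()
        if value:
            labels.append(value)
    return labels


# Invented sample with only three label columns, to keep the example short.
sample = io.StringIO(
    "ID,Type,Text,Label_1,Label_2,Label_3\n"
    "a1,News,Example title,3.3.1,3.b.1,\n"
    "a2,Contract,Another example,,,\n"
)
rows = list(csv.DictReader(sample))
labels_by_id = {r["ID"]: row_labels(r) for r in rows}
```

To read the real training file, replace `sample` with an opened file handle for `devex_train.csv`. Missing label columns are treated as blank, so the helper also works on rows without any classification.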

The 27 possible SDG 3 indicators are listed below. In the dataset, only the indicator's code is used, e.g. "3.1.1":

  1. 3.1.1 - Maternal mortality ratio
  2. 3.1.2 - Proportion of births attended by skilled health personnel
  3. 3.2.1 - Under-5 mortality rate
  4. 3.2.2 - Neonatal mortality rate
  5. 3.3.1 - Number of new HIV infections per 1 000 uninfected population, by sex, age and key populations
  6. 3.3.2 - Tuberculosis incidence per 100 000 population
  7. 3.3.3 - Malaria incidence per 1 000 population
  8. 3.3.4 - Hepatitis B incidence per 100 000 population
  9. 3.3.5 - Number of people requiring interventions against neglected tropical diseases
  10. 3.4.1 - Mortality rate attributed to cardiovascular disease, cancer, diabetes or chronic respiratory disease
  11. 3.4.2 - Suicide mortality rate
  12. 3.5.1 - Coverage of treatment interventions (pharmacological, psychosocial and rehabilitation and aftercare services) for substance use disorders
  13. 3.5.2 - Harmful use of alcohol, defined according to the national context as alcohol per capita consumption (aged 15 years and older) within a calendar year in litres of pure alcohol
  14. 3.6.1 - Death rate due to road traffic injuries
  15. 3.7.1 - Proportion of women of reproductive age (aged 15–49 years) who have their need for family planning satisfied with modern methods
  16. 3.7.2 - Adolescent birth rate (aged 10–14 years; aged 15–19 years) per 1 000 women in that age group
  17. 3.8.1 - Coverage of essential health services (defined as the average coverage of essential services based on tracer interventions that include reproductive, maternal, newborn and child health, infectious diseases, non-communicable diseases and service capacity and access, among the general and the most disadvantaged population)
  18. 3.8.2 - Proportion of population with large household expenditures on health as a share of total household expenditure or income
  19. 3.9.1 - Mortality rate attributed to household and ambient air pollution
  20. 3.9.2 - Mortality rate attributed to unsafe water, unsafe sanitation and lack of hygiene (exposure to unsafe Water, Sanitation and Hygiene for All (WASH) services)
  21. 3.9.3 - Mortality rate attributed to unintentional poisoning
  22. 3.a.1 - Age-standardized prevalence of current tobacco use among persons aged 15 years and older
  23. 3.b.1 - Proportion of the target population covered by all vaccines included in their national programme
  24. 3.b.2 - Total net official development assistance to medical research and basic health sector
  25. 3.b.3 - Proportion of health facilities that have a core set of relevant essential medicines available and affordable on a sustainable basis
  26. 3.c.1 - Health worker density and distribution
  27. 3.d.1 - International Health Regulations (IHR) capacity and health emergency preparedness
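Since a text can carry several of these codes at once, the task is multi-label. One common way to represent the target is a 27-element 0/1 vector per text. The sketch below shows that encoding under the assumption that the codes appear in the data exactly as written in the list above; the names `INDICATOR_CODES` and `to_multi_hot` are hypothetical.

```python
# The 27 indicator codes, in the order listed above.
INDICATOR_CODES = [
    "3.1.1", "3.1.2", "3.2.1", "3.2.2", "3.3.1", "3.3.2", "3.3.3", "3.3.4", "3.3.5",
    "3.4.1", "3.4.2", "3.5.1", "3.5.2", "3.6.1", "3.7.1", "3.7.2", "3.8.1", "3.8.2",
    "3.9.1", "3.9.2", "3.9.3", "3.a.1", "3.b.1", "3.b.2", "3.b.3", "3.c.1", "3.d.1",
]
CODE_INDEX = {code: i for i, code in enumerate(INDICATOR_CODES)}


def to_multi_hot(labels):
    """Encode a list of indicator codes as a 0/1 vector of length 27."""
    vector = [0] * len(INDICATOR_CODES)
    for label in labels:
        # Raises KeyError for a code outside the list, which surfaces dirty labels early.
        vector[CODE_INDEX[label]] = 1
    return vector


example_vector = to_multi_hot(["3.1.1", "3.d.1"])
```

Failing loudly on an unknown code is a deliberate choice here; a more lenient variant could skip or log such values instead.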
Files
Test resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.
The sample submission file is an example of what your submission file should look like. The order of the rows does not matter, but the values in the "ID" column must be correct.
Train contains the target. This is the dataset that you will use to train your model.