Sustainable Development Goals (SDGs): Text Classification Challenge 🌍

Helping Africa

$1 000 USD

Completed (over 7 years ago)

Skills you will learn

Natural Language Processing

Classification

414 joined

50 active

Start

Sep 05, 18

Nov 12, 18

Reveal

Nov 13, 18

About

The data has been split into a test and training set.

Notes on the data: The text contains some HTML mark-up. Dealing with this “dirtiness” of the text is part of the challenge. All the text is in English. The training data was manually classified by Devex analysts and subject-matter experts.
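One way to handle the HTML mark-up mentioned above is to keep only the text nodes before feeding the text to a model. The following is a minimal sketch using Python's standard-library HTML parser; the function name `strip_html` is a hypothetical helper, not part of the challenge materials, and the choice to join fragments with a space is an assumption that may not suit every text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, discarding tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Assumption: a space between fragments is preferable to gluing
        # adjacent blocks together; this can split a word if a tag sits
        # in the middle of it.
        return " ".join(self._chunks)


def strip_html(raw: str) -> str:
    """Return `raw` without HTML tags, with whitespace runs collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.text().split())
```

Note that this sketch keeps the contents of any `<script>` or `<style>` elements as text; whether such elements occur in the data would need to be checked.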

Variables in devex_train.csv

ID: Unique ID of text to be classified
Type: The type or source of the text. (Contract=Names of contracts, News=Titles of news articles, Organization=Text about an organization, Open Opp=Description of an opportunity)
Text: text to be classified
Label_1 through Label_12: These columns are populated starting at Label_1 and increasing incrementally until all relevant classifications are populated, to a maximum of 12 Labels. The remaining Labels are left blank.
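Because each text can carry up to 12 labels, the task is multi-label, and the Label_1 to Label_12 columns are often easier to work with as a single 0/1 vector per row. The sketch below shows one possible encoding over the 27 indicator codes listed in this section. It assumes the label cells contain exactly those code strings; the names `row_to_multi_hot` and `load_targets` are hypothetical helpers, and the UTF-8 encoding of the file is also an assumption.

```python
import csv

# The 27 SDG 3 indicator codes, in the order given in the data description.
SDG3_CODES = [
    "3.1.1", "3.1.2", "3.2.1", "3.2.2",
    "3.3.1", "3.3.2", "3.3.3", "3.3.4", "3.3.5",
    "3.4.1", "3.4.2", "3.5.1", "3.5.2", "3.6.1",
    "3.7.1", "3.7.2", "3.8.1", "3.8.2",
    "3.9.1", "3.9.2", "3.9.3",
    "3.a.1", "3.b.1", "3.b.2", "3.b.3", "3.c.1", "3.d.1",
]
CODE_INDEX = {code: i for i, code in enumerate(SDG3_CODES)}
LABEL_COLUMNS = [f"Label_{i}" for i in range(1, 13)]


def row_to_multi_hot(row):
    """Turn one CSV row (a dict) into a 27-element list of 0/1 flags."""
    vector = [0] * len(SDG3_CODES)
    for column in LABEL_COLUMNS:
        value = (row.get(column) or "").strip()
        if not value:
            continue  # blank cell: no further label in this column
        # Assumption: every non-blank cell is one of the 27 codes;
        # anything else raises KeyError so bad data is noticed early.
        vector[CODE_INDEX[value]] = 1
    return vector


def load_targets(path):
    """Read a training CSV and return {ID: multi-hot vector}."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["ID"]: row_to_multi_hot(row) for row in csv.DictReader(handle)}
```

For example, `load_targets("devex_train.csv")` would return one vector per training row, which can then be passed to any multi-label classifier.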

The 27 possible SDG 3 indicators are listed below. In the dataset, only the indicator's code is used, e.g. "3.1.1".

3.1.1 - Maternal mortality ratio
3.1.2 - Proportion of births attended by skilled health personnel
3.2.1 - Under-5 mortality rate
3.2.2 - Neonatal mortality rate
3.3.1 - Number of new HIV infections per 1 000 uninfected population, by sex, age and key populations
3.3.2 - Tuberculosis incidence per 100 000 population
3.3.3 - Malaria incidence per 1 000 population
3.3.4 - Hepatitis B incidence per 100 000 population
3.3.5 - Number of people requiring interventions against neglected tropical diseases
3.4.1 - Mortality rate attributed to cardiovascular disease, cancer, diabetes or chronic respiratory disease
3.4.2 - Suicide mortality rate
3.5.1 - Coverage of treatment interventions (pharmacological, psychosocial and rehabilitation and aftercare services) for substance use disorders
3.5.2 - Harmful use of alcohol, defined according to the national context as alcohol per capita consumption (aged 15 years and older) within a calendar year in litres of pure alcohol
3.6.1 - Death rate due to road traffic injuries
3.7.1 - Proportion of women of reproductive age (aged 15–49 years) who have their need for family planning satisfied with modern methods
3.7.2 - Adolescent birth rate (aged 10–14 years; aged 15–19 years) per 1 000 women in that age group
3.8.1 - Coverage of essential health services (defined as the average coverage of essential services based on tracer interventions that include reproductive, maternal, newborn and child health, infectious diseases, non-communicable diseases and service capacity and access, among the general and the most disadvantaged population)
3.8.2 - Proportion of population with large household expenditures on health as a share of total household expenditure or income
3.9.1 - Mortality rate attributed to household and ambient air pollution
3.9.2 - Mortality rate attributed to unsafe water, unsafe sanitation and lack of hygiene (exposure to unsafe Water, Sanitation and Hygiene for All (WASH) services)
3.9.3 - Mortality rate attributed to unintentional poisoning
3.a.1 - Age-standardized prevalence of current tobacco use among persons aged 15 years and older
3.b.1 - Proportion of the target population covered by all vaccines included in their national programme
3.b.2 - Total net official development assistance to medical research and basic health sector
3.b.3 - Proportion of health facilities that have a core set of relevant essential medicines available and affordable on a sustainable basis
3.c.1 - Health worker density and distribution
3.d.1 - International Health Regulations (IHR) capacity and health emergency preparedness

Files

Test resembles Train.csv but without the target-related columns. This is the dataset to which you will apply your model.

The sample submission is an example of what your submission file should look like. The order of the rows does not matter, but the "ID" values must be correct.

Train contains the target. This is the dataset that you will use to train your model.
