SUA Outsmarting Outbreaks Challenge

Helping Tanzania, United Republic of
$12 500 USD + AWS credits
Completed (~1 year ago)
Prediction
815 joined
395 active
Start: Dec 06, 24
Close: Jan 31, 25
Reveal: Feb 01, 25
AI_Maven
University of Benin
Column Clarification
Help · 18 Jan 2025, 19:01 · 5

I'm confused. Why does the 'Category_Health_Facility_UUID' column have 4 unique categories in the train data but 107 in the test data?

Discussion 5 answers

Interesting observation. One experiment to conduct is to drop the column and see how the model performs without it.

18 Jan 2025, 19:14
Upvotes 5

Because the last section of the values in one of the sets is incremental for some reason. Split by '-', get rid of the last section, then join back, and you'll get the same 4 categories as in the train set.
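A minimal sketch of that suggestion, in plain Python. The sample values are made up for illustration (the real UUIDs are not shown in this thread), and `strip_last_segment` is a hypothetical helper name.

```python
def strip_last_segment(value: str, sep: str = "-") -> str:
    """Drop the final sep-delimited section of an identifier.

    Equivalent to splitting on sep, discarding the last piece and
    joining the rest back. A value with no separator is returned as-is.
    """
    return value.rsplit(sep, 1)[0]


# Hypothetical values, only to illustrate the shape of the problem:
# the trailing section increments, inflating the unique count.
sample_values = ["abc-123-0001", "abc-123-0002", "def-456-0001"]

collapsed = sorted({strip_last_segment(v) for v in sample_values})
print(len(set(sample_values)), "->", len(collapsed))  # 3 -> 2
```

If the data is in a pandas DataFrame, the same helper could be applied with something like `df["Category_Health_Facility_UUID"].map(strip_last_segment)`, assuming the column holds strings.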

18 Jan 2025, 19:27
Upvotes 4
MICADEE
LAHASCOM

Valid!

AI_Maven
University of Benin

Yeah, thanks. You were right.

The dataset is just confusing to me. Some rows share identical values in every one of these columns: ID, Location, Disease, Category_Health_Facility_UUID, latitude, longitude, month, and year.
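One way to check how widespread this is would be to count how often each combination of those columns occurs. This is a rough sketch in plain Python. The rows are invented placeholders, `find_repeated_keys` is a hypothetical helper, and the column names are taken from the list above.

```python
from collections import Counter

KEY_COLUMNS = [
    "ID", "Location", "Disease", "Category_Health_Facility_UUID",
    "latitude", "longitude", "month", "year",
]


def find_repeated_keys(rows, key_columns=KEY_COLUMNS):
    """Return {key_tuple: count} for key combinations seen more than once."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Placeholder rows (not real competition data): the first two share a key.
base = {
    "ID": "x1", "Location": "loc_a", "Disease": "d1",
    "Category_Health_Facility_UUID": "abc-123",
    "latitude": -6.8, "longitude": 37.6, "month": 1, "year": 2020,
}
rows = [dict(base), dict(base), {**base, "ID": "x2"}]

repeated = find_repeated_keys(rows)
print(len(repeated))  # 1 key combination appears more than once
```

With pandas, `df.duplicated(subset=KEY_COLUMNS, keep=False)` should flag the same rows. Whether repeats should be dropped or aggregated depends on what the target column looks like for them, which this thread does not settle.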

CodeJoe

Great observation, @da_. "Category" could indicate a classification or grouping, possibly the type of health facility (e.g., hospital, clinic, dispensary).