SUA Outsmarting Outbreaks Challenge

Helping Tanzania, United Republic of
$12 500 USD + AWS credits
Completed (~1 year ago)
Prediction
815 joined
395 active
Start: Dec 06, 24
Close: Jan 31, 25
Reveal: Feb 01, 25
Discrepancies in Disease Occurrence Records for the Same Hospital and Time Period
Help · 19 Jan 2025, 00:28 · 2

Hello!

I know it's a bit late, but I would like to understand something. Look at the code below.

mask = train['ID'] == 'ID_704a38c1-35ca-4e81-ab81-02fcf41d1f72_9_2019_Diarrhea'
train.loc[mask, :]

Based on my understanding, I interpret the resulting dataset as follows:

In September 2019, the hospital identified by "ID_704a38c1-35ca-4e81-ab81-02fcf41d1f72_9_2019_Diarrhea", located at the coordinates given in "Transformed_Latitude" and "Transformed_Longitude", has one record with 10.0 occurrences of diarrhea and two more records with 0 occurrences. This doesn't make sense to me! How can there be different counts of the same disease recorded at the same hospital during the same time period?
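
To make that reading of the ID explicit, here is a minimal sketch that splits it into its parts. It assumes the layout is "<facility id>_<month>_<year>_<disease>" and that disease names contain no underscores; `parse_record_id` is a hypothetical helper, not something provided by the competition.

```python
def parse_record_id(record_id: str) -> dict:
    """Split a record ID into its parts.

    Assumed layout: "<facility id>_<month>_<year>_<disease>".
    Splitting from the right keeps the underscore inside the facility id intact.
    """
    facility, month, year, disease = record_id.rsplit("_", 3)
    return {
        "facility": facility,
        "month": int(month),
        "year": int(year),
        "disease": disease,
    }


parts = parse_record_id("ID_704a38c1-35ca-4e81-ab81-02fcf41d1f72_9_2019_Diarrhea")
print(parts)
```

If a disease name did contain an underscore, the right-hand split would cut it in the wrong place, so it is worth checking the parsed values against the dataset's own month, year and disease columns if they are available.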

The snippet below picks a random ID; rerunning it a few times shows that this happens for many IDs.

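# Pick one random record ID (named sample_id to avoid shadowing the built-in id)
sample_id = train['ID'].sample(1).values[0]
print(sample_id)
# Show every row that shares that ID
mask = train['ID'] == sample_id
train.loc[mask, :]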
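
Rather than sampling one ID at a time, a rough sketch like the following could count how many IDs carry conflicting values. The target column name "Total" is an assumption, and the small `train` frame here is a made-up stand-in so the example runs on its own; with the real data you would skip that part.

```python
import pandas as pd

# Toy stand-in for the competition's `train` frame.
# The "Total" column name and these values are assumptions for illustration.
train = pd.DataFrame({
    "ID": ["a_9_2019_Diarrhea"] * 3 + ["b_1_2020_Malaria"] * 2 + ["c_3_2021_Cholera"],
    "Total": [10.0, 0.0, 0.0, 4.0, 4.0, 7.0],
})

# Summarise every ID: number of rows, number of distinct counts, and their range.
per_id = train.groupby("ID")["Total"].agg(
    n_rows="size",
    n_distinct="nunique",
    min_total="min",
    max_total="max",
)

# An ID is "conflicting" when its rows disagree on the count.
conflicting = per_id[per_id["n_distinct"] > 1]

print(f"{len(conflicting)} of {len(per_id)} IDs have conflicting totals")
print(conflicting)
```

IDs that are repeated with identical counts (like the second one in the toy data) are plain duplicates and are not flagged; only IDs whose rows disagree show up in `conflicting`.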
Discussion 2 answers

Yeah, everyone is facing the same problem. Just try to find a solution for it, because I don't think there is an optimal one.

19 Jan 2025, 12:09
Mohamed_Elnageeb
University of Khartoum

Have you found an effective way to handle it? All of my attempts so far have significantly worsened the score.