Hello!
I know it's a bit late, but I would like to understand something. Look at the code below.
mask = train['ID'] == 'ID_704a38c1-35ca-4e81-ab81-02fcf41d1f72_9_2019_Diarrhea'
train.loc[mask, :]
Based on my understanding, I interpret the resulting dataset as follows:
In September 2019, at the hospital with the ID "ID_704a38c1-35ca-4e81-ab81-02fcf41d1f72_9_2019_Diarrhea", located at the coordinates given by "Transformed_Latitude" and "Transformed_Longitude", one row reports 10.0 occurrences of diarrhea, while two other rows report 0 occurrences. This doesn't make sense to me! How can there be different counts of the same disease recorded at the same hospital during the same time period?
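Below is a code snippet that shows this happens multiple times. Each run picks a random ID and displays all of its rows.
# Pick one ID at random (named sample_id to avoid shadowing the built-in id)
sample_id = train['ID'].sample(1).values[0]
print(sample_id)
# Show every row that shares this ID
mask = train['ID'] == sample_id
train.loc[mask, :]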
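Rather than sampling one ID at a time, it may be easier to count how many IDs are affected in a single pass. The sketch below is only an illustration: the helper `conflicting_ids` is hypothetical, and the target column name `Total` is an assumption, so replace it with whatever the count column is called in `train`. It runs on a small toy frame so it can be tried without the competition data.

```python
import pandas as pd

def conflicting_ids(df, id_col="ID", target_col="Total"):
    """List IDs whose rows disagree on the target value.

    target_col="Total" is an assumed column name, not taken from the dataset.
    """
    # For each ID: how many rows it has and how many distinct target values
    stats = df.groupby(id_col)[target_col].agg(n_rows="size", n_distinct="nunique")
    # Keep only IDs with more than one distinct (non-null) target value
    return stats[stats["n_distinct"] > 1]

# Toy data mimicking the situation described above (values are made up)
toy = pd.DataFrame({
    "ID": ["a", "a", "a", "b", "c", "c"],
    "Total": [10.0, 0.0, 0.0, 3.0, 1.0, 1.0],
})
conflicts = conflicting_ids(toy)
print(conflicts)
```

On the real data the call would be something like `conflicting_ids(train, target_col=...)`, and `len(conflicts)` would then give the number of IDs with contradictory counts.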
Yeah, everyone is facing the same problem. You'll just have to try to find a workaround, because I don't think there is an optimal one.
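One workaround to experiment with is collapsing the repeated rows into a single row per ID before training. This is a minimal sketch, not a recommended fix: the helper `collapse_duplicates` is hypothetical, the target column name `Total` is an assumption, and nothing here says whether `max`, `sum`, or `mean` is the right way to combine the counts, so each option needs to be validated against your own score.

```python
import pandas as pd

def collapse_duplicates(df, id_col="ID", target_col="Total", how="max"):
    """Collapse rows sharing an ID into one row.

    target_col="Total" is an assumed column name.
    how is any pandas aggregation name, e.g. "max", "sum", "mean".
    """
    # All remaining columns are treated as features
    feature_cols = [c for c in df.columns if c not in (id_col, target_col)]
    # Features: keep the first non-null value seen for each ID
    agg = {c: "first" for c in feature_cols}
    # Target: combine the conflicting counts with the chosen rule
    agg[target_col] = how
    return df.groupby(id_col, as_index=False).agg(agg)

# Toy data with made-up values
toy = pd.DataFrame({
    "ID": ["a", "a", "a", "b"],
    "Transformed_Latitude": [1.5, 1.5, 1.5, 2.0],
    "Total": [10.0, 0.0, 0.0, 3.0],
})
by_max = collapse_duplicates(toy, how="max")
by_mean = collapse_duplicates(toy, how="mean")
print(by_max)
print(by_mean)
```

Keeping the rule as a parameter makes it cheap to compare several choices in cross-validation instead of committing to one up front.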
Have you found an effective way to handle it? All of my attempts so far have significantly worsened the score