Interesting observation. One experiment to conduct is to do away with the column
Because the end values for one of them are incremental for some reason. Split on '-', drop the last section, and join the rest back together, and you'll get the same 4 categories as in the train set.
Valid!
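For anyone who wants to try this, here is a minimal sketch of the split-and-rejoin idea in plain Python. The function name and the sample values are made up for illustration; the real values in the competition data will look different.

```python
def strip_trailing_segment(value: str, sep: str = "-") -> str:
    """Drop the part after the last separator.

    Equivalent to splitting on `sep`, discarding the last piece and
    joining the rest back. Values without a separator are returned unchanged.
    """
    return value.rsplit(sep, 1)[0]


# Hypothetical values, NOT taken from the actual dataset.
raw_values = ["abc-123-0001", "abc-123-0002", "def-456-0001", "plain"]

cleaned = [strip_trailing_segment(v) for v in raw_values]
distinct_categories = sorted(set(cleaned))
```

If the data is in a pandas DataFrame, the same thing can probably be done with `df["Category_Health_Facility_UUID"].str.rsplit("-", n=1).str[0]`, assuming that is the column with the incremental suffix.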
Yeah, thanks. You were right.
The dataset is just confusing to me: some rows are identical across all the columns. ID, Location, Disease, Category_Health_Facility_UUID, latitude, longitude, month and year all have the same values in those rows.
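A quick way to see how many rows repeat is to count them by the columns listed above. This is only a sketch: the helper name and the sample rows are hypothetical, and it assumes the rows have been loaded as dictionaries (for example with `csv.DictReader`).

```python
from collections import Counter

# Column names as listed in the post above.
KEY_COLUMNS = [
    "ID", "Location", "Disease", "Category_Health_Facility_UUID",
    "latitude", "longitude", "month", "year",
]


def find_repeated_rows(rows, columns=KEY_COLUMNS):
    """Return {key_tuple: count} for every key that appears more than once."""
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows, NOT taken from the actual dataset.
base = {
    "ID": "x1", "Location": "A", "Disease": "D1",
    "Category_Health_Facility_UUID": "abc-123",
    "latitude": "1.0", "longitude": "2.0", "month": "1", "year": "2020",
}
sample_rows = [dict(base), dict(base), {**base, "ID": "x2"}]

repeated = find_repeated_rows(sample_rows)
```

Here `repeated` holds one entry, for the row that appears twice. Whether such rows should be dropped or aggregated depends on what the remaining columns contain.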
Great observation, @da_. Category could indicate a classification or grouping, possibly related to the type of health facility (e.g., hospital, clinic, dispensary).