
SUA Outsmarting Outbreaks Challenge

Helping Tanzania, United Republic of
$12 500 USD + AWS credits
Completed (~1 year ago)
Prediction
815 joined
395 active
Start: Dec 06, 24
Close: Jan 31, 25
Reveal: Feb 01, 25
Linkage process
Help · 29 Dec 2024, 19:29 · 15

I don't know why the starter notebook gives a higher score even though the linkage process in it is totally wrong. When I corrected it, I got a lower score. Any help?

Discussion 15 answers
Koleshjr
Multimedia university of kenya

What do you mean the linkage process is totally wrong? I think the approach used in the starter notebook is sound enough. Given that there are no matching lat/lon pairs, the only other logical way is to use the closest lat/lon pair. Can you explain why it is wrong?

30 Dec 2024, 11:30

Yes, but that is not enough. Consider an example like 'ID_3a11929e-3317-476d-99f7-1bd9fb58f018_12_2022_Dysentery', for month == 12 and year == 2022. It has been linked with waste_Month_Year = 5_2021, toilet_Month_Year = 4_2020, and water_Month_Year_lat_lon = 12_2023_-8.62966_68.23589, which seems like a mess. While it is true that these places are close to each other, the linked records come from the wrong dates; they should all be for month == 12 and year == 2022. If I am wrong about anything, please explain where my mistake is.
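To make the point concrete, here is a minimal sketch of a date-aware linkage: for each train row, only auxiliary records from the same month and year are considered, and the nearest of those is picked. This is not the starter notebook's code. The field names (`id`, `year`, `month`, `lat`, `lon`) and the sample rows are hypothetical, and it uses plain Python so that it runs without extra libraries.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def link_nearest_same_period(train_rows, aux_rows):
    """For each train row, return the nearest aux row sharing its (year, month).

    The result holds None for train rows with no aux record in that period.
    """
    # Index auxiliary rows by period so distance is only compared within a period.
    by_period = {}
    for aux in aux_rows:
        by_period.setdefault((aux["year"], aux["month"]), []).append(aux)

    linked = []
    for row in train_rows:
        candidates = by_period.get((row["year"], row["month"]), [])
        best = min(
            candidates,
            key=lambda a: haversine_km(row["lat"], row["lon"], a["lat"], a["lon"]),
            default=None,
        )
        linked.append(best)
    return linked

# Hypothetical sample data: the closest record is from the wrong period.
train_rows = [{"id": "t1", "year": 2022, "month": 12, "lat": -8.60, "lon": 35.20}]
aux_rows = [
    {"id": "close_wrong_period", "year": 2021, "month": 5, "lat": -8.60, "lon": 35.20},
    {"id": "far_same_period", "year": 2022, "month": 12, "lat": -8.90, "lon": 35.60},
]
linked = link_nearest_same_period(train_rows, aux_rows)
```

A pure nearest-neighbour match would pick `close_wrong_period` here; restricting candidates to the same month and year picks `far_same_period` instead. Whether this actually improves the leaderboard score is not something this sketch can tell you.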

Koleshjr
Multimedia university of kenya

Ah great! Thank you! Also, to be honest, you can get a score of 6 without the linkage process, so you should try that too. I have yet to find a good way to use the additional datasets to improve my score. I don't know if the 5 guys are using the additional datasets?

Yes, my best score is without the additional datasets. I removed them until I find a better approach. I think the 5 guys are using them, but with another linkage approach.

Koleshjr
Multimedia university of kenya

Okay, good luck. Let's hope to reach the 5 guys.

We have now 3. XD

Koleshjr
Multimedia university of kenya

Damn, that's crazy!

CodeJoe

I have given up😭😭

Koleshjr
Multimedia university of kenya

I mean, don't. Fight till the end. Never ever give up. You don't know if you will find something creative along the way.

CodeJoe

@Koleshjr That's true, I think I'll continue. Thank you so much for the encouragement.

When you say without the linkage process, do you mean just using the train.csv file and ignoring the waste and toilet CSVs? I feel like there isn't enough info in this one file to build a good model, but I may be missing something.

CodeJoe

Yes, but it gives a good score.

Using the raw data alone or would you recommend ensemble learning with different GB / tree models?

I also feel like there's something more we are missing. Offline MAE scores don't seem to correlate well with LB scores lol

CodeJoe

It can correlate if you group by location before training. However, you might get a score around 6. Honestly, you can even get a score of 5 with just one model, from CatBoost to LightGBM. XGBoost did not really give me a good score. Good luck!
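One way to read "group by location before training" is grouped cross-validation, where all rows from one location stay in the same fold so validation never sees a location the model trained on. Below is a minimal sketch in plain Python under that assumption. The group labels are hypothetical, and `group_kfold` is an illustrative helper, not code from the competition; scikit-learn's `GroupKFold` offers the same kind of split.

```python
from collections import defaultdict

def group_kfold(groups, n_splits=5):
    """Yield (train_idx, valid_idx) pairs where no group appears on both sides."""
    # Collect the row indices belonging to each group (e.g. each location).
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    # Greedy balancing: largest groups first, each into the currently smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[g])

    for k in range(n_splits):
        valid = sorted(folds[k])
        valid_set = set(valid)
        train = [i for i in range(len(groups)) if i not in valid_set]
        yield train, valid

# Hypothetical location label per training row.
locations = ["A", "A", "B", "B", "C", "C", "D"]
splits = list(group_kfold(locations, n_splits=3))
```

Each `(train, valid)` pair can then be used to fit a model on `train` and compute MAE on `valid`; averaging over the folds gives an offline estimate that does not benefit from having seen the same location during training.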