I don't know why the starter notebook gets a higher score even though its linkage process is totally wrong. When I corrected it, I got a lower score. Any help?
What do you mean the linkage process is totally wrong? I think the approach used in the starter notebook is sound enough. Given that there are no matching lat/lon pairs, the only other logical way is to use the closest lat/lon pair. Or can you explain why it is wrong?
Yes, but that is not enough. Consider looking for an example like ('ID_3a11929e-3317-476d-99f7-1bd9fb58f018_12_2022_Dysentery') for month == 12 and year == 2022. It has been linked with waste_Month_Year = 5_2021, toilet_Month_Year = 4_2020, and water_Month_Year_lat_lon = 12_2023_-8.62966_68.23589, which seems like a mess. While it is true that these places are close to each other, they provide incorrect information regarding the true date, which should be month == 12 and year == 2022. If I am wrong about anything, please explain where my mistake is.
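To make the point above concrete, here is a minimal sketch of a nearest-neighbour linkage that only considers candidates from the same month and year. It is not the starter notebook's code. The dictionary keys (`month`, `year`, `lat`, `lon`, `id`) and the sample rows are hypothetical, and only the `12_2023` coordinates are taken from the example above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def link_nearest_same_period(record, candidates):
    """Return the closest candidate that shares the record's month and year, or None."""
    same_period = [
        c for c in candidates
        if c["month"] == record["month"] and c["year"] == record["year"]
    ]
    if not same_period:
        return None  # no candidate for that period; leave the link empty
    return min(
        same_period,
        key=lambda c: haversine_km(record["lat"], record["lon"], c["lat"], c["lon"]),
    )

# Hypothetical rows: w1 is the closest point but belongs to 2023, w2 is
# farther away but matches the record's period (12/2022).
hospital = {"month": 12, "year": 2022, "lat": -8.63, "lon": 68.24}
water = [
    {"id": "w1", "month": 12, "year": 2023, "lat": -8.62966, "lon": 68.23589},
    {"id": "w2", "month": 12, "year": 2022, "lat": -8.70, "lon": 68.30},
]

match = link_nearest_same_period(hospital, water)
print(match["id"])
```

A plain nearest-point search would pick `w1` here. Filtering on the period first trades a slightly larger distance for a date that actually matches the record. Whether that helps the score is a separate question, as the first post shows.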
Ah great! Thank youuu!! Also, tbh, you can get a score of 6 without the linkage process, so you should try that too. I have yet to find a good way to use the additional datasets to improve my score. I don't know if the people scoring 5 are using the additional datasets?
Yes, my best score is without the additional datasets. I removed them until I find a better approach. I think the people scoring 5 are using them, but with another linkage approach.
Okay, good luck. Let's hope to reach the people scoring 5.
We have 3 now. XD
damnnnnn that's crazzyyyyyy
I have given up😭😭
I mean, don't. Fight till the end. Never ever give up. You don't know if you will find something creative along the way.
@Koleshjr That's true, I think I'd continue. Thank you so much for the encouragement.
When you say without the linkage process, you mean just using the train.csv file and ignoring the waste and toilet CSVs? I feel like there isn't enough info in this one file to build a good model, but I may be missing something.
Yes but it gives a good score.
Using the raw data alone, or would you recommend ensemble learning with different GB / tree models?
I also feel like there's something more we are missing. Offline MAE scores don't seem to correlate well with LB scores lol
It can correlate if you group by location before training. However, you might get a score around 6. Honestly, you can even get a score of 5 with just one model, either CatBoost or LightGBM. XGBoost did not really give me a good score. Good luck!
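One possible reading of "group location" is to keep all rows from the same location in the same validation fold, so the offline MAE is measured on locations the model has not seen. The sketch below shows that idea in plain Python. The `locations` list and the helper name `group_k_fold` are hypothetical, and this is an interpretation of the advice rather than the poster's actual setup.

```python
from collections import defaultdict

def group_k_fold(groups, n_splits=3):
    """Yield (train_idx, valid_idx) pairs where no group appears on both sides."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    if len(by_group) < n_splits:
        raise ValueError("need at least as many groups as folds")

    folds = [[] for _ in range(n_splits)]
    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest rows.
    for g in sorted(by_group, key=lambda g: -len(by_group[g])):
        smallest = min(folds, key=len)
        smallest.extend(by_group[g])

    for k in range(n_splits):
        valid = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, valid

# Hypothetical location label per training row.
locations = ["A", "A", "B", "B", "B", "C", "D", "D"]
splits = list(group_k_fold(locations, n_splits=3))

for train_idx, valid_idx in splits:
    print("train:", train_idx, "valid:", valid_idx)
```

In a real pipeline, scikit-learn's `GroupKFold` provides the same kind of split, with the location column passed as `groups`.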
Thanks !