I think that ~9 is achievable. This is based on my estimate: using the leak I can get ~6.8, and the leader is 0.7 better than me. Without the leak, my score is ~10.5.
My test with the label leak also shows that it is more crucial to predict where the extent is `0` than to predict the extent for DR. That is, I try to classify whether the extent is 0 or not (using only the extent, not the damage column), and 90% accuracy is not enough; I need ~95%.
Thank you for your insight!
1. Are you doing something other than setting predictions to 0 for damage type != DR? I'm not seeing as big a decrease in RMSE from using the leak.
2. Can you get to 95%, or is that just a theoretical bound for where you think it becomes useful?
1. That is exactly what I'm doing. Non-DR rows always get 0 as the extent.
2. It's just my hypothesis; I'm getting 90%. Looking into the data, a lot of the entries are simply incorrectly labeled (my own evaluation). I'm thinking about removing such data from training to see if it would
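For anyone following along, the post-processing in point 1 can be sketched roughly as below. This is only a minimal illustration, not the actual pipeline from this thread: the function names and the toy values are made up, and `"OTHER"` is a placeholder for any damage type that is not `DR`.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def zero_out_non_dr(preds, damage_types, dr_label="DR"):
    """Keep the model prediction for DR rows and force every other row to 0."""
    return [p if d == dr_label else 0.0 for p, d in zip(preds, damage_types)]


# Toy values for illustration only (not competition data).
raw_preds = [12.0, 8.0, 5.0]
damage = ["DR", "OTHER", "OTHER"]
extent_true = [15.0, 0.0, 0.0]

leak_preds = zero_out_non_dr(raw_preds, damage)
print(rmse(extent_true, raw_preds), rmse(extent_true, leak_preds))
```

Comparing the two RMSE values on a local validation split is one way to check whether the decrease on the leaderboard also shows up in CV.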
So you get almost a 4-point decrease in RMSE? Do you see the same 4-point decrease in CV RMSE?
Interesting. Incorrectly labeled beyond the zero-inflated portion? That's fair, some of the labels are quite bad. Some sort of pseudo-labeling might be interesting.
There are some other approaches I'm also considering to deal with low-quality labels.
About local CV: yes, I see the same thing here.
Hey!
Thank you for sharing your experience!
I am a little confused about using the damage type during training: are you using it?
From the official message it's not clear whether it is allowed for training.
No, I don't use the Damage column for training. I used it only at inference to check the leak, but I'm not going to select this submission.
I'm trying to re-create the damage column by training a classifier that predicts whether the extent is 0 or non-0.
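A rough sketch of how such a zero/non-zero classifier could be combined with a regressor, in case it helps the discussion. Everything here is an assumption for illustration: the function names, the 0.5 threshold, and the toy numbers are hypothetical, and the classifier and regressor outputs are taken as given rather than produced by any particular model.

```python
def nonzero_accuracy(extent_true, nonzero_pred):
    """Share of rows where the predicted zero/non-zero flag matches the label."""
    hits = sum((t != 0) == bool(f) for t, f in zip(extent_true, nonzero_pred))
    return hits / len(extent_true)


def gate_predictions(reg_preds, p_nonzero, threshold=0.5):
    """Use the regressor output only where the classifier expects a non-zero extent."""
    return [r if p >= threshold else 0.0 for r, p in zip(reg_preds, p_nonzero)]


# Toy values for illustration only (not competition data).
extent_true = [0.0, 30.0, 0.0, 10.0]
p_nonzero = [0.1, 0.9, 0.6, 0.8]   # hypothetical classifier probabilities
reg_preds = [5.0, 25.0, 7.0, 12.0]  # hypothetical regressor outputs

flags = [p >= 0.5 for p in p_nonzero]
print(nonzero_accuracy(extent_true, flags))
print(gate_predictions(reg_preds, p_nonzero))
```

The accuracy helper measures the same quantity as the 90% vs. ~95% figures discussed above: how often the zero/non-zero decision is right, independent of the predicted extent.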
Sorry for the confusion. I double-checked my 6.x score and it was obtained differently from what I described.
To get the 6.x score, I trained a model using only the DR (drought) entries. At prediction time, I predict only for DR; all others are 0.
If I train a model on the entire data and then zero out non-DR, I get a score of ~8.5, down from ~10.5 (prediction without the leak).
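To make the difference between the two setups concrete, here is a small sketch. It is not the model used for the scores above: `fit_mean` is a deliberately trivial stand-in for a real model, and the row layout (dicts with `damage` and `extent` keys) and the `"OTHER"` label are assumptions for illustration.

```python
def fit_mean(rows):
    """Stand-in 'model': always predicts the mean extent of its training rows."""
    mean_extent = sum(r["extent"] for r in rows) / len(rows)
    return lambda row: mean_extent


def predict_with_leak(train_rows, test_rows, fit, dr_only_training):
    """Predict extent for DR test rows and 0 for all others.

    dr_only_training=True  -> train on DR rows only (the 6.x setup described above)
    dr_only_training=False -> train on all rows, then zero out non-DR (the ~8.5 setup)
    """
    if dr_only_training:
        train_rows = [r for r in train_rows if r["damage"] == "DR"]
    model = fit(train_rows)
    return [model(r) if r["damage"] == "DR" else 0.0 for r in test_rows]


# Toy rows for illustration only (not competition data).
train = [
    {"damage": "DR", "extent": 40.0},
    {"damage": "DR", "extent": 20.0},
    {"damage": "OTHER", "extent": 0.0},
]
test = [{"damage": "DR"}, {"damage": "OTHER"}]

print(predict_with_leak(train, test, fit_mean, dr_only_training=True))
print(predict_with_leak(train, test, fit_mean, dr_only_training=False))
```

In the toy data, training on all rows pulls the DR prediction toward zero because the non-DR rows have zero extent, which is one plausible reason the DR-only model scores better.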
Oh, thank you for updating! I had suspected there were two "levels" to the leak, as I was only seeing 8.x scores when postprocessing non-DR to 0.