RMSE implicitly assumes that the conditional distribution of the target is symmetric and can range between -infinity and +infinity. Our percent vulnerable values are bounded between 0 and 1. The metric should have been something like the deviance of a Beta distribution, or we should have been predicting log(percent vulnerable). As it stands, this is going to bias any insights we get from our models. Maybe this is something for future reference.
@marcusinthesky I agree with you. RMSE assigns a higher weight to larger errors, which makes it more useful when large errors are present.
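To make the Beta point concrete, here is a minimal sketch of a Beta negative log-likelihood in the mean/precision form, using only the standard library. The precision `phi`, the clipping constant `eps`, and the example values are assumptions chosen for illustration, not anything from the competition.

```python
import math

def beta_nll(y, mu, phi=10.0, eps=1e-6):
    """Negative log-likelihood of y under Beta(mu * phi, (1 - mu) * phi).

    mu is the predicted mean in (0, 1); phi is a hypothetical precision.
    Both y and mu are clipped away from 0 and 1 so the logs stay finite.
    """
    y = min(max(y, eps), 1.0 - eps)
    mu = min(max(mu, eps), 1.0 - eps)
    a, b = mu * phi, (1.0 - mu) * phi
    log_pdf = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log(1.0 - y)
    )
    return -log_pdf

# Two observations equally far (0.10) from a predicted mean of 0.15.
# Squared error treats them identically; the Beta likelihood does not.
mu = 0.15
sq_err_low, sq_err_high = (0.05 - mu) ** 2, (0.25 - mu) ** 2
nll_low, nll_high = beta_nll(0.05, mu), beta_nll(0.25, mu)
print(sq_err_low, sq_err_high)
print(nll_low, nll_high)
```

With a mean this close to the lower bound, the two misses get different penalties under the Beta likelihood, which is the asymmetry a squared-error metric cannot express.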
Did anyone train with MAE?
RMSE is the right metric; if you log-transform the target variable, you get a low RMSE.
Log-transforming values that are zero or very close to zero doesn't correlate well with the LB. I think it is much better to use the log when most values are not too close to zero.
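A rough sketch of why zeros are a problem for a log target: with a small offset, the rows near zero dominate the log-space error even when the raw errors there are tiny. The data, the offset `eps`, and the use of `log1p` as an alternative are all assumptions made up for this example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Made-up targets and predictions in [0, 1], including values at or near zero
y_true = [0.0, 0.001, 0.2, 0.6]
y_pred = [0.002, 0.0, 0.25, 0.5]

eps = 1e-6  # hypothetical offset so that log(0) is avoided

rmse_raw = rmse(y_true, y_pred)
rmse_log_eps = rmse(
    [math.log(v + eps) for v in y_true],
    [math.log(v + eps) for v in y_pred],
)
rmse_log1p = rmse(
    [math.log1p(v) for v in y_true],
    [math.log1p(v) for v in y_pred],
)

print(rmse_raw, rmse_log_eps, rmse_log1p)
```

Here the log-with-offset score is driven almost entirely by the first two rows, so a model tuned on it can look very different on a leaderboard scored with plain RMSE. `log1p` stays finite at zero, but on a 0 to 1 target it is close to the identity, so it changes little.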
Noted! buddy...
I partially agree with @Engineer: RMSE(log target) would be a better assumption, though (1) it is not the current metric and (2) it still suffers from scoring us against a distribution which is not the same as our data. @DrFad I did not use MAE, as MAE is not the metric used in the competition. MAE is also not differentiable at zero, so it is harder to optimize directly with gradient-based methods.
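On the differentiability point, a small sketch: the absolute error has a kink at zero residual, so only a subgradient exists there, while a smooth surrogate such as the Huber loss has a well-defined gradient everywhere. The Huber loss is swapped in here purely as an illustration, and the threshold `delta` is a hypothetical choice.

```python
def mae_subgradient(residual):
    """A subgradient of |r| with respect to r.

    For r != 0 this is sign(r). At r == 0 any value in [-1, 1] is valid;
    0.0 is returned here by convention.
    """
    if residual > 0:
        return 1.0
    if residual < 0:
        return -1.0
    return 0.0

def huber_gradient(residual, delta=0.1):
    """Gradient of the Huber loss with respect to r.

    Equal to r when |r| <= delta (quadratic region) and to
    delta * sign(r) beyond it (linear region), i.e. r clipped to [-delta, delta].
    """
    return max(-delta, min(delta, residual))

for r in (-0.3, -0.01, 0.0, 0.01, 0.3):
    print(r, mae_subgradient(r), huber_gradient(r))
```

The absolute-error signal jumps between -1 and 1 as the residual crosses zero, whereas the Huber gradient shrinks smoothly toward zero, which is why it is often used as a stand-in when an absolute-error objective is wanted.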