🌾 Join the Buzz: Clarification on leaderboard s...
Watch it carefully.
For example @keystats has an RMSE of 1.4545 and an MAE of 1.1750
and I have an RMSE of 3.4738 and an MAE of 2.573709476.
So it means @keystats must be ahead of me on the leaderboard even with the formula from @J0NNY in that chat.
Am I missing something?
It is indeed messed up, but maybe that is not the best example to use, as that is the right behavior. Higher is better, so you should be above him. But @micadee, for example, has lower scores and is in second. That shouldn't be the case. So it is indeed confusing!
I’m getting higher RMSE and MAE scores than him, which should normally correspond to a lower performance score, right? Conversely, if I had lower RMSE and MAE, I would expect a higher score.
At the moment, the leaderboard seems quite off, which is really confusing. I think I’ll wait for the last two weeks before drawing any conclusions, when the results will be more meaningful.
I mean, in the traditional sense, yes, but the score you are seeing on the LB is not the actual MAE/RMSE. Those have already been normalized.
💻 Introducing Multi-Metric Evaluation, or One Metric to Rule them All
Read the above article. It says:
All metric scores are normalised before being shown on the leaderboard. This ensures fairness when a challenge includes both metrics you want to maximise (such as Accuracy) and metrics you want to minimise (such as Log Loss).
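For intuition, here is a minimal sketch of what such a scheme could look like. The article does not give the exact formula, so the min-max scaling below is an assumption for illustration only; the point is just that a lower raw RMSE/MAE comes out as a higher score after rescaling.

```python
import numpy as np

def normalise(raw_scores, higher_is_better):
    """Illustrative min-max rescaling so that 1.0 is always 'best'.

    NOTE: this is an assumed scheme for illustration, not the platform's
    published formula. It maps every metric onto 0..1 and flips the
    direction for metrics you want to minimise (RMSE, MAE, Log Loss, ...).
    """
    raw = np.asarray(raw_scores, dtype=float)
    lo, hi = raw.min(), raw.max()
    if hi == lo:                        # everyone tied -> everyone gets 1.0
        return np.ones_like(raw)
    scaled = (raw - lo) / (hi - lo)
    return scaled if higher_is_better else 1.0 - scaled

# Raw RMSEs quoted earlier in this thread: 1.4545 (@keystats) vs 3.4738
print(normalise([1.4545, 3.4738], higher_is_better=False))  # -> [1. 0.]
```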
So the score has already been normalized, I guess.
But still, that doesn't mean the LB is not messed up; it is, but in a mixed way. The calculation is somehow wrong, I don't know.
I think I get you now. The recent scores are based on the new dummy validation data from the starter notebook?
That would make sense then. I just tested that: I resubmitted an old result and it gave a better score than the previous submission.
Even if we are being evaluated on dummy data, there should not be a mix-up; if higher is better, that should be applied consistently. Or @AJoel, if I were to advise: it would be better to rescore the LB and retain those two weeks' scores until the next two weeks, so even now, while we are predicting for weeks 50 and 51, we still get evaluated on weeks 48 and 49 until the next rolling values come out, and so on. That way, we won't be evaluated on dummy data. Or better yet (as a competitor), ignore the LB completely and use the already-published week 48 and 49 values to check your scores locally.
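If anyone wants to do that local check, here is a minimal sketch. The file and column names are hypothetical placeholders, not the challenge's actual files; point them at the published week 48/49 values and your own submission.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical file/column names -- adjust to the actual published data.
truth = pd.read_csv("week_48_49_actuals.csv")   # assumed columns: ID, value
preds = pd.read_csv("my_submission.csv")        # assumed columns: ID, value

merged = truth.merge(preds, on="ID", suffixes=("_true", "_pred"))

mae = mean_absolute_error(merged["value_true"], merged["value_pred"])
rmse = mean_squared_error(merged["value_true"], merged["value_pred"]) ** 0.5

print(f"Local MAE:  {mae:.4f}")
print(f"Local RMSE: {rmse:.4f}")
```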
True, the leaderboard makes our local cross-validation difficult to even trust 😅
Somehow I disagree with your claim that higher is better, but I stand to be corrected... Let's say, for example, we have 3 true values and two candidates, A and B.
True values
T₁ = 40 T₂ = 39 T₃ = 38
✅ Candidate A predictions
A₁ = 39.5 A₂ = 38.6 A₃ = 37.4
Step 1 — Compute individual errors
🔹 For T₁ = 40, A₁ = 39.5
Error = 39.5 − 40 = −0.5 Absolute Error = |−0.5| = 0.5 Squared Error = (−0.5)² = 0.25
🔹 For T₂ = 39, A₂ = 38.6
Error = 38.6 − 39 = −0.4 Absolute Error = |−0.4| = 0.4 Squared Error = (−0.4)² = 0.16
🔹 For T₃ = 38, A₃ = 37.4
Error = 37.4 − 38 = −0.6 Absolute Error = |−0.6| = 0.6 Squared Error = (−0.6)² = 0.36
⭐ Candidate A summary of individual errors
Point   Error   Abs Error   Sq Error
1       −0.5    0.5         0.25
2       −0.4    0.4         0.16
3       −0.6    0.6         0.36
Step 2 — Calculate MAE for A
MAE = (0.5 + 0.4 + 0.6) / 3 = 1.5 / 3 = 0.5
Step 3 — Calculate RMSE for A
MSE = (0.25 + 0.16 + 0.36) / 3 = 0.77 / 3 = 0.256666…
RMSE = √0.256666… RMSE ≈ 0.5066
✅ Final scores for Candidate A: MAE = 0.5, RMSE ≈ 0.5066
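As a quick sanity check, the same arithmetic in a few lines of Python (true values and predictions taken from the example above):

```python
# True values and Candidate A's predictions from the example above
true_vals = [40, 39, 38]
cand_a = [39.5, 38.6, 37.4]

abs_errors = [abs(p - t) for p, t in zip(cand_a, true_vals)]    # [0.5, 0.4, 0.6]
sq_errors = [(p - t) ** 2 for p, t in zip(cand_a, true_vals)]   # [0.25, 0.16, 0.36]

mae = sum(abs_errors) / len(abs_errors)          # 0.5
rmse = (sum(sq_errors) / len(sq_errors)) ** 0.5  # ≈ 0.5066

print(f"Candidate A  MAE={mae:.4f}  RMSE={rmse:.4f}")
```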
✅ Candidate B predictions
B₁ = 38.7 B₂ = 37.9 B₃ = 36.8
Step 1 — Compute individual errors
🔹 For T₁ = 40, B₁ = 38.7
Error = 38.7 − 40 = −1.3 Absolute Error = |−1.3| = 1.3 Squared Error = (−1.3)² = 1.69
🔹 For T₂ = 39, B₂ = 37.9
Error = 37.9 − 39 = −1.1 Absolute Error = |−1.1| = 1.1 Squared Error = (−1.1)² = 1.21
🔹 For T₃ = 38, B₃ = 36.8
Error = 36.8 − 38 = −1.2 Absolute Error = |−1.2| = 1.2 Squared Error = (−1.2)² = 1.44
Step 2 — Calculate MAE for B
MAE = (1.3 + 1.1 + 1.2) / 3 = 3.6 / 3 = 1.2
Step 3 — Calculate RMSE for B
MSE = (1.69 + 1.21 + 1.44) / 3 = 4.34 / 3 = 1.446667…
RMSE = √1.446667… RMSE ≈ 1.20277
✅ Final scores for Candidate B: MAE = 1.2, RMSE ≈ 1.2028
This shows lower is better 🤔: Candidate A, with the smaller errors, ends up with the lower MAE and RMSE.
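Both readings can be reconciled: on the raw metrics lower is indeed better, and the leaderboard then rescales those raw errors so that the displayed (normalized) score is higher for the better submission. A short sklearn check of both candidates from the example:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

true_vals = [40, 39, 38]
candidates = {"A": [39.5, 38.6, 37.4], "B": [38.7, 37.9, 36.8]}

for name, preds in candidates.items():
    mae = mean_absolute_error(true_vals, preds)
    rmse = mean_squared_error(true_vals, preds) ** 0.5
    print(f"Candidate {name}: MAE={mae:.4f}  RMSE={rmse:.4f}")

# Candidate A: MAE=0.5000  RMSE=0.5066
# Candidate B: MAE=1.2000  RMSE=1.2028
```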
The leaderboard has been mysterious since day 1... I personally even got frustrated and stopped submitting like 20 days ago, because locally I was getting an MAE below 0.5 every time, but when I submitted I was seeing wonders 😂.
Trust them; they will pay off eventually if they are good.
Please read the article I just linked. They do normalization before they show the scores on the leaderboard. In the traditional sense, both your explanations make sense, but then they normalize the scores so that higher is better, even for the minimization objectives.
The article clearly states that they do normalization so that all metrics are maximized, meaning higher is better!
📷
Closer to zero tops the LB.
But the normalization doesn't seem right in this case.
That has been the problem from the start. I think they haven't refreshed the board yet for older submissions. Try resubmitting something you submitted before.