Hi Zindians,
Thank you for raising concerns about the evaluation metrics. Feedback like this is always helpful in continuously improving challenge design.
As stated on the challenge Info page, under the evaluation section:
The values in TargetMBE and TargetRMSE should be identical for each corresponding entry of the submission. This format is required for multi-metric evaluation.
1. The expectation is that the values in the TargetMBE and TargetRMSE columns are identical for each submission entry. However, because the two targets are evaluated by separate metrics, this is not enforced. This is consistent with how multi-metric evaluation has always been handled on the platform.
2. We cannot rule out that some submissions on the current leaderboard may contain differing values across the two columns. However, participants should be aware that the values should NOT differ.
3. During evaluation, we will inspect the submission file generated directly from each participant’s code and verify that the values in the two columns are identical. Any submission where the values are not identical will be flagged, which will automatically affect the ranking. Attempts to manually adjust values outside the model's genuine output, without sound reason, will also be identified and considered during the review.
4. We will not be dropping the MBE metric or making further changes to the evaluation criteria at this stage of the competition. However, we will certainly check score consistency when considering only RMSE during evaluation. We are confident that the code review process provides a robust safeguard and that it will ensure a fair final outcome for all participants.
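For anyone who wants to check their own file before submitting, here is a minimal sketch of the kind of check described above. It is only an illustration, not the script used in the code review. The column names TargetMBE and TargetRMSE come from the Info page; the "ID" column, the sample rows, and the function name are hypothetical.

```python
import csv
import io
import math


def find_mismatched_rows(csv_text, col_a="TargetMBE", col_b="TargetRMSE", tol=0.0):
    """Return (row_number, a, b) for every row where the two columns differ.

    tol is an absolute tolerance; the default of 0.0 requires exact equality.
    """
    mismatches = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        a = float(row[col_a])
        b = float(row[col_b])
        if not math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
            mismatches.append((row_number, a, b))
    return mismatches


# Hypothetical three-row submission; the "ID" column name is an assumption.
sample = """ID,TargetMBE,TargetRMSE
id_1,0.50,0.50
id_2,1.25,1.25
id_3,2.00,2.10
"""

print(find_mismatched_rows(sample))  # [(3, 2.0, 2.1)]
```

To run it on a real file, read the file contents into a string and pass that in. An empty list means the two columns match on every row.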
Happy coding
Thanks for the clarifications! Much appreciated
Thanks for this
"Attempts to manually adjust values outside the model's genuine output, without sound reason, will also be identified and considered during the review."
Curious to see how many participants will be removed by this rule.
Congrats to the winners who didn't try to move means by probing.