Hi everyone, I hope you're enjoying this challenge. But it seems that there is a bug in the scoring script.
I went through the eval script (https://github.com/Lelapa-AI/zindi-inkuba-notebook/blob/main/utils/eval.py) and noticed two bugs:
1/ As it is, the script only scores the translation task; the others (NLI and sentiment analysis) will always be scored zero because of a type mismatch. The ground truth is mapped to integers via a dictionary, but the prediction stays a string, so the F1 score is always 0 even when the prediction is correct.
To fix this bug, we can replace line 26 of the script with:
predicted_label = int(row["Response"])
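A minimal sketch of the type-mismatch bug (the variable names and label map below are hypothetical, not the exact ones in eval.py): the ground truth becomes an int via the dictionary, while the prediction read from the submission CSV stays a string, so they never compare equal until the prediction is cast.

```python
# Hypothetical label map, for illustration only.
label_map = {"positive": 1, "negative": 0}

truth = [label_map[t] for t in ["positive", "negative"]]  # [1, 0] (ints)
preds = ["1", "0"]  # values read from a CSV arrive as strings

# int 1 is never equal to str "1", so every comparison fails
# and the resulting F1 is 0 even though the predictions are right.
print(truth[0] == preds[0])        # False

# Casting the prediction, as in the proposed fix, makes it match.
preds_int = [int(p) for p in preds]
print(truth[0] == preds_int[0])    # True
```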
2/ Also, the weights are not equal per task: machine translation gets weight 300/302 while the other two tasks get 1/302 each, whereas the expected weight per task is 1/3. In the code, the 300 chrF scores are gathered for the translation part, and then the F1 scores of NLI and sentiment analysis are appended to the same list. To solve it, just reinitialize the scores list from line 10 with its own mean before line 44.
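A rough sketch of the weighting issue under assumed numbers (the scores and variable names are made up for illustration): averaging all 302 values directly lets translation dominate, while collapsing the 300 chrF scores to their mean first gives each task an equal 1/3 weight.

```python
# Hypothetical per-task scores, for illustration only.
chrf_scores = [0.30] * 300   # 300 per-sentence translation chrF scores
nli_f1 = 0.60
sentiment_f1 = 0.45

# Buggy behaviour: translation carries weight 300/302.
buggy = sum(chrf_scores + [nli_f1, sentiment_f1]) / 302

# Proposed fix: reinitialize the list with its own mean first,
# so each task contributes exactly one number to the average.
scores = [sum(chrf_scores) / len(chrf_scores)]
scores += [nli_f1, sentiment_f1]
fixed = sum(scores) / len(scores)   # equal 1/3 weight per task

print(round(buggy, 4))   # dominated by the 0.30 translation scores
print(round(fixed, 4))   # (0.30 + 0.60 + 0.45) / 3 = 0.45
```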
Great find! Now I get why your score is not 0.9 :P
No @snow 🙏. It means that my score should be greater than 0.1. For the time being I have only worked on translation and sentiment. My local CV is 0.32 for translation and 0.45 for sentiment, so I expected a score around (0.32 + 0.45)/3 ≈ 0.256.
I also agree that it gives more weight to the translation task and less to the other tasks.
Yes, I also observed these two bugs in the scoring script yesterday.
Great idea
For anyone who hasn't seen it already: this has been patched. You can see the commit that introduced the fix here:
https://github.com/Lelapa-AI/zindi-inkuba-notebook/commit/25eda434e171ea7b2fcd1e3920cb8e859816172c
Can the grader be rerun so we can see the updated scores?
Hello! We'll be updating the scoring early next week. Will keep you posted!
Thanks.
Is it fixed? I still see no change in my scores.