Dear competitors,
Thank you for your patience while we worked on the error metric. It proved harder than we initially expected because of the diacritics and special characters involved.
We have implemented the ROUGE score, reporting the F-measure. The metric went live on 5 May 2021 and the leaderboard was rescored.
The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scoring algorithm calculates the similarity between a candidate document and a collection of reference documents. Use the ROUGE score to evaluate the quality of document translation and summarization models [ref].
Once again, thank you for your patience and perseverance during this challenge.
Hi, Thanks for the update :)
But which ROUGE score is used: ROUGE-L or ROUGE-1?
Hi, it is ROUGE-N (N-gram) scoring with N = 1 (ROUGE-1), reporting the F-measure.
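For anyone who wants to see how this works, here is a minimal sketch of a ROUGE-1 F-measure in plain Python. This is not the organisers' exact implementation: it assumes simple whitespace tokenisation and lowercasing, while the official metric may normalise text (punctuation, diacritics) differently.

```python
from collections import Counter

def rouge1_fmeasure(candidate: str, reference: str) -> float:
    """ROUGE-1: unigram overlap between candidate and reference,
    reported as the F-measure (harmonic mean of precision and recall).
    Assumes whitespace tokenisation and lowercasing."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(ref_tokens)
    # Clipped overlap: each unigram counts at most as many times
    # as it appears in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: 5 of 6 unigrams overlap in both directions, so P = R = F = 5/6.
print(rouge1_fmeasure("the cat sat on the mat", "the cat is on the mat"))
```

In practice you would loop this over every (prediction, target) pair in your submission and average the scores; ready-made packages such as `rouge-score` on PyPI do the same with more careful text normalisation.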
OK, got it!
Is it possible to publish some starter code that uses this ROUGE score for evaluation and model training? As a beginner, it's not clear to me how to use it.
Hello @Zindi, is punctuation important for the translation, and is it taken into account in the ROUGE score?
Update: I see the ROUGE score ignores punctuation when I try the Python metric. Thanks!
Yes, the diacritics and accents are taken into account.
Thank you for the clarification