Dear competitors,
Thank you for your patience while we worked on the error metric. It proved harder than we initially expected due to the diacritics and special characters in the text.
We have implemented the ROUGE score, reporting the F-measure. This error metric went live on 5 May 2021, and the leaderboard has been rescored accordingly.
The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scoring algorithm calculates the similarity between a candidate document and a collection of reference documents. ROUGE scores are commonly used to evaluate the quality of document translation and summarization models [ref].
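For those unfamiliar with the metric, here is a minimal sketch of how a ROUGE F-measure can be computed with the open-source `rouge-score` Python package (`pip install rouge-score`). This is illustrative only: the ROUGE variant shown (ROUGE-1) and the tokenization settings are assumptions, and may differ from what the leaderboard uses.

```python
# pip install rouge-score
# Illustrative sketch only: ROUGE-1 with stemming is an assumption here,
# not necessarily the exact configuration used by the leaderboard.
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"  # ground-truth text
candidate = "the cat is on the mat"   # model output

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
scores = scorer.score(reference, candidate)

# Each score holds precision, recall, and their harmonic mean:
# F = 2 * P * R / (P + R)
print(scores["rouge1"].fmeasure)
```

Note that the package's default tokenizer may normalize or drop diacritics and special characters differently from the official scorer, so treat local scores as approximate.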
Once again, thank you for your patience and perseverance during this challenge.
Would it be possible to publish some starter code using this ROUGE score for evaluation and model training? As a beginner, it is not clear to me how to use it.