We have re-scored the leaderboard and are now using the NLTK BLEU library.
We have also extended this challenge by 2 weeks.
All the best and thank you for your patience.
Submitting Test.csv back yields a score of 0.439, which is higher than any of my trained models, lol. The scoring is still broken. Sorry to be the bearer of bad news :)
Yes. The metric is broken.
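For anyone who wants to run the same kind of sanity check locally, here is a minimal sketch using NLTK's `corpus_bleu`. It is not the organizers' scoring script: the helper name `bleu_of`, the whitespace tokenization, the smoothing choice, and the toy sentences are all my own assumptions, and the real files would need to be loaded in place of the hard-coded lists.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu


def bleu_of(references, hypotheses):
    """Corpus-level BLEU for two parallel lists of strings.

    Hypothetical helper: tokenizes on whitespace and uses one
    reference per hypothesis, which may differ from the real scorer.
    """
    list_of_references = [[ref.split()] for ref in references]
    tokenized_hyps = [hyp.split() for hyp in hypotheses]
    smoothing = SmoothingFunction().method1  # avoids zero n-gram counts
    return corpus_bleu(
        list_of_references, tokenized_hyps, smoothing_function=smoothing
    )


# Invented toy data standing in for the test inputs and the hidden targets.
sources = [
    "the report was send to the manager yesterday",
    "she have finished the project before the deadline",
]
references = [
    "the report was sent to the manager yesterday",
    "she has finished the project before the deadline",
]

# Baseline 1: submit the inputs back unchanged.
copy_score = bleu_of(references, sources)
# Baseline 2: a perfect submission, which should score at the top of the scale.
perfect_score = bleu_of(references, references)

print(f"unchanged input:  {copy_score:.3f}")
print(f"exact references: {perfect_score:.3f}")
```

A high copy baseline is not proof of a bug on its own, since inputs and targets can share most of their tokens. It becomes a warning sign when, as reported above, it beats every trained model.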