We found a bug in the BLEU error metric. We are fixing it this week and will re-score the leaderboard by Monday.
All the best,
The Zindi Team
Could you please check the BLEU error metric again? The scores the system produces are very low. When we test the same solution locally, the BLEU score on a held-out subset of the training dataset you provided is much higher than what we get here.