AI4D Yorùbá Machine Translation Challenge
$2,000 USD
Can you translate Yorùbá to English?
445 data scientists enrolled, 63 on the leaderboard
4 December 2020—30 May 2021
Ends in 1 month
Metric & LB Broken
published 26 Mar 2021, 13:30

Can the organizers look into the scoring again? The competition is no fun with a broken metric.

Hello, can you please explain your concern?

We updated the BLEU metric to use NLTK's BLEU, and it should now be working.

The concern is the following: if we submit Test.csv as is, i.e. with the Yoruba source sentences as targets, the resulting score is 0.4391, whereas if we do the same exercise on the Train set, the average BLEU using the NLTK implementation is 0.0057. Here is the code I am using:

import pandas as pd
from nltk.translate.bleu_score import sentence_bleu

train = pd.read_csv('Train.csv')

# Score each Yoruba source sentence against its English reference,
# then average the per-sentence BLEU (default 4-gram weights).
metric = 0
for source, target in zip(train['Yoruba'], train['English']):
    metric += sentence_bleu([target.split()], source.split())
print(f'Average BLEU: {metric / len(train):.4f}')

Hi @Zindi. It would be great to get some feedback on this, or a confirmation that it is being looked at, given that there are only 24 days left in the competition.

Also, it would be great to get more information on exactly how you are using NLTK BLEU. For instance, what score should we expect for these two sentences (I get 0.2187 with NLTK and 0.2131 with sacrebleu)?

target = "A Disaster Relief Committee was formed to organize the long-term relief efforts."
prediction = "A Disaster Relief Committee was set up to look after the needs of our brothers in the area."
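For reference, here is a minimal sketch of how the NLTK number above can be reproduced. It assumes plain whitespace tokenization and the default 4-gram weights; the leaderboard's actual tokenization and any smoothing settings are unknown, which is exactly why the scores may differ between implementations.

```python
from nltk.translate.bleu_score import sentence_bleu

target = "A Disaster Relief Committee was formed to organize the long-term relief efforts."
prediction = "A Disaster Relief Committee was set up to look after the needs of our brothers in the area."

# sentence_bleu takes a list of tokenized references and one tokenized
# hypothesis; by default it uses uniform 1- to 4-gram weights.
score = sentence_bleu([target.split()], prediction.split())
print(f'NLTK sentence BLEU: {score:.4f}')
```

Note that sacrebleu applies its own internal tokenization to raw strings, so a small gap between the two numbers is expected even on the same sentence pair.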

Hello, we are looking into this. If needed, we will extend the challenge.

Hello, @Zindi! Any update on the problems with the metric calculation? Could you share the code for the correct calculation with us?

Hello, we are still working on it. We are so sorry for this huge delay.