
AI4D Yorùbá Machine Translation Challenge

Helping Nigeria
$2 000 USD
Completed (almost 5 years ago)
Machine Translation
683 joined
84 active
Start: Dec 04, 2020
Close: May 30, 2021
Reveal: May 30, 2021
Metric & LB Broken
Help · 26 Mar 2021, 13:30 · 7

Can the organizers look into the scoring again? The competition is no fun with a broken metric.

Discussion 7 answers
ZINDI

Hello, can you please explain your concern?

We updated the BLEU metric with NLTK BLEU and it should be working.

30 Mar 2021, 09:00

The concern is the following: if we submit Test.csv as is, i.e. the Yoruba source sentences as targets, the resulting score is 0.4391, whereas if we do the same exercise with the Train set the average BLEU using NLTK implementation is 0.0057. This is the code I am using:

import pandas as pd
from nltk.translate.bleu_score import sentence_bleu

train = pd.read_csv('Train.csv')

# Score each Yoruba source sentence against its English reference,
# i.e. simulate submitting the source text unchanged as the "translation".
metric = 0
for source, target in zip(train['Yoruba'], train['English']):
    metric += sentence_bleu([target.split()], source.split())
print(f'Average BLEU: {metric / len(train):.4f}')

Hi @Zindi. It would be great to get some feedback on this, or at least confirmation that it is being looked at, given that there are only 24 days left in the competition.

Also, it would be great to get more information on your exact use of NLTK BLEU. For instance, what score should we expect for these two sentences? (I get 0.2187 with NLTK and 0.2131 with sacrebleu.)

target = "A Disaster Relief Committee was formed to organize the long-term relief efforts."
prediction = "A Disaster Relief Committee was set up to look after the needs of our brothers in the area."
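For reference, sentence-level BLEU can be reproduced from the standard formula alone: clipped ("modified") n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty. The sketch below is a minimal, stdlib-only implementation of unsmoothed BLEU-4 with plain whitespace tokenization; it is an illustration of the formula, not the exact code NLTK or sacrebleu run, so its numbers will differ from theirs whenever their tokenization or smoothing differs. The example sentence pair is made up, not taken from the competition data.

```python
import math
from collections import Counter

def sentence_bleu4(reference, hypothesis):
    """Unsmoothed BLEU-4 for one whitespace-tokenized sentence pair."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, 5):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Modified precision: clip each hypothesis n-gram count at its reference count.
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if matches == 0:
            return 0.0  # any zero precision makes unsmoothed BLEU exactly zero
        log_precisions.append(math.log(matches / total))
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / 4)

print(round(sentence_bleu4("the cat sat on the mat", "the cat sat on mat"), 4))  # → 0.5789
```

This zero-on-any-missing-n-gram behavior is also why short or poor hypotheses score 0.0 under unsmoothed NLTK `sentence_bleu` (with warnings), while smoothed variants and sacrebleu return small positive scores instead.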
ZINDI

Hello, we are looking into this. If needed, we will extend the challenge.

Hello, @Zindi! Any update on the problems with the metric calculation? Could you share code for the correct calculation so we can reproduce it?

ZINDI

Hello, we are still working on it. We are very sorry for the long delay.