
Intron AfriSpeech-200 Automatic Speech Recognition Challenge

$5 000 USD
Challenge completed over 2 years ago
Automatic Speech Recognition
430 joined
41 active
Start: Feb 17, 23
Close: May 28, 23
Reveal: May 28, 23
Missing IDs
Data · 24 Feb 2023, 13:47 · 6

Which data are we supposed to submit predictions for? I have been trying to submit my solution, but it fails with an error about several missing IDs. I have checked, and the files being referred to are in the test dataset, which has not yet been made available. PS: I am submitting predictions for the Afrispeech-dev/dev files.

Discussion 6 answers

Can you provide a snippet of the code you are running?

24 Feb 2023, 22:25
Upvotes 0

Are you testing on data from huggingface or from the google drive download?

24 Feb 2023, 22:26
Upvotes 0
Siwar_NASRI

@Brian_Macharia the dev_meta table and the dev/validation dataset contain 3227 samples/audios, and the test_meta/test dataset that will be uploaded later contains 5064 samples, while the submission file should contain all IDs (8091). The easiest solution is to concatenate your dev_meta predictions with a single space (" ") as the transcript for each test_meta row.
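
A minimal sketch of that concatenation, using toy data in place of the real metadata files. The helper name build_submission, the column names audio_id and transcripts, and the toy rows are assumptions for illustration only; the real header names should match whatever the competition expects.

```python
import pandas as pd


def build_submission(dev_meta, test_meta, dev_predictions,
                     id_col="audio_id", text_col="transcripts"):
    """Combine dev predictions with single-space placeholders for test rows.

    Hypothetical helper: id_col and text_col are assumed column names.
    """
    dev_predictions = list(dev_predictions)
    if len(dev_predictions) != len(dev_meta):
        raise ValueError("need exactly one prediction per dev row")

    # Dev rows carry the model's predicted transcripts.
    dev_part = pd.DataFrame({
        id_col: dev_meta[id_col].to_numpy(),
        text_col: dev_predictions,
    })
    # Test rows have no predictions yet, so each gets a single space.
    test_part = pd.DataFrame({
        id_col: test_meta[id_col].to_numpy(),
        text_col: " ",
    })
    return pd.concat([dev_part, test_part], ignore_index=True)


# Toy stand-ins for dev_metadata.csv and test_metadata.csv.
dev_meta = pd.DataFrame({"audio_id": ["d1", "d2"],
                         "accent": ["yoruba", "swahili"]})
test_meta = pd.DataFrame({"audio_id": ["t1", "t2", "t3"],
                          "accent": ["igbo", "zulu", "hausa"]})

submission = build_submission(dev_meta, test_meta,
                              ["hello world", "good morning"])
```

Selecting the two needed columns by name, rather than by position, keeps the result independent of the column order in the metadata files.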

25 Feb 2023, 20:50
Upvotes 2

@Siwar_NASRI this is what I ended up doing.

alvinkimata
Machakos University

I ended up doing this but I still encountered the "missing entries for IDs" error. Here's the code snippet that I used.

import pandas as pd

dev_metadata = pd.read_csv('/kaggle/input/afrispeech-dev/dev_metadata.csv')
test_metadata = pd.read_csv('/kaggle/input/afrispeech-dev/test_metadata.csv')
# transcriptions: my predicted transcripts for the dev audio files
dev_metadata['transcripts'] = transcriptions
# Every column except the last two (audio_id and transcripts) gets dropped below.
columns = dev_metadata.columns.tolist()[:-2]
full_metadata = pd.concat([dev_metadata, test_metadata])
# Drop those columns so that only audio_id and transcripts remain (a two-column dataframe).
full_metadata = full_metadata.drop(columns, axis=1)
# Test rows have no predictions yet, so their transcripts become empty strings.
full_metadata = full_metadata.fillna("")
full_metadata.to_csv('first_submission.csv', index=False)

What could be the issue?
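
One way to narrow this down is to reload the written CSV and compare it against the full list of expected IDs. The sketch below uses toy data and a hypothetical helper, check_submission; the column names and the expected-ID list are assumptions. It also shows one pandas behaviour worth knowing here: an empty string written with to_csv is read back by read_csv as NaN. Whether the submission checker treats such rows as missing entries is itself an assumption.

```python
import io

import pandas as pd


def check_submission(submission, expected_ids,
                     id_col="audio_id", text_col="transcripts"):
    """Report missing IDs, duplicated IDs, and rows without a transcript.

    Hypothetical helper: id_col and text_col are assumed column names.
    """
    ids = submission[id_col]
    text = submission[text_col]
    return {
        # Expected IDs that never appear in the submission.
        "missing_ids": sorted(set(expected_ids) - set(ids.dropna())),
        # IDs that appear more than once.
        "duplicate_ids": sorted(ids[ids.duplicated()].dropna().unique()),
        # Rows whose transcript is NaN or an empty string.
        "empty_rows": int((text.isna() | (text.astype(str) == "")).sum()),
    }


# Toy submission: one real prediction and one empty-string placeholder.
toy = pd.DataFrame({"audio_id": ["d1", "t1"],
                    "transcripts": ["hello", ""]})

# Round-trip through CSV, as a grader reading the file would see it.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

report = check_submission(reloaded, expected_ids=["d1", "t1", "t2"])
```

On the toy data the report flags t2 as missing and counts one row without a transcript, because the empty string did not survive the round trip.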