AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF

Helping Senegal

$2 000 USD

Completed (~5 years ago)

Skills you will learn

Classification

Automatic Speech Recognition

Natural Language Processing

374 joined

47 active

Start

Feb 12, 21

May 23, 21

Reveal

May 23, 21

Roman18

Special Characters

17 Apr 2021, 08:56

Dear organisation team & fellow data scientists,

I have noticed that there are some special characters in the training set.

Some of these are inherent to the language, e.g. the hyphen (-) and the apostrophe ('), while others are artefacts introduced by the human translators, e.g. parentheses ( ) and double quotes (").
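To see which special characters occur and how often, a minimal sketch like the one below could be used. It assumes the training transcripts have already been loaded as a list of strings (the loading step is not shown, and the function name `special_char_counts` is hypothetical). "Special" here simply means any character that is neither a letter/digit nor whitespace; `str.isalnum` is Unicode-aware, so accented letters such as ë or ñ are not counted.

```python
from collections import Counter
from typing import Iterable


def special_char_counts(transcripts: Iterable[str]) -> Counter:
    """Count every character that is neither alphanumeric nor whitespace."""
    counts: Counter = Counter()
    for text in transcripts:
        counts.update(ch for ch in text if not ch.isalnum() and not ch.isspace())
    return counts
```

Calling `special_char_counts(transcripts).most_common()` on the training transcripts then lists the characters from most to least frequent, which makes it easy to separate language-inherent symbols from annotation artefacts.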

1) Will these be removed from the training and test sets in the future?

2) If not, I assume that the training and test sets are drawn from the same distribution, which means we can expect the same kinds of special characters in the test set as in the training set.
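Until the organisers answer, one option is to normalise the transcripts yourself. The sketch below assumes that only ( ) and " are artefacts (the set `ARTEFACT_CHARS` is a hypothetical choice based on the examples above, not an official list) and keeps - and '. Artefact characters are replaced by a space rather than deleted, so that words on either side are not glued together, and runs of whitespace are then collapsed. Whatever normalisation is chosen should be applied identically to the training targets and to the model's predictions.

```python
import re

# Hypothetical set of annotation artefacts; '-' and "'" are deliberately kept.
ARTEFACT_CHARS = '()"'
_ARTEFACT_RE = re.compile("[" + re.escape(ARTEFACT_CHARS) + "]")
_SPACES_RE = re.compile(r"\s+")


def normalize_transcript(text: str) -> str:
    """Replace artefact characters with spaces, then collapse whitespace."""
    text = _ARTEFACT_RE.sub(" ", text)
    return _SPACES_RE.sub(" ", text).strip()
```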

Additionally, I have noticed that the audio file in the test set with ID=e3a74a8998f03c320f5a4923272247485832b1cd803528f5eb5a50aef3d29a78b436b3ea37c47763e9b9be8b3ee53435b51d3466345217ce5d6fcb9b48a53c63 is empty.
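Other empty clips can be found with a quick scan. This minimal sketch assumes the clips are WAV files named `<ID>.wav` in a single directory (both the format and the naming are assumptions, as are the helper names). It uses only the standard-library `wave` module and flags a file if it is zero bytes, contains no audio frames, or has a header that cannot be parsed.

```python
import os
import wave
from pathlib import Path
from typing import List


def is_empty_or_unreadable_wav(path: Path) -> bool:
    """True if the file is zero bytes, has no frames, or has a bad header."""
    if os.path.getsize(path) == 0:
        return True
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getnframes() == 0
    except (wave.Error, EOFError):
        # Header could not be parsed; flag the file for manual inspection.
        return True


def find_empty_clips(audio_dir: Path) -> List[str]:
    """Return the sorted IDs (file stems) of empty or unreadable WAV files."""
    return sorted(
        p.stem for p in audio_dir.glob("*.wav") if is_empty_or_unreadable_wav(p)
    )
```

Reporting the file stem makes it easy to match a flagged clip against the ID column of the test set.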

Thanks a lot for setting up this very interesting challenge!
