Intron AfriSpeech-200 Automatic Speech Recognition Challenge

$5 000 USD
Challenge completed over 2 years ago
Automatic Speech Recognition
430 joined
41 active
Start: Feb 17, 23
Close: May 28, 23
Reveal: May 28, 23
HackP
National School Of Computer Science (ENSI) - Tunisia
Huge Train data
Data · 18 Apr 2023, 17:30 · 1

Hello everyone, does anyone know how I can extract only a subset of the train data (rather than the full dataset)? It is about 401 GB, and it seems that Google Colab cannot handle this amount of data.

Discussion 1 answer
Siwar_NASRI

You can use streaming mode, which reads examples on demand instead of downloading the whole dataset first:

load_dataset("tobiolatunji/afrispeech-200", "all", streaming=True)

Or load only one accent, for example the Yoruba accent:

load_dataset("tobiolatunji/afrispeech-200", 'yoruba')
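
To actually keep only a subset when streaming, you can stop iterating after a fixed number of examples. Below is a minimal sketch of that idea, not a tested pipeline for this competition. The helper names take_examples and load_train_subset, the default n=100, and the use of split="train" are assumptions for illustration; depending on your datasets version, load_dataset may also need extra arguments.

```python
from itertools import islice


def take_examples(stream, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(stream, n))


def load_train_subset(n=100, accent="all"):
    """Stream the train split and keep only the first n examples.

    Hypothetical helper: the split name and default values are assumptions.
    """
    # Imported here so take_examples stays usable without the datasets package.
    from datasets import load_dataset

    # streaming=True yields examples lazily instead of downloading everything.
    stream = load_dataset(
        "tobiolatunji/afrispeech-200", accent, split="train", streaming=True
    )
    return take_examples(stream, n)
```

For example, load_train_subset(50, "yoruba") would fetch only the first 50 streamed examples for that accent. A streamed dataset also has its own take(n) method, which gives the same kind of truncated stream without building a list in memory.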
19 Apr 2023, 21:08