National School Of Computer Science (ENSI) - Tunisia
Huge Train data
Data ·18 Apr 2023, 17:30·1
Hello everyone, does anyone know how I can extract only a subset of the train data (not the whole dataset)? It is about 401 GB, and it seems that Google Colab cannot handle this amount of data.
You can use streaming mode:
load_dataset("tobiolatunji/afrispeech-200", "all", streaming=True)
Or load only one accent, for example the Yoruba accent:
load_dataset("tobiolatunji/afrispeech-200", "yoruba")
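If you want a fixed number of training examples rather than a whole accent, here is a minimal sketch that combines streaming with `itertools.islice`. The helper names `take_examples` and `load_train_subset` are made up for illustration, and I am assuming the split is called "train" and that streaming works for this dataset with your installed `datasets` version.

```python
from itertools import islice


def take_examples(stream, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(stream, n))


def load_train_subset(n, config="all"):
    """Stream the train split and keep only the first n examples.

    Hypothetical helper: assumes the split is named "train".
    """
    # Imported here so take_examples() can be used without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "tobiolatunji/afrispeech-200",
        config,
        split="train",
        streaming=True,  # examples are fetched lazily, not downloaded up front
    )
    return take_examples(stream, n)
```

For example, `subset = load_train_subset(100)` should give you a list of 100 examples without downloading everything first. Pick `n` small enough that the decoded audio fits in Colab memory.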