🎙️ This Week on Zindi: Huge Train data

Intron AfriSpeech-200 Automatic Speech Recognition Challenge

$5 000 USD

Completed (almost 3 years ago)

Skills you will learn

Automatic Speech Recognition

438 joined

41 active


Start: Feb 17, 2023 · Close: May 28, 2023 · Reveal: May 28, 2023

HackP

National School Of Computer Science (ENSI) - Tunisia

Huge Train data

Data · 18 Apr 2023, 17:30 · 1

Hello everyone, does anyone know how I can extract only a subset of the training data (not the whole dataset)? It is about 401 GB, and Google Colab does not seem to be able to handle that much data.


Siwar_NASRI

You can use streaming mode, which reads samples on the fly instead of downloading the whole dataset first:

from datasets import load_dataset
ds = load_dataset("tobiolatunji/afrispeech-200", "all", streaming=True)
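Streaming only helps if you stop reading early, so you never pull more samples than you need. The sketch below uses only the standard library to show that pattern. `FakeStream` is a hypothetical stand-in for a streamed split, and the `audio_id` field is an illustrative assumption. It counts how many samples were actually produced, which shows that taking the first 100 never touches the rest.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


class FakeStream:
    """Hypothetical stand-in for a streamed split; counts samples actually produced."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.produced = 0

    def __iter__(self) -> Iterator[Dict[str, str]]:
        for i in range(self.total):
            # In a real stream, this is the point where the sample would be downloaded.
            self.produced += 1
            yield {"audio_id": f"clip_{i:06d}"}


def take(stream: Iterable[Dict[str, str]], n: int) -> List[Dict[str, str]]:
    """Return the first n samples; islice stops without requesting sample n+1."""
    return list(islice(stream, n))


stream = FakeStream(total=1_000_000)
subset = take(stream, 100)
print(len(subset), stream.produced)  # 100 100
```

With the `datasets` library, the analogous call on a streamed dataset is `IterableDataset.take(n)`, e.g. `ds["train"].take(100)`, assuming the split is named `train`. The first records of a stream may not be representative, so calling `IterableDataset.shuffle(seed=..., buffer_size=...)` before `take` gives an approximate random sample from a shuffle buffer.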

Or load only one accent, for example Yoruba:

ds = load_dataset("tobiolatunji/afrispeech-200", "yoruba")
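You can also combine the two ideas: stream the data and keep only records for one accent, stopping once you have enough. The sketch below is stdlib-only and assumes each record carries an `accent` field. That is an assumption, so check the dataset card for the actual column names. The records list is hypothetical sample data standing in for a streamed split.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def filter_accent(records: Iterable[Dict[str, str]], accent: str) -> Iterator[Dict[str, str]]:
    """Lazily yield only records whose (assumed) 'accent' field matches, case-insensitively."""
    target = accent.lower()
    return (r for r in records if r.get("accent", "").lower() == target)


def first_n_of_accent(records: Iterable[Dict[str, str]], accent: str, n: int) -> List[Dict[str, str]]:
    """Collect at most n matching records, stopping as soon as n are found."""
    return list(islice(filter_accent(records, accent), n))


# Hypothetical records standing in for a streamed split.
records = [
    {"audio_id": "a1", "accent": "Yoruba"},
    {"audio_id": "a2", "accent": "igbo"},
    {"audio_id": "a3", "accent": "yoruba"},
    {"audio_id": "a4", "accent": "hausa"},
    {"audio_id": "a5", "accent": "yoruba"},
]
print([r["audio_id"] for r in first_n_of_accent(records, "yoruba", 2)])  # ['a1', 'a3']
```

With `datasets`, a streamed dataset supports the same lazy pattern through `IterableDataset.filter(...)` followed by `take(n)`. Loading the `"yoruba"` config directly, as shown above, is simpler when one accent fits in your storage.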

19 Apr 2023, 21:08

