Intron AfriSpeech-200 Automatic Speech Recognition Challenge
Can you create an automatic speech recognition (ASR) model for African accents, for use by doctors?
Big Oh Notation: Time vs. memory
Help · 11 Mar 2023, 08:57 · 0

When I loaded WhisperFeatureExtractor and WhisperTokenizer separately and used them as the feature extractor and tokenizer, I saved about 3 seconds of CPU time (4 workers) per sample. Over all 58,759 samples that is 3 s × 58,759 ≈ 176,277 s, or roughly 49 hours.

However, when I used the feature extractor and tokenizer bundled in WhisperProcessor (without loading WhisperFeatureExtractor and WhisperTokenizer separately), I freed up 2.G of RAM.

So it's up to you to choose between time and memory.
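
If you want to check the trade-off on your own machine before committing to one setup, here is a minimal sketch using only the Python standard library. The helper name `measure` is hypothetical, and two caveats apply: `tracemalloc` only sees memory allocated through Python (not native tensor buffers), and `time.process_time` only counts CPU time of the current process (not separate worker processes). Treat the numbers as a rough comparison, not an exact profile.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, cpu_seconds, peak_python_bytes).

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()
    cpu_start = time.process_time()
    try:
        result = fn(*args, **kwargs)
        cpu_seconds = time.process_time() - cpu_start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, cpu_seconds, peak_bytes


if __name__ == "__main__":
    # Stand-in workload; replace with the call that preprocesses one sample.
    _, cpu, peak = measure(lambda: [i * i for i in range(200_000)])
    print(f"CPU time: {cpu:.3f} s, peak Python memory: {peak / 1e6:.1f} MB")
```

Wrap the same per-sample preprocessing call once with each setup and compare the two results.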

NB: Whisper only accepts audio clips shorter than 30 s and label sequences (tokenized transcripts) shorter than 448 tokens, so good luck!
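
A rough sketch of how you could drop samples that break those two limits before training. The function name and the strict `<` comparisons are my assumptions, following the wording above; the 16 kHz default is the sampling rate Whisper expects.

```python
MAX_AUDIO_SECONDS = 30.0  # audio limit stated above
MAX_LABEL_TOKENS = 448    # label-length limit stated above


def fits_whisper(num_audio_samples, label_ids, sampling_rate=16_000):
    """Return True if a sample respects both limits.

    num_audio_samples: length of the raw waveform array
    label_ids: token ids of the tokenized transcript
    """
    duration_seconds = num_audio_samples / sampling_rate
    return (
        duration_seconds < MAX_AUDIO_SECONDS
        and len(label_ids) < MAX_LABEL_TOKENS
    )


if __name__ == "__main__":
    # Toy samples: (waveform length in samples, label token ids)
    samples = [
        (16_000 * 10, [1] * 50),   # 10 s, 50 tokens
        (16_000 * 45, [1] * 50),   # 45 s: too long
        (16_000 * 10, [1] * 500),  # 500 tokens: too many labels
    ]
    kept = [s for s in samples if fits_whisper(s[0], s[1])]
    print(f"kept {len(kept)} of {len(samples)} samples")
```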
