When I loaded WhisperFeatureExtractor and WhisperTokenizer separately and used them as the feature extractor and tokenizer, I saved about 3 seconds of CPU time per sample (with 4 workers). Over the whole dataset that adds up to roughly 49 hours (3 s × 58,759 samples ≈ 176,000 s).
However, when I used the feature extractor and tokenizer bundled in WhisperProcessor (without loading WhisperFeatureExtractor and WhisperTokenizer separately), I saved about 2 GB of RAM.
So it's up to you to choose between time and memory.
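For reference, here is a minimal sketch of the two setups being compared. The checkpoint name, language, and task are assumptions chosen for illustration, not the values used for the measurements above, and the helper names are hypothetical. The imports sit inside each function so that each option only pulls in the classes it uses.

```python
# Assumed values for illustration only; substitute your own checkpoint and language.
CHECKPOINT = "openai/whisper-small"
LANGUAGE = "french"
TASK = "transcribe"


def load_separately():
    """Option 1: load the feature extractor and tokenizer as standalone objects."""
    from transformers import WhisperFeatureExtractor, WhisperTokenizer

    feature_extractor = WhisperFeatureExtractor.from_pretrained(CHECKPOINT)
    tokenizer = WhisperTokenizer.from_pretrained(
        CHECKPOINT, language=LANGUAGE, task=TASK
    )
    return feature_extractor, tokenizer


def load_from_processor():
    """Option 2: load only the processor and reuse the components it wraps."""
    from transformers import WhisperProcessor

    processor = WhisperProcessor.from_pretrained(
        CHECKPOINT, language=LANGUAGE, task=TASK
    )
    return processor.feature_extractor, processor.tokenizer
```

Either function returns a (feature_extractor, tokenizer) pair, so the rest of the preprocessing code can stay the same while you measure time and memory on your own machine.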
NB: Whisper only accepts audio clips shorter than 30 s and label sequences (tokenized transcripts) shorter than 448 tokens, so good luck!
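One way to respect both limits is to drop the samples that exceed them before training. The sketch below is only an illustration: the function name, the constants, and the example field names ("num_samples", "sampling_rate", "labels") are hypothetical, and it assumes you already know each clip's sample count and its tokenized labels.

```python
MAX_AUDIO_SECONDS = 30.0   # audio limit stated above
MAX_LABEL_LENGTH = 448     # label (token) limit stated above


def is_within_whisper_limits(num_samples, sampling_rate, label_ids):
    """Return True if the clip and its tokenized transcript fit both limits."""
    duration_seconds = num_samples / sampling_rate
    return (
        duration_seconds < MAX_AUDIO_SECONDS
        and len(label_ids) < MAX_LABEL_LENGTH
    )


# Toy examples: a 10 s clip with 100 labels, and a 31 s clip with 100 labels.
examples = [
    {"num_samples": 16_000 * 10, "sampling_rate": 16_000, "labels": [0] * 100},
    {"num_samples": 16_000 * 31, "sampling_rate": 16_000, "labels": [0] * 100},
]

# Keep only the examples that satisfy both limits.
kept = [
    ex for ex in examples
    if is_within_whisper_limits(ex["num_samples"], ex["sampling_rate"], ex["labels"])
]
```

If the data lives in a Hugging Face `datasets` object, the same check could be wrapped in a predicate and passed to `Dataset.filter`.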