
TechCabal Ewè Audio Translation Challenge

$1 000 USD
Challenge completed ~1 year ago
Classification
Automatic Speech Recognition
267 joined
80 active
Start
Aug 26, 24
Close
Sep 29, 24
Reveal
Oct 10, 24
Amy_Bray
Zindi
Resource Restrictions
Platform · 28 Aug 2024, 16:16 · 8

This solution needs to be deployed on edge devices. This means we have some interesting resource restrictions to ensure this model is usable.

  • You may use only 1 CPU, such as an ARM Cortex-A53 or similar.
  • No GPU or TPU support is allowed.
  • Your model must be 10 MB or smaller.
  • You may train for a maximum of 6 hours.
  • Inference must take 2 minutes or less. This simulates the ~50 ms inference time needed for real-time edge applications.
  • You are not allowed to use pretrained models.
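The size and time caps above are easy to check mechanically before submitting. A minimal sketch (the function names here are illustrative, not part of the challenge tooling):

```python
import os

MAX_MODEL_BYTES = 10 * 1024 * 1024   # 10 MB model-size cap
MAX_TRAIN_SECONDS = 6 * 60 * 60      # 6-hour training cap

def within_size_limit(model_path: str) -> bool:
    """Return True if the serialized model file fits the 10 MB cap."""
    return os.path.getsize(model_path) <= MAX_MODEL_BYTES
```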

We can't wait to see you on the leaderboard!

Discussion 8 answers

I don't understand the language; it's too difficult for me to label audio data. How did you label the data to determine whether the first audio corresponds to "hello" or not?

28 Aug 2024, 16:26
Upvotes 0
Origin

Hi @balla, I don't think you have to do it yourself. The data is already labeled for you: when you load the training data, the label is in the class column. The test DataFrame has no class column, and that makes sense, since the class is what you want to predict.
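You can verify this split convention yourself with a small helper (a sketch; the column name "class" comes from the thread, everything else is illustrative):

```python
import pandas as pd

def has_labels(df: pd.DataFrame, label_col: str = "class") -> bool:
    """Return True if this split already carries labels in `label_col`."""
    return label_col in df.columns
```

Calling `has_labels` on the loaded train DataFrame should return True, and on the test DataFrame False.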

30 Aug 2024, 21:12
Upvotes 0

Please clarify: the CPU restriction is only for inference, right? We can train on a GPU, right? And what do you mean by not allowing pretrained models? Can I fine-tune, say, a public Hugging Face model or OpenAI's Whisper?

31 Aug 2024, 15:13
Upvotes 1
AkashPB

For this statement - "Inference time needs to be 2 minutes or less. This is to simulate a ~50ms inference time which is needed for real-time edge applications."

Does it mean that inference includes feature engineering as well? If so, does the 2-minute limit apply per audio file, or to the entire test set?

Or

Does it mean the model predictions should come in less than 2 minutes?

@Amy_Bray do help and clarify...

3 Sep 2024, 05:53
Upvotes 0
Juliuss
Freelance

Are we allowed to use finetuned models @Amy_Bray?

3 Sep 2024, 08:40
Upvotes 0
Amy_Bray
Zindi

Hi everyone,

Thanks for your questions! Let me clarify the resource restrictions for the challenge:

  1. CPU and training restrictions:
     Inference: Yes, the CPU restriction (e.g., ARM Cortex-A53) applies only to the inference stage. During deployment on edge devices, the model should run efficiently on a single CPU with no GPU or TPU support.
     Training: You are allowed to use GPUs during the training phase. The 6-hour training time limit applies regardless of the hardware used for training.
  2. Pretrained models: By "not allowing pretrained models," we mean that you cannot use any models that have been pretrained on external datasets. This includes models like those from Hugging Face or OpenAI's Whisper. The goal is to encourage the development of models trained from scratch using the provided dataset only. Fine-tuning of public models is also not allowed for this challenge. All participants must build their models from the ground up.
  3. Inference time: The statement "Inference time needs to be 2 minutes or less" refers to the total time it takes for the model to make predictions on the entire test dataset (2,000 audio clips). This is intended to simulate the real-time constraint of a ~50ms inference time per audio clip on edge devices. Therefore, the total inference process should be completed in under 2 minutes.
  4. Feature engineering: Yes, the inference time includes any feature engineering or preprocessing steps required to make predictions. The goal is to ensure that the entire process, from receiving raw audio to outputting predictions, is optimized for real-time deployment on edge devices.
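Put concretely, 2 minutes over 2,000 clips averages out to 60 ms per clip, preprocessing included. A minimal timing sketch (here `pipeline` is a stand-in for your full preprocess-plus-predict function, not a real API):

```python
import time

TEST_CLIPS = 2000
BUDGET_SECONDS = 120  # 2-minute cap for the whole test set

def per_clip_budget_ms(total_budget_s: float = BUDGET_SECONDS,
                       n_clips: int = TEST_CLIPS) -> float:
    """Average time allowed per clip, preprocessing included."""
    return total_budget_s / n_clips * 1000.0

def within_budget(pipeline, clips) -> bool:
    """Time the full pipeline (features + prediction) over all clips."""
    start = time.perf_counter()
    for clip in clips:
        pipeline(clip)
    return time.perf_counter() - start <= BUDGET_SECONDS
```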

I hope this clears things up! If you have any more questions, feel free to ask. Good luck to everyone, and we look forward to seeing your solutions on the leaderboard!

3 Sep 2024, 13:55
Upvotes 5
Juliuss
Freelance

Thanks @Amy_Bray for the clarification.

Can I use an open-source model from Hugging Face for feature extraction, and use those features to train my own model from scratch?