
Lacuna Masakhane Parts of Speech Classification Challenge

Helping Africa
$7 000 USD
Completed (over 2 years ago)
Classification
Natural Language Processing
472 joined
101 active
Start: Jun 08, 23
Close: Sep 17, 23
Reveal: Sep 17, 23
Using unlabelled news datasets for Luo and Tsn for the language adaptation part of language and task adapters
Data · 22 Jul 2023, 19:08 · 4

Hi,

The Masakhane paper at https://arxiv.org/pdf/2305.13989.pdf discusses various methodologies such as FT Eval, MAD-X and LT-SFT. MAD-X and LT-SFT require data in both the source and target languages. The data in the target languages (Luo and Tsn) can be unlabelled, since it is only used to train the language part of the model. The notebook does not include an unlabelled news dataset for Luo and Tsn. Can we download unlabelled news datasets for Luo and Tsn and use them to train the language part of MAD-X/LT-SFT?
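To illustrate why unlabelled text is enough for the language part: language adapters of this kind are typically trained with a masked language modelling objective, which builds its own targets from raw text. Below is a minimal sketch of that idea in plain Python. The function name `make_mlm_example`, the `[MASK]` placeholder and the toy sentence are assumptions for illustration only, not code from the competition notebook or from any adapter library, and the masking is simplified compared with what real implementations do.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption; real tokenizers define their own)


def make_mlm_example(tokens, mask_prob=0.15, rng=None):
    """Turn a raw token list into (inputs, targets) for masked language modelling.

    targets[i] holds the original token where inputs[i] was masked, else None.
    No POS tags or other annotations are needed.
    """
    if rng is None:
        rng = random.Random(0)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)   # the model must recover this token
        else:
            inputs.append(tok)
            targets.append(None)  # position is ignored by the loss
    return inputs, targets


# Hypothetical whitespace-tokenised sentence; any raw monolingual text would do.
sentence = "w1 w2 w3 w4 w5 w6 w7 w8".split()
inputs, targets = make_mlm_example(sentence, mask_prob=0.5)
```

The point of the sketch is only that the training signal comes from the text itself, so monolingual news text without POS labels could serve for this step if the rules allow it.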

Discussion 4 answers

Unfortunately, the rules say that we can use only the datasets they provided.

"You may use only the datasets provided for this competition."

22 Jul 2023, 19:19
Upvotes 1

Hi, that's a great question. Yes, you can use unlabelled data for the target languages. We have updated the README of the MasakhaPOS GitHub repository with a link to some of the monolingual data used for annotation. You are not restricted to this; please feel free to use other monolingual data that you may find online.

24 Jul 2023, 08:09
Upvotes 2
Krishna_Priya

@zindi, @amy, can someone from the Zindi team confirm whether we are allowed to use data other than the 18 language folders?

26 Jul 2023, 18:01
Upvotes 1

Hello Zindi Team,

Following up on what @Krishna_Priya said, we are all waiting for your confirmation.

2 Aug 2023, 09:58
Upvotes 0