Lacuna Masakhane Parts of Speech Classification Challenge

rsadaph1

Using unlabelled news datasets for Luo and Tsn for the language-adaptation part of language and task adapters

Data · 22 Jul 2023, 19:08 · 4

Hi,

The Masakhane paper at https://arxiv.org/pdf/2305.13989.pdf discusses various methodologies such as FT Eval, MAD-X and LT-SFT. MAD-X and LT-SFT require data in both the source and target languages. The data in the target languages (Luo and Tsn) can be unlabelled, since it is only used to train the language part of the model. The notebook does not contain unlabelled news datasets for Luo and Tsn. Can we download unlabelled news datasets for these two languages and use them to train the language part of MAD-X / LT-SFT?
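
For context, here is a minimal sketch of why unlabelled text should be enough for the language part. As far as I understand, the language adapters in MAD-X (and the language-specific fine-tunings in LT-SFT) are trained with masked language modelling, where the training targets come from the text itself. The function name `mask_tokens`, the placeholder tokens, and the 80/10/10 replacement split (the usual BERT convention) are my assumptions for illustration; none of this is taken from the paper's code or the competition notebook.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Build one masked-language-modelling example from unlabelled tokens.

    Returns (inputs, labels). labels[i] holds the original token at the
    positions selected for prediction and None everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Position selected: the model must predict the original token.
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)               # usually replace with the mask token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # sometimes a random vocabulary token
            else:
                inputs.append(tok)                # sometimes leave the token unchanged
        else:
            # Position not selected: input is unchanged and there is no target.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Placeholder tokens stand in for a whitespace-split unlabelled news sentence.
sentence = "w1 w2 w3 w4 w5 w6 w7 w8".split()
vocab = sorted(set(sentence))
inputs, labels = mask_tokens(sentence, vocab, mask_prob=0.5, seed=1)
```

No POS tags are needed anywhere in this step, which is why monolingual news text would be sufficient for the language part; the labelled POS data would only be needed for the task part.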

Discussion 4 answers

Shiro

Unfortunately, the rules say that we can use only the datasets they provided:

"You may use only the datasets provided for this competition."

22 Jul 2023, 19:19

Upvotes 1

didelani

Hi, that's a great question. Yes, you can use unlabelled data for the target languages. We have updated the README of the MasakhaPOS GitHub repository with a link to some of the monolingual data used for annotation. You are not restricted to this; please feel free to use other monolingual data that you find online.

24 Jul 2023, 08:09

Upvotes 2

Krishna_Priya

@zindi, @amy can someone from the Zindi team confirm whether we are allowed to use data other than the 18 language folders?

26 Jul 2023, 18:01

Upvotes 1

kenyor

Hello Zindi team,

Following up on what @Krishna_Priya asked, we are all waiting for your confirmation.

2 Aug 2023, 09:58

Upvotes 0
