📚 Must-Read: Making use of Language column

Lacuna Masakhane Parts of Speech Classification Challenge

Helping Africa

$7 000 USD

Completed (almost 3 years ago)


Start

Jun 08, 23

Sep 17, 23

Reveal

Sep 17, 23

Data · 11 Sep 2023, 16:12 · 1

Here's the format of the Test.csv

Id               Word   Language   Pos
Id00qog2f11n_0   Ne     luo
Id00qog2f11n_1   otim   luo
Id00qog2f11n_2   penj   luo

I don't plan on doing it myself, as it is probably a bit more technical than I have time for, but I was wondering whether contestants are allowed to change the code to make use of the Language field.

I imagine this would be a pretty useful bit of info. But there's this instruction: "PLEASE DO NOT MAKE ANY CHANGES IN THIS SECTION", which sounds official, but only says 'Please'. It doesn't say you can't.

Discussion 1 answer

JEANMPIA

Hey, according to the competition description:

"It is important that only one solution be built for both languages as this is a step in creating a solution that can be applied to many different languages, instead of having to create a model for each language."

My understanding is that the model, or the ideas behind its training, will later be applied to bigger POS datasets and used for inference on languages it has not seen.

If you intend to do it by feeding the language to an FCNN as an input alongside the text embeddings, that's easily doable within 6 days. For new languages, though, there will be a problem: the model won't have a known representation for them (I'm not talking about luo and tsn, but about the languages the model will be used on after the comp).

They also specifically ask us not to do it, and I think we should respect that.
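
To make the concern about new languages concrete, here is a rough sketch of the "language as an extra input" idea. It is only an illustration, not the competition's starter code: `KNOWN_LANGUAGES`, `language_one_hot`, `build_features`, the toy embedding values and the `new_lang` code are all made up for the example.

```python
# Hypothetical sketch: append a language one-hot vector to a word embedding.
# All names and values here are assumptions for illustration only.

KNOWN_LANGUAGES = ["luo", "tsn"]  # the two languages mentioned in this thread


def language_one_hot(language):
    """Return a one-hot vector over KNOWN_LANGUAGES (all zeros if unseen)."""
    return [1.0 if language == known else 0.0 for known in KNOWN_LANGUAGES]


def build_features(word_embedding, language):
    """Concatenate a word embedding with the language one-hot vector."""
    return list(word_embedding) + language_one_hot(language)


toy_embedding = [0.12, -0.40, 0.33]  # stand-in for a real text embedding

luo_features = build_features(toy_embedding, "luo")
tsn_features = build_features(toy_embedding, "tsn")
# A language code that was never part of training gets an all-zero language slot.
new_features = build_features(toy_embedding, "new_lang")

print(luo_features[-2:])
print(tsn_features[-2:])
print(new_features[-2:])
```

With this kind of encoding, luo and tsn each get their own slot, but any language outside the training set ends up with an empty language vector, so the model has no learned signal to rely on for it.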

12 Sep 2023, 00:25
