Gender-Based Violence Tweet Classification Challenge

yukioandre

What is pseudo-labelling?

Help · 9 Nov 2021, 21:06 · 2

I have been reading about pseudo-labelling as a way to improve my model's performance. I found some posts on the internet, but I'm still not sure I fully understand it.

At first, I thought it meant something like this: train a model, predict probabilities for the submission set, and add the rows whose predicted probability is above some threshold (say, 95%) to the training set, using the predicted class as the label. The new training set would then contain the original training data plus some data points from the submission set. Is this the correct definition?

I also found that scikit-learn has semi-supervised algorithms that could be used here, such as SelfTraining (SelfTrainingClassifier), but I couldn't find a good tutorial on it. Is SelfTraining related to pseudo-labels? Does anyone have any material on this topic?

Best regards!

Discussion 2 answers

Professor

Your first thought is just about right, @yukioandre!
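
For anyone who wants to see the idea in code, here is a minimal sketch of the thresholded procedure described in the question, followed by the same idea using scikit-learn's SelfTrainingClassifier. Everything specific in it is an assumption made for illustration: the data is synthetic (from make_classification) rather than the competition tweets, the model is a plain LogisticRegression, the helper name add_pseudo_labels is hypothetical, and the 0.95 threshold is just the value mentioned in the question. Treat it as one possible way to do it, not a recommended recipe.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier


def add_pseudo_labels(model, X_train, y_train, X_unlabeled, threshold=0.95):
    """Hypothetical helper: one round of manual pseudo-labelling.

    Fits `model` on the labelled data, predicts on the unlabelled rows, and
    appends the rows whose top class probability is >= `threshold`, using the
    predicted class as their (pseudo) label.
    """
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) >= threshold           # boolean mask per row
    pseudo_y = model.classes_[proba.argmax(axis=1)][confident]
    X_aug = np.vstack([X_train, X_unlabeled[confident]])
    y_aug = np.concatenate([y_train, pseudo_y])
    return X_aug, y_aug, confident


# Synthetic stand-in data (assumption): 200 labelled rows play the role of the
# training set, the remaining 400 play the role of the unlabelled submission set.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10, n_classes=3, random_state=0
)
X_train, y_train = X[:200], y[:200]
X_test = X[200:]

# Manual pseudo-labelling: augment the training set, then retrain from scratch.
X_aug, y_aug, confident = add_pseudo_labels(
    LogisticRegression(max_iter=1000), X_train, y_train, X_test, threshold=0.95
)
final_model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
print(f"Added {confident.sum()} pseudo-labelled rows to {len(X_train)} labelled rows")

# The same idea with scikit-learn: unlabelled rows are marked with the label -1,
# and the wrapper repeatedly adds predictions above `threshold` and refits.
# The base model is passed positionally because the keyword name of that
# argument differs between scikit-learn versions.
X_all = np.vstack([X_train, X_test])
y_all = np.concatenate([y_train, np.full(len(X_test), -1)])
self_training = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.95)
self_training.fit(X_all, y_all)
test_predictions = self_training.predict(X_test)
```

So yes, SelfTraining is related: it automates this loop over several iterations instead of a single round. In both versions the threshold matters, since wrong but confident predictions get baked into the training set, so it is worth checking the effect on a held-out validation split before trusting the leaderboard score.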

9 Nov 2021, 21:58

Upvotes 0

yukioandre

Nice, thank you so much Professor!

replied to Professor · 9 Nov 2021, 22:02

Upvotes 0
