Gender-Based Violence Tweet Classification Challenge

Helping Global
2000 Points
Challenge completed almost 4 years ago
Natural Language Processing
Classification
634 joined
140 active
Start: Aug 09, 21
Close: Nov 14, 21
Reveal: Nov 14, 21
What is pseudo-labelling?
Help · 9 Nov 2021, 21:06 · 2

I was reading about pseudo-labelling in order to improve my model performance. I found some posts on the internet and I'm still not sure I fully understood it.

At first, I thought it would mean something like this: train a model, predict probabilities for the submission set, take the predictions above some threshold (say, 95%), and add those rows, with their predicted labels, to the training set. So I would end up training on the original training data plus some data points from the submission set. Is this the correct definition?
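For concreteness, here is a minimal sketch of the idea described above, assuming scikit-learn is available. The `pseudo_label` helper, the synthetic data, the logistic regression model, and the 0.95 threshold are all illustrative assumptions, not anything taken from the competition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def pseudo_label(model, X_train, y_train, X_unlabeled, threshold=0.95):
    """Hypothetical helper: fit on labeled data, then refit with confident pseudo-labels."""
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_unlabeled)

    # Keep only rows whose top predicted probability clears the threshold.
    keep = proba.max(axis=1) >= threshold
    pseudo_y = model.classes_[proba.argmax(axis=1)]

    # Append the confident rows, with their predicted labels, to the training set.
    X_aug = np.vstack([X_train, X_unlabeled[keep]])
    y_aug = np.concatenate([y_train, pseudo_y[keep]])

    # Retrain on the augmented training set.
    model.fit(X_aug, y_aug)
    return model, keep


# Synthetic stand-in for the real data: 200 labeled rows, 400 "submission" rows.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, y_train = X[:200], y[:200]
X_submission = X[200:]

model, keep = pseudo_label(
    LogisticRegression(max_iter=1000), X_train, y_train, X_submission
)
print(f"{keep.sum()} of {len(keep)} submission rows were pseudo-labelled")
```

The threshold is the main knob in this sketch: a lower value adds more rows but also more wrong labels, so it is probably worth checking the effect on a held-out validation split.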

I also found that scikit-learn has semi-supervised algorithms that could be used here, such as SelfTrainingClassifier, but I couldn't find a good tutorial on it. Is self-training related to pseudo-labels? Does anyone have any material on this topic?
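As a rough sketch of how the scikit-learn route might look: `SelfTrainingClassifier` in `sklearn.semi_supervised` wraps a probabilistic classifier and repeatedly pseudo-labels the unlabeled rows whose predicted probability exceeds `threshold`, which is essentially the loop described earlier in this post. Unlabeled rows are marked with the label `-1`. The synthetic data and the choice of logistic regression are assumptions for illustration; the wrapped estimator is passed positionally here because, as far as I know, its keyword name differs between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic stand-in data: the first 200 rows are labeled, the rest are not.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
y_partial = y.copy()
y_partial[200:] = -1  # -1 marks a row as unlabeled

# The wrapped classifier must implement predict_proba.
self_training = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), threshold=0.95
)
self_training.fit(X, y_partial)

# labeled_iter_ is 0 for originally labeled rows, -1 for rows never labeled,
# and the iteration number for rows that received a pseudo-label.
n_added = int((self_training.labeled_iter_ > 0).sum())
print(f"{n_added} rows were pseudo-labelled during self-training")

predictions = self_training.predict(X[200:])
```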

Best regards!

Discussion 2 answers
Professor

Your first thought is just about it @yukioandre!

9 Nov 2021, 21:58
Upvotes 0

Nice, thank you so much Professor!