Swahili News Classification
Can you create a classification algorithm to identify Swahili news articles by category?
Prize: Knowledge
Time: Active
Participants: 84 active · 507 enrolled
Helping: Tanzania
Classification · Media
accuracy
Data · 25 Sep 2020, 18:09 · 3

I am getting decent accuracy (83%) on the training set, but the moment the model is exposed to the test data, things go south (as evidenced by my rank!). From GridSearchCV, the XGBClassifier gives the highest score. Any pointers in the right direction? Code snippets, maybe?

Discussion 3 answers

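I recommend you try cross-validation. It won't remove overfitting by itself, but it shows you how far your training accuracy is from the accuracy on held-out data, which is a much more realistic estimate of your test score.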
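A minimal sketch of that kind of check, in case it helps. Everything here is an assumption for illustration: the toy corpus is generated from a few Swahili words and is not the competition data, and LogisticRegression on TF-IDF features stands in for whatever model you use (an XGBClassifier could be dropped into the same pipeline). The vectorizer sits inside the pipeline so it is refit on each training fold and nothing leaks from the held-out fold.

```python
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: two categories, each with its own small vocabulary,
# mixed with words shared by both. Replace with the real articles and labels.
rng = random.Random(0)
vocab = {
    "michezo": ["mpira", "timu", "mechi", "goli", "kocha"],
    "uchumi": ["benki", "bei", "soko", "biashara", "kodi"],
}
shared = ["leo", "jana", "habari", "mwaka"]

texts, labels = [], []
for category, words in vocab.items():
    for _ in range(20):
        tokens = rng.choices(words, k=4) + rng.choices(shared, k=3)
        rng.shuffle(tokens)
        texts.append(" ".join(tokens))
        labels.append(category)

# Vectorizer and classifier in one pipeline, so TF-IDF is fit per training fold.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    model, texts, labels, cv=cv, scoring="accuracy", return_train_score=True
)

train_acc = scores["train_score"].mean()
valid_acc = scores["test_score"].mean()
print(f"mean train accuracy:      {train_acc:.3f}")
print(f"mean validation accuracy: {valid_acc:.3f}")
print(f"gap (train - validation): {train_acc - valid_acc:.3f}")
```

A large gap between the two means suggests the model is memorising the training set; the validation mean is the number to compare against the leaderboard.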

26 Sep 2020, 15:27

How many folds would give a realistic estimate of prediction accuracy?

You can't know the right number of folds in advance; you have to try different values. I recommend trying between 5 and 10 folds.
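One way to try several fold counts is sketched below. The data is a synthetic set from make_classification and the model is a plain LogisticRegression, both assumptions chosen only so the example runs on its own; swap in your own features, labels and estimator.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real feature matrix and labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

results = {}
for k in (5, 7, 10):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    # Keep both the mean and the spread: the spread shows how stable the estimate is.
    results[k] = (fold_scores.mean(), fold_scores.std())

for k, (mean, std) in results.items():
    print(f"{k:>2} folds: accuracy {mean:.3f} +/- {std:.3f}")
```

If the mean accuracy barely changes between 5 and 10 folds, the cheaper setting is usually good enough; more folds mainly cost more training time.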