Hi, here's a sample notebook for using the Simple Transformers library for text classification: https://github.com/FaatimahM/Tweet_classification_simpletransformers. This is by no means pro level; I'm relatively new to data science and programming in Python, and I only learnt about this library when I enrolled in this hackathon. I hope it will be useful to someone on their data science journey :)
Hey @Mansoor, thanks for sharing this.
I'd like to know how you handled all the unknowns in the dataset, and even the violence phrases.
Also, I noticed there was more than one language in the dataset. Did you handle this as well, or did you just feed them into the model?
Hey @ZzyZx, with these models I found that applying no pre-processing produced the best results, i.e. feeding the raw data straight into the model. The pre-processing I tried for cleaning the text was removing 'RT', '&', '', '' and '#'. I chose not to remove stop words because I read somewhere that they may contain important info needed by the language models.
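For anyone who wants to try the same comparison, here is a rough sketch of that kind of cleaning step. It is not the code from the notebook: the function name `clean_tweet` is made up for illustration, and it only covers the three tokens named above ('RT', '&' and '#'). It assumes 'RT' should only be dropped when it stands alone as a word, and that only the '#' symbol is removed while the hashtag word is kept.

```python
import re

def clean_tweet(text: str) -> str:
    """Hypothetical helper: strip 'RT', '&' and '#' from a tweet."""
    # Remove the retweet marker only when it is a standalone word,
    # so words such as "START" are left untouched (an assumption).
    text = re.sub(r"\bRT\b", "", text)
    # Remove the '&' and '#' characters; the hashtag word itself is kept.
    text = text.replace("&", "").replace("#", "")
    # Collapse the extra whitespace left behind by the removals.
    return " ".join(text.split())

raw = "RT @user: stop #violence & abuse"
cleaned = clean_tweet(raw)
print(cleaned)
```

Training once on the raw column and once on the cleaned column, then comparing the validation scores, is one way to check whether the cleaning helps or hurts for a given model.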
Just a general question: can anyone please advise what metrics/methods to look at when trying to interpret what a model is actually doing?