Hi @Amy_Bray :)
It seems there are some discrepancies in the test data (csv). A significant number of the tweets are not in gold-random-json; rather, they are training examples from different versions of the dataset. For example, the tweet with id ID_1001154804658286592 ("What is happening to the infrastructure in New England...") is actually a training example with labels, as shown here
I haven't looked at the training data, but judging from the number of tweets (~76k), it does look like the test set spans all the different versions of IDRIS (approx. 77.5k in total) rather than just gold-random-json (20k). Could you confirm which dataset we're required to use? I hope you'll look into the test set as well. I suggest we use test_unlabeled.jsonl, for which we don't have access to the true labels.
They are all there; the 76k is because most of them have nulls.
Yeah. My concern is that by pooling tweets from all the datasets, they ended up with some tweets in test.csv that are actually part of the training data and have labels.
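For anyone who wants to verify the overlap themselves, here's a rough sketch of the check. The column name "id" for test.csv and the field name "tweet_id" for the training JSONL are assumptions; adjust them to the actual schema of the files you downloaded.

```python
# Sketch of a leakage check: which test-set tweet IDs also appear in the
# training data? The key names ("id", "tweet_id") are guesses at the schema.
import csv
import json

def leaked_ids(test_csv_path, train_jsonl_path, test_key="id", train_key="tweet_id"):
    """Return the set of test-set IDs that also appear in the training file."""
    with open(test_csv_path, newline="", encoding="utf-8") as f:
        test_ids = {row[test_key] for row in csv.DictReader(f)}
    with open(train_jsonl_path, encoding="utf-8") as f:
        train_ids = {json.loads(line)[train_key] for line in f if line.strip()}
    return test_ids & train_ids
```

If this returns a non-empty set (e.g. containing ID_1001154804658286592), that would confirm the leakage.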