GeoAI Challenge Location Mention Recognition from Social Media by ITU

1 000 CHF
Challenge completed ~2 years ago
Prediction
Natural Language Processing
150 joined
28 active
Start
Jul 19, 23
Close
Oct 22, 23
Reveal
Oct 22, 23
HungryLearner
Sample Submission VS test tweet_id discrepancy
Data · 5 Sep 2023, 15:46 · 7

Dear Host,

The following important observations need to be addressed:

1. The number of tweet_ids in the sample_submission is 2942, while the number of test tweets is 4066.

2. Only 1861 of the 2942 sample_submission tweets exist in the test set. The remaining 1081 tweets are nowhere to be found.

3. 2205 of the 4066 test tweets do not appear in the sample_submission.

Please double-check the data.
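A check like the one described above can be reproduced with plain set operations. This is only a minimal sketch: the ID lists below are toy stand-ins, and `compare_ids` is a hypothetical helper, not something provided by the competition. With the real files, the two arguments would be the tweet_id columns of the sample submission and the test set.

```python
def compare_ids(sample_ids, test_ids):
    """Count how the two collections of tweet IDs overlap."""
    sample_set, test_set = set(sample_ids), set(test_ids)
    return {
        "in_both": len(sample_set & test_set),
        "only_in_sample": len(sample_set - test_set),
        "only_in_test": len(test_set - sample_set),
    }

# Toy stand-ins for the real tweet_id columns (assumed, not competition data).
sample_submission_ids = ["ID_1", "ID_2", "ID_3", "ID_9"]
test_ids = ["ID_1", "ID_2", "ID_4", "ID_5", "ID_6"]

report = compare_ids(sample_submission_ids, test_ids)
print(report)
```

As a sanity check, `in_both + only_in_sample` should equal the number of unique sample IDs, and `in_both + only_in_test` the number of unique test IDs.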
Discussion 7 answers
HungryLearner

@Zindi, it has been 20 days since I posted the above, and there has been no host response or data update.

I really don't know why some competitions are simply abandoned like this, without any adequate monitoring of competitors' queries.

26 Sep 2023, 03:12
Upvotes 1
Milind
Mumbai (Data Scientist)

As stated in the competition, there can be up to 17 LM per post. But after analysing the data, I can find only 12 LM (['Neighborhood', 'Other locations', 'State', 'County', 'Continent', 'Human-made Point-of-Interest', 'Island', 'Natural Point-of-Interest', 'Road/street', 'City/town', 'District', 'Country']) in the training data.

@HungryLearner @Zindi Can you please let me know if I am missing something here?

HungryLearner

@Milind, the 17 LM does not refer to the LM types but rather the number of possibilities per sentence/statement.

During my EDA, I found a particular training ID that has exactly 17 LM annotations.
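The difference between the two numbers can be shown with a small sketch. The row layout below, `(tweet_id, mention, type)`, and the toy rows are assumptions for illustration; the real training file may be structured differently.

```python
from collections import Counter

# Hypothetical annotation rows: (tweet_id, mention text, LM type).
annotations = [
    ("ID_1", "Paris", "City/town"),
    ("ID_1", "Lyon", "City/town"),
    ("ID_1", "Nice", "City/town"),
    ("ID_1", "France", "Country"),
    ("ID_2", "Main Street", "Road/street"),
]

# Number of distinct LM types across the whole dataset.
lm_types = {lm_type for _, _, lm_type in annotations}

# Number of LM annotations per tweet, and the largest such count.
per_tweet = Counter(tweet_id for tweet_id, _, _ in annotations)
max_per_tweet = max(per_tweet.values())

print(len(lm_types), max_per_tweet)
```

In this toy data there are fewer LM types than the maximum per tweet; in the competition data the first number would be 12 and the second 17.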

Hope that clarifies your query.

Milind
Mumbai (Data Scientist)

Oh okay, got it. Thanks for the clarification!

@HungryLearner, I didn't understand what you meant by 'the 17 LM.' It does not refer to the types of location mentions but rather to the number of possibilities per sentence/statement.

6 Oct 2023, 10:31
Upvotes 0
HungryLearner

The number of LM types as mentioned by @Milind is 12.

However, the maximum number of LM to be predicted is 17 per tweet. In fact, it is mentioned that the remaining slots should be filled with zeros if our model finds fewer than 17 LM in a tweet.

That being said, the 17 LM mentioned is not the number of LM types. It is actually the maximum number of LM to be expected in a single tweet.

The need for 17, however, can be attributed to a particular tweet ID in the training set that has 17 LM annotations. These 17 LMs are just the names of different cities or similar. But since a list of cities is not a single location, it was annotated as a list of LMs with class type "City".
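The zero-filling described above could look roughly like this. It is a sketch only: `pad_mentions` is a hypothetical helper, and the exact layout of the submission file is not given in this thread.

```python
MAX_LM = 17  # maximum number of LM slots per tweet, as discussed above

def pad_mentions(mentions, max_lm=MAX_LM, fill=0):
    """Return exactly max_lm slots: predicted mentions first, then zeros."""
    trimmed = list(mentions)[:max_lm]  # drop anything beyond the limit
    return trimmed + [fill] * (max_lm - len(trimmed))

# A tweet where the model found only two location mentions.
row = pad_mentions(["Paris", "Lyon"])
print(row)
```

A tweet with no detected locations would come out as 17 zeros under this scheme.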

I understand this point, but in the training phase we should define the 17 LM. I found those 12 location types. If we don't have a location for a tweet, we assign the value 0. You said that the 17 LM is the number of possibilities per sentence/statement.