
GeoAI Challenge Location Mention Recognition from Social Media by ITU

1 000 CHF

Completed (over 2 years ago)

Skills you will learn

Prediction

Natural Language Processing

151 joined

28 active


Start

Jul 19, 23

Oct 22, 23

Reveal

Oct 22, 23

HungryLearner

Sample Submission VS test tweet_id discrepancy

Data · 5 Sep 2023, 15:46 · 7

Dear Host,

The following important observations need to be addressed:

1. The number of available tweet_ids in the sample_submission is 2942 while the number of test tweets is 4066

2. Only 1861 of the 2942 sample_submission tweets exist in the test set. The remaining 1081 tweets are nowhere to be found.

3. 2205 of the 4066 test tweets do not appear in the sample_submission.
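For anyone who wants to reproduce this kind of check, here is a minimal sketch using only the standard library. The file names in the usage comment and the helper names `read_ids` and `compare_ids` are hypothetical, and it assumes both files expose a `tweet_id` column that can be compared directly as text.

```python
import csv


def read_ids(path, column="tweet_id"):
    """Read one ID column from a CSV file into a set of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column] for row in csv.DictReader(f)}


def compare_ids(sample_ids, test_ids):
    """Count how two collections of IDs overlap."""
    sample_ids, test_ids = set(sample_ids), set(test_ids)
    return {
        "sample_total": len(sample_ids),
        "test_total": len(test_ids),
        "in_both": len(sample_ids & test_ids),
        "sample_only": len(sample_ids - test_ids),  # in sample, missing from test
        "test_only": len(test_ids - sample_ids),    # in test, missing from sample
    }


# Example usage (hypothetical file names, adjust to the actual downloads):
# report = compare_ids(read_ids("SampleSubmission.csv"), read_ids("Test.csv"))
# print(report)
```

If the submission IDs carry any prefix or suffix that the test IDs do not, they would need to be normalised before the comparison.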

It is highly recommended to double check.

Discussion 7 answers

HungryLearner

@Zindi, it has been 20 days since I posted the above, and there has been no host comment or data update in response to my query.

I really don't know why some competitions are simply dumped like this, without any adequate monitoring of competitors' queries.

26 Sep 2023, 03:12

Upvotes 1

Milind

Mumbai (Data Scientist)

As stated in the competition, there can be up to 17 LM per post. But by analysing the data, I can only find 12 LM (['Neighborhood', 'Other locations', 'State', 'County', 'Continent', 'Human-made Point-of-Interest', 'Island', 'Natural Point-of-Interest', 'Road/street', 'City/town', 'District', 'Country']) in the training data.

@HungryLearner @Zindi Can you please let me know if I am missing something here?

replied to HungryLearner · 26 Sep 2023, 06:30

Upvotes 0

HungryLearner

@Milind, the 17 LM does not refer to the number of LM types but rather to the number of possible LMs per sentence/statement.

During my EDA, I found one particular training ID that has exactly 17 LM annotations.

Hope that clarifies your query.
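A rough sketch of how both numbers can be checked during EDA is below. It assumes the training annotations have been flattened into `(tweet_id, lm_type)` pairs, one per annotated mention. That layout and the helper name `summarize_annotations` are assumptions for illustration, not the competition's actual file format.

```python
from collections import Counter


def summarize_annotations(rows):
    """Summarise (tweet_id, lm_type) pairs, one pair per annotated mention."""
    rows = list(rows)
    # Number of annotated mentions for each tweet ID.
    per_tweet = Counter(tweet_id for tweet_id, _ in rows)
    # Distinct LM types seen anywhere in the data.
    lm_types = sorted({lm_type for _, lm_type in rows})
    # Tweet with the most annotations (None / 0 if there are no rows).
    max_id, max_count = per_tweet.most_common(1)[0] if per_tweet else (None, 0)
    return {
        "lm_types": lm_types,
        "max_per_tweet": max_count,
        "max_tweet_id": max_id,
    }
```

The length of `lm_types` gives the number of LM types, while `max_per_tweet` gives the largest number of LMs annotated on a single tweet; these are two different quantities.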

replied to Milind · 26 Sep 2023, 06:35

Upvotes 0

Milind

Mumbai (Data Scientist)

Ohh okay, got it. Thanks for the clarification!

replied to HungryLearner · 26 Sep 2023, 06:58

Upvotes 0

yessinezghal

Ensi

@HungryLearner, I didn't understand what you meant by 'the 17 LM.' It does not refer to the types of language models but rather to the number of possibilities per sentence/statement.

6 Oct 2023, 10:31

Upvotes 0

HungryLearner

The number of LM types as mentioned by @Milind is 12.

However, the maximum number of LM to be predicted is 17 per tweet. In fact, it is mentioned that others should be filled with zeros if our model does not find up to 17 LM in a tweet.

That being said, the 17 LM mentioned is not the number of LM types. It is actually the maximum number of LM to be expected in a single tweet.

The need for 17, however, can be attributed to a particular tweet ID in the training set that has 17 LM annotations. These 17 LMs are just the names of different cities or similar. But since a list of cities is not a single location, each entry was annotated as a separate LM with class type "City".
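To illustrate the zero-filling, here is a minimal sketch that pads one tweet's predicted LMs out to 17 slots. The helper name `pad_mentions` and the idea of one list per tweet are assumptions; the exact submission layout is not shown in this thread, so treat it only as an illustration of the padding step.

```python
MAX_LM = 17  # maximum number of LMs expected for a single tweet


def pad_mentions(mentions, max_slots=MAX_LM, fill=0):
    """Pad a tweet's predicted LMs with `fill` up to `max_slots` entries."""
    mentions = list(mentions)
    if len(mentions) > max_slots:
        raise ValueError(f"got {len(mentions)} mentions, limit is {max_slots}")
    # Unused slots are filled with zeros.
    return mentions + [fill] * (max_slots - len(mentions))


# Example: a tweet where the model found two location mentions.
slots = pad_mentions(["Paris", "Lyon"])
```

A tweet with no detected LM would simply come out as 17 zeros.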

replied to yessinezghal · 6 Oct 2023, 11:00

Upvotes 0

yessinezghal

Ensi

I know this point, but in the training phase, we should define the 17 LM. I found those 12 locations. If we don't have the location for this tweet, we assign the value of 0. You said that the 17 LM is the number of possibilities per sentence/statement.

replied to HungryLearner · 6 Oct 2023, 13:15

Upvotes 0
