AirQo African Air Quality Prediction Challenge

$3 000 USD

Completed (~2 years ago)

Skills you will learn

Prediction

1032 joined

513 active

Start

Mar 15, 24

Jun 16, 24

Reveal

Jun 16, 24

yanteixeira

Is test harder or easier than train?

Help · 25 May 2024, 20:56 · 2

I think it is common to assume that the test dataset will be harder than the data our model is trained on. But what if it is the opposite?

If we look at the cities in train, we have:

Lagos: With an estimated population of 15 million, it is the most populous urban area in Africa. It is a megacity with the fourth-highest GDP in Africa.
Kampala: With an estimated population of 1.6 million, it is the capital and largest city of Uganda. It is one of the fastest-growing cities in Africa.
Nairobi: With an estimated population of 4.3 million, it is the capital and largest city of Kenya.
Bujumbura: The smallest city in the training dataset, with an estimated population of 1.1 million. It is the economic capital, largest city, and main port of Burundi.

In the test dataset, we also have four cities, but only two of them are capitals. Accra seems to fit the description of the capitals in the training dataset, but Yaoundé is different. Not only is it not the largest city in Cameroon, but most of its economy is also centered on the civil service administration and diplomatic services, which sets it apart from the training cities.

The irony is that Accra and Yaoundé have so little data in the test set that it hardly matters whether their PM2.5 levels are close to those of the capitals in the training set.
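
To see why a city with little test data barely matters, assume the leaderboard metric is an RMSE over all test rows (an assumption; the thread does not state the metric). The overall MSE then splits into per-city terms weighted by each city's share of the rows, MSE = Σ_c (n_c / N) · MSE_c, so a city holding a small fraction of the rows can only move the total by that fraction of its own error. A minimal sketch; the function names and the per-city inputs are hypothetical:

```python
from typing import Dict


def city_contributions(rows_per_city: Dict[str, int],
                       mse_per_city: Dict[str, float]) -> Dict[str, float]:
    """Return each city's additive contribution to the overall MSE.

    Overall MSE = sum over cities of (n_c / N) * MSE_c, so a city's
    contribution is its share of the rows times its own MSE.
    """
    total_rows = sum(rows_per_city.values())
    return {
        city: (n / total_rows) * mse_per_city[city]
        for city, n in rows_per_city.items()
    }


def overall_rmse(rows_per_city: Dict[str, int],
                 mse_per_city: Dict[str, float]) -> float:
    """RMSE over all rows, rebuilt from the per-city contributions."""
    return sum(city_contributions(rows_per_city, mse_per_city).values()) ** 0.5
```

For example, a city with 10% of the rows and a per-city MSE of 100 adds 10 to the overall MSE, the same amount as a city with 90% of the rows and a per-city MSE of about 11.1.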

The real problem seems to be the cities that have a considerable amount of data in the test set. Kisumu (the third-largest city in Kenya) and Gulu (which seems to be a small city in Uganda) are both very different from the capitals we have seen so far.
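
If the cities with the most test data really are unlike the training capitals, a CV that mixes rows from every city may not reflect how a model does on a new city. One way to probe this, which is not something proposed in the thread, is to hold out one whole city per fold so every validation score comes from a city the model never saw during training. A minimal standard-library sketch; the function name and the city labels are illustrative:

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_city_out(
    cities: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_city, train_indices, valid_indices), one fold per city.

    All rows of the held-out city go to validation, so the model is
    always scored on a city absent from its training rows.
    """
    for held_out in sorted(set(cities)):
        train_idx = [i for i, c in enumerate(cities) if c != held_out]
        valid_idx = [i for i, c in enumerate(cities) if c == held_out]
        yield held_out, train_idx, valid_idx
```

scikit-learn's `LeaveOneGroupOut`, with the city column passed as `groups`, produces the same kind of split.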

Any thoughts on this?

Discussion 2 answers

marching_learning

Nostalgic Mathematics

I think the test set is easier than train, in the sense that I think there are fewer outliers in test than in train. Most of the outliers in the train set come from two sites in Lagos. Another thing that hints at this is that my CV is always higher than my LB score.

27 May 2024, 17:08
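
One way to check the claim that most training outliers come from two Lagos sites is to count, per site, the rows whose target falls outside Tukey's fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR), computed once over the whole training target. A minimal sketch using only the standard library; the fence rule, the function name and the site labels are assumptions for illustration, not the poster's method:

```python
import statistics
from collections import Counter
from typing import Dict, Sequence


def outliers_by_site(sites: Sequence[str], values: Sequence[float],
                     k: float = 1.5) -> Dict[str, int]:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR], grouped by site.

    The fences are computed once over all values, so the counts show
    which sites supply the extreme targets of the whole training set.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    counts = Counter(s for s, v in zip(sites, values) if v < low or v > high)
    # Sites with the most outliers come first.
    return dict(counts.most_common())
```

Because squared-error metrics weight large residuals heavily, a training set with more extreme targets than the test set would be consistent with CV errors running above the leaderboard, though it does not prove it.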

yanteixeira

My CV is also higher than LB.

replied to marching_learning · 27 May 2024, 18:03
