
AirQo African Air Quality Prediction Challenge

$3 000 USD
Completed (over 1 year ago)
Prediction
1029 joined
514 active
Start: Mar 15, 24
Close: Jun 16, 24
Reveal: Jun 16, 24
yanteixeira
Is test harder or easier than train?
Help · 25 May 2024, 20:56 · 2

I think it is common to assume that the test dataset will be harder than the data our model is trained on. But what if it is the opposite?

If we look at the cities in train, we have:

  • Lagos: With an estimated population of 15 million, it is the most populous urban area in Africa. It is a megacity with the fourth-highest GDP in Africa.
  • Kampala: With an estimated population of 1.6 million, it is the capital and largest city of Uganda. It is one of the fastest-growing cities in Africa.
  • Nairobi: With an estimated population of 4.3 million, it is the capital and largest city of Kenya.
  • Bujumbura: The smallest city in the training dataset, with an estimated population of 1.1 million. It is the economic capital, largest city, and main port of Burundi.

In the test dataset, we also have four cities, but only two are capitals. Accra seems to fit the description of the capitals in the training dataset, but Yaoundé is different. Not only is it not the biggest city in Cameroon, but also most of Yaoundé's economy is centered on the administrative structure of the civil service and the diplomatic services, which differ a lot from the previous cities.

The irony is that Accra and Yaoundé have so little data in the test set that even if the PM2.5 emissions of these two cities are close to the capitals in the training set, it does not matter much.

The real problem seems to be the cities for which we have considerable data in the test set. Kisumu (the 3rd largest city in Kenya) and Gulu (which seems to be a small city in Uganda) are both very different from the capitals we have seen so far.
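One way to probe this before submitting is to validate on a city the model has never seen, since every test city is new. The sketch below is only an illustration in plain Python: the `cities` list is toy data standing in for whatever city column the train set has, and `leave_one_city_out` is a hypothetical helper, not part of any competition starter code.

```python
from collections import defaultdict


def leave_one_city_out(cities):
    """Yield (held_out_city, train_idx, valid_idx), holding out one city per fold."""
    by_city = defaultdict(list)
    for i, city in enumerate(cities):
        by_city[city].append(i)
    for held_out in sorted(by_city):
        valid_idx = by_city[held_out]
        # Every row from every other city goes into the training part of the fold.
        train_idx = sorted(
            i for city, idx in by_city.items() if city != held_out for i in idx
        )
        yield held_out, train_idx, valid_idx


# Toy city labels (assumption: one label per training row).
cities = ["Lagos", "Lagos", "Kampala", "Nairobi", "Nairobi", "Bujumbura"]
folds = list(leave_one_city_out(cities))
for held_out, train_idx, valid_idx in folds:
    print(held_out, train_idx, valid_idx)
```

If the score on a held-out city is much worse than a random-split score, that suggests the model leans on city-specific patterns that may not transfer to Kisumu or Gulu.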

Any thoughts on this?

Discussion 2 answers
marching_learning
Nostalgic Mathematics

I think test is easier than train, in the sense that there are fewer outliers in test than in train. Most outliers in the train set come from two Lagos sites. Another thing that hints at this is that my CV is always higher than my LB score.
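A quick way to check where the outliers sit is to compute the share of flagged readings per site. This is a rough sketch, not the method used above: it assumes an upper IQR fence (Q3 + 1.5 × IQR) as the outlier definition, and the site IDs and PM2.5 values are made-up toy data.

```python
from collections import defaultdict
from statistics import quantiles


def outlier_share_by_site(site_ids, values, k=1.5):
    """Return {site: fraction of its readings above the global upper IQR fence}."""
    q1, _, q3 = quantiles(values, n=4)
    upper = q3 + k * (q3 - q1)  # assumed outlier rule: upper IQR fence
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for site, value in zip(site_ids, values):
        totals[site] += 1
        if value > upper:
            flagged[site] += 1
    return {site: flagged[site] / totals[site] for site in totals}


# Toy readings: site "B" has two extreme values, site "A" has none.
site_ids = ["A"] * 8 + ["B"] * 8
pm25 = [10, 11, 12, 13, 10, 11, 12, 13,
        10, 11, 12, 13, 11, 12, 250, 300]
shares = outlier_share_by_site(site_ids, pm25)
print(shares)
```

Running the same function on the real train sites would show whether the extreme readings really concentrate in a couple of locations.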

27 May 2024, 17:08
Upvotes 0
yanteixeira

My CV is also higher than LB.