Small Data, Big Problems? Limited training data can make models overfit, meaning they perform well on the competition set but struggle in the real world. A single outlier in the test data could decide the winner - not ideal!
Validation! Validating models with small datasets is tough. The test data may cover different locations or scenarios than the training data, making it hard to assess how well a model generalizes. Especially with RMSE as the metric, outliers can have a big impact.
Yes, I call on the organizers to change the metric to MAE. As it stands, the winner will be whichever model gets lucky on the outliers.
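To make the RMSE-vs-MAE point concrete, here is a quick sketch with made-up numbers (nothing from the actual leaderboard): one model is nearly perfect except on a single outlier, the other is consistently off everywhere. RMSE rewards the consistently-off model; MAE picks the one that is better on typical points.

```python
import math

def rmse(errors):
    # Root mean squared error: squaring makes one large miss dominate
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

def mae(errors):
    # Mean absolute error: every point contributes linearly
    return sum(abs(e) for e in errors) / len(errors)

# Toy error vectors (prediction minus truth) on a 5-point test set
near_perfect = [0, 0, 0, 0, 30]   # spot-on, except it misses one outlier badly
always_off   = [8, 8, 8, 8, 8]    # mediocre everywhere, but never far off

print(rmse(near_perfect), mae(near_perfect))  # high RMSE, low MAE
print(rmse(always_off), mae(always_off))      # low RMSE, higher MAE
```

Under RMSE the "always off" model wins (8 vs about 13.4), even though its average miss is worse (8 vs 6 under MAE). That is the luck-on-outliers effect in one example.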
You know, Zindi does love RMSE.
I came to the same conclusion.
I hope the best solution will not be thrown in the trash.