
Womxn in Big Data South Africa: Female-Headed Households in South Africa

Helping South Africa

$5 000 USD

Completed (over 6 years ago)

Skills you will learn: Prediction

1166 joined · 204 active


Start: Nov 25, 2019 · End: Feb 23, 2020 · Reveal: Feb 24, 2020

davarix

Model implementation

Help · 11 Feb 2020, 09:27 · 3

Hi all,

I have a quick question about the usability of the model we're trying to build. The task is to train and predict on the Census 2011 data; however, the model would presumably be used to manage policy in between census events.

Do you know whether the test set for the private leaderboard contains data from a later census? And should we be wary of time-related leakage in the geographic features?

Many thanks

Discussion 3 answers

washier

Hi davarix,

The test set for the private leaderboard is drawn from the same 2011 census data. In fact, as far as I can see, the train/test split is based on provinces, with 7 provinces in train and the other 2 in test.
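A province-based split like the one described here can be sketched as follows. This is a minimal illustration, assuming each row carries a hypothetical `province` field; the province codes and the choice of which two are held out are placeholders, not the competition's actual split.

```python
def split_by_province(rows, test_provinces):
    """Return (train, test) so that no province appears in both sets."""
    test_provinces = set(test_provinces)
    train = [r for r in rows if r["province"] not in test_provinces]
    test = [r for r in rows if r["province"] in test_provinces]
    return train, test

# Toy rows: nine hypothetical province codes (P1..P9), three wards each.
rows = [
    {"province": f"P{p}", "ward": f"P{p}-W{w}"}
    for p in range(1, 10)
    for w in range(3)
]
train, test = split_by_province(rows, ["P8", "P9"])

# Count distinct provinces on each side of the split.
print(len({r["province"] for r in train}), len({r["province"] for r in test}))
```

Because whole provinces are held out, a model scored this way is evaluated on how well it generalises to unseen regions, not on how well it predicts a later census.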

11 Feb 2020, 10:00


davarix

Yep, I see that too. I'm just questioning it from a practical point of view: what's the point of predicting on a geographic split if the goal stated in the description is to predict between census events? It seems to be purely an exercise in ML skills. It is highly unlikely that a census would be run in one part of the country but not the other.

replied to washier · 11 Feb 2020, 12:10


washier

Absolutely agree. A model trained under this arrangement is not useful unless the modelling process yields some radically new insights. It is, however, a very interesting dataset, and easy to overfit, I think. I also feel it sheds light on an important issue.
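One common way to check for overfitting when the test set is a set of held-out provinces is to validate the same way: hold out one province at a time and score a model trained on the rest. The sketch below assumes hypothetical `province` and `target` fields, uses RMSE purely as an illustrative metric, and uses a deliberately trivial "predict the training mean" stand-in in place of a real model.

```python
import math

def leave_one_province_out(rows, fit, predict):
    """Yield (held_out_province, RMSE) for each province in turn."""
    provinces = sorted({r["province"] for r in rows})
    for held_out in provinces:
        train = [r for r in rows if r["province"] != held_out]
        valid = [r for r in rows if r["province"] == held_out]
        model = fit(train)
        errors = [(predict(model, r) - r["target"]) ** 2 for r in valid]
        yield held_out, math.sqrt(sum(errors) / len(errors))

# Stand-in "model": the mean target of the training rows.
def fit_mean(train):
    return sum(r["target"] for r in train) / len(train)

# Every row gets the same prediction: the stored training mean.
def predict_mean(model, row):
    return model

rows = [
    {"province": "A", "target": 10.0},
    {"province": "A", "target": 12.0},
    {"province": "B", "target": 20.0},
    {"province": "B", "target": 22.0},
]
for province, rmse in leave_one_province_out(rows, fit_mean, predict_mean):
    print(province, round(rmse, 3))
```

A wide spread of scores across held-out provinces is one sign that a model is picking up province-specific patterns that may not transfer to the unseen test provinces.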

replied to davarix · 11 Feb 2020, 13:15

