Absa Corporate Client Activity Forecasting Challenge

Helping South Africa
$5 000 USD
Completed (~3 years ago)
Forecast
150 joined
46 active
Start: Nov 01, 22
Close: Nov 27, 22
Reveal: Nov 27, 22
Issue with linear model
Data · 8 Nov 2022, 17:37 · 3

I attempted to start off with a linear model and continuously adjust it. That is, start with a model that guesses the same event for all inputs and refine it step by step. Event 14 was the most frequent event in the training data, so I used it as the starting model. However, I obtained a test score of 0. I moved on to the second most frequent event and still obtained the same result. I eventually tried all 25 events, each with the same result.

This would imply that none of the events in the Training data appear in the Test data. This would be quite odd.

I had interpreted the target variable to be the 'event' column, and this to be categorical in nature (although the categories are represented by numbers). Is this correct?

I'm unsure what I am missing here.

Thanks.

Discussion 3 answers

You just have to predict whether each user in the test set "does" event 14 on the given date-time pair. So your prediction is a 1 or a 0.
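A minimal sketch of that reframing, assuming the training data exposes the event codes as a plain list of integers. The function name `to_binary_target` and the sample codes are made up for illustration and are not taken from the competition files:

```python
TARGET_EVENT = 14  # the event the answer above says must be predicted


def to_binary_target(events, target_event=TARGET_EVENT):
    """Map each categorical event code to 1 if it equals target_event, else 0."""
    return [1 if e == target_event else 0 for e in events]


# Toy event codes, invented for illustration only.
events = [14, 30, 14, 7, 14]
labels = to_binary_target(events)
print(labels)  # [1, 0, 1, 0, 1]
```

Under this reading, a submission would contain 1s and 0s rather than event codes such as 14 or 30.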

8 Nov 2022, 20:06
Upvotes 1

Sorry. Replied to my own post in the discussion. That was meant to be here. I can send you the file so you can see what I mean.

@TheNoCodeMovement That is not what I was asking. About 40% of the training data has event = 14. I wanted to use the guess 'event = 14' as an initial model (this would be the best 'random' guess), then upgrade it to a model that selects between event = 14 and event = 30, making it slightly better, and keep adding events to hopefully improve the model further.
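For reference, the constant-guess starting point described here can be sketched as follows. The helper name `majority_baseline` and the toy event codes are illustrative assumptions, chosen so that event 14 makes up 40% of the sample as in the post:

```python
from collections import Counter


def majority_baseline(events):
    """Return the most frequent event code and its share of all rows."""
    counts = Counter(events)
    event, n = counts.most_common(1)[0]
    return event, n / len(events)


# Toy event codes, invented for illustration: 4 of the 10 rows are event 14.
toy_events = [14, 14, 30, 7, 14, 30, 14, 2, 5, 9]
event, share = majority_baseline(toy_events)
print(event, share)  # 14 0.4
```

The share is the accuracy this constant guess would reach on the training rows themselves. It says nothing about how the competition scores a submission.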

I submitted a file with this 'event = 14' guess model. The score came back as 0. So I did the same for every other event ('event = x') to see which was the best guess model to start from. All 25 files came back with a score of 0. But that should be impossible, because it would imply that the training data has target values that do not appear in the test data at all.

I do see some players with a score on the leaderboard, and a bunch of others with scores of 0. But I don't get what I'm missing.

9 Nov 2022, 16:17
Upvotes 0