💳 Data Talk: Benchmark score

Cryptocurrency Closing Price Prediction

Helping Global

$1 000 USD

Completed (over 4 years ago)


Start

Jun 25, 21

Sep 19, 21

Reveal

Sep 19, 21

doziesixtus

Benchmark score

Help · 27 Aug 2021, 05:01 · 7

Greetings,

I've been on this challenge for a couple of days, going on weeks, and finding a suitable model has felt like a rat race. This isn't the first data science competition I've entered, but this one has made me question my practical knowledge of data science. I have gone through the data preprocessing and used the holy grail of machine learning models, XGBoost, and my submission score is still very high, with my best around RMSE: 4648. That is terrible compared to the scores on the leaderboard.

After preprocessing steps like feature correlation analysis, combining numerical features, imputing missing values, aggregation and feature scaling, I still get an outrageously high submission score whether I train a Random Forest regressor or an Extreme Gradient Boosting (XGBoost) model.

What I'd like to know, if that's not asking too much, is how to obtain the benchmark score of RMSE: 56 that is on the leaderboard. At this point, I'm beginning to question everything I have learnt about data science. What am I doing wrong? Is it the preprocessing or the model? Cos I know, assuming my knowledge is even valid anymore, these are the two major sources of poor scores.
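One way to narrow down whether the preprocessing or the model is at fault is to compare the RMSE of your predictions with the RMSE of a trivial baseline that always predicts the mean. The sketch below uses made-up numbers, not the competition data, and the helper names `rmse` and `mean_baseline` are hypothetical. If a model scores worse than the mean baseline, a pipeline bug is a likely suspect, for example predictions left on a scaled target instead of the original units.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mean_baseline(y_train, n):
    """Predict the training mean for every one of n rows."""
    mean_value = sum(y_train) / len(y_train)
    return [mean_value] * n

# Hypothetical toy values, for illustration only.
y_val = [3.0, 5.0, 7.0]
model_pred = [2.5, 5.0, 7.5]

baseline_score = rmse(y_val, mean_baseline(y_val, len(y_val)))
model_score = rmse(y_val, model_pred)

print(f"baseline RMSE: {baseline_score:.3f}")
print(f"model RMSE:    {model_score:.3f}")
```

A model that cannot beat this baseline on held-out data is unlikely to be rescued by swapping Random Forest for XGBoost, so checking the target and the preprocessing first is probably the better use of time.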

Discussion 7 answers

thealvinguy

Hi @doziesixtus, to obtain the benchmark score, run the starter_notebook. You will get the benchmark score.

27 Aug 2021, 05:09


doziesixtus

@thealvinguy thanks a lot. I just ran the starter notebook now and I got the benchmark score. I guess it would be easier to work from here. Thanks once again.

replied to thealvinguy · 27 Aug 2021, 06:22


ravinder

Hi... running the starter notebook as it is gives me 62... how did you get 56?

replied to thealvinguy · 27 Aug 2021, 11:14


thealvinguy

Hi @ravinder, I think randomness is what is causing your slightly different result. Make sure the random state in train_test_split(random_state=42) is 42, as in the starter notebook. With the seed fixed, it should give you exactly 56 no matter how many times you run it.
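A minimal sketch of the idea, assuming scikit-learn and using synthetic stand-in data rather than the competition data. The function `run_once` and the choice of RandomForestRegressor are illustrative assumptions, not the starter notebook's code. Fixing `random_state` in both the split and the model makes repeated runs return the same validation RMSE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def run_once(seed):
    """Split, fit and score with every source of randomness tied to `seed`."""
    # Synthetic data (hypothetical); generated identically on every call.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

    # The seed controls which rows land in the validation set.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    # The model has its own randomness (bootstrapping), so seed it too.
    model = RandomForestRegressor(n_estimators=20, random_state=seed)
    model.fit(X_train, y_train)

    predictions = model.predict(X_val)
    return float(np.sqrt(mean_squared_error(y_val, predictions)))

first = run_once(42)
second = run_once(42)
print(first == second)  # same seed, same split, same model, same score
```

If the split is seeded but the score still moves between runs, the model or another step (shuffling, sampling) probably has an unseeded source of randomness.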

replied to ravinder · 27 Aug 2021, 12:28


skaak

Ferra Solutions

Same over here ...

I've been throwing everything at this. The benchmark was easy, but getting even a bit better was near impossible. I've tried every trick I know and also a few new ones. I used a Cauchy distribution assumption, beta regression, and feature engineering ad nauseam. These things did improve my score, but only by a tiny amount.

However ....

I think I know how to make a big leap, but I'm too busy finishing up in credit, which ends in a few hours. Next week I'll return to crypto and then I hope to nail this. By now, at least, I have a good pipeline, and so do you, I hope.

So next week I'll continue this discussion ... maybe I'll set up a zoom to discuss this in some detail for anybody who is interested and likewise stuck ... at least I can tell you what does not work ...

27 Aug 2021, 06:39 (edited 1 minute later)


aninda_bitm

Look forward to learning from you

replied to skaak · 27 Aug 2021, 06:58


skaak

Ferra Solutions

Thanks, would be great to see you there.

Learning from me ... yeah ... I can only tell you what does not work! (He he he he rotfl)

But I do have a few ideas here that I think can help you, especially if you have something working already that you want to improve upon.

replied to aninda_bitm · 27 Aug 2021, 07:02
