
Cryptocurrency Closing Price Prediction

Helping Global
$1 000 USD
Challenge completed ~4 years ago
Prediction
795 joined
354 active
Start
Jun 25, 21
Close
Sep 19, 21
Reveal
Sep 19, 21
Benchmark score
Help · 27 Aug 2021, 05:01 · 7

Greetings,

I've been on this challenge for a couple of days, now going on weeks, and finding a suitable model has felt like a rat race. This isn't the first data science competition I've entered, but this one has made me question my practical knowledge of data science. I have gone through the data preprocessing and used the holy grail of machine learning models, XGBoost, yet my submission score is still very high: my best is around RMSE 4648, which is terrible compared to the scores on the leaderboard.

After preprocessing steps such as checking feature correlations, combining numerical features, imputing missing values, aggregation and feature scaling, I still get an outrageously high submission score when I train a Random Forest regressor or an Extreme Gradient Boosting (XGBoost) model.
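Two of the steps listed above, imputing missing values and feature scaling, are easy to get subtly wrong by computing statistics on the full dataset instead of the training split only. The sketch below is a minimal, stdlib-only illustration of median imputation followed by standardization, fitted on training data and then reused on any other split. The function names are hypothetical and this is not the poster's pipeline; in practice scikit-learn's `SimpleImputer(strategy="median")` and `StandardScaler` cover the same ground.

```python
import statistics

def fit_impute_scale(train_column):
    """Learn the median (for imputation) and mean/std (for scaling) from training data only.

    Missing values are represented as None in this hypothetical sketch.
    """
    observed = [v for v in train_column if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in train_column]
    mean = statistics.fmean(filled)
    # Population std (ddof=0), matching StandardScaler; guard against a constant column.
    std = statistics.pstdev(filled) or 1.0
    return median, mean, std

def transform(column, median, mean, std):
    """Apply the learned imputation and standardization to any split (train, validation, test)."""
    return [((median if v is None else v) - mean) / std for v in column]

train = [1.0, None, 3.0, 5.0]
test = [None, 7.0]
params = fit_impute_scale(train)
print(transform(train, *params))
print(transform(test, *params))
```

Note that the test column is transformed with the training median, mean and std; refitting on the test data would leak information and make validation scores look better than the leaderboard.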

What I'd like to know, if it's not too much to ask, is how to obtain the benchmark score of RMSE 56 shown on the leaderboard, because at this point I'm beginning to question everything I have learnt about data science. What am I doing wrong? Is it the preprocessing or the model? Assuming my knowledge is still valid, I know these are the two major sources of poor scores.
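For reference, RMSE is the square root of the mean squared difference between true and predicted values, so it is expressed in the same units as the closing price itself. A minimal sketch follows; the `mean_baseline_rmse` helper is a hypothetical sanity check (not part of the competition or the starter notebook): a model that cannot beat "always predict the training mean" on a validation split usually points to a bug in the pipeline rather than a weak model.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mean_baseline_rmse(y_train, y_valid):
    """Hypothetical sanity check: RMSE of always predicting the training mean."""
    mean = sum(y_train) / len(y_train)
    return rmse(y_valid, [mean] * len(y_valid))

print(rmse([100.0, 200.0], [110.0, 190.0]))                 # 10.0
print(mean_baseline_rmse([100.0, 200.0], [100.0, 200.0]))   # 50.0
```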

Discussion 7 answers

Hi @doziesixtus, to obtain the benchmark score, run the starter_notebook. You will get the benchmark score.

27 Aug 2021, 05:09

@thealvinguy, thanks a lot. I just ran the starter notebook and got the benchmark score. I guess it will be easier to work from here. Thanks once again.

Hi, running the starter notebook as it is gives me 62. How did you get 56?

Hi @ravinder, I think randomness is what's causing your slightly different result. Make sure the random state in train_test_split is 42 (random_state=42), as in the starter notebook. With that set, it should give you exactly 56 no matter how many times you run it.
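The point about `random_state=42` is that a fixed seed makes the shuffle, and therefore the train/validation split, identical on every run. The sketch below illustrates that idea with the standard library only; `seeded_split` is a hypothetical stand-in, not scikit-learn's `train_test_split`, and it will not reproduce the same rows that scikit-learn selects for seed 42. If the model itself draws random numbers (for example `RandomForestRegressor`, which has its own `random_state` parameter), that seed would also need to be fixed for run-to-run identical scores.

```python
import random

def seeded_split(rows, test_size=0.2, random_state=42):
    """Hypothetical stand-in for train_test_split: shuffle a copy with a fixed seed, then cut."""
    shuffled = list(rows)
    # A local Random instance, so the global random module state is left untouched.
    random.Random(random_state).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

rows = list(range(10))
first = seeded_split(rows, random_state=42)
second = seeded_split(rows, random_state=42)
print(first == second)  # True: same seed, same split
```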

skaak
Ferra Solutions

Same over here ...

I've been throwing everything at this. The benchmark was easy, but getting even a bit better was near impossible. I've tried every trick I know and a few new ones too: a Cauchy distribution assumption, beta regression, feature engineering ad nauseam. These things did improve my score, but only by a tiny amount.
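The post does not say how the Cauchy distribution assumption was applied. One common reading, assumed here, is to fit predictions by minimizing a Cauchy negative log-likelihood instead of squared error, which makes large residuals (such as sudden price jumps) grow the loss only logarithmically. A minimal stdlib sketch, with hypothetical function names and a fixed, assumed `scale` parameter:

```python
import math

def cauchy_nll(y_true, y_pred, scale=1.0):
    """Mean negative log-likelihood assuming y ~ Cauchy(location=y_pred, scale)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        z = (t - p) / scale
        # -log pdf = log(pi * scale) + log(1 + z^2)
        total += math.log(math.pi * scale) + math.log1p(z * z)
    return total / len(y_true)

def squared_loss(y_true, y_pred):
    """Mean squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# One large outlier dominates the squared loss but adds only a log term to the Cauchy loss.
y = [10.0, 11.0, 9.0, 100.0]
pred = [10.0, 10.0, 10.0, 10.0]
print(squared_loss(y, pred), cauchy_nll(y, pred))
```

Because the competition is scored on RMSE, a model trained this way is still judged by squared error, which may be one reason such changes only nudge the score.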

However ....

I think I know how to make a big leap, but I'm too busy finishing up in the credit challenge, which ends in a few hours. Next week I'll return to crypto, and then I hope to nail this. By now, at least, I have a good pipeline, and I hope you do too.

So next week I'll continue this discussion... maybe I'll set up a Zoom call to discuss this in some detail for anybody who is interested and likewise stuck... at least I can tell you what does not work...

Looking forward to learning from you.

skaak
Ferra Solutions

Thanks, would be great to see you there.

Learning from me... yeah... I can only tell you what does not work! (hehehe, rotfl)

But I do have a few ideas here that I think can help you, especially if you have something working already that you want to improve upon.