welcome back to the land of the living @skaak
dragon!!!!!!! thanks ... yip, eskom's antics thawed my cryogenic sleep a bit, but I want to run away after dropping from 17 to ~30, even the median beating me atm ... good to see you did not slow down @wuuthraad
Hahahaha dude! Same on my side... I am just overthinking what model to use for the competition and how to structure the dataset. Don't trust the LB, trust your CV scores. As long as those are solid, you need not worry about the public LB.
Structure the dataset ... yip, this is more a data-cleaning than a data-science comp. CV is so tricky here though, dirty, gappy time series n'all - I just want some fun, so I make it easy (don't tell anybody) and I'm just using the LB atm.
@wuuthraad by now I've tried every model known to man ... that is perhaps my problem ... I always wanted to do something with Gompertz curves and this was my chance, so box ticked, but no cigar. I also relied heavily on ARMA stuff, which I (think I) know well, but I am doing something wrong as I get the strangest ... uhmmmm ... kinks in the ARMA forecasts.
You've not subbed yet? Come on - the comp won't bite, hit that button.
hahaha yeah @skaak I've just been swamped with a ton of stuff, that's kind of why I haven't made a sub yet. Have you tried LazyProphet? It's what I feel I'll use for the competition. Like you said earlier, this competition seems more like a data-cleaning comp, but a strong model always helps.
keep it simple where you can
@wuuthraad I've considered using DataWig but shunned it, and many of the automated tools, given the Zindi requirements and also to keep it simple. However, I think the top guys are perhaps using something like LazyProphet to get such good scores.
Still, the data is so dirty and gappy I think some powerful tool will just overfit or get lost. At some stage I thought doing the imputation was the way to win this one, but the moment you impute, you add more sales and your sub goes up too much (after a few subs I have a good idea of what the average sales should be in a good sub).
So I think if you impute, you have to do it in such a way that the row total remains the same.
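One way to sketch that idea: fill the gaps, then rescale the whole series so its total matches the observed total, so imputation never "adds sales". A minimal pandas sketch (the helper name and toy data are mine, not from the comp):

```python
import numpy as np
import pandas as pd

def impute_preserving_total(s: pd.Series) -> pd.Series:
    """Fill gaps by interpolation, then rescale so the imputed series'
    total equals the observed (pre-imputation) total.
    Hypothetical helper illustrating the idea in the thread."""
    observed_total = s.sum()  # NaNs are skipped by sum()
    filled = s.interpolate(limit_direction="both")
    # rescale everything so the overall total is unchanged
    return filled * observed_total / filled.sum()

# toy monthly sales with gaps
sales = pd.Series([100, np.nan, 120, np.nan, 80],
                  index=pd.period_range("2021-01", periods=5, freq="M"))
imputed = impute_preserving_total(sales)
print(imputed.sum())  # same total as the observed values: 300.0
```

Note the trade-off: rescaling also shrinks the observed values slightly; an alternative reading is to adjust only the imputed entries, which is fiddlier but keeps observations untouched.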
I've tried and focused on the decomposition models a bit, and have made some progress there, but, looking at the graphs, the seasonality is key. The trend and error are relatively small in comparison. This means your t+1 forecast is key, because in November you have the biggest seasonal variations.
The other stuff, e.g. price or discount and such, does not help much. The top guys probably figured out how to include them, but they don't correlate well. Also, channels add a bit but not much; more is added by e.g. gender or other stuff.
I also thought seasonality would be key for forecasting, and I feel a powerful model might be able to pick up subtle information. A model like ARIMA in my opinion might be better than LazyProphet, or maybe even blending the predictions could help with more solid predictions... I am just guessing at this point. I have to make my first sub to see if my hunch is correct.
@skaak If I can ask, are you planning on using any DNN? Or maybe even an LSTM? Because I have tried using them and, long story short, it failed.
Thanks for the warning!
No, I've not used any NN model yet. I like the GRU and thought to use that, even if just to try it, but my hunch is it won't work here.
I know I boasted that I've used all models - perhaps not really all, just time-series-based ones. I actually think perhaps one should use some NN-based model but, given the state of the data, it makes sense to have that inside an automated tool (e.g. Lazy), otherwise you'll end up fine-tuning forever on this data.
I've used ARIMA models a-plenty - limited success. I was hoping an ARIMAX-type model might give me some kind of edge if I include the right X (e.g. gender) here. My best models are still some of the simpler ones, based on logistic-type curves and theta (see next), that only model trend, no X.
There is this one obscure but simple model, called the theta model (I see it is in statsmodels). In 2001 it won the M3 competition and since then I've used it and luvvvv'd it. Here it is still one of my best models, but this comes after seasonal adjustment, so it does not solve the real problem.
LazyProphet allows you to pick out multiple cycles - I think that might perhaps help. Of course you'd use 12 here, but perhaps if you find the right other cycle(s) (I was wanting to draw a spectrogram after looking at Lazy) then you can make a dent in the LB.
I've used Fourier at some stage - very long ago. Fourier looks real good on paper but in practice I've never really got value out of it. I've also seen too many people selling some share-forecasting package that just draws Fourier, so I stopped liking it long ago, but perhaps here one can use Fourier to get multiple cycles.
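For hunting extra cycles, a periodogram (scipy) is a quick first look before anything fancier; the dominant spectral peak recovers the period directly:

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(3)
n = 120  # ten years of monthly data
t = np.arange(n)
# annual cycle (period 12) buried in noise
y = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, n)

freqs, power = periodogram(y)
top = freqs[np.argmax(power[1:]) + 1]  # skip the zero frequency
print(round(1 / top, 1))  # dominant period in months -> 12.0
```

Any secondary peaks in `power` would be candidate extra cycles to hand to LazyProphet or to Fourier-term regressors.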
Check out the M5 competition on Kaggle; it is fairly recent and all (2 years ago). I saw a Medium post on Fourier but I feel it's a tad outdated for this competition, especially with the nature of the dataset (large chunks missing). Like I said earlier, I believe a blend of LazyProphet and some other model might be the winning combo.
Using Fourier in my opinion would be like using LogisticRegression for any regression task... it's TERRIBLE.
Thanks for the advice and information @skaak
@wuuthraad
Thanks also, at least I have a few new ideas after this discussion.
@wuuthraad
I have a real fancy model now, but I'm still far behind on the LB, and working hard on this one, so I am starting to gravitate towards the more complex models at the moment ... not yet NN but on my way there ...
DUDE!! My first submission flopped hard, but then again it was a baseline approach with no feature engineering. Bruised my own ego there.
@skaak you are my Obi-wan at this point
To give you context, I just used LazyProphet with the default hyperparameters and nothing else, no feature engineering on the data; I just kept it as is and hoped for the best. I just wanted to see how well it performed and, surprise surprise, it bombed.
@wuuthraad
I'm no Obi-wan. I don't have a light saber ... I don't even have a robe :-(
Well, I converted my whole model to GBM-based, used a Kaggle GPU to train and got a nice 28k for my effort (same as your LazyProphet).
Good thing I don't have a saber ...
Hahaha dude! Modest one, aren't you... I'll just have to give it another whack.
@wuuthraad
Well, long story short, I continued trying every time-series model known to man ... more or less the same result, albeit slowly just getting worse.
So I gave the more complex stuff another try and finally got something that outperforms the median model.
So it seems you have to go more complex, but results are very mixed. The model I had the most hopes for, e.g., gave me a terrible 25k+ score, while the simpler complex (!?) ones I tried just to get the pipeline going actually gave me nice sub-20k scores.
EDIT: back to the OP - the more complex the models are in this space, the more automated they are. I don't think we are in AutoML territory yet, but we are approaching it.
EDIT 2: it would be nice if Zindi either dropped the AutoML requirement or somehow clarified what is acceptable and what is not.
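For reference, the "median model" used as the bar in this thread is typically just a per-calendar-month median of the history; a pandas sketch (toy data mine):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
idx = pd.period_range("2017-01", periods=48, freq="M")
# toy sales with a November spike
y = pd.Series(np.where(idx.month == 11, 150.0, 100.0) + rng.normal(0, 5, 48),
              index=idx)

# median model: predict each future month with the historical
# median for that calendar month
train, test = y[:36], y[36:]
monthly_median = train.groupby(train.index.month).median()
pred = test.index.month.map(monthly_median)
mae = np.mean(np.abs(pred - test.values))
print(round(mae, 1))
```

It is trivially robust to outliers and missing rows, which is exactly why it is such an annoying baseline to beat on dirty data.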
Glad to see you're winning. I, on the other hand, have had a terrible experience with this competition. My models are just the worst... feature engineering is failing. The problem is fairly straightforward and I understand what I need to do, but the implementation is beyond me... I love a challenge but this one is a bit too much for me... well, "too much" is an overstatement I'd say; I am inexperienced in the field of time-series forecasting, and this challenge was just a testament to that. Lastly, do not overthink things, you've got this!
Good luck @skaak hopefully you win this competition. I'll just try my luck on other competitions.
You know, I was going to do just one sub, just for fun ... now I'm in too deep and I'd luv to get a good result, but there is much ground to cover.
@wuuthraad
You can't imagine how much this helps ... I had to make a few big decisions with little time and recalled you'd shared something - rereading your post did it for me, I must say.
Maybe I am overthinking things
@skaak I see the major improvement! Huge jump on the private LB!
Yes, it feels real good to have that; in fact, it was a real nice comp ending. I was done on Friday, then just ensembled and relaxed for the rest of the weekend.
@wuuthraad - thanks for the support; sorry, not winning, not even top 20, but I am almost glad, as I have such a huge ensemble that it would be painful to have to wrap it up and hand it over ... thanks also, it was nice having this discussion, will stay in touch.
The pleasure was all mine. Next time we'll both place in the top and get paid, and of course I'll stay in touch.