Zimnat Insurance Recommendation Challenge

Helping Zimbabwe
$5 000 USD
Completed (over 5 years ago)
Prediction
Collaborative Filtering
1777 joined
612 active
Start: Jul 01, 20
Close: Sep 13, 20
Reveal: Sep 13, 20
Be careful, CatBoost users
Notebooks · 27 Aug 2020, 08:16 · edited 1 minute later · 18

CatBoost cannot reproduce the exact result on the GPU (the metric floats every run), and it really seems to me the same holds on the CPU. I checked only once, on Vast.ai; I rented a Xeon because training takes a very long time on my i7.

Discussion 18 answers

Yes, agreed, it's really weird. At first I thought it was a problem with the split, but after checking everything I realized something is wrong with CatBoost itself, since the loss is different every run. No idea how to fix it; training on the CPU is unrealistic (even 100 iterations take forever), and it's not even certain that would help...

27 Aug 2020, 08:32
Upvotes 0

Yes, I switched to another algorithm and everything is fine there.

It's not CatBoost itself but a peculiarity of the GPU: it cannot guarantee the order of floating-point computations, which causes small variability in the results.
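The point above can be shown without a GPU at all: floating-point addition is not associative, so a parallel reduction that sums in a different order each run yields a slightly different total. A minimal sketch in pure Python:

```python
# Summing the same values in two different orders gives two different
# totals, because each float addition rounds to the nearest double.
vals = [1e16, 1.0, -1e16, 1.0]

# Left-to-right: 1e16 + 1.0 rounds back to 1e16, so one of the 1.0s is lost.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]

# Reordered (as a parallel reduction might do): the large terms cancel
# first, so both 1.0s survive.
reordered = (vals[0] + vals[2]) + (vals[1] + vals[3])

print(left_to_right)  # 1.0
print(reordered)      # 2.0
```

On a GPU the reduction order depends on thread scheduling, so the rounding differs between otherwise identical runs, which is exactly the variability described above.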

Ah, I see. Not very pleasant, I'd say ((

Training on the CPU takes too long...

But thanks a lot for the information )))

Yes, thanks, this was already pointed out on the forum, but I think it's unstable on the CPU too: a couple of times I managed to run it on a Xeon on a remote machine and also got different scores. I didn't test further because it takes so long you could grow old in the process... Maybe it's something specific to multiclass and probabilities.

LightGBM has some instability too (on both CPU and GPU), but not too much. LGB is much faster than CatBoost and usually has better accuracy.

Could you recommend some good tips for preventing overfitting with LGB?

- Cross-validation: pay attention to every fold. Sometimes the overall CV score improves because only one fold has a big gain; don't trust it. Good generalization means a gain on most CV folds.

- A high value for min_data_in_leaf helps kill weak splits.

- Avoid overly precise parameters: 0.8 is always better than 0.7744522.
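The first tip above can be sketched as a simple acceptance rule: take a change only if it beats the baseline on a majority of individual folds, not just on the averaged score. This is my own illustrative sketch (the function name and threshold are assumptions, not from the thread):

```python
# Accept a model change only if it improves most CV folds, so a single
# lucky fold cannot carry the averaged score on its own.
def accept_change(baseline_fold_scores, candidate_fold_scores, min_fraction=0.6):
    """Return True if the candidate beats the baseline on enough folds."""
    improved = sum(c > b for b, c in zip(baseline_fold_scores,
                                         candidate_fold_scores))
    return improved / len(baseline_fold_scores) >= min_fraction

# One fold with a huge gain lifts the mean, but only 1 of 5 folds improved:
baseline  = [0.80, 0.81, 0.79, 0.80, 0.82]
candidate = [0.95, 0.80, 0.78, 0.79, 0.81]  # higher mean, worse generalization
print(accept_change(baseline, candidate))  # False
```

The threshold (0.6 here) is arbitrary; the point is only that fold-level agreement, not the mean alone, should drive the decision.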

Thank you, appreciated! For some reason, no matter how I tune LGB in this comp, I can't beat my CB score, but I'm going to keep trying.

I used two CV schemes: stratified by target, and k-folds by ID (all rows of an ID in one fold), but results were unstable and often overfit as well. What should I pay attention to during CV?
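The "all rows of an ID in one fold" scheme mentioned above can be sketched without any library: assign each unique ID to a fold, then route every row to its ID's fold, so the same customer never appears in both train and validation. This is a hypothetical minimal version (the function name and round-robin assignment are mine; a real run would shuffle the IDs first):

```python
# Group rows into folds by ID so no ID is split across folds.
def group_kfold(ids, n_folds=5):
    """Return a list of folds, each a list of row indices."""
    unique_ids = sorted(set(ids))
    # Round-robin assignment of IDs to folds (deterministic for clarity).
    fold_of = {uid: i % n_folds for i, uid in enumerate(unique_ids)}
    folds = [[] for _ in range(n_folds)]
    for row, uid in enumerate(ids):
        folds[fold_of[uid]].append(row)
    return folds

row_ids = ["A", "A", "B", "C", "B", "D", "E"]
print(group_kfold(row_ids, n_folds=2))  # [[0, 1, 3, 6], [2, 4, 5]]
```

Note that both rows of "A" and both rows of "B" land in the same fold, which is the property that prevents ID leakage between train and validation.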

It is only a small difference and I don't think Zindi will mind, it does not change the score that much.

See:

https://catboost.ai/docs/features/training-on-gpu.html

for more details.

27 Aug 2020, 16:25
Upvotes 0

The difference in score in my case was about 20 places on the leaderboard... so it would be sad if Zindi checked the code and the result didn't reproduce...

Hmm, it's unlikely that GPU non-determinism can cause such a difference; in my case the variation appears only in the 5th or 6th digit. In short, it shouldn't affect your rank at all.

I agree, have you fixed your random seed?

Of course I know about the random seed. The difference is in the 4th or 5th digit, and in this competition that corresponds to about 20 places on the leaderboard.

Anyway, I chose another library that is more stable and faster on the CPU for this task; now my result reproduces fine.