@Zindi, Stratified split for public/private?

Data · 9 Aug 2023, 17:02 · 22

Dear @Zindi and @AntonioDeDomenico

Was the public/private leaderboard split stratified after the binary target was created, or is it a random split?
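
For readers following the question, the difference between the two kinds of split can be sketched as below. This is only an illustration, not Zindi's actual procedure (which the thread does not confirm). The labels, the 30% public fraction, and the helper names are all hypothetical. With scikit-learn, `train_test_split(..., stratify=y)` gives the stratified variant.

```python
import random
from collections import defaultdict


def random_split(labels, public_frac=0.5, seed=0):
    """Split row indices into public/private parts purely at random."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = round(len(idx) * public_frac)
    return idx[:cut], idx[cut:]


def stratified_split(labels, public_frac=0.5, seed=0):
    """Split indices so each class keeps (almost) the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    public, private = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * public_frac)
        public.extend(members[:cut])
        private.extend(members[cut:])
    return public, private


def positive_rate(labels, idx):
    """Share of rows labelled 1 among the given indices."""
    return sum(labels[i] for i in idx) / len(idx)


# Hypothetical labels: 300 positives and 700 negatives (not the competition data).
labels = [1] * 300 + [0] * 700

pub_s, priv_s = stratified_split(labels, public_frac=0.3)
pub_r, priv_r = random_split(labels, public_frac=0.3)

print("stratified:", positive_rate(labels, pub_s), positive_rate(labels, priv_s))
print("random:    ", positive_rate(labels, pub_r), positive_rate(labels, priv_r))
```

In the stratified case both parts keep the overall class ratio by construction. In the random case the two ratios can drift apart, and the drift tends to grow as the parts get smaller.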

Discussion 22 answers

yanteixeira

Hello, my friend

Could you please go offline for the next few days so we can catch up with you? xD

Kind regards

9 Aug 2023, 17:16

Upvotes 2

replied to yanteixeira · 9 Aug 2023, 17:25

lol. xD

Upvotes 0

replied to Krishna_Priya · 9 Aug 2023, 19:51

Multimedia university of kenya

😂😂

Upvotes 1

AntonioDeDomenico

Hi guys, I did not handle this step myself. However, I guess it is just a random split.

9 Aug 2023, 19:13

Upvotes 1

Juliuss

Freelance

Hello @AntonioDeDomenico 👋🏽,

Did you confirm this step finally?

replied to AntonioDeDomenico · 11 Aug 2023, 08:47

Upvotes 0

replied to Juliuss · 11 Aug 2023, 09:15

Multimedia university of kenya

@JuliusFx how's your CV/LB correlation looking?

Upvotes 0

Juliuss

Freelance

Hi @Koleshjr. I don't know whether to trust what I have. It looks odd at 0.7 CV vs 0.72 LB. As you say, we keep building. About 7 more days to explore better solutions.

replied to Koleshjr · 11 Aug 2023, 09:55

Upvotes 1

Okay, in that case trusting local CV makes more sense. But if the test set's 0/1 ratio is different from train, then we should brace ourselves for a lot of shuffling on the private leaderboard.

9 Aug 2023, 19:47

Upvotes 1

replied to Krishna_Priya · 9 Aug 2023, 19:53

Multimedia university of kenya

Wait, the last time I checked you had 0.73 CV and 0.72 LB. Does this mean you are at 0.76 CV?? 😲

Upvotes 0

replied to Koleshjr · 9 Aug 2023, 19:56

No, the CV is no longer correlated with the LB. Two of my submissions:

CV: 0.74, LB: 0.75

CV: 0.75, LB: 0.74

Hence the concern.

Upvotes 2
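
One possible reading of a 0.01 swap like the one above is sampling noise: an F1 score computed on a small subset of rows can move by more than that from subset to subset. The sketch below is a rough way to gauge this, under loudly stated assumptions: the labels and predictions are synthetic, the 72% hit rate and the 300-row "public" subset size are hypothetical, and `subsample_f1` is an illustrative helper, not something used in the competition.

```python
import random
import statistics


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic stand-in for a test set: 2000 balanced labels, and
# predictions that agree with the label about 72% of the time.
_rng = random.Random(0)
y_true = [i % 2 for i in range(2000)]
y_pred = [y if _rng.random() < 0.72 else 1 - y for y in y_true]


def subsample_f1(n_rows, n_repeats=200, seed=1):
    """F1 of the same fixed predictions on many random subsets of n_rows rows."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        idx = rng.sample(range(len(y_true)), n_rows)
        scores.append(
            f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    return scores


scores = subsample_f1(n_rows=300)  # 300 is a hypothetical public-subset size
print("mean F1:", round(statistics.mean(scores), 3))
print("std  F1:", round(statistics.stdev(scores), 3))
print("range  :", round(min(scores), 3), "to", round(max(scores), 3))
```

If the spread printed here is comparable to the gap between two submissions, their LB ordering alone says little about which one is better.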

replied to Krishna_Priya · 9 Aug 2023, 19:58

Multimedia university of kenya

Oh okay, but damn, that's still super impressive. 9 more days to figure out the trick 😅

Upvotes 0

replied to Krishna_Priya · 10 Aug 2023, 00:43

Inveniam

There'll definitely be a shake-up here. I got scared when I saw the prediction distribution of my best LB submission. The ratio of 0/1 is nearly 25/75, which seems abnormal for an "anomaly detection"-like task, because you'd expect the opposite (more 0s).

Upvotes 2

replied to Muhamed_Tuo · 10 Aug 2023, 03:41

Yes, exactly.

Upvotes 0

Charrada

That is due to the metric, F1 score.

replied to Muhamed_Tuo · 10 Aug 2023, 11:52

Upvotes 1
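
The point about F1 can be illustrated with a toy example: when a model separates the classes only weakly, the decision threshold that maximizes F1 often sits below 0.5, so the F1-best submission predicts more 1s than the labels contain. The sketch below assumes hypothetical, roughly calibrated scores on a balanced set of 50 rows. None of the numbers come from the competition data.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical scores: at score k/10 there are 10 rows, k of them positive.
# Overall this gives 25 positives out of 50 rows, i.e. balanced labels.
y_true, scores = [], []
for k in [3, 4, 5, 6, 7]:
    y_true += [1] * k + [0] * (10 - k)
    scores += [k / 10] * 10


def evaluate(threshold):
    """Return (F1, share of rows predicted as 1) at a given threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return f1_score(y_true, y_pred), sum(y_pred) / len(y_pred)


# Sweep every distinct score as a candidate threshold and keep the F1-best one.
results = {t: evaluate(t) for t in sorted(set(scores))}
best_t = max(results, key=lambda t: results[t][0])
best_f1, best_share = results[best_t]
f1_at_half, share_at_half = results[0.5]

for t, (f1, share) in results.items():
    print(f"threshold {t:.1f}: F1 = {f1:.3f}, predicted 1s = {share:.0%}")
print("F1-best threshold:", best_t)
```

On this toy data the labels are 50/50, yet the F1-best threshold marks most rows as 1. That is in line with the skewed prediction ratios reported above, although it does not prove the same mechanism is at work in the competition.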

Rakesh_Jarupula

National Institute of Technology Silchar

I am a little confused... How did you get a 25/75 ratio? For me it's balanced.

replied to Muhamed_Tuo · 11 Aug 2023, 07:00

Upvotes 0

replied to Rakesh_Jarupula · 11 Aug 2023, 09:25

Multimedia university of kenya

Amazing, and how does that correspond to the LB, @Rakesh_Jarupula? I.e., CV vs LB?

Upvotes 0

replied to Rakesh_Jarupula · 11 Aug 2023, 09:34

Inveniam

Yeah, for submissions at LB 0.69 (or under), it is balanced. But over 0.70, I tend to lose the balance. (Don't overfit the LB :) )

I believe it to be the same for most of the guys above 0.7.

Upvotes 0

replied to Muhamed_Tuo · 11 Aug 2023, 09:36

Inveniam

Maybe the top 5 can relate to it.

Upvotes 0

replied to Muhamed_Tuo · 11 Aug 2023, 09:47

Multimedia university of kenya

I can relate to this completely. My subs above 0.70 are very random and very sensitive to the number of folds used, with no stability whatsoever. But I believe some of the guys at >0.70, like @Krishna_Priya, are not overfitting the LB: 0.73 CV / 0.72 LB at first and now 0.74 CV / 0.75 LB, that's very stable. So maybe there is just one thing we are missing that he is capitalizing on. Anyways, we keep building.

Upvotes 1

Rakesh_Jarupula

National Institute of Technology Silchar

Actually, I was talking about the train set labels, not the test set prediction distribution.

Test: same 25/75.

CV is around 0.7, but my LB score is not at all stable. If I make any small change, CV stays reasonable but the LB score changes drastically.

replied to Koleshjr · 11 Aug 2023, 10:19

Upvotes 1