Expresso Churn Prediction Challenge

Helping Senegal
$1 000 USD
Completed (over 4 years ago)
Classification
Prediction
1378 joined
437 active
Start: Aug 27, 21
Close: Nov 28, 21
Reveal: Nov 28, 21
Amy_Bray
Zindi
Final leaderboard scoring and placement
Platform · 10 Dec 2021, 11:49 · 32

Congratulations to all on the leaderboard! Winners will be contacted by the end of next week for the next steps. 

We have unfortunately removed a user from the leaderboard, even though they had an interesting approach. They were removed on the basis of the usefulness of the solution to the client and to Zindi.

We recommend for all future challenges that you do not use ID column as a feature. Zindi is interested in solutions that are useful and applicable to real-world problems, and building a model that uses column ID as a feature will never be useful in the real world.
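One way to follow this recommendation is to set identifier columns aside before any features reach the model. The snippet below is a minimal sketch, not Zindi's or any participant's code: the `user_id` column name comes from this thread, while the `churn` target name, the other column names, the values, and the `split_features` helper are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def split_features(
    rows: Sequence[Row],
    id_columns: Sequence[str] = ("user_id",),
    target: str = "churn",  # hypothetical target column name
) -> Tuple[List[Row], List[object], List[Row]]:
    """Separate identifiers and the target from the model features.

    Returns (features, targets, ids). The ids are kept only so that
    predictions can be matched back to rows; they never reach the model.
    """
    excluded = set(id_columns) | {target}
    features = [{k: v for k, v in row.items() if k not in excluded} for row in rows]
    targets = [row[target] for row in rows]
    ids = [{k: row[k] for k in id_columns} for row in rows]
    return features, targets, ids


# Toy rows with made-up values.
rows = [
    {"user_id": "a1", "tenure_months": 12, "monthly_spend": 30.0, "churn": 0},
    {"user_id": "b2", "tenure_months": 2, "monthly_spend": 5.5, "churn": 1},
]
features, targets, ids = split_features(rows)
print(features[0])
```

Keeping the identifiers in a separate list still lets a submission file be built from the predictions, while the model itself only ever sees the remaining columns.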

Thank you to all those participants that chose not to use this column ID as a feature, and congratulations to the winners and everyone who took part in this challenge.

The Zindi team

Discussion 32 answers

So even though there were no particular rules about the ID feature, without checking my solution without this feature and without even contacting me to discuss it, you decided to just remove me from the LB?

Okay -_-

10 Dec 2021, 12:24
Upvotes 0

I mean, big ensembles or stackings are also not useful in production but for some reason they count as "okay" solutions

Quite unfair

I agree with bbb. You have to state in the rules, before the competition starts, which parts of the data that YOU provide can't be used. It is too late to create rules after the competition is done. Looks suspicious.

10 Dec 2021, 12:55
Upvotes 0
AkashPB

Thanks, Zindi for being fair as always. To be honest usage of ID as a feature would have given a very wrong impression about 'proper feature creation' to many people who are new to this field.

We don't want people out there learning such things from this platform so that in the future they don't use Account ID to reject someone's loan while working in reputable organizations and make a mockery of themselves wherever they are working by telling that Account ID played a pivotal role in model development.

I wholeheartedly agree with Zindi that some coherence with the business logic must be always there so as to make sure models are of some use to the organizers.

Thank you so much, Zindi. I am grateful that you pointed out the use of ID as a feature. Such use cases and approaches sound very irrational and ridiculous, to be honest. This platform is an inspiration for many young data scientists, and we should not echo incorrect approaches. DS is not only about hacking, it is also about critical thinking. Doing something just to get a rank will not benefit anyone in the long run. Thanks again!!

10 Dec 2021, 14:41
Upvotes 0

Well, finding this feature was exactly critical thinking, in my opinion

MICADEE
LAHASCOM

@ravinder Yes, I am very surprised that the so-called big men (I mean experienced data scientists) on this data science journey are still debating something that is not debatable at all, something that virtually every junior data scientist should already know. Why use a unique feature like "user_id" as one of the features for modelling? I think we don't need to be told at all. In fact, no need to even press further. Seriously, still in shock hearing all this.

You are sounding quite toxic, tbh

1) There are no rules about id feature

2) @zindi did not even contact me to discuss it

3) There were no additional checks of my solution without this feature

4) It's not a million-dollar competition

Competitions are not about "making good production solutions" (and if you think so, I have bad news for you). They are more about education and fun. If you cannot prepare data properly as a competition host, or if you cannot provide complete rules as a platform, that is not the participant's headache. To be honest, it seems like @zindi just can't say something like "okay, fine, we'll complete our rules and also provide a prize to 4th place", and it is just easier to ban me as a participant.

I also spent some time competing here, working with data and models, and it is just unfair to me. I mean, I'm not even on the leaderboard now.

AkashPB

"education and fun" ?? My friend, Zindi has produced many good data scientists in Africa and elsewhere and, to be honest, many people look up to Zindi to learn. If you are teaching people to use 'user_id' as a feature, I don't think that is in the best interest of anyone learning. Many young people, who may not even be aware that using such features in their real jobs can be detrimental, may learn this and suffer in the future.

"Competitions are not about making good production solutions" - Tell this thing to the organizers who trust Zindi with their problem statements and ask for resolutions from them. You are not the first one to get disqualified and I can fairly say with people using such practices in the future, you won't be the last.

Lastly, nobody is toxic to you or anyone over here. Everyone is stating facts. If you go to this link (https://zindi.africa/competitions/expresso-churn-prediction) and read properly you may find this written-

"Zindi is committed to providing solutions of value to our clients and partners. To this end, we reserve the right to disqualify your submission on the grounds of usability or value. This includes but is not limited to the use of data leaks or any other practices that we deem to compromise the inherent value of your solution."

Since we are here, nobody in Zindi is questioning your credibility as a data scientist/problem solver/hacker/whatever, just that you used something that was not in the best interest of the users and organizers, that's why Zindi took the decision. In fact, many people appreciated the other approaches you used as well. What is wrong needs to be pointed out and what's right must be appreciated and that's all we did.

Just take a lesson from it and move ahead as things tend to happen and we grow learning from it.

MICADEE
LAHASCOM

Toxic!!!! Nahhh.... Not at all. Why? Though I was surprised to be hearing this, I must say. I was only talking about what we should all already see as the norm in data science. If I may ask you a question: have you ever seen such a rule explained to you before, on whether to use a feature like "user_id", on Kaggle or any other platform like this?? I would like to know.

@MICADEE here are some of them (only from Kaggle and only what I can remember):

Kaggle Predicting Red Hat Business Value: "people_id", "group_id"

Mercedes-Benz Greener Manufacturing: "ID"

TalkingData Mobile User Demographics: "rowID"

ASHRAE - Great Energy Predictor III: "building_id"

I also know a few local competitions where ids were also used, but they are in Russian so I'll not provide them

@AkashPB

1) My solution without this feature is almost production-ready (after refactoring)

2) With such loose rules you can say "he was using the second column" or almost anything else. I also don't remember it, but I read the rules twice when I found this feature and there were no restrictions on the data they had provided. I'm very sorry that I did not take a screenshot, because it seems to me that the text was edited right after my submission.

AkashPB

1. I did not say anything about your code being production-ready or not. Stop assuming.

2. This statement of yours -

"I'm very sorry that I did not take a screenshot, because it seems to me that the text was edited exactly after my submission."

Well, what you are saying doesn't even make sense, and now you are saying that the text was edited after submission?? The rules were pretty clear. I have been on this platform for over a year, and these are the same rules I have been seeing all year. You can check all recently concluded competitions and all of them have the same rules, as far as I know.

1) You said "Tell this thing to the organizers who trust Zindi with their problem statements and ask for resolutions from them." and my point is about exactly that

2) I'm talking exactly about "or violated the spirit of the competition or the platform in any other way" and "disqualify your submission on the grounds of usability or value"

I want to clarify: I don't remember

All of this seems suspicious to me since they disqualified me. Try to put yourself in my position. I'm very upset about all of this.

AkashPB

1. Why tell them after you submitted your final solution with user_id? If that is the norm, I can in the future submit an overfitted solution with data leaks and tell the organizers: oh sorry, that was an honest mistake, kindly remove these features and use it. So your statement makes no sense here either.

2. It was there right from start .. just scroll down the page, you will find it. It is not our decision. It is Zindi's decision.

MICADEE
LAHASCOM

@bbb Thank you for pointing this out. Because it was speculated there that ID must not be used and now that gives us grace to use "user_id" in this competition . Hmm..... I will have to stop here right now to avoid time wasting. We're deliberating too much on this 🤔.

But for clarity, you can check this out:

https://zindi.africa/competitions/indabax-nigeria-2021/discussions/7919frica/competitions/indabax-nigeria-2021/discussions/7919

Good luck to you next time bro.

Peace 👍

Link seems not working

But thank you (even though next time I'll probably compete on another platform)

Peace ✌️

@Micadee.....brother, do not waste your time and energy clarifying what seems to me a very trivial and childish thing. If we stated "do not use ID as a feature" as a rule for every competition, it would be like giving a football to a kid and asking him explicitly to try to hit only the goal, when any decent kid would have the common sense to aim at the goal and not at some random place. We have clarified our opinion once, that's it. Use the same time to make one fruitful submission. Good day!!

MICADEE
LAHASCOM

True. Thanks @ravinder. Also, I am tired of explaining, over and over again, things that are expected to be known. Mistakes are inevitable, even though we should avoid the costly ones as best we can, but we should still learn to accept them and learn from them as well. No big deal. This is not the end of the world.

Peace 👍.

MICADEE
LAHASCOM

@bbb You're welcome.

Just highlight the link and use your browser to open it. Pretty simple. The link is perfectly working.

https://zindi.africa/competitions/indabax-nigeria-2021/discussions/7919frica/competitions/indabax-nigeria-2021/discussions/7919

Inside, you will see all the points I have been trying to get across to you. No ill feeling at all, I didn't win any prize here either. Let's move on.

Peace ✌️

I think that it is extremely unfair to remove a participant from the competition. The purpose of these events is not just to solve business cases of various companies for money. Each participant shares their knowledge, skills and abilities, and interesting approaches, and not only in order to get a couple of hundred bucks. The programming community loves something new and interesting to talk about. I am more than sure that the bbb solution was not done in Excel with one id_column. He spent his time, gave you something useful (after all, you eventually looked at his solution), and in the end was disqualified without the right to discuss it with the organizers. I hope Zindi will be lenient in cases like this, because exploring similar data processing techniques can refresh people's points of view on what information really is. You can also use salted hashes if you want future participants to be unable to do anything with your id :)
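The salted-hash suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything Zindi or the competition host is known to do: the `salted_hash` and `anonymize_ids` helpers and the example IDs are hypothetical, and only standard-library `hashlib` and `secrets` calls are used.

```python
import hashlib
import secrets
from typing import List, Sequence


def salted_hash(raw_id: str, salt: bytes) -> str:
    """Return the SHA-256 hex digest of salt + raw_id; the salt stays with the host."""
    return hashlib.sha256(salt + raw_id.encode("utf-8")).hexdigest()


def anonymize_ids(raw_ids: Sequence[str], salt: bytes) -> List[str]:
    """Replace every identifier with its salted hash."""
    return [salted_hash(raw_id, salt) for raw_id in raw_ids]


# The host generates the salt once and never publishes it.
salt = secrets.token_bytes(16)
raw_ids = ["user_000001", "user_000002", "user_000003"]  # made-up IDs
public_ids = anonymize_ids(raw_ids, salt)
print(public_ids)
```

Under one salt the same raw ID always maps to the same public ID, so rows can still be matched, but any ordering or structure in the raw IDs is no longer visible in the published values. Row order in the released file can still carry information, so shuffling the rows would be a separate step.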


AkashPB

https://datahack.analyticsvidhya.com/discussions/amexpert-2021-machine-learning-hackathon/1808/

Look at this before making an assertion. Thanks!

10 Dec 2021, 20:04
Upvotes 0

Y'all remember Netflix's million dollar competition, in which they didn't use the winner's solution because it couldn't be used in production that easily, but they still rewarded them. As long as none of the rules were violated, there's no way a participant should be removed. I don't wanna say it, but Kaggle is better in this regard, because cases like this are treated as an exception since the participant technically never violated any of the rules.

10 Dec 2021, 20:15
Upvotes 0

@AkashPB, to be honest, you are being quite passive-aggressive.

You are trying to teach me something from a position of strength. Relax, I don't have a prize, and I don't have a rating or a leaderboard position now either. Be kinder. I got nothing except the time spent writing my solution and 3 people telling me how bad my decision to use this feature was.

Thanks to all the people who supported me with kind words. This means a lot to me.

@AkashPB, you can overfit all you want as long as you think it's gonna help you climb the private leaderboard. The ID column scenario wasn't mentioned in THIS competition's rules, and you can't just plug in another competition's rules for the sake of morality or whatever, I don't even know. Call it a hack or a leak or whatever, it's legit in this case.

AkashPB

I am not trying to be what you are saying. I am not the type of person you are thinking. I am just appreciating a decision that Zindi took which many platforms should take but they don't. I don't have any hard feelings for you or anyone over here.

Such data leaks happen everywhere, but the truth is that many hardworking people who build solutions meant to benefit the organizers suffer because of that.

I have suffered a lot because of such usage of IDs by people in hackathons and since I also work in this domain, I have a fair bit of understanding as to what should be done in real life and what should not be.

I won't even mind getting a lower rank with you being pushed back to the third rank and I will be more than happy to see you perform way better than me in other competitions (which you will for sure) and that's how the community grows. But if you intend to teach others to use leakages like this, do you think the purpose of organizing a hackathon where people learn is solved?

Chin up and you are a Champ!

AkashPB

I strongly disagree with what you said, and you can agree to disagree with me as well, but if you read the rules properly, usability plays an important role, as is clearly mentioned. Ultimately, I am no one to decide that, but what I feel is that if a competition is organized by some organizers, they have some purpose and they want something out of it (eventually the best solution). If the best solution doesn't add any value, what's the point?

Also, I am only pointing to this competition's rules. Check before commenting.

I hate to break this to you but the majority if not 100% of the top solutions are never used in production as is because they involve ensembling or stacking of 10s if not 100s of models. I'd say check out the top solutions of every competition there is and you'll find just that. Thing is to learn from the participant's approach and give credit where it's due, as long as it's within the bounds of the predefined rules. Otherwise, Zindi would only lose credibility since they can cancel anybody without even checking their code or talking to them or even checking how their solution performs after resolving whatever issue they have with the submitted solution.

AkashPB

They checked the code btw.

True that. The least they could've done is test the solution after removing that feature since it didn't violate any rule, that's the least this guy deserves.
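The check described here, scoring the same solution with and without the disputed feature, might look roughly like the sketch below. Everything in it is hypothetical: `ablation_report` is an illustrative helper, `toy_score` stands in for "retrain the pipeline and return a validation score", and the numbers are made up rather than taken from any real submission.

```python
from typing import Callable, Dict, List, Sequence


def ablation_report(
    score_fn: Callable[[List[str]], float],
    features: Sequence[str],
    suspect: str,
) -> Dict[str, float]:
    """Score a pipeline with and without one suspect feature."""
    with_suspect = score_fn(list(features))
    without_suspect = score_fn([f for f in features if f != suspect])
    return {
        "with": with_suspect,
        "without": without_suspect,
        "delta": with_suspect - without_suspect,
    }


def toy_score(feature_names: List[str]) -> float:
    """Stand-in for 'train the model and return a validation score'."""
    # Made-up numbers, purely to make the example runnable.
    return 0.90 if "user_id" in feature_names else 0.85


report = ablation_report(
    toy_score, ["user_id", "tenure_months", "monthly_spend"], "user_id"
)
print(report)
```

A large `delta` would suggest the score depends heavily on the identifier, while a small one would suggest the rest of the solution stands on its own.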

MICADEE
LAHASCOM

@Kirrak Thank you for pointing out that second point of your message as regards using ID is not allowed. So because it was speculated there that ID must not be used and now that gives you grace to use "user_id" in this competition . Wow... Awesome... Hmm..... I understand better now. I will have to stop here right now to avoid time wasting. 🤔.

Check this out:

https://zindi.africa/competitions/indabax-nigeria-2021/discussions/7919frica/competitions/indabax-nigeria-2021/discussions/7919