Well - perhaps I was too quick in congratulating the winners, as the "real" winners were recently announced by Zindi.
The "real" winners are quite different from the leaderboard scores. Zindi notified me that I would not be getting a 3rd prize for this one, because my model is not that useful or the code is not readable or some nonsense like that. It is somewhere in the rules apparently. Some of the other competitors who also lost out as a result were told that an ABSA data scientist evaluated the code and chose the best model based on code appearance or something like that.
Whatever ... this is not right.
Sorry Zindi and ABSA, but I don't think you are playing fair.
How can you determine the winners so arbitrarily? If midway through a competition you changed the metric from, say, RMSE to MAE, it would cause an outcry and invalidate a lot of the previous effort. Of course you can do that, just as you can do this, but it is still not right. How can you, Zindi, simply hand over the selection of winners to an ABSA data scientist instead of just using the leaderboard? Do you not have our interests at heart, or are you just here for the hosts?
In case you missed it, the cash prize was awarded to #9 on the leaderboard. Sorry #9, I don't hate you or anything, but your model should win on its score and not on ABSA's review. As far as I can tell, it didn't.
Wow! Here are the private leaderboard scores and what the selected winners scored. Get my drift?
So the person awarded first prize scored worst by far. Another victim is LB #2 who, like me, dropped out of the prize positions but at least got something because she is a female competitor or something like that. So this evaluation is arbitrary, not based on scores.
This is just a slap in the face - I mean, my code is a real piece of art. There is nothing wrong with it. My model contains some innovative features and I worked very hard on this one - I submitted more than 200 times! My model alone, not data processing, is >1000 lines of dense Python code. The now #3 submitted a whopping 5 times. Not that hard work makes you win, a good score is what should make you win, but a good score requires hard work.
Zindi, you know, we are not your cattle in some weird IP farm. You should also watch out for us, since you are sort of the arbiter in cases like this. How can you let ABSA do this? Is this good practice? And how can ABSA deviate so much from the rules and get away with it? Zindi, you try to project a community-driven image; do you realise that this damages that image? Do you realise that actions speak louder than words? Do you even care? I used to be one of your biggest fans and a huge supporter of all things Zindi; now I can't help but wonder what on earth I am doing here.
You know, I can't recall exactly now, but if you just subbed the midpoint of the income bracket for this you'd score around 6k. I mentioned this, in a veiled way, to @SallyL who was looking to improve her score. You could, for example, build a model around the midpoint and start with a score of 6k with no effort at all.
You had to work extremely hard to beat that midpoint baseline. Now the winning and 3rd-place solutions appear to be around there, so this is also damaging to data science, fwiw.
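To make the baseline concrete: this is a minimal sketch of the midpoint idea, under my own assumption that each customer's declared income bracket is given as (lower, upper) bounds — that is not how the actual competition data was laid out, it just illustrates the zero-effort predictor.

```python
# Midpoint-of-bracket baseline (a sketch; the bracket bounds are assumed,
# not taken from the actual competition data).
def midpoint_baseline(lower, upper):
    """Predict the middle of the declared income bracket."""
    return (lower + upper) / 2.0

# Example: a customer in an assumed R20k-R30k bracket.
print(midpoint_baseline(20_000, 30_000))  # 25000.0
```

Anything that cannot beat a predictor this trivial arguably should not be in the prize positions.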
I think you should get a legal opinion on this and sue if possible... this is grossly unfair... shifting the goalposts and changing the criteria post-competition... unbelievable yet true!!! Take it further in the legal domain.
Well ... now that Zindi is a USA company they'll probably send in the drones or something.
This is perhaps the best route, and, you know, there are plenty of things I can think of that will be "juicy" in court, e.g. when you win, you sign away world-wide moral rights to your solution. This would be illegal in many jurisdictions. One can't help but wonder what happens to your winning model down the line ...
But tbh I have no appetite for this, or perhaps just no experience with this. I think the "problem" is not criminality per se, but Zindi's eagerness to satisfy the host rather than be the referee, let alone be the champion of the competitors.
For this to be a nice place to stay, Zindi need to be a bit more protective of competitors ... and their rights, so to some extent I cannot argue with your legal angle on this.
Zindi need to be a bit more balanced when adjudicating between hosts and competitors. Zindi need to be a bit more careful with e.g. moral rights. Zindi need to be a bit more appreciative of the amount of work going into a winning solution. Zindi need to be a bit more eager in dishing out prizes (we joke in our team that it is easier to escape out of prison than get your prize after you've won, but that is another story) ... I could add a few more.
Court cases don't fix these things ... deep and prolonged collaboration and discussion and interaction fix these things ... I hope ...
Perhaps, in the free market, the best thing to fix these things is competition! Just like the models here are of exceptional quality given the fierce competition, perhaps what Zindi need is a bit of healthy competition.
Anyhow, I just wanted to respond quick but this is actually a deep and fascinating topic (and discussion) - I'd love to hear your comments!
Hi Skaak, that was just an off-the-cuff response from me and perhaps a bit harsh. I agree with your view that meaningful discussion and interaction can fix this thing. Also, perhaps, tighter policies and governance procedures to ensure a satisfactory outcome for all parties (i.e., Zindi, the hosts and the participants) concerned.
Hahaha, someone gave them a one-liner regression package using a simple train_predict(). The ABSA data scientist was like "Eish": it achieved only slightly worse accuracy than the leaders, and they finally understood the one line. Meanwhile, back at the ranch, the data set was almost as bad as the Zindi in-house challenge.
Yes, it is amusing also ... but in a bad way.
Pieterkiiiiiiiiiiiiii!!!!!!!!
Now I remember, we still need a nice zoom call to discuss this competition and our approaches. Pay no attention to Zindi's vile downplay of my excellent work, I have some really innovative features that I'd love to discuss with you. I started out trying my level best to get seq2seq to work here, I mean, I had all the code from the other ABSA comp lying around. To no avail ... I know you've been complaining about the data in this one quite a bit. I'd like to hear you explain that a bit more. Not that I differ from you, but I'd like to hear you explain why.
Anyhow, perhaps in the coming holidays I'll set up said zoom call. Watch this space!
For sure I am still keen. Would be nice to keep in contact.
Also, since Absa didn't pay me for this I guess it's fine to share: https://github.com/Pieter-Cawood/Bank-Customer-Income-Prediction/blob/main/code.ipynb
I forgot you can zero-pad a recurrent net's time series to make up for missing values, so I didn't try that approach either. Instead, the feature engineering used quantile values of certain transactions. The idea extends on simple mean values: statistically, mean values might misrepresent a user's true behaviour because they might have outliers etc. So using quantiles was, to me at least, a great idea. I also gave the models features of transaction counts, so that they might learn from the number of samples the quantiles are derived from.
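The quantile idea can be sketched with pandas roughly like this — column names such as `client_id` and `amount` are my own placeholders, not the actual dataset's, and the toy numbers just show why quantiles beat the mean:

```python
import pandas as pd

# Toy transaction data; in the real data each row would be one transaction.
tx = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 2, 2],
    "amount":    [100.0, 120.0, 110.0, 5000.0, 300.0, 310.0],
})

# Per-client quantile features: more robust than the mean, since a single
# outlier (the 5000.0) would drag client 1's mean way up.
feats = tx.groupby("client_id")["amount"].agg(
    q25=lambda s: s.quantile(0.25),
    q50="median",
    q75=lambda s: s.quantile(0.75),
    n_tx="count",  # how many samples the quantiles are derived from
)
print(feats)
```

The `n_tx` count feature lets the model discount quantiles computed from very few transactions, which matches the "learn from the amount of samples" point above.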
And about what I meant regarding the data: there were NO group 21 users in the private test set. I tested it by manually changing that group's predictions, and the score stayed the same. So how could anyone build a nice solution? The RMSE would change significantly for this group on the public set because the incomes are R100K+, whereas the small income groups in the private set are < R50K or something. Since we all had close scores, it came down to chance and social preference in the end.
Also, this was overlooked by Zindi. If they had known, they would surely have removed group 21 users, because in real life a bank would have that information, not hand over some random subset of data and expect machine learning to be magic. If this is how ABSA does its inner workings, I am actually glad I don't bank there haha.
Wow - you share so freely I have to reciprocate.
What helped me a lot was to add the extremes as features at some stage. I think if you just used a time series with max(positive amount) per calendar month you would have a pretty decent model, so I can believe your quantile approach worked well. I suppose the max positive amount is a good proxy of net income as is, anyhow.
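The max-positive-per-month idea, as a quick sketch — again the column names (`client_id`, `date`, `amount`) are assumed placeholders, not the real schema:

```python
import pandas as pd

# Toy data: signed transaction amounts with dates.
tx = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2021-01-25", "2021-01-05", "2021-02-25", "2021-01-25", "2021-02-25"]),
    "amount": [18_000.0, 250.0, 18_500.0, -400.0, 9_000.0],
})

# Keep only inflows, then take the largest credit per client per calendar
# month: the biggest monthly deposit is typically the salary, i.e. a
# reasonable proxy for net income.
inflows = tx[tx["amount"] > 0]
monthly_max = (
    inflows
    .groupby(["client_id", inflows["date"].dt.to_period("M")])["amount"]
    .max()
)
print(monthly_max)
```

Pivoted per client, that monthly-max series is exactly the "time series of extremes" described above.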
I looked at your code a bit - neat! I like what you do with catboost especially.
I had many other little things in my features. At some stage I used some bond portfolio metrics, e.g. duration (so weighting each transaction by the day of the month on which it takes place), and this helped a little. Later on I also excluded some transaction "descriptions" and this improved things too, so much, in fact, that I tried randomly selecting different descriptions to see what it does.
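The duration idea can be sketched like this, assuming made-up column names: an amount-weighted average day of the month, analogous to a bond portfolio's duration.

```python
import pandas as pd

tx = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "day":       [1, 25, 25, 5, 15],          # day of month of each transaction
    "amount":    [500.0, 10_000.0, 2_000.0, 800.0, 800.0],
})

# Amount-weighted average day of the month, analogous to bond duration:
# a high value means the money mostly moves late in the month (e.g. on
# salary day), a low value means it moves early.
def duration(group: pd.DataFrame) -> float:
    return (group["day"] * group["amount"]).sum() / group["amount"].sum()

dur = tx.groupby("client_id").apply(duration)
print(dur)
```

The feature captures *when* in the month a client's money moves, which the raw amounts and quantiles alone would miss.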
Funny enough, I also spent quite some time to comment this one properly and neatly.
But you are right about ML magic. I attended the intro (virtually) and it seems ABSA does not use data science entirely correctly. It is not some holy grail, you still need good strategy to use this powerful tool productively, otherwise it is like giving a chainsaw to a toddler: powerful but messy. You need some finesse ... you still need to apply your mind ...
For me the dirtiness in the data was in how I believe individuals would declare income. It would be quite dirty in the sense that, unless you give specific instructions and active guidance, two people with the same income and transactions may declare vastly different amounts: one might add additional incomes, one might calculate net quite differently from the other, one may treat tax differently, etc. They will round and approximate differently. One of the two might have been unemployed for a few months, and perhaps now they are the same, but they give different answers.
So you are modelling human responses and not hard financial data. This is fine for psychology, I suppose, but we are given financial data to model with.
I wonder what a better challenge would have looked like. Use the same data, and model how likely a customer is to require an overdraft or go into the red on the credit card or default on the mortgage or something like that.