I would like to join your team
Many thanks for your interest. I already found a teammate.
My man be playin the odds 👏👏👏
We agree to split the prize equally if we win, whether it is the overall 1st place or the special prize (or both - not sure about the rules).
I'm a dude, but I wonder, what would Zindi do if I say I identify as a transgender female? Would they discriminate against a minority or would they give me the money lmao.
Ha ha, what is written on your ID card (or passport)?
A guy, but I can offer my sister though
Ha ha, you could ask her to create an account (you can't do it for her because it breaks the rules). Then merge with her => you are eligible to win the female-identified prize.
I'm still struggling to break into the top 10 lol though hahaha
That I can't help with, LOL. Don't try to break the rules ;-) Note that Zindi could check the correlation between submissions as well.
yeah yeah of course, I am totally joking lol, I was not gonna do that, and good work man! Keep it up
Thanks. You too. Keep working.
I don't think this is right - I mentioned these antics to my wife and she joked that maybe she should join me for the same reason, but it was just a joke ... the score you have now is the same you had before without your new team mate, which means she did not contribute at all. Why don't you actually collaborate and the two of you try to come up with a better model rather than just trying to beat the system? I don't want to be the police here, but what about those others who actually attempted to solve this problem? This manoeuvre, even if allowed, seems to penalise those other contestants. Even if you just tried to do something together rather than just repeat your work in the new team it would be better. Congrats on your first position and I am sure you worked hard to obtain it, but you seem like a reasonable person and I hope it will make sense to you as well that, if you form a team, at least then compete as a team and don't just form a team because of the impact it has on the allocation of the prize.
Hello, good remarks.
When I looked for a teammate I also looked for someone who could work together with me. My teammate's profile seems to be good. She said she could spend 3 hours / day on this competition. I already shared my code and made a summary of my approach. However, she preferred to work on her own solution and I don't mind that. After teaming up she has been quite silent. She seems to be a data scientist so I don't want to keep an eye on her work - I have no idea if she is working or not.
In terms of the prize, I noticed that we cannot get 2 prizes at the same time. In other words, if we end up on top, we would not get the female-identified prize. So there should not be a problem of penalising the others. Of course, if there is a shakeup or other teams pass us then it is different.
Thanks! I'm working hard on passing you - not successful yet!
I would love to see a summary of your approach, but of course only afterwards, especially since it feels like I've tried everything and am still not even beating all 0s. Maybe your team mate is suffering from the same sickness.
I look at your score (of course, it is #1) but think if you have a working model this long before the end you have plenty of time to improve it and that you will submit better ones soon. On the other hand, if you were just 'lucky' it again confirms the idea that this data or challenge is not solvable and that zeros are the best and all else is random.
Anyhow - thanks for feedback and best wishes with the competition.
Luck is always important in any competition like this. My model is not super great even though my score is now on top. The score for each is quite different so I expect a small shake up.
What I feel good about is that CV and LB are somehow correlated. I am still trying to make my model more robust but it is super hard given the low resolution of the images (both formats).
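For anyone who wants to check this for their own submissions, here is a minimal sketch of how one might measure how well CV tracks LB. The score lists are placeholders I made up for illustration, not results from this competition, and the helper names `pearson`, `ranks` and `spearman` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Rank correlation: does a better CV score mean a better LB score?"""
    return pearson(ranks(xs), ranks(ys))

# Placeholder scores for five hypothetical submissions (NOT real results).
cv_scores = [0.310, 0.295, 0.288, 0.301, 0.279]
lb_scores = [0.305, 0.300, 0.285, 0.296, 0.281]

print(f"Pearson:  {pearson(cv_scores, lb_scores):.3f}")
print(f"Spearman: {spearman(cv_scores, lb_scores):.3f}")
```

The rank version is probably the more useful of the two, since what matters in practice is whether CV orders your models the same way the LB does.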
Wow - you are the only one saying that LB and CV are correlated!!!!!
I participate in at least two other discussions where the feeling is that this is pretty random and that LB is not a good reflection and not to be trusted.
I came to this after learning about Zindi at the Nvidia conference. I wanted to try some of the ideas I got from the conference and just picked the first competition - Lacuna at the time. I did not really notice the prize but by now it seems relatively generous - many other competitions give you only a prestigious certificate. However, it really pales in comparison to the difficulty of this problem.
At the time I just wanted to explore a bit on a real-world problem. That was before the extra data was added and, given that the auxiliary data (then) was a bit suspect and the quality dubious, I used only Q1 data from train and ended up with something like 400 observations. This against the test set of 1600 and the LB of ????, and this being a type of labour of love, I made the *fundamental* mistake of deciding to simply use the LB as my CV set and I've been in the dark ever since. By now I have fallen so deep into this I forgot which way is up.
FWIW just this discussion, a few sentences at best, has been extremely helpful, as have all the others that I participated in. I do this to learn, and what better way to learn than in a competitive environment. Privately I am learning a lot of technical stuff but publicly, here in the discussion boards, I am also learning a lot of practical stuff. This is not one way of course, I also share what I know.
So back to the original topic - you must tell your team mate to come to the table quick, even if just for her own benefit. Being in first place sort of puts the spotlight on you and she is sorely missed.
There is also a random factor during my training pipeline. The correlation is there but not 100% perfect - loss vs CV vs LB.
FYI, I split the data by year but the score for each fold varies from 0.25 to 0.40 (the zero-benchmark's score varies from 0.3 to 0.45). I suppose the distribution of the private set plays an important role in the shakeup.
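To make the setup concrete, here is a rough sketch of a by-year split with the all-zeros benchmark scored on each held-out year. Everything in it is an assumption for illustration: the data is synthetic, the years are made up, the helper names are hypothetical, and the metric is taken to be MAE averaged over the x and y displacements.

```python
import random
from collections import defaultdict

def mae_xy(targets, preds):
    """Mean absolute error averaged over both x and y components."""
    total = sum(abs(tx - px) + abs(ty - py)
                for (tx, ty), (px, py) in zip(targets, preds))
    return total / (2 * len(targets))

def leave_one_group_out(groups):
    """Yield (group, train_idx, valid_idx), holding out one group at a time."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        valid = by_group[g]
        held_out = set(valid)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield g, train, valid

# Synthetic stand-in data: made-up years and displacements,
# NOT the competition set.
rng = random.Random(0)
years = [rng.choice([2015, 2016, 2017, 2018]) for _ in range(400)]
targets = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in years]

for year, train_idx, valid_idx in leave_one_group_out(years):
    valid_targets = [targets[i] for i in valid_idx]
    zeros = [(0.0, 0.0)] * len(valid_targets)
    # A real model would be fitted on train_idx here; only the
    # all-zeros benchmark is scored in this sketch.
    print(year, len(valid_idx), round(mae_xy(valid_targets, zeros), 3))
```

If the benchmark itself swings a lot between folds, the folds differ in difficulty, so a model's fold scores are easier to read relative to the benchmark on the same fold than as raw numbers.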
@skaak : After discussing with you, I do think I need to re-split the data. The variance among folds should not be too high.
@Moto I just created my own local CV set and saw my best model collapse gloriously.
Now testing my 2nd best one. It will probably fail as well but at least I feel great, at least I am able to judge my models for myself and not dependent on LB. I wonder about those models I discarded - maybe I should have kept them around and tested them better ...
I don't think it will matter that much how you split it but do it randomly - if you split on some data property you may end up with samples that differ along the lines of the split. I split the data roughly in 1500 + 500 for CV and for all samples I get MAE of very close to 0.21 for just 0s for both the long (1500) and short (500) ones.
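A minimal sketch of that kind of random 1500 + 500 split, with the all-zeros MAE computed on both parts. The targets are synthetic placeholders (so the printed numbers will not match the 0.21 above), the function names are hypothetical, and MAE is assumed to be averaged over x and y.

```python
import random

def random_split(n_samples, n_valid, seed=42):
    """Shuffle indices and return (train_idx, valid_idx)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[n_valid:], idx[:n_valid]

def zero_baseline_mae(targets):
    """MAE of predicting (0, 0) for every sample, averaged over x and y."""
    return sum(abs(x) + abs(y) for x, y in targets) / (2 * len(targets))

# Synthetic stand-in targets, NOT the competition data.
rng = random.Random(1)
targets = [(rng.gauss(0, 0.25), rng.gauss(0, 0.25)) for _ in range(2000)]

train_idx, valid_idx = random_split(len(targets), n_valid=500)
print("long  (1500):", round(zero_baseline_mae([targets[i] for i in train_idx]), 3))
print("short (500): ", round(zero_baseline_mae([targets[i] for i in valid_idx]), 3))
```

If the two numbers come out close, as they should for a random split, the held-out part is at least a fair sample of the same distribution.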
Those numbers you mention (0.25 - 0.40) sound funny - maybe we should use the year as a variable in the model as well: if you get them when you split on year, then maybe the year can explain something.
@skaak, sorry to not reply sooner.
I don't find any valuable information in the year, even when I use GroupKFold by Year.
If you plot the distribution of distance (from origin to (x,y)), there are some outliers. That might explain the high variance among my folds.
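A quick way to look at this without a plot is sketched below: compute the distance of each (x, y) target from the origin and flag the unusually large ones, then count them per year. The 1.5 x IQR rule is just one common convention I picked for illustration, and the targets and years are toy values, not competition data.

```python
import math
import statistics
from collections import Counter

def distances(targets):
    """Euclidean distance of each (x, y) target from the origin."""
    return [math.hypot(x, y) for x, y in targets]

def iqr_outliers(values, k=1.5):
    """Indices of values above Q3 + k * IQR (upper-tail outliers only)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    upper = q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v > upper]

# Toy targets and years, made up for illustration only.
toy_targets = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.0), (0.0, 0.4),
               (0.5, 0.0), (0.0, 0.6), (0.7, 0.0), (3.0, 4.0)]
toy_years = [2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018]

dist = distances(toy_targets)
outlier_idx = iqr_outliers(dist)
per_year = Counter(toy_years[i] for i in outlier_idx)

print("outlier indices:", outlier_idx)
print("outliers per year:", dict(per_year))
```

If the outliers pile up in one or two years, a by-year split will put most of them in the same fold, which would be consistent with the fold-to-fold variance described above.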
@moto ok ... at some stage I built these huge CNNs and added the year to the dense layer but I never got those to really solve this problem, but it was nice to use a concatenate layer to combine the convolutions and some metadata nonetheless.
My new setup with CV samples is working really well, although I have nothing to show for it yet, except for a bunch of discarded models. FWIW my best models used GRU and worked extremely well in-sample, with precious few nodes, so I did not expect overfitting at all. These models, however, collapsed badly in the CV tests.
I am still testing a few architectures because it worked so well in sample, but mostly trying a new idea. This new one runs forever, but if it works ... will know in a day or so.
@Moto - finally, some new submissions. Great, hope you can improve - me, I need more hardware!
Oh yes, I am trying to overfit the LB.
> I need more hardware
Good luck to you. I am using both Colab and Kaggle kernel.
Well - the dust settled (and I see the mails have been sent) and you are still #1 by a wide margin.
Congratulations - this is a good way to settle with the private and public LBs not too different in the top few spots and, as you expected, quite a shake up further down.
At the same time the scores all got worse and the differences between them remain very thin, so it seems nobody really managed to get a firm grip on this problem, except perhaps you, given your margin on both LBs. Also, it was nice to see you even improving that in the last bit of the competition FWIW.
So again, congratulations.
@skaak: Many thanks for your kind words.
It was a pity that you dropped from the top :-)
But I suppose you learned a few good things thanks to the competition.
@Moto
Yeah - a week or so before the end I made a bit of a breakthrough and started beating the all 0 solution and just kept running with that until I reached #13 on public LB and sort of exhausted that particular technique. With benefit of hindsight I just overfitted LB and I remain perplexed as to how to approach this better.
But thanks - this was my 1st competition and the overall experience was great and my interactions with you were a real big part of that.
After all the hours and ideas I still have this nagging urge to keep on working on this but now I need to move on. But hey, did you notice those Radiant Earth competitions? I think I can apply some notebooks almost as is to them. Of course, if you enter any of those I'll be bound to pick the other one just to improve my chances!
Ha ha, why don't we merge so we don't need to compete against each other?
Hi @skaak and @moto
Since this discussion thread seems active.
I would like to ask what exactly we are predicting with our models: GPS coordinates or displacement vectors?
The data page says we are given a set of images and a list of their corresponding x,y displacements. So my second question is: are we to apply some kind of math function to compute the needed x,y vectors the model is to learn from?
Or are we to train on the raw x,y and perform the needed math function on the test predictions?
next on the evaluation page it says
Do we get the field center (x1,y1) by using the formula stated in the starter notebook? Or is it considered to be (0,0) as stated on the data page?
Also, was the point constant defined in the starter notebook just random, or is it what is to be used?
@ZzyZx hi - I saw you ask this elsewhere as well if I remember correctly. The questions you ask you should be able to infer from the instructions and if there are any peculiarities the organisers should point this out, so I am hesitant to respond here. However, I suspect there is a more general need for guidance and will start a new discussion to address some of that. So keep an eye open for that discussion.
sure thanks @skaak