Lacuna - Correct Field Detection Challenge
$10,000 USD
Can you design a method to accurately find field locations?
518 data scientists enrolled, 100 on the leaderboard
26 March—4 July
Looking for female-identified African teammates
published 6 Jun 2021, 12:14
edited 1 minute later

I am looking for teammates in order to increase the chance to get these prizes ;-)

  • 1st place African citizen currently residing in Africa: $2,000 USD.
  • The 1st place female-identified African citizen currently residing in Africa: $2,000 USD.

I would like to join your team

Many thanks for your interest. I already found a teammate.

My man be playin the odds 👏👏👏

We agreed to split the prize equally if we win, no matter whether it is the overall 1st place or the special prize (or both - not sure about the rules).

I'm a dude, but I wonder, what would Zindi do if I say I identify as a transgender female? Would they discriminate against a minority or would they give me the money lmao.

Ha ha, what is written on your ID card (or passport)?

A guy, but I can offer my sister though

Ha ha, you could ask her to create an account (you can't do it for her because it breaks the rules). Then merge with her => you are eligible to win the female-identified prize.

I'm still struggling to break into the top 10 lol though hahaha

That I can't help with, LOL. Don't try to break the rules ;-) Note that Zindi could check the correlation between submissions as well.

yeah yeah of course, I am totally joking lol, I was not gonna do that, and good work man! Keep it up

Thanks. You too. Keep working.

I don't think this is right - I mentioned these antics to my wife and she joked that maybe she should join me for the same reason, but it was just a joke ... the score you have now is the same one you had before without your new team mate, which means she did not contribute at all. Why don't you actually collaborate, and the two of you try to come up with a better model rather than just trying to beat the system? I don't want to be the police here, but what about those others who actually attempted to solve this problem? This manoeuvre, even if allowed, seems to penalise those other contestants. Even if you just tried to do something together rather than just repeat your work in the new team it would be better. Congrats on your first position and I am sure you worked hard to obtain it, but you seem like a reasonable person and I hope it will make sense to you as well that, if you form a team, you should at least compete as a team and not just form a team because of the impact it has on the allocation of prizes.

Hello, good remarks.

When I looked for a teammate I also looked for someone who could work together with me. My teammate's profile seems to be good. She said she could spend 3 hours / day on this competition. I already shared my code and made a summary of my approach. However, she preferred to work on her own solution and I don't mind that. After teaming up she has been quite silent. She seems to be a data scientist, so I don't want to keep an eye on her work - I have no idea if she is working or not.

In terms of prizes, I noticed that we are unable to get 2 prizes at the same time. In other words, if we end up on top, we would not get the female-identified prize. So there should not be a problem of penalising the others. Of course, if there is a shakeup or other teams pass us then it is different.

Thanks! I'm working hard on passing you - not successful yet!

I would love to see a summary of your approach, but of course only afterwards, especially since it feels like I've tried everything and am still not even beating all 0s. Maybe your team mate is suffering from the same sickness.

I look at your score (of course, it is #1) and think that if you have a working model this long before the end, you have plenty of time to improve it and that you will submit better ones soon. On the other hand, if you were just 'lucky' it again confirms the idea that this data or challenge is not solvable and that zeros are the best and all else is random.

Anyhow - thanks for feedback and best wishes with the competition.

Luck is always important in any competition like this. My model is not super great even though my score is now on top. The score for each is quite different so I expect a small shake up.

What I feel good about is that CV and LB are somewhat correlated. I am still trying to make my model more robust but it is super hard given the low resolution of the images (both formats).

Wow - you are the only one saying that LB and CV are correlated!!!!!

I participate in at least two other discussions where the feeling is that this is pretty random and that LB is not a good reflection and not to be trusted.

I came to this after learning about Zindi at the Nvidia conference. I wanted to try some of the ideas I got from the conference and just picked the first competition - Lacuna at the time. I did not really notice the prize but by now it seems relatively generous - many other competitions give you only a prestigious certificate. However, it really pales in comparison to the difficulty of this problem.

At the time I just wanted to explore a bit in a real world problem. That was before the extra data was added, and given that the auxiliary data (then) was a bit suspect and the quality dubious, I used only Q1 data from train and ended up with something like 400 observations. This against the test set of 1600 and the LB of ????, and this being a type of labour of love, I made the *fundamental* mistake of deciding to simply use the LB as my CV set and I've been in the dark ever since. By now I have fallen so deep into this I forgot which way is up.

FWIW just this discussion, a few sentences at best, has been extremely helpful, as have all the others that I participated in. I do this to learn, and what better way to learn than in a competitive environment. Privately I am learning a lot of technical stuff but publicly, here in the discussion boards, I am also learning a lot of practical stuff. This is not one way of course, I also share what I know.

So back to the original topic - you must tell your team mate to come to the table quick, even if just for her own benefit. Being in first place sort of puts the spotlight on you and she is sorely missed.

There is also a random factor during my training pipeline. The correlation is there but not 100% perfect - loss vs CV vs LB.

FYI, I split the data by year but the score for each fold varies from 0.25 to 0.40 (the zero-benchmark's score varies from 0.3 to 0.45). I suppose the distribution of the private set plays an important role in the shakeup.
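For readers who want to reproduce this kind of check, here is a minimal sketch of a per-year split that scores the all-zeros benchmark on each held-out year. It assumes the metric is a mean absolute error over the x and y displacements (as the MAE figures quoted elsewhere in this thread suggest). The years and targets below are synthetic stand-ins, and the helper names `zero_baseline_mae` and `folds_by_year` are made up for illustration; none of this is the poster's actual pipeline.

```python
import random
from collections import defaultdict


def zero_baseline_mae(targets):
    """MAE of predicting (0, 0) for every field, averaged over x and y."""
    total = sum(abs(x) + abs(y) for x, y in targets)
    return total / (2 * len(targets))


def folds_by_year(years):
    """Yield (held_out_year, train_idx, valid_idx), one fold per distinct year."""
    by_year = defaultdict(list)
    for i, yr in enumerate(years):
        by_year[yr].append(i)
    for yr in sorted(by_year):
        valid_idx = by_year[yr]
        train_idx = [i for i in range(len(years)) if years[i] != yr]
        yield yr, train_idx, valid_idx


# Synthetic stand-in data (assumption): replace with the real year column
# and the real (x, y) displacement targets.
rng = random.Random(0)
years = [rng.choice([2015, 2016, 2017, 2018]) for _ in range(400)]
targets = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in years]

fold_scores = {}
for yr, train_idx, valid_idx in folds_by_year(years):
    # Score the all-zeros benchmark on the held-out year only.
    fold_scores[yr] = zero_baseline_mae([targets[i] for i in valid_idx])
    print(yr, len(valid_idx), round(fold_scores[yr], 3))
```

If the zero benchmark itself swings a lot from fold to fold on the real data, the folds differ in target distribution, and model scores on those folds are hard to compare directly.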

@skaak : After discussing with you, I do think I need to re-split the data. The variance among folds should not be too high.

@Moto I just created my own local CV set and saw my best model collapse gloriously.

Now testing my 2nd best one. It will probably fail as well but at least I feel great, at least I am able to judge my models for myself and not dependent on LB. I wonder about those models I discarded - maybe I should have kept them around and tested them better ...

I don't think it will matter that much how you split it but do it randomly - if you split on some data property you may end up with samples that differ along the lines of the split. I split the data roughly in 1500 + 500 for CV and for all samples I get MAE of very close to 0.21 for just 0s for both the long (1500) and short (500) ones.
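A rough sketch of that kind of random hold-out, assuming about 2000 labelled samples and an MAE averaged over the x and y displacements. The targets are synthetic stand-ins and the function names are hypothetical, so the printed numbers will not match the 0.21 quoted above.

```python
import random


def random_split(n_samples, n_valid, seed=0):
    """Shuffle the indices once and cut off n_valid of them for validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[n_valid:], idx[:n_valid]


def mae(preds, targets):
    """Mean absolute error over both coordinates of (x, y) pairs."""
    total = sum(
        abs(px - tx) + abs(py - ty)
        for (px, py), (tx, ty) in zip(preds, targets)
    )
    return total / (2 * len(targets))


# Synthetic stand-in targets (assumption): replace with the real displacements.
rng = random.Random(1)
targets = [(rng.gauss(0, 0.25), rng.gauss(0, 0.25)) for _ in range(2000)]

train_idx, valid_idx = random_split(len(targets), n_valid=500)
for name, idx in (("train", train_idx), ("valid", valid_idx)):
    subset = [targets[i] for i in idx]
    zeros = [(0.0, 0.0)] * len(subset)  # the all-zeros benchmark
    print(name, len(subset), round(mae(zeros, subset), 3))
```

If the all-zeros MAE is close on both parts, the split is at least not skewed in the target distribution, which is the point being made here.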

Those numbers you mention (0.25 - 0.40) sound funny. If you get them when you split on year, then maybe the year can explain something, so maybe we should use the year as a variable in the model as well.

@skaak, sorry to not reply sooner.

I don't find any valuable information in the year, even though I use GroupKFold by Year.

If you plot the distribution of distance (from origin to (x,y)), there are some outliers. That might explain the high variance among my folds.
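One way to look at this without a plot is to compute each target's distance from the origin and flag unusually large ones. The sketch below uses a 1.5 x IQR fence, which is only one common convention and not something stated in this thread; the sample targets and the helper names are invented for illustration.

```python
import math
import statistics


def distances_from_origin(targets):
    """Euclidean distance of each (x, y) displacement from (0, 0)."""
    return [math.hypot(x, y) for x, y in targets]


def iqr_outliers(values, k=1.5):
    """Indices of values above the upper fence Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    upper = q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v > upper]


# Invented example targets (assumption): the last one is a deliberate outlier.
targets = [
    (0.1, 0.0), (0.0, 0.2), (-0.1, 0.1),
    (0.2, -0.1), (0.1, 0.1), (3.0, 4.0),
]
dist = distances_from_origin(targets)
outlier_idx = iqr_outliers(dist)
print(outlier_idx)
```

Counting how many flagged samples land in each fold would show whether a few far-away targets are driving the fold-to-fold variance.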

@moto ok ... at some stage I built these huge CNNs and added the year to the dense layer but I never got those to really solve this problem, but it was nice to use a concatenate layer to combine the convolutions and some metadata nonetheless.

My new setup with CV samples is working really well, although I have nothing to show for it yet, except for a bunch of discarded models. FWIW my best models used GRU and worked extremely well in-sample, with precious few nodes, so I did not expect overfitting at all. These models, however, collapsed badly in the CV tests.

I am still testing a few architectures because it worked so well in sample, but mostly trying a new idea. This new one runs forever, but if it works ... will know in a day or so.

@Moto - finally, some new submissions. Great, hope you can improve - me, I need more hardware!

Oh yes, I am trying to overfit the LB.

> I need more hardware

Good luck to you. I am using both Colab and Kaggle kernel.

Hi @skaak and @moto

Since this discussion thread seems active.

I would like to ask what exactly we are predicting with our models: GPS coordinates or displacement vectors?

The data page says we're given a set of images and a list of their corresponding x,y displacements. So my second question is: are we to apply some kind of math function to compute the needed x,y vectors the model is to learn from?

Or are we to train on the raw x,y and perform the needed math function on the test predictions?

Next, on the evaluation page it says:

For each field you must submit a displacement vector from the field center.

Do we get the field center (x1,y1) by using the formula stated in the starter notebook? Or is it considered to be (0,0) as stated in the data page?

Also, was the point constant defined in the starter notebook just random, or is it what is to be used?

@ZzyZx hi - I saw you ask this elsewhere as well if I remember correctly. The questions you ask you should be able to infer from the instructions and if there are any peculiarities the organisers should point this out, so I am hesitant to respond here. However, I suspect there is a more general need for guidance and will start a new discussion to address some of that. So keep an eye open for that discussion.