Did I understand correctly during the webinar (https://www.youtube.com/watch?v=cc42eKQXySw&t=2076s&ab_channel=ZindiAfrica at around 34:04) that one can take advantage of external data or was this statement made solely regarding feature engineering [...]?
Given the rules, I would say no:
> You may use only the datasets provided for this competition. Automated machine learning tools such as automl are not permitted.
> You may use pretrained models as long as they are openly available to everyone.
However, I wonder if pretrained models are useful here...
Generally, I agree. The competition seems to have been set up originally so as not to allow external data. However, given the statement ~"this information can get you more data in order to do better predictions", I wonder if they changed their mind in this regard and just didn't update the competition texts. I think external data (obviously open/free) would make for a more interesting competition and better models (benefiting both sides), but it should be clearly communicated. Mentioning it as a 'side note' in an (optional) webinar only adds to my personal confusion. Maybe there will be some official feedback. Let's see.
I want to add some examples to clarify whether using additional free data is allowed or not.
Cloud removal is a critical part of preprocessing. The cloud values for the train/val/test datasets and for the test (competition) dataset are very dissimilar: a mean of about 2 and a variance of about 120 for train/val/test, versus (9.184, 635) for the competition dataset.
Are we allowed to use data provided by GEDI and Sentinel-2, if the copyright holder makes it free to use?
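For anyone who wants to reproduce the cloud comparison above, here is a minimal sketch of how the mean and variance of one feature could be compared between two datasets. The function names and the toy values are made up for illustration (they are not competition data), and how you flatten the cloud band into a list of values is left as an assumption.

```python
from statistics import fmean, pvariance


def summarize(values):
    """Return (mean, population variance) of a flat sequence of values."""
    values = list(values)
    return fmean(values), pvariance(values)


def shift_report(reference, target):
    """Compare the mean and variance of the same feature in two samples."""
    ref_mean, ref_var = summarize(reference)
    tgt_mean, tgt_var = summarize(target)
    return {
        "mean_ratio": tgt_mean / ref_mean,
        "variance_ratio": tgt_var / ref_var,
        # Difference in means, measured in reference standard deviations.
        "standardized_mean_shift": (tgt_mean - ref_mean) / ref_var ** 0.5,
    }


# Toy placeholder values, NOT competition data (hypothetical names).
train_cloud = [0, 0, 1, 2, 7]
competition_cloud = [0, 4, 9, 12, 20]
report = shift_report(train_cloud, competition_cloud)
print(report)
```

Plugging in the figures quoted above directly, the variance ratio is roughly 635 / 120 ≈ 5.3 and the mean shift is roughly (9.184 - 2) / sqrt(120) ≈ 0.66 reference standard deviations, taking the "about 2" and "about 120" at face value.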
Generally, I think the splits weren't selected optimally. This becomes apparent when plotting the locations of the training data and the test (submission) data on a map - and it is even more striking when the respective surroundings/vegetation are compared. Whether external data is allowed or not, this will result in a winning model that is most likely 'adjusted' to fit the out-of-distribution test data but potentially won't generalize very well.
Here is a discussion created by the participant ranked 8th on the leaderboard.
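If plotting on a map is not convenient, the geographic gap between the splits can also be put into numbers. Below is a rough sketch that computes, for each test location, the great-circle distance to the nearest training location. The coordinates are hypothetical placeholders (not the competition coordinates), and it assumes you have (lat, lon) pairs for both sets.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_train_distance_km(test_point, train_points):
    """Distance from one test location to its closest training location."""
    return min(haversine_km(*test_point, *p) for p in train_points)


# Hypothetical (lat, lon) pairs, NOT the competition coordinates.
train_points = [(0.0, 0.0), (0.0, 1.0)]
test_points = [(0.0, 0.5), (0.0, 3.0)]
distances = [nearest_train_distance_km(t, train_points) for t in test_points]
print(distances)
```

Large nearest-neighbour distances for most test points would be one indication that the test set is geographically out of distribution, though distance alone says nothing about how similar the vegetation is.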
https://zindi.africa/competitions/africa-biomass-challenge/discussions/16335
I haven't tried that on my model yet. But if that is the case, the AGBD has only a very loose relationship with the data provided by Zindi.