I wanted to raise a concern: since the goal is to build models that rely on Indigenous Ecological Indicators, how do we handle the fact that over 90% of the indicator values are missing? My worry is that if we impute them, the model might end up learning patterns from synthetic data rather than from real indicators, which could make the results unrepresentative of Indigenous knowledge on ecological indicators.
Hi datawhiz, the train and test sets are split equally with respect to the indicators. Some farmers do not update their app, so the indicators are missing. Try to use them where and when you can.
Sometimes missingness itself carries signal if you treat it right. You can look for patterns in the missing data and add missing-value indicator features so the model can capture that signal. In my CV this did help.
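To make the missing-value indicator idea concrete, here is a minimal pandas sketch. The column names (`indicator_a`, `indicator_b`) and the helper `add_missing_flags` are made up for illustration and are not the competition's actual fields, so adapt them to the real data. It adds one 0/1 flag per indicator column plus a per-row count of missing indicators, and leaves the original values untouched.

```python
import numpy as np
import pandas as pd


def add_missing_flags(df, cols):
    """Return a copy of df with a 0/1 'was missing' flag per column in cols,
    plus a per-row count of how many of those columns are missing."""
    out = df.copy()
    for col in cols:
        # 1 where the original value is NaN, 0 otherwise
        out[f"{col}_missing"] = out[col].isna().astype(int)
    # Row-level summary: number of missing indicators for this record
    out["n_missing"] = out[cols].isna().sum(axis=1)
    return out


# Toy data; column names are hypothetical, not the real dataset's
toy = pd.DataFrame({
    "indicator_a": ["clouds", np.nan, np.nan, "wind"],
    "indicator_b": [np.nan, np.nan, 2.0, 1.0],
})

flagged = add_missing_flags(toy, ["indicator_a", "indicator_b"])
print(flagged)
```

With flags like these, a tree-based model can split on "indicator was reported or not" without any imputed value standing in for a real observation, which partly addresses the concern about learning from synthetic data. Whether it helps on this dataset is something to check in your own CV.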