I wanted to raise a concern: since the goal is to build models that rely on Indigenous Ecological Indicators, how do we handle the fact that over 90% of the indicator values are missing? My worry is that if we impute them, the model might end up learning patterns from synthetic data rather than from real indicators, which could make the results unrepresentative of Indigenous knowledge of ecological indicators.
Sometimes missingness itself carries signal if you treat it right. You can look for patterns in the missing data and add missing-value indicator features so the model can capture that signal. In my own cross-validation (CV), I found this helpful.
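To make the idea concrete, here is a minimal sketch of adding missingness indicators with pandas. The column names (`indicator_a`, `indicator_b`) and the helper `add_missing_indicators` are hypothetical, not taken from the competition data, and the toy values are made up purely for illustration.

```python
import numpy as np
import pandas as pd


def add_missing_indicators(df, cols):
    """Return a copy of df with one 0/1 flag per column in cols,
    plus a per-row count of how many of those columns are missing."""
    out = df.copy()
    for col in cols:
        # 1 where the original value is missing, 0 otherwise
        out[f"{col}_missing"] = out[col].isna().astype("int8")
    # Row-level summary: how many of the listed columns are missing
    out["n_missing"] = out[cols].isna().sum(axis=1)
    return out


# Toy data with hypothetical column names
df = pd.DataFrame({
    "indicator_a": [1.0, np.nan, np.nan, 2.0],
    "indicator_b": [np.nan, np.nan, 0.5, 1.5],
})

features = add_missing_indicators(df, ["indicator_a", "indicator_b"])
print(features)
```

Note that the original values are left as NaN here rather than imputed, so the model only ever sees real indicator values plus the flags. That works directly with models that accept NaN inputs (many gradient-boosted tree libraries do); for other models you would still need some fill value, and the flags let the model tell filled rows apart from observed ones. Whether this helps on this dataset is something to check with your own CV.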