Train and SampleSubmission each contain 155 site codes, but the two sets are not identical.
C1080 is in train but not in SampleSubmission, and C5076 is in SampleSubmission but not in train.
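For anyone who wants to check this themselves, here is a minimal sketch of the comparison using plain Python sets. The helper name `compare_site_codes` and the placeholder codes `C0001` and `C0002` are made up for illustration; only C1080 and C5076 come from the actual data.

```python
def compare_site_codes(train_codes, submission_codes):
    """Return (only_in_train, only_in_submission) as sorted lists."""
    train_set = set(train_codes)
    submission_set = set(submission_codes)
    # Set difference keeps the codes that appear on one side only.
    only_in_train = sorted(train_set - submission_set)
    only_in_submission = sorted(submission_set - train_set)
    return only_in_train, only_in_submission


# Toy data: C0001 and C0002 are hypothetical stand-ins for the shared codes.
# Duplicates are fine because the codes are converted to sets first.
train_codes = ["C0001", "C0002", "C1080", "C1080"]
submission_codes = ["C0001", "C0002", "C5076"]

only_in_train, only_in_submission = compare_site_codes(train_codes, submission_codes)
print("Only in train:", only_in_train)
print("Only in SampleSubmission:", only_in_submission)
```

With the real files you would pass in the site code column from each one (for example after loading them with pandas); the exact column name depends on the competition files, so I have left it out here.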
Edit: I can see that C5076 exists in the original dataset but only started distributing in September 2019. So it makes sense that it was left out of Zindi's train set, since it would only appear in validation.
Should I remove both of them? How did you handle that?