Good day people,
I would like to pick your beautiful brains on how to treat the mismatch in occupation codes between the datasets. There are 233 unique values in the train dataset and only 187 in the test dataset. I went further and enumerated the differences: 9 codes appear in test but not in train, and 55 appear in train but not in test. That gives a total of 64 unique codes that are not shared by both datasets.
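For anyone who wants to reproduce this kind of check, here is a minimal sketch using plain Python sets. The function name compare_categories and the toy codes are made up for illustration and are not the competition data. With pandas you would pass something like train_df["OCCUPATION_CODE"] and test_df["OCCUPATION_CODE"], where the column name is an assumption. As a sanity check on the numbers above: 233 - 55 = 178 codes are shared, and 178 + 9 = 187 matches the test count, so the figures are consistent.

```python
def compare_categories(train_values, test_values):
    """Compare the unique category codes seen in train and in test."""
    train_set = set(train_values)
    test_set = set(test_values)
    only_train = train_set - test_set  # codes never seen at prediction time
    only_test = test_set - train_set   # codes the model never saw in training
    return {
        "n_train_unique": len(train_set),
        "n_test_unique": len(test_set),
        "only_in_train": only_train,
        "only_in_test": only_test,
        # Symmetric difference: codes not present in both datasets
        "not_in_both": only_train | only_test,
    }


# Toy data, hypothetical and for illustration only
train_codes = ["A1", "B2", "C3", "C3", "D4"]
test_codes = ["B2", "C3", "E5"]
result = compare_categories(train_codes, test_codes)
print(result["n_train_unique"], result["n_test_unique"])  # 4 3
print(sorted(result["only_in_train"]))  # ['A1', 'D4']
print(sorted(result["only_in_test"]))   # ['E5']
print(len(result["not_in_both"]))       # 3
```

The codes in only_in_test are the ones that matter most, since a model fitted on train has no learned signal for them.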
Is this also the case with other features?
It seems to be peculiar to occupation codes only. With sex, the issue was a difference in letter case, which I think has a negligible effect.
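If the case difference is the only issue, normalizing the labels before comparing or encoding them should make it disappear. The sketch below assumes simple string labels. The helper normalize_category and the toy values are hypothetical. Stripping whitespace is an extra assumption beyond the case issue described above. In pandas the equivalent would be something like df["sex"].str.strip().str.upper(), with the column name assumed.

```python
def normalize_category(value):
    """Upper-case a category label (also strips stray whitespace, an assumption)."""
    return value.strip().upper()


# Toy labels, hypothetical and for illustration only
train_sex = ["M", "F", "f", "m"]
test_sex = ["m", "F"]

train_norm = {normalize_category(v) for v in train_sex}
test_norm = {normalize_category(v) for v in test_sex}
print(sorted(train_norm))      # ['F', 'M']
print(train_norm ^ test_norm)  # set()  -> no mismatch once case is normalized
```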