I standardized and cleaned the raw train/test tables in three ways: normalizing categorical values (unifying the “N/A”, “don’t know”, and “doesn’t apply” variants), explicitly encoding missingness (missing-category tokens and numeric missing flags), and applying consistent numeric handling (log/ratio features plus robust centering where appropriate). I then verified train–test schema alignment, audited missingness and distribution shift (including country-level checks), and generated stable cross-validation folds that respect the class imbalance. Finally, I froze a reproducible feature set and produced “model-ready” datasets with consistent columns for training, evaluation, and submission generation.
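
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual pipeline: it assumes pandas and NumPy, and the column names (`employment`, `income`, `target`), the `__MISSING__` token, the list of no-answer spellings, and all helper names are hypothetical. Robust centering is shown as subtracting the train median of the log-transformed value, and the folds use a simple seeded round-robin assignment per class as one possible way to get stable folds that respect class imbalance.

```python
import numpy as np
import pandas as pd

# Hypothetical spellings of "no answer"; the real list is not given in the text.
NA_VARIANTS = ["", "n/a", "na", "don't know", "dont know", "doesn't apply", "does not apply"]
MISSING_TOKEN = "__MISSING__"  # hypothetical missing-category token


def clean_categorical(s: pd.Series) -> pd.Series:
    """Trim, lower-case, and collapse no-answer variants and nulls into one token."""
    norm = s.astype("string").str.strip().str.lower()
    norm = norm.str.replace("’", "'", regex=False)  # unify curly apostrophes
    norm = norm.mask(norm.isin(NA_VARIANTS))        # no-answer variants become null
    return norm.fillna(MISSING_TOKEN).astype(object)


def add_numeric_features(df: pd.DataFrame, col: str, center=None):
    """Add a missing flag and a median-centered log feature; reuse `center` on test."""
    x = pd.to_numeric(df[col], errors="coerce")
    out = df.copy()
    out[f"{col}_missing"] = x.isna().astype(int)
    logged = np.log1p(x.clip(lower=0))
    if center is None:
        center = logged.median()  # fit on train only
    out[f"{col}_log"] = (logged - center).fillna(0.0)
    return out, center


def align_schema(train: pd.DataFrame, test: pd.DataFrame, target: str) -> pd.DataFrame:
    """Fail loudly on column mismatches, then put test columns in train order."""
    feature_cols = [c for c in train.columns if c != target]
    mismatch = set(feature_cols) ^ set(test.columns)
    if mismatch:
        raise ValueError(f"train/test schema mismatch: {sorted(mismatch)}")
    return test[feature_cols]


def assign_folds(y: pd.Series, n_splits: int = 5, seed: int = 0) -> pd.Series:
    """Seeded shuffle, then deal each class round-robin across folds."""
    rng = np.random.default_rng(seed)
    shuffled = y.iloc[rng.permutation(len(y))]
    fold = shuffled.groupby(shuffled).cumcount() % n_splits
    return fold.reindex(y.index).rename("fold")


# Toy data standing in for the raw tables.
train = pd.DataFrame({
    "employment": ["Full-time", "N/A", "Don’t know", None,
                   "full-time ", "Part-time", "Doesn't apply", "Part-time"],
    "income": [1000, None, 250, 4000, 0, 90, 12000, 700],
    "target": [0, 0, 0, 0, 1, 0, 0, 1],
})
test = pd.DataFrame({"income": [500, None], "employment": ["n/a", "Full-time"]})

test = align_schema(train, test, target="target")
train["employment"] = clean_categorical(train["employment"])
test["employment"] = clean_categorical(test["employment"])
train, income_center = add_numeric_features(train, "income")
test, _ = add_numeric_features(test, "income", center=income_center)
train["fold"] = assign_folds(train["target"], n_splits=2)
```

Fitting the centering statistic on train and passing it to the test call keeps the two tables on the same scale, and fixing the seed makes the fold assignment reproducible across runs.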