That's the core of the task here: clean the data so that you can actually use it. Your example is completely predictable, and real data is worse: lots of columns with more than 40% missing values, a sex column containing male, female and NaN, lots of outliers, inconsistent category names, etc.
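Here is a minimal pandas sketch of those cleaning steps, run on a small hypothetical DataFrame. The 40% threshold comes from the comment above. The rest are my own assumptions: the function name `clean`, the column names, mapping `m`/`f` to `male`/`female`, filling missing sex with `"unknown"`, and using the 1.5×IQR rule for outliers (one common choice, not the only one).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data reproducing the problems described above
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 400],               # 400 is an obvious outlier
    "sex": ["male", "Female", np.nan, "M", "female", "male", "F", "Male"],
    "income": [np.nan, np.nan, np.nan, 52000, np.nan, 48000, np.nan, 61000],  # 62.5% missing
    "city": ["Paris", "paris ", "PARIS", "Lyon", "lyon", "Lyon", "Paris", "Nice"],
})

def clean(df, max_missing=0.4, categorical=("sex", "city")):
    """Hypothetical helper: one possible cleaning pipeline, not a recipe."""
    # 1. Drop columns whose fraction of missing values exceeds the threshold
    keep = df.isna().mean() <= max_missing
    df = df.loc[:, keep].copy()

    # 2. Normalise category labels: trim whitespace and lowercase
    for col in categorical:
        if col in df:
            df[col] = df[col].str.strip().str.lower()

    # 3. Map abbreviations to canonical values and make missing sex explicit
    #    (assumption: "unknown" is preferable to dropping those rows)
    if "sex" in df:
        df["sex"] = df["sex"].replace({"m": "male", "f": "female"}).fillna("unknown")

    # 4. Flag outliers in numeric columns with the 1.5*IQR rule, keeping NaNs
    mask = pd.Series(True, index=df.index)
    for col in df.select_dtypes(include="number"):
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        in_range = df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
        mask &= in_range | df[col].isna()
    return df[mask]

print(clean(df))
```

On this toy frame, `income` is dropped, the `age` of 400 is filtered out, and the labels collapse to `male`/`female`/`unknown` and `paris`/`lyon`. Whether to drop, impute or cap outliers depends on the dataset, so treat each step as a starting point.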