Satyam Kumar
Aug 14, 2021


Hi Sabaybiometzger,

Yes, I agree that applying pd.get_dummies() before the train-test split will solve the column mismatch issue. But this can cause problems for future unseen data points, because the characteristics of the encoding (which categories map to which columns) are not saved.
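Here is a minimal sketch of the mismatch and one way around it, using a hypothetical `city` column. The dummy columns produced on the training split are saved, and any later data is reindexed to them, so categories unseen in training are dropped and missing ones are filled with 0. The data, column names, and the `encode_like_train` helper are assumptions for illustration only. scikit-learn's `OneHotEncoder` with `handle_unknown="ignore"` is a common alternative that stores the categories for you.

```python
import pandas as pd

# Hypothetical toy data: "city" is a categorical feature.
train = pd.DataFrame({"city": ["Delhi", "Mumbai", "Pune"]})
test = pd.DataFrame({"city": ["Delhi", "Chennai"]})  # "Chennai" never appears in training

# Encoding each split independently produces different columns.
train_enc = pd.get_dummies(train, columns=["city"], dtype=int)
test_raw = pd.get_dummies(test, columns=["city"], dtype=int)
print(list(train_enc.columns))  # ['city_Delhi', 'city_Mumbai', 'city_Pune']
print(list(test_raw.columns))   # ['city_Chennai', 'city_Delhi']

# Save the training columns: this is the "characteristic" of the encoding
# that has to be reused for any future, unseen data.
train_columns = list(train_enc.columns)


def encode_like_train(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode df, then align it to the saved columns.

    Categories unseen in training are dropped; training categories absent
    from df are added as all-zero columns.
    """
    enc = pd.get_dummies(df, columns=["city"], dtype=int)
    return enc.reindex(columns=columns, fill_value=0)


test_enc = encode_like_train(test, train_columns)
print(test_enc)
```

The unseen "Chennai" row ends up as all zeros, which is usually an acceptable way to represent a category the model never saw during training.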

Also, it's not good practice to perform feature engineering before the train-test split, as it may cause data leakage: statistics learned from the full dataset carry information from the test rows into training.
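A minimal sketch of the leakage point, using a hypothetical numeric `income` column: the mean used for imputation and the standard deviation used for scaling are learned from the training split only and then reused on the test split. Computing them on the full data would let the test value of 200 pull the imputed value up. The data and the simple positional split are assumptions for illustration; in practice you would shuffle, e.g. with scikit-learn's `train_test_split`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric feature with a missing value.
df = pd.DataFrame({"income": [30.0, 45.0, np.nan, 60.0, 200.0, 50.0]})

# Leaky approach: statistics computed before splitting include the test rows.
leaky_mean = df["income"].mean()  # 77.0, pulled up by the test value 200

# 1. Split first (positional split kept simple for illustration).
train = df.iloc[:4].copy()
test = df.iloc[4:].copy()

# 2. Learn the statistics from the training split only (NaN is skipped).
train_mean = train["income"].mean()  # 45.0
train_std = train["income"].std()    # 15.0 (sample std, ddof=1)

# 3. Apply the same saved statistics to both splits.
for part in (train, test):
    part["income"] = part["income"].fillna(train_mean)
    part["income_scaled"] = (part["income"] - train_mean) / train_std

print(train)
print(test)
```

Saving `train_mean` and `train_std` here plays the same role as saving the dummy columns: the exact same transformation can then be applied to future unseen data.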

Thanks
