Aug 14, 2021
Hi Sabaybiometzger,
Yes, I agree that calling pd.get_dummies() before the train-test split will solve the column mismatch issue. But this may cause problems for future unseen points, because pd.get_dummies() does not save the encoding it derived, so there is no fitted encoder to apply to new data.
Also, it's not good practice to perform feature engineering before the train-test split, as information from the test set may leak into training.
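If it helps, here is a minimal sketch of one alternative, assuming scikit-learn is available: split first, fit a OneHotEncoder on the training rows only, and reuse that fitted encoder for the test set and for future points. The toy data and the column name "city" are made up for illustration and are not from your dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; "city" is an illustrative column name.
df = pd.DataFrame({"city": ["A", "B", "A", "C", "B", "A"]})

# Split first, so the encoder never sees the test rows.
train, test = train_test_split(df, test_size=2, random_state=0)

# handle_unknown="ignore" encodes categories unseen during fit as all zeros
# instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train = encoder.fit_transform(train[["city"]]).toarray()
X_test = encoder.transform(test[["city"]]).toarray()

# A future point with a category that never appeared in training.
new_point = pd.DataFrame({"city": ["Z"]})
X_new = encoder.transform(new_point).toarray()

# All three matrices share the column layout learned from the training data.
print(X_train.shape, X_test.shape, X_new.shape)
```

Because the categories are stored on the fitted encoder, the train, test, and new matrices always have the same columns, and the encoder can be saved and reused at prediction time.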
Thanks