Top Writer in AI | 4x Top 1000 Writer on Medium

Essential guide to detect and handle multicollinearity in the dataset

Image by Gerd Altmann from Pixabay

Exploratory data analysis and statistical analysis are important components of a data science model development pipeline, used to generate insights about the data. Before fitting a machine learning model, a data scientist needs to apply various feature engineering and data preprocessing techniques to train a robust model. …

Deep dive analysis on predicting coupon redemption status to develop more precise and targeted coupons and marketing strategies

Image by George Dolgikh from Pixabay

E-commerce companies employ various marketing strategies to sell their products, such as running campaigns, advertisements, free product distribution, discount marketing, and many more. Coupons are one of the most popular marketing strategies that companies use to increase their revenue. …

Essential guide to error-correcting output code (ECOC)

Image by Gerd Altmann from Pixabay

Machine learning algorithms such as Logistic Regression and Support Vector Machines are designed to classify binary-class datasets, but they cannot handle multi-class classification data directly. Multi-class classification tasks have a target variable with more than two class labels. …

Essential guide to train and display pipelines, column transformer, and chaining estimators

Image by Erik Stein from Pixabay

A data science model development pipeline involves various components, including data ingestion, data preprocessing, feature engineering, feature scaling, and modeling. A data scientist needs to write the learning and inference code for all the components. …

Speed up your model selection workflow

Image by LTD EHU from Pixabay

Model selection is an essential component of a data science model development pipeline. After performing feature engineering, the data scientist needs to choose the model and the set of hyperparameters that perform best on the training dataset. There are various Auto-ML libraries that automate the model selection component.


Essential guide to Silhouette Analysis

Image by Michal Jarmoluk from Pixabay

Clustering is a machine learning technique that groups an unlabelled dataset. k-Means is a popular clustering algorithm that clusters the dataset in such a way that data points in the same cluster are similar to each other, whereas data points in different clusters differ a lot.

Satyam Kumar
