Tutorial on Gaussian Mixture Model (GMM) using Scikit Learn (Sklearn) by Datacode With Sharad

In this tutorial, we will explore the Gaussian Mixture Model (GMM) using scikit-learn (also known as sklearn), a popular machine learning library for Python. GMM is a probabilistic clustering algorithm that assumes the data is generated from a mixture of several Gaussian distributions with unknown parameters. Because it assigns each point a probability of belonging to each cluster rather than a single hard label, it is a useful tool when the clusters overlap and are not clearly separated.
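
For reference, the mixture assumption described above is usually written as the density below. The symbols are notation introduced here rather than taken from the tutorial: K is the number of components, \pi_k are the mixing weights, and \mu_k and \Sigma_k are the mean and covariance of component k.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Fitting a GMM means estimating these weights, means, and covariances from the data.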

Before we begin, make sure you have scikit-learn installed. You can install it using pip:

pip install scikit-learn

Now, let’s dive into the tutorial:

  1. Import the necessary libraries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
  2. Generate some synthetic data for clustering:
np.random.seed(0)
n_samples = 1000  # samples per cluster (2000 points in total)

# Draw two 2-D Gaussian blobs centred at (2, 2) and (-2, -2)
X1 = np.random.randn(n_samples, 2) + np.array([2, 2])
X2 = np.random.randn(n_samples, 2) + np.array([-2, -2])
X = np.vstack((X1, X2))
  3. Initialize and fit the GMM model to the data:
gmm = mixture.GaussianMixture(n_components=2)
gmm.fit(X)
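
After fitting, it can help to look at what the model actually learned. The following is a minimal, self-contained sketch that rebuilds similar synthetic data and prints the fitted parameters. The `random_state=0` argument and the use of `np.random.default_rng` are additions made here for repeatability and are not part of the original steps.

```python
import numpy as np
from sklearn import mixture

# Similar synthetic setup to the tutorial: two blobs centred at (2, 2) and (-2, -2)
rng = np.random.default_rng(0)
X = np.vstack((
    rng.standard_normal((1000, 2)) + np.array([2, 2]),
    rng.standard_normal((1000, 2)) + np.array([-2, -2]),
))

# random_state is an assumption added here so repeated runs give the same fit
gmm = mixture.GaussianMixture(n_components=2, random_state=0).fit(X)

print("Converged:", gmm.converged_)                   # True if EM reached the tolerance
print("Weights:", gmm.weights_)                       # mixing proportions, sum to 1
print("Means:\n", gmm.means_)                         # one row per component
print("Covariances shape:", gmm.covariances_.shape)   # (2, 2, 2) for the default 'full' type
```

The estimated means should land close to the two centres used to generate the data, and the weights should be roughly equal because both blobs have the same number of points.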
  4. Predict the cluster labels for the data:
labels = gmm.predict(X)
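
Since GMM is probabilistic, each point also has a membership probability for every component, available through `predict_proba`. The sketch below is self-contained and uses three hypothetical query points chosen here for illustration; `random_state=0` is likewise an assumption added for repeatability.

```python
import numpy as np
from sklearn import mixture

# Similar synthetic setup to the tutorial
rng = np.random.default_rng(0)
X = np.vstack((
    rng.standard_normal((1000, 2)) + np.array([2, 2]),
    rng.standard_normal((1000, 2)) + np.array([-2, -2]),
))
gmm = mixture.GaussianMixture(n_components=2, random_state=0).fit(X)

# Hypothetical query points: one near each centre and one halfway between them
points = np.array([[2.0, 2.0], [-2.0, -2.0], [0.0, 0.0]])

proba = gmm.predict_proba(points)  # shape (3, 2); each row sums to 1
hard = gmm.predict(points)         # index of the most probable component per row

for point, row, label in zip(points, proba, hard):
    print(point, np.round(row, 3), label)
```

Points near a cluster centre get a probability close to 1 for that component, while points between the clusters get a more even split, which `predict` alone would hide.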
  5. Visualize the clustering results:
plt.figure(figsize=(10, 6))

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title('GMM Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar()

plt.show()
  6. Plot the decision boundary of the GMM model:
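h = 0.02  # grid step size
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Predict a component label for every grid point, then reshape back to the grid
Z = gmm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# The earlier plt.show() finished the previous figure, so redraw the data
# on a new figure with the predicted regions shaded underneath
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, cmap='viridis', alpha=0.2)
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=5)
plt.title('GMM Decision Regions')
plt.show()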
  7. Evaluate the GMM model using the AIC and BIC criteria (lower values indicate a better balance between fit and model complexity):
print("AIC: ", gmm.aic(X))
print("BIC: ", gmm.bic(X))
  8. Finally, experiment with different values of n_components to see how the number of clusters affects the results. You can also try different values of the covariance_type parameter ('full', 'tied', 'diag', or 'spherical') to see how they affect clustering performance.
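
One way to run that experiment is to fit a model for each combination and compare BIC scores. The following is a minimal sketch under stated assumptions: the search range of 1 to 4 components, the `results` dictionary, and `random_state=0` are choices made here for illustration, not part of the original steps.

```python
import numpy as np
from sklearn import mixture

# Similar synthetic setup to the tutorial
rng = np.random.default_rng(0)
X = np.vstack((
    rng.standard_normal((1000, 2)) + np.array([2, 2]),
    rng.standard_normal((1000, 2)) + np.array([-2, -2]),
))

results = {}  # maps (covariance_type, n_components) to the BIC score
for cov_type in ("full", "tied", "diag", "spherical"):
    for k in range(1, 5):
        model = mixture.GaussianMixture(
            n_components=k, covariance_type=cov_type, random_state=0
        ).fit(X)
        results[(cov_type, k)] = model.bic(X)
        print(f"{cov_type:>9}  k={k}  BIC={results[(cov_type, k)]:.1f}")

# Lower BIC is better
best = min(results, key=results.get)
print("Lowest BIC:", best)
```

BIC penalizes extra parameters, so it tends to favor the simplest model that still explains the data. Treat the winner as a starting point and confirm it with a plot, since a low score on one dataset is not a guarantee of meaningful clusters.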

That’s it for this tutorial on Gaussian Mixture Model using scikit-learn. I hope you found this tutorial helpful and informative. Remember to always experiment with different parameters and data to get a better understanding of how GMM works. Happy clustering!

5 Comments
@anasfsd584
1 month ago

Sir, is this the complete playlist for scikit-learn? Please do reply.

@b_.techian_trader
1 month ago

Awesome lecture sir ❤❤❤❤❤❤ thanks

@luckyshaikh8360
1 month ago

more videos

@RoheshKumarIT
1 month ago

Thankiuu😊😊😊😊 Sir

@KumarPrakash-ty7dp
1 month ago

Wowwww Thanku Sir ek din mai 2 Lecture ❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤