Tutorial on Gaussian Mixture Model (GMM) using Scikit Learn (Sklearn) by Datacode With Sharad

In this tutorial, we will explore the Gaussian Mixture Model (GMM) using scikit-learn, also known as sklearn, a popular machine learning library for Python. GMM is a probabilistic clustering algorithm that assumes the data is generated from a mixture of several Gaussian distributions with unknown parameters. Because it assigns each point a probability of belonging to every cluster rather than a single hard label, it is well suited to data whose clusters overlap and are not clearly separated.
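
To make the "mixture of several Gaussian distributions" concrete, the model density can be written as below. The symbols are notation chosen for this sketch and do not appear in the original text: K is the number of components (n_components in scikit-learn), and the mixing weights, means, and covariances are the unknown parameters estimated during fitting.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1

% Posterior probability that point x belongs to component k
\gamma_k(x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
```

The second expression is the soft assignment: each point gets a probability for every component, and a hard cluster label is simply the component with the largest value.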

Before we begin, make sure you have scikit-learn and Matplotlib installed (NumPy is installed automatically as a scikit-learn dependency). You can install them using pip:

pip install scikit-learn matplotlib

Now, let’s dive into the tutorial:

  1. Import the necessary libraries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
  2. Generate some synthetic data for clustering:
np.random.seed(0)
n_samples = 1000

# Generate random samples with two features
X1 = np.random.randn(n_samples, 2) + np.array([2, 2])
X2 = np.random.randn(n_samples, 2) + np.array([-2, -2])
X = np.vstack((X1, X2))
  3. Initialize and fit the GMM model to the data:
gmm = mixture.GaussianMixture(n_components=2)
gmm.fit(X)
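
Once the model is fitted, it can help to look at what it actually learned. The following is a minimal sketch that rebuilds the same synthetic data and prints the estimated parameters. Passing random_state=0 is an addition for reproducibility and is not part of the original code.

```python
import numpy as np
from sklearn import mixture

# Same synthetic data as in the tutorial: two blobs centred at (2, 2) and (-2, -2)
np.random.seed(0)
n_samples = 1000
X = np.vstack((
    np.random.randn(n_samples, 2) + np.array([2, 2]),
    np.random.randn(n_samples, 2) + np.array([-2, -2]),
))

# random_state is an assumption added here so repeated runs give the same fit
gmm = mixture.GaussianMixture(n_components=2, random_state=0).fit(X)

print("converged:", gmm.converged_)                   # True if EM reached the tolerance
print("weights:", gmm.weights_)                       # mixing proportions, sum to 1
print("means:\n", gmm.means_)                         # one row per component
print("covariances shape:", gmm.covariances_.shape)   # one 2x2 matrix per component
```

The estimated means should land close to the centres used to generate the data, although the order of the components is arbitrary.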
  4. Predict the cluster labels for the data:
labels = gmm.predict(X)
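
The labels from predict are hard assignments, but a GMM also provides soft assignments through predict_proba. The sketch below shows how the two relate. The variable names probs and uncertain and the 0.9 threshold are illustrative choices, not part of the original tutorial.

```python
import numpy as np
from sklearn import mixture

# Same synthetic data as in the tutorial
np.random.seed(0)
X = np.vstack((
    np.random.randn(1000, 2) + np.array([2, 2]),
    np.random.randn(1000, 2) + np.array([-2, -2]),
))
gmm = mixture.GaussianMixture(n_components=2, random_state=0).fit(X)

labels = gmm.predict(X)
probs = gmm.predict_proba(X)  # shape (n_samples, n_components); each row sums to 1

# The hard label corresponds to the component with the highest posterior probability
agreement = (labels == probs.argmax(axis=1)).mean()
print("fraction of labels matching argmax of probabilities:", agreement)

# Points whose largest posterior is below 0.9 sit near the boundary between clusters
# (0.9 is an arbitrary threshold picked for illustration)
uncertain = probs.max(axis=1) < 0.9
print("uncertain points:", int(uncertain.sum()), "of", len(X))
```

Inspecting the uncertain points is one way to see where the two Gaussians overlap.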
  5. Visualize the clustering results:
plt.figure(figsize=(10, 6))

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title('GMM Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar()

plt.show()
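  6. Plot the decision boundary of the GMM model:
h = 0.02  # step size of the evaluation grid
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Predict a cluster label for every grid point, then reshape back to the grid
Z = gmm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Shade the predicted regions and overlay the data, since the previous
# plt.show() closed the scatter plot and a bare contour would be drawn alone
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, cmap='viridis', alpha=0.2)
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=5)
plt.title('GMM Decision Boundary')
plt.show()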
  7. Evaluate the GMM model using the AIC and BIC criteria (for both, lower values indicate a better trade-off between fit and model complexity):
print("AIC: ", gmm.aic(X))
print("BIC: ", gmm.bic(X))
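
A single AIC or BIC value is hard to interpret on its own. These criteria are mainly useful for comparing models. As a rough sketch, one could fit several models with different n_components and pick the one with the lowest BIC. The range 1 to 5 and the variable names are assumptions made for this example.

```python
import numpy as np
from sklearn import mixture

# Same synthetic data as in the tutorial
np.random.seed(0)
X = np.vstack((
    np.random.randn(1000, 2) + np.array([2, 2]),
    np.random.randn(1000, 2) + np.array([-2, -2]),
))

candidates = list(range(1, 6))  # candidate numbers of components (arbitrary range)
bics = []
for k in candidates:
    model = mixture.GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(model.bic(X))
    print(f"n_components={k}  BIC={bics[-1]:.1f}")

# The candidate with the lowest BIC is the preferred model under this criterion
best_k = candidates[int(np.argmin(bics))]
print("lowest BIC at n_components =", best_k)
```

Since the data was generated from two Gaussians, a two-component model should score clearly better than a one-component model here.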
  8. Finally, experiment with different values of n_components to see how the number of components affects the clustering results. You can also try the different covariance_type options ('full', 'tied', 'diag', or 'spherical'), which control how much freedom each component has in shaping its covariance.
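
As a starting point for that experiment, the sketch below fits one model per covariance type on the same synthetic data and reports the shape of the fitted covariances_ attribute alongside the BIC. The dictionary name shapes and the use of random_state=0 are choices made for this example.

```python
import numpy as np
from sklearn import mixture

# Same synthetic data as in the tutorial
np.random.seed(0)
X = np.vstack((
    np.random.randn(1000, 2) + np.array([2, 2]),
    np.random.randn(1000, 2) + np.array([-2, -2]),
))

shapes = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    model = mixture.GaussianMixture(
        n_components=2, covariance_type=cov_type, random_state=0
    ).fit(X)
    shapes[cov_type] = model.covariances_.shape
    print(f"{cov_type:10s} covariances_ shape={shapes[cov_type]}  BIC={model.bic(X):.1f}")
```

The shapes show what each option estimates: 'full' fits one matrix per component, 'tied' shares a single matrix across components, 'diag' keeps only per-feature variances, and 'spherical' keeps a single variance per component.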

That’s it for this tutorial on Gaussian Mixture Model using scikit-learn. I hope you found this tutorial helpful and informative. Remember to always experiment with different parameters and data to get a better understanding of how GMM works. Happy clustering!
