In this tutorial, we will explore Gaussian Mixture Models (GMMs) using scikit-learn (also known as sklearn), a popular machine learning library for Python. A GMM is a probabilistic clustering model that assumes the data were generated from a mixture of several Gaussian distributions with unknown parameters. Because it assigns each point a probability of belonging to each component rather than a single hard label, it is a useful tool for clustering data whose underlying groups overlap.
Before we begin, make sure you have scikit-learn installed. You can install it using pip:
pip install scikit-learn
Now, let’s dive into the tutorial:
- Import the necessary libraries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
- Generate some synthetic data for clustering:
np.random.seed(0)
n_samples = 1000
# Generate two Gaussian blobs of n_samples points each, with two features, centred at (2, 2) and (-2, -2)
X1 = np.random.randn(n_samples, 2) + np.array([2, 2])
X2 = np.random.randn(n_samples, 2) + np.array([-2, -2])
X = np.vstack((X1, X2))
- Initialize and fit the GMM model to the data:
gmm = mixture.GaussianMixture(n_components=2)
gmm.fit(X)
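Once the model is fitted, it can help to look at what it actually learned. The following is a minimal sketch, assuming the same synthetic data as above; the `random_state=0` argument is an addition made here for reproducibility and is not part of the original snippet.

```python
import numpy as np
from sklearn import mixture

# Recreate the tutorial's synthetic data so this block runs on its own
np.random.seed(0)
n_samples = 1000
X = np.vstack((
    np.random.randn(n_samples, 2) + np.array([2, 2]),
    np.random.randn(n_samples, 2) + np.array([-2, -2]),
))

# random_state is an assumption added for reproducible initialization
gmm = mixture.GaussianMixture(n_components=2, random_state=0).fit(X)

print("Converged:", gmm.converged_)                   # True if EM reached the tolerance
print("Weights:", gmm.weights_)                       # mixing proportions, shape (2,)
print("Means:\n", gmm.means_)                         # one row per component, shape (2, 2)
print("Covariances shape:", gmm.covariances_.shape)   # (2, 2, 2) for the default 'full' type
```

The estimated means should land close to the centres used to generate the data, which is a quick sanity check that the fit is reasonable.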
- Predict the cluster labels for the data:
labels = gmm.predict(X)
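The labels above are hard assignments. Since a GMM is probabilistic, you can also ask for the membership probabilities behind them with `predict_proba`. This is a small sketch on the same synthetic data; the 0.9 threshold and the name `uncertain` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn import mixture

# Recreate the tutorial's synthetic data so this block runs on its own
np.random.seed(0)
n_samples = 1000
X = np.vstack((
    np.random.randn(n_samples, 2) + np.array([2, 2]),
    np.random.randn(n_samples, 2) + np.array([-2, -2]),
))

gmm = mixture.GaussianMixture(n_components=2, random_state=0).fit(X)

# One row per sample, one column per component; each row sums to 1
probs = gmm.predict_proba(X)
hard_labels = gmm.predict(X)

# Points whose most likely component has probability below 0.9
# (illustrative threshold) lie near the boundary between clusters
uncertain = np.sum(probs.max(axis=1) < 0.9)

print("Probabilities shape:", probs.shape)
print("First five rows:\n", probs[:5].round(3))
print("Points with max probability below 0.9:", uncertain)
```

The hard label for a point is simply the component with the highest probability in its row.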
- Visualize the clustering results:
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title('GMM Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar()
plt.show()
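- Plot the decision boundary of the GMM over the data:
h = .02  # grid step size
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict a component label for every grid point
Z = gmm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Redraw the points, because this is a new figure after the earlier plt.show()
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=5)
# With two components the labels are 0 and 1, so the 0.5 level traces the boundary
plt.contour(xx, yy, Z, levels=[0.5], colors='k')
plt.show()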
- Evaluate the GMM using the AIC and BIC criteria (lower values indicate a better trade-off between fit and model complexity):
print("AIC: ", gmm.aic(X))
print("BIC: ", gmm.bic(X))
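A single AIC or BIC value is hard to interpret on its own; the criteria are mostly useful for comparing models. One common way to use them, sketched here under the assumption that candidate models with 1 to 6 components are worth comparing (the range and the names `candidates`, `bics`, and `best_k` are illustrative), is to fit each candidate and keep the one with the lowest BIC.

```python
import numpy as np
from sklearn import mixture

# Recreate the tutorial's synthetic data so this block runs on its own
np.random.seed(0)
n_samples = 1000
X = np.vstack((
    np.random.randn(n_samples, 2) + np.array([2, 2]),
    np.random.randn(n_samples, 2) + np.array([-2, -2]),
))

candidates = list(range(1, 7))  # illustrative range of component counts
bics = []
for k in candidates:
    model = mixture.GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(model.bic(X))

# The candidate with the lowest BIC is the preferred model
best_k = candidates[int(np.argmin(bics))]

for k, bic in zip(candidates, bics):
    print(f"n_components={k}: BIC={bic:.1f}")
print("Lowest BIC at n_components =", best_k)
```

The same loop works with `model.aic(X)`; AIC penalizes extra parameters less strongly than BIC, so it may favor slightly larger models.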
- Finally, you can experiment with different values of n_components to see how the number of clusters affects the clustering results. You can also try different covariance types (full, tied, diag, or spherical, set through the covariance_type parameter) to see how they impact the clustering performance.
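As a starting point for that experiment, the sketch below fits one model per covariance type on the same synthetic data and reports the BIC and the shape of the fitted `covariances_` attribute. The `results` dictionary and `random_state=0` are choices made here for illustration.

```python
import numpy as np
from sklearn import mixture

# Recreate the tutorial's synthetic data so this block runs on its own
np.random.seed(0)
n_samples = 1000
X = np.vstack((
    np.random.randn(n_samples, 2) + np.array([2, 2]),
    np.random.randn(n_samples, 2) + np.array([-2, -2]),
))

results = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    model = mixture.GaussianMixture(
        n_components=2, covariance_type=cov_type, random_state=0
    ).fit(X)
    # Store the BIC and the shape of the covariance parameters for comparison
    results[cov_type] = (model.bic(X), model.covariances_.shape)

for cov_type, (bic, shape) in results.items():
    print(f"{cov_type:>9}: BIC={bic:.1f}, covariances_ shape={shape}")
```

The shapes show how much freedom each type gives the model: 'full' stores one matrix per component, 'tied' shares a single matrix, 'diag' keeps only per-feature variances, and 'spherical' keeps one variance per component.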
That’s it for this tutorial on Gaussian Mixture Models with scikit-learn. I hope you found it helpful and informative. Remember to experiment with different parameters and data to get a better understanding of how GMMs work. Happy clustering!