Using Python and Scikit-Learn for K-Means Clustering


K-Means clustering is a popular unsupervised machine learning algorithm that groups data points into a chosen number of clusters, assigning each point to the cluster whose centroid is nearest in feature space. Typical applications include customer segmentation, image segmentation, and anomaly detection. In this tutorial, we will learn how to perform K-Means clustering using Python and the Scikit-Learn library.

Step 1: Import the necessary libraries
To start, we need to import the necessary libraries. We will be using NumPy for numerical computations, Matplotlib for visualizations, and Scikit-Learn for implementing the K-Means algorithm.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

Step 2: Generate sample data
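Next, we will generate some sample data that we can use for clustering. In this example, we will create synthetic data using NumPy’s random module: 100 points with two features each, drawn uniformly from the unit square. Note that uniform data has no natural cluster structure, so K-Means will simply partition the square into regions.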

# Generate random data
np.random.seed(0)
X = np.random.rand(100, 2)
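
Because uniform random points contain no real groups, it can help to try the same workflow on data that does. The following is a minimal sketch, assuming Scikit-Learn's make_blobs helper is acceptable for generating test data; the names X_blobs and y_true and the cluster_std value are illustrative choices, not part of the original tutorial.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Assumption: three Gaussian blobs with a modest spread (illustrative values only)
X_blobs, y_true = make_blobs(
    n_samples=100,     # same number of points as the uniform example
    centers=3,         # number of blobs to generate
    cluster_std=0.6,   # standard deviation of each blob
    random_state=0,    # fixed seed for reproducibility
)

print(X_blobs.shape)       # two features per point by default
print(np.unique(y_true))   # ground-truth blob index of each point
```

X_blobs can be passed to the later steps in place of X; y_true is only there for comparison and is never given to K-Means.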

Step 3: Initialize and fit the K-Means model
Now, we will initialize a K-Means model with a specified number of clusters and fit it to our data.

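# Initialize K-Means model with 3 clusters
# n_init=10 tries 10 different initializations and keeps the best result;
# random_state makes the run reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)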

# Fit the model to the data
kmeans.fit(X)
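
K-Means starts from randomly chosen centroids, so separate runs can converge to different solutions with different values of the cost function (exposed by Scikit-Learn as inertia_, the sum of squared distances from each point to its assigned centroid). The sketch below, which assumes the same uniform sample data as in Step 2, compares a few single-initialization runs with one model that uses n_init=10 to restart several times and keep the lowest-inertia result. The seeds and the number of runs are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same synthetic data as in Step 2
np.random.seed(0)
X = np.random.rand(100, 2)

# Five separate runs, each with a single random initialization (n_init=1)
single_run_inertias = [
    KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(5)
]

# One model that tries 10 initializations and keeps the one with the lowest inertia
best_of_ten = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("Single-initialization inertias:", single_run_inertias)
print("Best of 10 initializations:   ", best_of_ten.inertia_)
```

If the single-run values differ from one another, that is the effect of initialization; letting the model restart internally via n_init is usually simpler than looping over runs by hand.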

Step 4: Get cluster labels and centroids
After fitting the model, we can get the cluster labels assigned to each data point and the centroids of each cluster.

# Get cluster labels
labels = kmeans.labels_

# Get cluster centroids
centroids = kmeans.cluster_centers_
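
A fitted model can also assign new, unseen points to the existing clusters with predict, which picks the nearest centroid for each point. The sketch below assumes the same data and settings as the earlier steps; new_points is a hypothetical pair of observations made up for illustration, and the manual distance computation is included only to show what predict does.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same synthetic data as in Step 2
np.random.seed(0)
X = np.random.rand(100, 2)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Hypothetical new observations (illustrative values)
new_points = np.array([[0.1, 0.2], [0.9, 0.8]])
new_labels = kmeans.predict(new_points)

# Manual check: distance from every new point to every centroid,
# then take the index of the closest centroid
distances = np.linalg.norm(
    new_points[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
nearest = distances.argmin(axis=1)

print("predict():        ", new_labels)
print("nearest centroid: ", nearest)
print("points per cluster:", np.bincount(kmeans.labels_))
```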

Step 5: Visualize the clusters
Finally, we can visualize the clusters by plotting the data points and the centroids.

# Plot the data points
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
# Plot the centroids
plt.scatter(centroids[:, 0], centroids[:, 1], marker='^', c='red', s=100)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('K-Means Clustering')
plt.show()

In the resulting plot, each data point is colored according to the cluster it belongs to, and the red triangles mark the cluster centroids.
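
The number of clusters was fixed at 3 in this tutorial. As a rough numerical complement to the plot, one common heuristic (often called the elbow method, which the original tutorial does not cover) is to fit the model for several values of k and look at how the inertia falls. The sketch below assumes the same sample data; the range of k values is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same synthetic data as in Step 2
np.random.seed(0)
X = np.random.rand(100, 2)

# Fit one model per candidate number of clusters and record its inertia
k_values = list(range(1, 7))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in k_values
]

for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.3f}")
```

Inertia generally keeps shrinking as k grows, so the value to look for is the point where the improvement levels off rather than the minimum. On uniform data like this there may be no clear elbow at all.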

Conclusion
In this tutorial, we learned how to perform K-Means clustering using Python and the Scikit-Learn library. We covered the essential steps involved in clustering, from importing the necessary libraries to visualizing the results. K-Means clustering is a powerful technique that can be used in various domains to identify patterns or group similar data points. I hope this tutorial was helpful, and you feel more confident in implementing K-Means clustering in your own projects. Happy coding!

4 Comments
@tanishqrastogi1011
3 months ago

Hi, I have a question. When we run the K-Means algorithm, it randomly initializes the starting points, which later become the cluster centroids. I read somewhere that a single run forms clusters that minimize the cost function as usual, but that running the algorithm again may form better clusters and reduce the cost function further. In that case, what should the approach be? Should I run K-Means multiple times, and if so, how? I hope you will reply soon.

@danielaubert3512
3 months ago

👍

@calebmatthews9232
3 months ago

Nice dream house

@craigblackwell500
3 months ago

Great job with the explanation!