Implementing K-Means Clustering in Python from Scratch (with Mathematical Explanation)

Posted by Alfalfa – December 28, 2023

K-Means Clustering From Scratch in Python (Mathematical)

K-means clustering is a popular unsupervised learning algorithm that groups data points into a specified number of clusters. In this article, we implement K-means clustering from scratch in Python and review the mathematical concepts behind it.

Understanding K-Means Clustering Algorithm

The K-means clustering algorithm works by partitioning a dataset into K clusters, where each cluster is represented by its centroid. The algorithm iteratively assigns data points to the nearest centroid and recalculates the centroids based on the newly assigned data points. This process continues until the centroids no longer change, or until a predefined number of iterations is reached.
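The alternating procedure described above is usually motivated as minimizing the within-cluster sum of squared distances. The following is a sketch of that objective; the symbols $x_i$ (a data point), $C_k$ (the set of points assigned to cluster $k$) and $\mu_k$ (the centroid of cluster $k$) are notation introduced here for illustration, not taken from the article.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

With the centroids held fixed, assigning each point to its nearest centroid cannot increase $J$; with the assignments held fixed, the mean $\mu_k$ is the point that minimizes the squared distances within cluster $k$. Each iteration therefore leaves $J$ unchanged or lowers it, which is why the procedure settles down, although possibly at a local rather than global minimum.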

Steps of K-Means Clustering

1. Initialize K centroids randomly.
2. Assign each data point to the nearest centroid.
3. Calculate the new centroids based on the data points assigned to each cluster.
4. Repeat steps 2 and 3 until the centroids do not change significantly or a maximum number of iterations is reached.
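The steps above can be traced on a tiny dataset before moving to the full implementation. This is a minimal sketch, assuming four one-dimensional points and two hand-picked starting centroids; the values and the variable names `points`, `labels`, and `step` are made up for illustration.

```python
import numpy as np

# Toy 1-D dataset and starting centroids (illustrative values, not from the article)
points = np.array([1.0, 2.0, 9.0, 10.0])
centroids = np.array([1.0, 2.0])

for step in range(10):
    # Step 2: index of the nearest centroid for every point
    labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
    # Step 3: each centroid moves to the mean of its assigned points
    new_centroids = np.array([points[labels == k].mean() for k in range(len(centroids))])
    print(step, labels, new_centroids)
    # Step 4: stop once the centroids no longer move
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids
```

On this data the centroids move from (1, 2) to (1, 7) and then to (1.5, 9.5), and the third pass confirms that nothing changes, so the loop stops with the points split into {1, 2} and {9, 10}.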

Implementing K-Means Clustering in Python

Now, let’s take a look at how we can implement the K-means clustering algorithm from scratch in Python.

```python
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt

# Generate random data points
np.random.seed(0)
X = np.random.rand(100, 2)

# Define the K-means clustering function
def k_means_clustering(X, K, max_iterations):
    # Initialize centroids by picking K distinct data points at random
    centroids = X[np.random.choice(len(X), K, replace=False)]
    for _ in range(max_iterations):
        # Assign each data point to the nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, np.newaxis] - centroids, axis=2)
        cluster_assignments = np.argmin(distances, axis=1)
        # Calculate the new centroids; keep the old one if a cluster is empty
        new_centroids = np.array([
            X[cluster_assignments == k].mean(axis=0)
            if np.any(cluster_assignments == k) else centroids[k]
            for k in range(K)
        ])
        # Check for convergence (tolerant of floating-point rounding)
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, cluster_assignments

# Perform K-means clustering with K=3
K = 3
max_iterations = 100
centroids, cluster_assignments = k_means_clustering(X, K, max_iterations)

# Visualize the clusters
plt.scatter(X[:, 0], X[:, 1], c=cluster_assignments)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x')
plt.show()
```
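One way to judge a clustering result is to compute the within-cluster sum of squared distances, the quantity K-means tries to reduce. The sketch below assumes a helper named `inertia` (a hypothetical name, not part of the article's code) and uses a small hand-made dataset so it runs on its own.

```python
import numpy as np

def inertia(X, centroids, cluster_assignments):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = X - centroids[cluster_assignments]
    return float(np.sum(diffs ** 2))

# Small hand-made example (values are illustrative only)
X_demo = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
centroids_demo = np.array([[0.0, 1.0], [4.0, 1.0]])
labels_demo = np.array([0, 0, 1, 1])
print(inertia(X_demo, centroids_demo, labels_demo))
```

Every demo point lies at distance 1 from its centroid, so the result is 4.0. With the article's variables the call would be `inertia(X, centroids, cluster_assignments)`. Running the clustering for several values of K and comparing this number is one common, though not the only, way to decide how many clusters to use.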

Conclusion

In this article, we have learned about the K-means clustering algorithm and how to implement it from scratch in Python. This algorithm is widely used in various fields such as data mining, image segmentation, and customer segmentation. Understanding the mathematical concepts and implementing the algorithm from scratch can provide a deeper insight into how it works and how to customize it for specific applications.

9 Comments

@DarwinRCAPNWattersonIII

10 months ago

When did Viktor Krum start teaching Python?! /s

Very informative video btw, thank you so much!!

@bijayamanandhar3890

10 months ago

It's a great tutorial. One thing I didn't understand is why and how 3 centroids were assumed for the example dataset, given that you assumed the dataset has no labels (unsupervised). I'd appreciate it if you could elaborate. Thanks.

@AlexandLupand

10 months ago

I'm glad I found this tutorial!

@Larzsolice

10 months ago

I take random points from my data as initial centroids; it takes fewer computations, since you only need a set of random integers for the indices.

@pitaeata8493

10 months ago

this is great, thank you. it feels good to understand something and be a little closer to understanding machine learning or how to use it properly.

@aravindputtapaka5147

10 months ago

I want Python code to convert a handwritten image into plain text accurately. I have tried but didn't get it working. Could you try it and show me, sir? Please respond to this comment because I have been searching for this very curiously…

@tcgvsocg1458

10 months ago

really interesting

@user-gk5bz6vd9j

10 months ago

hi i am getting this error can you tell how to solve it

ValueError: 'c' argument has 200 elements, which is inconsistent with 'x' and 'y' with size 100.

@philtoa334

10 months ago

Thx_.
