K Nearest Neighbors (KNN) Machine Learning Algorithm Tutorial using Python Scikit-Learn

In this tutorial, we will learn about the K Nearest Neighbors (KNN) algorithm, a popular method for both classification and regression tasks. We will use the Python programming language and the Scikit-Learn library to implement it.

K Nearest Neighbors (KNN) is a simple and intuitive algorithm that stores all available cases and classifies new cases based on a similarity measure. To make a prediction for a given data point, it finds the K most similar instances in the training dataset and assigns a label (or, for regression, a value) based on these K nearest neighbors.

KNN is a lazy learning algorithm: it does not build a model at training time but instead memorizes the training instances. This makes prediction computationally expensive, especially for large datasets, because it requires calculating the distance between the new data point and every training instance.
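To make this concrete, here is a minimal sketch of KNN's core logic in plain NumPy (a simplified illustration, not Scikit-Learn's actual implementation): compute the Euclidean distance from the new point to every training point, then take a majority vote among the K closest.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]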

Let’s start by installing the required libraries and loading a dataset to demonstrate the KNN algorithm:

Step 1: Install the required libraries.

!pip install numpy
!pip install pandas
!pip install scikit-learn

Step 2: Import the required libraries.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

Step 3: Load and preprocess the dataset.
For this tutorial, we will use the famous Iris dataset, which contains 150 samples of Iris flowers, each with four features (sepal length, sepal width, petal length, petal width) and a target variable (species: setosa, versicolor, virginica).

# Load the Iris dataset
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
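Since pandas was imported in Step 2, you can optionally wrap the features in a DataFrame to inspect the data:

# Optional: view the features and target as a pandas DataFrame
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target
print(df.head())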

Step 4: Feature scaling.
Because KNN relies on distance calculations, features with larger numeric ranges would otherwise dominate the distance. We therefore scale the features to a mean of 0 and a standard deviation of 1 using the StandardScaler. Note that the scaler is fitted on the training data only and then applied to the test data, which avoids leaking information from the test set.

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
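As a quick sanity check, the scaled training features should now have means close to 0 and standard deviations close to 1 (the test-set statistics will only be approximately so, since the scaler was fitted on the training data):

# Sanity check: means ~0 and standard deviations ~1
print(X_train.mean(axis=0).round(2))
print(X_train.std(axis=0).round(2))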

Step 5: Train and evaluate the KNN algorithm.
Now we can train the KNN classifier using the training data and evaluate its performance on the testing data.

# Initialize the KNN classifier with K=3
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the KNN classifier on the training data
knn.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = knn.predict(X_test)

# Calculate the accuracy of the KNN classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

Step 6: Tune the hyperparameters.
The KNN algorithm has a hyperparameter K, the number of neighbors considered when making a prediction. Small values of K make the classifier sensitive to noise, while large values smooth the decision boundary, so tuning K can improve performance. As a first step, we can simply try a different value (a more systematic search is sketched after this block).

# Initialize the KNN classifier with K=5
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the KNN classifier on the training data
knn.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = knn.predict(X_test)

# Calculate the accuracy of the KNN classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

By following these steps, you have successfully implemented the K Nearest Neighbors (KNN) algorithm in Python using the Scikit-Learn library. KNN is a versatile algorithm that can be used for both classification and regression tasks, making it a valuable tool in a data scientist’s toolkit. Happy coding!
