In this tutorial, we will walk through the process of training a K-Nearest Neighbors (KNN) classifier using the Scikit-learn library in Python. KNN is a simple and intuitive machine learning algorithm: to classify a new sample, it finds the k closest samples in the training data and assigns the class that is most common among them.
Before we get started, make sure you have Scikit-learn installed in your Python environment. If not, you can install it using pip:
pip install -U scikit-learn
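To confirm that the installation worked, a quick check like the one below can help. It simply imports the package and prints its version; the exact version string will depend on your environment.

```python
# Sanity check: import scikit-learn and print the installed version.
import sklearn

version = sklearn.__version__
print(f"scikit-learn version: {version}")
```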
Now, let’s dive into the implementation:
Step 1: Import the necessary libraries
First, we need to import the required libraries for our implementation:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
Step 2: Load the dataset
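For this tutorial, we will use the Iris dataset, a small, classic dataset for classification tasks. It contains 150 flower samples, each described by 4 numeric features and labeled with one of 3 species. You can load the dataset using the following code: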
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
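Before moving on, it can be useful to look at what was loaded. The sketch below is optional and only inspects the data; it repeats the loading step so that it runs on its own.

```python
import numpy as np
from sklearn import datasets

# Load the Iris dataset (same call as in the tutorial)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Shape of the feature matrix: (number of samples, number of features)
print(X.shape)  # (150, 4)

# Human-readable names of the features and the classes
print(iris.feature_names)
print(iris.target_names)

# Number of samples in each class
class_counts = np.bincount(y)
print(class_counts)  # [50 50 50]
```

The classes are perfectly balanced, which is one reason plain accuracy is a reasonable metric for this dataset.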
Step 3: Split the dataset
Next, we need to split the dataset into training and testing sets. This can be done using the train_test_split function from Scikit-learn:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
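In this code snippet, test_size=0.2 splits the dataset into 80% training data and 20% testing data, and random_state=42 fixes the random shuffle so that the split is the same every time the code runs.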
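A random split does not guarantee that each class appears in the same proportion in both sets. As an optional variation (not used in the rest of this tutorial), train_test_split accepts a stratify argument that preserves the class proportions. A minimal sketch:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# stratify=y keeps the class proportions of y in both the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# With 50 samples per class and a 20% test set, each class gets 10 test samples
test_counts = np.bincount(y_test)
print(test_counts)  # [10 10 10]
```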
Step 4: Train the KNN classifier
Now, we can train the KNN classifier using the training data. The KNeighborsClassifier class in Scikit-learn can be used to create a KNN classifier:
# Create and train the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
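In this code snippet, we are creating a KNN classifier with k=3 (i.e., each prediction is decided by the 3 nearest neighbors) and fitting it on the training data. Unlike many other models, KNN does not learn any weights during fit; it stores the training samples so that the nearest neighbors can be looked up at prediction time.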
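To make the neighbor vote concrete, the sketch below reproduces the prediction for a single test sample by hand. It assumes the default settings of KNeighborsClassifier (Euclidean distance and equally weighted votes); the names query, manual_nearest, and manual_label are only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Take one test sample as the query point
query = X_test[0]

# Euclidean distance from the query to every training sample
distances = np.linalg.norm(X_train - query, axis=1)
manual_nearest = np.sort(distances)[:3]

# Ask the fitted classifier for the same 3 neighbors (distances and indices)
nn_dist, nn_idx = knn.kneighbors(query.reshape(1, -1))

# Majority vote over the labels of the 3 nearest training samples
votes = np.bincount(y_train[nn_idx[0]], minlength=3)
manual_label = int(np.argmax(votes))
sklearn_label = int(knn.predict(query.reshape(1, -1))[0])

print("Manual distances: ", manual_nearest)
print("sklearn distances:", nn_dist[0])
print("Manual vote:", manual_label, "| knn.predict:", sklearn_label)
```

The hand-computed distances and the vote should agree with what the classifier reports, which shows that there is no hidden model behind KNN beyond the stored training data.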
Step 5: Make predictions
Once the classifier has been trained, we can make predictions on the test data using the predict method:
# Make predictions
y_pred = knn.predict(X_test)
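Besides hard labels, the classifier can also report how the neighbors voted. The sketch below uses predict_proba on the first five test samples; with k=3 and the default equal weighting, each value is the fraction of the 3 neighbors that belong to a class.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1
proba = knn.predict_proba(X_test[:5])
print(proba)

# The predicted label is the class with the largest share of neighbor votes
labels_from_proba = knn.classes_[np.argmax(proba, axis=1)]
print(labels_from_proba)
print(knn.predict(X_test[:5]))
```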
Step 6: Evaluate the model
Finally, we can evaluate the performance of the model by calculating the accuracy score on the test data:
# Calculate the accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
This code snippet calculates the accuracy of the model, i.e., the fraction of test samples whose predicted label matches the actual label.
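Accuracy is a single number and does not show which classes get confused with each other. As an optional extra, the sketch below prints a confusion matrix and a per-class report using confusion_matrix and classification_report from sklearn.metrics.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# Precision, recall, and F1 score for each species
report = classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=iris.target_names
)
print(report)

# Accuracy equals the diagonal (correct predictions) divided by the total
accuracy_from_cm = np.trace(cm) / cm.sum()
print(f"Accuracy from confusion matrix: {accuracy_from_cm}")
```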
And that’s it! You have successfully trained a K-Nearest Neighbors classifier using Scikit-learn. Feel free to experiment with different values of k and other hyperparameters, such as weights (how neighbor votes are weighted) and metric (how distance is measured), to see how they affect the model’s performance. Happy coding!
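One way to run such experiments systematically is a cross-validated grid search. The sketch below is only a starting point: the candidate values in param_grid are arbitrary choices for illustration, and the search is run on the training data so that the test set stays untouched.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate hyperparameter values (illustrative, not recommendations)
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

# 5-fold cross-validation on the training set for every combination
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")

# Evaluate the best model once on the held-out test set
test_accuracy = search.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```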