Detecting Cancer using KNN Classification in Python with Scikit-learn | Tutorial by Sachin Sirohi

In this tutorial, we will learn about KNN (K-Nearest Neighbors) classification for cancer detection using Python and the Scikit-learn library. KNN is a popular machine learning algorithm that can be used for both classification and regression: it predicts the label of a new sample from the labels of the K training samples closest to it. Here we will focus on binary classification, specifically classifying breast tumor samples as benign or malignant.

Step 1: Install the Required Python Libraries
Before we start, we need to make sure that the necessary Python libraries are installed. We will be using the following libraries:

  1. NumPy: for numerical computing
  2. Pandas: for data manipulation and analysis
  3. Scikit-learn: for machine learning algorithms

You can install these libraries using the following commands:

pip install numpy
pip install pandas
pip install scikit-learn
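
To confirm that the installation worked, a quick check like the one below can help. It is only a sketch: it imports the three libraries and prints whatever versions happen to be installed on your machine.

```python
# Import each library and print its installed version.
# An ImportError here means the corresponding pip install did not succeed.
import numpy as np
import pandas as pd
import sklearn

print("NumPy:", np.__version__)
print("Pandas:", pd.__version__)
print("Scikit-learn:", sklearn.__version__)
```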

Step 2: Load the Cancer Dataset
For this tutorial, we will be using the Breast Cancer Wisconsin (Diagnostic) dataset, which is available in the Scikit-learn library. The dataset contains 569 samples, each described by 30 numeric features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. The task is to classify the samples as either benign or malignant. In Scikit-learn's encoding, a target of 0 means malignant and 1 means benign.

Let’s load the dataset and explore its contents:

from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
print(data)
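
Printing the whole object produces a lot of output, so it can be easier to look at the data as a table. The sketch below is one possible way to do that with Pandas. The variable name `df` and the choice to summarize only the first five columns are illustrative assumptions, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Put the 30 feature columns into a DataFrame and append the label column.
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

print(df.shape)                      # number of rows and columns
print(data.target_names)             # label names, indexed by target value
print(df["target"].value_counts())   # how many samples fall in each class
print(df.iloc[:, :5].describe())     # summary statistics for the first five features
```

The class counts are worth noting: the two classes are not perfectly balanced, which matters when we interpret accuracy later.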

Step 3: Preprocess the Data
Before we can use the dataset for classification, we need to preprocess it. First, we will split the dataset into features and labels:

X = data.data
y = data.target

Next, we will split the data into training and testing sets:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
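
Because the classes are somewhat imbalanced, one optional variation is to ask `train_test_split` to keep the class proportions the same in both sets with its `stratify` parameter. The sketch below is an alternative to the split above, not what the rest of this tutorial uses, and its results will differ slightly from the unstratified split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the malignant/benign ratio roughly equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fraction of each class (index 0 = malignant, index 1 = benign).
print("Full data:", np.bincount(y) / len(y))
print("Train set:", np.bincount(y_train) / len(y_train))
print("Test set: ", np.bincount(y_test) / len(y_test))
```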

Step 4: Train the KNN Classifier
Now that we have preprocessed the data, we can train the KNN classifier on the training set:

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
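
KNN decides based on distances, so features measured on large scales can dominate features measured on small ones. A common optional step is to standardize the features first. The sketch below assumes a `StandardScaler` placed in front of the classifier with `make_pipeline`; the name `scaled_knn` is illustrative. Whether scaling actually helps on this particular split is something to check by comparing scores rather than assume.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline learns the scaling from the training data only,
# then applies the same scaling before every prediction.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_train, y_train)

# Baseline without scaling, for comparison.
plain_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

scaled_score = scaled_knn.score(X_test, y_test)
plain_score = plain_knn.score(X_test, y_test)
print("Test accuracy with scaling:   ", scaled_score)
print("Test accuracy without scaling:", plain_score)
```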

Step 5: Make Predictions
Once the KNN classifier has been trained, we can use it to make predictions on the test set:

y_pred = knn.predict(X_test)
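
To see what a prediction involves, here is a minimal NumPy sketch of the idea behind KNN. It assumes Euclidean distance and a plain majority vote, which correspond to the default settings of `KNeighborsClassifier`. The helper `knn_predict_one` is a hypothetical name written for this illustration; it is not how Scikit-learn implements the search internally.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

def knn_predict_one(x, X_train, y_train, k=5):
    """Predict the label of one sample by majority vote of its k nearest neighbors."""
    # Euclidean distance from x to every training sample.
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training samples.
    nearest = np.argsort(distances)[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = np.bincount(y_train[nearest], minlength=2)
    return int(np.argmax(votes))

manual_pred = np.array([knn_predict_one(x, X_train, y_train) for x in X_test])

# Compare against Scikit-learn's classifier with the same k.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
agreement = np.mean(manual_pred == knn.predict(X_test))
print(f"Agreement with Scikit-learn: {agreement:.2%}")
```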

Step 6: Evaluate the Model
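Finally, we can evaluate the performance of the KNN classifier using metrics such as accuracy, precision, recall, and F1-score. Note that Scikit-learn's precision, recall, and F1 functions treat label 1 as the positive class by default, and in this dataset label 1 is benign, so the scores below describe how well the benign samples are identified: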

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1 Score: {:.2f}".format(f1))
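
For cancer detection, the malignant class (label 0 in this dataset) is usually the one we care about most. The sketch below shows one way to look at it directly: a confusion matrix, recall with `pos_label=0`, and a per-class report. The variable names `cm` and `malignant_recall` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are true classes, columns are predicted classes, in the order [malignant, benign].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# pos_label=0 makes malignant the positive class:
# the share of truly malignant samples that the model flagged as malignant.
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
print("Recall for malignant: {:.2f}".format(malignant_recall))

# Precision, recall, and F1 for both classes in one table.
print(classification_report(y_test, y_pred, target_names=list(data.target_names)))
```

The top-right cell of the confusion matrix counts malignant samples predicted as benign, which is typically the most costly kind of mistake in this setting.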

Step 7: Conclusion
In this tutorial, we learned how to use the KNN algorithm for cancer detection in Python using the Scikit-learn library. We loaded the Breast Cancer Wisconsin dataset, split it into training and testing sets, trained the KNN classifier, made predictions, and evaluated the model’s performance using classification metrics. KNN is a simple but effective algorithm that can be applied to many classification problems, including cancer detection. Experiment with different hyperparameters, such as the number of neighbors, and try different datasets to further explore the capabilities of the KNN algorithm.
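
As a starting point for experimenting with hyperparameters, the sketch below tries several values of `n_neighbors` using 5-fold cross-validation on the training set. The candidate values in `param_grid` are arbitrary choices for illustration, and the best value found may change with a different split or scoring metric.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate values of k to compare (illustrative, not a recommendation).
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11, 15]}

# Each candidate is scored by 5-fold cross-validation on the training data only,
# so the test set stays untouched until the final check.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best k:", search.best_params_["n_neighbors"])
print("Cross-validated accuracy: {:.3f}".format(search.best_score_))
print("Test accuracy with best k: {:.3f}".format(search.score(X_test, y_test)))
```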
