Python Tutorial for Beginners: Machine Learning Packages Pandas, NumPy, and Scikit-learn [Part 2]


In part 1 of this tutorial on ML Packages Pandas, NumPy, and Scikit-learn, we covered the basics of each package and how they are used in machine learning applications. In this part, we will delve deeper into each package and demonstrate some practical examples to give you a better understanding of how to use these packages in your machine learning projects.

Pandas:
Pandas is a powerful data manipulation tool that is used extensively in data analysis and machine learning. It provides data structures like DataFrames and Series, which allow you to easily handle and manipulate data. Here are some common operations you can perform with Pandas:

  1. Creating a DataFrame:
    You can create a DataFrame from a dictionary, a list of lists, or a NumPy array. For example:

    import pandas as pd
    data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
    df = pd.DataFrame(data)

    This will create a DataFrame with two columns ‘A’ and ‘B’ and three rows.
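
The other two construction routes mentioned above can be sketched as follows. This is a minimal illustration, not the only way to do it: the variable names are made up, and the column names are passed explicitly because neither a list of lists nor a NumPy array carries labels of its own.

```python
import numpy as np
import pandas as pd

# From a list of lists: each inner list is one row
rows = [[1, 4], [2, 5], [3, 6]]
df_from_rows = pd.DataFrame(rows, columns=['A', 'B'])

# From a 2-D NumPy array: same layout, rows by columns
array = np.array([[1, 4], [2, 5], [3, 6]])
df_from_array = pd.DataFrame(array, columns=['A', 'B'])

print(df_from_rows.shape)            # (3, 2)
print(list(df_from_array.columns))   # ['A', 'B']
```

Both calls produce the same three-row, two-column table as the dictionary version.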

  2. Reading data from files:
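    You can read data from CSV, Excel, and JSON files, or from a SQL database, using Pandas’ built-in functions like pd.read_csv(), pd.read_excel(), pd.read_json(), and pd.read_sql(). Note that pd.read_sql() takes a query and a database connection rather than a file path. For example: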

    df = pd.read_csv('data.csv')

    This will read data from a CSV file and store it in a DataFrame.
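
Since 'data.csv' is not included with this tutorial, here is a self-contained sketch that uses an in-memory string in place of the file. The CSV content is hypothetical and exists only so the example can run as written.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = "A,B\n1,4\n2,5\n3,6\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # first rows, to confirm the data parsed as expected
print(df.dtypes)    # column types inferred by Pandas
```

Checking head() and dtypes right after loading is a quick way to catch a wrong delimiter or a numeric column that was read as text.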

  3. Data manipulation:
    You can perform various data manipulation operations like filtering, sorting, grouping, merging, and joining on DataFrames. For example:

    
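    # Filtering data: keep rows where column 'A' is greater than 1
    filtered_df = df[df['A'] > 1]

    # Sorting data by column 'B' (ascending by default)
    sorted_df = df.sort_values(by='B')

    # Grouping data: mean of the remaining columns for each value of 'A'
    grouped_df = df.groupby('A').mean()

    # Merging data: df1 and df2 are two DataFrames that share a 'key' column
    merged_df = pd.merge(df1, df2, on='key')

    # Joining data: join() aligns on the index; the suffixes keep apart
    # any column names that appear in both DataFrames
    joined_df = df1.join(df2, how='outer', lsuffix='_left', rsuffix='_right')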
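
The merge and join calls need two DataFrames to work on. The sketch below defines two small hypothetical tables (their contents are assumptions made for illustration) to show the difference: merge matches rows on column values, while join matches rows on the index.

```python
import pandas as pd

# Two small hypothetical tables that share a 'key' column
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# merge matches rows on column values; the default is an inner join,
# so only keys present in both tables survive
merged_df = pd.merge(df1, df2, on='key')

# join matches rows on the index, so move 'key' into the index first;
# how='outer' keeps every key from either table and fills gaps with NaN
joined_df = df1.set_index('key').join(df2.set_index('key'), how='outer')

print(merged_df)    # rows for keys 'b' and 'c'
print(joined_df)    # rows for keys 'a', 'b', 'c', 'd'
```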


NumPy:
NumPy is a powerful library for numerical computing in Python. It provides support for multidimensional arrays and mathematical functions to operate on these arrays. Here are some common operations you can perform with NumPy:

1. Creating arrays:
You can create NumPy arrays from lists, tuples, or other arrays. For example:

import numpy as np
arr = np.array([1, 2, 3, 4])

This will create a one-dimensional array with four elements.
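
Arrays are not limited to one dimension. As a small sketch with illustrative variable names, a nested list produces a two-dimensional array, and helper constructors such as np.zeros() and np.arange() cover common patterns.

```python
import numpy as np

# A nested list becomes a two-dimensional array (2 rows, 3 columns)
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Helper constructors for common patterns
zeros = np.zeros((2, 3))        # 2x3 array filled with 0.0
steps = np.arange(0, 10, 2)     # start at 0, stop before 10, step by 2

print(matrix.shape)      # (2, 3)
print(matrix.ndim)       # 2
print(steps.tolist())    # [0, 2, 4, 6, 8]
```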

2. Array operations:
You can perform various operations on NumPy arrays like element-wise addition, subtraction, multiplication, and division. For example:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
c = a + b

# Element-wise multiplication
d = a * b
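
Subtraction and division follow the same element-wise pattern, and a single number can be combined with a whole array. A short sketch, with variable names chosen for illustration:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

diff = b - a        # element-wise subtraction
ratio = b / a       # element-wise division, returns floats
scaled = a * 10     # the scalar is applied to every element

print(diff.tolist())      # [3, 3, 3]
print(ratio.tolist())     # [4.0, 2.5, 2.0]
print(scaled.tolist())    # [10, 20, 30]
```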


3. Mathematical functions:
NumPy provides a wide range of mathematical functions like `np.sum()`, `np.mean()`, `np.median()`, `np.std()`, `np.power()`, and many more to perform calculations on arrays. For example:

# Calculate the sum of an array
sum_arr = np.sum(arr)

# Calculate the mean of an array
mean_arr = np.mean(arr)

# Calculate the standard deviation of an array
std_arr = np.std(arr)
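
Two parameters are worth knowing when using these functions. np.std() uses the population formula by default, and passing ddof=1 gives the sample standard deviation instead. The axis parameter controls the direction of the calculation on multidimensional arrays. The arrays below are small illustrative examples.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# np.std defaults to the population formula (ddof=0);
# ddof=1 gives the sample standard deviation instead
population_std = np.std(arr)
sample_std = np.std(arr, ddof=1)

# axis=0 reduces down the rows (one result per column),
# axis=1 reduces across the columns (one result per row)
column_sums = np.sum(matrix, axis=0)
row_means = np.mean(matrix, axis=1)

print(column_sums.tolist())   # [5, 7, 9]
print(row_means.tolist())     # [2.0, 5.0]
```

The sample standard deviation is always slightly larger than the population value for the same data, which matters when comparing NumPy results against Pandas, whose std() uses ddof=1 by default.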


Scikit-learn:
Scikit-learn is a popular machine learning library in Python that provides a wide range of tools for building and training machine learning models. It includes algorithms for classification, regression, clustering, dimensionality reduction, and more. Here are some common operations you can perform with Scikit-learn:

1. Splitting data:
You can split your data into training and testing sets using the `train_test_split()` function. For example:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

This will shuffle your data and split it into an 80% training set and a 20% testing set. Here X holds the feature matrix and y holds the target labels.
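
Because the split is random, the result changes on every run unless a seed is fixed. The sketch below uses a small made-up dataset (the X and y values are assumptions for illustration) and two optional parameters of train_test_split(): random_state and stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, 2 features, binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# random_state fixes the shuffle so the split is repeatable;
# stratify=y keeps the class balance the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)
```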

2. Building a model:
You can build a machine learning model using algorithms like Linear Regression, Decision Trees, Random Forest, Support Vector Machines, and more from Scikit-learn. For example:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
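
To run the snippet above end to end, X_train and y_train have to come from somewhere. The sketch below is one way to do that, assuming synthetic data from make_classification() in place of a real dataset. The sample count, feature count, and hyperparameter values are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# n_estimators is the number of trees; random_state makes the fit repeatable
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# After fitting, the model exposes one importance score per feature
print(model.feature_importances_)
print(model.predict(X_test[:5]))
```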


3. Evaluating a model:
You can evaluate the performance of your model using metrics like accuracy, precision, recall, F1-score, and confusion matrix from Scikit-learn. For example:

from sklearn.metrics import accuracy_score, confusion_matrix
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
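
The remaining metrics mentioned above follow the same calling pattern. To make the numbers easy to verify by hand, this sketch uses short made-up label lists instead of real model output. In practice the two lists would be y_test and the result of model.predict(X_test).

```python
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
)

# Hand-made labels for illustration only
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_hat)      # 6 of 8 correct
precision = precision_score(y_true, y_hat)    # 3 of 4 predicted positives are right
recall = recall_score(y_true, y_hat)          # 3 of 4 actual positives are found
f1 = f1_score(y_true, y_hat)                  # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes
conf_matrix = confusion_matrix(y_true, y_hat)

print(accuracy, precision, recall, f1)
print(conf_matrix.tolist())   # [[3, 1], [1, 3]]
```

Accuracy alone can be misleading when one class is much more common than the other, which is why precision and recall are usually reported alongside it.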



In this tutorial, we have covered some common operations you can perform with Pandas, NumPy, and Scikit-learn in machine learning applications. By mastering these packages and their functionalities, you will be able to build and train machine learning models effectively in Python. Experiment with different datasets and algorithms to gain a deeper understanding of how these packages work together in a machine learning pipeline. Stay tuned for more advanced tutorials on machine learning with Python!