Machine Learning for Investing with Python Using Scikit Learn: Tutorial Part 16

In this tutorial, we continue our exploration of Scikit Learn for Machine Learning in Python for investing. This part covers three more advanced topics: ensemble models, feature engineering, and hyperparameter tuning.

Ensemble models combine the predictions of multiple models to improve accuracy and robustness. One popular ensemble model is the Random Forest, which trains many decision trees, each on a bootstrap sample of the training data, and combines their outputs; for regression, the forest's prediction is the average of the trees' predictions. To build a Random Forest model in Scikit Learn, we can use the RandomForestRegressor for regression tasks or the RandomForestClassifier for classification tasks.
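To make the "collection of decision trees" idea concrete, here is a minimal sketch on synthetic data (the dataset and the names X_demo, y_demo, and forest are assumptions for illustration, not part of the tutorial's stock data). It checks that a fitted RandomForestRegressor's prediction matches the mean of its individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only, not stock prices)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted decision tree is stored in forest.estimators_
tree_preds = np.array([tree.predict(X_demo) for tree in forest.estimators_])

# Compare the forest's prediction with the mean over the individual trees
forest_pred = forest.predict(X_demo)
manual_mean = tree_preds.mean(axis=0)
print('Number of trees:', len(forest.estimators_))
print('Forest equals mean of trees:', np.allclose(forest_pred, manual_mean))
```

Averaging many trees that were each trained on slightly different data is what tends to make the ensemble less sensitive to noise than any single tree.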

To demonstrate how to build a Random Forest model, let’s use a dataset of historical stock prices. The model below predicts each row's closing price from the remaining columns in the file. First, we need to load the data and preprocess it.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

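# Load the dataset (assumed to be sorted by date, oldest row first)
data = pd.read_csv('stock_prices.csv')

# Extract features and target variable
# Every column other than 'Date' and 'Close' becomes a feature, so they must all be numeric
X = data.drop(columns=['Date', 'Close'])
y = data['Close']

# Split the data into training and testing sets
# shuffle=False keeps the rows in order, so the model is tested on the most recent 20%
# of the data instead of on rows randomly mixed into the training period
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)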

# Build the Random Forest model
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Make predictions
y_pred = rf.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)

Next, let’s explore feature engineering, which involves transforming raw data into more informative features. This can include creating new variables, encoding categorical variables, and scaling numerical features.
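As a rough sketch of encoding and scaling in one step, the example below uses scikit-learn's ColumnTransformer on a tiny made-up table. The columns 'Sector', 'Volume', and 'PE_Ratio' and their values are hypothetical and do not come from the tutorial's dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows; the column names and values are invented for illustration
raw = pd.DataFrame({
    'Sector': ['Tech', 'Energy', 'Tech', 'Health'],
    'Volume': [1_000_000, 250_000, 730_000, 410_000],
    'PE_Ratio': [28.5, 12.1, 35.0, 19.4],
})

preprocess = ColumnTransformer(
    [
        # One 0/1 column per category found in 'Sector'
        ('onehot', OneHotEncoder(handle_unknown='ignore'), ['Sector']),
        # Rescale numeric columns to zero mean and unit variance
        ('scale', StandardScaler(), ['Volume', 'PE_Ratio']),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

features = preprocess.fit_transform(raw)
print(features.shape)  # 3 one-hot columns + 2 scaled columns
```

Tree-based models such as Random Forests are largely insensitive to feature scaling, so the scaling step matters mostly if you later swap in a linear or distance-based model.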

For example, we can create new features based on the historical stock prices, such as moving averages, the relative strength index (RSI), and moving average convergence divergence (MACD). These indicators can capture trends and momentum in the stock price movement.

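# Create additional features from the closing price.
# compute_rsi and compute_macd are not part of pandas or scikit-learn;
# they are helper functions you need to define yourself.
data['MA_50'] = data['Close'].rolling(window=50).mean()
data['RSI'] = compute_rsi(data['Close'], window=14)
data['MACD'] = compute_macd(data['Close'], window_short=12, window_long=26, window_signal=9)
# The rolling windows leave NaN in the first rows: drop them, then rebuild X and y.
data = data.dropna()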
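The snippet above calls compute_rsi and compute_macd without defining them. One possible way to write them with pandas is sketched below, keeping the same parameter names. Two assumptions are worth flagging: this RSI uses simple rolling averages of gains and losses (not Wilder's exponential smoothing), and compute_macd returns the MACD histogram (MACD line minus signal line), since the original stores a single column but also takes a signal window. The original author's helpers may differ.

```python
import pandas as pd

def compute_rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from simple rolling means of gains and losses (an assumed variant)."""
    delta = close.diff()
    gain = delta.clip(lower=0)      # positive price changes, else 0
    loss = (-delta).clip(lower=0)   # size of negative price changes, else 0
    avg_gain = gain.rolling(window=window).mean()
    avg_loss = loss.rolling(window=window).mean()
    rs = avg_gain / avg_loss        # becomes inf when there were no losses
    return 100 - 100 / (1 + rs)

def compute_macd(close: pd.Series, window_short: int = 12,
                 window_long: int = 26, window_signal: int = 9) -> pd.Series:
    """MACD histogram: (short EMA - long EMA) minus its own signal EMA."""
    ema_short = close.ewm(span=window_short, adjust=False).mean()
    ema_long = close.ewm(span=window_long, adjust=False).mean()
    macd_line = ema_short - ema_long
    signal_line = macd_line.ewm(span=window_signal, adjust=False).mean()
    return macd_line - signal_line

# Usage on made-up prices (a steadily rising series, for illustration only)
prices = pd.Series(range(100, 160), dtype=float)
print(compute_rsi(prices).tail(3))
print(compute_macd(prices).tail(3))
```

The first `window` values of the RSI are NaN because the rolling mean needs a full window of price changes before it produces a value.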

Finally, hyperparameter tuning means choosing values for settings that are fixed before training, such as the number of trees or their maximum depth, to improve performance. Grid Search automates this: it tries every combination in a grid of candidate values and scores each one with cross-validation.

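from sklearn.model_selection import GridSearchCV

# Define the grid of parameters to search (3 x 3 = 9 combinations)
param_grid = {
    'n_estimators': [100, 200, 300],  # number of trees in the forest
    'max_depth': [10, 20, 30]         # maximum depth of each tree
}

# Create the Grid Search object
# cv=5 scores each combination with 5-fold cross-validation (45 fits plus a final refit)
# scoring='neg_mean_squared_error' is the negated MSE, so values closer to zero are better
grid_search = GridSearchCV(rf, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_
print('Best Hyperparameters:', best_params)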
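Once the search has finished, the tuned model still needs to be checked on data it has not seen. The sketch below shows one way to do that, under a few assumptions: it uses synthetic, time-ordered data in place of the stock file, a smaller grid so it runs quickly, and TimeSeriesSplit instead of plain 5-fold cross-validation so that each validation fold comes after its training rows. The names X_ts, y_ts, and search are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic, time-ordered data standing in for real features (an assumption)
rng = np.random.default_rng(0)
X_ts = rng.normal(size=(300, 4))
y_ts = X_ts[:, 0] * 2.0 + rng.normal(scale=0.5, size=300)

# Chronological hold-out: the last 20% of rows form the test set
split = int(len(X_ts) * 0.8)
X_tr, X_te = X_ts[:split], X_ts[split:]
y_tr, y_te = y_ts[:split], y_ts[split:]

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={'n_estimators': [20, 50], 'max_depth': [3, 5]},
    cv=TimeSeriesSplit(n_splits=3),  # validation rows always follow training rows
    scoring='neg_mean_squared_error',
)
search.fit(X_tr, y_tr)

# best_score_ is the negated MSE, so flip the sign to report it
print('Best Hyperparameters:', search.best_params_)
print('Best CV MSE:', -search.best_score_)

# With the default refit=True, best_estimator_ is retrained on all of X_tr
best_model = search.best_estimator_
test_mse = mean_squared_error(y_te, best_model.predict(X_te))
print('Test MSE:', test_mse)
```

Reporting the test error separately from the cross-validation score gives a more honest picture, because the hyperparameters were chosen to look good on the cross-validation folds.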

In this tutorial, we covered ensemble models, feature engineering, and hyperparameter tuning in Scikit Learn for Machine Learning in Python for investing. Applied together, these techniques can help you build more accurate and robust models for predicting stock prices. Happy investing!


12 Comments
@obi666
24 days ago

You can install Quandl using pip install Quandl command.

@kedarjyotirlinga9288
24 days ago

where can i find the apple inc data file? Wiki/AAPL is not available

@o.y.930
24 days ago

Hey man, WIKI is no longer available. What other dataset can you suggest? Thanks in advance, man.

@wcottee
24 days ago

Found this link interesting…apparently, at least for the free version, quandl doesn't update prices after March 27, 2018.
https://github.com/quantopian/zipline/issues/2145.

Do we have any other options???

@liangyumin9405
24 days ago

WIKI/AAPL may be unavailable…

@joepaykel6761
24 days ago

With Python 3.6 and Windows 10, just typing pip install quandl didn't work for installation. I got it to work with: py -m pip install quandl

@superfed2742
24 days ago

I get this error when I use your code: "ImportError: cannot import name 'quandl'"
Then I changed "from quandl import quandl" to "import quandl".
Now the error is
"FileNotFoundError: [Errno 2] No such file or directory: 'auth.txt'"

Please help me fix this error. I have spent a long time on it.

@spinQubit
24 days ago

If you're watching in 2017, quandl has updated some stuff.

Your file should contain

import quandl

quandl.ApiConfig.api_key = 'your token here'

@nsouj22
24 days ago

must've changed; quandl worked instead of Quandl

@ognyanmoore146
24 days ago

You can install Quandl with:

> pip install Quandl 

It mentions that in the readme on the github page (pandas and dateutil are dependencies)

@MichaelMerritt
24 days ago

killer video, love the stocks analysis + python stuff. 

@IW4TCH
24 days ago

Hey man I sent you a pm,hope you got it.
