In this tutorial, we will continue our exploration of scikit-learn for machine learning in Python for investing. We will dive into more advanced topics: ensemble models, feature engineering, and hyperparameter tuning.
Ensemble models combine the predictions of multiple models to improve accuracy and robustness. One popular ensemble model is the Random Forest, which trains a collection of decision trees on random samples of the data and averages their predictions (for regression) or combines their class predictions (for classification). To build a Random Forest model in scikit-learn, we can use RandomForestRegressor for regression tasks or RandomForestClassifier for classification tasks.
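To make the idea of "combining predictions" concrete, here is a minimal sketch showing that a fitted RandomForestRegressor's prediction matches the mean of its individual trees' predictions. The data is synthetic and the variable names (X_demo, y_demo, forest) are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only for illustration (not real stock prices)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted decision tree is stored in forest.estimators_
tree_preds = np.array([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)

# The forest's own prediction for the same rows
forest_pred = forest.predict(X_demo[:5])

print('Individual trees:', len(forest.estimators_))
print('Average of trees matches forest:', np.allclose(manual_average, forest_pred))
```

Averaging many trees that were each trained on a different bootstrap sample is what tends to reduce the variance of a single deep tree.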
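To demonstrate how to build a Random Forest model, let’s use a dataset of historical stock prices. The code below predicts each row’s closing price from the other columns in the file, so it assumes that stock_prices.csv contains Date and Close columns and that every remaining column is numeric. First, we need to load the data and preprocess it.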
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
# Load the dataset
data = pd.read_csv('stock_prices.csv')
# Extract features and target variable
X = data.drop(columns=['Date', 'Close'])
y = data['Close']
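# Split the data into training and testing sets
# shuffle=False keeps the rows in date order (assuming the file is sorted by Date), so the model is tested on later dates than it was trained on
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)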
# Build the Random Forest model
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Make predictions
y_pred = rf.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
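To forecast a future price rather than fit the same day's close, one possible framing is to shift the target forward by one row and split the data chronologically. The sketch below is a minimal illustration under stated assumptions: synthetic random-walk prices stand in for stock_prices.csv, and the column names Return_1d, Close_lag1, and Target are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic random-walk prices (an assumption; replace with your own data)
rng = np.random.default_rng(42)
prices = pd.DataFrame({
    'Date': pd.date_range('2020-01-01', periods=300, freq='B'),
    'Close': 100 + rng.normal(size=300).cumsum(),
})

# Features that are known at the end of day t
prices['Return_1d'] = prices['Close'].pct_change()
prices['Close_lag1'] = prices['Close'].shift(1)

# Target: the closing price on day t+1
prices['Target'] = prices['Close'].shift(-1)

# The first and last rows contain NaN after shifting, so drop them
prices = prices.dropna().reset_index(drop=True)

# Chronological split: first 80% of rows for training, the rest for testing
feature_cols = ['Close', 'Return_1d', 'Close_lag1']
split = int(len(prices) * 0.8)
X_train_ts = prices[feature_cols].iloc[:split]
X_test_ts = prices[feature_cols].iloc[split:]
y_train_ts = prices['Target'].iloc[:split]
y_test_ts = prices['Target'].iloc[split:]

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_ts, y_train_ts)

rmse = np.sqrt(mean_squared_error(y_test_ts, model.predict(X_test_ts)))
print('Next-day RMSE:', rmse)
```

Because tree-based models cannot predict values outside the range seen in training, predicting returns instead of price levels is often worth trying as well.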
Next, let’s explore feature engineering, which involves transforming raw data into more informative features. This can include creating new variables, encoding categorical variables, and scaling numerical features.
For example, we can create new features based on the historical stock prices, such as moving averages, the relative strength index (RSI), and the moving average convergence divergence (MACD). These indicators can capture trends and momentum in the stock price movement.
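# Create additional features
# Note: compute_rsi and compute_macd are not part of pandas or scikit-learn; they are helper functions you must define yourself
data['MA_50'] = data['Close'].rolling(window=50).mean()
data['RSI'] = compute_rsi(data['Close'], window=14)
data['MACD'] = compute_macd(data['Close'], window_short=12, window_long=26, window_signal=9)
# Rolling windows leave NaN values in the first rows; drop them, then rebuild X and y so the new columns are used
data = data.dropna()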
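The helpers compute_rsi and compute_macd are not defined in the tutorial and are not library functions. The following is one possible sketch of them, written to match the call signatures used above. It rests on two assumptions: the RSI uses Wilder-style exponential smoothing, and compute_macd returns the MACD histogram (MACD line minus signal line), because the result is stored in a single column. Other definitions are equally reasonable.

```python
import numpy as np
import pandas as pd

def compute_rsi(close, window=14):
    """Hypothetical helper: RSI with Wilder-style exponential smoothing."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # positive moves only
    loss = -delta.clip(upper=0)      # negative moves, as positive numbers
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

def compute_macd(close, window_short=12, window_long=26, window_signal=9):
    """Hypothetical helper: MACD histogram (MACD line minus signal line)."""
    ema_short = close.ewm(span=window_short, adjust=False).mean()
    ema_long = close.ewm(span=window_long, adjust=False).mean()
    macd_line = ema_short - ema_long
    signal_line = macd_line.ewm(span=window_signal, adjust=False).mean()
    return macd_line - signal_line

# Usage on synthetic prices (an assumption; not real market data)
rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(size=200).cumsum())
rsi = compute_rsi(close, window=14)
macd = compute_macd(close, window_short=12, window_long=26, window_signal=9)
print(rsi.tail(3))
print(macd.tail(3))
```

Both helpers return a Series aligned with the input index, so the results can be assigned directly to new DataFrame columns.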
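Finally, hyperparameter tuning involves selecting values for the model’s hyperparameters (settings fixed before training, such as the number of trees or the maximum tree depth) to improve performance. We can use grid search to evaluate every combination in a predefined grid with cross-validation and keep the one with the best score.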
from sklearn.model_selection import GridSearchCV
# Define the grid of parameters to search
param_grid = {
'n_estimators': [100, 200, 300],
'max_depth': [10, 20, 30]
}
# Create the Grid Search object
grid_search = GridSearchCV(rf, param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)
# Get the best hyperparameters
best_params = grid_search.best_params_
print('Best Hyperparameters:', best_params)
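With cv=5, GridSearchCV uses ordinary K-fold splits, which can place later observations in the training fold and earlier ones in the validation fold. For time-ordered data, one alternative is to pass a TimeSeriesSplit object as cv so that every validation fold comes after its training fold. The sketch below uses synthetic data and a deliberately small grid; X_ts, y_ts, and the grid values are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic, time-ordered data (an assumption; replace with your own features)
rng = np.random.default_rng(0)
X_ts = rng.normal(size=(120, 4))
y_ts = X_ts[:, 0] + 0.1 * rng.normal(size=120)

# Each split trains on an initial segment and validates on the segment after it
tscv = TimeSeriesSplit(n_splits=3)

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={'n_estimators': [20, 40], 'max_depth': [3, 6]},
    cv=tscv,
    scoring='neg_mean_squared_error',
)
search.fit(X_ts, y_ts)

print('Best Hyperparameters:', search.best_params_)
# The scorer returns negated MSE (higher is better), so flip the sign to read it as MSE
print('Best CV MSE:', -search.best_score_)

# Confirm that no validation fold precedes its training fold
ordered = all(train_idx.max() < val_idx.min() for train_idx, val_idx in tscv.split(X_ts))
print('Validation always follows training:', ordered)
```

After the search, search.best_estimator_ holds a model refit on all of the data passed to fit with the best parameters, and it can be evaluated on a held-out test set.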
In this tutorial, we have covered ensemble models, feature engineering, and hyperparameter tuning in scikit-learn for machine learning in Python for investing. By applying these techniques, you can build more accurate and robust models for predicting stock prices. Happy investing!
You can install Quandl using the pip install Quandl command.
Where can I find the Apple Inc. data file? WIKI/AAPL is not available.
Hey man, WIKI is no longer available. What other dataset can you suggest? Thanks in advance, man.
Found this link interesting…apparently, at least for the free version, quandl doesn't update prices after March 27, 2018.
https://github.com/quantopian/zipline/issues/2145
Do we have any other options???
WIKI/AAPL may be unavailable…
With Python 3.6 and Windows 10, just typing pip install quandl didn't work for installation. I got it to work with: py -m pip install quandl
I get this error when I use your code: "ImportError: cannot import name 'quandl'"
Then I changed "from quandl import quandl" to "import quandl".
Now the error is
"FileNotFoundError: [Errno 2] No such file or directory: 'auth.txt'"
Please help me fix this error. I have spent a long time trying to fix it.
If you're watching in 2017, quandl has updated some stuff.
Your file should contain
import quandl
quandl.ApiConfig.api_key = 'your token here'
It must've changed; quandl worked instead of Quandl.
You can install Quandl with:
> pip install Quandl
It mentions that in the readme on the github page (pandas and dateutil are dependencies)
killer video, love the stocks analysis + python stuff.
Hey man, I sent you a PM, hope you got it.