Why do we split data into train, test, and validation sets?
When working with machine learning models, it is common practice to split your data into three separate sets: train, test, and validation. This is done in order to evaluate the performance of the model and prevent overfitting.
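A minimal sketch of such a three-way split, using only the Python standard library. The function name `three_way_split` and the 60/20/20 proportions are assumptions made for illustration; they are not prescribed by this article, and other proportions are just as valid.

```python
import random

def three_way_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and split them into (train, validation, test) lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is used for training
    return train, val, test

# Hypothetical data: 100 example IDs standing in for real samples.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before slicing matters when the data is ordered (for example by date or by class), because otherwise one of the sets could end up with only one kind of example.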
Training Set
The training set is used to train the model. The model learns the patterns in this data and adjusts its parameters accordingly, with the goal of predicting outcomes well on unseen data.
Test Set
The test set is used to evaluate the performance of the model. Once the model has been trained on the training set, it is then tested on the test set to see how well it performs on new, unseen data. This helps to ensure that the model has generalized well and is not overfitting to the training data.
Validation Set
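The validation set is used to fine-tune the model, for example to choose hyperparameters or to compare candidate models, before deploying it in a real-world scenario. Because the model is not trained on the validation set, you can measure its performance on another set of unseen data and see which changes actually improve its accuracy. The test set should be kept aside until this tuning is finished, so that the final evaluation is not influenced by those adjustments.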
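One common way to use a validation set is to compare a few hyperparameter settings on it and keep the test set for a single final check. The sketch below does this with a tiny hand-written k-nearest-neighbours classifier; the toy data, the helper names `knn_predict` and `accuracy`, and the candidate values of `k` are all assumptions chosen for illustration.

```python
import random

def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, data, k):
    """Fraction of (x, label) pairs in data that k-NN built on train gets right."""
    correct = sum(knn_predict(train, x, k) == label for x, label in data)
    return correct / len(data)

# Hypothetical toy data: the label is 1 when x > 0.5, with roughly 10% of labels flipped.
rng = random.Random(0)
points = []
for _ in range(300):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.1:
        label = 1 - label
    points.append((x, label))

# The points are already in random order, so plain slicing gives a 60/20/20 split.
train, val, test = points[:180], points[180:240], points[240:]

# Choose the hyperparameter k using the validation set only.
candidate_ks = [1, 3, 5, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# Touch the test set once, at the very end, with the chosen k.
test_accuracy = accuracy(train, test, best_k)
print("best k:", best_k, "test accuracy:", round(test_accuracy, 3))
```

The test accuracy is computed only after `best_k` is fixed, so it stays an honest estimate of performance on new data.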
Overall, splitting data into train, test, and validation sets is essential in machine learning to ensure that the model is performing well on new, unseen data and to prevent overfitting. It allows researchers and data scientists to build more robust and accurate models that can be deployed with confidence.
Simple and good explanation, thank you so much ☺️
Thanks for making a distinction between testing and validation
What is the need for test data if the hyperparameters don't need to be optimized?
NOLIKE for UN useful!!!!!
Can you explain something about it?????
For example, the meaning and usefulness of each one.
Good day ma, please can you help me out? I have been trying to figure this out for a long time but I could not. I want to know the best evaluation plots for machine learning models, specifically for classification problems. How best can someone visualize performance? With deep learning models you can use train and test curves, but how best can we visualize performance with machine learning models? Have you done any video about that? I have been checking your playlists but I can't find one. Kindly help us out. Thanks
Ah, if you were living in France, I would have married you immediately, I would have taken you to some fancy restaurant everyday and during the night, you would have done my assignments within data science.
You are soo cute❤❤
Thank you.
Please make a video on logistic regression
video is very much useful. Your channel is so underrated.
Superb
Brilliant and clear!
Good one!
The blop sound effect at 0:50 scared the life out of me
You are the best, thank you!😊
Just what I was looking for, your video is so simple and easy to understand, and straight to the point!!!
I love your content. Every time I split my data into train and valid, either using the trainsplit function or manually, my val loss does not decrease below 1. The only way to get my val loss lower and lower is to use part of my train data as validation data 😢
After finding the best hyperparameters for a model using validation data, should we retrain the model using both the training and validation data before using it on the test data?
Can we expect more pandas-related videos?