Hyperparameters & Grid-Search
Machine Learning Process
Grid-searching hyperparameters lets data scientists optimize how well their models learn from the data. Before we dive into what hyperparameters are and how grid-search applies to them, let's figure out where they fit into the overall machine learning process. The flow-chart below illustrates where both hyperparameters & hyperparameter tuning (i.e. Grid-Search) take place during the machine learning process.
Parameter vs Hyperparameter; Internal vs External
Like many terms and acronyms you come across when entering the world of machine learning, understanding the difference between ‘parameters’ and ‘hyperparameters’ can be difficult. Let’s take a quick moment to break these two terms down.
Parameters — Model parameters are internal to the model. They are key to machine learning algorithms, as they are the part of the model that is learned from the training data. Parameters are required by the model for making predictions; they are not set manually and are typically estimated using an optimization algorithm. The final parameters, determined after training, dictate how the model will perform on unseen data.
Common examples of parameters include:
- coefficients in Linear/Logistic Regression Models
- weights in an Artificial Neural Network
- split points in a Decision Tree
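To make this concrete, here is a minimal sketch (using scikit-learn and the toy iris dataset, which are my own assumptions, not from the article) showing that a model's parameters are estimated during fitting rather than set by hand:

```python
# Sketch: parameters are learned from the training data, not chosen by the user.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)  # parameters are estimated here by an optimization algorithm

# The learned coefficients and intercepts ARE the model's parameters.
print(model.coef_.shape)   # (3, 4): one row per class, one column per feature
print(model.intercept_)    # one intercept per class
```

Before `fit` is called, `model` has no `coef_` attribute at all, which is a nice illustration of the internal/learned nature of parameters.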
Hyperparameters — Model hyperparameters are external to the model and are not estimated from the training data. Typically, they are set manually by the user before training starts. Good values can be found for a specific model using a tuning process such as ‘grid-search.’ Ultimately, your hyperparameters dictate how effectively the model trains.
Common examples of hyperparameters include:
- learning rate in Gradient Descent
- maximum depth in a Decision Tree
- # of neighbors in K-Nearest Neighbors
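In scikit-learn terms (an assumption on my part, since the article doesn't name a library), hyperparameters are exactly the arguments you pass to a model's constructor before any training happens:

```python
# Sketch: hyperparameters are chosen by the user, up front, before fitting.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

knn = KNeighborsClassifier(n_neighbors=5)   # number of neighbors in KNN
tree = DecisionTreeClassifier(max_depth=3)  # maximum depth of a Decision Tree

# get_params() reports the hyperparameter settings the user picked.
print(knn.get_params()["n_neighbors"])  # 5
print(tree.get_params()["max_depth"])   # 3
```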
Grid-Search & its Importance to ML
Now that we know the difference between parameters and hyperparameters and their roles in machine learning — it's time to dive into determining which hyperparameters to use in our models. If you were like me when I was first learning about hyperparameters, you would implement the old ‘plug-and-chug’ method until you were happy with the results your models were outputting. While this method can be exhilarating when you guess the correct combo and see your training results sky-rocket, it is time-consuming, suboptimal, and ultimately bad practice.
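That ‘plug-and-chug’ approach might look something like the sketch below (again assuming scikit-learn and the toy iris dataset): try hand-picked values one at a time and eyeball the scores.

```python
# Sketch of the manual 'plug-and-chug' approach: guess values, check scores.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in (1, 3, 5, 7):  # hand-picked guesses for n_neighbors
    scores[k] = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: mean CV accuracy {scores[k]:.3f}")
```

This works, but nothing guarantees you tried the right values, and repeating it across several hyperparameters quickly becomes unmanageable.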
Grid-Search is a better method of hyperparameter tuning than the ‘plug-and-chug’ method described above. Grid-Search (GS) is applied on a per-model basis, as each type of machine learning model has a different catalogue of hyperparameters. GS is a tuning technique in which the user selects which hyperparameters to tune and which values to try for each (e.g. k=1,2,3 for n_neighbors in KNN); GS then builds and evaluates a model for every possible combination to determine which hyperparameter settings work best for that specific model. Depending on the number of hyperparameters and values defined, a GS can become very taxing in computation and time, though it is typically worthwhile. Grid-Search is only one process of hyperparameter tuning; other methods, such as randomized search, exist as well.
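Here is what that looks like with scikit-learn's `GridSearchCV` (my assumed tooling), searching over two KNN hyperparameters at once:

```python
# Sketch: GridSearchCV fits one cross-validated model per combination.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 4 values of n_neighbors x 2 weighting schemes = 8 candidate models.
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the winning hyperparameter combination
print(search.best_score_)   # its mean cross-validated accuracy
```

Note how the cost grows multiplicatively: adding a third hyperparameter with five values would turn 8 candidate fits into 40, which is exactly why a deep grid becomes computationally taxing.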
Lastly, it's common for data scientists to create several types of models during the early phases of a data science project, and GS can be a big help when determining whether you should move forward with a Logistic Regression model vs a Decision Tree. It is important to apply GS to all candidate models in order to evaluate each type on as even a playing field as possible; comparing a grid-searched Logistic Regression model to an un-tuned Decision Tree model would essentially be comparing apples to oranges.
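A fair comparison along those lines might be sketched like this (the candidate models, grids, and dataset are illustrative assumptions): grid-search each model type, then compare their best cross-validated scores.

```python
# Sketch: grid-search every candidate model type so the comparison is fair.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4, 6]}),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5).fit(X, y)
    results[name] = search.best_score_
    print(f"{name}: best CV score {results[name]:.3f} with {search.best_params_}")
```

Because each model type is compared at its own tuned best, the apples-to-oranges problem above goes away.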
I hope this quick introduction to Hyperparameters & Grid-Search was helpful as you explore the world of data science.
References & Resources