Hyperparameter Tuning: GridSearchCV, RandomizedSearchCV, Optuna, Cross-Validation Strategies, and Practical Tuning Workflows
You built a Random Forest with default parameters. It gets 82% accuracy. Is that good? Could it be 92% with different settings? You change n_estimators from 100 to 500 — accuracy jumps to 86%. You change max_depth from None to 10 — it drops to 79%. Which combination of settings gives the best result?
This is the hyperparameter tuning problem. Model parameters (like weights in linear regression) are learned from data automatically. Hyperparameters (like number of trees, learning rate, max depth) are settings YOU choose BEFORE training. The wrong combination wastes compute and produces mediocre models. The right combination unlocks the model’s full potential.
Think of it like tuning a guitar. The strings (parameters) find their pitch through playing (training). But the tuning pegs (hyperparameters) must be set by the musician BEFORE playing. Turn them too tight — the string snaps (overfitting). Too loose — it sounds flat (underfitting). The sweet spot requires systematic experimentation.
This post covers the main hyperparameter tuning techniques, from brute-force GridSearch to Bayesian optimization with Optuna, with Python code, comparison tables, cross-validation strategies, and a practical workflow you can apply to any model.
Table of Contents
- Parameters vs Hyperparameters
- Why Default Hyperparameters Are Rarely Optimal
- Cross-Validation: The Foundation
  - K-Fold Cross-Validation
  - Stratified K-Fold
  - Other CV Strategies
- GridSearchCV (Exhaustive Search)
  - How GridSearch Works
  - GridSearchCV in Python
  - GridSearch Limitations
- RandomizedSearchCV (Random Sampling)
  - How RandomizedSearch Works
  - RandomizedSearchCV in Python
  - GridSearch vs RandomizedSearch
- Bayesian Optimization with Optuna
  - How Bayesian Optimization Works
  - Optuna in Python
  - Optuna Visualization
- Key Hyperparameters by Model
  - Random Forest Hyperparameters
  - XGBoost Hyperparameters
  - Logistic Regression Hyperparameters
- Practical Tuning Workflow
- Overfitting vs Underfitting During Tuning
- Common Mistakes
- Interview Questions
- Wrapping Up
Parameters vs Hyperparameters
| Aspect | Parameters | Hyperparameters |
|---|---|---|
| Set by | The model (learned during training) | You (set before training) |
| Examples | Weights, coefficients, split thresholds | Learning rate, max_depth, n_estimators, C |
| Change during training? | Yes (updated with each iteration) | No (fixed for the entire training run) |
| Stored in model? | Yes (model.coef_, tree splits) | Yes (model.get_params()) |
| Analogy | The notes a musician plays (learned) | The tuning pegs (set before playing) |
Key insight: You cannot tune hyperparameters by looking at training accuracy alone. A model with max_depth=50 might get 99% training accuracy but 60% test accuracy (overfitting). You need cross-validation to evaluate hyperparameter choices on unseen data.
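A minimal sketch of that gap, assuming a synthetic noisy dataset from make_classification rather than the data used elsewhere in this post. An unrestricted decision tree memorizes its training rows, while cross-validation shows how it does on rows it has not seen. The names X_demo, y_demo, and deep_tree are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (an assumption for illustration only)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0
)

# max_depth=None lets the tree grow until every training row is classified
deep_tree = DecisionTreeClassifier(max_depth=None, random_state=0)

# Accuracy measured on the same rows the tree was trained on
train_acc = deep_tree.fit(X_demo, y_demo).score(X_demo, y_demo)

# Accuracy averaged over 5 held-out folds
cv_acc = cross_val_score(deep_tree, X_demo, y_demo, cv=5, scoring="accuracy").mean()

print(f"Train accuracy: {train_acc:.3f}")
print(f"CV accuracy:    {cv_acc:.3f}")
```

The training number alone would suggest a near-perfect model; only the cross-validated number tells you whether a hyperparameter choice generalizes.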
Why Default Hyperparameters Are Rarely Optimal
Library defaults are chosen to work “reasonably” across many datasets. But your data is unique. Defaults are like buying medium-sized clothes for everyone — they fit nobody perfectly.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Default Random Forest
rf_default = RandomForestClassifier(random_state=42)
default_scores = cross_val_score(rf_default, X, y, cv=5, scoring='accuracy')
print(f"Default: {default_scores.mean():.4f}") # 0.8240
# After tuning
rf_tuned = RandomForestClassifier(
n_estimators=300, max_depth=12, min_samples_split=5,
min_samples_leaf=2, max_features='sqrt', random_state=42
)
tuned_scores = cross_val_score(rf_tuned, X, y, cv=5, scoring='accuracy')
print(f"Tuned: {tuned_scores.mean():.4f}") # 0.9120
# +8.8 percentage points of accuracy from hyperparameter tuning alone (no new data, no new features)
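Before tuning, it helps to see which defaults you are actually starting from. A small sketch using get_params(), which every scikit-learn estimator provides; the list of names printed here is just a selection chosen for illustration.

```python
from sklearn.ensemble import RandomForestClassifier

# get_params() returns every hyperparameter and its current value
defaults = RandomForestClassifier().get_params()

for name in ["n_estimators", "max_depth", "min_samples_split",
             "min_samples_leaf", "bootstrap"]:
    print(f"{name}: {defaults[name]}")
```

These are the values the "default" model above is implicitly using, and they are the starting point that any search has to beat.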
Cross-Validation: The Foundation
Before learning any tuning technique, you must understand cross-validation — it is the evaluation method that all tuning techniques use internally.
The problem with train/test split: If you tune hyperparameters based on test set performance, you are indirectly “training” on the test set. The model optimizes for THAT specific test split, and your reported accuracy is overly optimistic.
Cross-validation solves this by evaluating on multiple different train/test splits and averaging the results.
K-Fold Cross-Validation
5-Fold Cross-Validation:
Fold 1: [TEST] [Train] [Train] [Train] [Train] → Score: 0.82
Fold 2: [Train] [TEST] [Train] [Train] [Train] → Score: 0.85
Fold 3: [Train] [Train] [TEST] [Train] [Train] → Score: 0.79
Fold 4: [Train] [Train] [Train] [TEST] [Train] → Score: 0.84
Fold 5: [Train] [Train] [Train] [Train] [TEST] → Score: 0.81
Average Score: 0.822 ± 0.02
Every data point is in the test set exactly once.
Every data point is in the training set exactly 4 times.
The average score is a more reliable estimate than any single split.
from sklearn.model_selection import cross_val_score, KFold
# Basic K-Fold
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')
print(f"Mean: {scores.mean():.4f}, Std: {scores.std():.4f}")
# Mean: 0.8220, Std: 0.0189
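The "exactly once in the test set, exactly 4 times in the training set" property can be checked directly. This is a sketch on 10 toy rows (X_small is a made-up array), counting how often each row lands on each side of the split.

```python
import numpy as np
from sklearn.model_selection import KFold

X_small = np.arange(10).reshape(-1, 1)   # 10 toy samples
test_counts = np.zeros(10, dtype=int)    # times each row is in a test fold
train_counts = np.zeros(10, dtype=int)   # times each row is in a training fold

kf_demo = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf_demo.split(X_small), start=1):
    train_counts[train_idx] += 1
    test_counts[test_idx] += 1
    print(f"Fold {fold}: test rows = {test_idx.tolist()}")

print("Times in test set: ", test_counts.tolist())
print("Times in train set:", train_counts.tolist())
```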
Stratified K-Fold
For classification problems, Stratified K-Fold preserves the class distribution in each fold. If 30% of your data is “fraud” (class 1), each fold will also have approximately 30% fraud. This prevents a fold from having zero fraud cases (which would give a misleading score).
from sklearn.model_selection import StratifiedKFold
# Stratified K-Fold (what scikit-learn uses by default when cv is an integer and the estimator is a classifier)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf, scoring='f1')
print(f"Stratified Mean F1: {scores.mean():.4f}")
# For imbalanced datasets (e.g., 5% fraud), Stratified is ESSENTIAL
# Regular K-Fold might create a fold with 0% fraud → meaningless evaluation
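To make the difference concrete, here is a sketch on 100 toy labels with a 5% positive class. The labels are sorted on purpose (an assumption chosen to show the worst case) and KFold is left unshuffled, so the contrast with StratifiedKFold is easy to see.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# 100 toy labels: 95 negatives followed by 5 positives (5% positive class)
y_toy = np.array([0] * 95 + [1] * 5)
X_toy = np.zeros((100, 1))  # feature values do not affect how rows are split

def positives_per_fold(splitter):
    """Count positive labels in the test part of each fold."""
    return [int(y_toy[test_idx].sum())
            for _, test_idx in splitter.split(X_toy, y_toy)]

plain_counts = positives_per_fold(KFold(n_splits=5))
strat_counts = positives_per_fold(StratifiedKFold(n_splits=5))

print("KFold positives per fold:          ", plain_counts)
print("StratifiedKFold positives per fold:", strat_counts)
```

With sorted labels, unshuffled KFold puts all five positives in the last fold and none in the other four, while StratifiedKFold gives every fold exactly one. Shuffling makes plain KFold less extreme, but it still does not guarantee the ratio.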
Other CV Strategies
| Strategy | Use When | Code |
|---|---|---|
| K-Fold | Regression, balanced classification | KFold(n_splits=5) |
| Stratified K-Fold | Imbalanced classification | StratifiedKFold(n_splits=5) |
| Leave-One-Out (LOO) | Very small datasets (<100 rows) | LeaveOneOut() |
| Repeated K-Fold | When you need very stable estimates | RepeatedKFold(n_splits=5, n_repeats=3) |
| Time Series Split | Temporal data (no future data leakage) | TimeSeriesSplit(n_splits=5) |
| Group K-Fold | When certain groups must stay together | GroupKFold(n_splits=5) |
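As an illustration of the Group K-Fold row, the sketch below assumes a hypothetical dataset where each customer contributes three rows. The customer IDs are made up. The point is that no customer ever appears on both sides of a split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X_g = np.arange(12).reshape(-1, 1)
y_g = np.array([0, 1] * 6)
# Hypothetical customer IDs: three rows per customer
groups = np.repeat(["cust_a", "cust_b", "cust_c", "cust_d"], 3)

overlaps = []
for train_idx, test_idx in GroupKFold(n_splits=4).split(X_g, y_g, groups=groups):
    train_groups = set(groups[train_idx])
    test_groups = set(groups[test_idx])
    overlaps.append(train_groups & test_groups)  # should always be empty
    print(f"test groups: {sorted(test_groups)}")

print("Any group on both sides of a split?", any(overlaps))
```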
Time Series Split deserves special attention for data engineers — it ensures training data always comes BEFORE test data chronologically, preventing future data leakage:
Time Series Split (4 splits, as produced by TimeSeriesSplit(n_splits=4)):
Fold 1: [Train] [TEST] ---- ---- ----
Fold 2: [Train] [Train] [TEST] ---- ----
Fold 3: [Train] [Train] [Train] [TEST] ----
Fold 4: [Train] [Train] [Train] [Train] [TEST]
Training set grows with each fold. Test is always the NEXT period.
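The same pattern can be printed from scikit-learn. This sketch assumes 10 observations already sorted by time and uses n_splits=4 to mirror the four folds drawn above.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_ts = np.arange(10).reshape(-1, 1)  # 10 observations, oldest first

splits = list(TimeSeriesSplit(n_splits=4).split(X_ts))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Every test index is larger than every training index in the same fold, which is what rules out training on the future.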
GridSearchCV (Exhaustive Search)
GridSearch tries EVERY combination of hyperparameter values you specify. If you give it 3 options for parameter A and 4 options for parameter B, it tries all 3 × 4 = 12 combinations.
Real-life analogy: You are trying to find the best coffee recipe. You have 3 grind sizes (coarse, medium, fine) and 4 brew times (2, 3, 4, 5 minutes). GridSearch brews ALL 12 combinations and picks the best. Thorough, but it takes 12 cups of coffee.
How GridSearch Works
Parameter Grid:
n_estimators: [100, 200, 300]
max_depth: [5, 10, 15, 20]
min_samples_split: [2, 5]
Total combinations: 3 × 4 × 2 = 24
With 5-fold CV: 24 × 5 = 120 model fits
GridSearch fits ALL 120, records every score, and returns the best combination.
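The counting above can be reproduced with scikit-learn's ParameterGrid, which enumerates the same combinations GridSearchCV would try. A small sketch; param_grid_demo simply repeats the grid from the example.

```python
from sklearn.model_selection import ParameterGrid

param_grid_demo = {
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 10, 15, 20],
    "min_samples_split": [2, 5],
}

combos = list(ParameterGrid(param_grid_demo))  # every combination as a dict
n_folds = 5
n_fits = len(combos) * n_folds

print(f"{len(combos)} combinations x {n_folds} folds = {n_fits} fits")
print("First combination:", combos[0])
```

Running this on your own grid before calling fit is a cheap way to find out how many model fits you are about to pay for.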
GridSearchCV in Python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
# Define the model
rf = RandomForestClassifier(random_state=42)
# Define the parameter grid
param_grid = {
'n_estimators': [100, 200, 300, 500],
'max_depth': [5, 10, 15, 20, None],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4],
'max_features': ['sqrt', 'log2']
}
# Total: 4 × 5 × 3 × 3 × 2 = 360 combinations × 5 folds = 1,800 fits!
# Run GridSearch
grid_search = GridSearchCV(
estimator=rf,
param_grid=param_grid,
cv=5, # 5-fold cross-validation
scoring='accuracy', # metric to optimize
n_jobs=-1, # use all CPU cores
verbose=2, # show progress
return_train_score=True # also record training scores
)
grid_search.fit(X_train, y_train)
# Best results
print(f"Best Score: {grid_search.best_score_:.4f}")
print(f"Best Params: {grid_search.best_params_}")
# Use the best model directly
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print(f"Test Score: {test_score:.4f}")
# View all results as a DataFrame
import pandas as pd
results = pd.DataFrame(grid_search.cv_results_)
print(results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
.sort_values('rank_test_score').head(10))
GridSearch Limitations
- Computationally explosive — 5 parameters with 5 values each = 5⁵ = 3,125 combinations × 5 folds = 15,625 fits
- Wasteful — many combinations are obviously bad but still evaluated
- Discrete grid — the optimal value might be between your grid points (e.g., best learning_rate is 0.07 but you only tried 0.01, 0.05, 0.1)
When to use GridSearch: Small parameter spaces (<100 combinations), important final tuning, when compute time is not a concern.
RandomizedSearchCV (Random Sampling)
Instead of trying EVERY combination, RandomizedSearch samples a fixed number of random combinations from the parameter space. You control how many combinations to try with n_iter.
Real-life analogy: Instead of tasting every dish at a buffet (GridSearch), you randomly pick 20 dishes (RandomizedSearch). You are unlikely to find the absolute best dish, but you will find a great one — in a fraction of the time.
How RandomizedSearch Works
Parameter Space (same as GridSearch):
n_estimators: [100, 200, 300, 500]
max_depth: [5, 10, 15, 20, None]
min_samples_split: [2, 5, 10]
GridSearch: tries ALL 4 × 5 × 3 = 60 combinations
RandomizedSearch (n_iter=20): tries 20 RANDOM combinations
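Key advantage: you can use CONTINUOUS distributions instead of fixed lists:
learning_rate: uniform(0.001, 0.299) → samples any value in [0.001, 0.3] (scipy's uniform takes loc and scale, not low and high)
max_depth: randint(3, 31) → samples any integer from 3 to 30 (the upper bound is exclusive)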
This explores the space more efficiently than a fixed grid.
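To see what sampling from distributions looks like, the sketch below uses scikit-learn's ParameterSampler, the helper that draws candidates in the same way RandomizedSearchCV does. The search space here is made up for illustration. Note that scipy's uniform takes loc and scale, and randint excludes its upper bound.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterSampler

space = {
    "learning_rate": uniform(loc=0.001, scale=0.299),  # continuous, 0.001 to 0.3
    "max_depth": randint(3, 31),                        # integers 3 to 30 inclusive
    "max_features": ["sqrt", "log2"],                   # lists are sampled uniformly
}

# Draw 5 random candidates; random_state makes the draw repeatable
samples = list(ParameterSampler(space, n_iter=5, random_state=42))
for candidate in samples:
    print(candidate)
```

Each candidate gets a fresh value for every parameter, so 5 trials test 5 different learning rates, whereas a 5-point grid over three parameters would test far fewer distinct values per parameter for the same cost.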
RandomizedSearchCV in Python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
# Define distributions (not just lists!)
param_distributions = {
    'n_estimators': randint(100, 1001),      # any integer 100-1000 (upper bound is exclusive)
    'max_depth': randint(3, 31),             # any integer 3-30
    'min_samples_split': randint(2, 21),     # any integer 2-20
    'min_samples_leaf': randint(1, 11),      # any integer 1-10
    'max_features': ['sqrt', 'log2', None],  # categorical
    'bootstrap': [True, False]               # boolean
}
# Run RandomizedSearch
random_search = RandomizedSearchCV(
estimator=RandomForestClassifier(random_state=42),
param_distributions=param_distributions,
n_iter=100, # try 100 random combinations
cv=5,
scoring='accuracy',
n_jobs=-1,
verbose=1,
random_state=42,
return_train_score=True
)
random_search.fit(X_train, y_train)
# Results
print(f"Best Score: {random_search.best_score_:.4f}")
print(f"Best Params: {random_search.best_params_}")
print(f"Test Score: {random_search.best_estimator_.score(X_test, y_test):.4f}")
GridSearch vs RandomizedSearch
| Feature | GridSearchCV | RandomizedSearchCV |
|---|---|---|
| Search strategy | Every combination | Random sample of combinations |
| Compute cost | Multiplies with every parameter added (exponential in the number of parameters) | Fixed (you set n_iter) |
| Finds the best? | Guaranteed (within the grid) | Not guaranteed but usually close |
| Continuous params? | No (discrete grid only) | Yes (sample from distributions) |
| When to use | Small grids, final fine-tuning | Large spaces, first-pass exploration |
| Typical n_iter | All (no control) | 50-200 |
Best practice: Use RandomizedSearch first (broad exploration, n_iter=100), then GridSearch (narrow fine-tuning around the best region found).
Bayesian Optimization with Optuna
GridSearch and RandomizedSearch are “uninformed” — each trial is independent. Bayesian optimization is informed — it learns from previous trials. If trial #5 showed that max_depth=10 with learning_rate=0.05 scored well, trial #6 explores nearby values rather than jumping to a random corner of the search space.
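Optuna is a widely used hyperparameter optimization library for Python. Its default sampler, TPE (Tree-structured Parzen Estimator), is a Bayesian-style method, and for large or complex hyperparameter spaces it usually reaches a good configuration in fewer trials than GridSearch or RandomizedSearch.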
Real-life analogy: GridSearch is like searching for a restaurant by visiting every block in the city. RandomizedSearch is like picking random blocks. Optuna is like asking locals: “The Italian place on 5th was great” → “Try the one on 6th, it is similar but better.” Each trial is informed by what worked before.
How Bayesian Optimization Works
1. Try a few random combinations (exploration phase)
2. Build a probability model of "which hyperparameters → which scores"
3. Use the model to predict the MOST PROMISING next combination
4. Try that combination, observe the score
5. Update the probability model
6. Repeat steps 3-5 for N trials
The model balances:
EXPLORATION: try unexplored regions (maybe something better is out there)
EXPLOITATION: try near the current best (refine what is already working)
Result: often finds near-optimal hyperparameters in 50-100 trials where an exhaustive grid could need 1000+
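The loop above can be sketched in a few lines. To keep it self-contained, this example swaps in a Gaussian process surrogate with an upper-confidence-bound rule from scikit-learn, which is a different technique from Optuna's default TPE sampler, so treat it as an illustration of the explore/exploit loop rather than of Optuna's internals. The cv_score function is a hypothetical stand-in for an expensive cross-validation run over a single hyperparameter x.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def cv_score(x):
    """Hypothetical stand-in for a cross-validation score; peaks at x = 2.0."""
    return -(x - 2.0) ** 2

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 5.0, 501).reshape(-1, 1)  # the search space

# Step 1: a few random trials
X_obs = rng.uniform(0.0, 5.0, size=(3, 1))
y_obs = cv_score(X_obs).ravel()

for _ in range(15):
    # Steps 2 and 5: (re)fit the probability model on all trials so far
    surrogate = GaussianProcessRegressor(
        kernel=Matern(length_scale=1.0, nu=2.5),
        optimizer=None,   # keep the kernel fixed so the sketch stays simple
        alpha=1e-4,       # small jitter for numerical stability
        normalize_y=True,
    )
    surrogate.fit(X_obs, y_obs)

    # Step 3: predicted score (exploitation) plus an uncertainty bonus (exploration)
    mean, std = surrogate.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(mean + 2.0 * std)].reshape(1, 1)

    # Step 4: evaluate the suggested value and record the result
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, cv_score(x_next).ravel())

best_x = float(X_obs[np.argmax(y_obs), 0])
print(f"Best x after {len(y_obs)} trials: {best_x:.2f}")
```

The weight on std (2.0 here, an arbitrary choice) is the exploration/exploitation dial: a larger value sends more trials to regions the model knows little about.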
Optuna in Python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Define the objective function
def objective(trial):
    # Optuna suggests values from defined ranges
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 30),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2']),
        'bootstrap': trial.suggest_categorical('bootstrap', [True, False]),
    }
    model = RandomForestClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    return scores.mean()
# Create study and optimize
study = optuna.create_study(direction='maximize') # maximize accuracy
study.optimize(objective, n_trials=100, show_progress_bar=True)
# Results
print(f"Best Score: {study.best_value:.4f}")
print(f"Best Params: {study.best_params}")
print(f"Trials completed: {len(study.trials)}")
# Train final model with best params
best_model = RandomForestClassifier(**study.best_params, random_state=42)
best_model.fit(X_train, y_train)
print(f"Test Score: {best_model.score(X_test, y_test):.4f}")
Optuna for XGBoost
import xgboost as xgb
def xgb_objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 12),
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'gamma': trial.suggest_float('gamma', 0.0, 5.0),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
    }
    # use_label_encoder is omitted: it is deprecated in recent XGBoost releases
    model = xgb.XGBClassifier(**params, eval_metric='logloss', random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='f1')
    return scores.mean()
study = optuna.create_study(direction='maximize')
study.optimize(xgb_objective, n_trials=100)
Optuna Visualization
# Optuna has built-in visualization
from optuna.visualization import (
plot_optimization_history,
plot_param_importances,
plot_parallel_coordinate,
plot_slice
)
# 1. How the score improved over trials
fig = plot_optimization_history(study)
fig.show()
# 2. Which hyperparameters matter most
fig = plot_param_importances(study)
fig.show()
# 3. Parallel coordinate plot (relationships between params)
fig = plot_parallel_coordinate(study)
fig.show()
# 4. Slice plot (each parameter vs score)
fig = plot_slice(study)
fig.show()
# The param_importances plot is gold for interviews:
# "learning_rate and max_depth had the highest impact on model performance.
# min_child_weight barely mattered."
Key Hyperparameters by Model
Random Forest Hyperparameters
| Hyperparameter | What It Controls | Default | Typical Range |
|---|---|---|---|
| n_estimators | Number of trees | 100 | 100-1000 |
| max_depth | Maximum tree depth | None (unlimited) | 3-30 |
| min_samples_split | Min samples to split a node | 2 | 2-20 |
| min_samples_leaf | Min samples in a leaf node | 1 | 1-10 |
| max_features | Features considered per split | sqrt | sqrt, log2, None |
| bootstrap | Sample with replacement? | True | True, False |
XGBoost Hyperparameters
| Hyperparameter | What It Controls | Default | Typical Range |
|---|---|---|---|
| learning_rate (eta) | Step size per tree (lower = more trees needed) | 0.3 | 0.001-0.3 |
| n_estimators | Number of boosting rounds | 100 | 100-1000 |
| max_depth | Maximum tree depth | 6 | 3-12 |
| subsample | Row sampling ratio per tree | 1.0 | 0.5-1.0 |
| colsample_bytree | Column sampling ratio per tree | 1.0 | 0.5-1.0 |
| min_child_weight | Min sum of instance weight in a leaf | 1 | 1-10 |
| gamma | Min loss reduction to make a split | 0 | 0-5 |
| reg_alpha (L1) | L1 regularization | 0 | 1e-8 to 10 |
| reg_lambda (L2) | L2 regularization | 1 | 1e-8 to 10 |
Logistic Regression Hyperparameters
| Hyperparameter | What It Controls | Default | Typical Range |
|---|---|---|---|
| C | Inverse regularization strength (lower = stronger) | 1.0 | 0.001-100 |
| penalty | Regularization type | l2 | l1, l2, elasticnet |
| solver | Optimization algorithm | lbfgs | lbfgs, liblinear, saga |
| max_iter | Maximum iterations | 100 | 100-1000 |
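Because the useful range of C spans several orders of magnitude (0.001 to 100 in the table), it is usually searched on a log scale rather than in even steps. A minimal sketch, assuming a synthetic dataset from make_classification; the names X_lr, y_lr, and c_grid are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data (an assumption for illustration only)
X_lr, y_lr = make_classification(n_samples=300, n_features=10, random_state=0)

# Six values evenly spaced in log10: 0.001, 0.01, 0.1, 1, 10, 100
c_grid = np.logspace(-3, 2, 6)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),  # raised from the default 100 to help convergence
    param_grid={"C": c_grid},
    cv=5,
    scoring="accuracy",
)
search.fit(X_lr, y_lr)

print("C values tried:", c_grid.tolist())
print("Best C:", search.best_params_["C"])
print(f"Best CV accuracy: {search.best_score_:.4f}")
```

A linear grid such as [0.001, 20, 40, 60, 80, 100] would spend five of its six points above C=1 and barely probe the strong-regularization end.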
Practical Tuning Workflow
Step 1: BASELINE
Train model with default hyperparameters.
Record baseline cross-validation score.
This is your "bar to beat."
Step 2: COARSE SEARCH (RandomizedSearch with n_iter=50-100, or Optuna with n_trials=50-100)
Define wide parameter ranges.
Find the general "good region" of hyperparameter space.
Step 3: FINE SEARCH (GridSearch, narrow grid around Step 2 best)
Narrow ranges to ±20% of Step 2 best values.
Example: Step 2 found max_depth=12 → Grid: [10, 11, 12, 13, 14]
Step 4: EVALUATE ON HELD-OUT TEST SET
Train final model with best params on ALL training data.
Evaluate ONCE on the test set.
This is your final reported score.
Step 5: CHECK FOR OVERFITTING
Compare train score vs CV score.
If train=0.99, CV=0.85 → overfitting → increase regularization.
If train=0.80, CV=0.78 → healthy gap.
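# Complete tuning workflow in code
# (reuses the objective() function defined in the Optuna section)
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split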
# Hold out a final test set FIRST (never used during tuning)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Step 1: Baseline
baseline = RandomForestClassifier(random_state=42)
baseline_scores = cross_val_score(baseline, X_train, y_train, cv=5)
print(f"Baseline: {baseline_scores.mean():.4f}")
# Step 2: Coarse search with Optuna
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
print(f"Optuna best: {study.best_value:.4f}")
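# Step 3: Fine-tune with GridSearch around Optuna best
best = study.best_params
fine_grid = {
    'n_estimators': [best['n_estimators'] - 50, best['n_estimators'], best['n_estimators'] + 50],
    'max_depth': [best['max_depth'] - 1, best['max_depth'], best['max_depth'] + 1],
    # sorted(set(...)) drops the duplicate that appears when the best value is already 2
    'min_samples_split': sorted({max(2, best['min_samples_split'] - 1),
                                 best['min_samples_split'],
                                 best['min_samples_split'] + 1}),
}
# Keep the other Optuna-tuned values fixed so they do not fall back to defaults
fixed_params = {k: v for k, v in best.items() if k not in fine_grid}
grid = GridSearchCV(RandomForestClassifier(**fixed_params, random_state=42),
                    fine_grid, cv=5, n_jobs=-1)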
grid.fit(X_train, y_train)
print(f"Fine-tuned: {grid.best_score_:.4f}")
# Step 4: Final evaluation
final_model = grid.best_estimator_
print(f"Test Score: {final_model.score(X_test, y_test):.4f}")
Overfitting vs Underfitting During Tuning
| Signal | Problem | Fix |
|---|---|---|
| Train=0.99, CV=0.75 | Overfitting | Reduce max_depth, increase min_samples_split, add regularization |
| Train=0.70, CV=0.68 | Underfitting | Increase n_estimators, increase max_depth, add more features |
| Train=0.88, CV=0.85 | Good fit | No action — healthy 3% gap |
| CV varies widely (0.60-0.90) | High variance | More data, simpler model, or Repeated K-Fold for a steadier estimate |
# Check for overfitting: compare train vs CV scores
from sklearn.model_selection import cross_validate
results = cross_validate(best_model, X_train, y_train, cv=5,
scoring='accuracy', return_train_score=True)
print(f"Train: {results['train_score'].mean():.4f}")
print(f"CV: {results['test_score'].mean():.4f}")
print(f"Gap: {results['train_score'].mean() - results['test_score'].mean():.4f}")
# Gap > 0.10 = likely overfitting
# Gap < 0.03 = good generalization
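To see how the gap moves as a single hyperparameter changes, scikit-learn's validation_curve returns train and CV scores for each value in one call. This is a sketch on synthetic noisy data with a single decision tree (chosen for speed, not the Random Forest used above), so the exact numbers are not meaningful; the trend is.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (an assumption for illustration only)
X_vc, y_vc = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0
)

depths = [1, 2, 4, 8, 16]
train_scores, cv_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X_vc, y_vc,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="accuracy",
)

# Each row holds one max_depth value; each column holds one CV fold
train_mean = train_scores.mean(axis=1)
cv_mean = cv_scores.mean(axis=1)
gaps = train_mean - cv_mean

for depth, tr, cv_, gap in zip(depths, train_mean, cv_mean, gaps):
    print(f"max_depth={depth:>2}  train={tr:.3f}  cv={cv_:.3f}  gap={gap:.3f}")
```

A gap that keeps widening while the CV score stalls or drops is the signal to stop increasing model complexity.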
Common Mistakes
- Tuning on the test set: if you use the test set to compare hyperparameter combinations, you are leaking test information into training. Use cross-validation on the training set for tuning. Evaluate on the test set ONCE at the very end.
- Starting with GridSearch on a large space: a grid with 6 parameters of 5 values each has 5⁶ = 15,625 combinations, and with 5 folds that is 78,125 fits. Start with RandomizedSearch or Optuna to narrow the space first, then fine-tune with a small GridSearch.
- Ignoring cross-validation variance: a model with CV scores [0.60, 0.90, 0.70, 0.85, 0.65] (mean=0.74, std=0.12) is unreliable despite a decent mean. High variance means the model is inconsistent. Look at both mean AND standard deviation.
- Using regular K-Fold for imbalanced data: if 5% of your data is the positive class, a fold might have 0% positives. Use Stratified K-Fold for classification to preserve class ratios in each fold.
- Not setting random_state: without it, results change every run and you cannot reproduce your best model. Always set random_state=42 (or any fixed number) in the model, CV splits, and train_test_split.
- Tuning hyperparameters before fixing data quality: no amount of tuning compensates for missing values, leaky features, or incorrect labels. Clean and feature-engineer your data FIRST, then tune.
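The mean and standard deviation quoted in the cross-validation variance item above can be checked in a couple of lines. The array scores_demo simply repeats the five fold scores from that example.

```python
import numpy as np

# The five fold scores from the cross-validation variance example
scores_demo = np.array([0.60, 0.90, 0.70, 0.85, 0.65])

mean_score = scores_demo.mean()
std_score = scores_demo.std()  # population std, the same call used on cross_val_score output

print(f"Mean: {mean_score:.2f}, Std: {std_score:.2f}")
print(f"Range: {scores_demo.min():.2f} to {scores_demo.max():.2f}")
```

A spread this wide (0.60 to 0.90 across folds) means a single train/test split could have reported almost any number in that range.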
Interview Questions
Q: What is the difference between GridSearchCV and RandomizedSearchCV? A: GridSearchCV tries every combination in the parameter grid (exhaustive but slow). RandomizedSearchCV samples a fixed number of random combinations (faster, supports continuous distributions). Use RandomizedSearch for initial exploration and GridSearch for final fine-tuning around the best region.
Q: What is cross-validation and why is it necessary for hyperparameter tuning? A: Cross-validation splits training data into K folds, trains on K-1 folds, and evaluates on the remaining fold — repeating K times. It gives a more reliable performance estimate than a single train/test split. Without it, you might overfit your hyperparameters to one specific data split.
Q: How does Bayesian optimization differ from grid/random search? A: GridSearch and RandomizedSearch try combinations independently — each trial ignores results of previous trials. Bayesian optimization (Optuna) learns from previous trials, building a probability model of which hyperparameter regions produce good scores. It intelligently focuses on promising regions, finding near-optimal values in far fewer trials.
Q: When would you use Stratified K-Fold instead of regular K-Fold? A: For classification tasks with imbalanced classes. Stratified K-Fold preserves the class distribution in each fold. If 5% of data is fraud, each fold has approximately 5% fraud. Regular K-Fold might create a fold with 0% fraud, giving a misleading evaluation score.
Q: How do you know if your model is overfitting during tuning? A: Compare the training score with the cross-validation score. If training accuracy is 99% but CV accuracy is 80%, the 19-point gap indicates overfitting. Fix it by constraining the model (lower max_depth, higher min_samples_split) or adding more training data. A healthy gap is typically under 5 points.
Q: Describe a practical hyperparameter tuning workflow. A: Step 1: Establish a baseline with default hyperparameters. Step 2: Coarse search using RandomizedSearch (n_iter=100) or Optuna (n_trials=100) with wide parameter ranges. Step 3: Fine-tune using GridSearch with narrow ranges around the best values found. Step 4: Evaluate the final model ONCE on the held-out test set. Step 5: Check for overfitting by comparing train vs CV scores.
Wrapping Up
Hyperparameter tuning is where models go from “good enough” to “production-grade.” On some datasets the jump from default to tuned hyperparameters is worth 5-15 percentage points of accuracy, without any new data or features.
Start with RandomizedSearch or Optuna to explore the space broadly. Fine-tune with GridSearch. Always use cross-validation. And remember: the best hyperparameters in the world cannot fix bad data — clean and feature-engineer first, then tune.
Optuna is the modern choice for serious tuning — it is smarter (Bayesian), faster (pruning bad trials), and gives you built-in visualization of what matters most. If you are still using only GridSearch, try Optuna on your next project. You will never go back.
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.