Drive Dutch
Model Evaluation

Toyota Aygo Price Prediction: Ridge Regression

DriveDutch
August 6, 2025

Data Sources

Marktplaats.nl Toyota Aygo listings (July 2025). Nested CV for Ridge Regression using 50 ShuffleSplits per (polynomial degree, α) combination on the outer‑train set (n=245). The final model (degree=3, α=0.01) was evaluated on the outer‑test set (n=62).

Executive Summary

This report analyzes Toyota Aygo pricing using Ridge Regression with a 3rd‑degree polynomial expansion of age and mileage as predictors. After nested cross‑validation (an 80/20 outer hold‑out, with 50‑split ShuffleSplit model selection on the outer‑train set), the selected model with degree=3 and α=0.01 achieved R² = 0.9424 (adjusted R² = 0.9324) on the final holdout set.

🔗 Source: Active listings on Marktplaats.nl (July 2025)

What the Plot Shows

The Predicted vs Actual scatterplot for the Ridge model shows close alignment with the ideal line (predicted = actual). The spread widens somewhat at the top of the price range, but most predictions stay close to the line across the full range.

Data & Splits

  • Total sample: 307 cars
  • Outer‑train: 245 (80%)
  • Outer‑test: 62 (20%, never used during tuning)
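The split sizes are consistent with rounding the 20% test share up: 0.2 × 307 = 61.4, rounded up to 62, which leaves 245 for training (scikit‑learn's `train_test_split` rounds a fractional `test_size` up the same way). The report does not say which tool or random seed produced its split, so the sketch below is a minimal standard‑library stand‑in; `holdout_split` and its `seed` are hypothetical.

```python
import math
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle row indices, then cut off a rounded-up test share."""
    n_test = math.ceil(test_fraction * n_samples)  # 0.2 * 307 = 61.4 -> 62
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    return indices[n_test:], indices[:n_test]  # (outer-train, outer-test)


train_idx, test_idx = holdout_split(307)
print(len(train_idx), len(test_idx))  # 245 62
```

The outer‑test indices are set aside once and never passed to any tuning code.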

Method (Nested CV)

  1. Outer split: Hold out 20% of the data for a final evaluation.
  2. Inner model selection (on outer‑train):
    • For each polynomial degree (1 to 3), test a range of α (regularization strength) values.
    • For each pair (degree, α), perform 50× ShuffleSplit CV on the outer‑train.
    • Record mean ± std R² scores from inner validation sets.
  3. Select best degree & α: Pick the degree/α with the highest inner‑CV mean R².
  4. Final fit & test: Train a new Ridge model on all outer‑train data using best degree and α, then evaluate on the outer‑test set.
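The four steps can be sketched in plain Python (standard library only). This is not the authors' code. The α grid, the 20% inner validation fraction, z‑scoring age and mileage with training‑fold statistics before the polynomial expansion, and every function name are assumptions, and the rows are synthetic stand‑ins for the Marktplaats listings. In practice scikit‑learn's `PolynomialFeatures`, `Ridge` and `ShuffleSplit` would usually do this job.

```python
import math
import random
import statistics

def poly_features(a, m, degree):
    """Monomials a**i * m**j with 1 <= i + j <= degree, in PolynomialFeatures order (no bias)."""
    return [a ** (t - j) * m ** j for t in range(1, degree + 1) for j in range(t + 1)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ridge(X, y, alpha):
    """Ridge with an unpenalized intercept: center X and y, solve (XᵀX + αI)β = Xᵀy."""
    p = len(X[0])
    xm = [statistics.fmean(col) for col in zip(*X)]
    ym = statistics.fmean(y)
    Xc = [[v - mu for v, mu in zip(row, xm)] for row in X]
    yc = [v - ym for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(A, b)
    return ym - sum(mu * c for mu, c in zip(xm, beta)), beta

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_and_score(train, valid, degree, alpha):
    """z-score age/mileage with train stats (assumption), expand, fit ridge, return valid R²."""
    (ma, sa), (mk, sk) = [(statistics.fmean(c), statistics.pstdev(c))
                          for c in ([r[0] for r in train], [r[1] for r in train])]
    def design(rows):
        return [poly_features((a - ma) / sa, (k - mk) / sk, degree) for a, k, _ in rows]
    b0, beta = fit_ridge(design(train), [r[2] for r in train], alpha)
    preds = [b0 + sum(c * f for c, f in zip(beta, x)) for x in design(valid)]
    return r2([r[2] for r in valid], preds)

def shuffle_splits(n, n_splits, valid_fraction, rng):
    """Yield (train_idx, valid_idx) pairs from independent random permutations."""
    n_valid = math.ceil(valid_fraction * n)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_valid:], idx[:n_valid]

def select_model(rows, degrees=(1, 2, 3), alphas=(0.01, 0.1, 1.0, 10.0),
                 n_splits=50, valid_fraction=0.2, seed=0):
    """Steps 2-3: score each (degree, α) on the same splits; keep the highest mean R²."""
    splits = list(shuffle_splits(len(rows), n_splits, valid_fraction, random.Random(seed)))
    results = {}
    for d in degrees:
        for a in alphas:
            scores = [fit_and_score([rows[i] for i in tr], [rows[i] for i in va], d, a)
                      for tr, va in splits]
            results[(d, a)] = (statistics.fmean(scores), statistics.stdev(scores))
    return max(results, key=lambda k: results[k][0]), results

# SYNTHETIC (age_years, mileage_km, price_eur) rows, not the Marktplaats data.
rng = random.Random(42)
rows = []
for _ in range(150):
    age = rng.uniform(0, 15)
    km = max(0.0, age * 12000 + rng.gauss(0, 15000))
    rows.append((age, km, 14000 - 800 * age - 0.02 * km + 20 * age ** 2 + rng.gauss(0, 600)))

# Step 1: one 80/20 outer hold-out.
train_idx, test_idx = next(shuffle_splits(len(rows), 1, 0.2, random.Random(0)))
outer_train = [rows[i] for i in train_idx]
outer_test = [rows[i] for i in test_idx]

# Steps 2-4: select on outer-train only, refit on all of it, score once on outer-test.
(best_d, best_a), results = select_model(outer_train)
test_r2 = fit_and_score(outer_train, outer_test, best_d, best_a)
print(best_d, best_a, round(test_r2, 3))
```

Scoring every (degree, α) pair on the same 50 splits makes the comparison paired, so differences in mean R² reflect the models rather than different random splits.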

Inner‑CV Summary (best α per degree)

Degree 1 → best α=1.00 → mean R² = 0.8791 ± 0.0206
Degree 2 → best α=0.10 → mean R² = 0.9152 ± 0.0167
Degree 3 → best α=0.01 → mean R² = 0.9209 ± 0.0150 ← SELECTED
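The selection rule in step 3 can be reproduced from the reported numbers. The sketch also measures how far ahead degree 3 is. Purely as an illustration of the simplification suggested in the bootstrap section, it then applies a parsimony heuristic that is not the report's rule: take the lowest degree whose mean R² lies within one standard deviation of the best mean. The variable names are hypothetical.

```python
# Inner-CV results as reported: degree -> (best alpha, mean R², std R²)
inner_cv = {
    1: (1.00, 0.8791, 0.0206),
    2: (0.10, 0.9152, 0.0167),
    3: (0.01, 0.9209, 0.0150),
}

# Rule used in the report: highest inner-CV mean R².
best_degree = max(inner_cv, key=lambda d: inner_cv[d][1])

# Gap between the winner and the runner-up, next to the winner's spread.
ranked = sorted(inner_cv, key=lambda d: inner_cv[d][1], reverse=True)
gap = inner_cv[ranked[0]][1] - inner_cv[ranked[1]][1]
best_std = inner_cv[best_degree][2]

# Illustrative parsimony heuristic (NOT the report's rule): lowest degree whose
# mean R² is within one standard deviation of the best mean.
threshold = inner_cv[best_degree][1] - best_std
simplest_close = min(d for d in inner_cv if inner_cv[d][1] >= threshold)

print(best_degree, round(gap, 4), best_std, simplest_close)  # 3 0.0057 0.015 2
```

The argmax rule picks degree 3, while the more conservative heuristic would settle on degree 2, because the 0.0057 gap is well inside the 0.0150 spread.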

Final Model Details (degree 3, α=0.01)

Intercept: 4853.91

Coefficients:
  age                   =   +329.63
  mileage_km            = −5625.47
  age²                  = −10366.99
  age × mileage_km      =  +5618.22
  mileage_km²           =  +3365.05
  age³                  =  +5898.32
  age² × mileage_km     =  +1367.17
  age × mileage_km²     = −4863.16
  mileage_km³           =   +617.56
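The nine terms above appear to follow the order that scikit‑learn's `PolynomialFeatures` produces for the inputs (age, mileage_km), so a prediction is the intercept plus each coefficient times its term. The report does not state how age and mileage were scaled before the expansion. Cubic coefficients in the thousands of euros suggest scaled (for example standardized) inputs, so this sketch assumes inputs already on that unknown scale and cannot be fed raw years or kilometres. `predict` and `age_slope` are hypothetical helpers.

```python
# Coefficients as listed above, in the same term order.
INTERCEPT = 4853.91
COEFS = [329.63, -5625.47, -10366.99, 5618.22, 3365.05,
         5898.32, 1367.17, -4863.16, 617.56]


def expand(a, m):
    """Degree-3 terms in the listed order: a, m, a², a·m, m², a³, a²·m, a·m², m³."""
    return [a, m, a * a, a * m, m * m, a ** 3, a * a * m, a * m * m, m ** 3]


def predict(a, m):
    """Price for scaled age `a` and scaled mileage `m` (scaling is an assumption)."""
    return INTERCEPT + sum(c * f for c, f in zip(COEFS, expand(a, m)))


def age_slope(a, m):
    """Partial derivative of predicted price with respect to scaled age at (a, m)."""
    c = COEFS
    return c[0] + 2 * c[2] * a + c[3] * m + 3 * c[5] * a * a + 2 * c[6] * a * m + c[7] * m * m


print(predict(0.0, 0.0))  # 4853.91: with both inputs at zero only the intercept remains
```

Because of the interaction and higher‑order terms, the effect of age is not a single coefficient: `age_slope` changes with both inputs and equals the listed age coefficient (+329.63) only at a = m = 0.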

Coefficient Significance: Bootstrap Results for Ridge Regression on Toyota Aygo

  • Standard p‑values cannot be used to judge the significance of coefficients in a Ridge Regression, because the penalty deliberately biases the estimates, but bootstrapping gives an estimate of how precise each coefficient is. As seen in the plot above, the confidence intervals for age × mileage_km², mileage_km², mileage_km³, and age include 0, so these terms may have no real effect and are likely insignificant.
  • Because a bootstrap confidence interval shows how stable a basis feature's effect is across resampled datasets, the wide intervals for all the terms indicate that the model cannot pin down a consistent magnitude for any of them. Future models could consider a lower polynomial degree: degree 3 adds four terms over degree 2 for a gain of only about 0.006 in inner‑CV mean R² (0.9209 vs. 0.9152), which is smaller than one standard deviation of either score.
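A percentile bootstrap resamples rows with replacement, refits the model on each resample, and reads the interval off the 2.5th and 97.5th percentiles of the refitted coefficient. To keep the sketch short it refits a one‑predictor ridge slope in closed form rather than the full nine‑term model, and it runs on synthetic data. The report does not say how many resamples or which interval type it used, so `n_boot=1000`, the percentile method, and the function names are assumptions.

```python
import random
import statistics


def ridge_slope(xs, ys, alpha):
    """Closed-form ridge slope for one centered predictor (intercept left unpenalized)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (sxx + alpha)


def bootstrap_ci(xs, ys, alpha, n_boot=1000, seed=0):
    """Hypothetical helper: 95% percentile bootstrap CI, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        slopes.append(ridge_slope([xs[i] for i in idx], [ys[i] for i in idx], alpha))
    cuts = statistics.quantiles(slopes, n=40)  # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles


# SYNTHETIC data: one response with a real slope of 3, one with no relationship.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys_signal = [3.0 * x + rng.gauss(0, 1) for x in xs]
ys_noise = [rng.gauss(0, 1) for _ in xs]

intervals = {}
for name, ys in (("signal", ys_signal), ("noise", ys_noise)):
    lo, hi = bootstrap_ci(xs, ys, alpha=0.01)
    intervals[name] = (lo, hi)
    print(f"{name}: [{lo:.2f}, {hi:.2f}]", "includes 0" if lo <= 0 <= hi else "excludes 0")
```

An interval that includes 0 corresponds to the "possibly 0" reading in the first bullet, and a wide interval to the instability described in the second.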

Final Test Performance (outer‑test n = 62)

  • R² = 0.9424
  • Adjusted R² = 0.9324
    The test R² is in line with the degree‑3 inner‑CV mean of 0.9209, which indicates that the model generalizes well to unseen listings rather than overfitting the outer‑train data.
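The adjusted R² follows from the test R² using n = 62 test listings and, assuming the predictor count is the nine polynomial terms listed under Final Model Details, p = 9. The arithmetic below reproduces the reported 0.9324; `adjusted_r2` is a hypothetical helper.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1) for n observations, p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Outer-test set: n = 62 listings, p = 9 polynomial terms (assumed predictor count).
print(round(adjusted_r2(0.9424, n=62, p=9), 4))  # 0.9324
```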

Takeaways

  • Polynomial regression (deg‑3) with Ridge regularization (α=0.01) strikes a strong balance between fit and generalization.
  • Unlike Random Forests, Ridge provides interpretable coefficients showing how price varies with combinations of age and mileage.
  • The model explains 94% of price variance on unseen listings.

Disclaimer

These results reflect the available sample (n = 307 total) and the specified features (age, mileage). Real‑world transaction prices can vary with condition, trim, options, and market dynamics not included in this model.


Compare with a tree-based approach on the same dataset:
Random Forest Regression

About This Research

This report is part of our ongoing analysis of the Dutch automotive market. Our research combines multiple data sources to provide comprehensive insights for industry professionals and market participants.
