Used Automobile Price Prediction: a Machine Learning coursework project
focused on automobile data preprocessing
and regression-based price estimation using real-world vehicle listing datasets.
The system processes noisy automobile data
to predict vehicle prices based on specifications,
fuel information, mileage, emissions, transmission,
and registration details.
Processed and cleaned large-scale automobile datasets,
including handling missing values, inconsistent formats,
and categorical features.
Performed feature engineering, including
vehicle age extraction, availability transformation,
fuel consumption parsing, and CO₂ emission processing.
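As a rough illustration of the parsing steps listed above, here is a minimal sketch using only the standard library. The raw field formats ("03/2015", "5,4 l/100 km (comb.)", "125 g/km"), the helper names `first_number` and `vehicle_age`, and the reference year are all assumptions made for the example, not the project's actual code or data.

```python
import re
from typing import Optional

# Matches the first integer or decimal number in a string.
# Assumption: a comma is a decimal separator (e.g. "5,4"), not a thousands separator.
_NUMBER = re.compile(r"(\d+(?:[.,]\d+)?)")


def first_number(text: Optional[str]) -> Optional[float]:
    """Return the first number found in a free-text field, or None if absent."""
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def vehicle_age(registration: Optional[str], reference_year: int) -> Optional[int]:
    """Derive age in years from a registration string that contains a 4-digit year."""
    if not registration:
        return None
    match = re.search(r"(19|20)\d{2}", registration)
    if match is None:
        return None
    return reference_year - int(match.group(0))


# Hypothetical raw listing row.
row = {
    "registration": "03/2015",
    "consumption": "5,4 l/100 km (comb.)",
    "co2": "125 g/km",
}

features = {
    "age": vehicle_age(row["registration"], reference_year=2023),
    "consumption_l_100km": first_number(row["consumption"]),
    "co2_g_km": first_number(row["co2"]),
}
```

Returning `None` for unparseable fields keeps missing values explicit, so they can be imputed or dropped in a later step rather than silently turned into zeros.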
Conducted exploratory data analysis (EDA)
and visualized distributions, outliers,
and feature relationships using statistical plots and boxplots.
Designed and trained a Random Forest Regression model
for predicting used vehicle prices based on engineered automobile features.
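The training step could look roughly like the sketch below, which uses scikit-learn's `RandomForestRegressor`. The data is synthetic and the feature set (age, mileage, CO₂), the price formula, and the hyperparameters are assumptions chosen only to make the example self-contained; they do not reflect the project's dataset or tuned settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for engineered features (assumed, not the real data).
rng = np.random.default_rng(0)
n = 500
age = rng.integers(0, 20, size=n)
mileage = rng.uniform(5_000, 250_000, size=n)
co2 = rng.uniform(90, 220, size=n)

# Made-up price relationship with noise, used only to have a target to fit.
price = 30_000 - 900 * age - 0.05 * mileage + 20 * co2 + rng.normal(0, 500, size=n)

X = np.column_stack([age, mileage, co2])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Hyperparameters here are illustrative defaults, not tuned values.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
```

Fixing `random_state` in both the split and the model keeps the run reproducible, which makes it easier to compare preprocessing variants against each other.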
Implemented preprocessing pipelines for categorical encoding,
numerical feature extraction, and handling unseen categories
between training and testing datasets.
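One common way to handle categories that appear only in the test data is to fit the encoding on the training set and map anything unknown to a sentinel value. The sketch below shows that idea in plain Python; the `UNKNOWN` sentinel, the helper names, and the fuel labels are hypothetical, and the project may have used a different strategy.

```python
from typing import Dict, Iterable, List

UNKNOWN = -1  # assumed sentinel code for categories not seen during training


def fit_category_codes(values: Iterable[str]) -> Dict[str, int]:
    """Assign an integer code to each distinct category, in order of first appearance."""
    codes: Dict[str, int] = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
    return codes


def encode(values: Iterable[str], codes: Dict[str, int]) -> List[int]:
    """Encode categories with the fitted codes; unseen ones map to UNKNOWN."""
    return [codes.get(value, UNKNOWN) for value in values]


# Hypothetical fuel-type columns: "Electric" never occurs in training.
train_fuel = ["Petrol", "Diesel", "Petrol", "Hybrid"]
test_fuel = ["Diesel", "Electric", "Petrol"]

codes = fit_category_codes(train_fuel)
train_encoded = encode(train_fuel, codes)
test_encoded = encode(test_fuel, codes)
```

Fitting the mapping on training data only avoids leaking test information into the encoder, and the sentinel lets the model still produce a prediction for rows with new categories.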
Evaluated model performance using Mean Absolute Error (MAE)
and optimized preprocessing workflows for improved prediction accuracy.
Achieved an average accuracy of 0.93 on the test dataset.
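For reference, MAE is the average of the absolute differences between actual and predicted prices, so it is expressed in the same unit as the price itself. A minimal sketch follows; the three price pairs are made-up numbers for illustration and are unrelated to the project's results.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only.
actual = [12000.0, 8500.0, 23000.0]
predicted = [11000.0, 9000.0, 25500.0]

# Absolute errors are 1000, 500, and 2500, so the MAE is 4000 / 3.
mae = mean_absolute_error(actual, predicted)
```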
Achieved 7th position in the Kaggle competition
with an MAE of 2137;
the best experimental MAE reached 1968.