
Used Automobile Price Prediction


Used Automobile Price Prediction is a project completed as coursework in Machine Learning. It focuses on automobile data preprocessing and regression-based price estimation using real-world vehicle listing datasets. The system processes noisy automobile data to predict vehicle prices from specifications, fuel information, mileage, emissions, transmission, and registration details.

Responsibilities

Processed and cleaned large-scale automobile datasets, including handling missing values, inconsistent formats, and categorical features.

Engineered features through steps such as vehicle age extraction, availability transformation, fuel consumption parsing, and CO₂ emission processing.

Conducted exploratory data analysis (EDA) and visualized distributions, outliers, and feature relationships using statistical plots and boxplots.

Designed and trained a Random Forest Regression model for predicting used vehicle prices based on engineered automobile features.

Implemented preprocessing pipelines for categorical encoding, numerical feature extraction, and handling unseen categories between training and testing datasets.

Evaluated model performance using Mean Absolute Error (MAE) and optimized preprocessing workflows for improved prediction accuracy. Achieved an average accuracy of 0.93 on the test dataset.

Placed 7th in the Kaggle competition with a Mean Absolute Error (MAE) of 2137; the best experimental MAE reached 1968.
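
The feature engineering steps listed above (fuel consumption parsing, CO₂ emission processing, vehicle age extraction) could look roughly like the sketch below. It is not the project's actual code: the raw string formats, the dictionary keys, the helper names `parse_leading_number` and `vehicle_age`, and the reference year are all assumptions made for illustration.

```python
import re
from typing import Optional

# Matches the first integer or decimal number; accepts ',' or '.' as the
# decimal separator (an assumed convention for the raw listing strings).
_NUMBER = re.compile(r"(\d+(?:[.,]\d+)?)")


def parse_leading_number(text: Optional[str]) -> Optional[float]:
    """Return the first number in a string such as '5,4 l/100 km', else None."""
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def vehicle_age(registration: Optional[str], reference_year: int) -> Optional[int]:
    """Derive age in years from a registration string containing a 4-digit year."""
    if not registration:
        return None
    match = re.search(r"(\d{4})", registration)
    if match is None:
        return None
    return reference_year - int(match.group(1))


# Hypothetical raw listing row; real column names and formats may differ.
row = {
    "fuel_consumption": "5,4 l/100 km",
    "co2_emissions": "124 g/km",
    "registration": "03/2018",
}

features = {
    "fuel_l_per_100km": parse_leading_number(row["fuel_consumption"]),
    "co2_g_per_km": parse_leading_number(row["co2_emissions"]),
    "age_years": vehicle_age(row["registration"], reference_year=2023),
}
print(features)
```

Returning `None` for unparseable values keeps missing data explicit, so it can be imputed or dropped in a later cleaning step rather than silently becoming zero.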
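
The modeling and evaluation steps listed above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the competition code: the column names, the price formula, and the hyperparameters are assumptions. It uses `OneHotEncoder(handle_unknown="ignore")` as one common way to cope with categories that appear only in the test data; the project may have handled them differently.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the vehicle listings (assumed columns and values).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "fuel": rng.choice(["petrol", "diesel", "hybrid"], n),
    "transmission": rng.choice(["manual", "automatic"], n),
    "age_years": rng.integers(0, 15, n),
    "mileage_km": rng.integers(5_000, 250_000, n),
})
# Made-up price relationship, used only to give the model something to learn.
df["price"] = (
    30_000
    - 1_200 * df["age_years"]
    - 0.05 * df["mileage_km"]
    + 2_000 * (df["transmission"] == "automatic")
    + rng.normal(0, 500, n)
)

categorical = ["fuel", "transmission"]
numeric = ["age_years", "mileage_km"]

# One-hot encode categoricals; unknown categories at predict time are
# encoded as all zeros instead of raising an error.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", "passthrough", numeric),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[categorical + numeric], df["price"], test_size=0.25, random_state=0
)
model.fit(X_train, y_train)

# MAE is the mean of |actual price - predicted price| over the test rows.
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: {mae:.0f}")

# A fuel type never seen during training still yields a prediction.
unseen = pd.DataFrame({
    "fuel": ["electric"],
    "transmission": ["manual"],
    "age_years": [3],
    "mileage_km": [40_000],
})
pred_unseen = model.predict(unseen[categorical + numeric])
print(pred_unseen)
```

Wrapping the encoder and the forest in a single `Pipeline` means the encoding learned from the training split is reused unchanged on the test split, which avoids mismatched columns between the two.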

Technologies & Domains

Python, Pandas, NumPy, Scikit-learn, Matplotlib, Random Forest Regression, Machine Learning, Feature Engineering, Data Preprocessing, Exploratory Data Analysis, Regression Modeling

The code was run on Kaggle as part of a competition, and a copy was uploaded to GitHub.