Product Price Prediction using Machine Learning

📌 Problem Statement (Amazon ML Challenge 2025)

In e-commerce marketplaces, determining the optimal price for a product is critical for business success and customer satisfaction. Pricing depends on multiple complex factors such as brand, specifications, textual description, product quantity, and visual features from product images.

The objective of this project is to build a machine learning model that can analyze product details holistically and accurately predict product prices.


🔍 Overview

This project focuses on predicting product prices for large-scale e-commerce listings using machine learning.
It leverages unstructured catalog text, extracted product attributes, and image-based features to learn complex pricing patterns from noisy real-world data.

Built as part of the Amazon ML Challenge, the solution is designed to scale efficiently and is optimized for SMAPE, the metric used to score predictions.


📊 Dataset Description

The dataset consists of both structured and unstructured product information.

Columns

  • sample_id – Unique identifier for each product
  • catalog_content – Concatenated product title, description, and Item Pack Quantity (IPQ)
  • image_link – Public URL of the product image
  • price – Product price (target variable, available only in training data)
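
As a rough illustration of this schema, the sketch below parses rows into a small typed record. It assumes the data is distributed as CSV with exactly the column names listed above; the `Product` class, the `read_products` helper, and the sample rows are hypothetical and not part of the repository.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Product:
    sample_id: str
    catalog_content: str
    image_link: str
    price: Optional[float]  # None when the row carries no price (test data)


def read_products(lines: Iterable[str]) -> Iterator[Product]:
    """Yield one Product per CSV row; a missing or empty price becomes None."""
    for row in csv.DictReader(lines):
        raw_price = row.get("price")
        yield Product(
            sample_id=row["sample_id"],
            catalog_content=row["catalog_content"],
            image_link=row["image_link"],
            price=float(raw_price) if raw_price not in (None, "") else None,
        )


# Made-up sample rows, for illustration only (not taken from the dataset).
sample_csv = io.StringIO(
    "sample_id,catalog_content,image_link,price\n"
    '1,"Item Name: Green Tea, Pack of 2",https://example.com/1.jpg,12.50\n'
    '2,"Item Name: Olive Oil 500 ml",https://example.com/2.jpg,\n'
)
products = list(read_products(sample_csv))
```

Keeping `price` optional lets the same reader handle both training rows and test rows.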

Dataset Size

  • Training Set: 75,000 products with prices
  • Test Set: 75,000 products used for final evaluation

🎯 Objective

Develop a robust ML solution that predicts product prices by leveraging:

  • Unstructured textual descriptions
  • Product metadata such as IPQ and specifications
  • Visual features extracted from product images

📐 Evaluation Metric

Model performance is evaluated using Symmetric Mean Absolute Percentage Error (SMAPE):

$$ \mathrm{SMAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{|y_{pred,i} - y_{true,i}|}{\left(\frac{|y_{pred,i}| + |y_{true,i}|}{2}\right)} $$

This metric:

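  • Is symmetric in y_pred and y_true: swapping a prediction and its true price gives the same error
  • Is a relative error bounded between 0% and 200%, so cheap and expensive products contribute on a comparable scale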
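
A minimal pure-Python sketch of the formula above is shown here for reference. The function name `smape` and the handling of the 0/0 case (counted as zero error) are assumptions for illustration; the competition's official scorer may treat that edge case differently.

```python
from typing import Sequence


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric Mean Absolute Percentage Error, in percent (0 to 200)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(p) + abs(t)) / 2
        # Assumption: when both values are zero, count the term as zero error
        # instead of dividing by zero.
        total += abs(p - t) / denom if denom else 0.0
    return 100.0 * total / len(y_true)
```

For example, `smape([100.0], [50.0])` and `smape([50.0], [100.0])` return the same value, since the formula is unchanged when the two arguments are swapped.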

🧠 Approach

  • Cleaned and normalized noisy catalog text
  • Extracted structured signals such as brand and Item Pack Quantity (IPQ)
  • Generated text embeddings using transformer-based models
  • Fine-tuned a transformer model for price regression
  • Evaluated model performance using SMAPE
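
The first two steps can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the repository's actual preprocessing: the helper names `clean_text` and `extract_ipq` are hypothetical, and the regular expression only covers a few guessed phrasings such as "Pack of 6", "6-pack", and "12 count".

```python
import html
import re
from typing import Optional

_TAG_RE = re.compile(r"<[^>]+>")   # leftover HTML tags
_WS_RE = re.compile(r"\s+")        # runs of whitespace, including newlines

# Assumed phrasings only: "Pack of 6", "6 pack", "6-pack", "12 count", "12 ct", "4 pcs".
_IPQ_RE = re.compile(
    r"\bpack\s+of\s+(\d+)\b|\b(\d+)[\s-]*(?:pack|count|ct|pcs)\b",
    re.IGNORECASE,
)


def clean_text(raw: str) -> str:
    """Decode HTML entities, drop tags, and collapse whitespace."""
    text = html.unescape(raw)
    text = _TAG_RE.sub(" ", text)
    return _WS_RE.sub(" ", text).strip()


def extract_ipq(text: str) -> Optional[int]:
    """Return the first pack quantity found in the text, or None."""
    match = _IPQ_RE.search(text)
    if match is None:
        return None
    # Exactly one of the two capture groups is set, depending on the phrasing.
    return int(match.group(1) or match.group(2))
```

Returning `None` when no quantity is found keeps "unknown" distinct from a pack size of 1, which leaves the choice of default to the downstream model.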

▶️ How to Run

  • Clone the repository and navigate into the project directory: git clone https://github.com/vaish9825/Product_Price_Prediction.git
  • Install the required dependencies: pip install -r requirements.txt
  • Run the notebooks in order: dali_embeddings.ipynb, model_optimized.ipynb, then model-inference.ipynb. Together they preprocess the data, train the model, and generate price predictions.


📈 Results

  • Trained a robust regression model on large-scale noisy e-commerce data
  • Achieved stable convergence and competitive performance on SMAPE
  • Final Rank: 17th (Top 50) in the Amazon ML Challenge
