
theHoodguy4587/Market-Basket-Analysis


Online Retail Analysis

Project Overview

This project analyzes sales data from an online retail store to extract actionable insights on product performance, customer behavior, and temporal trends.
The goal is to support business decisions such as inventory planning, cross-selling strategies, and marketing campaigns.

The analysis includes:

  • Top products by revenue
  • Top countries by revenue
  • Frequently co-purchased product pairs
  • Monthly sales trends

All analyses are conducted using SQL queries and Python (Pandas, Matplotlib, Seaborn) in a Jupyter Notebook.


Dataset

  • Source: Online Retail dataset
  • Rows: ___
  • Columns: ___
  • Key Fields:
    • InvoiceNo
    • StockCode
    • Description
    • Quantity
    • InvoiceDate
    • UnitPrice
    • CustomerID
    • Country

Data Cleaning Steps:

  • Removed rows with missing or null CustomerID where necessary
  • Removed negative or zero quantities where appropriate
  • Created a Revenue column = Quantity * UnitPrice
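The cleaning steps above could be sketched in SQLite roughly as follows. This is a minimal illustration, not the repository's actual `data_cleaning.sql`: the table names `raw_sales` and `clean_sales` and the sample rows are hypothetical, and only the column names come from the Key Fields list.

```python
import sqlite3

# Made-up sample rows; column order follows the Key Fields listed above.
rows = [
    ("1001", "A1", "ITEM A", 6, "2010-12-01 08:26:00", 2.55, 17850, "United Kingdom"),
    ("1002", "B2", "ITEM B", -2, "2010-12-01 08:28:00", 1.85, 17850, "United Kingdom"),  # negative quantity
    ("1003", "C3", "ITEM C", 32, "2010-12-01 08:34:00", 1.69, None, "France"),  # missing CustomerID
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE raw_sales (
        InvoiceNo TEXT, StockCode TEXT, Description TEXT, Quantity INTEGER,
        InvoiceDate TEXT, UnitPrice REAL, CustomerID INTEGER, Country TEXT
    )
    """
)
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Keep rows with a known customer and a positive quantity,
# and derive Revenue = Quantity * UnitPrice.
conn.execute(
    """
    CREATE TABLE clean_sales AS
    SELECT *, Quantity * UnitPrice AS Revenue
    FROM raw_sales
    WHERE CustomerID IS NOT NULL
      AND Quantity > 0
    """
)

cleaned = conn.execute("SELECT InvoiceNo, Revenue FROM clean_sales").fetchall()
```

Whether to drop rows with a missing CustomerID depends on the question being asked: revenue totals do not need a customer, while customer-level analysis does.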

Analysis & Visualizations

1. Top Products by Revenue

  • Identifies products generating the highest sales revenue.

Figure: Top Products by Revenue
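A ranking like this one can be produced with a single grouped aggregate. The sketch below assumes a cleaned table named `clean_sales` with a `Revenue` column (both names are illustrative) and uses made-up rows; `top_products` is a hypothetical helper, not a function from the repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_sales (Description TEXT, Revenue REAL)")
conn.executemany(
    "INSERT INTO clean_sales VALUES (?, ?)",
    [("MUG", 10.0), ("MUG", 5.0), ("LAMP", 40.0), ("BAG", 7.5), ("BAG", 2.5)],
)


def top_products(conn, n=10):
    """Return the n products with the highest summed revenue."""
    query = """
        SELECT Description, SUM(Revenue) AS TotalRevenue
        FROM clean_sales
        GROUP BY Description
        ORDER BY TotalRevenue DESC
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()


top2 = top_products(conn, n=2)
```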

2. Top Countries by Revenue

  • Shows which countries contribute the most to total sales.

Figure: Top Countries by Revenue
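To judge how concentrated sales are, it can help to report each country's share of total revenue alongside the raw total. The following is a sketch on made-up numbers; the table name `clean_sales` and the `SharePct` alias are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_sales (Country TEXT, Revenue REAL)")
conn.executemany(
    "INSERT INTO clean_sales VALUES (?, ?)",
    [
        ("United Kingdom", 50.0),
        ("United Kingdom", 30.0),
        ("France", 15.0),
        ("Sweden", 5.0),
    ],
)

# Total revenue per country plus its percentage of overall revenue.
query = """
    SELECT Country,
           SUM(Revenue) AS TotalRevenue,
           100.0 * SUM(Revenue) / (SELECT SUM(Revenue) FROM clean_sales) AS SharePct
    FROM clean_sales
    GROUP BY Country
    ORDER BY TotalRevenue DESC
    LIMIT 10
"""
by_country = conn.execute(query).fetchall()
```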

3. Product Pairs (Co-Purchases)

  • Identifies products frequently purchased together within the same invoice.

Figure: Top Product Pairs
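One way to count co-purchases is to group line items into per-invoice baskets and count every unordered pair of distinct products in each basket. This is a minimal sketch of that idea in plain Python, not necessarily the approach used in the repository's SQL; `count_pairs` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_pairs(lines):
    """Count unordered product pairs that share an invoice.

    lines: iterable of (invoice_no, description) tuples.
    """
    # Build one set of distinct products per invoice so that a product
    # listed twice on the same invoice is only counted once.
    baskets = {}
    for invoice_no, description in lines:
        baskets.setdefault(invoice_no, set()).add(description)

    pair_counts = Counter()
    for items in baskets.values():
        # Sorting gives each pair a canonical order: (A, B), never (B, A).
        pair_counts.update(combinations(sorted(items), 2))
    return pair_counts


# Made-up invoice lines for illustration.
lines = [
    ("A1", "GREEN CUP"), ("A1", "PINK CUP"), ("A1", "RED BAG"),
    ("A2", "GREEN CUP"), ("A2", "PINK CUP"),
    ("A3", "RED BAG"),
    ("A1", "GREEN CUP"),  # duplicate line on the same invoice
]
pairs = count_pairs(lines)
top_pair = pairs.most_common(1)
```

The number of pairs grows quadratically with basket size, so very large invoices dominate the runtime on the full dataset.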

4. Monthly Sales Trend

  • Analyzes revenue trends across months to understand seasonality and growth patterns.

Figure: Monthly Sales Trend
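Monthly totals can be obtained by truncating each invoice timestamp to year and month before aggregating. The sketch below assumes `InvoiceDate` is stored as ISO-8601 text, which SQLite's `strftime` can parse; the table name and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_sales (InvoiceDate TEXT, Revenue REAL)")
conn.executemany(
    "INSERT INTO clean_sales VALUES (?, ?)",
    [
        ("2011-02-03 10:00:00", 100.0),
        ("2011-02-20 09:30:00", 50.0),
        ("2011-06-10 12:00:00", 200.0),
        ("2011-11-05 14:15:00", 400.0),
        ("2011-11-28 16:45:00", 250.0),
    ],
)

# Bucket rows by 'YYYY-MM' and sum revenue within each bucket.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', InvoiceDate) AS Month,
           SUM(Revenue) AS TotalRevenue
    FROM clean_sales
    GROUP BY Month
    ORDER BY Month
    """
).fetchall()

best_month = max(monthly, key=lambda row: row[1])
worst_month = min(monthly, key=lambda row: row[1])
```

Partial months at the start or end of the data will look artificially low, so it is worth checking the date range before reading a trend into them.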


Key Insights

Top Products by Revenue

  • The highest revenue-generating product is DOTCOM POSTAGE, contributing approximately $200,000 in total revenue.
  • The lowest-revenue product among the top 10 is RABBIT NIGHT LIGHT, indicating a significant revenue gap between leading and trailing products.
  • This concentration suggests that a small number of products have a disproportionately large impact on overall sales performance.

Top Countries by Revenue

  • The highest revenue-generating country is the United Kingdom, accounting for approximately $8,300,000 in total revenue.
  • The lowest-revenue country within the top 10 is Sweden, highlighting an uneven geographic distribution.
  • This indicates strong market dominance in certain regions, while other markets may present growth opportunities.

Product Pairs (Co-Purchases)

  • The most frequently co-purchased product pair is GREEN REGENCY TEACUP AND SAUCER + PINK REGENCY TEACUP AND SAUCER, with 893 co-purchases.
  • The least frequent pair within the top 10 is JUMBO BAG RED RETROSPOT + JUMBO BAG BAROQUE BLACK WHITE, suggesting a weaker but still notable association.
  • Strong co-purchase patterns indicate complementary products that customers tend to buy together.
  • These relationships can be leveraged to improve cross-selling, bundling strategies, and recommendation systems.

Monthly Sales Trend

  • The month with the highest revenue is November 2011, generating approximately $1,300,000 in revenue.
  • The month with the lowest revenue is February 2011, with total revenue of $483,903.87.
  • Revenue is seasonal, rising from the February low to a peak in November.
  • Identifying these patterns can help with demand forecasting and inventory planning.

Business Recommendations

  • Promote frequently co-purchased products together as bundles
  • Ensure inventory availability for top-selling products
  • Focus marketing efforts on high-performing countries
  • Explore growth opportunities in lower-performing regions
  • Use monthly revenue trends to plan inventory and staffing

Repository Structure

online-retail-analysis/
├─ data/ # Original dataset (Online Retail.xlsx)
├─ sql/
│ ├─ schema.sql # Database schema creation
│ ├─ data_cleaning.sql # Dataset cleaning
│ └─ product_analysis.sql # Main queries/views for analysis
├─ visuals/ # Exported charts and figures
├─ notebooks/ # Jupyter notebook(s) with code and visualizations
├─ insights.md # Text-only insights summary
└─ README.md # This file


Technologies Used

  • SQL – Data extraction, aggregation, and views
  • Python – Pandas for data manipulation, Matplotlib & Seaborn for visualization
  • Jupyter Notebook – Interactive analysis and visualizations
  • SQLite – Lightweight database for structured data storage

How to Run

  1. Clone the repository:
git clone https://github.com/theHoodguy4587/online-retail-analysis.git
  2. Open the notebook in Jupyter:
jupyter notebook notebooks/online_retail_analysis.ipynb
  3. Ensure the dataset (Online Retail.xlsx) is in the data/ folder.
  4. Run the notebook cells sequentially to reproduce the analysis and generate visualizations.


Notes

  • All analyses and visualizations are reproducible using the provided notebook and SQL queries.

  • Insights and recommendations are documented in insights.md.
