This project analyzes sales data from an online retail store to extract actionable insights on product performance, customer behavior, and temporal trends.
The goal is to support business decisions such as inventory planning, cross-selling strategies, and marketing campaigns.
The analysis includes:
- Top products by revenue
- Top countries by revenue
- Frequently co-purchased product pairs
- Monthly sales trends
All analyses are conducted using SQL queries and Python (Pandas, Matplotlib, Seaborn) in a Jupyter Notebook.
- Source: Online Retail dataset
- Rows: ___
- Columns: ___
- Key Fields:
InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountry
Data Cleaning Steps:
- Removed rows with missing or null
CustomerIDwhere necessary - Removed negative or zero quantities where appropriate
- Created a
Revenuecolumn =Quantity * UnitPrice
- Identifies products generating the highest sales revenue.
- Shows which countries contribute the most to total sales.
- Identifies products frequently purchased together within the same invoice.
- Analyzes revenue trends across months to understand seasonality and growth patterns.
- The highest revenue–generating product is Dotcom postage, contributing approximately $ 200000 in total revenue.
- The lowest revenue among the top 10 products is Rabbit Night light, indicating a significant revenue gap between leading and trailing products.
- This concentration suggests that a small number of products have a disproportionately large impact on overall sales performance.
- The highest revenue–generating country is United Kingdom, accounting for approximately $ 8300000 in total revenue.
- The lowest revenue country within the top 10 is Sweden, highlighting uneven geographic distribution.
- This indicates strong market dominance in certain regions, while other markets may present growth opportunities.
- The most frequently co-purchased product pair is GREEN REGENCY TEACUP AND SAUCER + PINK REGENCY TEACUP AND SAUCER, with 893 co-purchases.
- The least frequent pair within the top 10 is JUMBO BAG RED RETROSPOT + JUMBO BAG BAROQUE BLACK WHITE, suggesting weaker but still notable association.
- Strong co-purchase patterns indicate complementary products that customers tend to buy together.
- These relationships can be leveraged to improve cross-selling, bundling strategies, and recommendation systems.
- The month with the highest revenue is Novemeber 2011, generating approximately $1300000 in revenue.
- The month with the lowest revenue is Feburary 2011, with total revenue of $483903.870.
- Revenue shows a seasonal trend over time.
- Identifying these patterns can help with demand forecasting and inventory planning.
- Promote frequently co-purchased products together as bundles
- Ensure inventory availability for top-selling products
- Focus marketing efforts on high-performing countries
- Explore growth opportunities in lower-performing regions
- Use monthly revenue trends to plan inventory and staffing
online-retail-analysis/
├─ data/ # Original dataset (Online Retail.xlsx)
├─ sql/
│ ├─ schema.sql # Database schema creation
│ ├─ data_cleaning.sql # cleaning the dataset
│ └─ product_analysis.sql # Main queries/views for analysis
├─ visuals/ # Exported charts and figures
├─ notebooks/ # Jupyter notebook(s) with code and visualizations
├─ insights.md # Text-only insights summary
└─ README.md # This file
- SQL – Data extraction, aggregation, and views
- Python – Pandas for data manipulation, Matplotlib & Seaborn for visualization
- Jupyter Notebook – Interactive analysis and visualizations
- SQLite – Lightweight database for structured data storage
- Clone the repository:
git clone https://github.com/theHoodguy4587/online-retail-analysis.git- Open the notebook in Jupyter:
jupyter notebook notebooks/online_retail_analysis.ipynb-
Ensure the dataset (Online Retail.xlsx) is in the data/ folder.
-
Run the notebook cells sequentially to reproduce the analysis and generate visualizations.
-
All analyses and visualizations are reproducible using the provided notebook and SQL queries.
-
Insights and recommendations are documented in insights.md.



