erDiagram
FACT_SALES }o--|| DIM_DATE : order_date_key
FACT_FULFILMENT }o--|| DIM_DATE : order_date_key
FACT_FULFILMENT }o--|| DIM_DATE : ship_date_key
FACT_SALES }o--|| DIM_CUSTOMER : customer_id
FACT_SALES }o--|| DIM_PRODUCT : product_card_id
FACT_SALES }o--|| DIM_CATEGORY : category_id
FACT_SALES }o--|| DIM_DEPARTMENT : department_id
FACT_SALES }o--|| DIM_GEO : geo_key
FACT_SALES }o--|| DIM_CHANNEL : channel_key
FACT_SALES }o--|| DIM_DISCOUNT_BAND : discount_band_key
| table_name | rows |
|---|---|
| gold.dim_date | 1133 |
| gold.dim_customer | 20652 |
| gold.dim_product | 118 |
| gold.dim_category | 51 |
| gold.dim_department | 11 |
| gold.dim_geo | 3772 |
| gold.dim_channel | 92 |
| gold.dim_discount_band | 6 |
| gold.fact_sales | 180519 |
| gold.fact_fulfilment | 180519 |
All dimensions passed uniqueness checks:
- dim_date 1133/1133
- dim_customer 20652/20652
- dim_product 118/118
- dim_category 51/51
- dim_department 11/11
- dim_geo 3772/3772
- dim_channel 92/92
- dim_discount_band 6/6
- fact_sales: 180519 rows, 180519 distinct order_item_id
- fact_fulfilment: 180519 rows, 180519 distinct order_item_id
fact_fulfilment null checks:
- order_item_id: 0
- order_id: 0
- customer_id: 0
- product_card_id: 0
- order_date_key: 0
- ship_date_key: 0
All FK coverage checks returned 0 missing keys:
- order_date_dim, customer_dim, product_dim, geo_dim, channel_dim, discount_band_dim, category_dim, department_dim: 0
- fact_rows: 180519
All FK coverage checks returned 0 missing keys:
- order_date_dim, ship_date_dim, customer_dim, product_dim, geo_dim, channel_dim, category_dim, department_dim: 0
- fact_rows: 180519
- discount_rate range: 0 to 0.25
- min net_sales: 7.489999771
- min gross_sales: 9.989999771
- min profit: -4274.97998 (negative profit exists and is treated as a valid business scenario)
- shipping_days_variance range: -2 to 4
- Late lines — two definitions (reconciled):
late_delivery_risk = 1: 98,977 / 180,519 = 54.8% — equals exactly the rows wheredelivery_status = 'Late delivery'. This is the dashboard'sLate Delivery Rate %.is_late_by_days ≥ 1(actual shipping days > scheduled): 103,400 / 180,519 = 57.3% — a broader operational lateness signal.- Both are correct against their own definition; the dashboard standardises on the 54.8% flag. (The original v1 snapshot here reported only the 57.3% figure.)
delivery_statusdistribution: Late delivery 98,977 · Advance shipping 41,592 · Shipping on time 32,196 · Shipping canceled 7,754.
The v2 commercial enrichment (data-pipeline/02_gold_schema_remediation.py) runs on the CSV export and self-reports a PASS / MOCKED diagnostic matrix every run. PASS = structural/integrity check passed on real data; MOCKED = column is benchmark-synthesized (DataCo has no such field — see 03 §6 and 01 §2A).
| Check | Result |
|---|---|
Row count fact_sales / fact_fulfilment = 180,519 |
PASS |
PK uniqueness order_item_id (both facts) |
PASS |
| FK coverage — all 12 FK paths, zero orphans | PASS |
dim_contract_terms written (9 rows: Bronze/Silver/Gold × 3 segments) |
PASS |
Profitability sanity — total_logistics_cost ≤ gross_sales on every row |
PASS (0 violations) |
Chronological — order_date ≤ expected_delivery_date |
PASS |
| Ship-date orphan check | PASS |
Discount sanity — discount_amount ≤ gross_sales |
PASS |
Waterfall bridge — gross_revenue − discount_amount = net_sales |
PASS |
CTS/ABC drivers, freight/SLA/rebate rates, dates, is_promotional, contract tiers |
MOCKED (benchmark) |
Honesty note: the MOCKED rows are the cost/SLA/rebate rates and physical attributes, deterministically generated to industry benchmarks. Every structural and integrity check runs on the real 180,519-row fact data and passes. The seed is fixed (42) so re-runs are reproducible.
Gold schema is structurally valid:
- No row loss vs Silver (180,519 in both facts)
- Grain enforced (1 row per
order_item_id) - Dimensions unique; 100% FK coverage
- v2 enrichment integrity checks all PASS; synthesized columns clearly labelled
Gold is ready for the CSV export serving layer (and was loaded to Snowflake during the trial) and modelling in Power BI.