🚀 Production Readiness Guide

How to deploy this project to BigQuery or Snowflake, automate daily data refreshes, run the test suite, and verify that every dashboard number matches your warehouse exactly.

Promoting v0.9 → v1.0 — Pre-Release Checklist

Before flipping the GitHub release tag from v0.9.0 pre-release to v1.0.0, walk through this checklist. Each item is small but catches a different class of failure.

1. CI/CD must be green on `main`

ci.yml — last run on main exits 0 (validates real Olist-anchored data after PR #5)
warehouse-deploy.yml — both BigQuery and Snowflake jobs pass, OR the Snowflake job is intentionally marked continue-on-error: true if you don't have credits
scheduled-refresh.yml — at least one successful daily run in the Actions tab
daily-synthetic-data.yml — at least one successful daily run; check that the auto-commit lands on main

2. Required GitHub secrets are configured

Go to Settings → Secrets and variables → Actions and confirm each:

GCP_PROJECT_ID
GCP_SERVICE_ACCOUNT_KEY_JSON (full JSON contents, not a path)
SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_WAREHOUSE, SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA

The DuckDB PR-gate workflow needs no secrets — it runs entirely on the GitHub Actions runner.

3. Local end-to-end run passes

From a fresh clone (or after rm -rf data/olist_analytics.duckdb):

python scripts/load_duckdb.py
cd dbt_project && dbt run --target duckdb && dbt test --target duckdb && cd ..
python scripts/generate_golden_metrics.py
python scripts/validate_metrics.py     # must exit 0
pytest tests/ -v                       # must pass

dbt run completes with no errors
dbt test passes (no not_null, unique, or relationships test failures)
validate_metrics.py exits 0
pytest passes (skipping test_api.py if the ML model isn't trained is acceptable)

4. Dashboards render with correct data

streamlit run streamlit_app/app.py — check that the sidebar shows the correct connection badge and the Data Sources page loads
Open dashboards/full_funnel_marketing_dashboard.html locally and verify the KPI values match dashboards/golden_metrics.json
In Claude Code, type /marketing — values should match the HTML dashboard exactly

5. Documentation links resolve

Every link in README.md Documentation table opens the right file
docs/architecture.md renders correctly (no broken anchors)
The architecture v2 image displays at the top of the README

6. dbt Fusion compatibility (if applicable)

dbt Fusion is stricter than dbt Core. If you use it locally, verify:

metricflow_time_spine.sql — SQL comments are OUTSIDE the {{ config() }} Jinja block
metrics.yml and sem_marketing.yml — start with version: 2 (legacy YAML without it triggers warning dbt1157)
dbt parse runs clean with no dbt1502 errors

These three points are now enforced in the codebase as of PR #5.

7. Release hygiene

PR #5 (or its successor) is merged to main
All stale feature branches are deleted (claude/*, feat/*)
The v0.9.0 GitHub release has been live for at least 24 hours with no surprises
Edit the existing release: uncheck "Pre-release", change tag to v1.0.0, click Update — OR draft a fresh v1.0.0 release referencing the v0.9 testing period in the notes

8. LinkedIn post readiness

Repo description on GitHub reflects v1.0 scope (currently mentions $0/mo, 5 warehouses)
README hero gif / video loads on a fresh browser session (no auth)
Architecture v2 image renders correctly when opened from the GitHub-hosted README

What Was Built

Component	File(s)	Purpose
PR gate	`.github/workflows/ci.yml`	Compile + test every pull request on DuckDB (fast, no cloud creds)
Warehouse deploy	`.github/workflows/warehouse-deploy.yml`	Deploy to BigQuery and Snowflake on every push to `main`
Scheduled refresh	`.github/workflows/scheduled-refresh.yml`	Daily 6 AM UTC — re-run dbt, regenerate golden metrics
Daily synthetic data	`.github/workflows/daily-synthetic-data.yml`	Daily 5 AM UTC — append one new day to the mock dataset
Warehouse adapters	`scripts/_warehouse_adapters.py`	Uniform connection layer for DuckDB / BigQuery / Snowflake
Multi-target generate	`scripts/generate_golden_metrics.py`	`--target` and `--live` flags added; default (DuckDB) unchanged
Multi-target validate	`scripts/validate_metrics.py`	`--target` flag; compares warehouse mart tables to golden JSON
Daily append script	`scripts/daily_synthetic_append.py`	Appends N realistic days to mock CSVs
Test suite	`tests/`	20+ pytest assertions on golden metrics structure + API endpoint
Connector UI	`streamlit_app/pages/connectors.py`	Streamlit Data Sources page (file upload, warehouse config, MCP status)

1. CI/CD — GitHub Actions Setup

1.1 Required GitHub Secrets

Go to your repository → Settings → Secrets and variables → Actions → New repository secret.

BigQuery

Secret name	Value	Where to get it
`GCP_PROJECT_ID`	Your GCP project ID, e.g. `my-project-123`	GCP Console → project selector
`GCP_SERVICE_ACCOUNT_KEY_JSON`	Full contents of a service account JSON key file	See §1.2 below

Snowflake

Secret name	Value	Where to get it
`SNOWFLAKE_ACCOUNT`	Account identifier, e.g. `xy12345.us-east-1`	Snowflake UI → bottom-left profile → Account details
`SNOWFLAKE_USER`	Your Snowflake username	Snowflake admin
`SNOWFLAKE_PASSWORD`	Your Snowflake password	Snowflake admin
`SNOWFLAKE_WAREHOUSE`	Virtual warehouse name, e.g. `ANALYTICS_WH`	Snowflake → Warehouses
`SNOWFLAKE_DATABASE`	Database name, e.g. `OLIST_ANALYTICS`	Snowflake → Databases
`SNOWFLAKE_SCHEMA`	Schema name, e.g. `PUBLIC`	Default: `PUBLIC`

None of the DuckDB / local workflows need secrets — the ci.yml PR gate runs entirely on DuckDB with no cloud credentials.

1.2 Creating a BigQuery Service Account Key

# 1. Create a service account
gcloud iam service-accounts create dbt-ci \
  --display-name="dbt CI service account" \
  --project=YOUR_PROJECT_ID

# 2. Grant required roles
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dbt-ci@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:dbt-ci@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# 3. Download the key
gcloud iam service-accounts keys create /tmp/dbt-ci-key.json \
  --iam-account=dbt-ci@YOUR_PROJECT_ID.iam.gserviceaccount.com

# 4. Copy the ENTIRE contents of the JSON file into the GitHub Secret
cat /tmp/dbt-ci-key.json

Paste the entire JSON output (including the { } braces) as the value of GCP_SERVICE_ACCOUNT_KEY_JSON.

1.3 What Each Workflow Does

`ci.yml` — runs on every PR

checkout → pip install → dbt deps → generate mock data → load DuckDB
→ dbt run (DuckDB) → dbt test → generate golden metrics → validate drift
→ pytest tests/

If any step fails, the PR is blocked. The validate step prints exactly which metric drifted, what the golden value was, and what the live warehouse returned.

`warehouse-deploy.yml` — runs on push to `main`

Two jobs run in parallel — one for BigQuery, one for Snowflake:

load raw data into warehouse → dbt run --target <warehouse>
→ dbt test → generate golden metrics from warehouse
→ validate (must exit 0) → commit updated golden_metrics.json

The BigQuery job commits the updated golden_metrics.json back to the repo. This means every merge to main automatically refreshes the dashboards.

`scheduled-refresh.yml` — runs daily at 06:00 UTC

append today's synthetic data → load DuckDB → dbt run → dbt test
→ generate golden metrics (--live) → validate → commit golden_metrics.json

`daily-synthetic-data.yml` — runs daily at 05:00 UTC (before the refresh)

Appends one new day of synthetic data to the mock marketing CSVs. Can be triggered manually with a custom --days count via workflow_dispatch.

2. Verify Dashboard == Warehouse (The Guarantee)

The data chain is:

Raw data
  → dbt staging models (type casting, renaming)
  → dbt intermediate models (joins)
  → dbt mart tables (fct_marketing_daily, stg_marketing_attribution, ...)
  → generate_golden_metrics.py reads mart tables
  → dashboards/golden_metrics.json  ← all HTML dashboards read this
  → validate_metrics.py re-queries the warehouse and diffs vs JSON

To run the full verification locally:

# Step 1: run dbt on your warehouse of choice
dbt run --target bigquery    # or snowflake, or duckdb

# Step 2: generate the golden snapshot from that warehouse
python scripts/generate_golden_metrics.py --target bigquery

# Step 3: re-query the warehouse and compare to the snapshot
python scripts/validate_metrics.py --target bigquery
# exits 0 = in sync | exits 1 = drift (lists exactly which metrics)

# Step 4: open any HTML dashboard
# It reads golden_metrics.json — if step 3 passed, dashboard == warehouse

The validate script checks ~20 metrics including:

Total spend (Google, Meta, combined) — $1 tolerance
All-time and 90-day session counts — ±1 row
Blended ROAS — ±0.5% relative tolerance
Session CVR — ±0.5%
Per-channel ROAS (Google Ads, Meta Ads)
HubSpot total contacts
Salesforce closed-won revenue

3. Daily Synthetic Data

The synthetic dataset originally ended on 2026-03-15. The daily append script adds one new day each run so the dataset grows continuously — making dashboards feel like a live production system.

# Append 1 day (next day after the latest date in the CSV)
python scripts/daily_synthetic_append.py

# Backfill 30 days at once
python scripts/daily_synthetic_append.py --days 30

# Full reset (regenerates everything from scratch)
python scripts/daily_synthetic_append.py --reset

After appending, use --live to auto-detect the new latest date:

dbt run --target duckdb
python scripts/generate_golden_metrics.py --live   # uses max(date) from fct_marketing_daily
python scripts/validate_metrics.py --target duckdb

How the data is generated: Each new day's values are seeded by the calendar date (same date always produces identical numbers — reproducible), then scaled by:

Monthly seasonality (November = 1.8×, January = 0.7×)
Day-of-week factor (weekends = 0.8×)
±10–30% random noise per metric

The same 6 Google campaigns and 5 Meta campaigns appear every day with consistent IDs, so dbt joins and attribution models continue to work correctly.

4. Running the Test Suite

# Install test dependencies (already in requirements.txt)
pip install pytest pytest-httpx

# Run all tests
pytest tests/ -v

# Run only golden metrics tests (no warehouse needed)
pytest tests/test_golden_metrics.py -v

# Run only API tests (requires ml/lead_scoring_model.json)
pytest tests/test_api.py -v

The golden metrics tests run against the committed dashboards/golden_metrics.json and require no database connection. They will be skipped automatically if the file does not exist.

The API tests require the trained XGBoost model. To generate it:

python ml/src/train.py

5. Local Development with BigQuery or Snowflake

5.1 Copy and fill the profile

cp dbt_project/profiles.yml.example dbt_project/profiles.yml
# Edit profiles.yml and fill in your credentials
# profiles.yml is in .gitignore — it will never be committed

5.2 Run dbt against your warehouse

# BigQuery (uses OAuth — run `gcloud auth application-default login` first)
dbt run --target bigquery

# Snowflake
dbt run --target snowflake

# Generate golden metrics from your warehouse
python scripts/generate_golden_metrics.py --target bigquery
python scripts/validate_metrics.py --target bigquery

5.3 Switching back to DuckDB

dbt run --target duckdb           # default — no credentials needed
python scripts/generate_golden_metrics.py   # reads DuckDB

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🚀 Production Readiness Guide

Promoting v0.9 → v1.0 — Pre-Release Checklist

1. CI/CD must be green on `main`

2. Required GitHub secrets are configured

3. Local end-to-end run passes

4. Dashboards render with correct data

5. Documentation links resolve

6. dbt Fusion compatibility (if applicable)

7. Release hygiene

8. LinkedIn post readiness

What Was Built

1. CI/CD — GitHub Actions Setup

1.1 Required GitHub Secrets

BigQuery

Snowflake

1.2 Creating a BigQuery Service Account Key

1.3 What Each Workflow Does

`ci.yml` — runs on every PR

`warehouse-deploy.yml` — runs on push to `main`

`scheduled-refresh.yml` — runs daily at 06:00 UTC

`daily-synthetic-data.yml` — runs daily at 05:00 UTC (before the refresh)

2. Verify Dashboard == Warehouse (The Guarantee)

3. Daily Synthetic Data

4. Running the Test Suite

5. Local Development with BigQuery or Snowflake

5.1 Copy and fill the profile

5.2 Run dbt against your warehouse

5.3 Switching back to DuckDB

FilesExpand file tree

production_readiness_guide.md

Latest commit

History

production_readiness_guide.md

File metadata and controls

🚀 Production Readiness Guide

Promoting v0.9 → v1.0 — Pre-Release Checklist

1. CI/CD must be green on main

2. Required GitHub secrets are configured

3. Local end-to-end run passes

4. Dashboards render with correct data

5. Documentation links resolve

6. dbt Fusion compatibility (if applicable)

7. Release hygiene

8. LinkedIn post readiness

What Was Built

1. CI/CD — GitHub Actions Setup

1.1 Required GitHub Secrets

BigQuery

Snowflake

1.2 Creating a BigQuery Service Account Key

1.3 What Each Workflow Does

ci.yml — runs on every PR

warehouse-deploy.yml — runs on push to main

scheduled-refresh.yml — runs daily at 06:00 UTC

daily-synthetic-data.yml — runs daily at 05:00 UTC (before the refresh)

2. Verify Dashboard == Warehouse (The Guarantee)

3. Daily Synthetic Data

4. Running the Test Suite

5. Local Development with BigQuery or Snowflake

5.1 Copy and fill the profile

5.2 Run dbt against your warehouse

5.3 Switching back to DuckDB

1. CI/CD must be green on `main`

`ci.yml` — runs on every PR

`warehouse-deploy.yml` — runs on push to `main`

`scheduled-refresh.yml` — runs daily at 06:00 UTC

`daily-synthetic-data.yml` — runs daily at 05:00 UTC (before the refresh)