Full-Funnel AI Analytics — Agent Instructions

This file is loaded automatically at the start of every conversation. All rules here are mandatory when generating dashboards, charts, tables, or any metric output from MCP data.

1. Canonical Metric Definitions

These are the single source of truth. Never deviate without explicit user instruction.

CVR — Two valid definitions; always label which one

Name	Formula	When to use	UI label
Session CVR	`conversions / sessions`	Cross-channel funnel analysis; GA4 data	`CVR (session)`
Click CVR	`conversions / clicks`	Within-platform campaign comparison (Google Ads, Meta Ads)	`CVR (click)`

Rule: Never mix session CVR and click CVR on the same chart or table without explicit labelling of each. Session CVR ≈ 2–4% for e-commerce. Click CVR ≈ 5–20%. Values are not comparable across definitions.

ROAS

Source	Formula	Notes
Meta Ads (platform)	Use `roas` field directly from MCP response	Platform-reported; last-click, 7-day window
Google Ads (estimated)	`(conversions × AOV) / spend` where `AOV = $100`	Google MCP does not return revenue; use $100 AOV consistently
Blended ROAS	`total_attributed_revenue / total_spend`	Linear attribution; must label as "Linear · 90d"

Rule: Never use a hardcoded spend multiplier (e.g., spend × 3.5 or spend × 0.8) to estimate ROAS. Always use conversions × $100 for Google. Always use platform roas field for Meta.

CTR

clicks / impressions — always expressed as a percentage.

Conversion (funnel stages)

Stage	Definition	Source
Sessions	GA4 session events	GA4 MCP
Conversions	GA4 conversion events	GA4 MCP
Deals Won	CRM closed-won deals	HubSpot / Salesforce MCP
CRM Contacts	All contacts in CRM (all-time)	HubSpot MCP

Critical rule: CRM Contacts is an all-time, lifetime count. It must never appear as a funnel step after GA4 Conversions. A funnel step must always be ≤ the step above it. If CRM Contacts > GA4 Conversions, display CRM Contacts as a separate KPI card with label (HubSpot all-time), not as a funnel stage.

Win Rate

closed_won_deals / (closed_won_deals + closed_lost_deals)

2. Attribution Windows

Metric	Window	Model	Label to show in UI
Blended ROAS	90-day query period	Linear	`Linear attribution · 90d`
Channel ROAS	90-day query period	Linear	`Linear attribution · 90d`
Platform ROAS (Meta)	7-day click, 1-day view	Last-click	`Meta platform · 7-day window`
Platform ROAS (Google)	30-day (default)	Last-click	`Google est. · AOV $100`
GA4 CVR	Query period	Session-level	`Session CVR`
Pipeline / Closed Won (CRM)	All-time by default	N/A	Always label `CRM all-time`

Rule: Every ROAS and CVR KPI card must display its attribution window in the subtitle. Unlabelled ROAS is not acceptable.

3. Revenue Scopes — These are different things; never conflate them

Label	Definition
`Attributed Revenue (90d)`	GA4-tracked orders within the query period, attributed via linear model across paid channels
`Salesforce Closed Won (90d)`	Revenue from SF opportunities closed in the 90-day period
`CRM Closed Won (all-time)`	Lifetime closed-won revenue across HubSpot + Salesforce combined

When a dashboard shows both a pipeline KPI and a revenue attribution chart, explicitly annotate which scope each uses.

4. Channel Attribution Percentages

Attribution shares across channels must always sum to exactly 100%. Before rendering any pie/donut chart of channel attribution:

Sum all values
If sum ≠ 100, normalise: value = (value / sum) × 100
Round to 1 decimal place

5. Funnel Integrity Rules

A valid funnel requires each step ≤ the step above it:

Sessions > Engaged Sessions > Conversions > Deals Won

If any data violates this (e.g., a CRM count > GA4 conversions), do not include it in the funnel. Instead:

Display it as a separate KPI card
Label its scope clearly (e.g., "all-time", "different source")

6. Data Freshness

Every generated dashboard must include a data freshness indicator showing the query period. Format: [Start date] – [End date] · Data as of [today]

Default query period: last 90 days.

7. Spend Field Mapping

Platform	MCP field	Notes
Google Ads	`cost`	In USD
Meta Ads	`spend`	In USD

Use cost for Google, spend for Meta. Never swap them.

8. AOV Assumption

Average Order Value (AOV) = $100 for all Google ROAS estimates.

This is derived from Meta platform data: average revenue per conversion across Meta campaigns ≈ $99–$101. This assumption must be used consistently. If the user provides a different AOV, update it everywhere in the same response.

9. Queried Date Range (Default)

Synthetic Dataset (this project's default)

The dataset grows daily via daily_synthetic_append.py. The anchor date auto-detects MAX(date) from the database so all windows always cover the latest appended data.

Variable	Resolved at runtime	Notes
`anchor`	`MAX(date)` from DB	Latest appended day
`window_start`	`anchor - 89 days`	90-day window start
`window_end`	`anchor`	90-day window end

Rule for synthetic data: Read window bounds from golden_metrics.json _meta.window_start / _meta.window_end — never hardcode dates. These are regenerated daily and always reflect the latest data.

How to identify synthetic data: The dataset is in use when sourcing from data/olist_analytics.duckdb or data/mock_marketing/*.csv.

Available Pre-Computed Windows

golden_metrics.json (schema v2.2) contains these sections (dates roll forward with each daily refresh):

JSON key	Window	Label
`windowed_7d`	anchor-6d → anchor	Last 7 days
`windowed_30d`	anchor-29d → anchor	Last 30 days
`windowed_60d`	anchor-59d → anchor	Last 60 days
`windowed_90d`	anchor-89d → anchor	Canonical 90-day window
`windowed_180d`	anchor-179d → anchor	Last 180 days
`month_YYYY_MM`	full calendar month	Last 3 months relative to anchor
`all_time`	dataset_start → anchor	Full dataset range

The scheduled refresh (scheduled-refresh.yml) regenerates all windows daily at 06:00 UTC.

On-Demand Queries — Any Date Range

For windows NOT listed above (e.g., last 45 days, Q1 2026, a custom range), use scripts/query_window.py. It queries the CSV source files directly via DuckDB and returns the same schema as a golden_metrics.json section.

python scripts/query_window.py --last-days 45
python scripts/query_window.py --last-days 1          # yesterday (anchor-relative)
python scripts/query_window.py --month 2025-11        # November 2025
python scripts/query_window.py --year 2025            # full year
python scripts/query_window.py --start 2025-10-01 --end 2026-03-15
python scripts/query_window.py --last-days 60 --output metrics_60d.json

Python API:

from datetime import date
from scripts.query_window import query_window
section = query_window(date(2026, 1, 1), date(2026, 3, 15))
print(section["blended_roas"])   # same schema as golden_metrics.json

Labelling rule: Any dashboard built from query_window.py output must display:

⚡ Ad-hoc query · not from golden layer

This distinguishes it from zero-drift golden layer dashboards.

CRM data is always all-time regardless of the query window — it is a lifetime count (see §1 and §5). query_window.py labels this with _note.

Live / Real Dataset (open-source swap-in)

This project is open source. If the user connects a real marketing data source (live MCP, BigQuery, Supabase, Snowflake, or any daily-refreshed dataset), use the rolling window instead:

end_date: today (date.today())
start_date: 90 days before today (today - 90 days)

How to identify live data: User says "connect my real data", "use BigQuery", "use production", or configures a non-DuckDB dbt target.

Always pass explicit dates to MCP tools. Never use open-ended queries.

10. Dashboard Generation Rules

When generating any HTML dashboard or React artifact from MCP data:

Always read the data first, then compute metrics. Never hardcode values that should come from MCP.
Label every metric with its formula basis (session vs click CVR, attribution window on ROAS).
Do not use synthetic multipliers to estimate revenue for Google Ads. Always use conversions × $100.
Validate funnel stage ordering before rendering. Fix any stage-ordering violations.
Normalise attribution percentages to 100% before rendering pie/donut charts.
Separate CRM lifetime data from 90-day attributed data — use different section headings or KPI cards.
Include a data freshness badge in the dashboard header.

11. Skill Quick Reference

Each skill has two modes: golden (default, reads golden_metrics.json) and -mcp (live, queries mock MCP servers directly). The golden mode guarantees zero metric drift; the -mcp mode shows raw real-time platform data.

Golden Layer Skills (default — read `dashboards/golden_metrics.json`)

Skill	Data source	Primary metric
`/marketing`	`windowed_90d` + `all_time` sections	Blended ROAS (linear · 90d)
`/campaign`	`windowed_90d.campaigns`	Platform ROAS per campaign
`/attribution`	`windowed_90d.attribution_by_channel`	Channel revenue share (linear · 90d)
`/traffic`	`windowed_90d.ga4_by_channel`	Session CVR by channel
`/pipeline`	`all_time.crm`	Win rate, pipeline value (CRM all-time)

Live MCP Skills (opt-in — bypasses golden layer)

Triggered by appending -mcp to any skill name or by user requesting "live", "real-time", or "raw platform" data.

Skill	MCP servers queried	Notes
`/marketing-mcp`	google-ads, meta-ads, ga4, hubspot, salesforce	Add ⚡ Live MCP badge
`/campaign-mcp`	google-ads, meta-ads	Per-campaign raw platform data
`/attribution-mcp`	ga4, google-ads, meta-ads	Raw channel attribution
`/traffic-mcp`	ga4	Live session data
`/pipeline-mcp`	hubspot, salesforce	Live CRM data

Rule for MCP skills: Pass dates from golden_metrics.json _meta.window_start / _meta.window_end for synthetic data (these update daily with the appended data). Use rolling today() - 90 for live datasets (see §9).

12. Dbt Semantic Layer Reference

Canonical metric definitions live in: dbt_project/models/metrics/metrics.yml

When in doubt about a metric formula, read that file. It is the single source of truth for computed metrics.

Key metrics:

session_conversion_rate = total_orders / total_sessions ← canonical cross-channel CVR
blended_roas = total_revenue / total_spend (linear attribution, 90d)
channel_roas = channel_revenue / channel_spend (linear attribution, 90d)

13. Shared Metrics Module (code-level enforcement)

dashboards/js/metrics.js is the mandatory runtime enforcement of all metric rules.

All new HTML dashboards MUST load it before any inline script block:

<script src="js/metrics.js"></script>

NEVER compute CVR, ROAS, or attribution percentages inline. Use canonical functions:

Function	Formula	When to use
`Metrics.googleROAS(conv, cost)`	`conv × $100 / cost`	All Google Ads ROAS
`Metrics.metaROAS(platformRoas)`	pass-through	Meta platform roas field
`Metrics.sessionCVR(conv, sessions)`	`conv / sessions × 100`	Cross-channel, GA4
`Metrics.clickCVR(conv, clicks)`	`conv / clicks × 100`	Platform campaign tables
`Metrics.normaliseAttribution(channels)`	normalises `val` to sum 100%	Any pie/donut/attribution bar
`Metrics.validateFunnel(steps)`	returns `issues[]`	Before rendering any funnel

Use Metrics.labels.* for canonical KPI subtitle strings (attribution windows, CVR basis).

Existing dashboards that already use metrics.js:

attribution_dashboard.html
marketing_full_funnel_dashboard.html
full_funnel_marketing_dashboard.html

14. Golden Layer First — Mandatory Data Sourcing

This project's purpose is zero metric drift between dbt and dashboards. The following rules are mandatory.

Default: Read `dashboards/golden_metrics.json`

Before generating any dashboard, read dashboards/golden_metrics.json.
This file is pre-computed from the dbt golden layer (DuckDB mart tables).
Copy exact values from this file into HTML dashboard JS constants. Do NOT recalculate metrics independently.
Check _meta.available_windows to see which windows are pre-computed.
Use windowed_90d for the canonical 90-day view.
Use all_time only when explicitly showing lifetime data.
For any other window (30d, monthly, custom range) — see §9 query_window.py.

Why: If you recalculate metrics from MCP data, rounding differences and non-deterministic AI arithmetic will cause drift versus the golden layer. Reading pre-computed values guarantees bit-for-bit reproducibility.

Exception: Live MCP Querying (explicit opt-in only)

Only bypass golden_metrics.json when the user explicitly says one of:

"query live data" / "get real-time data" / "use MCP"
"show raw platform data" / "bypass golden layer" / "live from MCP"

When in Live MCP mode:

Pass dates from golden_metrics.json _meta.window_start / _meta.window_end
Add to the dashboard header badge: ⚡ Live MCP — may differ from golden layer
Still use metrics.js canonical functions for all calculations
For HubSpot/Salesforce, pass the same dates to get_deal_pipeline_summary() and get_opportunity_pipeline() — these servers now support date filtering.

Never Mix Scopes

Never divide all-time revenue by 90-day spend (this was the source of the nonsensical 71.1× Blended ROAS bug)
If using windowed_90d: ALL numbers must come from that section only
If using all_time: ALL numbers must come from that section only
Label every ROAS and CVR with its scope per §2 Attribution Windows rules

Data Source Decision Tree

User asks for data or a dashboard
  │
  ├─ Skill ends in "-mcp"  OR  user says "live", "real-time", "raw platform"?
  │   ├─ YES → Use MCP servers
  │   │         · Synthetic:  dates from golden_metrics.json _meta.window_start/end
  │   │         · Live/real:  dates = today()-90d → today()
  │   │         · Add ⚡ Live MCP badge
  │   │         · Use metrics.js for all calculations
  │   └─ NO  → Use golden layer (default)
  │
  └─ What date window does the user want?
      │
      ├─ 90d (default), 7d, 30d, 60d, 180d, or a recent calendar month?
      │   └─ Read dashboards/golden_metrics.json
      │       · Check _meta.available_windows for the exact key
      │       · Copy exact values — no recalculation
      │       · Use all_time only for lifetime/CRM views
      │
      ├─ Any other window (45d, last quarter, Q1, custom range, etc.)?
      │   └─ Run python scripts/query_window.py
      │       · --last-days N  |  --month YYYY-MM  |  --year YYYY
      │       · --start YYYY-MM-DD --end YYYY-MM-DD
      │       · OR: from scripts.query_window import query_window
      │       · Label dashboard: ⚡ Ad-hoc query · not from golden layer
      │       · CRM data is still all-time in the result
      │
      └─ Does golden_metrics.json exist?
          ├─ YES → Use it
          └─ NO  → Tell user: python scripts/generate_golden_metrics.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Full-Funnel AI Analytics — Agent Instructions

1. Canonical Metric Definitions

CVR — Two valid definitions; always label which one

ROAS

CTR

Conversion (funnel stages)

Win Rate

2. Attribution Windows

3. Revenue Scopes — These are different things; never conflate them

4. Channel Attribution Percentages

5. Funnel Integrity Rules

6. Data Freshness

7. Spend Field Mapping

8. AOV Assumption

9. Queried Date Range (Default)

Synthetic Dataset (this project's default)

Available Pre-Computed Windows

On-Demand Queries — Any Date Range

Live / Real Dataset (open-source swap-in)

10. Dashboard Generation Rules

11. Skill Quick Reference

Golden Layer Skills (default — read `dashboards/golden_metrics.json`)

Live MCP Skills (opt-in — bypasses golden layer)

12. Dbt Semantic Layer Reference

13. Shared Metrics Module (code-level enforcement)

14. Golden Layer First — Mandatory Data Sourcing

Default: Read `dashboards/golden_metrics.json`

Exception: Live MCP Querying (explicit opt-in only)

Never Mix Scopes

Data Source Decision Tree

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

Full-Funnel AI Analytics — Agent Instructions

1. Canonical Metric Definitions

CVR — Two valid definitions; always label which one

ROAS

CTR

Conversion (funnel stages)

Win Rate

2. Attribution Windows

3. Revenue Scopes — These are different things; never conflate them

4. Channel Attribution Percentages

5. Funnel Integrity Rules

6. Data Freshness

7. Spend Field Mapping

8. AOV Assumption

9. Queried Date Range (Default)

Synthetic Dataset (this project's default)

Available Pre-Computed Windows

On-Demand Queries — Any Date Range

Live / Real Dataset (open-source swap-in)

10. Dashboard Generation Rules

11. Skill Quick Reference

Golden Layer Skills (default — read dashboards/golden_metrics.json)

Live MCP Skills (opt-in — bypasses golden layer)

12. Dbt Semantic Layer Reference

13. Shared Metrics Module (code-level enforcement)

14. Golden Layer First — Mandatory Data Sourcing

Default: Read dashboards/golden_metrics.json

Exception: Live MCP Querying (explicit opt-in only)

Never Mix Scopes

Data Source Decision Tree

Golden Layer Skills (default — read `dashboards/golden_metrics.json`)

Default: Read `dashboards/golden_metrics.json`