
Monitoring

Generate a local monitoring snapshot after building the DuckDB marts:

uv run python -m src.monitoring.snapshot --snapshot-date 2025-06-30

Default output:

artifacts/monitoring/snapshot_date=2025-06-30/monitoring_snapshot.json
artifacts/monitoring/snapshot_date=2025-06-30/monitoring_snapshot.md

The snapshot checks:

  • DuckDB mart availability.
  • Activation rate and activation mart freshness.
  • Experiment support, complaint, and app-crash guardrails.
  • Pricing exposure coverage, net margin, complaint rate, and human-review load.
  • Pricing recommendation coverage.
  • Activation batch score extract availability.
  • API contract file readiness.
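The freshness item compares how far the latest mart data lags the snapshot date. The sketch below is a minimal version of that idea. It assumes a lag of up to 1 day passes, 2 to 3 days warns, and anything older fails. Both thresholds and the `freshness_status` name are hypothetical, not the project's configured values.

```python
from datetime import date


def freshness_status(latest_date: date, snapshot_date: date,
                     warn_after_days: int = 1, fail_after_days: int = 3) -> str:
    """Classify mart freshness by how many days its latest data lags the snapshot.

    Thresholds are illustrative assumptions, not the project's real settings.
    """
    lag_days = (snapshot_date - latest_date).days
    if lag_days > fail_after_days:
        return "fail"
    if lag_days > warn_after_days:
        return "warn"
    return "pass"


print(freshness_status(date(2025, 6, 29), date(2025, 6, 30)))  # lag of 1 day: pass
```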

A fail status means the release should stop. A warn status means the project can still run, but the result needs human review before a public release or ramp-up.

The Streamlit dashboard also includes a Monitoring tab that computes the same snapshot against the current DuckDB path and shows the overall status, status counts, attention items, and full check table.
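The dashboard's overall status, status counts, and attention items can be derived from the individual check results. The sketch below shows that roll-up. It assumes each check is a dict with `name` and `status` keys. The field names, the check names, and `summarise` are hypothetical.

```python
from collections import Counter

# Higher rank wins when rolling up: one failing check makes the snapshot fail.
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def summarise(checks: list[dict]) -> dict:
    """Roll individual check results up into dashboard-style summary fields."""
    # An empty check list is treated as fail: a snapshot with no checks proves nothing.
    overall = max((check["status"] for check in checks),
                  key=SEVERITY.__getitem__, default="fail")
    return {
        "overall_status": overall,
        "status_counts": dict(Counter(check["status"] for check in checks)),
        "attention_items": [check["name"] for check in checks if check["status"] != "pass"],
    }


snapshot = summarise([
    {"name": "duckdb_marts_available", "status": "pass"},
    {"name": "experiment_guardrails", "status": "warn"},
])
print(snapshot["overall_status"], snapshot["attention_items"])  # warn ['experiment_guardrails']
```

Taking the maximum severity keeps the overall status conservative: a single fail always wins, so the summary can never look healthier than its worst check.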

Scheduled Monitoring Workflow

The repository includes a GitHub Actions workflow at .github/workflows/monitoring-snapshot.yml that can run manually or on a weekly schedule. It rebuilds a deterministic synthetic warehouse, trains the activation model artifact, generates batch activation scores, writes the product monitoring snapshot, writes the activation model monitoring report, and uploads the monitoring/scoring outputs as workflow artifacts.

Run it from the GitHub Actions tab (Monitoring Snapshot > Run workflow) before a portfolio refresh or public demo review. The workflow is intentionally synthetic: it exercises the operational path without requiring real customer data, secrets, or cloud warehouse credentials.

On a first run, the activation model monitoring report can return warn for score-distribution drift because no previous score extract exists yet. In a live setup, the previous successful artifact or warehouse score table would be passed as the reference extract.
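In a local setup, one way to pick the reference is to take the newest `score_date=` partition older than the current score date. The sketch below does this and returns `None` on a first run, when drift can only be reported as warn. It assumes the `artifacts/scoring/activation/score_date=YYYY-MM-DD/customer_scores_daily.parquet` layout shown in the commands in this document. The `latest_reference_extract` helper is hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Iterable, Optional


def latest_reference_extract(extracts: Iterable[Path], score_date: date) -> Optional[Path]:
    """Pick the newest score extract whose score_date partition is before score_date."""
    dated = []
    for extract in extracts:
        key, _, value = extract.parent.name.partition("=")
        if key != "score_date":
            continue  # not a date partition directory
        try:
            partition_date = date.fromisoformat(value)
        except ValueError:
            continue  # skip partitions whose value is not an ISO date
        if partition_date < score_date:
            dated.append((partition_date, extract))
    # No earlier partition means a first run: drift can only be reported as warn.
    return max(dated)[1] if dated else None


# Usage against the default local layout (returns None if nothing exists yet).
root = Path("artifacts/scoring/activation")
reference = latest_reference_extract(
    root.glob("score_date=*/customer_scores_daily.parquet"), date(2025, 6, 30)
)
```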

Activation Model Monitoring

After generating daily activation scores, create a model monitoring report:

uv run python -m src.monitoring.model_report --report-date 2025-06-30

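For score-distribution drift, pass a previous score extract as the reference. The multi-line commands in this document use PowerShell backtick line continuations; in bash or zsh, end each continued line with a backslash instead: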

uv run python -m src.monitoring.model_report `
  --score-path artifacts/scoring/activation/score_date=2025-06-30/customer_scores_daily.parquet `
  --reference-score-path artifacts/scoring/activation/score_date=2025-06-23/customer_scores_daily.parquet `
  --report-date 2025-06-30

Default output:

artifacts/monitoring/model_activation/report_date=2025-06-30/activation_model_monitoring.json
artifacts/monitoring/model_activation/report_date=2025-06-30/activation_model_monitoring.md

The model report checks probability bounds, score volume, targeting rate, vulnerable-customer review load, threshold validity, and score-distribution PSI. Use fail as a release stop, and use warn as a human-review trigger before a rollout or public demo refresh.
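Score-distribution PSI compares binned score proportions in the current extract against the reference extract. The sketch below is minimal and makes two assumptions: 10 equal-width bins over [0, 1], and a small floor for empty bins so the logarithm stays finite. The project's actual binning and thresholds are not documented here, and `population_stability_index` is a hypothetical name.

```python
import math
from typing import Sequence


def population_stability_index(reference: Sequence[float], current: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum((cur_i - ref_i) * ln(cur_i / ref_i)) over equal-width bins on [0, 1]."""
    if not reference or not current:
        raise ValueError("both score samples must be non-empty")

    def proportions(scores: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for score in scores:
            # Clamp into [0, bins - 1]; a score of exactly 1.0 lands in the last bin.
            # Out-of-range scores are caught separately by the probability-bounds check.
            counts[min(max(int(score * bins), 0), bins - 1)] += 1
        # Floor empty bins at eps so the log ratio is always defined.
        return [max(count / len(scores), eps) for count in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A common rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift. Treat those cut-offs as conventions, not as this project's configured warn or fail levels.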

BigQuery Score Monitoring

After loading neobank_ml.customer_scores_daily, render the warehouse-side score monitoring query:

uv run python -m src.cloud.bigquery_score_monitoring_plan `
  --score-date 2025-06-30 `
  --project neobank-growth-platform-ross `
  --dataset neobank_ml `
  --location EU `
  --min-rows 5000

The query checks scored-user volume, duplicate users, model-version count, targeting rate, vulnerable-customer review load, probability bounds, and score quantiles directly in BigQuery. Treat monitoring_status = fail as a release stop and monitoring_status = warn as a human-review trigger.

The demo GCP score monitoring query was exercised on 2026-05-31 for score date 2025-06-30 and returned monitoring_status = pass: 5,000 scored users, 5,000 unique users, 1 model version, 1,390 targeted users, 27.80% targeting rate, 191 vulnerable-review users, 3.82% vulnerable-review rate, and activation probabilities bounded from 0.0000 to 1.0000.
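The BigQuery checks can be mirrored in memory to test them locally. The sketch below computes the same summary figures from score rows and applies only hard-failure rules: low volume, duplicate users, more than one model version, and out-of-bounds probabilities. The row fields and `score_status` are hypothetical. The warn bands for targeting rate and review load are omitted because their thresholds are not documented here. The synthetic partition reproduces the reported 27.80% and 3.82% rates.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScoreRow:
    # Hypothetical column names for one row of customer_scores_daily.
    user_id: str
    model_version: str
    activation_probability: float
    targeted: bool
    vulnerable_review: bool


def score_summary(rows: Sequence[ScoreRow]) -> dict:
    """Compute volume, uniqueness, rate, and bounds figures for one score partition."""
    scored = len(rows)
    return {
        "scored_users": scored,
        "unique_users": len({row.user_id for row in rows}),
        "model_versions": len({row.model_version for row in rows}),
        "targeting_rate": sum(row.targeted for row in rows) / scored if scored else 0.0,
        "vulnerable_review_rate": (
            sum(row.vulnerable_review for row in rows) / scored if scored else 0.0
        ),
        "probabilities_in_bounds": all(
            0.0 <= row.activation_probability <= 1.0 for row in rows
        ),
    }


def score_status(summary: dict, min_rows: int = 5000) -> str:
    """Hard failures only: low volume, duplicate users, mixed versions, bad bounds."""
    if (
        summary["scored_users"] < min_rows
        or summary["unique_users"] != summary["scored_users"]
        or summary["model_versions"] != 1
        or not summary["probabilities_in_bounds"]
    ):
        return "fail"
    return "pass"


# Synthetic partition shaped like the 2025-06-30 demo run.
rows = [ScoreRow(f"u{i}", "v1", i / 4999, i < 1390, i < 191) for i in range(5000)]
summary = score_summary(rows)
print(f"{summary['targeting_rate']:.2%}", f"{summary['vulnerable_review_rate']:.2%}")  # 27.80% 3.82%
```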

Scheduled cloud monitoring runs the Cloud Run Jobs from Cloud Scheduler triggers. Deploy the runnable job image and the Cloud Run Jobs first:

uv run python -m src.cloud.cloud_run_job_deploy_plan `
  --project neobank-growth-platform-ross `
  --project-number 319492039091 `
  --region europe-west2 `
  --bucket neobank-growth-platform-ross-raw `
  --bq-location EU `
  --bq-ml-dataset neobank_ml `
  --bq-monitoring-dataset neobank_monitoring `
  --score-date 2025-06-30 `
  --users 5000 `
  --months 6

Omit --score-date for a rolling daily schedule; keep it for the reproducible portfolio demo run.
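A job that supports both modes needs a rule for the date it scores when --score-date is omitted. The sketch below is one possible rule. It assumes a rolling run scores the current calendar date in Europe/London, the schedule's time zone. Whether the real job scores today's or yesterday's partition is not stated here, and `resolve_score_date` is hypothetical.

```python
from datetime import date, datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def resolve_score_date(score_date: Optional[str], now: Optional[datetime] = None) -> date:
    """Return the pinned --score-date, or the current Europe/London date when omitted."""
    if score_date is not None:
        return date.fromisoformat(score_date)
    # Resolve "today" in the schedule's time zone, not the container's (often UTC).
    current = now if now is not None else datetime.now(tz=LONDON)
    return current.astimezone(LONDON).date()


print(resolve_score_date("2025-06-30"))  # pinned demo date: 2025-06-30
# 23:30 UTC on 29 June is 00:30 BST on 30 June, so the London date is 2025-06-30.
print(resolve_score_date(None, datetime(2025, 6, 29, 23, 30, tzinfo=timezone.utc)))
```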

Then render the scheduler triggers:

uv run python -m src.cloud.cloud_run_scheduler_plan `
  --project neobank-growth-platform-ross `
  --project-number 319492039091 `
  --run-region europe-west2 `
  --scheduler-region europe-west2 `
  --service-account-email neobank-scheduler@neobank-growth-platform-ross.iam.gserviceaccount.com

The default monitoring cadence runs scoring at 06:00 Europe/London and score monitoring at 06:30 Europe/London. Cloud Scheduler fires each trigger on its own clock rather than chaining jobs. To keep the monitoring job effectively dependent on scoring, schedule it after the scoring job's usual completion window and treat a missing or low-row score partition as a monitoring failure.
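The "missing or low-row partition" rule is what turns a late or failed scoring run into a visible monitoring failure. The sketch below shows that guard. It assumes row counts per score date have already been fetched, for example by counting rows per partition. The 5,000-row floor matches the --min-rows value used earlier. `partition_status` is hypothetical.

```python
from typing import Mapping


def partition_status(row_counts: Mapping[str, int], score_date: str,
                     min_rows: int = 5000) -> str:
    """Fail when the scoring job's partition for score_date is missing or under-filled."""
    rows = row_counts.get(score_date)
    if rows is None:
        return "fail"  # scoring has not written this partition (late or failed run)
    if rows < min_rows:
        return "fail"  # partial load: too few scored users
    return "pass"


print(partition_status({"2025-06-30": 5000}, "2025-06-30"))  # pass
```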

The demo GCP schedules were resumed and verified on 2026-05-31:

gcloud scheduler jobs list --location=europe-west2

Expected active schedules:

neobank-daily-activation-scoring  0 6 * * * (Europe/London)   ENABLED
neobank-daily-score-monitoring    30 6 * * * (Europe/London)  ENABLED
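The same verification can be scripted against JSON output instead of reading the table by eye, for example from `gcloud scheduler jobs list --location=europe-west2 --format=json`. The sketch below assumes each entry carries the Cloud Scheduler Job resource fields `name` (the full resource path), `schedule`, `timeZone`, and `state`. `check_schedules` and the `EXPECTED` mapping are hypothetical.

```python
import json

# Expected trigger for each scheduler job ID, taken from the table above.
EXPECTED = {
    "neobank-daily-activation-scoring": "0 6 * * *",
    "neobank-daily-score-monitoring": "30 6 * * *",
}


def check_schedules(jobs_json: str, time_zone: str = "Europe/London") -> list[str]:
    """Return a list of problems; an empty list means every expected job looks healthy."""
    # Job names are full resource paths; the job ID is the final path segment.
    jobs = {job["name"].rsplit("/", 1)[-1]: job for job in json.loads(jobs_json)}
    problems = []
    for job_id, schedule in EXPECTED.items():
        job = jobs.get(job_id)
        if job is None:
            problems.append(f"{job_id}: missing")
            continue
        if job.get("state") != "ENABLED":
            problems.append(f"{job_id}: state {job.get('state')}")
        if job.get("schedule") != schedule or job.get("timeZone") != time_zone:
            problems.append(f"{job_id}: schedule {job.get('schedule')} ({job.get('timeZone')})")
    return problems
```

In practice, the JSON string could come from `subprocess.run([...], capture_output=True, text=True, check=True).stdout`.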

Two enabled log-based alert policies watch the error logs of the Cloud Run Jobs and the private API service:

gcloud monitoring policies list --format="table(displayName,enabled)"

Expected alert policies:

Neobank Cloud Run job failure alert  True
Neobank API service failure alert    True

The scheduled job alert filter is:

resource.type="cloud_run_job"
resource.labels.job_name=~"neobank-(activation-score-load|score-monitoring)"
severity>=ERROR

The private API service alert filter is:

resource.type="cloud_run_revision"
resource.labels.service_name="neobank-api"
severity>=ERROR
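To reason about which log entries each policy catches, the two filters can be mirrored as a local predicate over LogEntry-shaped dicts. The sketch below assumes Cloud Logging's `=~` is an unanchored RE2 match, so `re.search` is used rather than a full match. Severities are compared using the Cloud Logging LogSeverity numeric levels. `matches_alert` is hypothetical and is not a substitute for testing the real filter in Logs Explorer.

```python
import re
from typing import Optional

# Mirrors the job alert regex; assumed unanchored, hence re.search below.
JOB_PATTERN = re.compile(r"neobank-(activation-score-load|score-monitoring)")
# Cloud Logging LogSeverity levels, so "severity>=ERROR" becomes a numeric comparison.
SEVERITY_RANK = {"DEFAULT": 0, "DEBUG": 100, "INFO": 200, "NOTICE": 300, "WARNING": 400,
                 "ERROR": 500, "CRITICAL": 600, "ALERT": 700, "EMERGENCY": 800}


def matches_alert(entry: dict) -> Optional[str]:
    """Return "job" or "api" for the policy an entry would trigger, else None."""
    resource = entry.get("resource", {})
    labels = resource.get("labels", {})
    if SEVERITY_RANK.get(entry.get("severity", "DEFAULT"), 0) < SEVERITY_RANK["ERROR"]:
        return None
    if resource.get("type") == "cloud_run_job" and JOB_PATTERN.search(labels.get("job_name", "")):
        return "job"
    if resource.get("type") == "cloud_run_revision" and labels.get("service_name") == "neobank-api":
        return "api"
    return None
```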

A project budget alert is configured as the cost-control guardrail for the demo GCP project. Budget alerts are notifications, not hard spend caps, so pause the schedules if an alert fires unexpectedly.
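Pausing is a per-job `gcloud scheduler jobs pause` call. The sketch below builds those commands for the two demo schedules. It defaults to a dry run that only prints them. The helper names are hypothetical, and the job IDs and location are the ones listed earlier in this section. `gcloud scheduler jobs resume` re-enables the schedules afterwards.

```python
import subprocess

SCHEDULER_JOBS = ["neobank-daily-activation-scoring", "neobank-daily-score-monitoring"]


def pause_commands(location: str = "europe-west2") -> list[list[str]]:
    """Build one `gcloud scheduler jobs pause` command per demo schedule."""
    return [["gcloud", "scheduler", "jobs", "pause", job, f"--location={location}"]
            for job in SCHEDULER_JOBS]


def pause_all(dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first error."""
    for command in pause_commands():
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


pause_all()  # dry run: prints the commands without calling gcloud
```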

Realised-Label Calibration Monitoring

After D7 outcomes have matured for a scored cohort, generate the calibration report by joining score extracts to realised activation labels:

uv run python -m src.monitoring.calibration_report `
  --score-path artifacts/scoring/activation/score_date=2025-06-30/customer_scores_daily.parquet `
  --db neobank.duckdb `
  --report-date 2025-07-07
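The report date above is the score date plus seven days, the earliest point at which every user in the cohort has a complete D7 outcome. The sketch below expresses that maturity rule. It assumes the window is counted in calendar days from the score date. `labels_matured` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: the D7 window is measured in calendar days from the score date.
D7_WINDOW = timedelta(days=7)


def labels_matured(score_date: date, report_date: date) -> bool:
    """True once report_date is at least seven days after score_date."""
    return report_date >= score_date + D7_WINDOW


print(labels_matured(date(2025, 6, 30), date(2025, 7, 7)))  # True
print(labels_matured(date(2025, 6, 30), date(2025, 7, 6)))  # False
```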

You can also provide a label extract with --label-path when labels are exported from a warehouse table. The file must contain user_id and activated_d7 columns.
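Matched label coverage is the share of scored users whose user_id finds an activated_d7 label. The sketch below shows the join and the coverage calculation in plain Python with dicts keyed by user_id. A real implementation would more likely join in DuckDB or pandas. `join_labels` is hypothetical.

```python
from typing import Dict, List, Tuple


def join_labels(scores: Dict[str, float],
                labels: Dict[str, int]) -> Tuple[List[Tuple[float, int]], float]:
    """Inner-join scores to realised activated_d7 labels on user_id.

    Returns the (probability, label) pairs and the matched label coverage,
    i.e. the share of scored users that found a label.
    """
    pairs = [(prob, labels[user_id]) for user_id, prob in scores.items() if user_id in labels]
    coverage = len(pairs) / len(scores) if scores else 0.0
    return pairs, coverage


pairs, coverage = join_labels({"u1": 0.9, "u2": 0.2, "u3": 0.5}, {"u1": 1, "u2": 0})
print(pairs, round(coverage, 3))  # [(0.9, 1), (0.2, 0)] 0.667
```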

Default output:

artifacts/monitoring/model_activation_calibration/report_date=2025-07-07/activation_calibration_monitoring.json
artifacts/monitoring/model_activation_calibration/report_date=2025-07-07/activation_calibration_monitoring.md

The calibration report checks matched label coverage, sample size, expected calibration error, Brier score, portfolio prediction bias, and the largest segment calibration gap across income segment, signup channel, and region. Run this after the prediction window closes; before then, use the score-distribution report as the early-warning signal.
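The calibration metrics can be defined over matched (probability, label) pairs. The sketch below uses the standard definitions: Brier score as the mean squared error, ECE as the bin-weighted mean gap between average prediction and observed rate over 10 equal-width bins, and bias as mean prediction minus observed rate. The segment gap is assumed here to be the largest absolute bias within any single segment value. The binning choice and the segment-gap definition are assumptions, not the project's documented method.

```python
from collections import defaultdict
from typing import Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               bins: int = 10) -> float:
    """Bin-weighted mean |average prediction - observed rate| over equal-width bins."""
    buckets = defaultdict(list)
    for p, y in zip(probs, labels):
        buckets[min(int(p * bins), bins - 1)].append((p, y))
    total = len(probs)
    return sum(
        len(items) / total
        * abs(sum(p for p, _ in items) / len(items) - sum(y for _, y in items) / len(items))
        for items in buckets.values()
    )


def prediction_bias(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean prediction minus observed rate; positive means the model over-predicts."""
    return sum(probs) / len(probs) - sum(labels) / len(labels)


def largest_segment_gap(probs: Sequence[float], labels: Sequence[int],
                        segments: Sequence[str]) -> Tuple[str, float]:
    """Segment value with the largest absolute calibration bias, and that bias."""
    grouped = defaultdict(lambda: ([], []))
    for p, y, segment in zip(probs, labels, segments):
        grouped[segment][0].append(p)
        grouped[segment][1].append(y)
    gaps = {segment: abs(prediction_bias(ps, ys)) for segment, (ps, ys) in grouped.items()}
    worst = max(gaps, key=gaps.get)
    return worst, gaps[worst]
```

To cover income segment, signup channel, and region, run `largest_segment_gap` once per column and report the worst result.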

Operational Policy

Use this lightweight release gate before refreshing public screenshots or ramping a synthetic rollout:

  1. Run dbt build.
  2. Generate batch activation scores.
  3. Run the monitoring snapshot.
  4. Run the score-distribution report against a recent reference extract.
  5. After D7 labels mature, run the calibration report.

Stop the release when any report returns fail. Review the affected check, regenerate upstream data only if the failure is caused by stale local artifacts, and document the decision before continuing. Treat warn as a human-review state: acceptable for a demo when explained, but not for an unattended rollout.
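The gate rules above can be written as a small decision function, which is useful when the steps are scripted. The sketch below covers three cases: any fail stops the release, and a warn proceeds only when the run is attended and a review note documents the decision. The decision labels and `release_decision` are hypothetical.

```python
from typing import Mapping, Optional


def release_decision(report_statuses: Mapping[str, str], unattended: bool,
                     review_note: Optional[str] = None) -> str:
    """Apply the release gate to the statuses returned by each report."""
    statuses = set(report_statuses.values())
    if "fail" in statuses:
        return "stop"
    if "warn" in statuses:
        # warn is acceptable for a demo only when a human has reviewed and explained it.
        if unattended or not review_note:
            return "hold_for_review"
        return "proceed_with_note"
    return "proceed"


print(release_decision({"snapshot": "pass", "drift": "warn"}, unattended=False,
                       review_note="first run: no reference extract"))  # proceed_with_note
```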