Thanks for contributing to fraud-detection-streaming.
- Target Python 3.11.
- Run `ruff check producers` before opening a PR.
- Prefer small, testable functions and deterministic behavior where practical.
- Keep generator payloads schema-compatible with SQL sources.
- Keep pipeline layered:
  - `models/sources/` - Kafka sources
  - `models/staging/` - Type casting and enrichment
  - `models/fraud_signals/` - Detection logic
  - `models/risk_aggregations/` - Risk scores and KPIs
  - `models/case_management/` - Analyst workflows
- Use `{{ ref('model_name') }}` for dependencies between models.
- Use `{{ config(materialized='materialized_view') }}` for most models.
- Avoid breaking downstream model contracts without coordinated updates.
- Document non-obvious thresholds and window choices inline.
- Prefer explicit versions over floating tags for production use.
- Preserve health checks and startup ordering.
- Keep changes idempotent (`seed-topics`, SQL init).
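
The guidance above on deterministic generators and schema-compatible payloads can be sketched as follows. This is a minimal illustration, not the project's actual generator: `TRANSACTION_SCHEMA`, `make_transaction`, and `schema_errors` are hypothetical names, and the field list is an assumption that would need to match the real SQL sources.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical schema (field name -> expected Python type).
# The authoritative field list lives in the SQL sources; adjust to match.
TRANSACTION_SCHEMA = {
    "transaction_id": str,
    "account_id": str,
    "amount": float,
    "event_time": str,
}

# Fixed base time so that output does not depend on the wall clock.
BASE_TIME = datetime(2024, 1, 1, tzinfo=timezone.utc)


def make_transaction(rng: random.Random, seq: int) -> dict:
    """Build one payload from an injected RNG and a sequence number."""
    return {
        "transaction_id": f"txn-{seq:08d}",
        "account_id": f"acct-{rng.randrange(1000):04d}",
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "event_time": (BASE_TIME + timedelta(seconds=seq)).isoformat(),
    }


def schema_errors(payload: dict, schema: dict) -> list[str]:
    """Return human-readable schema mismatches (empty list if none)."""
    errors = []
    for field, expected in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Extra fields can also break a downstream contract, so report them.
    for field in sorted(payload.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors
```

Passing the `random.Random` instance in, rather than using the module-level functions, lets a test seed it and compare two runs for equality. A check like `schema_errors` could run in `pytest producers/tests -q` so that a payload change which drifts from the SQL sources fails before review.
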
make up
make validate
make dbt-test
pytest producers/tests -q
ruff check producers

Useful:
make status
make logs-producer
make psql

- `main`: stable integration branch.
- Feature branches: `feat/<area>-<short-description>`
- Fix branches: `fix/<area>-<short-description>`
- Docs branches: `docs/<area>-<short-description>`

Rebase on latest main before merge when possible.
Each PR should include:
- Clear summary of behavior changes
- Risk assessment (what can break)
- Validation evidence (test output, `make validate`, or SQL checks)
- Rollback plan for operationally significant changes
- CI status green (`.github/workflows/ci.yml`)
If dashboards or user-visible flows changed, include screenshots.
Reviewers should verify:
- Behavior matches requirement and is backward-compatible or clearly documented.
- dbt models and joins preserve intended semantics.
- Model dependencies are correctly specified using `{{ ref() }}`.
- Event schema fields and types remain consistent end-to-end.
- Errors are logged with actionable context.
- Retry/backoff behavior is safe.
- Health checks still reflect real service readiness.
- No credentials or secrets in code/docs.
- No unsafe default exposure added (ports, unauthenticated admin APIs).
- Sensitive data handling is appropriate for simulation scope.
- New windows/joins are bounded and justified.
- High-cardinality operations are minimized.
- Producer thread behavior does not introduce unbounded growth.
- Tests cover new logic and edge cases.
- dbt tests defined for data quality guardrails.
- dbt model documentation added where appropriate.
- README/docs updated when behavior or operations change.