19Vermouth/racecast-telemetry

F1 Real-Time Telemetry Analytics Platform

A production-grade streaming data platform for processing Formula 1 telemetry in real time. Built with Kafka, Spark Structured Streaming, Delta Lake, dbt, and Great Expectations.

Architecture Overview

┌─────────────────┐     ┌──────────────────┐     ┌───────────────────────┐
│  Telemetry      │     │  Spark Streaming │     │   Delta Lake          │
│  Sources        │────▶│  Processing      │────▶│   Storage             │
│  (50 cars)      │     │  (Enrich/Aggr)   │     │   (Bronze/Silver/Gold)│
└─────────────────┘     └──────────────────┘     └───────────────────────┘
                              │                        │
                              ▼                        ▼
                       ┌──────────────┐        ┌──────────────┐
                       │  Kafka       │        │  dbt         │
                       │  Re-stream   │        │  Gold        │
                       │  (Enriched)  │        │  Models      │
                       └──────────────┘        └──────────────┘
                                    │            │
                                    ▼            ▼
                             ┌─────────────┐    ┌─────────────┐
                             │  Prometheus │    │  Grafana    │
                             │  + Grafana  │    │  Dashboards │
                             └─────────────┘    └─────────────┘

Key Features

  • Real-time Processing: Streaming pipeline designed for sub-second latency at 10K+ events/sec (actual latency and throughput depend on the trigger settings under Key Parameters)
  • Medallion Architecture: Bronze (raw) → Silver (enriched) → Gold (aggregated) layers
  • Data Quality: Great Expectations validation with quarantine policy for failed records
  • Schema Governance: Protobuf + JSON Schema with schema registry integration
  • Exactly-Once Semantics: Checkpoint-based recovery with idempotent writes
  • Observability: Prometheus metrics + Grafana dashboards for monitoring
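
The exactly-once bullet above combines replay from a checkpoint with writes that are safe to repeat. The following is a minimal sketch of that idea in plain Python, not the project's Spark job: `IdempotentSink` and its fields are hypothetical names, and `last_committed_batch` stands in for whatever the real checkpoint stores.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Toy sink that ignores micro-batches it has already committed."""

    last_committed_batch: int = -1            # stands in for the checkpoint
    rows: list = field(default_factory=list)  # stands in for the target table

    def write_batch(self, batch_id: int, batch: list) -> bool:
        # A replayed batch (id <= last committed) is skipped, so a retry
        # after a crash cannot duplicate rows.
        if batch_id <= self.last_committed_batch:
            return False
        self.rows.extend(batch)
        self.last_committed_batch = batch_id
        return True


sink = IdempotentSink()
sink.write_batch(0, [{"car_number": 44, "speed_kph": 312.0}])
sink.write_batch(0, [{"car_number": 44, "speed_kph": 312.0}])  # replay after restart
print(len(sink.rows))  # 1
```

The second call is a no-op, which is the property that lets at-least-once replay behave like exactly-once delivery at the sink.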

Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Messaging | Apache Kafka (Confluent Platform 7.5) | Event streaming backbone |
| Processing | Spark Structured Streaming 3.4 | Real-time stream processing |
| Storage | Delta Lake 2.4 | ACID storage layer |
| Analytics | dbt 1.7 | Transformation and gold layer |
| Quality | Great Expectations 0.18 | Data validation |
| Orchestration | Docker Compose | Container management |
| Monitoring | Prometheus + Grafana | Observability |

Quick Start

Prerequisites

  • Docker 24.0+
  • Docker Compose 2.20+
  • 16GB RAM available

Start the Platform

# Start all services
make up

# Wait for services to be healthy
make wait

# Initialize schemas and seed test data
make init

# View logs
make logs

# Stop services
make down

Access Points

| Service | URL | Credentials |
|---|---|---|
| Spark Master | http://localhost:8080 | - |
| Grafana | http://localhost:3000 | admin/admin |
| Prometheus | http://localhost:9090 | - |
| MinIO Console | http://localhost:9001 | minioadmin/minioadmin |
| Jupyter Notebook | http://localhost:8888 | - |
| Schema Registry | http://localhost:8081 | - |

Project Structure

f1-telemetry/
├── proto/                    # Protocol Buffer schemas
│   └── telemetry_event.proto
├── schemas/                  # JSON Schemas
│   └── telemetry_event.json
├── spark/                    # Spark streaming jobs
│   └── telemetry_stream.py
├── sql/                      # Delta Lake DDL
│   └── delta_ddl.sql
├── docker/                   # Docker Compose config
│   └── docker-compose.yml
├── dbt/                      # dbt analytics engineering
│   ├── models/
│   │   └── silver/
│   ├── profiles.yml
│   └── dbt_project.yml
├── expectations/             # Great Expectations suites
│   └── telemetry_suite.py
├── monitoring/               # Observability configs
│   ├── prometheus.yml
│   └── grafana/
├── data/                     # Local Delta Lake storage
├── tests/                    # Test suites
├── Makefile                  # Orchestration
├── .env.example              # Environment template
└── README.md

Data Model

Event Types

| Event Type | Frequency | Description |
|---|---|---|
| position | 10Hz | GPS coordinates, speed, heading |
| dynamics | 20Hz | Throttle, brake, gear, G-forces |
| power_unit | 50Hz | Engine RPM, ERS, temperatures |
| tire | 5Hz/corner | Pressure, temperature, wear |
| aerodynamics | 10Hz | DRS, wing angles, brake balance |
| system | On-change | Errors, warnings, flags |
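
As a rough sizing aid, the nominal rates in the Event Types table can be summed into an expected event volume. This is a back-of-the-envelope sketch, not a measurement: it assumes tire events arrive at 5 Hz for each of 4 corners, ignores on-change `system` events, and takes the car count from the architecture diagram (50).

```python
# Nominal per-car rates from the Event Types table, in events per second.
# Assumption: tire sensors report 5 Hz on each of 4 corners.
RATES_HZ = {
    "position": 10,
    "dynamics": 20,
    "power_unit": 50,
    "tire": 5 * 4,
    "aerodynamics": 10,
}


def events_per_second(cars: int) -> int:
    """Steady-state event rate for the given number of cars."""
    return cars * sum(RATES_HZ.values())


print(events_per_second(1))   # 110
print(events_per_second(50))  # 5500
```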

Medallion Layers

Bronze (bronze.telemetry_events)

  • Raw events as received from Kafka
  • JSON payload for schema flexibility
  • Partitioned by: session_date, event_type

Silver (silver.telemetry_enriched)

  • Deduplicated, validated, flattened
  • Denormalized session context
  • Partitioned by: session_date, car_number
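
The Silver steps listed above (deduplicate, validate, flatten, denormalize session context) could look roughly like this in plain Python. It is an illustrative sketch rather than the logic in `spark/telemetry_stream.py`: the field names `event_id`, `car_number`, `data`, and `payload` are assumptions about the event shape.

```python
import json


def to_silver(bronze_rows, session):
    """Deduplicate, validate, and flatten bronze rows (hypothetical fields)."""
    seen = set()
    silver = []
    for row in bronze_rows:
        payload = json.loads(row["payload"])
        event_id = payload.get("event_id")
        # Validation: drop rows lacking the keys the silver layer relies on.
        if event_id is None or "car_number" not in payload:
            continue
        # Deduplication: keep only the first occurrence of each event_id.
        if event_id in seen:
            continue
        seen.add(event_id)
        flat = {
            "event_id": event_id,
            "car_number": payload["car_number"],
            "event_type": row["event_type"],
        }
        flat.update(payload.get("data", {}))  # flatten nested measurements
        flat.update(session)                  # denormalize session context
        silver.append(flat)
    return silver


position = json.dumps({"event_id": "a1", "car_number": 44, "data": {"speed_kph": 301.5}})
bronze = [
    {"event_type": "position", "payload": position},
    {"event_type": "position", "payload": position},  # duplicate delivery
    {"event_type": "tire", "payload": json.dumps(
        {"event_id": "b2", "car_number": 16, "data": {"pressure_psi": 22.4}})},
    {"event_type": "tire", "payload": json.dumps({"car_number": 16})},  # no event_id
]
silver = to_silver(bronze, {"session_date": "2024-05-26"})
print([r["event_id"] for r in silver])  # ['a1', 'b2']
```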

Gold (gold.lap_metrics, gold.session_metrics)

  • Aggregated metrics per lap/session
  • Materialized views for downstream consumers
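
To make "aggregated metrics per lap" concrete, here is a small sketch of a per-lap rollup. The column names (`lap`, `speed_kph`) and the chosen metrics are assumptions for illustration; the actual gold models are defined in dbt and may compute different measures.

```python
from collections import defaultdict


def lap_metrics(silver_rows):
    """Group speed samples by (car_number, lap) and summarize each group."""
    grouped = defaultdict(list)
    for row in silver_rows:
        grouped[(row["car_number"], row["lap"])].append(row["speed_kph"])
    return {
        key: {
            "max_speed_kph": max(speeds),
            "avg_speed_kph": sum(speeds) / len(speeds),
            "samples": len(speeds),
        }
        for key, speeds in grouped.items()
    }


rows = [
    {"car_number": 44, "lap": 1, "speed_kph": 300.0},
    {"car_number": 44, "lap": 1, "speed_kph": 310.0},
    {"car_number": 16, "lap": 1, "speed_kph": 295.0},
]
metrics = lap_metrics(rows)
print(metrics[(44, 1)]["max_speed_kph"])  # 310.0
```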

Configuration

Environment Variables

Copy .env.example to .env and adjust:

# Kafka
KAFKA_BOOTSTRAP_SERVERS=kafka:9092

# Spark
SPARK_CHECKPOINT_PATH=/data/checkpoints
SPARK_SHUFFLE_PARTITIONS=20

# Delta Lake
DELTA_STORAGE_PATH=/data

# dbt
DBT_TARGET=prod
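
One way a job might read these variables is sketched below. `Settings` and `load_settings` are hypothetical names, not part of the repository, and the fallback values simply mirror the listing above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    kafka_bootstrap_servers: str
    checkpoint_path: str
    shuffle_partitions: int
    delta_storage_path: str
    dbt_target: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        kafka_bootstrap_servers=env.get("KAFKA_BOOTSTRAP_SERVERS", "kafka:9092"),
        checkpoint_path=env.get("SPARK_CHECKPOINT_PATH", "/data/checkpoints"),
        shuffle_partitions=int(env.get("SPARK_SHUFFLE_PARTITIONS", "20")),
        delta_storage_path=env.get("DELTA_STORAGE_PATH", "/data"),
        dbt_target=env.get("DBT_TARGET", "prod"),
    )


settings = load_settings({"SPARK_SHUFFLE_PARTITIONS": "40"})
print(settings.shuffle_partitions)  # 40
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.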

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| maxOffsetsPerTrigger | 10000 | Max events per micro-batch |
| triggerInterval | 10s | Micro-batch frequency |
| checkpointInterval | 30s | Checkpoint save frequency |
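
These two defaults together bound throughput: if `maxOffsetsPerTrigger` behaves like the Spark Kafka source option of the same name (a total cap per trigger across all partitions, which is an assumption about how this project wires it), the ceiling is offsets divided by interval. The sketch below works through that arithmetic; with the defaults it comes to 1,000 events/sec, so reaching the 10K+ events/sec figure under Key Features would need a larger cap or a shorter interval.

```python
def max_events_per_second(max_offsets_per_trigger: int, trigger_interval_s: float) -> float:
    """Upper bound on sustained throughput for a rate-limited micro-batch source."""
    return max_offsets_per_trigger / trigger_interval_s


def required_offsets(target_events_per_second: float, trigger_interval_s: float) -> int:
    """Smallest per-trigger cap that can sustain the target rate."""
    return int(target_events_per_second * trigger_interval_s)


print(max_events_per_second(10_000, 10))  # 1000.0
print(required_offsets(10_000, 10))       # 100000
```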

Running the Pipeline

Start Spark Streaming Job

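# Inside Spark container
# The Kafka source needs the spark-sql-kafka connector. Drop it from --packages
# if the image already bundles it (3.4.1 is an assumed Spark 3.4.x patch version).
spark-submit \
  --master spark://spark-master:7077 \
  --packages io.delta:delta-core_2.12:2.4.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1 \
  /app/telemetry_stream.py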

Run dbt Transformations

# In dbt container
dbt deps
dbt run --select silver
dbt run --select gold

Validate Data Quality

# Run Great Expectations
python -m great_expectations checkpoint run telemetry_checkpoint

Monitoring

Key Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| streaming_lag | Kafka-to-processing lag | > 5 seconds |
| processing_latency | Event processing time | > 1 second |
| validation_failure_rate | Failed validation % | > 1% |
| checkpoint_age | Time since last checkpoint | > 60 seconds |
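
The thresholds above translate directly into a simple check. In the real stack this would normally be expressed as Prometheus alerting rules; the Python below is only a sketch of the comparison logic, and `firing_alerts` is a hypothetical helper.

```python
# Alert thresholds from the Key Metrics table.
THRESHOLDS = {
    "streaming_lag": 5.0,            # seconds
    "processing_latency": 1.0,       # seconds
    "validation_failure_rate": 1.0,  # percent
    "checkpoint_age": 60.0,          # seconds
}


def firing_alerts(sample: dict) -> list:
    """Return the names of metrics whose value strictly exceeds its threshold."""
    return sorted(
        name for name, limit in THRESHOLDS.items() if sample.get(name, 0.0) > limit
    )


sample = {
    "streaming_lag": 7.2,
    "processing_latency": 0.4,
    "validation_failure_rate": 1.0,  # exactly at the limit, so not firing
    "checkpoint_age": 90,
}
print(firing_alerts(sample))  # ['checkpoint_age', 'streaming_lag']
```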

Dashboards

  • Stream Health: Processing rate, lag, errors
  • Data Quality: Validation pass rates, quarantine metrics
  • System Resources: CPU, memory, disk usage

Development

Run Tests

make test

Code Linting

make lint

Clean Start

make clean   # Remove all data and containers
make reset   # Rebuild containers

Design Decisions

Why Kafka over Kinesis?

  • Better replay capabilities
  • Exactly-once semantics with transactions
  • Larger ecosystem and tooling

Why Spark over Flink?

  • Unified batch/streaming API
  • Better Delta Lake integration
  • More mature ecosystem

Why Delta over Iceberg?

  • Native Spark integration
  • ACID transactions
  • Time travel capabilities

Partitioning Strategy

  • Bronze: session_date/event_type - optimize for raw event access
  • Silver: session_date/car_number - optimize for car-centric queries
  • Gold: session_date - optimize for aggregate queries
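
The partition columns per layer determine where a row lands on disk. The sketch below builds Hive-style partition paths (`column=value` directories, which is how Delta Lake lays out partitioned tables by default); the `root/layer/table` directory scheme and the `partition_path` helper are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Partition columns per layer, as listed in the Partitioning Strategy.
PARTITION_COLUMNS = {
    "bronze": ("session_date", "event_type"),
    "silver": ("session_date", "car_number"),
    "gold": ("session_date",),
}


def partition_path(root: str, layer: str, table: str, row: dict) -> str:
    """Return the partition directory a row would be written to."""
    parts = [f"{col}={row[col]}" for col in PARTITION_COLUMNS[layer]]
    return str(PurePosixPath(root, layer, table, *parts))


row = {"session_date": "2024-05-26", "event_type": "tire", "car_number": 16}
print(partition_path("/data", "bronze", "telemetry_events", row))
# /data/bronze/telemetry_events/session_date=2024-05-26/event_type=tire
```

A car-centric query on Silver only has to scan the `car_number=16` directories for the sessions it cares about, which is the point of the Silver layout.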

Production Considerations

Scaling

  • Horizontal scaling: Add Spark executors
  • Partition scaling: Increase Kafka partitions
  • Storage scaling: Tier to S3/ADLS

Reliability

  • Checkpoint every 30 seconds
  • Minimum 3x Kafka replication
  • Dead letter queue for failed records
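
A dead letter queue keeps bad records out of the main flow without losing them. The following sketch shows the routing idea in plain Python; the validator names and the 0 to 400 km/h speed range are hypothetical, and the project's actual checks live in the Great Expectations suite.

```python
def route(records, validators):
    """Split records into valid ones and dead-letter entries with failure reasons."""
    valid, dead_letter = [], []
    for rec in records:
        failures = [name for name, check in validators.items() if not check(rec)]
        if failures:
            dead_letter.append({"record": rec, "failed_checks": failures})
        else:
            valid.append(rec)
    return valid, dead_letter


# Hypothetical checks, for illustration only.
VALIDATORS = {
    "has_car_number": lambda r: isinstance(r.get("car_number"), int),
    "speed_in_range": lambda r: isinstance(r.get("speed_kph"), (int, float))
    and 0 <= r["speed_kph"] <= 400,
}

records = [
    {"car_number": 44, "speed_kph": 312.0},
    {"car_number": None, "speed_kph": 950.0},
]
valid, dead_letter = route(records, VALIDATORS)
print(len(valid), dead_letter[0]["failed_checks"])
# 1 ['has_car_number', 'speed_in_range']
```

Recording which checks failed alongside the original record makes it possible to replay quarantined data once the upstream issue is fixed.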

Security

  • Enable Kafka SASL authentication
  • Encrypt Delta Lake at rest
  • RBAC for dbt models

License

MIT License

Contributing

Contributions welcome. Please open an issue to discuss changes.

About

Production-grade racing telemetry platform using Medallion Architecture. Built with Spark Streaming, Kafka, and Delta Lake, containerized via Docker. Enables real-time analytics and contract-driven pipelines for ML features.
