GitLabViz can render a heatmap and leaderboard of your test suite's flake history when a compatible JSON bundle is published as a GitLab Generic Package. This document is the public contract: it describes the bundle's schema, how the viewer fetches it, and how to write a producer that emits a compatible bundle.
The bundle is the only contract. Producers are free to compute it however they like; the viewer is free to evolve its UI; both sides agree only on the JSON shape defined in flake-history-bundle.v1.schema.json.
When tests run across many machines and many pipelines, individual failures are easy to dismiss as "just flaky" until somebody collates the data. This integration:
- Reads a single bundle JSON describing every recorded run and every test's per-context pass/fail history.
- Renders a heatmap (tests × runs) and a leaderboard (most-flaky tests first).
- Provides a kiosk variant for studio dashboards: "currently broken" + "top flaky."
The viewer makes no assumptions about where the data came from. Any team running any CI on any platform can produce the bundle; the only requirement is that the published artifact match the schema and live at a Generic Package URL the viewer can reach.
In GitLabViz, open Settings → Flake history source and fill in:
| Field | Description |
|---|---|
| GitLab project ID or path | The numeric project ID, or group/project. The same project that hosts the bundle's Generic Package. |
| Package name | Defaults to flake-history. Override if your bundler publishes under a different name. |
| Refresh interval | Minutes between automatic refreshes. 0 = manual only. |
The GitLab URL and personal access token from the existing GitLab settings panel are reused; the token needs at least read_api scope.
When no source is configured, the Flake History view shows a "Not configured" empty state with a link back here. GitLabViz never has a default project baked in.
Bundles are versioned: schema_version: 1 is the current shape. Additive changes (new optional fields) stay in v1; breaking changes bump to v2 and the viewer will refuse to read v2 until it is updated. Always declare the version you produce — bundles without schema_version are rejected.
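The version rule above can be illustrated with a minimal sketch of a version gate. This is not the viewer's actual code; `SUPPORTED_SCHEMA_VERSIONS` and `check_schema_version` are hypothetical names chosen for the example.

```python
# Assumption: a reader that only understands schema_version 1.
SUPPORTED_SCHEMA_VERSIONS = {1}


def check_schema_version(bundle: dict) -> int:
    """Return the bundle's schema_version, or raise ValueError if it is unusable."""
    if "schema_version" not in bundle:
        raise ValueError("bundle rejected: schema_version is missing")
    version = bundle["schema_version"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError(f"bundle rejected: schema_version must be an integer, got {version!r}")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"bundle rejected: unsupported schema_version {version}")
    return version
```

A producer could run the same check on its own output before publishing, so that a missing version is caught in CI rather than in the viewer.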
Full JSON Schema: flake-history-bundle.v1.schema.json. Annotated example: ../fixtures/flake-history-bundle-sample.json.
```json
{
  "schema_version": 1,
  "generated_at": "2026-05-21T15:00:00Z",
  "generator": { "name": "your-bundler", "version": "1.0.0" },
  "window": {
    "retention_days": 14,
    "oldest_run_uploaded_at": "...",
    "newest_run_uploaded_at": "..."
  },
  "runs": [ /* one entry per test-suite run */ ],
  "tests": [ /* one entry per distinct test */ ]
}
```

Each entry in runs[] describes one distinct execution of the test suite. Test results reference these entries by run_id.
| Field | Type | Notes |
|---|---|---|
| run_id | string, required | Producer-stable identifier. Referenced from test cells. |
| suite | string | Suite label, e.g. "smoketest". |
| gfx_api, quality, custom_profile_hash | string | Optional facet labels used to bucket cells. |
| source_revision, source_branch | string | VCS state under test. Strings, so SVN revisions and git SHAs both fit. |
| pipeline_id | integer | The CI pipeline that contributed this run. |
| pipeline_url | string (uri) | Optional. When present, the viewer renders cells as clickable links to this URL. Omit it if you don't have GitLab pipeline URLs; the viewer will show a non-clickable cell. |
| artifacts_url | string (uri) | Optional. Direct download URL for the run's CI artifacts. When present, clicking a heatmap cell downloads the artifacts instead of opening the pipeline. Omit it or set it to null once artifacts have expired. The viewer independently dims cells whose artifacts it estimates have expired (based on the run's age against a per-suite retention window), regardless of this field. |
| started_at, finished_at | ISO 8601 UTC | finished_at may be null for an interrupted run. |
| duration_seconds | integer ≥ 0 | Optional; derivable from start/finish if you'd rather not store it. |
| runner_id | string | Whatever identifies the host. |
| status | enum | "complete", "interrupted", or "unknown". Drives the heatmap's "interrupted" rendering (hollow red border) for cells from interrupted runs. |
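Since duration_seconds is described as derivable from the start and finish timestamps, here is a hedged sketch of how a producer or consumer might derive it. The helper names `parse_utc` and `derive_duration_seconds` are hypothetical, and returning None for a negative interval is an assumption of this example, not something the schema specifies.

```python
from datetime import datetime
from typing import Optional


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-05-21T15:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def derive_duration_seconds(started_at: str, finished_at: Optional[str]) -> Optional[int]:
    """Return whole seconds between start and finish, or None if unknown."""
    if finished_at is None:
        # Interrupted runs may have no finish time.
        return None
    seconds = int((parse_utc(finished_at) - parse_utc(started_at)).total_seconds())
    # Assumption: a negative interval (clock skew) is treated as unknown
    # rather than clamped, because the field must be >= 0.
    return seconds if seconds >= 0 else None
```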
Each entry in tests[] is one test, with per-context aggregates and overall stats.
| Field | Type | Notes |
|---|---|---|
| test_id | string, required | Opaque primary key. Conventionally <suite>::<module>::<test_name>. |
| name, module, suite | string | Display labels. |
| results_by_context | array | One entry per (gfx_api, quality, custom_profile_hash) under which the test was observed. |
| overall | object | Aggregated counts, is_flaky, and flake_classification. |
Within results_by_context[]:
| Field | Type | Notes |
|---|---|---|
| passing_run_ids, failing_run_ids | string[] | References into runs[].run_id. The viewer joins on these to render heatmap cells. |
| pass_count, fail_count, pass_rate | numbers | Derived; the viewer trusts the producer's values rather than recomputing. |
| last_status, last_run_id | enum, string | The most recent outcome in this context. |
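The join between passing_run_ids / failing_run_ids and runs[].run_id can be sketched as follows. This is an illustration of the data relationship, not the viewer's implementation; `heatmap_cells` and the "pass" / "fail" labels are hypothetical, and skipping dangling run IDs is an assumption of this example.

```python
def heatmap_cells(bundle: dict) -> dict:
    """Map (test_id, run_id) to "pass" or "fail" for run IDs present in runs[]."""
    known_runs = {run["run_id"] for run in bundle.get("runs", [])}
    cells = {}
    for test in bundle.get("tests", []):
        for context in test.get("results_by_context", []):
            # Failing IDs are processed second, so a run listed in both
            # arrays ends up as "fail" in this sketch.
            for outcome, key in (("pass", "passing_run_ids"), ("fail", "failing_run_ids")):
                for run_id in context.get(key, []):
                    if run_id in known_runs:  # ignore references to unknown runs
                        cells[(test["test_id"], run_id)] = outcome
    return cells
```

A producer can use the same walk as a self-check: any run ID that is dropped here points at a run missing from runs[].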
The viewer treats flake_classification as an opaque label and uses it only for grouping and colour. Producers pick whatever thresholds suit their data. The reference bundler uses:
| Class | Condition |
|---|---|
| stable | No failures, or pass_rate ≥ 0.95. |
| intermittent | 0.5 ≤ pass_rate < 0.95. |
| actively_flaky | 0 < pass_rate < 0.5. |
| broken | No passes recorded (pass_rate == 0 and fail_count > 0). |
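The thresholds in the table translate into a small function. This is a sketch written from the table, not the reference bundler's source; the name `classify` is hypothetical, and treating a test with no recorded results as "stable" (it has no failures) is an assumption.

```python
def classify(pass_count: int, fail_count: int) -> str:
    """Classify a test using the reference thresholds from the table above."""
    if fail_count == 0:
        # No failures. Assumption: this also covers a test with no results at all.
        return "stable"
    pass_rate = pass_count / (pass_count + fail_count)
    if pass_rate >= 0.95:
        return "stable"
    if pass_rate >= 0.5:
        return "intermittent"
    if pass_rate > 0:
        return "actively_flaky"
    return "broken"  # pass_rate == 0 and fail_count > 0
```

For example, a test with 1 pass and 3 failures has a pass_rate of 0.25 and falls into actively_flaky.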
Anything that can PUT a file to a GitLab Generic Package can be a producer. The smallest possible producer is a curl invocation from CI:
```shell
curl --upload-file bundle.json \
  --header "JOB-TOKEN: $CI_JOB_TOKEN" \
  "$CI_API_V4_URL/projects/$CI_PROJECT_ID/packages/generic/flake-history/$(date +%s)/bundle.json"
```

That's it. The viewer fetches the newest version under the configured package name and renders it.
Most teams will produce the bundle by walking their own per-runner or per-job test results and folding them into the runs[] + tests[] shape. Practical advice:
- Use Unix-timestamp versions (or any monotonically increasing string). The viewer's "newest" lookup sorts by created_at, so collisions are tolerated, but monotonic versions make manual debugging easier.
- Keep at least a week of bundles. Prune older versions in the same job.
- Don't fail the build on a bundling error. Flake history is observability, not a gate.
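The versioning and retention advice above can be sketched as two helpers. Both are illustrative and hypothetical: `newest_version` assumes a list of package records carrying `name`, `version` and `created_at` fields (the shape returned when listing a project's packages in GitLab), and `versions_to_prune` assumes Unix-timestamp version strings. Always keeping the newest version, even when it is older than the cutoff, is an assumption of this example.

```python
from datetime import datetime
from typing import List, Optional


def newest_version(packages: List[dict], package_name: str = "flake-history") -> Optional[str]:
    """Pick the version of the most recently created package with the given name."""
    candidates = [p for p in packages if p.get("name") == package_name]
    if not candidates:
        return None
    newest = max(
        candidates,
        # Normalise a trailing "Z" so older Python versions can parse it.
        key=lambda p: datetime.fromisoformat(p["created_at"].replace("Z", "+00:00")),
    )
    return newest["version"]


def versions_to_prune(versions: List[str], now_epoch: int, keep_days: int = 7) -> List[str]:
    """Return Unix-timestamp versions older than keep_days, oldest first."""
    numeric = [v for v in versions if v.isdigit()]  # leave other version strings alone
    if not numeric:
        return []
    cutoff = now_epoch - keep_days * 86400
    newest = max(numeric, key=int)
    # Never prune the newest bundle, so the viewer always has something to load.
    return sorted((v for v in numeric if int(v) < cutoff and v != newest), key=int)
```

In line with the last bullet, a CI job that calls helpers like these would typically log and swallow any error rather than fail the pipeline.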
Can I use a non-GitLab source? Today the viewer's fetch path assumes GitLab's Generic Package API. A "custom URL source" mode is on the roadmap. The schema is portable — your producer can already emit bundles to any storage, you just can't yet point the viewer at S3 / a static host.
My producer doesn't have pipeline URLs. Omit pipeline_url. The viewer renders a non-clickable cell.
My runs don't have quality or gfx_api. Set them to null. The viewer's facet filters collapse to "all" for those bundles.
How big can the bundle be? No fixed cap, but the viewer loads it entirely into memory. A year of nightly runs is typically well under 5 MB JSON. If you push past that, consider stricter retention rather than streaming.
GitLabViz is released under the licence in the repository root. The schema and sample data are part of the same repo and same licence. If you change the schema — even an additive change — please update flake-history-bundle.v1.schema.json, the sample bundle, and this document in the same commit.