[receiver/oracledb] Add redo log metrics by rluidash · Pull Request #49062 · open-telemetry/opentelemetry-collector-contrib

rluidash · 2026-06-15T07:18:16Z

Description

The Oracle DB receiver scrapes v$sysstat with SELECT * FROM v$sysstat, which returns all system-statistics rows. However, the scraper only processes a subset of those rows in its switch block and silently discards the rest, including every redo-log statistic. This PR surfaces 8 discarded redo rows as 6 new opt-in metrics under the oracledb.redo.* namespace, giving operators visibility into redo write latency, redo throughput, and redo-log sizing pressure. Redo is the foundation of Oracle's durability and recovery story (the equivalent of PostgreSQL's WAL).

Because the data is already being fetched, the implementation is purely additive Go code:

No new SQL queries
No new database round-trips
Zero additional load on the Oracle instance

Related statistics are consolidated under a single metric name with an OTel attribute rather than mapping each flat statistic to its own metric name, following the receiver's existing attributed-metric convention. The redo-pipeline timing statistics share one metric (oracledb.redo.time) differentiated by the oracledb.redo.kind attribute, and the write-direction counters reuse the existing disk.io.direction attribute (introduced for oracledb.physical_io.*).
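To make the consolidation concrete, here is a minimal sketch (not the receiver's actual code) of how the eight v$sysstat names could map onto the six metric names and their attribute values. The redoPoint type, the redoStats table, and the distinctMetrics helper are hypothetical names introduced for illustration; only the statistic names, metric names, and attribute keys and values come from this description.

```go
package main

import "fmt"

// redoPoint describes where one v$sysstat row lands: which metric it feeds
// and which attribute value (if any) distinguishes the data point.
// Hypothetical type, for illustration only.
type redoPoint struct {
	Metric    string
	AttrKey   string // empty when the metric has no attributes
	AttrValue string
}

// redoStats maps a v$sysstat NAME to its consolidated metric.
var redoStats = map[string]redoPoint{
	"redo write time":                {"oracledb.redo.time", "oracledb.redo.kind", "write"},
	"redo log space wait time":       {"oracledb.redo.time", "oracledb.redo.kind", "log_space_wait"},
	"redo synch time":                {"oracledb.redo.time", "oracledb.redo.kind", "synch"},
	"redo size":                      {"oracledb.redo.size", "", ""},
	"redo writes":                    {"oracledb.redo.operations", "disk.io.direction", "write"},
	"redo blocks written":            {"oracledb.redo.blocks", "disk.io.direction", "write"},
	"redo buffer allocation retries": {"oracledb.redo.buffer_allocation.retries", "", ""},
	"redo log space requests":        {"oracledb.redo.log_space.requests", "", ""},
}

// distinctMetrics counts how many metric names the mapped statistics collapse into.
func distinctMetrics(m map[string]redoPoint) int {
	seen := make(map[string]struct{})
	for _, p := range m {
		seen[p.Metric] = struct{}{}
	}
	return len(seen)
}

func main() {
	fmt.Printf("%d statistics -> %d metrics\n", len(redoStats), distinctMetrics(redoStats))
}
```

Keeping the mapping as data makes the 8-to-6 relationship easy to check, and shows that adding a future read-direction statistic would only need a new table entry with a different disk.io.direction value.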

This PR adds 6 new opt-in metrics (disabled by default, stability: development):

oracledb.redo.time -- Cumulative time, in seconds, spent in each phase of the redo pipeline. Attribute: oracledb.redo.kind (write|log_space_wait|synch). Covers v$sysstat: redo write time, redo log space wait time, redo synch time (the raw centisecond value is divided by 100 in the scraper). High write/synch time directly raises commit latency.
oracledb.redo.size -- Total amount of redo generated, in bytes. No attributes. Covers v$sysstat: redo size. The canonical redo write-throughput baseline.
oracledb.redo.operations -- Number of redo I/O operations by the log writer (LGWR). Attribute: disk.io.direction (write). Covers v$sysstat: redo writes.
oracledb.redo.blocks -- Number of redo blocks moved between the redo log and storage. Attribute: disk.io.direction (write). Covers v$sysstat: redo blocks written.
oracledb.redo.buffer_allocation.retries -- Number of times a process waited and retried to allocate space in the redo buffer. No attributes. Covers v$sysstat: redo buffer allocation retries. A rising value indicates redo buffer or log writer contention.
oracledb.redo.log_space.requests -- Number of times a process had to wait for space in the online redo log because the active log file was full. No attributes. Covers v$sysstat: redo log space requests.

All six metrics are emitted as Sum with aggregation_temporality: cumulative and monotonic: true. oracledb.redo.time has value type double; the other five have value type int. Per-metric units: s (oracledb.redo.time); By (oracledb.redo.size); {operations} (oracledb.redo.operations); {blocks} (oracledb.redo.blocks); {retries} (oracledb.redo.buffer_allocation.retries); {requests} (oracledb.redo.log_space.requests).

oracledb.redo.time reports seconds (s) rather than the raw cs (centisecond) units that Oracle exposes via v$sysstat. The scraper divides the raw value by 100 before recording, matching the conversion already in place for the existing oracledb.cpu_time metric. This keeps the receiver's time-unit story consistent and avoids forcing downstream consumers to convert.
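As a minimal sketch of that conversion: the helper below parses a raw value and divides it by 100. The function name centisecondsToSeconds is hypothetical, and the sketch assumes the raw VALUE reaches the scraper as a decimal string; the receiver's real code may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// centisecondsToSeconds parses a raw v$sysstat VALUE (assumed here to arrive
// as a decimal string) and converts it from centiseconds to seconds.
func centisecondsToSeconds(raw string) (float64, error) {
	cs, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("parse redo time %q: %w", raw, err)
	}
	return cs / 100, nil
}

func main() {
	// Raw values taken from the test figures quoted in the Testing section.
	for _, raw := range []string{"1500", "250", "900"} {
		s, err := centisecondsToSeconds(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s cs = %g s\n", raw, s)
	}
}
```

Doing the division once in the scraper means the emitted unit (s) is accurate at the source, so dashboards and alerts do not need to know about Oracle's centisecond convention.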

The new attribute oracledb.redo.kind uses a receiver-scoped, dot-namespaced key because no OTel semantic-convention attribute covers Oracle's redo-pipeline phases. oracledb.redo.operations and oracledb.redo.blocks reuse the existing disk.io.direction attribute (currently always write), which leaves room for a future read counterpart without a schema change.

The metric set covers the redo statistics present on currently supported Oracle versions (12c+); statistics that were removed in 12c (e.g. redo writer latching time) or that are not exposed by v$sysstat are intentionally excluded.

These metrics can be enabled in the collector configuration:

receivers:
  oracledb:
    datasource: "oracle://user:password@host:1521/service"
    metrics:
      oracledb.redo.time:
        enabled: true
      oracledb.redo.size:
        enabled: true
      oracledb.redo.operations:
        enabled: true
      oracledb.redo.blocks:
        enabled: true
      oracledb.redo.buffer_allocation.retries:
        enabled: true
      oracledb.redo.log_space.requests:
        enabled: true

Link to tracking issue

Fixes #49060

Testing

Unit tests added in scraper_test.go:

TestScraper_ScrapeRedoMetrics exercises all 6 new metrics end-to-end through the scraper using the existing fake DB client, asserting one expected value per data point (3 oracledb.redo.time data points -- one per oracledb.redo.kind -- plus 5 standalone data points = 8 data points per scrape across the 6 metrics).
The shared queryResponses[statsSQL] fixture is extended with 8 new fake v$sysstat rows (one per covered NAME), so TestScraper_Scrape, TestScraper_ScrapeOperationalMetrics, and TestScraper_ScrapeIOPerformanceMetrics continue to pass unchanged.
The test explicitly verifies the centiseconds -> seconds conversion on oracledb.redo.time: raw 1500/250/900 cs from redo write time / redo log space wait time / redo synch time produce 15.0/2.5/9.0 s on the write/log_space_wait/synch data points respectively, and asserts disk.io.direction=write on oracledb.redo.operations and oracledb.redo.blocks.
Auto-generated tests in internal/metadata/generated_metrics_test.go and generated_config_test.go are regenerated by make mdatagen and cover the new metric configs / metric builders.
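The flow the unit test exercises can be sketched as follows. This is a simplified stand-in, not the code in scraper_test.go: dataPoint, fakeRows, and scrapeRedo are hypothetical names, integer metrics are held as float64 for brevity, and only the three timing values (1500/250/900) come from the text above; the other fixture values are arbitrary.

```go
package main

import (
	"fmt"
	"strconv"
)

// dataPoint is a simplified stand-in for an emitted metric data point.
type dataPoint struct {
	Metric string
	Attr   string // attribute value, empty when the metric has no attributes
	Value  float64
}

// fakeRows imitates the rows a fake DB client might return for v$sysstat.
var fakeRows = []map[string]string{
	{"NAME": "redo write time", "VALUE": "1500"},
	{"NAME": "redo log space wait time", "VALUE": "250"},
	{"NAME": "redo synch time", "VALUE": "900"},
	{"NAME": "redo size", "VALUE": "4096"},
	{"NAME": "redo writes", "VALUE": "12"},
	{"NAME": "redo blocks written", "VALUE": "34"},
	{"NAME": "redo buffer allocation retries", "VALUE": "1"},
	{"NAME": "redo log space requests", "VALUE": "2"},
	{"NAME": "some other statistic", "VALUE": "7"}, // not a redo row: skipped
}

// scrapeRedo emits one data point per recognised redo statistic and skips
// every other row. Timing values are converted from centiseconds to seconds.
func scrapeRedo(rows []map[string]string) ([]dataPoint, error) {
	var out []dataPoint
	for _, row := range rows {
		name := row["NAME"]
		v, err := strconv.ParseFloat(row["VALUE"], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", name, err)
		}
		switch name {
		case "redo write time":
			out = append(out, dataPoint{"oracledb.redo.time", "write", v / 100})
		case "redo log space wait time":
			out = append(out, dataPoint{"oracledb.redo.time", "log_space_wait", v / 100})
		case "redo synch time":
			out = append(out, dataPoint{"oracledb.redo.time", "synch", v / 100})
		case "redo size":
			out = append(out, dataPoint{"oracledb.redo.size", "", v})
		case "redo writes":
			out = append(out, dataPoint{"oracledb.redo.operations", "write", v})
		case "redo blocks written":
			out = append(out, dataPoint{"oracledb.redo.blocks", "write", v})
		case "redo buffer allocation retries":
			out = append(out, dataPoint{"oracledb.redo.buffer_allocation.retries", "", v})
		case "redo log space requests":
			out = append(out, dataPoint{"oracledb.redo.log_space.requests", "", v})
		}
	}
	return out, nil
}

func main() {
	points, err := scrapeRedo(fakeRows)
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	fmt.Println("data points:", len(points))
	for _, p := range points {
		fmt.Printf("%s attr=%q value=%g\n", p.Metric, p.Attr, p.Value)
	}
}
```

With this fixture the sketch yields 8 data points across 6 metric names (3 for oracledb.redo.time plus 5 standalone), which is the count the test description asserts.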

Documentation

Auto-generated documentation.md updated with descriptions and metadata for the 6 new metrics and the new oracledb.redo.kind attribute (oracledb.redo.operations and oracledb.redo.blocks reuse the existing disk.io.direction attribute). internal/metadata/generated_*.go and internal/metadata/testdata/config.yaml regenerated via mdatagen. internal/metadata/config.schema.yaml was manually updated for the 6 new metric stanzas and the new oracledb.redo.kind enum, as that file is not regenerated by mdatagen.

Authorship

I, a human, wrote this pull request description myself.

thompson-tomo · 2026-06-15T13:13:17Z

+      value_type: double
+    unit: s
+    attributes: [oracledb.redo.kind]
+  oracledb.redo.writes:


What about the below, with an attribute to indicate read/write?

Suggested change

-  oracledb.redo.writes:
+  oracledb.redo.operations:

Or

Suggested change

-  oracledb.redo.writes:
+  oracledb.redo.actions:

rluidash · 2026-06-16

Renamed to oracledb.redo.operations. Kept it a plain counter: redo writes is write-only in v$sysstat (the only redo-read stat is redo blocks read for recovery, a separate recovery concept), so a read/write attribute would be constant-valued.

thompson-tomo · 2026-06-16

I like having the attribute, as it makes clear that the metric describes writes. It also allows a read metric to be added at a later date.

rluidash · 2026-06-18

Done: added the disk.io.direction attribute (write).

thompson-tomo · 2026-06-15T13:14:48Z

    gauge:
      value_type: double
    unit: By
+  oracledb.redo.blocks_written:


This doesn't flow like the others. What about:

Suggested change

-  oracledb.redo.blocks_written:
+  oracledb.redo.blocks:

With an attribute to indicate read/write.

thompson-tomo · 2026-06-16

As above, having the attribute would be advantageous.

rluidash · 2026-06-18

Done: added the disk.io.direction attribute (write).

rluidash · 2026-06-19T01:28:59Z

Update after integration testing: I deployed a build against an Oracle 19c instance and found that two of the proposed source statistics are not available on Oracle 12c+, so they would never emit data. I've removed them:

Dropped oracledb.redo.log_switch.interrupts: redo log switch interrupts is not a v$sysstat statistic.
Dropped the latching value from oracledb.redo.kind: its source, redo writer latching time, was removed in 12c (replaced by the granular redo write * timing stats). oracledb.redo.kind is now write/log_space_wait/synch.

This PR now covers 6 metrics; the remaining 8 source statistics all populate (verified on Oracle 19c). The PR description has been updated.


thompson-tomo reviewed Jun 15, 2026

rluidash requested a review from thompson-tomo June 16, 2026 00:21

rluidash added 5 commits June 21, 2026 23:58

[receiver/oracledb] Add redo log metrics

8708609

[receiver/oracledb] Rename redo metrics per review feedback

fd02405

ci: re-trigger build

1f6ac02

[receiver/oracledb] Add disk.io.direction to redo operations and blocks per review feedback

00365ee

[receiver/oracledb] Drop redo statistics absent on Oracle 12c+

be4bac1

Removes oracledb.redo.log_switch.interrupts and the latching oracledb.redo.kind value; their v$sysstat sources are not present on Oracle 12c+ (verified on Oracle 19c).

rluidash force-pushed the feat/oracledb-redo-log branch from 0ffb34f to be4bac1 Compare June 22, 2026 07:29

rluidash marked this pull request as ready for review June 22, 2026 08:20

rluidash requested review from a team, atoulme, crobert-1 and dmitryax as code owners June 22, 2026 08:20

github-actions Bot assigned ArthurSens Jun 22, 2026

github-actions Bot added the receiver/oracledb label Jun 22, 2026

rluidash wants to merge 5 commits into open-telemetry:main from rluidash:feat/oracledb-redo-log
