Summary
The citation-extraction gold set (evals/gold/citation_extraction/) measures verification_gate.verify_citation's lookup_verified 3-class enum (true / false / unresolvable). After the #184 amendment that removed the valid_unresolvable class, the false class is carried entirely by fabricated tuples (intentionally-bogus DOI + title).
This leaves a known coverage gap, recorded here so it is not silently dropped.
The gap
fabricated tuples exercise a clean-404 path: a bogus DOI that no index resolves, a bogus title that no fallback search collides with. They reliably test the ≥1 unmatched AND 0 matched ⇒ false reducer.
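The reducer rule above can be sketched as follows. This is a minimal illustration, not the actual verification_gate code: the `ResolverOutcome` enum, the `reduce_lookup_verified` name, and the `true` and `unresolvable` branches are assumptions; only the `false` rule comes from the text.

```python
from enum import Enum
from typing import Iterable


class ResolverOutcome(Enum):
    # Hypothetical per-resolver outcomes; these names are assumptions.
    MATCHED = "matched"
    UNMATCHED = "unmatched"
    ERROR = "error"  # assumed third state, e.g. a timeout or rate limit


def reduce_lookup_verified(outcomes: Iterable[ResolverOutcome]) -> str:
    """Collapse per-resolver outcomes into true / false / unresolvable."""
    outcomes = list(outcomes)
    matched = sum(o is ResolverOutcome.MATCHED for o in outcomes)
    unmatched = sum(o is ResolverOutcome.UNMATCHED for o in outcomes)
    if matched >= 1:
        return "true"  # assumed: any single match verifies the citation
    if unmatched >= 1:
        return "false"  # stated rule: >=1 unmatched AND 0 matched
    return "unresolvable"  # assumed: no resolver gave a definite answer
```

Under these assumptions, a fabricated tuple yields four clean `UNMATCHED` outcomes and reduces to `false`, which is the only way the current fixtures reach that class.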
They do not exercise the path a real-but-unindexed citation would take: a real title/author/year that a resolver's fuzzy fallback search might accidentally match against the wrong record, producing a false-positive matched. With zero real-but-unmatched tuples in the gold set, a regression where the classifier wrongly marks a real-but-unindexed work as matched would not be caught by the current false-class fixtures.
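To make the uncovered path concrete, here is a hedged sketch of how a fuzzy title fallback could collide with the wrong record. The indexed title, the 0.8 threshold, and the `naive_fallback_match` helper are all hypothetical; the real resolvers' fallback logic is not described in this issue and may differ.

```python
from difflib import SequenceMatcher

# Hypothetical index contents and threshold, for illustration only.
INDEXED_TITLES = ["Deep learning for citation verification in scholarly text"]
FUZZY_THRESHOLD = 0.8


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_fallback_match(cited_title: str) -> bool:
    """Return True if any indexed title clears the fuzzy threshold."""
    return any(
        title_similarity(cited_title, indexed) >= FUZZY_THRESHOLD
        for indexed in INDEXED_TITLES
    )


# A real-but-unindexed title that resembles a different, indexed work.
real_but_unindexed = "Deep learning for citation verification"
# A fabricated title in the style of the current false-class fixtures.
fabricated = "Zxqv plorth wemmick 9931 quandrification"
```

In this sketch the fabricated title stays below the threshold, while the plausible real title clears it against the wrong record. That second case is the one the gold set currently has no fixture for.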
Why the class was removed (not just deferred)
The valid_unresolvable class required a citation that is (1) real, (2) first-party metadata-verifiable, and (3) unmatched across all four resolvers (Crossref / OpenAlex / Semantic Scholar / arXiv) simultaneously. No stable, first-party-verifiable source satisfying all three was found under current index coverage: anything real that carries a queryable DOI or arXiv ID tends to be indexed by at least one resolver (OpenAlex coverage is broad). A prior attempt at sourcing this class produced metadata that could not be first-party verified and was withdrawn.
The class was therefore removed rather than left empty, and the gap above is the accepted trade-off. See the §3.1 amendment note in the citation-extraction README.
Closing condition (not a committed deliverable)
If a stable, first-party-verifiable, all-resolver-unmatched citation source is later identified, a small non-gating real_unindexed canary set (e.g. 3–5 tuples) can be added to close the gap. This issue does not commit to a delivery date; it records the gap and the condition under which it can be closed.
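One possible shape for such a non-gating canary run is sketched below. The `CanaryTuple` fields, the `run_canaries` helper, and the `verify` callable signature are hypothetical; the sketch also assumes a real-but-unindexed work is expected to reduce to `false`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CanaryTuple:
    # Hypothetical fixture shape; field names are assumptions.
    title: str
    expected: str = "false"  # assumed expectation for a real-but-unindexed work


def run_canaries(
    canaries: List[CanaryTuple], verify: Callable[[str], str]
) -> List[str]:
    """Return titles whose result differs from the expectation.

    Non-gating: misses are reported to the caller and never raised.
    """
    return [c.title for c in canaries if verify(c.title) != c.expected]
```

A caller could log the returned titles as warnings without failing the build, so the canary surfaces a regression without gating on a source whose stability is still unproven.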
Scope
Not a verify_citation correctness bug. The lookup_verified reducer logic (Add deterministic citation verification gate (independent of LLM review) #182) is unaffected.