
Commit 09216e1

feat: AI-native rebuild + production polish (closes #8 #9 #13 #15 #30 #31) (#51)
* chore(repo): AGENTS.md + citations contract + CORS/OpenAPI + PDF rasterizer seam (Refs #41)
* feat(ocr): capture source-span geometry + scanned-PDF OCR + BdxReviewPayload (Refs #41, #13)
* feat(extraction): optional Ollama LLM field extractor with deterministic-first merge (Refs #41, #15)
* feat(ingest): inbound-source + parser-adapter seams, Docling flag, file-based email source (Refs #41, #8, #9)
* feat(api): review-payload, page images, settings, schema-mappings, inbound-sources + agent tool endpoints (Refs #41)
* feat(extraction): wire real Ollama IChatClient field proposals behind the flag (Refs #41, #15)
* chore(web): scaffold Next.js 16 + AI SDK 6 + ollama-ai-provider-v2 + zod frontend (Refs #41)
* feat(web): institutional-calm design system, app shell, and typed API client (Refs #41)
* feat(web): AG-UI assistant route over local Ollama with useChat panel (Refs #41, #15)
* feat(web): workspace, source-cited review split-view, export, and settings (Refs #41, #13, #30, #31)
* refactor(web): cut over Reva.Web to API-only host; repoint e2e to API smoke (Refs #41, #31)
* feat(web): first-run onboarding tour, a11y polish, and Playwright e2e specs (Refs #41, #30, #31)
* feat(web): assistant dock minimize/expand, image + document attachments, new conversation; raise agent num_ctx so vision attachments are not truncated (Closes #43)
* design(web): verdict-first review experience: reconciliation with money deltas up top, grouped fields, plain language, line-items table, review actions (Closes #49, #45)
* feat(web): per-sender schema-mapping management page (Closes #47)
* feat(web): export template editor (create/edit/duplicate/delete) + format-aware download (Closes #46)
* feat(review): real document preview with per-line citation overlay

  Render ingested raster images and rendered PDF pages as a real document viewer in the review split-view: zoom, fit-to-width, page navigation, loading state, and citation bounding-box overlays that scale with zoom. Workspace queue rows show a page thumbnail for image and PDF documents, falling back to the document icon for digital files. Wire the image OCR path to emit a page (stored image plus real pixel dimensions) and real-dimension source spans so image documents render with overlays. Replace the reflection-based OpenCV region reader with the typed RotatedRect API so per-line bounding boxes are captured instead of a full-page fallback (this also fixes citations on the scanned-PDF path). Seed a scanned-bordereau image sample so the preview and overlays are demonstrable end to end.

  Closes #50
* feat(settings): reconciliation tolerance, LLM-assist toggle, data management, default template

  Add two persisted app settings, a money reconciliation tolerance and a local LLM-assist toggle, and surface them on the Settings page alongside a default export-template selector and demo data management (reseed and clear). The reconciliation engine reads the configured tolerance instead of a fixed threshold, and ingestion invokes the LLM extractor only when assist is enabled, so extraction stays fully deterministic by default. New POST /api/data/reseed and /api/data/clear endpoints back the data-management actions, and the export view pre-selects the configured default template.

  Closes #44
* test(web): green Playwright e2e suite + axe accessibility pass

  Run the end-to-end suite against the live stack and bring it to green:
  - a shared fixture suppresses the first-run tour so specs start unobstructed
  - a new assistant spec covers opening the dock and sending a message
  - the onboarding spec drives the tour to completion via the persisted flag, tolerating route changes and auto-skipped steps
  - the review field-row selector is corrected after the verdict-first redesign

  Accessibility: zero serious or critical axe violations on /, /review, /export, and /settings. Wrap queue rows in listitem elements (aria-required-children), mark the collapsed assistant dock inert so its controls leave the focus order and accessibility tree, label the upload file input, underline in-text links, and raise the subtle-foreground and success token contrast to meet WCAG AA.

  Closes #48
1 parent a7924ce commit 09216e1

158 files changed

Lines changed: 12829 additions & 6279 deletions

.github/workflows/ci.yml

Lines changed: 0 additions & 3 deletions
@@ -34,7 +34,4 @@ jobs:
           dotnet-version: 10.0.x
       - run: dotnet build Reva.slnx
       - run: dotnet build tests/Reva.E2E/Reva.E2E.csproj
-      - shell: pwsh
-        name: Install Playwright browser
-        run: tests/Reva.E2E/bin/Debug/net10.0/playwright.ps1 install chromium
       - run: dotnet test tests/Reva.E2E/Reva.E2E.csproj --no-build

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -54,3 +54,4 @@ Thumbs.db
 .idea/
 .vscode/*
 !.vscode/extensions.json
+.proof/

AGENTS.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# Reva agent guide
+
+## Commands
+- Restore/build: `dotnet build Reva.slnx -warnaserror`
+- Tests: `dotnet test`
+- Format before EF migration commits: `dotnet format`
+- Run app without browser: `dotnet run --project src/Reva.Web/Reva.Web.csproj -- --no-open`
+
+## Layout
+- Core contracts and domain types: `src/Reva.Core`
+- Backend services, parsing, extraction, OCR, persistence: `src/Reva.Infrastructure`
+- Blazor host and HTTP API endpoints: `src/Reva.Web`
+- Unit/integration tests: `tests/Reva.Unit`, `tests/Reva.Integration`
+- Contract schemas: `contracts`
+
+## Boundaries
+- Do not edit `src/Reva.Web/Components/**`, `web/**`, `tests/Reva.E2E/**`, CI workflows, or release automation unless Tony explicitly scopes it.
+- Keep the keyless/offline default path working. External AI and Docling paths stay disabled unless config enables them.
+- Secrets come from environment or local config only and are never committed.
+
+## API contracts
+- Review payloads follow `contracts/bdx-review-payload.schema.json`.
+- Bounding boxes are normalized to `0..1` against the final rendered page size.
+- Provenance is always present; citations may be empty only when geometry is unavailable.
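The bounding-box rule above means OCR boxes are stored as fractions of the page and projected back to pixels at display time, which is what lets citation overlays scale with zoom. A minimal TypeScript sketch of both directions follows. `normalizeBox`, `toOverlayRect`, and the rect shapes are hypothetical names, not Reva's frontend code, and the sketch ignores page `rotation`: it assumes boxes are already expressed against the rendered, unrotated page.

```typescript
// Hypothetical helpers; names are illustrative, not Reva's frontend API.
interface NormalizedBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));

// Pixel-space box (e.g. from OCR) -> contract form, normalized to 0..1
// against the rendered page size and clipped at the page edge.
function normalizeBox(px: PixelRect, pageWidth: number, pageHeight: number): NormalizedBox {
  if (pageWidth <= 0 || pageHeight <= 0) {
    throw new RangeError("page dimensions must be positive");
  }
  const x = clamp01(px.left / pageWidth);
  const y = clamp01(px.top / pageHeight);
  const width = clamp01(Math.min(px.width / pageWidth, 1 - x));
  const height = clamp01(Math.min(px.height / pageHeight, 1 - y));
  return { x, y, width, height };
}

// Normalized box -> on-screen overlay rectangle for the page at a given zoom.
// Because the box is a fraction of the page, it follows zoom automatically.
function toOverlayRect(
  box: NormalizedBox,
  renderedWidth: number,
  renderedHeight: number,
  zoom: number,
): PixelRect {
  const w = renderedWidth * zoom;
  const h = renderedHeight * zoom;
  return { left: box.x * w, top: box.y * h, width: box.width * w, height: box.height * h };
}
```

Storing fractions rather than pixels keeps the persisted spans independent of whatever resolution the page image is rendered at.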

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+@AGENTS.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://reva.local/contracts/bdx-review-payload.schema.json",
+  "title": "BdxReviewPayload",
+  "type": "object",
+  "required": ["document", "sourceSpans", "fields", "lineItems", "reconciliation"],
+  "additionalProperties": false,
+  "$defs": {
+    "bbox": { "type": "object", "required": ["x", "y", "width", "height"], "additionalProperties": false, "properties": { "x": { "type": "number", "minimum": 0, "maximum": 1 }, "y": { "type": "number", "minimum": 0, "maximum": 1 }, "width": { "type": "number", "minimum": 0, "maximum": 1 }, "height": { "type": "number", "minimum": 0, "maximum": 1 } } },
+    "point": { "type": "object", "required": ["x", "y"], "additionalProperties": false, "properties": { "x": { "type": "number", "minimum": 0, "maximum": 1 }, "y": { "type": "number", "minimum": 0, "maximum": 1 } } },
+    "citation": { "type": "object", "required": ["sourceSpanId", "page", "bbox", "role"], "additionalProperties": false, "properties": { "sourceSpanId": { "type": "string", "minLength": 1 }, "page": { "type": "integer", "minimum": 1 }, "bbox": { "$ref": "#/$defs/bbox" }, "quote": { "type": ["string", "null"] }, "role": { "enum": ["value", "label", "header", "row", "control-total", "supporting"] } } },
+    "provenance": { "type": "object", "required": ["method", "stepId", "citations"], "additionalProperties": false, "properties": { "method": { "enum": ["digital_parse", "csv_parse", "excel_parse", "paddle_ocr", "schema_mapping", "llm_proposal", "merge", "manual"] }, "stepId": { "type": "string", "minLength": 1 }, "model": { "type": ["string", "null"] }, "promptVersion": { "type": ["string", "null"] }, "citations": { "type": "array", "items": { "$ref": "#/$defs/citation" } } } },
+    "fieldValue": { "type": "object", "required": ["key", "label", "value", "status", "confidence", "provenance"], "additionalProperties": false, "properties": { "key": { "type": "string" }, "label": { "type": "string" }, "value": { "type": "string" }, "rawText": { "type": ["string", "null"] }, "status": { "enum": ["detected", "expected", "missing", "conflict", "low_confidence", "user_confirmed"] }, "confidence": { "type": "number", "minimum": 0, "maximum": 1 }, "provenance": { "$ref": "#/$defs/provenance" } } }
+  },
+  "properties": {
+    "document": { "type": "object", "required": ["id", "filename", "pages"], "additionalProperties": false, "properties": { "id": { "type": "string" }, "filename": { "type": "string" }, "pages": { "type": "array", "items": { "type": "object", "required": ["page", "imageUrl", "width", "height", "rotation"], "additionalProperties": false, "properties": { "page": { "type": "integer", "minimum": 1 }, "imageUrl": { "type": "string" }, "width": { "type": "number" }, "height": { "type": "number" }, "rotation": { "enum": [0, 90, 180, 270] } } } } } },
+    "sourceSpans": { "type": "array", "items": { "type": "object", "required": ["id", "documentId", "page", "pageWidth", "pageHeight", "rotation", "bbox", "text"], "additionalProperties": false, "properties": { "id": { "type": "string" }, "documentId": { "type": "string" }, "page": { "type": "integer", "minimum": 1 }, "pageWidth": { "type": "number" }, "pageHeight": { "type": "number" }, "rotation": { "enum": [0, 90, 180, 270] }, "bbox": { "$ref": "#/$defs/bbox" }, "polygon": { "type": "array", "items": { "$ref": "#/$defs/point" } }, "text": { "type": "string" }, "ocrConfidence": { "type": ["number", "null"], "minimum": 0, "maximum": 1 }, "blockId": { "type": ["string", "null"] }, "tableId": { "type": ["string", "null"] }, "rowIndex": { "type": ["integer", "null"], "minimum": 0 }, "columnIndex": { "type": ["integer", "null"], "minimum": 0 } } } },
+    "fields": { "type": "array", "items": { "$ref": "#/$defs/fieldValue" } },
+    "lineItems": { "type": "array", "items": { "type": "object", "required": ["id", "rowNumber", "fields", "rowCitationIds"], "additionalProperties": false, "properties": { "id": { "type": "string" }, "rowNumber": { "type": "integer", "minimum": 1 }, "fields": { "type": "array", "items": { "$ref": "#/$defs/fieldValue" } }, "rowCitationIds": { "type": "array", "items": { "type": "string" } } } } },
+    "reconciliation": { "type": "array", "items": { "type": "object", "required": ["id", "name", "expected", "detected", "delta", "tolerance", "status", "explanation", "citations"], "additionalProperties": false, "properties": { "id": { "type": "string" }, "name": { "type": "string" }, "expected": { "$ref": "#/$defs/fieldValue" }, "detected": { "$ref": "#/$defs/fieldValue" }, "delta": { "type": "number" }, "tolerance": { "type": "number" }, "status": { "enum": ["pass", "fail", "warning", "not_applicable"] }, "explanation": { "type": "string" }, "citations": { "type": "array", "items": { "$ref": "#/$defs/citation" } } } } }
+  }
+}
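The schema's `bbox` and `citation` definitions can also be enforced on the client. The sketch below is a hand-written TypeScript guard, assuming the web app receives the payload as untyped JSON; the function names are illustrative, and running a general JSON Schema 2020-12 validator against the schema file itself would be the more complete option.

```typescript
// Hand-written guard mirroring the "bbox" and "citation" $defs of the schema.
type CitationRole = "value" | "label" | "header" | "row" | "control-total" | "supporting";

interface BBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Citation {
  sourceSpanId: string;
  page: number;
  bbox: BBox;
  quote?: string | null;
  role: CitationRole;
}

const ROLES: ReadonlySet<string> = new Set<string>(["value", "label", "header", "row", "control-total", "supporting"]);
const BBOX_KEYS: ReadonlySet<string> = new Set<string>(["x", "y", "width", "height"]);
const CITATION_KEYS: ReadonlySet<string> = new Set<string>(["sourceSpanId", "page", "bbox", "quote", "role"]);

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// "additionalProperties": false -> reject any key outside the allowed set.
const onlyKeys = (o: Record<string, unknown>, allowed: ReadonlySet<string>): boolean =>
  Object.keys(o).every((k) => allowed.has(k));

// A finite number in 0..1, as the schema requires for every bbox member.
const isUnit = (v: unknown): boolean => typeof v === "number" && Number.isFinite(v) && v >= 0 && v <= 1;

function isBBox(v: unknown): v is BBox {
  return isObject(v) && onlyKeys(v, BBOX_KEYS) && isUnit(v.x) && isUnit(v.y) && isUnit(v.width) && isUnit(v.height);
}

function isCitation(v: unknown): v is Citation {
  if (!isObject(v) || !onlyKeys(v, CITATION_KEYS)) return false;
  const { sourceSpanId, page, bbox, quote, role } = v;
  return (
    typeof sourceSpanId === "string" && sourceSpanId.length > 0 &&
    typeof page === "number" && Number.isInteger(page) && page >= 1 &&
    isBBox(bbox) &&
    (quote === undefined || quote === null || typeof quote === "string") &&
    typeof role === "string" && ROLES.has(role)
  );
}
```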

docs/index.md

Lines changed: 2 additions & 3 deletions
@@ -2,8 +2,7 @@

 | Doc | What it covers |
 |:---|:---|
-| [architecture.md](architecture.md) | Backend-first architecture, contracts, data flow, and adapter boundaries. |
-| [ai-pipeline.md](ai-pipeline.md) | Native .NET parser router, offline PaddleOCR, the reconciliation engine, and the optional LLM seam. |
+| [architecture.md](architecture.md) | API-only .NET host, Next.js `web/` UI, contracts, data flow, and adapter boundaries. |
+| [ai-pipeline.md](ai-pipeline.md) | Native .NET parser router, offline PaddleOCR, local Ollama seam, and reconciliation. |
 | [packaging.md](packaging.md) | Windows executable packaging and smoke-test flow. |
-| [demo-script.md](demo-script.md) | Five-minute product walkthrough. |
 | [research/reinsurance-landscape.md](research/reinsurance-landscape.md) | Domain + market research grounding the product: document types, fields, must-have features, competitor patterns, standards. |

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+namespace Reva.Core.Contracts;
+
+public sealed record SourceBox(double X, double Y, double Width, double Height);
+
+public sealed record SourcePoint(double X, double Y);
+
+public sealed record SourceSpan(
+    string Id,
+    Guid DocumentId,
+    int Page,
+    double PageWidth,
+    double PageHeight,
+    int Rotation,
+    SourceBox Bbox,
+    IReadOnlyList<SourcePoint>? Polygon,
+    string Text,
+    double? OcrConfidence,
+    string? BlockId,
+    string? TableId,
+    int? RowIndex,
+    int? ColumnIndex);
+
+public sealed record Citation(
+    string SourceSpanId,
+    int Page,
+    SourceBox Bbox,
+    string? Quote,
+    string Role);
+
+public sealed record FieldProvenance(
+    string Method,
+    string StepId,
+    string? Model,
+    string? PromptVersion,
+    IReadOnlyList<Citation> Citations);
+
+public sealed record FieldValue(
+    string Key,
+    string Label,
+    string Value,
+    string? RawText,
+    string Status,
+    double Confidence,
+    FieldProvenance Provenance);
+
+public sealed record ReconciliationCheck(
+    string Id,
+    string Name,
+    FieldValue Expected,
+    FieldValue Detected,
+    double Delta,
+    double Tolerance,
+    string Status,
+    string Explanation,
+    IReadOnlyList<Citation> Citations);
+
+public sealed record BdxPage(int Page, string ImageUrl, double Width, double Height, int Rotation);
+
+public sealed record BdxDocument(Guid Id, string Filename, IReadOnlyList<BdxPage> Pages);
+
+public sealed record LineItemValue(string Id, int RowNumber, IReadOnlyList<FieldValue> Fields, IReadOnlyList<string> RowCitationIds);
+
+public sealed record BdxReviewPayload(
+    BdxDocument Document,
+    IReadOnlyList<SourceSpan> SourceSpans,
+    IReadOnlyList<FieldValue> Fields,
+    IReadOnlyList<LineItemValue> LineItems,
+    IReadOnlyList<ReconciliationCheck> Reconciliation);
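These records attach citations to every `FieldValue` through `Provenance.Citations`, and each citation names its page. As a rough illustration of how a review viewer might select the overlays for the page currently shown, here is a TypeScript sketch. It assumes the records reach the client with camelCase property names (as `JsonSerializerDefaults.Web` produces) and that drawing each (span, role) pair once is acceptable; `citationsOnPage` is a hypothetical helper.

```typescript
// Client-side mirrors of the C# records (assumed camelCase on the wire).
interface SourceBox { x: number; y: number; width: number; height: number; }
interface Citation { sourceSpanId: string; page: number; bbox: SourceBox; quote: string | null; role: string; }
interface FieldProvenance {
  method: string;
  stepId: string;
  model: string | null;
  promptVersion: string | null;
  citations: Citation[];
}
interface FieldValue {
  key: string;
  label: string;
  value: string;
  rawText: string | null;
  status: string;
  confidence: number;
  provenance: FieldProvenance;
}
interface LineItemValue { id: string; rowNumber: number; fields: FieldValue[]; rowCitationIds: string[]; }

// Hypothetical helper: every citation to draw on one page, gathered from header
// fields and line-item fields, keeping the first occurrence of each (span, role).
function citationsOnPage(fields: FieldValue[], lineItems: LineItemValue[], page: number): Citation[] {
  const allFields: FieldValue[] = [...fields];
  for (const item of lineItems) allFields.push(...item.fields);

  const seen = new Set<string>();
  const result: Citation[] = [];
  for (const field of allFields) {
    for (const citation of field.provenance.citations) {
      if (citation.page !== page) continue;
      const key = `${citation.sourceSpanId}|${citation.role}`;
      if (seen.has(key)) continue;
      seen.add(key);
      result.push(citation);
    }
  }
  return result;
}
```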

src/Reva.Core/Settings/AppSettings.cs

Lines changed: 4 additions & 2 deletions
@@ -8,7 +8,9 @@ public sealed record AppSettings(
     string ProductName,
     double ConfidenceLowMax, // score below this renders as "Low"
     double ConfidenceMediumMax, // score below this renders as "Medium"; at or above is "High"
-    Guid? DefaultTemplateId)
+    Guid? DefaultTemplateId,
+    double ReconciliationTolerance,
+    bool UseLlmAssist)
 {
-    public static AppSettings Default => new(AppTheme.Light, string.Empty, "Reve Intelligence", 0.6, 0.85, null);
+    public static AppSettings Default => new(AppTheme.Light, string.Empty, "Reve Intelligence", 0.6, 0.85, null, 0.01, false);
 }
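`ReconciliationTolerance` defaults to `0.01`, and per the commit message the reconciliation engine now reads it instead of a fixed threshold. The source does not say whether the value is an absolute amount or a relative fraction. The TypeScript sketch below assumes an absolute amount in the document's currency and produces only `pass`, `fail`, or `not_applicable` (the schema's `warning` status is not modelled); `reconcileAmounts` is a hypothetical name.

```typescript
// Hypothetical reconciliation helper. Assumption: tolerance is an absolute
// amount in the bordereau's currency (AppSettings.Default uses 0.01).
type CheckStatus = "pass" | "fail" | "not_applicable";

interface CheckResult {
  delta: number;
  status: CheckStatus;
}

// Small slack so binary floating point does not turn an exact one-cent
// difference (e.g. 100.01 - 100, which evaluates to slightly above 0.01) into a failure.
const FLOAT_SLACK = 1e-9;

function reconcileAmounts(expected: number | null, detected: number | null, tolerance = 0.01): CheckResult {
  if (expected === null || detected === null || !Number.isFinite(expected) || !Number.isFinite(detected)) {
    // One side is missing or unparseable: nothing to compare.
    return { delta: 0, status: "not_applicable" };
  }
  const delta = detected - expected;
  return { delta, status: Math.abs(delta) <= tolerance + FLOAT_SLACK ? "pass" : "fail" };
}
```

If the tolerance turns out to be relative, the comparison would become `Math.abs(delta) <= tolerance * Math.abs(expected)`.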

src/Reva.Infrastructure/DocumentWorkflow.cs

Lines changed: 38 additions & 2 deletions
@@ -3,6 +3,7 @@
 using Reva.Core.Contracts;
 using Reva.Core.Documents;
 using Reva.Core.Reinsurance;
+using Reva.Core.Settings;
 using Reva.Infrastructure.Extraction;
 using Reva.Infrastructure.Hashing;
 using Reva.Infrastructure.Parsing;
@@ -19,7 +20,9 @@ public sealed class DocumentWorkflow(
     IDocumentParser parser,
     IReinsuranceClassifier classifier,
     IReinsuranceExtractor extractor,
-    ISchemaMappingService schemaMapping) : IDocumentWorkflow
+    ISchemaMappingService schemaMapping,
+    ILlmFieldExtractor llmExtractor,
+    IExtractionMerger extractionMerger) : IDocumentWorkflow
 {
     private static readonly JsonSerializerOptions SerializerOptions = new(JsonSerializerDefaults.Web);
@@ -161,7 +164,11 @@ private async Task ParseAndExtractAsync(DocumentRecord record, CancellationToken

         // Always run best-effort extraction. Unknown/low-confidence documents are still
         // ingested as reviewable records (the extractor flags them); never quarantined.
-        var extraction = extractor.Extract(parsed, classification);
+        var deterministic = extractor.Extract(parsed, classification);
+        var proposal = RuntimeSettings.Current.UseLlmAssist
+            ? await llmExtractor.ProposeAsync(parsed, deterministic, cancellationToken)
+            : null;
+        var extraction = extractionMerger.Merge(deterministic, proposal);
         var mapped = await schemaMapping.MapAsync(parsed, extraction.Fields, cancellationToken);
         record.Status = DocumentStatus.Extracted.ToString();
         record.DocumentType = extraction.DocumentType.ToString();
@@ -175,6 +182,33 @@ private async Task ParseAndExtractAsync(DocumentRecord record, CancellationToken
             IsCorrected = field.IsCorrected
         }).ToList();
         record.SchemaMappings = mapped.Mappings.ToList();
+        record.SourceSpans = parsed.SourceSpans.Select(span => new DocumentSourceSpanRecord
+        {
+            SpanId = span.Id,
+            Page = span.Page,
+            PageWidth = span.PageWidth,
+            PageHeight = span.PageHeight,
+            Rotation = span.Rotation,
+            X = span.Bbox.X,
+            Y = span.Bbox.Y,
+            Width = span.Bbox.Width,
+            Height = span.Bbox.Height,
+            PolygonJson = JsonSerializer.Serialize(span.Polygon ?? [], SerializerOptions),
+            Text = span.Text,
+            OcrConfidence = span.OcrConfidence,
+            BlockId = span.BlockId,
+            TableId = span.TableId,
+            RowIndex = span.RowIndex,
+            ColumnIndex = span.ColumnIndex
+        }).ToList();
+        record.Pages = parsed.Pages.Select(page => new DocumentPageRecord
+        {
+            Page = page.Page,
+            ImagePath = page.ImagePath,
+            Width = page.Width,
+            Height = page.Height,
+            Rotation = page.Rotation
+        }).ToList();
         record.Tables = extraction.Tables.Select(table => new DocumentTableRecord
         {
             Name = table.Name,
@@ -228,6 +262,8 @@ private static void ValidateFile(string fileName, Stream content)
             .Include(document => document.Fields)
             .Include(document => document.Tables)
             .Include(document => document.SchemaMappings)
+            .Include(document => document.SourceSpans)
+            .Include(document => document.Pages)
             .Include(document => document.Exceptions)
             .Include(document => document.ReviewEvents)
             .FirstOrDefaultAsync(document => document.Id == id, cancellationToken);

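The new ingestion path always runs the deterministic extractor, calls the LLM extractor only when `UseLlmAssist` is on, and hands both results to `IExtractionMerger`. The merger's rules are not part of this diff. The TypeScript sketch below illustrates one plausible reading of "deterministic-first merge" from the commit message: a non-empty deterministic value is never overwritten, and proposals only fill gaps. The field shape and `mergeDeterministicFirst` are hypothetical.

```typescript
// Hypothetical shape; the real merger works on Reva's C# extraction results.
interface ExtractedField {
  key: string;
  value: string;
  confidence: number;
  method: string; // e.g. "digital_parse", "csv_parse", "llm_proposal"
}

// Deterministic-first merge: a non-empty deterministic value always wins;
// an LLM proposal only fills keys the deterministic pass missed or left empty.
function mergeDeterministicFirst(
  deterministic: ExtractedField[],
  proposal: ExtractedField[] | null,
): ExtractedField[] {
  // Assist disabled (no proposal): the result is exactly the deterministic one.
  if (proposal === null) return deterministic;

  // Map preserves insertion order, so deterministic keys keep their positions.
  const merged = new Map<string, ExtractedField>();
  for (const field of deterministic) merged.set(field.key, field);

  for (const proposed of proposal) {
    const existing = merged.get(proposed.key);
    if (existing !== undefined && existing.value.trim() !== "") continue;
    // Tag filled values so reviewers can see they came from the model.
    merged.set(proposed.key, { ...proposed, method: "llm_proposal" });
  }
  return Array.from(merged.values());
}
```

Tagging filled values as `llm_proposal` matches one of the provenance methods allowed by the review-payload schema, which keeps model-sourced values distinguishable in the review UI.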
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+namespace Reva.Infrastructure.Extraction;
+
+public sealed class LlmExtractionOptions
+{
+    public const string ProviderNone = "None";
+    public const string ProviderOllama = "Ollama";
+    public const string PromptVersion = "bdx-review-v1";
+    public const string SchemaVersion = "bdx-review-payload-v1";
+    public const string DefaultBaseUrl = "http://localhost:11434/v1";
+    public const string DefaultModel = "qwen3-vl:8b";
+    public const int MaxPromptCharacters = 8000;
+    public const int RetryPromptCharacters = 3000;
+    public const string SystemPrompt = "You extract reinsurance bordereaux fields and return strict JSON only.";
+    public string Provider { get; set; } = ProviderNone;
+    public string BaseUrl { get; set; } = DefaultBaseUrl;
+    public string Model { get; set; } = DefaultModel;
+    public bool DeterministicOnly { get; set; } = true;
+}
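`MaxPromptCharacters` and `RetryPromptCharacters` suggest a two-step prompt budget: first try with up to 8000 characters of document text, then retry with a shorter 3000-character prompt. That reading is an assumption drawn from the constant names alone. A TypeScript sketch of such a loop, with a synchronous stand-in for the model call (`callModel`, `proposeWithRetry`, and the other helpers are hypothetical):

```typescript
// Budgets mirror LlmExtractionOptions; the retry behaviour is an assumption
// inferred from the constant names, not taken from Reva's code.
const MAX_PROMPT_CHARACTERS = 8000;
const RETRY_PROMPT_CHARACTERS = 3000;

function truncateForPrompt(text: string, limit: number): string {
  return text.length <= limit ? text : text.slice(0, limit);
}

// The system prompt asks for strict JSON only, so anything that is not a
// JSON object counts as a failed attempt.
function parseJsonObject(raw: string): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(raw);
    return typeof value === "object" && value !== null && !Array.isArray(value)
      ? (value as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

// callModel stands in for the chat-client call (synchronous here for brevity).
function proposeWithRetry(
  documentText: string,
  callModel: (prompt: string) => string,
): Record<string, unknown> | null {
  for (const limit of [MAX_PROMPT_CHARACTERS, RETRY_PROMPT_CHARACTERS]) {
    const parsed = parseJsonObject(callModel(truncateForPrompt(documentText, limit)));
    if (parsed !== null) return parsed;
  }
  return null; // no usable proposal
}
```

Returning `null` rather than throwing fits the keyless/offline default path: when no usable proposal comes back, the deterministic result can presumably be used as-is.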
