|
| 1 | +# ARCHITECTURE |
| 2 | + |
| 3 | +How open-cowork is put together, and why. Companion docs: `DECISIONS.md` |
| 4 | +(choices + trade-offs), `SECURITY.md` (trust boundaries), `PLAN.md` (original |
| 5 | +build plan and package contracts). |
| 6 | + |
| 7 | +## The two hard truths the design hangs on |
| 8 | + |
| 9 | +**1. Local vs remote execution.** Only the desktop app can capture and control |
| 10 | +the user's *own* screen; web and mobile drive a Coasty cloud machine instead. |
| 11 | +This is modeled as one `Executor` interface with three implementations behind a |
| 12 | +single shared agent loop — the rest of the product never cares which screen it |
| 13 | +is driving. |
| 14 | + |
| 15 | +**2. The API key never touches a client.** All clients speak to the |
| 16 | +open-cowork backend, which is the only holder of `COASTY_API_KEY` and of every |
| 17 | +per-run `webhook_secret`. Clients hold short-lived session tokens. |
| 18 | + |
| 19 | +## Component map |
| 20 | + |
| 21 | +```text |
| 22 | + ┌────────────────────────────────────────────┐ |
| 23 | + │ Coasty API │ |
| 24 | + │ /predict /sessions /runs /workflows │ |
| 25 | + │ /machines · SSE events · HMAC webhooks │ |
| 26 | + └────────▲───────────────────────┬───────────┘ |
| 27 | + │ X-API-Key (backend only)│ webhooks (HMAC) |
| 28 | +┌──────────────┐ ┌────────┴───────────────────────▼───────────┐ |
| 29 | +│ apps/desktop │ IPC │ apps/backend (Fastify) │ |
| 30 | +│ Electron ├──────►│ auth (bearer tokens) · Coasty proxy │ |
| 31 | +│ main proc: │ local │ estimates + confirmCostCents + budget caps │ |
| 32 | +│ LocalRun- │ runs │ Ingestor: Coasty SSE → events table → bus │ |
| 33 | +│ Manager │ mirror│ webhook receiver (verify before mutate) │ |
| 34 | +│ + Local- │ │ SQLite (node:sqlite) · SSE fan-out │ |
| 35 | +│ Executor │ └────────▲───────────────▲────────────────────┘ |
| 36 | +└──────▲───────┘ │ REST + SSE │ REST + polling |
| 37 | + │ hosts │ │ |
| 38 | +┌──────┴───────┐ ┌───────┴──────┐ ┌──────┴───────┐ |
| 39 | +│ webview: │ │ apps/web │ │ apps/mobile │ |
| 40 | +│ the same SPA │ │ Vite+React │ │ Expo / RN │ |
| 41 | +└──────────────┘ └──────────────┘ └──────────────┘ |
| 42 | + shared: packages/core · packages/executor · packages/ui |
| 43 | +``` |
| 44 | + |
| 45 | +## packages/core — the framework-agnostic heart |
| 46 | + |
| 47 | +Zero runtime dependencies, isomorphic (Node + browser): injectable `fetch`, |
| 48 | +Web Crypto for HMAC, injectable clocks/sleeps for deterministic tests. |
| 49 | + |
| 50 | +- **`CoastyClient`** — typed methods for every documented endpoint. Transport |
| 51 | + policy: timeouts compose with caller signals; retries use exponential backoff |
| 52 | + with full jitter and honor `Retry-After`; **GET/DELETE retry by default, POST |
| 53 | + retries only when an `Idempotency-Key` was provided** (a retried unkeyed POST |
| 54 | + could double-bill). Errors map to `CoastyApiError` carrying `code`, |
| 55 | + `request_id`, and code-specific extras. |
| 56 | +- **`runAgentLoop`** — screenshot → predict → execute → repeat until |
| 57 | + done/fail/cap/abort. Takes an `AgentScreen` (what executors implement) and a |
| 58 | + `PredictStepFn`, so predictions can come from a raw Coasty session *or* the |
| 59 | + backend proxy. Emits structured events; tolerates up to 3 consecutive |
| 60 | + action-execution failures; cooperative cancellation via `AbortSignal`. |
| 61 | +- **Workflow DSL** — validator enforcing every documented limit (≤200 steps, |
| 62 | + ≤8 nesting, ≤16 parallel branches, retry 1–20, no approvals inside parallel, |
| 63 | + reserved `save_as` names), the 13-op condition evaluator, `{{path}}` |
| 64 | + templating, and a deterministic executor with `budget_cents` / |
| 65 | + `max_iterations` / `deadline_seconds` guards — used for builder feedback, |
| 66 | + dry-run estimates, and cross-checking the server. |
| 67 | +- **Cost estimator** — mirrors the documented pricing table exactly (including |
| 68 | + the strict HD boundary: 1280×720 is *not* HD) and computes run/workflow |
| 69 | + worst-case estimates the backend uses for the confirmation handshake. |
| 70 | +- **Webhook HMAC** — sign/verify `t=<unix>,v1=<hex>` over `"<t>.<body>"`, |
| 71 | + constant-time byte comparison, ±300s tolerance in both directions, multiple |
| 72 | + `v1` entries accepted (rotation). |
| 73 | +- **SSE** — a spec-correct parser plus reconnecting event streams that resume |
| 74 | + via `Last-Event-ID` with overlap de-duplication. |
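
The retry rules in the `CoastyClient` bullet can be sketched as three small pure functions. This is a minimal illustration, not the client's source: the names `isRetryable`, `backoffDelayMs`, `nextDelayMs` and the `BackoffPolicy` shape are hypothetical, and only the delta-seconds form of `Retry-After` is handled.

```typescript
// Hypothetical helper names; not taken from the CoastyClient source.
type Method = 'GET' | 'DELETE' | 'POST';

interface BackoffPolicy {
  baseMs: number; // jitter ceiling for the first retry
  capMs: number;  // upper bound on any computed ceiling
}

/** GET/DELETE retry by default; POST retries only when an Idempotency-Key header is present. */
function isRetryable(method: Method, headers: Record<string, string>): boolean {
  if (method === 'GET' || method === 'DELETE') return true;
  return Object.keys(headers).some((name) => name.toLowerCase() === 'idempotency-key');
}

/** Full jitter: a uniform draw from [0, min(cap, base * 2^attempt)). */
function backoffDelayMs(
  attempt: number,
  policy: BackoffPolicy,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(policy.capMs, policy.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

/** A numeric Retry-After (seconds) takes precedence over the computed backoff. */
function nextDelayMs(
  attempt: number,
  policy: BackoffPolicy,
  retryAfter: string | null,
  random: () => number = Math.random,
): number {
  if (retryAfter !== null && /^\d+$/.test(retryAfter.trim())) {
    return Number(retryAfter.trim()) * 1000;
  }
  return backoffDelayMs(attempt, policy, random);
}
```

Injecting `random` mirrors the injectable clocks and sleeps mentioned above: with a fixed source, every delay is reproducible in tests.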
| 75 | + |
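The non-cryptographic half of the webhook check (header parsing, the ±300s window, and constant-time comparison across several `v1` entries) might look like the sketch below. All function names are hypothetical. The HMAC computation over `"<t>.<body>"` is deliberately left out; `expectedHex` stands in for its hex-encoded result.

```typescript
interface ParsedSignature {
  timestamp: number;    // unix seconds from the t= field
  signatures: string[]; // every v1= entry, lowercased hex
}

/** Parse "t=<unix>,v1=<hex>[,v1=<hex>...]"; returns null when malformed. */
function parseSignatureHeader(header: string): ParsedSignature | null {
  let timestamp: number | null = null;
  const signatures: string[] = [];
  for (const part of header.split(',')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    const key = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (key === 't' && /^\d+$/.test(value)) timestamp = Number(value);
    // Require whole bytes so truncated hex can never compare equal.
    else if (key === 'v1' && /^(?:[0-9a-f]{2})+$/i.test(value)) signatures.push(value.toLowerCase());
  }
  if (timestamp === null || signatures.length === 0) return null;
  return { timestamp, signatures };
}

/** Reject deliveries that are too old or too far in the future. */
function withinTolerance(timestamp: number, nowSeconds: number, toleranceSeconds = 300): boolean {
  return Math.abs(nowSeconds - timestamp) <= toleranceSeconds;
}

function hexToBytes(hex: string): Uint8Array {
  const out = new Uint8Array(hex.length >> 1);
  for (let i = 0; i < out.length; i++) out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  return out;
}

/** Compare without branching on the first differing byte. */
function constantTimeEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a[i]! ^ b[i]!;
  return diff === 0;
}

/** True if any v1 candidate matches; every candidate is checked (no early exit). */
function matchesAnySignature(expectedHex: string, candidates: string[]): boolean {
  const expected = hexToBytes(expectedHex.toLowerCase());
  let ok = false;
  for (const candidate of candidates) {
    if (constantTimeEqual(expected, hexToBytes(candidate))) ok = true;
  }
  return ok;
}
```

Accepting several `v1` entries is what allows secret rotation: a sender can sign with the old and new secret during the overlap, and either match passes.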
| 76 | +## packages/executor — one loop, three screens |
| 77 | + |
| 78 | +```ts |
| 79 | +interface Executor extends AgentScreen { |
| 80 | + kind: 'local' | 'remote-machine' | 'browser'; |
| 81 | +  screenshot(): Promise<{ base64: string; width: number; height: number }>; |
| 82 | +  execute(action: CuaAction): Promise<void>; |
| 83 | +  dimensions(): Promise<{ width: number; height: number }>; |
| 84 | + dispose(): Promise<void>; |
| 85 | +} |
| 86 | +``` |
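
The agent loop described under `packages/core` only needs the `screenshot`/`execute` half of this interface. A minimal sketch of such a loop follows, under stated assumptions: `ScreenSketch`, `Prediction`, `PredictSketch`, and the outcome names are hypothetical stand-ins rather than the real `AgentScreen` and `PredictStepFn` types, and "tolerates up to 3 consecutive failures" is read as "gives up on the fourth".

```typescript
// Hypothetical stand-ins for AgentScreen / PredictStepFn; real shapes may differ.
type Action = { type: string };
type Shot = { base64: string; width: number; height: number };

interface ScreenSketch {
  screenshot(): Promise<Shot>;
  execute(action: Action): Promise<void>;
}

type Prediction =
  | { status: 'continue'; action: Action }
  | { status: 'done' }
  | { status: 'fail' };

type PredictSketch = (shot: Shot) => Promise<Prediction>;
type Outcome = 'done' | 'fail' | 'cap' | 'abort' | 'too-many-failures';

async function runLoopSketch(
  screen: ScreenSketch,
  predict: PredictSketch,
  // `signal` is structural, so a real AbortSignal fits here as well.
  opts: { maxSteps: number; signal?: { aborted: boolean } },
): Promise<Outcome> {
  let consecutiveFailures = 0;
  for (let step = 0; step < opts.maxSteps; step++) {
    if (opts.signal?.aborted) return 'abort'; // cooperative cancellation
    const shot = await screen.screenshot();
    const prediction = await predict(shot);
    if (prediction.status !== 'continue') return prediction.status;
    try {
      await screen.execute(prediction.action);
      consecutiveFailures = 0; // any success resets the streak
    } catch {
      consecutiveFailures++;
      // Assumption: three failures in a row are tolerated, the fourth stops the run.
      if (consecutiveFailures > 3) return 'too-many-failures';
    }
  }
  return 'cap'; // step cap reached without a terminal prediction
}
```

Because the loop sees only `screenshot` and `execute`, swapping a local screen for a remote machine changes nothing above this boundary.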
| 87 | + |
| 88 | +- **RemoteMachineExecutor** maps canonical actions onto the documented machine |
| 89 | + endpoints (`GET /machines/{id}/screenshot`, `POST /machines/{id}/actions`) |
| 90 | + through an injected transport — `CoastyClient` on the backend, a thin proxy |
| 91 | + client elsewhere. `wait` sleeps locally; `raw` code execution is refused by |
| 92 | + policy on every target. |
| 93 | +- **LocalExecutor** wraps a `NativeBridge` and solves the #1 documented |
| 94 | + pitfall — coordinate scaling — by mapping model-space (screenshot pixels) to |
| 95 | + input-space (real screen pixels) on every action. |
| 96 | +- **Bridges**: Windows is the reference — a persistent PowerShell daemon |
| 97 | + (`System.Drawing` capture + `user32` SendInput-family input) speaking |
| 98 | + JSON-lines over stdio, started via `-EncodedCommand`; zero native npm |
| 99 | + modules, so installs never compile anything. macOS (`screencapture`/ |
| 100 | + `cliclick`/`osascript`) and Linux (`import`/`xdotool`) are best-effort |
| 101 | + equivalents behind the same interface. |
| 102 | +- Actions are normalized first (`normalizeAction`) because the upstream docs' |
| 103 | + reference table and examples disagree on some param shapes — both are |
| 104 | + accepted, one canonical shape is executed. |
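
The coordinate-scaling step can be sketched as a single pure function. This is an illustration of the mapping only, with hypothetical names (`toInputSpace`, `Size`, `Point`); rounding and clamping to the last valid pixel are assumptions, not confirmed behavior of `LocalExecutor`.

```typescript
interface Size { width: number; height: number }
interface Point { x: number; y: number }

/**
 * Map a point from model-space (screenshot pixels) to input-space
 * (real screen pixels) by scaling each axis independently.
 */
function toInputSpace(p: Point, model: Size, input: Size): Point {
  const clamp = (v: number, max: number): number => Math.min(Math.max(v, 0), max);
  return {
    // Assumption: round to the nearest pixel, then clamp onto the screen.
    x: clamp(Math.round(p.x * (input.width / model.width)), input.width - 1),
    y: clamp(Math.round(p.y * (input.height / model.height)), input.height - 1),
  };
}

// A click at the centre of a 1280x720 screenshot on a 2560x1440 display.
const centre = toInputSpace({ x: 640, y: 360 }, { width: 1280, height: 720 }, { width: 2560, height: 1440 });
// centre is { x: 1280, y: 720 }
```

Skipping this step makes clicks land in the wrong place whenever the screenshot is captured or sent at a different resolution than the real display.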
| 105 | + |
| 106 | +## apps/backend — proxy, custodian, fan-out |
| 107 | + |
| 108 | +- **Auth**: `POST /api/auth/login {email}` issues an opaque random token |
| 109 | + (stored hashed, 7-day expiry). Single-tenant demo auth by design |
| 110 | + (`DECISIONS.md` D6); every table already carries `user_id`. |
| 111 | +- **Spend safety — the confirmCostCents handshake.** Billable routes compute |
| 112 | + the relevant number server-side (run worst case = `maxSteps × perStep`; |
| 113 | + machines = first-hour rate; workflows = the budget cap itself) and reject |
| 114 | + unless the client echoes it exactly (`409 ESTIMATE_CHANGED` with the expected |
| 115 | + value). Budgets are then enforced again: runs whose worst case exceeds the |
| 116 | + user's cap are refused with a suggested `maxSteps`; workflow runs pass |
| 117 | + `budget_cents` so *Coasty* halts them at the cap (`GUARD_EXCEEDED`); wallet |
| 118 | + pre-flight checks surface 402s before anything starts. |
| 119 | +- **Event pipeline.** Creating a run starts an **Ingestor** subscription to |
| 120 | + Coasty's SSE stream (resuming from the last stored seq). Events are mirrored |
| 121 | + into the `events` table **keeping the upstream `seq`**, applied to run state, |
| 122 | + and published on an in-process bus. Client SSE routes replay from SQLite |
| 123 | + (`Last-Event-ID`), then attach to the bus — with gap-filling if live events |
| 124 | + race the replay. The same table serves cloud runs, local runs, workflow |
| 125 | + runs, and per-user notification feeds (`stream_kind` + `stream_id`). |
| 126 | +- **Webhooks as reconciliation.** `POST /webhooks/coasty` verifies the HMAC |
| 127 | + against the per-run secret (looked up by the payload's run id) over the |
| 128 | + exact raw bytes before *any* state change; stale/tampered/unknown deliveries |
| 129 | + get 401. Verified events update run state and post to the owner's |
| 130 | + notification stream — so terminal transitions arrive even if an SSE |
| 131 | + subscription dropped. `GET /api/runs/:id` additionally reconciles |
| 132 | + non-terminal runs against Coasty on read. |
| 133 | +- **Local runs.** The desktop app mirrors its LocalExecutor loop through |
| 134 | + `POST /api/local-runs(/:id/events)`, so a run on your laptop is supervisable |
| 135 | + from your phone exactly like a cloud run — same timeline route, same |
| 136 | + approval notifications. |
| 137 | +- **Persistence**: `node:sqlite` behind a repository class (`db.ts`); events |
| 138 | + have `(stream_kind, stream_id, seq)` primary keys so ingestion is idempotent |
| 139 | + and replay is a range scan. Postgres is a contained swap (`DEPLOYMENT.md`). |
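
The confirmCostCents handshake for a run can be sketched as follows. This is a hedged illustration, not the backend's code: the function names, the `expectedCostCents` field, and the per-step price used in the usage lines are hypothetical; only the `maxSteps × perStep` formula and the `409 ESTIMATE_CHANGED` response come from the description above.

```typescript
type ConfirmResult =
  | { ok: true; costCents: number }
  | { ok: false; status: 409; code: 'ESTIMATE_CHANGED'; expectedCostCents: number };

/** Run worst case = maxSteps x perStep, always computed server-side. */
function worstCaseRunCents(maxSteps: number, perStepCents: number): number {
  return maxSteps * perStepCents;
}

/** Proceed only if the client echoed the server's number exactly. */
function checkConfirmation(expectedCents: number, confirmCostCents: number | undefined): ConfirmResult {
  if (confirmCostCents === expectedCents) return { ok: true, costCents: expectedCents };
  return { ok: false, status: 409, code: 'ESTIMATE_CHANGED', expectedCostCents: expectedCents };
}

/** Largest step count whose worst case still fits under the user's cap. */
function suggestMaxSteps(budgetCents: number, perStepCents: number): number {
  return Math.floor(budgetCents / perStepCents);
}

// Illustrative numbers only: 50 steps at 4 cents per step.
const expected = worstCaseRunCents(50, 4);          // 200
const stale = checkConfirmation(expected, 150);     // rejected; carries the expected value
const confirmed = checkConfirmation(expected, 200); // accepted
```

Returning the expected value with the 409 lets a client show the new estimate and ask the user to confirm again, instead of guessing.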
| 140 | + |
| 141 | +## Realtime model (end to end) |
| 142 | + |
| 143 | +```text |
| 144 | +Coasty SSE ──► Ingestor ──► events table (durable, seq) ──► bus ──► client SSE |
| 145 | +Coasty webhooks ──► HMAC verify ──► state + notification stream ──► bus ──► feeds |
| 146 | +desktop local loop ──► POST /api/local-runs/:id/events ──► same table/bus |
| 147 | +``` |
| 148 | + |
| 149 | +Every hop resumes: the Ingestor reconnects to Coasty with `Last-Event-ID`; |
| 150 | +clients reconnect to the backend the same way; mobile polls |
| 151 | +`/api/runs/:id/events.json?after=N` (React Native fetch lacks streaming). |
| 152 | +Nothing is lost or duplicated because the durable seq is the single cursor. |
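
Why a single durable cursor gives both idempotent ingestion and gap-free replay can be shown with an in-memory stand-in for the `events` table. The class and method names are hypothetical, and a `Map` replaces SQLite here; the real store is the repository in `db.ts`.

```typescript
interface StoredEvent {
  streamKind: string; // 'run' below is an illustrative value
  streamId: string;
  seq: number;        // upstream sequence number, kept as-is
  type: string;
}

class EventLogSketch {
  private rows = new Map<string, StoredEvent>();

  /** Mirrors the (stream_kind, stream_id, seq) primary key: a duplicate insert is a no-op. */
  insert(e: StoredEvent): boolean {
    const key = JSON.stringify([e.streamKind, e.streamId, e.seq]);
    if (this.rows.has(key)) return false;
    this.rows.set(key, e);
    return true;
  }

  /** Range scan: every event of one stream with seq > after, in ascending order. */
  replayAfter(streamKind: string, streamId: string, after: number): StoredEvent[] {
    return Array.from(this.rows.values())
      .filter((e) => e.streamKind === streamKind && e.streamId === streamId && e.seq > after)
      .sort((a, b) => a.seq - b.seq);
  }
}

const log = new EventLogSketch();
const ev = (seq: number): StoredEvent => ({ streamKind: 'run', streamId: 'r1', seq, type: 'step' });
log.insert(ev(1));
log.insert(ev(3));                    // arrives before 2
log.insert(ev(2));
const duplicate = log.insert(ev(2));  // redelivery after a reconnect: ignored
const missed = log.replayAfter('run', 'r1', 1).map((e) => e.seq); // a client that last saw seq 1
```

A reconnecting client passes the last `seq` it saw as `Last-Event-ID` (or `after=N` when polling) and receives exactly the events it missed, regardless of how often the upstream redelivered them.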
| 153 | + |
| 154 | +## apps/desktop — local control, safely |
| 155 | + |
| 156 | +Electron with `contextIsolation: true`, `nodeIntegration: false`. The renderer |
| 157 | +is the same SPA as the web app; a small preload exposes |
| 158 | +`window.cowork = { platform, backendUrl, startLocalRun, cancelLocalRun }`. |
| 159 | +`LocalRunManager` (main process) runs core's `runAgentLoop` with |
| 160 | +`LocalExecutor`, gets predictions through the backend's `/api/proxy/sessions` |
| 161 | +(key stays server-side), and mirrors events to `/api/local-runs` in batches. |
| 162 | +The E2E suite deliberately never starts a local run (it would seize the real |
| 163 | +mouse); that path is covered by unit tests plus an opt-in native capture smoke |
| 164 | +test (`COWORK_NATIVE_SMOKE=1`). |
| 165 | + |
| 166 | +## apps/web + packages/ui |
| 167 | + |
| 168 | +One Vite SPA serves browsers and the desktop webview. `packages/ui` is a |
| 169 | +dependency-free design system (dark-first tokens, accessible primitives, |
| 170 | +domain components like `EventTimeline`, `ScreenView`, `ApprovalBar`, |
| 171 | +`WorkflowStepTree`); apps map API DTOs into its presentational props. The live |
| 172 | +screen view polls machine screenshots every 2s while a run is active — frames |
| 173 | +are cross-platform and cheap (`DECISIONS.md` A3). |
| 174 | + |
| 175 | +## apps/mobile |
| 176 | + |
| 177 | +Expo/React Native with zero extra native deps; every screen is |
| 178 | +react-native-web-compatible, which is how the same UI is verified in CI |
| 179 | +(D7). Timelines poll the REST fallback; approvals hit the same resume routes. |
| 180 | + |
| 181 | +## tools/mock-coasty |
| 182 | + |
| 183 | +A faithful offline twin of the documented API: key kinds + billing headers, |
| 184 | +the full error catalog, exact pricing math, run/workflow steppers with the |
| 185 | +documented state machine, durable SSE with replay, HMAC-signed webhook |
| 186 | +delivery, sandbox machines with generated-PNG screenshots. Deliberately does |
| 187 | +**not** import `core` (D9) so contract bugs can't hide; behavior triggers in |
| 188 | +task text (`NEEDS_HUMAN`, `MUST_FAIL`, `RUN_LONG`, `MOCK_DONE`) make every |
| 189 | +lifecycle deterministic for tests and demos. |
| 190 | + |
| 191 | +## Data model (SQLite) |
| 192 | + |
| 193 | +```text |
| 194 | +users(id, email, budget_cents, created_at) |
| 195 | +sessions(token_hash PK, user_id, expires_at) -- tokens stored hashed |
| 196 | +runs(id, user_id, kind coasty|local, coasty_run_id, machine_id, task, status, |
| 197 | + cua_version, max_steps, budget_cents, cost_cents, steps_completed, |
| 198 | + result_json, error_json, awaiting_human_reason, webhook_secret, …) |
| 199 | +workflow_runs(id, user_id, coasty_workflow_run_id, workflow_id, status, |
| 200 | + budget_cents, spent_cents, awaiting_step_id, webhook_secret, …) |
| 201 | +events(stream_kind, stream_id, seq, type, data_json, created_at, |
| 202 | + PRIMARY KEY (stream_kind, stream_id, seq)) -- the realtime spine |
| 203 | +``` |