v0.3.0 — phone-surface bridge for the agent (notify, intent, share, camera, mic, location, inbox)
A localhost HTTP server (127.0.0.1:9998) gated by a per-launch bearer
token exposes the phone's surface to Pi: notifications, generic Android
intents (+ open-url / dial / settings convenience wrappers), share sheet,
clipboard, battery, location, camera/photo, mic/record, and an inbox
that queues share-target payloads and pi://agent/... deep-links for the
agent to drain. No companion APK required.
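As a rough illustration of the request shape described above, the sketch below builds (but does not send) a bearer-authenticated call to the bridge. The host, port, and the `notify` / `camera/photo` endpoint names come from this commit. The `POST` method, the JSON body, the `Authorization: Bearer` header layout, and the `buildBridgeRequest` helper are assumptions, not the actual PocketPiApiServer contract.

```typescript
// Hypothetical request builder for the localhost bridge.
// Assumptions (not confirmed by the commit): POST with a JSON body and a
// standard "Authorization: Bearer <token>" header.
interface BridgeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const BRIDGE_ORIGIN = "http://127.0.0.1:9998"; // address from the commit message

function buildBridgeRequest(
  endpoint: string, // e.g. "notify" or "camera/photo"
  token: string, // contents of the per-launch token file
  payload: Record<string, unknown> = {},
): BridgeRequest {
  const trimmed = token.trim(); // token files often end with a newline
  if (trimmed.length === 0) {
    throw new Error("empty bearer token");
  }
  const path = endpoint.replace(/^\/+/, ""); // tolerate a leading slash
  return {
    url: `${BRIDGE_ORIGIN}/${path}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${trimmed}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

// Example: a notification request (the payload field name is illustrative).
const exampleRequest = buildBridgeRequest("notify", "s3cr3t\n", { title: "Build done" });
```

Keeping the builder free of I/O means the token handling can be checked without a running service. A real caller would read the token file again after each service restart, since the token is issued per launch.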
What landed:
- PocketPiApiServer + Handlers + Token + Inbox under
android/app/src/main/java/com/zosma/pocketpi/api/; owned by
PocketPiService.
- Three transient foreground services (Location/Camera/Mic) so the green
privacy dot only shows during active capture.
- MainActivity.onNewIntent routes share + pi:// deep-link intents into
$HOME/.pi/agent/inbox/<ts>-<rand>.json. Runtime permissions for camera,
mic, location, and notifications are requested at first launch.
- AndroidManifest: 4 new permissions, share + pi://agent/ intent-filters,
3 new <service> entries with location/camera/microphone FGS types.
- CameraX deps added to gradle. targetSdk stays at 28 (Termux
exec-from-data invariant unchanged).
- bootstrap/postinstall.sh writes a pocket-pi-api curl shim under
$PREFIX/bin/ and adds dashboard tool overrides for node/npm/git/ps/pgrep
so they flip from "not found" to "override" in Settings > Tools.
- build-bootstrap.sh bundles the rewritten pi-termux-tools extension into
$PREFIX/lib/pocket-pi/ and postinstall pi-installs it. The extension
routes every call to the HTTP API using the bearer token at
$PREFIX/etc/pocket-pi/api-token, matches Pi's actual registerTool
shape (a single object with an async execute), keeps the 10 termux_* tool
names for back-compat, and adds 7 new pocket_pi_* tools (intent_send,
intent_open_url, intent_dial, intent_settings, mic_record, inbox_list,
inbox_pop).
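The "single object with an async execute" registration shape from the last item can be sketched roughly as follows. Everything here is illustrative: the `ToolSpec` interface, the `makeBridgeTool` and `buildTools` helpers, and the injected `send` transport are hypothetical stand-ins, and Pi's real registerTool object may carry more fields (parameter schemas, labels, and so on). The full `pocket_pi_inbox_*` tool names and the tool-to-endpoint mapping are inferred from the names in this commit, not confirmed.

```typescript
// Hypothetical shape: one object per tool, with an async execute().
interface ToolSpec {
  name: string;
  description: string;
  execute(params: Record<string, unknown>): Promise<unknown>;
}

// The transport is injected so the sketch makes no real network calls.
type Send = (endpoint: string, payload: Record<string, unknown>) => Promise<unknown>;

function makeBridgeTool(
  name: string,
  endpoint: string,
  description: string,
  send: Send,
): ToolSpec {
  return {
    name,
    description,
    // Every call is forwarded to the HTTP bridge; there is no local fallback.
    async execute(params: Record<string, unknown>): Promise<unknown> {
      return send(endpoint, params);
    },
  };
}

// Legacy termux_* names and new pocket_pi_* names can sit side by side,
// all pointing at the same transport (mapping below is assumed).
function buildTools(send: Send): ToolSpec[] {
  return [
    makeBridgeTool("termux_notify", "notify", "Post a notification", send),
    makeBridgeTool("pocket_pi_inbox_list", "inbox/list", "List queued inbox items", send),
    makeBridgeTool("pocket_pi_inbox_pop", "inbox/pop", "Pop a queued inbox item", send),
  ];
}
```

Routing both naming schemes through one factory is one way to keep old tool names working while the transport changes underneath them. The actual extension may organize this differently.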
Validated end-to-end on the emulator: notify, intent + open_url + dial
+ settings, share, clipboard, battery, location (with adb emu geo fix),
camera/photo (60KB JPEG), mic/record (27KB AAC), inbox/list + pop after
firing both ACTION_SEND and a pi://agent/... deep link, and the agent
calling termux_notify + the intent suite end-to-end via the dashboard.
UI automation (Accessibility Service) intentionally deferred to v0.4;
README explains the roadmap.
versionCode 3 -> 4, versionName 0.2.1 -> 0.3.0.
README.md (24 additions, 10 deletions)
@@ -7,14 +7,12 @@ Pocket Pi is a thin Android wrapper around two upstream projects that do the rea
 - **[Pi coding agent](https://github.com/mariozechner/pi-coding-agent)** by [Mario Zechner](https://github.com/mariozechner) — the underlying agent engine. The canonical home is now [earendil-works/pi-coding-agent](https://github.com/earendil-works/pi-coding-agent).
 - **[pi-agent-dashboard](https://github.com/BlackBeltTechnology/pi-agent-dashboard)** by [BlackBelt Technology](https://github.com/BlackBeltTechnology) — the web chat UI rendered inside the APK's WebView (slash commands, session history, model switcher, provider settings, OAuth flows). [pi-anthropic-messages](https://github.com/BlackBeltTechnology/pi-anthropic-messages) (also BlackBelt) is the Anthropic protocol bridge that makes Claude Pro/Max OAuth tokens usable from Pi.
 
-What Pocket Pi adds is the packaging: a Termux runtime, postinstall script, an Android service that supervises `pi --mode rpc` + the dashboard's Node server, and a Compose WebView with a small recovery UI for when the bootstrap stalls.
-
-> POC, fast-tracked. We bundle Termux's Linux runtime inside an Android app so anyone can try a single-tap Pi install on a phone. Whether this approach is worth productizing (vs. building a proper native Android client) is the open question — that's what the POC is for.
+What Pocket Pi adds is the packaging: a Termux runtime, postinstall script, an Android service that supervises `pi --mode rpc` + the dashboard's Node server, a Compose WebView with a small recovery UI for when the bootstrap stalls, and an on-device HTTP bridge (`127.0.0.1:9998`, per-launch bearer token) that lets the agent reach Android capabilities — notifications, intents, share-sheet, camera, mic, location, clipboard, deep-link inbox — without any companion APK.
 
 ## Install
 
-1. Grab the latest APK — **v0.2.1** — from the [Releases page](https://github.com/CelestialCreator/pocket-pi/releases/latest), or directly: [pocket-pi-v0.2.1.apk](https://github.com/CelestialCreator/pocket-pi/releases/download/v0.2.1/pocket-pi-v0.2.1.apk) (40 MB, aarch64 only).
-2. Sideload — tap the APK on the phone (allow install from unknown sources for your browser/file manager), or `adb install pocket-pi-v0.2.1.apk`.
+1. Grab the latest APK — **v0.3.0** — from the [Releases page](https://github.com/CelestialCreator/pocket-pi/releases/latest), or directly: [pocket-pi-v0.3.0.apk](https://github.com/CelestialCreator/pocket-pi/releases/download/v0.3.0/pocket-pi-v0.3.0.apk) (40 MB, aarch64 only).
+2. Sideload — tap the APK on the phone (allow install from unknown sources for your browser/file manager), or `adb install pocket-pi-v0.3.0.apk`.
 3. Open the app. First launch runs the bootstrap (3–5 min on Wi-Fi: extracts Termux, installs Node + npm packages, registers Pi extensions).
 4. When the dashboard loads, tap its **⚙** (top-right of the page chrome) → **Providers** → add at least one provider. See [Providers — what works](#providers--what-works) below.
 5. Pick a model, chat away.
@@ -47,9 +45,9 @@ If you want to use Claude Pro/Max OAuth on Pocket Pi but prefer signing in on a
 | Compose-side UI | Loading / recovery pane only (Pocket Pi splash, postinstall log tail, inline `Restart Pi` + `Re-run setup` buttons after a 15s stall). Everything else lives in the dashboard's own settings UI. | Pocket Pi |
-| Native bridges |`xdg-open` shim (postinstall) → Android `ACTION_VIEW` so the dashboard's OAuth flows open the device's default browser. Compose-side `PocketPi.notify/share/openExternal/toast` JS interface for the WebView. | Pocket Pi |
+| Native bridges |**Localhost HTTP API** on `127.0.0.1:9998` (bearer token at `$PREFIX/etc/pocket-pi/api-token`, mode 0600, rotated per service start) exposing notify / share / intent / clipboard / battery / location / camera/photo / mic/record / inbox to the agent. `xdg-open` shim (postinstall) → Android `ACTION_VIEW`. Compose-side `PocketPi.notify/share/openExternal/toast` JS interface for the WebView. Share-target + `pi://agent/…` deep-link intent-filters queue payloads into `$HOME/.pi/agent/inbox/`. | Pocket Pi |
 The current build uses `applicationId = com.termux` so the upstream Termux bootstrap binaries (which bake in the path `/data/data/com.termux/files/usr`) work without recompiling. To ship under a real app id, run `bootstrap/rebuild-with-prefix.sh` (Docker, 4–12 h on Apple Silicon) to produce a bootstrap pinned to a custom prefix, then flip `applicationId` in `android/app/build.gradle.kts`.
 
-## What works / what doesn't (v0.2.1)
+## What works / what doesn't (v0.3.0)
 
 || Status |
 |---|---|
 | Single-APK install on aarch64 phones | ✓ |
 | pi-agent-dashboard as the WebView UI (slash commands, model switcher, session history all native) | ✓ |
 | API-key chat for OpenAI / Anthropic API / Google Gemini (AI Studio) / Groq / Mistral / xAI / NVIDIA NIM / OpenRouter (tool use, cost tracking) | ✓ |
+|**Phone-surface tools for the agent** (notifications, share sheet, generic Android intents, dial, settings deep-links, clipboard, battery) | ✓ — new in v0.3.0 |
+|**Location** (fused gps/network, foreground only) | ✓ — new in v0.3.0 |
+|**Camera** (one-shot still capture, front/back) | ✓ — new in v0.3.0 |
+|**Microphone** (record N seconds to AAC/.m4a) | ✓ — new in v0.3.0 |
+|**Incoming intents** — "Share to Pocket Pi" target, `pi://agent/…` deep links, queued for the agent | ✓ — new in v0.3.0 |
+|**`pocket-pi-api` shell shim** — `pocket-pi-api notify '{…}'`, `pocket-pi-api camera/photo '{…}'` etc. from any Termux session | ✓ — new in v0.3.0 |
 | Recovery UI when the dashboard doesn't bind within 15s (inline Restart Pi / Re-run setup buttons) | ✓ |
 | Other OAuth providers (Gemini CLI, ChatGPT Codex, GitHub Copilot, Antigravity) | sign-in completes but no models — Pi-side protocol bridges not bundled. Use the API-key path instead. |
 | Shell-session feature inside the dashboard | not yet — `node-pty` has no android-arm64 prebuild and is stubbed; chat/files/tasks work, terminal tab will fail |
+| Mobile UI automation (the agent driving other apps) | not yet — deferred to v0.4; needs an Accessibility Service (the user has to enable it manually in Settings, no runtime dialog exists). Plan is to vendor [droidrun/droidrun-portal](https://github.com/droidrun/droidrun-portal). |
+| Background location ("Allow all the time") | not yet — foreground only this release. Add the Settings escalation when a real use case appears. |
 | Old Android WebView builds (Chrome < ~120) | emulator system images ship stale WebView; real devices auto-update — confirmed working in Chrome 140+ |
@@ -122,4 +128,12 @@ MIT for Pocket Pi's own source. Third-party runtime components keep their own li
 
 ## Status
 
-v0.2.1 — POC, shippable. The Termux-fork-inside-an-APK approach works: pi-agent-dashboard is the chat UI, single-tap APK install handles the rest. Whether to invest in productizing it (custom prefix bootstrap, real applicationId, signed release builds, Play Store, etc.) or rewrite this as a proper native Android client that talks to Pi over the network is the question this POC is meant to inform.
+**v0.3.0 — agent has the phone.** Daily-drivable. The Termux-runtime-inside-an-APK approach lands cleanly: single-tap install, dashboard binds the WebView, and the agent now has a real Android surface to act on — notifications, intents (both directions), share-sheet, camera, mic, location, clipboard, deep-link inbox — all gated by a per-launch bearer token over localhost. No companion APK, no root, no shell setup.
+
+Roadmap from here, in rough priority order:
+
+- **v0.4 — UI automation.** Vendor [droidrun/droidrun-portal](https://github.com/droidrun/droidrun-portal)'s Kotlin AccessibilityService into the APK so the agent can read screens + dispatch taps/swipes/gestures against other apps. The one irreducible cost: Android forces the user to enable Accessibility in Settings manually (no runtime dialog exists for this permission).
+- **Background location escalation** when a real use case lands ("Allow all the time" → `ACCESS_BACKGROUND_LOCATION`).
+- **More OAuth providers** end-to-end (Gemini CLI, ChatGPT Codex, GitHub Copilot, Antigravity) — each requires a small Pi-side protocol bridge analogous to `pi-anthropic-messages`.
+- **Custom-prefix bootstrap** so `applicationId` can move off `com.termux`. Currently a 4–12 h Docker build on Apple Silicon; once it's clean, the path to a real signed release on Play Store is short.
+- **Working shell-session tab** in the dashboard once a working `node-pty` android-arm64 prebuild exists (currently stubbed).