Commit 91a466d

v0.3.0 — phone-surface bridge for the agent (notify, intent, share, camera, mic, location, inbox)
A localhost HTTP server (127.0.0.1:9998) gated by a per-launch bearer token exposes the phone's surface to Pi: notifications, generic Android intents (+ open-url / dial / settings convenience wrappers), share sheet, clipboard, battery, location, camera/photo, mic/record, and an inbox that queues share-target payloads and pi://agent/... deep-links for the agent to drain. No companion APK required.

What landed:
- PocketPiApiServer + Handlers + Token + Inbox under android/app/src/main/java/com/zosma/pocketpi/api/; owned by PocketPiService.
- Three transient foreground services (Location/Camera/Mic) so the green privacy dot only shows during active capture.
- MainActivity.onNewIntent routes share + pi:// deep-link intents into $HOME/.pi/agent/inbox/<ts>-<rand>.json. Runtime permissions for camera, mic, location, notifications requested at first launch.
- AndroidManifest: 4 new permissions, share + pi://agent/ intent-filters, 3 new <service> entries with location/camera/microphone FGS types.
- CameraX deps added to gradle. targetSdk stays at 28 (Termux exec-from-data invariant unchanged).
- bootstrap/postinstall.sh writes a pocket-pi-api curl shim under $PREFIX/bin/ and adds dashboard tool overrides for node/npm/git/ps/pgrep so they flip from "not found" to override in Settings > Tools.
- build-bootstrap.sh bundles the rewritten pi-termux-tools extension into $PREFIX/lib/pocket-pi/ and postinstall pi-installs it. Extension routes every call to the HTTP API via the bearer token at $PREFIX/etc/pocket-pi/api-token; matches Pi's actual registerTool shape (single-object with async execute); keeps the 10 termux_* tool names for back-compat and adds 7 new pocket_pi_* tools (intent_send, intent_open_url, intent_dial, intent_settings, mic_record, inbox_list, inbox_pop).

Validated end-to-end on the emulator: notify, intent + open_url + dial + settings, share, clipboard, battery, location (with adb emu geo fix), camera/photo (60KB JPEG), mic/record (27KB AAC), inbox/list + pop after firing both ACTION_SEND and a pi://agent/... deep link, and the agent calling termux_notify + the intent suite end-to-end via the dashboard. UI automation (Accessibility Service) intentionally deferred to v0.4; README explains the roadmap. versionCode 3 -> 4, versionName 0.2.1 -> 0.3.0.
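
For orientation, a bearer-token call against the bridge might look roughly like the sketch below. It is not the shipped `pocket-pi-api` shim. The commit message confirms only the address `127.0.0.1:9998` and the token path `$PREFIX/etc/pocket-pi/api-token`; the POST method, the `/<endpoint>` URL shape, the JSON field names in the example, and the helper names `auth_header` and `pocket_pi_call` are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a bearer-token call to the localhost bridge.
# From the commit: 127.0.0.1:9998 and the token file location.
# Assumed here: POST method, "/<endpoint>" URL shape, JSON request body.

API_BASE="http://127.0.0.1:9998"
TOKEN_FILE="${PREFIX:-/data/data/com.termux/files/usr}/etc/pocket-pi/api-token"

# Print the Authorization header for the token stored in file $1.
# Command substitution strips the trailing newline of the token file.
auth_header() {
    printf 'Authorization: Bearer %s' "$(cat "$1")"
}

# pocket_pi_call ENDPOINT JSON: send one request to the bridge.
pocket_pi_call() {
    curl --silent --show-error --fail \
        -H "$(auth_header "$TOKEN_FILE")" \
        -H 'Content-Type: application/json' \
        --data "$2" \
        "$API_BASE/$1"
}

# Example, only meaningful on a device with the service running
# (the field names are invented for illustration):
# pocket_pi_call notify '{"title":"hi","text":"from the agent"}'
```

Because the token is rotated per service start, a caller would need to re-read the token file on each call rather than cache it, which is why the sketch reads it inside `auth_header`.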
1 parent 5844492 commit 91a466d

19 files changed

Lines changed: 1713 additions & 240 deletions

README.md

Lines changed: 24 additions & 10 deletions
@@ -7,14 +7,12 @@ Pocket Pi is a thin Android wrapper around two upstream projects that do the rea
 - **[Pi coding agent](https://github.com/mariozechner/pi-coding-agent)** by [Mario Zechner](https://github.com/mariozechner) — the underlying agent engine. The canonical home is now [earendil-works/pi-coding-agent](https://github.com/earendil-works/pi-coding-agent).
 - **[pi-agent-dashboard](https://github.com/BlackBeltTechnology/pi-agent-dashboard)** by [BlackBelt Technology](https://github.com/BlackBeltTechnology) — the web chat UI rendered inside the APK's WebView (slash commands, session history, model switcher, provider settings, OAuth flows). [pi-anthropic-messages](https://github.com/BlackBeltTechnology/pi-anthropic-messages) (also BlackBelt) is the Anthropic protocol bridge that makes Claude Pro/Max OAuth tokens usable from Pi.

-What Pocket Pi adds is the packaging: a Termux runtime, postinstall script, an Android service that supervises `pi --mode rpc` + the dashboard's Node server, and a Compose WebView with a small recovery UI for when the bootstrap stalls.
-
-> POC, fast-tracked. We bundle Termux's Linux runtime inside an Android app so anyone can try a single-tap Pi install on a phone. Whether this approach is worth productizing (vs. building a proper native Android client) is the open question — that's what the POC is for.
+What Pocket Pi adds is the packaging: a Termux runtime, postinstall script, an Android service that supervises `pi --mode rpc` + the dashboard's Node server, a Compose WebView with a small recovery UI for when the bootstrap stalls, and an on-device HTTP bridge (`127.0.0.1:9998`, per-launch bearer token) that lets the agent reach Android capabilities — notifications, intents, share-sheet, camera, mic, location, clipboard, deep-link inbox — without any companion APK.

 ## Install

-1. Grab the latest APK — **v0.2.1** — from the [Releases page](https://github.com/CelestialCreator/pocket-pi/releases/latest), or directly: [pocket-pi-v0.2.1.apk](https://github.com/CelestialCreator/pocket-pi/releases/download/v0.2.1/pocket-pi-v0.2.1.apk) (40 MB, aarch64 only).
-2. Sideload — tap the APK on the phone (allow install from unknown sources for your browser/file manager), or `adb install pocket-pi-v0.2.1.apk`.
+1. Grab the latest APK — **v0.3.0** — from the [Releases page](https://github.com/CelestialCreator/pocket-pi/releases/latest), or directly: [pocket-pi-v0.3.0.apk](https://github.com/CelestialCreator/pocket-pi/releases/download/v0.3.0/pocket-pi-v0.3.0.apk) (40 MB, aarch64 only).
+2. Sideload — tap the APK on the phone (allow install from unknown sources for your browser/file manager), or `adb install pocket-pi-v0.3.0.apk`.
 3. Open the app. First launch runs the bootstrap (3–5 min on Wi-Fi: extracts Termux, installs Node + npm packages, registers Pi extensions).
 4. When the dashboard loads, tap the icon at the top-right of the page chrome → **Providers** → add at least one provider. See [Providers — what works](#providers--what-works) below.
 5. Pick a model, chat away.
@@ -47,9 +45,9 @@ If you want to use Claude Pro/Max OAuth on Pocket Pi but prefer signing in on a
 | Linux runtime | Termux bootstrap (Node 25, Python, git, ripgrep, openssl) — `bootstrap/` | [Termux](https://termux.dev/) |
 | Chat UI | [`@blackbelt-technology/pi-agent-dashboard`](https://www.npmjs.com/package/@blackbelt-technology/pi-agent-dashboard) — binds `:8000` (browser UI) + `:9999` (pi extension bridge); rendered in the app WebView. Slash commands, model switching, session history, provider settings, OAuth. | [BlackBelt Technology](https://github.com/BlackBeltTechnology/pi-agent-dashboard) |
 | Agent engine | [`@earendil-works/pi-coding-agent`](https://www.npmjs.com/package/@earendil-works/pi-coding-agent), spawned as `pi --mode rpc` | [Mario Zechner](https://github.com/mariozechner/pi-coding-agent) / [earendil-works](https://github.com/earendil-works/pi-coding-agent) |
-| Pi extensions | [`pi-anthropic-messages`](https://github.com/BlackBeltTechnology/pi-anthropic-messages) (Claude Pro/Max OAuth + tool-call rendering) + `pi-web-access`, `pi-subagents`, `oh-pi`, `@aliou/pi-guardrails`, `pi-mcp-adapter`, `pk-pi-hermes-evolve` | various (see `bootstrap/npm-packages.txt`) |
+| Pi extensions | [`pi-anthropic-messages`](https://github.com/BlackBeltTechnology/pi-anthropic-messages) (Claude Pro/Max OAuth + tool-call rendering) + `pi-web-access`, `pi-subagents`, `oh-pi`, `@aliou/pi-guardrails`, `pi-mcp-adapter`, `pk-pi-hermes-evolve`, **`pi-termux-tools`** (Pocket Pi's phone-surface tools — notifications, intents, camera, mic, location, inbox) | various (see `bootstrap/npm-packages.txt`) + `extensions/pi-termux-tools/` |
 | Compose-side UI | Loading / recovery pane only (Pocket Pi splash, postinstall log tail, inline `Restart Pi` + `Re-run setup` buttons after a 15s stall). Everything else lives in the dashboard's own settings UI. | Pocket Pi |
-| Native bridges | `xdg-open` shim (postinstall) → Android `ACTION_VIEW` so the dashboard's OAuth flows open the device's default browser. Compose-side `PocketPi.notify/share/openExternal/toast` JS interface for the WebView. | Pocket Pi |
+| Native bridges | **Localhost HTTP API** on `127.0.0.1:9998` (bearer token at `$PREFIX/etc/pocket-pi/api-token`, mode 0600, rotated per service start) exposing notify / share / intent / clipboard / battery / location / camera/photo / mic/record / inbox to the agent. `xdg-open` shim (postinstall) → Android `ACTION_VIEW`. Compose-side `PocketPi.notify/share/openExternal/toast` JS interface for the WebView. Share-target + `pi://agent/…` deep-link intent-filters queue payloads into `$HOME/.pi/agent/inbox/`. | Pocket Pi |

 ## Repo layout

@@ -87,22 +85,30 @@ cd ../pi-skill-learner && pnpm install && pnpm build

 # 3. APK
 cd ../../android && ./gradlew :app:assembleDebug
-# Output: android/app/build/outputs/apk/debug/app-debug.apk (~67 MB)
+# Output: android/app/build/outputs/apk/debug/app-debug.apk (~40 MB)
 ```

 The current build uses `applicationId = com.termux` so the upstream Termux bootstrap binaries (which bake in the path `/data/data/com.termux/files/usr`) work without recompiling. To ship under a real app id, run `bootstrap/rebuild-with-prefix.sh` (Docker, 4–12 h on Apple Silicon) to produce a bootstrap pinned to a custom prefix, then flip `applicationId` in `android/app/build.gradle.kts`.

-## What works / what doesn't (v0.2.1)
+## What works / what doesn't (v0.3.0)

 | | Status |
 |---|---|
 | Single-APK install on aarch64 phones | ✓ |
 | pi-agent-dashboard as the WebView UI (slash commands, model switcher, session history all native) | ✓ |
 | API-key chat for OpenAI / Anthropic API / Google Gemini (AI Studio) / Groq / Mistral / xAI / NVIDIA NIM / OpenRouter (tool use, cost tracking) | ✓ |
 | Claude Pro/Max **OAuth** Sign-In → device default browser → on-device callback | ✓ |
+| **Phone-surface tools for the agent** (notifications, share sheet, generic Android intents, dial, settings deep-links, clipboard, battery) | ✓ — new in v0.3.0 |
+| **Location** (fused gps/network, foreground only) | ✓ — new in v0.3.0 |
+| **Camera** (one-shot still capture, front/back) | ✓ — new in v0.3.0 |
+| **Microphone** (record N seconds to AAC/.m4a) | ✓ — new in v0.3.0 |
+| **Incoming intents** — "Share to Pocket Pi" target, `pi://agent/…` deep links, queued for the agent | ✓ — new in v0.3.0 |
+| **`pocket-pi-api` shell shim**: `pocket-pi-api notify '{…}'`, `pocket-pi-api camera/photo '{…}'` etc. from any Termux session | ✓ — new in v0.3.0 |
 | Recovery UI when the dashboard doesn't bind within 15s (inline Restart Pi / Re-run setup buttons) | ✓ |
 | Other OAuth providers (Gemini CLI, ChatGPT Codex, GitHub Copilot, Antigravity) | sign-in completes but no models — Pi-side protocol bridges not bundled. Use the API-key path instead. |
 | Shell-session feature inside the dashboard | not yet — `node-pty` has no android-arm64 prebuild and is stubbed; chat/files/tasks work, terminal tab will fail |
+| Mobile UI automation (the agent driving other apps) | not yet — deferred to v0.4; needs an Accessibility Service (the user has to enable it manually in Settings, no runtime dialog exists). Plan is to vendor [droidrun/droidrun-portal](https://github.com/droidrun/droidrun-portal). |
+| Background location ("Allow all the time") | not yet — foreground only this release. Add the Settings escalation when a real use case appears. |
 | `applicationId` ≠ `com.termux` | not yet — requires custom bootstrap rebuild |
 | Old Android WebView builds (Chrome < ~120) | emulator system images ship stale WebView; real devices auto-update — confirmed working in Chrome 140+ |

@@ -122,4 +128,12 @@ MIT for Pocket Pi's own source. Third-party runtime components keep their own li

 ## Status

-v0.2.1 — POC, shippable. The Termux-fork-inside-an-APK approach works: pi-agent-dashboard is the chat UI, single-tap APK install handles the rest. Whether to invest in productizing it (custom prefix bootstrap, real applicationId, signed release builds, Play Store, etc.) or rewrite this as a proper native Android client that talks to Pi over the network is the question this POC is meant to inform.
+**v0.3.0 — agent has the phone.** Daily-drivable. The Termux-runtime-inside-an-APK approach lands cleanly: single-tap install, dashboard binds the WebView, and the agent now has a real Android surface to act on — notifications, intents (both directions), share-sheet, camera, mic, location, clipboard, deep-link inbox — all gated by a per-launch bearer token over localhost. No companion APK, no root, no shell setup.
+
+Roadmap from here, in rough priority order:
+
+- **v0.4 — UI automation.** Vendor [droidrun/droidrun-portal](https://github.com/droidrun/droidrun-portal)'s Kotlin AccessibilityService into the APK so the agent can read screens + dispatch taps/swipes/gestures against other apps. The one irreducible cost: Android forces the user to enable Accessibility in Settings manually (no runtime dialog exists for this permission).
+- **Background location escalation** when a real use case lands ("Allow all the time" → `ACCESS_BACKGROUND_LOCATION`).
+- **More OAuth providers** end-to-end (Gemini CLI, ChatGPT Codex, GitHub Copilot, Antigravity) — each requires a small Pi-side protocol bridge analogous to `pi-anthropic-messages`.
+- **Custom-prefix bootstrap** so `applicationId` can move off `com.termux`. Currently a 4–12 h Docker build on Apple Silicon; once it's clean, the path to a real signed release on Play Store is short.
+- **Working shell-session tab** in the dashboard once a working `node-pty` android-arm64 prebuild exists (currently stubbed).

android/app/build.gradle.kts

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,8 +26,8 @@ android {
2626
// legacy semantics that allow exec from app-data, which Pi (and the
2727
// entire Termux runtime) relies on.
2828
targetSdk = 28
29-
versionCode = 3
30-
versionName = "0.2.1"
29+
versionCode = 4
30+
versionName = "0.3.0"
3131

3232
ndk { abiFilters += listOf("arm64-v8a") }
3333
}
@@ -75,4 +75,7 @@ dependencies {
     implementation(libs.androidx.work)
     implementation(libs.kotlinx.serialization.json)
     implementation(libs.kotlinx.coroutines.android)
+    implementation(libs.androidx.camera.core)
+    implementation(libs.androidx.camera.camera2)
+    implementation(libs.androidx.camera.lifecycle)
 }

android/app/src/main/AndroidManifest.xml

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,10 +6,14 @@
66
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
77
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
88
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_DATA_SYNC" />
9+
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_LOCATION" />
10+
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_CAMERA" />
11+
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_MICROPHONE" />
912
<uses-permission android:name="android.permission.WAKE_LOCK" />
1013
<uses-permission android:name="android.permission.RECORD_AUDIO" />
1114
<uses-permission android:name="android.permission.CAMERA" />
1215
<uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
16+
<uses-permission android:name="android.permission.ACCESS_COARSE_LOCATION" />
1317
<uses-permission android:name="android.permission.READ_MEDIA_IMAGES" />
1418
<uses-permission android:name="android.permission.REQUEST_IGNORE_BATTERY_OPTIMIZATIONS" />
1519

@@ -36,6 +40,31 @@
                 <action android:name="android.intent.action.MAIN" />
                 <category android:name="android.intent.category.LAUNCHER" />
             </intent-filter>
+
+            <!--
+                Deep link: pi://agent/<path>?<query>. The agent doesn't need a
+                verified App Link (https + assetlinks.json) since pi:// is a
+                custom scheme owned by Pocket Pi. MainActivity.onNewIntent
+                routes incoming Intents into the inbox file the agent polls.
+            -->
+            <intent-filter>
+                <action android:name="android.intent.action.VIEW" />
+                <category android:name="android.intent.category.DEFAULT" />
+                <category android:name="android.intent.category.BROWSABLE" />
+                <data android:scheme="pi" android:host="agent" />
+            </intent-filter>
+
+            <!--
+                Share target: "Share to Pocket Pi" from any app. Text + image
+                MIME types cover the common cases (selected text, screenshots,
+                photos). The payload (text or content URI) lands in the inbox.
+            -->
+            <intent-filter>
+                <action android:name="android.intent.action.SEND" />
+                <category android:name="android.intent.category.DEFAULT" />
+                <data android:mimeType="text/*" />
+                <data android:mimeType="image/*" />
+            </intent-filter>
         </activity>


@@ -44,6 +73,26 @@
             android:exported="false"
             android:foregroundServiceType="dataSync" />

+        <!--
+            Transient foreground services for hardware capture. Each runs only
+            while a capture is in flight, so the system privacy dot (mic/camera)
+            and notification only show when the agent is actively using the
+            hardware. Service classes drive runtime permissions for their
+            respective sensors via PocketPiApiServer dispatching to them.
+        -->
+        <service
+            android:name=".service.LocationFgService"
+            android:exported="false"
+            android:foregroundServiceType="location" />
+        <service
+            android:name=".service.CameraFgService"
+            android:exported="false"
+            android:foregroundServiceType="camera" />
+        <service
+            android:name=".service.MicFgService"
+            android:exported="false"
+            android:foregroundServiceType="microphone" />
+
         <provider
             android:name="androidx.core.content.FileProvider"
             android:authorities="${applicationId}.fileprovider"
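
The two new intent-filters can be exercised from a development machine with `adb shell am start`, which is how a deep link or share is usually fired at an emulator. The helpers below only print the commands so they can be inspected first. The helper names, the example path `notes?id=1`, and the sample share text are hypothetical; the commit confirms only the `pi://agent/...` scheme and host and the `text/*` and `image/*` share types.

```shell
#!/bin/sh
# Hypothetical helpers that print adb commands for the new intent-filters.
# The whole device-side command is single-quoted so that characters such as
# "?" in the URI reach the device shell inside double quotes.

# deep_link_cmd PATH_AND_QUERY: VIEW intent for pi://agent/<path>?<query>.
deep_link_cmd() {
    printf "adb shell 'am start -a android.intent.action.VIEW -d \"pi://agent/%s\"'\n" "$1"
}

# share_text_cmd TEXT: SEND intent with a text/plain payload.
# TEXT must not contain quote characters in this simplified sketch.
share_text_cmd() {
    printf "adb shell 'am start -a android.intent.action.SEND -t text/plain --es android.intent.extra.TEXT \"%s\"'\n" "$1"
}

deep_link_cmd 'notes?id=1'
share_text_cmd 'hello from adb'
```

Without an explicit component (`-n`), the share intent may open the system chooser instead of going straight to Pocket Pi, so picking the app by hand can still be needed.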

android/app/src/main/java/com/zosma/pocketpi/MainActivity.kt

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,13 @@
11
package com.zosma.pocketpi
22

3+
import android.Manifest
34
import android.content.Intent
5+
import android.content.pm.PackageManager
46
import android.os.Bundle
57
import androidx.activity.ComponentActivity
68
import androidx.activity.compose.setContent
9+
import androidx.activity.result.contract.ActivityResultContracts
10+
import androidx.core.content.ContextCompat
711
import androidx.core.view.WindowCompat
812
import androidx.compose.foundation.layout.fillMaxSize
913
import androidx.compose.material3.MaterialTheme
@@ -15,18 +19,30 @@ import androidx.compose.runtime.mutableStateOf
 import androidx.compose.runtime.remember
 import androidx.compose.runtime.setValue
 import androidx.compose.ui.Modifier
+import com.zosma.pocketpi.api.Inbox
 import com.zosma.pocketpi.pi.Bootstrapper
 import com.zosma.pocketpi.service.PocketPiService
 import com.zosma.pocketpi.ui.OnboardingScreen
 import com.zosma.pocketpi.ui.WebViewScreen
 import com.zosma.pocketpi.ui.theme.PocketPiTheme

 class MainActivity : ComponentActivity() {
+    private val permissionsLauncher = registerForActivityResult(
+        ActivityResultContracts.RequestMultiplePermissions(),
+    ) { /* best-effort: tools that need a missing perm return 403 */ }
+
     override fun onCreate(savedInstanceState: Bundle?) {
         super.onCreate(savedInstanceState)
         // Keep the system insets ours — the dashboard handles its own
         // safe-area math via the viewport meta and dvh units.
         WindowCompat.setDecorFitsSystemWindows(window, true)
+        // First-launch permission ask — non-blocking, the WebView/onboarding
+        // flow renders regardless. Tools whose permission was denied just
+        // return a 403 the user can fix in System Settings later.
+        requestRuntimePermissionsIfNeeded()
+        // Route the launching intent into the inbox if it's a share or
+        // pi:// deep-link target.
+        Inbox.writeInboxEntry(applicationContext, intent)
         setContent {
             PocketPiTheme {
                 Surface(modifier = Modifier.fillMaxSize(), color = MaterialTheme.colorScheme.background) {
@@ -36,6 +52,27 @@ class MainActivity : ComponentActivity() {
         }
     }

+    override fun onNewIntent(intent: Intent) {
+        super.onNewIntent(intent)
+        setIntent(intent)
+        Inbox.writeInboxEntry(applicationContext, intent)
+    }
+
+    private fun requestRuntimePermissionsIfNeeded() {
+        val want = listOf(
+            Manifest.permission.CAMERA,
+            Manifest.permission.RECORD_AUDIO,
+            Manifest.permission.ACCESS_FINE_LOCATION,
+            Manifest.permission.POST_NOTIFICATIONS,
+        )
+        val missing = want.filter {
+            ContextCompat.checkSelfPermission(this, it) != PackageManager.PERMISSION_GRANTED
+        }
+        if (missing.isNotEmpty()) {
+            permissionsLauncher.launch(missing.toTypedArray())
+        }
+    }
+
     @Composable
     private fun Root() {
         val activity = this
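
The body of `Inbox.writeInboxEntry` is not part of the diff shown here, so the queue semantics can only be modelled loosely. The sketch below assumes entries are files named `<ts>-<rand>.json` (as the commit message states) in one directory, that `<ts>` is a fixed-width epoch timestamp so lexical order matches arrival order, and that a "pop" means read-then-delete of the oldest entry. The function names are hypothetical, and the real list/pop path goes through the HTTP API rather than direct file access.

```shell
#!/bin/sh
# Hypothetical model of the inbox queue as a directory of <ts>-<rand>.json files.

# inbox_entry_name TS RAND: build an entry file name.
inbox_entry_name() {
    printf '%s-%s.json' "$1" "$2"
}

# inbox_list DIR: print entry names oldest-first.
# Glob expansion is sorted, so equal-width timestamps sort chronologically.
inbox_list() {
    for f in "$1"/*.json; do
        if [ -e "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}

# inbox_pop DIR: print the oldest entry's content and delete it.
# Returns 1 when the inbox is empty.
inbox_pop() {
    oldest=$(inbox_list "$1" | head -n 1)
    [ -n "$oldest" ] || return 1
    cat "$1/$oldest" && rm -f "$1/$oldest"
}
```

One consequence worth noting under these assumptions: because `onCreate` and `onNewIntent` both write an entry, a consumer that pops entries should tolerate seeing the same payload more than once.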
