Commit 8530919

Merge pull request #22 from google-ai-edge:clean-cli-v3
PiperOrigin-RevId: 915209537
2 parents 159fa61 + f494357 commit 8530919

7 files changed

Lines changed: 177 additions & 72 deletions

.agents/skills/litert_cli/SKILL.md

Lines changed: 61 additions & 53 deletions
@@ -16,17 +16,21 @@ virtual environment and `litert-cli` is installed.
 
 ### 1. Check/Create Virtual Environment
 
-We highly recommend using **`uv`** (written in Rust) for extremely fast
-environment management and package installs.
+We highly recommend using **`uv`** (written in Rust) for extremely fast environment management and package installs.
 
-**Option A: Use UV (Recommended - Super Fast):** ```bash
+**Option A: Use UV (Recommended - Super Fast):**
+```bash
 
 # Create a virtual environment with Python 3.13.
 
 # We use --seed to pre-install pip, setuptools, and wheel inside the venv.
 
 # This is critical to allow our CLI dynamic dependency auto-installers (deps.py) to function.
 
+# When meeting dependency resolution error, try to set environment variable:
+
+# UV_INDEX_URL=https://pypi.org/simple
+
 uv venv --clear --python=3.13 --seed source .venv/bin/activate ```
 
 **Option B: Use Standard Pip/Venv:** ```bash
@@ -45,15 +49,15 @@ pip install --upgrade pip setuptools wheel ```
 
 Ensure `litert-cli` and any required optional extensions (extras) are installed:
 
-**Using UV:** ```bash
-
+**Using UV:**
+```bash
 # Install in editable mode from local source
-
 uv pip install -e .
 
 # Or install from local source with extras (e.g., convert, lm, compile)
 
-uv pip install -e ".[convert,lm,compile]" ```
+uv pip install -e ".[convert,lm,compile]"
+```
 
 **Using standard Pip:** ```bash
 
@@ -63,7 +67,8 @@ pip install -e .
 
 # Or install with extras
 
-pip install -e ".[convert,lm,compile]" ```
+pip install -e ".[convert,lm,compile]"
+```
 
 ## Core Commands
 
@@ -83,23 +88,21 @@ Once a model is registered, **all CLI commands** (including `run`, `benchmark`,
 file path! The CLI will automatically resolve it to the correct absolute cache
 file path on the fly.
 
-**Examples:** ```bash
-
+**Examples:**
+```bash
 # Run inference using the central alias directly
-
 litert run mobilenet --android --cpu
 
 # Benchmark using a specific sub-reference GPU file
 
 litert benchmark resnet18:gpu --android --gpu
 
 # Compile for NPU directly using the reference alias
-
 litert compile efficientnet --target sm8750
 
 # Delete from the central cache
-
-litert delete mobilenet ```
+litert delete mobilenet
+```
 
 ### 1. Run (Inference)
 
@@ -110,10 +113,9 @@ Run a tflite model locally on desktop or on a adb connected Android device.
 * To enable C++ verbose debug setup logs, set the environment variable: `export LITERT_VERBOSE=1`.
 * `--gpu`: Use desktop GPU if available.
 
-**Android Execution (CPU, GPU, or NPU):** `litert run <path_to_model> --android
---cpu` * `--gpu`: Run on Android GPU using OpenCL/WebGPU. * `--npu`: Run on
-Android device NPU. Supports **two execution paradigms** based on the input
-model:
+**Android Execution (CPU, GPU, or NPU):** `litert run <path_to_model> --android --cpu`
+* `--gpu`: Run on Android GPU using OpenCL/WebGPU.
+* `--npu`: Run on Android device NPU. Supports **two execution paradigms** based on the input model:
 
 **1. JIT (Just-In-Time) compilation mode:** Pass a standard, non-compiled
 `.tflite` model. The on-device LiteRT runtime will automatically download/invoke
@@ -126,24 +128,25 @@ loads the compiled binary block directly on the NPU. This avoids
 graph-compilation warmup overhead, leading to **sub-millisecond initialization
 latency**. `bash litert run resnet18_compiled_sm8750.tflite --android --npu`
 
-**Multi-Input Formats (Literals or Arrays):** `litert run model.tflite --desktop
---input inputs="[0.5, 0.5, 0.5]" --print-tensors`
+**Multi-Input Formats (Literals or Arrays):** `bash litert run model.tflite
+--desktop --input inputs="[0.5, 0.5, 0.5]" --print-tensors`
 
-**Multi-Input Formats (Files - .npy, .raw, .png):** `litert run model.tflite
---desktop --input inputs="test_input.npy" --print-tensors`
+**Multi-Input Formats (Files - .npy, .raw, .png):** `bash litert run
+model.tflite --desktop --input inputs="test_input.npy" --print-tensors`
 
 ### 2. Quantize
 
-**Standard Selection:** `litert quantize <path_to_model> --output <output_path>`
+**Standard Selection:** `bash litert quantize <path_to_model> --output
+<output_path>`
 
-**Dynamic Quantization (int8_dynamic):** `litert quantize model.tflite
+**Dynamic Quantization (int8_dynamic):** `bash litert quantize model.tflite
 --type int8_dynamic --output dynamic.tflite`
 
-**Static Quantization with Calibration Data:** `litert quantize
+**Static Quantization with Calibration Data:** `bash litert quantize
 model.tflite --type static --calibration-data "calib_data.py" --output
 static.tflite`
 
-**Recipe-based Quantization:** `litert quantize model.tflite --recipe
+**Recipe-based Quantization:** `bash litert quantize model.tflite --recipe
 "recipe.json" --output recipe.tflite`
 
 ### 3. Visualize
@@ -175,14 +178,14 @@ litert download <repo_id_or_url> --output <output_dir>
 * **HuggingFace Downloads (Default Central Cache)**: If `--output` is **omitted**, it downloads to `~/.cache/litert-cli/models/` and **automatically** creates `metadata.json` to catalog the model for CLI commands (like `litert list`).
 * **HuggingFace Downloads (Custom Folder)**: If `--output` is **provided**, it acts as a pure, clean download of only the model files. It **does not** generate a `metadata.json` file in the output folder.
 
-**Filter by File Type:** `bash litert download
-litert-community/MobileNet-v3-large --file "*.tflite" --output ./models`
-
-**With Custom Model Reference:**
+**Filter by File Type:**
 ```bash
-litert download litert-community/MobileNet-v3-large --model-ref my_model_ref
+litert download litert-community/MobileNet-v3-large --file "*.tflite" --output ./models
 ```
 
+**With Custom Model Reference:** `bash litert download
+litert-community/MobileNet-v3-large --model-ref my_model_ref`
+
 ### 5. Import
 
 Import a local file or directory into the centralized cache.
@@ -232,14 +235,13 @@ Interact with LLM generative models (like Qwen 1.5 or Gemma 4) using native `lit
 litert lm run <model_path_or_reference_id> < /dev/null
 ```
 
-**Run with model file path:** ```bash
-
+**Run with model file path:**
+```bash
 # Generative LLM models require the path to the compiled .litertlm model file or directory.
-
 # Append < /dev/null to exit immediately after printing the answer.
 
-litert lm run <model_dir>/model.litertlm --prompt "What is edge AI?" < /dev/null
-```
+litert lm run <model_dir>/model.litertlm --prompt "What is edge AI?" <
+/dev/null ```
 
 **Download and run with HuggingFace repo:** `bash litert lm run \
 --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
@@ -248,40 +250,44 @@ gemma-4-E2B-it.litertlm \ --prompt="What is the capital of France?" \ <
 
 ### 9. Benchmark
 
-Benchmark LiteRT models on different platforms (Android, Google Cloud, or Desktop).
-
-**On connected Android device via ADB (CPU, GPU, or NPU):** ```bash
+Benchmark LiteRT models on different platforms (Android, Google Cloud, or
+Desktop).
 
+**On connected Android device via ADB (CPU, GPU, or NPU):**
+```bash
 # Benchmark on CPU (Default)
-
 litert benchmark model.tflite --android --cpu
 
 # Benchmark on NPU (Requires compiling for NPU first)
 
 litert benchmark model.tflite --android --npu
 
 # Benchmark on GPU (using OpenCL/OpenGL delegates)
-
-litert benchmark model.tflite --android --gpu ```
+litert benchmark model.tflite --android --gpu
+```
 
 **On Macbook (CPU):** `bash litert benchmark my_model_ref --desktop --cpu`
 
 **On Google AI Edge Portal in Google Cloud (GCP):**
 
-> [!IMPORTANT] **Prerequisites for GCP Benchmarking:** 1. Joint Google AI Edge
+> [!IMPORTANT] **Prerequisites for GCP Benchmarking:** 1. Join Google AI Edge
 > Portal early access program at: https://ai.google.dev/edge/ai-edge-portal 2.
 > Authenticate your terminal session by running: `gcloud auth login` 3.
-> Configure the Project ID for the GCP Project. You can either: * Set the
-> environment variable: `export LITERT_GCP_PROJECT="your-gcp-project-id"` * Or
-> explicitly pass the `--gcp-project` option in the command.
+> Configure the GCP Project ID. You can either: * Set the environment variable:
+> `export LITERT_GCP_PROJECT="your-gcp-project-id"` * Or explicitly pass the
+> `--gcp-project` option in the command. 4. Configure the Google Cloud Storage
+> (GCS) Bucket for model uploading. The CLI resolves it via: * Explicit
+> `--gcp-bucket` CLI option. * `LITERT_GCP_BUCKET` environment variable. *
+> Default fallback: Automatically creates and uses
+> `gs://{gcp_project}-litert-models`.
 
 ```bash
-# Benchmark on GCP Pixel 7 CPU (using environment variable for Project ID)
-litert benchmark model.tflite --gcp --device "pixel 7"
-
-# Benchmark on GCP Pixel 7 CPU (specifying Project ID explicitly)
+# Benchmark on GCP Pixel 7 CPU (using default auto-created project bucket)
 litert benchmark model.tflite --gcp --device "pixel 7" --gcp-project "your-gcp-project-id"
 
+# Benchmark on GCP Pixel 7 CPU (specifying custom GCS bucket explicitly)
+litert benchmark model.tflite --gcp --device "pixel 7" --gcp-project "your-gcp-project-id" --gcp-bucket "your-custom-bucket"
+
 # Benchmark on multiple devices at once on GPU
 litert benchmark model.tflite --gcp --devices "pixel 7, sm-s931u1" --gpu --gcp-project "your-gcp-project-id"
 ```
@@ -290,8 +296,10 @@ litert benchmark model.tflite --gcp --devices "pixel 7, sm-s931u1" --gpu --gcp-p
 
 Apply Ahead-of-Time (AOT) offline compilation to a standard LiteRT (.tflite) model for specific edge SoC target NPUs (e.g., Qualcomm sm8550, MediaTek mt6989).
 
-**Basic target NPU compilation:** `bash litert compile my_model.tflite --target
-sm8750`
+**Basic target NPU compilation:**
+```bash
+litert compile my_model.tflite --target sm8750
+```
 
 **Compile for multiple NPU targets and export an Android AI Pack (for PODAI
 deployment):** `bash litert compile my_model.tflite --target sm8550 --target
@@ -345,7 +353,7 @@ use them directly in agent queries:
 ### Prompt 1: Dynamic Quantization & Android GPU Benchmarking
 
 > "Download the FP32 EfficientNet model `litert-community/efficientnet_b1` from
-> HuggingFace Hub. Quantize it to INT8 dynamic range (`--type int8_dynamic`),
+> HuggingFace. Quantize it to INT8 dynamic range (`--type int8_dynamic`),
 > then benchmark both the original FP32 model and the newly quantized INT8 model
 > on the GPU of my connected Android device. Compare the average latency and
 > report the throughput speedup."

README.md

Lines changed: 7 additions & 5 deletions
@@ -35,8 +35,10 @@ dependency resolution) or standard **`pip`** within a virtual environment.
 #### 1. Create and Activate Virtual Environment
 
 ```bash
-# Create a virtual environment with Python 3.13 in the current directory
-uv venv --clear --python=3.13
+# Create a virtual environment with Python 3.13 in the current directory.
+# When meeting dependency resolution error, try to set environment variable:
+# UV_INDEX_URL=https://pypi.org/simple
+uv venv --clear --python=3.13 --seed
 source .venv/bin/activate
 ```
 
@@ -69,10 +71,10 @@ Check more comprehensive usage examples under the `test_scripts/` directory
 
 ```bash
 # Run help command
-uv run litert --help
+litert --help
 
 # Download a LiteRT model
-uv run litert download litert-community/MobileNet-v3-large --file "*.tflite" --output mobilenet
+litert download litert-community/MobileNet-v3-large --file "*.tflite" --output mobilenet
 ```
 
 --------------------------------------------------------------------------------
@@ -122,7 +124,7 @@ litert download litert-community/MobileNet-v3-large --file "*.tflite" --output m
 * Linux (Ubuntu) with Python 3.13
 * macOS (Apple Silicon) with Python 3.13
 * **Android Devices**:
-* Xiaomi 15 Pro (Qualcomm Snapdragon 8750)
+* Qualcomm Snapdragon 8750
 
 --------------------------------------------------------------------------------
 

litert_cli/commands/benchmark/cli.py

Lines changed: 8 additions & 0 deletions
@@ -112,12 +112,18 @@
 type=str,
 help="GCP project ID for benchmarking (Only for GCP target).",
 )
+@click.option(
+"--gcp-bucket",
+type=str,
+help="GCS bucket name for uploading model (Only for GCP target).",
+)
 def benchmark_cmd(
 model: str,
 target: str,
 accelerator: str,
 devices: tuple[str, ...],
 gcp_project: str | None = None,
+gcp_bucket: str | None = None,
 ) -> None:
 """Benchmarks LiteRT models on different platforms.
 
@@ -127,6 +133,7 @@ def benchmark_cmd(
 accelerator: Accelerator to use (cpu, gpu, npu).
 devices: Target device model(s) (e.g., 'pixel 7').
 gcp_project: GCP project ID for benchmarking.
+gcp_bucket: GCS bucket name for uploading model.
 """
 from litert_cli.core import models as core_models
 
@@ -158,6 +165,7 @@ def benchmark_cmd(
 accelerator,
 devices,
 gcp_project,
+gcp_bucket,
 )
 else:
 click.secho(f"Target '{target}' is not yet supported.", fg="red")
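
Note on bucket resolution: the SKILL.md change in this commit says the CLI resolves the GCS bucket from the explicit `--gcp-bucket` option first, then the `LITERT_GCP_BUCKET` environment variable, then a default of `gs://{gcp_project}-litert-models`. The code that does this is not part of the diff shown here, so the following is only a minimal sketch of that precedence order. The function name `resolve_gcp_bucket` and its `env` parameter are hypothetical, and the sketch does not create the bucket or normalize a `gs://` prefix, both of which the real CLI may handle differently.

```python
import os
from typing import Mapping, Optional


def resolve_gcp_bucket(
    gcp_project: str,
    gcp_bucket: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: choose a GCS bucket by the documented precedence."""
    # Allow a custom mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env

    # 1. An explicit --gcp-bucket value wins.
    if gcp_bucket:
        return gcp_bucket

    # 2. Otherwise fall back to the LITERT_GCP_BUCKET environment variable.
    env_bucket = env.get("LITERT_GCP_BUCKET")
    if env_bucket:
        return env_bucket

    # 3. Otherwise use the documented per-project default name.
    return f"gs://{gcp_project}-litert-models"
```

Called as `resolve_gcp_bucket("your-gcp-project-id", env={})`, the sketch falls through to the third branch and returns the per-project default.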
