@@ -16,17 +16,21 @@ virtual environment and `litert-cli` is installed.

### 1. Check/Create Virtual Environment

- We highly recommend using **`uv`** (written in Rust) for extremely fast
- environment management and package installs.
+ We highly recommend using **`uv`** (written in Rust) for extremely fast environment management and package installs.

- **Option A: Use UV (Recommended - Super Fast):** ```bash
+ **Option A: Use UV (Recommended - Super Fast):**
+ ```bash

# Create a virtual environment with Python 3.13.

# We use --seed to pre-install pip, setuptools, and wheel inside the venv.

# This is critical to allow our CLI dynamic dependency auto-installers (deps.py) to function.

+ # If dependency resolution fails, try setting this environment variable:
+
+ # UV_INDEX_URL=https://pypi.org/simple
+
uv venv --clear --python=3.13 --seed
source .venv/bin/activate
```
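The comment in the block above suggests setting `UV_INDEX_URL` when dependency resolution fails. A minimal sketch of scoping that variable to a single command instead of exporting it for the whole session; the helper name `with_pypi_index` is hypothetical and is not part of `uv` or `litert-cli`:

```bash
# Hypothetical helper (not part of uv or litert-cli): run one command with
# UV_INDEX_URL pointing at the public PyPI index, without exporting it.
with_pypi_index() {
  UV_INDEX_URL="https://pypi.org/simple" "$@"
}

# Example usage (assumes uv is installed):
#   with_pypi_index uv venv --clear --python=3.13 --seed
#   with_pypi_index uv pip install -e .
```

Scoping the variable per command keeps other tools in the same shell on whatever index they were already configured to use.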

**Option B: Use Standard Pip/Venv:** ```bash
@@ -45,15 +49,15 @@ pip install --upgrade pip setuptools wheel ```

Ensure `litert-cli` and any required optional extensions (extras) are installed:

- **Using UV:** ```bash
-
+ **Using UV:**
+ ```bash
# Install in editable mode from local source
-
uv pip install -e .

# Or install from local source with extras (e.g., convert, lm, compile)

- uv pip install -e ".[convert,lm,compile]" ```
+ uv pip install -e ".[convert,lm,compile]"
+ ```

**Using standard Pip:** ```bash

@@ -63,7 +67,8 @@ pip install -e .

# Or install with extras

- pip install -e ".[convert,lm,compile]" ```
+ pip install -e ".[convert,lm,compile]"
+ ```

## Core Commands

@@ -83,23 +88,21 @@ Once a model is registered, **all CLI commands** (including `run`, `benchmark`,
file path! The CLI will automatically resolve it to the correct absolute cache
file path on the fly.

- **Examples:** ```bash
-
+ **Examples:**
+ ```bash
# Run inference using the central alias directly
-
litert run mobilenet --android --cpu

# Benchmark using a specific sub-reference GPU file

litert benchmark resnet18:gpu --android --gpu

# Compile for NPU directly using the reference alias
-
litert compile efficientnet --target sm8750

# Delete from the central cache
-
- litert delete mobilenet ```
+ litert delete mobilenet
+ ```

### 1. Run (Inference)

@@ -110,10 +113,9 @@ Run a tflite model locally on desktop or on an ADB-connected Android device.
* To enable C++ verbose debug setup logs, set the environment variable: `export LITERT_VERBOSE=1`.
* `--gpu`: Use desktop GPU if available.

- **Android Execution (CPU, GPU, or NPU):** `litert run <path_to_model> --android
- --cpu` * `--gpu`: Run on Android GPU using OpenCL/WebGPU. * `--npu`: Run on
- Android device NPU. Supports **two execution paradigms** based on the input
- model:
+ **Android Execution (CPU, GPU, or NPU):** `litert run <path_to_model> --android --cpu`
+ * `--gpu`: Run on Android GPU using OpenCL/WebGPU.
+ * `--npu`: Run on Android device NPU. Supports **two execution paradigms** based on the input model:

**1. JIT (Just-In-Time) compilation mode:** Pass a standard, non-compiled
`.tflite` model. The on-device LiteRT runtime will automatically download/invoke
@@ -126,24 +128,25 @@ loads the compiled binary block directly on the NPU. This avoids
graph-compilation warmup overhead, leading to **sub-millisecond initialization
latency**. `bash litert run resnet18_compiled_sm8750.tflite --android --npu`

- **Multi-Input Formats (Literals or Arrays):** `litert run model.tflite --desktop
- --input inputs="[0.5, 0.5, 0.5]" --print-tensors`
+ **Multi-Input Formats (Literals or Arrays):** `bash litert run model.tflite
+ --desktop --input inputs="[0.5, 0.5, 0.5]" --print-tensors`

- **Multi-Input Formats (Files - .npy, .raw, .png):** `litert run model.tflite
- --desktop --input inputs="test_input.npy" --print-tensors`
+ **Multi-Input Formats (Files - .npy, .raw, .png):** `bash litert run
+ model.tflite --desktop --input inputs="test_input.npy" --print-tensors`

### 2. Quantize

- **Standard Selection:** `litert quantize <path_to_model> --output <output_path>`
+ **Standard Selection:** `bash litert quantize <path_to_model> --output
+ <output_path>`

- **Dynamic Quantization (int8_dynamic):** `litert quantize model.tflite
+ **Dynamic Quantization (int8_dynamic):** `bash litert quantize model.tflite
--type int8_dynamic --output dynamic.tflite`

- **Static Quantization with Calibration Data:** `litert quantize
+ **Static Quantization with Calibration Data:** `bash litert quantize
model.tflite --type static --calibration-data "calib_data.py" --output
static.tflite`

- **Recipe-based Quantization:** `litert quantize model.tflite --recipe
+ **Recipe-based Quantization:** `bash litert quantize model.tflite --recipe
"recipe.json" --output recipe.tflite`

### 3. Visualize
@@ -175,14 +178,14 @@ litert download <repo_id_or_url> --output <output_dir>
* **HuggingFace Downloads (Default Central Cache)**: If `--output` is **omitted**, it downloads to `~/.cache/litert-cli/models/` and **automatically** creates `metadata.json` to catalog the model for CLI commands (like `litert list`).
* **HuggingFace Downloads (Custom Folder)**: If `--output` is **provided**, it acts as a pure, clean download of only the model files. It **does not** generate a `metadata.json` file in the output folder.

- **Filter by File Type:** `bash litert download
- litert-community/MobileNet-v3-large --file "*.tflite" --output ./models`
-
- **With Custom Model Reference:**
+ **Filter by File Type:**
```bash
- litert download litert-community/MobileNet-v3-large --model-ref my_model_ref
+ litert download litert-community/MobileNet-v3-large --file "*.tflite" --output ./models
```

+ **With Custom Model Reference:** `bash litert download
+ litert-community/MobileNet-v3-large --model-ref my_model_ref`
+
### 5. Import

Import a local file or directory into the centralized cache.
@@ -232,14 +235,13 @@ Interact with LLM generative models (like Qwen 1.5 or Gemma 4) using native `lit
litert lm run <model_path_or_reference_id> < /dev/null
```

- **Run with model file path:** ```bash
-
+ **Run with model file path:**
+ ```bash
# Generative LLM models require the path to the compiled .litertlm model file or directory.
-
# Append < /dev/null to exit immediately after printing the answer.

- litert lm run <model_dir>/model.litertlm --prompt "What is edge AI?" < /dev/null
- ```
+ litert lm run <model_dir>/model.litertlm --prompt "What is edge AI?" < /dev/null
+ ```
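Redirecting stdin from `/dev/null` works because every read from it returns end-of-file at once. The sketch below uses a hypothetical `chat_loop` function as a stand-in for an interactive prompt loop (it is not the `litert lm run` implementation, whose internals are assumed here) to show why the process exits instead of waiting for input:

```bash
# Stand-in for an interactive loop: counts each line read from stdin and
# returns as soon as stdin reaches end-of-file.
chat_loop() {
  turns=0
  while IFS= read -r line; do
    turns=$((turns + 1))
  done
  echo "turns=$turns"
}

chat_loop < /dev/null            # prints "turns=0": EOF arrives immediately
printf 'hi\nbye\n' | chat_loop   # prints "turns=2": one turn per input line
```

The same redirection is handy in scripts and CI jobs, where no terminal is attached and a waiting prompt would otherwise hang the run.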

**Download and run with HuggingFace repo:** `bash litert lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
@@ -248,40 +250,44 @@ gemma-4-E2B-it.litertlm \ --prompt="What is the capital of France?" \ <

### 9. Benchmark

- Benchmark LiteRT models on different platforms (Android, Google Cloud, or Desktop).
-
- **On connected Android device via ADB (CPU, GPU, or NPU):** ```bash
+ Benchmark LiteRT models on different platforms (Android, Google Cloud, or
+ Desktop).

+ **On connected Android device via ADB (CPU, GPU, or NPU):**
+ ```bash
# Benchmark on CPU (Default)
-
litert benchmark model.tflite --android --cpu

# Benchmark on NPU (Requires compiling for NPU first)

litert benchmark model.tflite --android --npu

# Benchmark on GPU (using OpenCL/OpenGL delegates)
-
- litert benchmark model.tflite --android --gpu ```
+ litert benchmark model.tflite --android --gpu
+ ```
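The three Android invocations above differ only in the backend flag, so they can be driven from one loop. A minimal sketch, assuming the flags shown in this section; the helper names `run_or_echo` and `benchmark_android_backends` and the `DRY_RUN` variable are hypothetical and exist only for this illustration:

```bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Benchmark one model on each Android backend in turn.
# The NPU run assumes the model was already compiled for the NPU.
benchmark_android_backends() {
  model="$1"
  for backend in cpu gpu npu; do
    run_or_echo litert benchmark "$model" --android "--$backend"
  done
}
```

Running `DRY_RUN=1 benchmark_android_backends model.tflite` prints the three commands without touching a device, which is a cheap way to check the flags before a long benchmark session.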

**On MacBook (CPU):** `bash litert benchmark my_model_ref --desktop --cpu`

**On Google AI Edge Portal in Google Cloud (GCP):**

- > [!IMPORTANT] **Prerequisites for GCP Benchmarking:** 1. Joint Google AI Edge
+ > [!IMPORTANT] **Prerequisites for GCP Benchmarking:** 1. Join Google AI Edge
> Portal early access program at: https://ai.google.dev/edge/ai-edge-portal 2.
> Authenticate your terminal session by running: `gcloud auth login` 3.
- > Configure the Project ID for the GCP Project. You can either: * Set the
- > environment variable: `export LITERT_GCP_PROJECT="your-gcp-project-id"` * Or
- > explicitly pass the `--gcp-project` option in the command.
+ > Configure the GCP Project ID. You can either: * Set the environment variable:
+ > `export LITERT_GCP_PROJECT="your-gcp-project-id"` * Or explicitly pass the
+ > `--gcp-project` option in the command. 4. Configure the Google Cloud Storage
+ > (GCS) Bucket for model uploading. The CLI resolves it via: * Explicit
+ > `--gcp-bucket` CLI option. * `LITERT_GCP_BUCKET` environment variable. *
+ > Default fallback: Automatically creates and uses
+ > `gs://{gcp_project}-litert-models`.

```bash
- # Benchmark on GCP Pixel 7 CPU (using environment variable for Project ID)
- litert benchmark model.tflite --gcp --device "pixel 7"
-
- # Benchmark on GCP Pixel 7 CPU (specifying Project ID explicitly)
+ # Benchmark on GCP Pixel 7 CPU (using default auto-created project bucket)
litert benchmark model.tflite --gcp --device "pixel 7" --gcp-project "your-gcp-project-id"

+ # Benchmark on GCP Pixel 7 CPU (specifying custom GCS bucket explicitly)
+ litert benchmark model.tflite --gcp --device "pixel 7" --gcp-project "your-gcp-project-id" --gcp-bucket "your-custom-bucket"
+
# Benchmark on multiple devices at once on GPU
litert benchmark model.tflite --gcp --devices "pixel 7, sm-s931u1" --gpu --gcp-project "your-gcp-project-id"
```
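The bucket lookup order listed in the prerequisites (explicit `--gcp-bucket`, then `LITERT_GCP_BUCKET`, then the `gs://{gcp_project}-litert-models` fallback) can be sketched as a small shell function. This is an illustration of the documented precedence only, not the CLI's actual code; the function name `resolve_gcp_bucket` and its argument order are assumptions:

```bash
# Sketch of the documented precedence (hypothetical function, not CLI code):
#   1. value passed via --gcp-bucket
#   2. LITERT_GCP_BUCKET environment variable
#   3. default gs://{gcp_project}-litert-models
resolve_gcp_bucket() {
  cli_bucket="$1"    # value given to --gcp-bucket, or empty if omitted
  gcp_project="$2"   # already-resolved GCP project ID
  if [ -n "$cli_bucket" ]; then
    printf '%s\n' "$cli_bucket"
  elif [ -n "${LITERT_GCP_BUCKET:-}" ]; then
    printf '%s\n' "$LITERT_GCP_BUCKET"
  else
    printf 'gs://%s-litert-models\n' "$gcp_project"
  fi
}
```

For example, with no option and no environment variable set, `resolve_gcp_bucket "" your-gcp-project-id` yields `gs://your-gcp-project-id-litert-models`. The sketch passes explicit values through unchanged; whether the real CLI expects a `gs://` prefix on custom bucket names is not stated in this section.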
@@ -290,8 +296,10 @@ litert benchmark model.tflite --gcp --devices "pixel 7, sm-s931u1" --gpu --gcp-p

Apply Ahead-of-Time (AOT) offline compilation to a standard LiteRT (.tflite) model for specific edge SoC target NPUs (e.g., Qualcomm sm8550, MediaTek mt6989).

- **Basic target NPU compilation:** `bash litert compile my_model.tflite --target
- sm8750`
+ **Basic target NPU compilation:**
+ ```bash
+ litert compile my_model.tflite --target sm8750
+ ```
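When the same model has to be compiled separately for several SoCs, the basic command can be generated once per target. A minimal sketch that only prints the commands for review; the function name `compile_commands` is hypothetical, and using `mt6989` as a `--target` value is an assumption based on the SoC names mentioned in this section:

```bash
# Print one basic compile command per SoC target (nothing is executed here).
compile_commands() {
  model="$1"
  shift
  for target in "$@"; do
    printf 'litert compile %s --target %s\n' "$model" "$target"
  done
}

compile_commands my_model.tflite sm8550 sm8750 mt6989
```

Piping the output to `sh` would run the commands one after another. How the CLI names each compiled output is not covered in this section, so check that separate runs do not overwrite one another before relying on this.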

**Compile for multiple NPU targets and export an Android AI Pack (for PODAI
deployment):** `bash litert compile my_model.tflite --target sm8550 --target
@@ -345,7 +353,7 @@ use them directly in agent queries:
### Prompt 1: Dynamic Quantization & Android GPU Benchmarking

> "Download the FP32 EfficientNet model `litert-community/efficientnet_b1` from
- > HuggingFace Hub . Quantize it to INT8 dynamic range (`--type int8_dynamic`),
+ > HuggingFace. Quantize it to INT8 dynamic range (`--type int8_dynamic`),
> then benchmark both the original FP32 model and the newly quantized INT8 model
> on the GPU of my connected Android device. Compare the average latency and
> report the throughput speedup."