README.md
+4 lines changed: 4 additions & 0 deletions
@@ -383,6 +383,7 @@ The MCP server is configured by environment variables only — pass them through
 |`MAX_FILE_SIZE`|`--max-file-size`|`104857600` (100MB) | Maximum file size in bytes |
 |`CHUNK_MIN_LENGTH`|`--chunk-min-length`|`50`| Minimum chunk length in characters (1–10000) |
 |`RAG_DEVICE`| — |`cpu`| Execution device. Passed straight to ONNX Runtime. See the [Transformers.js device source code](https://github.com/huggingface/transformers.js/blob/main/packages/transformers/src/utils/devices.js) for the live list of supported backend names. If initialization fails, the server throws an error. |
+|`RAG_DTYPE`| — |`fp32`| Embedding quantization dtype. Opt-in and passed straight through; accepts any dtype the chosen model provides (`fp32`, `fp16`, `q8`, `int8`, …). If the model lacks the requested variant, the server throws an error naming the dtypes it does provide. Changing `RAG_DEVICE`/`RAG_DTYPE` changes the embedding space — re-ingest existing data. |
@@ -607,6 +608,9 @@ Yes, but you must delete your database and re-ingest all documents. Different mo
 **GPU acceleration?**
 Opt-in via `RAG_DEVICE`. Devices are passed straight to ONNX Runtime. GPU support is highly dependent on your system, Node.js version, and the underlying ONNX backend. See the [Transformers.js device source code](https://github.com/huggingface/transformers.js/blob/main/packages/transformers/src/utils/devices.js) for the live list of supported backend names. If the requested device fails to initialize, the server throws an error — set `RAG_DEVICE=cpu` to revert.
 
+**Can I change the embedding precision (dtype)?**
+Opt-in via `RAG_DTYPE` (default `fp32`); accepted values are in the env-var table above. A recognized dtype the model lacks errors and lists the available ones; an unrecognized value (a typo) silently falls back to `fp32`. Changing `RAG_DEVICE`/`RAG_DTYPE` changes the embedding space — delete `DB_PATH` and re-ingest.
+
 **Multi-user support?**
 No. Designed for single-user, local access. Multi-user would require authentication/access control.
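
The new FAQ entry in the README hunk above distinguishes two cases for `RAG_DTYPE`: a recognized dtype that the model does not ship raises an error listing the available ones, while an unrecognized value (such as a typo) silently falls back to `fp32`. The sketch below shows one way that decision could look. It is not taken from the project's source: `RECOGNIZED_DTYPES`, `resolveDtype`, and the `availableDtypes` parameter are hypothetical names, and the recognized set is an assumption limited to the four values the README spells out.

```typescript
// Hypothetical sketch of the RAG_DTYPE behavior described in the README diff.
// Assumption: the recognized set is just the four dtypes the README names.
const RECOGNIZED_DTYPES = new Set(["fp32", "fp16", "q8", "int8"]);
const DEFAULT_DTYPE = "fp32";

// Resolve the dtype to load from the raw env value and the list of dtypes
// the chosen model provides (hypothetical input, e.g. read from model files).
function resolveDtype(raw: string | undefined, availableDtypes: string[]): string {
  const requested = raw?.trim();

  // Unset or unrecognized (for example a typo): silently use the default.
  if (!requested || !RECOGNIZED_DTYPES.has(requested)) {
    return DEFAULT_DTYPE;
  }

  // Recognized, but the model has no such variant: fail and list what exists.
  if (!availableDtypes.includes(requested)) {
    throw new Error(
      `Model has no "${requested}" variant. Available dtypes: ${availableDtypes.join(", ")}`
    );
  }

  return requested;
}
```

With a model that provides `fp32` and `q8`, `resolveDtype("q8", ["fp32", "q8"])` returns `"q8"`, `resolveDtype("fp61", ["fp32", "q8"])` returns `"fp32"`, and `resolveDtype("fp16", ["fp32", "q8"])` throws. The silent fallback means a typo goes unnoticed at startup, which may be worth flagging in review.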
server.json
+9 -2 lines changed: 9 additions & 2 deletions
@@ -8,13 +8,13 @@
     "url": "https://github.com/shinpr/mcp-local-rag",
     "source": "github"
   },
-  "version": "0.15.0",
+  "version": "0.15.1",
   "packages": [
     {
       "registryType": "npm",
       "registryBaseUrl": "https://registry.npmjs.org",
       "identifier": "mcp-local-rag",
-      "version": "0.15.0",
+      "version": "0.15.1",
       "transport": {
         "type": "stdio"
       },
@@ -96,6 +96,13 @@
           "format": "string",
           "isSecret": false
         },
+        {
+          "name": "RAG_DTYPE",
+          "description": "Embedding quantization dtype for the embedder (defaults to fp32). Opt-in and pass-through; accepts any dtype the chosen model provides (fp32, fp16, q8, int8, ...). If the model has no variant for the requested dtype, the server throws an error. Changing this changes the embedding space — re-ingest existing data.",
+          "isRequired": false,
+          "format": "string",
+          "isSecret": false
+        },
         {
           "name": "RAG_HYBRID_WEIGHT",
           "description": "Keyword boost factor for hybrid search (0.0-1.0, defaults to 0.6). 0 means semantic similarity only; higher values increase the keyword-match contribution to the final score.",
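
The `server.json` hunk above declares `RAG_DTYPE` as an optional, non-secret string variable. As a rough illustration of what that entry means for a client, the sketch below types one entry using only the fields visible in the diff and checks which required variables are missing from an environment map. `EnvVarSpec` and `missingRequired` are hypothetical names, not part of the project or of any registry tooling, and the `description` in the sample entry is abbreviated.

```typescript
// Shape of one environment-variable entry, using only the fields that
// appear in the server.json diff. The interface name is hypothetical.
interface EnvVarSpec {
  name: string;
  description: string;
  isRequired: boolean;
  format: string;
  isSecret: boolean;
}

// Hypothetical helper: names of required variables that are unset or empty.
function missingRequired(
  specs: EnvVarSpec[],
  env: Record<string, string | undefined>
): string[] {
  return specs
    .filter((spec) => spec.isRequired && !env[spec.name])
    .map((spec) => spec.name);
}

// The entry added in this change (description abbreviated for the sketch).
const ragDtype: EnvVarSpec = {
  name: "RAG_DTYPE",
  description: "Embedding quantization dtype for the embedder (defaults to fp32).",
  isRequired: false,
  format: "string",
  isSecret: false,
};
```

Because `isRequired` is `false`, `missingRequired([ragDtype], {})` returns an empty array: a client that never sets `RAG_DTYPE` is still valid and gets the `fp32` default.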