Commit 5955ea0

chore: bump version to 1.8.0 and update docs (#145)
- CHANGELOG.md: v1.8.0 entries for all 8 issue fixes (#136-#142)
- docs/upgrades.md: v1.8.0 upgrade notes with migration guidance
- docs/tool-reference.md: add enumerate_ssh_keys tool documentation
- docs/cli-reference.md: add syslog-analysis template, update coverage
- docs/troubleshooting.md: ReAct fallback, quant-aware param estimation
- docs/live-mode-operations.md: KV-cache fix, EP override detection
- docs/getting-started.md: quantization-aware tier description
- docs/RELEASE_PLAN.md: update roadmap through v1.8.0
- Cargo.toml: version 1.7.1 -> 1.8.0
1 parent b8921f6 commit 5955ea0

10 files changed

Lines changed: 89 additions & 25 deletions

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -18,6 +18,27 @@ The format is inspired by Keep a Changelog and this project follows Semantic Ver
 
 - (none yet)
 
+## 1.8.0 - 2026-04-05
+
+### Added
+
+- New `syslog-analysis` investigation template matching keywords: log, syslog, journal, event, audit. Runs `read_syslog` → `audit_account_changes` → `inspect_persistence_locations` (#141).
+- New `enumerate_ssh_keys` tool: cross-platform SSH key enumeration scanning `.ssh` directories for authorized_keys, private keys, and public keys (#141).
+- New `--task-template syslog-summary` maps to the `syslog-analysis` investigation template (#141).
+
+### Changed
+
+- **Severity calibration**: raised listener thresholds (Info <50, Low 50–149, Medium 150–249, High ≥250), lowered account severity (1 account → Low, 3–4 → Medium, ≥5 → High), raised persistence thresholds (Low <3, Medium 3–7, High ≥8). Normal desktops no longer trigger spurious high-severity findings (#139).
+- **Findings detail**: finding titles now include specifics (account names, persistence entry text, and SSH directory info) instead of bare counts (#140).
+- **Parameter estimation**: quantization-aware divisor replaces the hardcoded 2.2. Q4 models use 0.55 bytes/param, Q8 uses 1.1, FP16 uses 2.2, FP32 uses 4.4. Detected from model filename conventions (#138).
+- **Template tool ordering**: `file-integrity-check` now leads with `hash_binary` (was `audit_account_changes`), `ssh-key-investigation` now leads with `enumerate_ssh_keys` (#141).
+
+### Fixed
+
+- **KV-cache attention mask**: prefill attention length now accounts for forced cache padding when the model lacks a `use_cache` toggle, preventing shape broadcast errors on models like Qwen2.5 and Llama 3.2 (#136).
+- **ReAct garbage output**: when the model produces a `<final>` tag at step 0 without calling any tools, the agent falls back to template-driven execution. Quality guard now detects hallucinated `<call>` tags and `[observation]` markers inside final answers and replaces them with a deterministic summary (#137).
+- **EP reporting**: `detect_execution_provider()` now recognises DirectML and CUDA backend overrides instead of always reporting CPU (#142).
+
 ## 1.7.1 - 2026-04-05
 
 ### Changed
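
Reviewer note: to make the quantization-aware estimate in the 1.8.0 entry concrete, here is a minimal Rust sketch. Only the four bytes-per-parameter divisors come from the changelog. The `Quant` enum, `detect_quant`, `estimate_params_b`, and the filename substrings they check are hypothetical, because the entry says only that the level is detected "from model filename conventions". Falling back to the FP16 divisor (the old hardcoded 2.2) for unrecognised names is also an assumption.

```rust
/// Quantization levels named in the v1.8.0 changelog entry.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Quant {
    Q4,
    Q8,
    Fp16,
    Fp32,
}

impl Quant {
    /// Bytes-per-parameter divisors quoted in the changelog.
    pub fn bytes_per_param(self) -> f64 {
        match self {
            Quant::Q4 => 0.55,
            Quant::Q8 => 1.1,
            Quant::Fp16 => 2.2,
            Quant::Fp32 => 4.4,
        }
    }
}

/// Hypothetical filename heuristic: the substrings checked here are an
/// assumption, not the project's actual detection rules.
pub fn detect_quant(file_name: &str) -> Quant {
    let name = file_name.to_ascii_lowercase();
    if name.contains("q4") || name.contains("int4") {
        Quant::Q4
    } else if name.contains("q8") || name.contains("int8") {
        Quant::Q8
    } else if name.contains("fp32") || name.contains("f32") {
        Quant::Fp32
    } else {
        // Assumed default: the previous hardcoded divisor of 2.2.
        Quant::Fp16
    }
}

/// Estimated parameter count in billions for a model file of the given size.
pub fn estimate_params_b(file_size_bytes: u64, quant: Quant) -> f64 {
    file_size_bytes as f64 / quant.bytes_per_param() / 1e9
}
```

Under these assumptions, `estimate_params_b(750_000_000, detect_quant("model-q4.onnx"))` gives roughly 1.36, against roughly 0.34 with the old fixed divisor, which lines up with the 750 MB example added to docs/getting-started.md.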

Cargo.lock

Lines changed: 5 additions & 5 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ resolver = "2"
 
 [workspace.package]
 edition = "2021"
-version = "1.7.1"
+version = "1.8.0"
 license = "MIT"
 
 [workspace.dependencies]

docs/RELEASE_PLAN.md

Lines changed: 16 additions & 17 deletions
@@ -91,31 +91,30 @@ Release should be blocked when:
   Define `ExecutionProviderBackend` trait, provider registry, provider-agnostic config, extract Vitis/CPU backends, provider-aware doctor, CLI `--backend` flag, and multi-backend test harness.
 - `v1.4.0` Concrete Hardware Backends (milestone #17, tracking: #55):
   DirectML (Windows GPU), CoreML (macOS/Apple Silicon), CUDA/TensorRT (NVIDIA), QNN (Qualcomm Hexagon), non-ONNX formats (GGUF/SafeTensors), and quantization-aware loading.
+- `v1.5.0` Concrete Hardware Backends (completed).
+- `v1.6.0` Agentic Investigation Engine (completed): ReAct agent loop, task-aware LLM synthesis, temperature-scaled sampling, EP-aware debug logs, session caching, KV-cache prefix reuse.
+- `v1.7.0` Live Evaluation Hardening (completed): per-tool timing, LLM reasoning capture, evidence-derived confidence, task-specific synthesis, expanded privilege/persistence checks, tokenizer discovery.
+- `v1.7.1` Dependency Bumps (completed): toml 1.1, thiserror 2.0, sha2 0.11, CI actions v6–v8.
+- `v1.8.0` Live Evaluation Fixes (completed): KV-cache attention mask fix, ReAct garbage fallback, quantization-aware param estimation, severity recalibration, findings detail, template/tool fixes, EP reporting, syslog-analysis template, enumerate_ssh_keys tool.
 
-## Immediate Next Steps for v1.0.0
+## Immediate Next Steps
 
 Use this runbook to execute the active next milestone end-to-end.
 
 1. Create a tracking issue from the Release Checklist template.
-2. Apply labels `release`, `milestone:v1.0.0`, and priority labels as needed.
-3. Run milestone bootstrap workflow:
-   - Workflow: `Milestone Bootstrap`
-   - Inputs:
-     - `seed_roadmap`: `true` (upserts canonical milestones for the active roadmap set)
-     - `title`: `v1.0.0`
-     - `description`: `Local API and Web UI MVP: local server endpoints, security baseline, durable local data model, and initial triage UI.`
-     - `due_date`: optional (`YYYY-MM-DD`)
-4. Verify quality gates locally:
+2. Apply labels `release` and the target milestone label.
+3. Verify quality gates locally:
    - `cargo check`
    - `cargo test --workspace`
+   - `cargo clippy --all-targets -- -D warnings`
    - `cargo check -p inference_bridge --features vitis`
-5. Verify GitHub Actions CI is green on latest `main`.
-6. Tag and publish:
-   - `git tag -a v1.0.0 -m "Release v1.0.0"`
-   - `git push origin v1.0.0`
-7. Confirm `Release` workflow completed and assets are attached.
-8. Close the milestone and open a follow-on milestone.
-9. Open planning issue for the next milestone scope.
+4. Verify GitHub Actions CI is green on latest `main`.
+5. Tag and publish:
+   - `git tag -a vX.Y.Z -m "Release vX.Y.Z"`
+   - `git push origin vX.Y.Z`
+6. Confirm `Release` workflow completed and assets are attached.
+7. Close the milestone and open a follow-on milestone.
+8. Open planning issue for the next milestone scope.
 
 ## Labels and Milestones

docs/cli-reference.md

Lines changed: 3 additions & 2 deletions
@@ -143,7 +143,7 @@ Behavior:
 
 `--list-tools` output includes tool names, descriptions, and JSON argument schemas.
 
-Current built-in coverage includes log tailing, listener inventory, file hashing, privilege vectors, persistence inventory, account-role snapshots, process-network correlation, and baseline capture for drift workflows.
+Current built-in coverage includes log tailing, listener inventory, file hashing, privilege vectors, persistence inventory, account-role snapshots, process-network correlation, SSH key enumeration, and baseline capture for drift workflows.
 
 Coverage tool argument highlights:
 
@@ -465,11 +465,12 @@ When a free-text `--task` is provided, the agent resolves a declarative investig
 Built-in investigation templates:
 
 - **broad-host-triage**: default fallback. Runs all host-level tools.
-- **ssh-key-investigation**: SSH key and account audit focus.
+- **ssh-key-investigation**: SSH key enumeration and account audit focus.
 - **persistence-analysis**: autorun and persistence mechanism checks.
 - **network-exposure-audit**: listener and network binding analysis.
 - **privilege-escalation-check**: privilege escalation indicator checks.
 - **file-integrity-check**: hash verification and file integrity analysis.
+- **syslog-analysis**: log review, account audit, and persistence checks. Matches keywords: log, syslog, journal, event, audit.
 
 List investigation templates via `--list-task-templates`.
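
Reviewer note: the keyword matching described for `syslog-analysis` could look roughly like the Rust sketch below. The five keywords and the `broad-host-triage` fallback come from the docs above; the function name `resolve_template` and the whole-word, case-insensitive comparison are assumptions, and the other templates are left out because their keywords are not listed in this diff.

```rust
/// Keywords the docs list for the `syslog-analysis` template.
const SYSLOG_KEYWORDS: [&str; 5] = ["log", "syslog", "journal", "event", "audit"];

/// Hypothetical resolver: returns `syslog-analysis` when any whole word of the
/// task text equals one of the listed keywords, otherwise the documented
/// default fallback. Only one template is modelled here.
pub fn resolve_template(task: &str) -> &'static str {
    let lowered = task.to_lowercase();
    let matched = lowered
        // Split on anything that is not a letter or digit to get whole words.
        .split(|c: char| !c.is_alphanumeric())
        .any(|word| SYSLOG_KEYWORDS.iter().any(|k| *k == word));
    if matched {
        "syslog-analysis"
    } else {
        "broad-host-triage"
    }
}
```

Whole-word matching is a deliberate choice in this sketch: a substring test would also fire on words such as "login" or "catalog". Whether the real resolver matches words or substrings is not stated in the docs.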

docs/getting-started.md

Lines changed: 2 additions & 0 deletions
@@ -141,6 +141,8 @@ When running in live mode, WraithRun automatically probes the loaded model to cl
 - **Moderate**: medium models. Agent uses a ReAct (Reason + Act) loop, iteratively choosing tools based on observations, then synthesizes findings via LLM.
 - **Strong**: large models (≥10B params and ≤50ms latency). Agent uses a full ReAct loop with the complete evidence window for deep iterative reasoning and synthesis.
 
+Since v1.8.0, parameter estimation is quantization-aware: Q4 models use 0.55 bytes/param, Q8 uses 1.1, FP16 uses 2.2, and FP32 uses 4.4. Q4 models are therefore classified more accurately: a 750 MB Q4 file now estimates ~1.4B parameters instead of ~0.3B.
+
 Override automatic classification when you know your model's capability:
 
```powershell

docs/live-mode-operations.md

Lines changed: 4 additions & 0 deletions
@@ -128,6 +128,10 @@ WraithRun caches the ONNX session and tokenizer across investigation steps withi
 
 The agent also tracks prompt prefix reuse across steps. When consecutive prompts share a common prefix (e.g., system prompt + prior context), the prefix hit/miss ratio is logged for observability. Full KV-state reuse is scaffolded for a future release.
 
+Since v1.8.0, the prefill attention mask accounts for forced cache padding on models that lack a `use_cache` branch toggle (#136). Previously, models like Qwen2.5 and Llama 3.2 could crash with a shape broadcast error during prefill because the attention mask length did not include the initial cache dimension.
+
+Also since v1.8.0, execution provider reporting detects DirectML and CUDA backend overrides (#142), so `model_capability.execution_provider` in JSON output reflects the active backend instead of always showing `CPUExecutionProvider`.
+
 Temperature controls affect live inference behavior:
 
 - `--temperature 0` (or omit): greedy decoding, the fastest option, with fully deterministic output.

docs/tool-reference.md

Lines changed: 17 additions & 0 deletions
@@ -142,6 +142,23 @@ Output fields:
 - `network_risk_level`
 - `records`
 
+## enumerate_ssh_keys
+
+Purpose:
+
+- Enumerates SSH key material across user home directories. Cross-platform: on Windows, scans `%USERPROFILE%\.ssh`, `ProgramData\ssh`, and other user profiles; on Linux/macOS, scans `/root/.ssh` and `/home/*/.ssh`.
+
+Arguments:
+
+- none
+
+Output fields:
+
+- `directories` (array): per-directory summary including path, `has_authorized_keys`, `private_key_count`, and `public_key_count`.
+- `total_authorized_keys_files` (integer)
+- `total_private_keys` (integer)
+- `total_public_keys` (integer)
+
 ## capture_coverage_baseline
 
 Purpose:

docs/troubleshooting.md

Lines changed: 3 additions & 0 deletions
@@ -168,6 +168,7 @@ wraithrun --task "Investigate ..." --live --model C:/models/llm.onnx --tokenizer
```

 - Tier thresholds: Basic ≤2B params or ≥200ms latency; Strong ≥10B params and ≤50ms latency; Moderate is everything in between.
+- Since v1.8.0, parameter estimation is quantization-aware. Q4 models use 0.55 bytes/param, Q8 uses 1.1, FP16 uses 2.2, FP32 uses 4.4. This may reclassify models that were previously under-estimated (e.g., a Q4 model that reported 0.5B may now correctly report ~2B and shift from Basic to Moderate).
 
 ## Final answer looks generic or templated
 
@@ -179,6 +180,7 @@ Fix:
 
 - This happens when the model is classified as Basic tier (deterministic summary) or when LLM output quality is detected as low.
 - Since v1.6.0, Moderate/Strong tiers use a ReAct loop that typically produces richer output. If output is still generic, try `--capability-override strong` or increase `--temperature` slightly (e.g., `0.1`).
+- Since v1.8.0, the quality guard also catches hallucinated `<call>` tags and `[observation]` markers inside the final answer. When detected, the agent replaces the garbage with a deterministic summary built from real findings. This means even Moderate/Strong tier runs may show a structured summary if the model hallucinates.
 
 ## Agent not calling expected tools
 
@@ -191,6 +193,7 @@ Fix:
 - Moderate/Strong tiers use a ReAct loop where the LLM decides which tools to call. The model may not choose the same tools as the template-driven Basic tier.
 - Increase `--max-steps` if the agent is exhausting its step budget before reaching all relevant tools.
 - If the model is too small, it may produce a `<final>` answer immediately. Try `--capability-override strong` to allow full iterative reasoning.
+- Since v1.8.0, if the model produces `<final>` at step 0 without calling any tools, the agent automatically falls back to template-driven execution so that real host data is still collected.
 - Check `RUST_LOG=debug` output for `react_step` entries showing the agent's reasoning at each step.
 
 ## Task returned a scope-boundary finding instead of running
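
Reviewer note: the two v1.8.0 ReAct behaviours documented above (step-0 fallback and the quality guard) can be summarised as one small decision function. This is a sketch under stated assumptions. Only the `<call>` and `[observation]` markers and the "`<final>` at step 0 with no tool calls" condition come from the docs; `FinalAction`, `contains_react_markers`, and `decide_final` are hypothetical names, and the real guard may apply further quality checks.

```rust
/// What to do once the model has emitted a `<final>` answer (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FinalAction {
    /// Run the template-driven tool sequence so real host data is collected.
    TemplateFallback,
    /// Replace the answer with a deterministic summary built from findings.
    DeterministicSummary,
    /// Keep the model's answer as written.
    Accept,
}

/// True when the answer contains markers that belong to intermediate ReAct
/// steps rather than to a final answer.
pub fn contains_react_markers(final_answer: &str) -> bool {
    final_answer.contains("<call>") || final_answer.contains("[observation]")
}

/// Decide how to treat a final answer produced at `step` after `tools_called`
/// tool invocations.
pub fn decide_final(step: usize, tools_called: usize, final_answer: &str) -> FinalAction {
    if step == 0 && tools_called == 0 {
        FinalAction::TemplateFallback
    } else if contains_react_markers(final_answer) {
        FinalAction::DeterministicSummary
    } else {
        FinalAction::Accept
    }
}
```

Checking the step-0 condition first is an assumption of this sketch: it reflects that an answer produced without any tool data has nothing real to summarise.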

docs/upgrades.md

Lines changed: 17 additions & 0 deletions
@@ -1,5 +1,22 @@
 # Upgrade Notes
 
+## v1.8.0
+
+### Breaking/visible changes
+
+- **Severity thresholds recalibrated** (#139): listener, account, and persistence findings now use higher thresholds. A normal desktop with ~100 listeners and 1 non-default admin account will report Low instead of High. Automation that keys on specific severity values should be reviewed.
+- **Finding titles include specifics** (#140): finding `title` fields now embed account names, persistence entry text, and SSH directory details (e.g., `"Non-default privileged accounts observed (1): shrey"` instead of `"Non-default privileged accounts observed (1)"`). Parsers matching on exact title strings must be updated.
+- **Parameter estimation changed** (#138): quantization-aware sizing means `estimated_params_b` in `model_capability` output may change. Q4 models now report ~4× higher param counts than before. This may reclassify some models into a higher capability tier.
+- **New tool and template** (#141): `enumerate_ssh_keys` tool added to the registry; `syslog-analysis` investigation template added. Template tool ordering changed for `file-integrity-check` and `ssh-key-investigation`.
+- **ReAct fallback behavior** (#137): Moderate/Strong tier runs may now produce a deterministic summary instead of LLM-generated text when the model hallucinates. The `final_answer` field will contain a structured SUMMARY block in these cases.
+
+### Migration
+
+- No TOML config changes required.
+- If you parse `RunReport` JSON `findings[].title` strings, update matchers: titles now include entity names and entry details.
+- If automation relies on severity thresholds, review the new calibration: listener counts below 50 are now Info (was Low at 25), single non-default admin accounts are Low (was Medium).
+- The `enumerate_ssh_keys` tool is automatically included in `ssh-key-investigation` template runs. No opt-in needed.
+
 ## v1.7.1
 
 - Dependency-only release. No breaking API changes.
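
Reviewer note: for anyone updating automation against the recalibrated severities, the listener and persistence bands from the CHANGELOG entry for #139 can be written out as a lookup. This is a minimal Rust sketch: the band boundaries are taken from that entry, while the `Severity` enum and both function names are hypothetical. The account mapping is omitted because the changelog lists 1, 3–4, and ≥5 accounts but does not say how 2 accounts are rated.

```rust
/// Severity levels named in the changelog, ordered from lowest to highest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Low,
    Medium,
    High,
}

/// Listener bands from the 1.8.0 entry:
/// Info <50, Low 50–149, Medium 150–249, High ≥250.
pub fn listener_severity(count: u32) -> Severity {
    match count {
        0..=49 => Severity::Info,
        50..=149 => Severity::Low,
        150..=249 => Severity::Medium,
        _ => Severity::High,
    }
}

/// Persistence bands from the 1.8.0 entry:
/// Low <3, Medium 3–7, High ≥8.
pub fn persistence_severity(count: u32) -> Severity {
    match count {
        0..=2 => Severity::Low,
        3..=7 => Severity::Medium,
        _ => Severity::High,
    }
}
```

Because the variants are declared in ascending order and the enum derives `Ord`, a threshold check can be written as `listener_severity(n) >= Severity::Medium` instead of matching on specific values, which is less brittle if the bands move again.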
