log-analyse is a portable uv-managed project for preparing support bundles for retrieval workflows.
The repository currently contains:
- run-based bundle ingestion under data/input, data/work, and data/output
- bundle discovery and profile-driven content preprocessing entry points
- a small internal Python module under src/log_analyse/ for bundle handlers and generic orchestration
- bundle discovery, content preprocessing, content chunking, chunks validation, embedding preparation, lexical vectorization, semantic vectorization, embeddings packaging, vector store load, evidence retrieval, and analysis stages
- incident-centric metadata propagation for run_id, incident_id, bundle_id, bundle_type, system_id, and related payload fields
- a new local dense embedding stage powered by sentence-transformers
- an optional cloud semantic embedding backend for Google Vertex AI
- a hierarchical JSON configuration file for switching embedding providers and models
- a Textual-based terminal config editor for day-to-day configuration work
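
To make the metadata propagation item above concrete, here is a minimal sketch of how the listed fields could travel with each chunk payload. The `RunMetadata` class, the `attach_metadata` helper, and the sample values are hypothetical illustrations, not names taken from src/log_analyse/.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunMetadata:
    """Hypothetical container for the identifiers named in the list above."""
    run_id: str
    incident_id: str
    bundle_id: str
    bundle_type: str
    system_id: str


def attach_metadata(chunk: dict, meta: RunMetadata) -> dict:
    """Return a copy of the chunk payload with the run metadata merged in.

    Metadata keys are written last, so they override any same-named chunk keys.
    """
    return {**chunk, **asdict(meta)}


# Sample values are invented for illustration only.
meta = RunMetadata("run-001", "inc-42", "bundle-a", "linux-sosreport", "host-01")
payload = attach_metadata({"text": "kernel: oom-killer invoked", "chunk_index": 0}, meta)
```

Carrying the same identifiers on every chunk is what would let later stages filter by incident or bundle without re-reading the original input.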
- Install Python 3.14 or newer.
- Install uv.
- Sync the environment:

      uv sync

  To use Google Vertex AI embeddings, install the matching optional extra:

      uv sync --extra google

- Run the dense embedding smoke test:

      uv run python scripts/extra/show-privacy-status.py
      uv run python scripts/extra/smoke-test.py

- Review or edit the config.
You can edit config/pipeline.json directly or use the TUI:

    uv run log-analyse-config-ui

The TUI saves pipeline.json, rebuilds the compiled cache automatically, and stores its own local UI preferences in config/pipeline.ui.json.
Useful TUI actions:
- Save: writes pipeline.json and refreshes pipeline.config.cache.bin
- Revert: discards unsaved form changes
- Default: loads the shipped safe baseline preset
- Secure: loads a stricter local-only privacy-first preset
- Place a bundle under data/input/run-001/bundle/ or pass --bundle-path. The current pipeline processes one unpacked bundle per run.
- Run the idempotent main pipeline:

      uv run log-analyse --run-id run-001

If you want to rebuild the local compiled config cache explicitly:

    uv run log-analyse-config-compile

The dense embedding stage reads its defaults from config/pipeline.json.
The main dense-provider setting looks like this:

    {
      "embedding": {
        "semantic": {
          "provider": {
            "kind": "sentence-transformers",
            "model_name": "BAAI/bge-small-en-v1.5"
          }
        }
      }
    }

Change provider.kind to switch providers and provider.model_name to switch models without rewriting the script.
The two currently implemented provider families are:
- sentence-transformers
- google-vertex-ai
Provider-specific settings live inside the provider object. For example, local_files_only and trust_remote_code apply to sentence-transformers, while project, location, and credentials_env apply to google-vertex-ai.
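
The split between provider-specific settings can be illustrated with a small check that flags keys placed under the wrong provider kind. This is only a sketch: the `misplaced_keys` helper is hypothetical, and it assumes the settings named above are the complete set for each provider, which the real config schema may not guarantee.

```python
import json

# Provider-specific keys as named in the text above (assumed exhaustive for this sketch).
PROVIDER_SPECIFIC_KEYS = {
    "sentence-transformers": {"local_files_only", "trust_remote_code"},
    "google-vertex-ai": {"project", "location", "credentials_env"},
}
# Keys shared by both provider kinds.
COMMON_KEYS = {"kind", "model_name"}


def misplaced_keys(config: dict) -> set:
    """Return provider keys that do not belong to the selected provider kind."""
    provider = config["embedding"]["semantic"]["provider"]
    kind = provider["kind"]
    if kind not in PROVIDER_SPECIFIC_KEYS:
        raise ValueError(f"unknown provider kind: {kind!r}")
    allowed = COMMON_KEYS | PROVIDER_SPECIFIC_KEYS[kind]
    return set(provider) - allowed


# A local provider that accidentally carries a Vertex AI setting.
example = json.loads("""
{
  "embedding": {
    "semantic": {
      "provider": {
        "kind": "sentence-transformers",
        "model_name": "BAAI/bge-small-en-v1.5",
        "local_files_only": true,
        "project": "my-gcp-project"
      }
    }
  }
}
""")

leftover = misplaced_keys(example)
```

Here `leftover` contains only `project`, the one key that belongs to google-vertex-ai rather than to the selected sentence-transformers provider.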
config/pipeline.json is the source of truth.
The runtime may generate config/pipeline.config.cache.bin as a local compiled cache, but that file is not meant for editing, review, or committing to git.
It is safe to delete; the next run will recreate it from pipeline.json.
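
Treating the cache as derived data can be sketched as a simple freshness check. The `cache_needs_rebuild` helper below is hypothetical and compares modification times only; how the actual runtime decides to recompile is not described here and may differ (for example, it could compare content hashes).

```python
import os
import tempfile
from pathlib import Path


def cache_needs_rebuild(source: Path, cache: Path) -> bool:
    """Rebuild when the cache is missing or older than the source file."""
    if not cache.exists():
        return True
    return cache.stat().st_mtime < source.stat().st_mtime


# Demonstration in a throwaway directory, using the file names from the text.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "pipeline.json"
    cache = Path(tmp) / "pipeline.config.cache.bin"
    source.write_text("{}")

    # A deleted or never-created cache always triggers a rebuild.
    missing = cache_needs_rebuild(source, cache)

    cache.write_bytes(b"compiled")
    # Force the cache timestamp to be newer than the source.
    newer = source.stat().st_mtime + 10
    os.utime(cache, (newer, newer))
    fresh = cache_needs_rebuild(source, cache)
```

Because a missing cache is treated the same as a stale one, deleting the file costs only one recompilation on the next run.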
The TUI may also generate config/pipeline.ui.json for local interface preferences such as the selected theme.
That file is also local-only and is not meant for committing to git.
- Project Overview
- Quick Start
- Operator Runbook
- Pipeline Diagram And Flow
- Configuration
- Run Metadata
- Privacy And Offline Mode
- Corpus Scope
- Preprocessing Playbook
- Evaluation Suite
- Bundle Types
- Bundle Handler Architecture
- Add A New Bundle Type
- Multiple Bundles In One Incident
- Qdrant Data Model
- Payload Schema
- Idempotent Orchestration
- Phase Docs Directory
The main log-analyse command prepares retrieval-ready artifacts on disk.
Publishing those artifacts into Qdrant and running retrieval/evaluation workflows are separate commands.
- The repository is now configured in offline-first mode for dense embeddings.
- Dense embedding runs should use only local model files and should not try to contact Hugging Face.
- If the dense model is missing locally, the dense stage should fail instead of downloading it.
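
The fail-instead-of-download behaviour described above can be approximated as follows. This is a sketch under assumptions: `offline_env` and `load_dense_model` are hypothetical helper names, the `local_files_only` constructor argument is available only in recent sentence-transformers releases, and the project's own dense stage may enforce offline mode differently.

```python
import os


def offline_env() -> dict:
    """Environment variables that tell the Hugging Face libraries to stay offline."""
    return {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}


def load_dense_model(model_name: str):
    """Load a model from local files only; raise if it is not already cached."""
    # Set the variables before the first import, since they are read at import time.
    os.environ.update(offline_env())
    # Imported lazily so this module can be loaded without the dependency installed.
    from sentence_transformers import SentenceTransformer

    # With local_files_only=True, a missing model raises an error instead of
    # triggering a download.
    return SentenceTransformer(model_name, local_files_only=True)
```

Calling `load_dense_model("BAAI/bge-small-en-v1.5")` on a machine without that model in the local cache should then raise an error rather than contact Hugging Face.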
- Today the implemented bundle profile is linux-sosreport.
- Today the only fully implemented bundle handler is also linux-sosreport.
- Future bundle types already planned in config and docs include ESXi vm-support, vCenter support bundles, storage support bundles, and Jenkins logs bundles.