Usage Guide

This guide covers the current command-line workflows. Commands assume the ontocellia conda environment is active.

Install

conda env create -f environment.yml
conda activate ontocellia

Interactive TUI

Start Ontocellia without arguments to enter the Soft Lab Console TUI:

python -m ontocellia

You can type a natural language task directly:

Fix failing tests while preserving behavior.

The TUI induces specs, seeds a tissue, runs an initial collaboration pass, and shows agents, action intents, matrix records, handoffs, and a session report. TUI sessions write artifacts to artifacts/tui_sessions/<session-id>/.

Framework tissues start from one stem-origin cell by default. The runtime proliferates first, then differentiates cells into task-specific fates before emitting action intents.

Useful TUI commands:

/setup
/models
/config
/new <task>
/run [ticks]
/step
/agents
/intents
/matrix
/handoffs
/tools
/report
/benchmark
/mock
/clear
/exit

/agents, /intents, /matrix, /handoffs, /tools, /report, and /config refresh the corresponding panels and record the selected view in the event feed. Non-TTY input uses a lightweight boxed shell for script compatibility.

You can also force the Textual/Rich TUI with:

python -m ontocellia tui

The setup flow uses numbered provider and model choices. It stores local model configuration under ~/.ontocellia/. API keys are stored in ~/.ontocellia/secrets.env with user-only file permissions when entered through /setup.

App Server

Start the local HTTP/WebSocket server:

python -m ontocellia server --host 127.0.0.1 --port 8765

Core endpoints:

GET  /health
GET  /projects
GET  /projects/{project}/sessions
GET  /projects/{project}/tool-policy-profiles
POST /v1/chat/completions
POST /sessions
GET  /sessions
GET  /sessions/{id}
POST /sessions/{id}/task
POST /sessions/{id}/change-medium
POST /sessions/{id}/run
POST /sessions/{id}/step
POST /sessions/{id}/interventions
GET  /sessions/{id}/agents
GET  /sessions/{id}/intents
GET  /sessions/{id}/matrix
GET  /sessions/{id}/handoffs
GET  /sessions/{id}/tools
GET  /sessions/{id}/tool-approvals
POST /sessions/{id}/tool-approvals
GET  /sessions/{id}/artifacts/{name}
WS   /sessions/{id}/events

The server writes artifacts under artifacts/server_sessions/<session-id>/. It uses the mock provider by default; pass --real-provider only when you want it to use configured model profiles. WebSocket clients receive an initial snapshot followed by live session and trace events. The browser Web Lab product direction is documented in web-lab-design.md.
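
As a rough illustration, the read-only endpoints above can be called with the Python standard library alone. This sketch assumes the server is listening on 127.0.0.1:8765 as in the command above. The helper names endpoint_url and get_json are hypothetical, and response bodies are assumed to be JSON because their shape is not specified in this guide.

```python
import json
import urllib.request

# Assumption: matches the --host/--port values from the server command above.
BASE_URL = "http://127.0.0.1:8765"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path such as /health."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """Send a GET request and decode the body as JSON (assumed response format)."""
    request = urllib.request.Request(endpoint_url(path, base_url), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, get_json("/health") or get_json("/sessions") would return whatever those endpoints report. Nothing here touches the POST endpoints, whose request bodies are not documented in this guide.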

Natural language entered into an existing session is treated as a culture-medium change. It deposits a matrix record, emits task morphogens, advances the tissue, and preserves the session's lineage rather than replacing the tissue. The first supported interventions are morphogen injection, cell clearing, cell freezing, pause, and resume.

Tool approvals remain safe by default. Pending ToolInvocation records can be approved through the API, but API approvals are forced through a dry-run policy unless the server was started with explicit project policy profiles. Client payloads can narrow allowlists, but they cannot authorize real writes, commands, network calls, browser actions, or MCP calls beyond a named server-side profile.

Example local policy config:

{
  "profiles": {
    "local-dev": {
      "description": "Local repository edits for a trusted development session.",
      "dry_run": false,
      "allowed_interfaces": ["workspace.apply_patch", "pytest.run", "git.diff"],
      "allowed_commands": ["python -m pytest -q"],
      "allowed_write_globs": ["src/**/*.py", "tests/**/*.py"],
      "timeout_seconds": 30
    }
  }
}

Start the server with python -m ontocellia server --tool-policy-config path/to/tool-policy.json. Approval requests then use "policy_profile": "local-dev"; omit it to keep dry-run approval behavior.
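
The policy file can also be generated from a script, which helps keep several profiles consistent. The sketch below writes the same local-dev profile shown above and builds the policy part of an approval request. The function names write_tool_policy and approval_payload are hypothetical. Only the policy_profile key is documented here, so any other fields an approval request needs are left out.

```python
import json
from pathlib import Path
from typing import Optional


def write_tool_policy(path: Path) -> dict:
    """Write a tool-policy config with a single 'local-dev' profile and return it."""
    config = {
        "profiles": {
            "local-dev": {
                "description": "Local repository edits for a trusted development session.",
                "dry_run": False,
                "allowed_interfaces": ["workspace.apply_patch", "pytest.run", "git.diff"],
                "allowed_commands": ["python -m pytest -q"],
                "allowed_write_globs": ["src/**/*.py", "tests/**/*.py"],
                "timeout_seconds": 30,
            }
        }
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def approval_payload(profile: Optional[str] = None) -> dict:
    """Return the policy part of an approval body.

    Omitting the profile keeps the documented dry-run approval behavior.
    """
    return {"policy_profile": profile} if profile else {}
```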

Non-interactive equivalents of the TUI /setup, /models, and /config commands are available from the command line:

python -m ontocellia config setup
python -m ontocellia config models list
python -m ontocellia config models status
python -m ontocellia config models add
python -m ontocellia config models set deepseek
python -m ontocellia config models test deepseek
python -m ontocellia config validate
python -m ontocellia config get models.default
python -m ontocellia config file

Run A Framework Tissue

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --steps 4 \
  --output artifacts/repo_repair_tissue

Use --stem-cells N only when you want an experiment to start with a larger initial stem pool.

Outputs:

  • tissue_summary.json
  • tissue_trace.json
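
When a tissue run is driven from a script, the same command can be assembled as an argument list and the summary loaded afterwards. This is a minimal sketch: the helper names tissue_argv and run_tissue are hypothetical, and the flags are exactly the ones shown in the command above. The structure of tissue_summary.json is not described here, so the sketch returns it as an opaque dict.

```python
import json
import subprocess
import sys
from pathlib import Path


def tissue_argv(genome_spec: str, environment_spec: str, output: str, steps: int = 4) -> list:
    """Build the argument list for `python -m ontocellia tissue`."""
    return [
        sys.executable, "-m", "ontocellia", "tissue",
        "--genome-spec", genome_spec,
        "--environment-spec", environment_spec,
        "--steps", str(steps),
        "--output", output,
    ]


def run_tissue(genome_spec: str, environment_spec: str, output: str, steps: int = 4) -> dict:
    """Run the tissue command and load tissue_summary.json from the output directory."""
    subprocess.run(tissue_argv(genome_spec, environment_spec, output, steps), check=True)
    summary_path = Path(output) / "tissue_summary.json"
    return json.loads(summary_path.read_text(encoding="utf-8"))
```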

Induce Specs From A Task

python -m ontocellia induce \
  --task "Fix failing tests while preserving behavior" \
  --domain repo_repair \
  --output artifacts/induced

Run the induced tissue:

python -m ontocellia tissue \
  --genome-spec artifacts/induced/genome.yaml \
  --environment-spec artifacts/induced/environment.yaml \
  --effector mock-llm \
  --output artifacts/induced_tissue

Use Organ Selection Results

Organ selection consumes structured validation results. By default, validation hooks remain metadata and are not executed.

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --validation-result examples/framework/validation_failed.json \
  --steps 4 \
  --output artifacts/organ_selection_tissue

To execute hooks, use the opt-in Validation Hook Runner and explicitly allow each command:

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --steps 4 \
  --run-validation-hooks \
  --allow-validation-hook "python -m pytest -q" \
  --output artifacts/validation_runner_tissue

The runner uses exact command allowlisting and does not execute through a shell. It writes validation_results.json, records hook events in tissue_trace.json, and feeds the resulting OrganValidationResult records back into organ selection.
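
The general technique of exact command allowlisting without a shell can be sketched in a few lines of Python. This is not Ontocellia's implementation. The function name run_allowlisted and its parameters are hypothetical, and the sketch only illustrates why an exact string match plus shell=False leaves no room for injected arguments or shell metacharacters.

```python
import shlex
import subprocess


def run_allowlisted(command: str, allowlist: set, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a command only if its exact text appears in the allowlist."""
    # Exact string comparison: extra arguments or whitespace changes are rejected.
    if command not in allowlist:
        raise PermissionError(f"command not allowlisted: {command!r}")
    # Split into argv and execute directly, so no shell ever interprets
    # characters such as ';', '|', or '$(...)'.
    argv = shlex.split(command)
    return subprocess.run(
        argv,
        shell=False,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
```

A caller would pass the same string that was given to --allow-validation-hook, for example run_allowlisted("python -m pytest -q", {"python -m pytest -q"}), and inspect returncode, stdout, and stderr on the result.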

MCP Adapter Specs

MCP entries live inside the environment spec. Ontocellia maps them into biological interfaces without starting external MCP servers.

mcp:
  servers:
    - id: repo
      tools:
        - name: read_file
          description: Read a workspace file.
          accepts_fates: [explorer, repair]
          input_schema:
            type: object
      resources:
        - id: failing-log
          uri: file://pytest.log
          content: 3 failing tests
          tags: [test_failure, repo]
          position:
            node_id: repair-niche
      prompts:
        - id: repair-protocol
          template: Inspect failure, patch narrow, validate.
          tags: [repair]

Loaded tools appear as mcp:<server>:tool:<name> membrane channels, resources become matrix records, and prompts become induction-factor interfaces. The tissue summary includes mcp_interfaces.
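
The tool naming rule above is simple enough to reproduce when building --allow-interface or --allow-mcp-tool values. The sketch below derives channel ids from an already-parsed environment spec. The function name mcp_tool_channels is hypothetical, and only the documented mcp:<server>:tool:<name> pattern is used. Resource and prompt ids are not derived because their naming is not given here.

```python
def mcp_tool_channels(environment_spec: dict) -> list:
    """List the membrane channel ids implied by the mcp.servers[].tools entries."""
    channels = []
    for server in environment_spec.get("mcp", {}).get("servers", []):
        for tool in server.get("tools", []):
            channels.append(f"mcp:{server['id']}:tool:{tool['name']}")
    return channels


# The spec shown above, reduced to the fields this sketch reads.
spec = {"mcp": {"servers": [{"id": "repo", "tools": [{"name": "read_file"}]}]}}
print(mcp_tool_channels(spec))  # ['mcp:repo:tool:read_file']
```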

Mutation Selection

Mutation selection compares baseline and candidate validation results. It writes mutation candidates, a decision report, and a solidified genome.

python -m ontocellia mutate \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --baseline-validation examples/framework/validation_failed.json \
  --candidate-validation examples/framework/validation_passed.json \
  --output artifacts/mutation_selection

The input genome is never overwritten. If the candidate validation results do not improve on the baseline, solidified_genome.yaml contains the original genome and the report marks the decision as not_selected.

Reference End-To-End Demo

The deterministic repo repair demo writes induced specs, tissue trace, mock LLM intents, validation evidence, mutation outputs, and a final report.

python -m ontocellia demo \
  --task "Fix failing tests while preserving behavior" \
  --steps 4 \
  --output artifacts/complete_repo_repair_demo

Benchmark A Tissue

The built-in MiniBench suite measures Ontocellia-native agent tissue capabilities with mock LLM effectors.

python -m ontocellia benchmark \
  --suite ontocellia_minibench_v1 \
  --effector mock-llm \
  --output artifacts/benchmarks/minibench

Outputs:

  • benchmark_summary.json
  • benchmark_results.csv
  • benchmark_report.md
  • per-task tissue traces, summaries, and action intents

The TUI also supports /benchmark, which runs the mock MiniBench and prints a score summary.

Search Tissue Structures

Structure search compares deterministic tissue variants induced from the same task. It answers which tissue organization best fits the current environment, not which provider is strongest.

python -m ontocellia structure-search \
  --task "Fix failing tests while preserving behavior." \
  --domain repo_repair \
  --effector mock-llm \
  --steps 6 \
  --with-attribution \
  --seed 7 \
  --output artifacts/structure_search

Outputs:

  • structure_search_summary.json
  • structure_trials.csv
  • structure_search_report.md
  • selected_variant.json
  • per-variant tissue summaries, traces, and action intents under variants/

When --with-attribution is enabled, each variant also writes an attribution/ directory and structure_search_summary.json includes selected_variant_explanation.

Attribute A Tissue Run

Attribution can also be run after the fact from saved artifacts:

python -m ontocellia attribute \
  --trace artifacts/repo_repair_tissue/tissue_trace.json \
  --summary artifacts/repo_repair_tissue/tissue_summary.json \
  --output artifacts/repo_repair_tissue/attribution

Add --actions, --tool-results, --execution-results, and --validation-results when those files exist.

Outputs:

  • contribution_graph.json
  • contribution_summary.json
  • cell_contributions.csv
  • gene_contributions.csv
  • matrix_contributions.csv
  • contribution_report.md

For new tissue runs, pass --with-attribution. This writes the same report under the tissue output directory and adds an attribution block to tissue_summary.json.

Solidify A Selected Structure

After structure search finds a consistently useful variant, solidification can turn that structure into reusable developmental bias:

python -m ontocellia solidify \
  --structure-search artifacts/structure_search/structure_search_summary.json \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --min-score 0.65 \
  --min-margin 0.03 \
  --output artifacts/solidification

Outputs:

  • solidified_tendencies.json
  • solidification_report.md
  • solidified_genome.yaml

The input genome is not modified. When selected, the output genome records the tendency in metadata and adds enhancer-style regulatory elements for matching genes.
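
One plausible reading of --min-score and --min-margin is a two-part gate: the leading variant must reach an absolute score and must beat the runner-up by a minimum margin. This guide does not describe how Ontocellia actually computes the decision, so the sketch below is an assumption throughout. The function name, the score mapping, and the tie handling are all hypothetical.

```python
from typing import Dict, Optional


def passes_solidification_gate(
    scores: Dict[str, float],
    min_score: float = 0.65,
    min_margin: float = 0.03,
) -> Optional[str]:
    """Return the leading variant id if it clears both thresholds, else None.

    Assumed rule (not confirmed by the guide): best >= min_score and
    best - runner_up >= min_margin.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    # With a single variant there is no runner-up, so the margin check passes.
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if best_score >= min_score and best_score - runner_up >= min_margin:
        return best_id
    return None
```

Under this reading, a variant scoring 0.70 against a runner-up at 0.66 would be selected, while 0.70 against 0.69 would not, because the margin is too small to count as a consistent advantage.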

Replay A Task Family

Longitudinal replay compares Ontocellia's adaptive tissue against simpler baselines across a related task family. It is a deterministic first step toward showing whether structure search and solidification make later sessions more efficient or robust.

python -m ontocellia longitudinal-replay \
  --task "Fix failing tests while preserving behavior." \
  --task "Fix a regression without broad rewrites." \
  --domain repo_repair \
  --effector mock-llm \
  --steps 6 \
  --output artifacts/longitudinal_replay

When no --task values are provided, Ontocellia uses the built-in repo-repair replay family. The runner compares:

  • direct_agent
  • single_cell
  • fixed_tissue
  • adaptive_tissue

Outputs:

  • longitudinal_replay_summary.json
  • longitudinal_trials.csv
  • longitudinal_replay_report.md
  • solidification/solidification_report.md
  • per-task artifacts under tasks/<task-id>/

Run Official Benchmark Data

Official benchmark runs use upstream task shapes and report Ontocellia tissue metrics separately from external scorer status. The default mode for non-BFCL benchmarks is adaptive-tissue.

python -m ontocellia official-benchmark run \
  --benchmark tau-bench \
  --model-profile deepseek \
  --limit 1 \
  --mode adaptive-tissue \
  --tau-domain airline \
  --structure-search \
  --with-attribution \
  --output artifacts/official_benchmarks/tau_bench/deepseek_smoke

Outputs include:

  • official_tasks.jsonl
  • official_task_manifest.json
  • scoring_status.json
  • ontocellia_predictions.jsonl
  • official_results.json
  • structure_report.json
  • adaptation_report.md
  • ontocellia_summary.json
  • per-task tissue traces under tissue_traces/

Use --task-id for one specific official task, or --full only when you intend to run the full selected benchmark. Use --run-official-scorer when external scorer execution is intentional. Scorer commands must emit machine-readable scores before Ontocellia records official pass/fail. See official-benchmarks.md for Terminal-Bench custom agent, tau-bench bridge, SWE-bench scorer, and custom scorer details.

BFCL is kept as a provider/tool-call baseline:

python -m ontocellia official-benchmark run \
  --benchmark bfcl \
  --model-profile deepseek \
  --limit 50 \
  --mode provider-baseline \
  --output artifacts/official_benchmarks/bfcl/provider_baseline

Execute Action Intents

By default, tissue emits intents and communication artifacts without performing local effects. Add --execute-actions to route intents through the extracellular tool policy. Dry-run is enabled by default.

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --steps 4 \
  --effector mock-llm \
  --execute-actions \
  --execution-dry-run \
  --allow-interface workspace.search \
  --allow-interface git.diff \
  --output artifacts/execution_dry_run

This writes execution_results.json and deposits execution evidence into the matrix. To allow real local execution, keep the allowlist exact:

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --effector mock-llm \
  --execute-actions \
  --no-execution-dry-run \
  --allow-interface pytest.run \
  --allow-command "python -m pytest -q" \
  --allow-write "src/**/*.py" \
  --output artifacts/execution_allowed

The execution layer only performs work allowed by the active policy. It does not commit, push, install dependencies, or download benchmark data as part of action execution. It writes:

  • tool_invocations.json
  • tool_results.json
  • execution_results.json (kept for compatibility)
  • matrix records containing execution evidence

Long tool and validation output is handled by output metabolism. Full raw text goes under raw_outputs/; result JSON and matrix records keep bounded digests plus artifact references. See communication.md for context and output metadata details.
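
The idea of keeping full output on disk while records carry only a bounded digest can be sketched as follows. This is an illustration of the pattern, not the project's code. The function name metabolize_output, the field names in the returned dict, and the 400-character default are assumptions. Only the raw_outputs/ directory name comes from the text above.

```python
import hashlib
from pathlib import Path


def metabolize_output(raw_text: str, raw_dir: Path, name: str, max_chars: int = 400) -> dict:
    """Store the full text under raw_dir and return a bounded digest record."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    artifact = raw_dir / f"{name}.txt"
    artifact.write_text(raw_text, encoding="utf-8")
    return {
        # Bounded excerpt that is safe to embed in result JSON or matrix records.
        "digest": raw_text[:max_chars],
        "truncated": len(raw_text) > max_chars,
        "total_chars": len(raw_text),
        # Hash lets a reader confirm the artifact matches the digested output.
        "sha256": hashlib.sha256(raw_text.encode("utf-8")).hexdigest(),
        # Reference back to the full raw text.
        "artifact": str(artifact),
    }
```

A call such as metabolize_output(long_log, Path("artifacts/run/raw_outputs"), "pytest") would leave the complete log on disk and return a record small enough to include in context.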

Additional adapter gates are explicit:

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --effector mock-llm \
  --execute-actions \
  --allow-interface mcp:repo:tool:read_file \
  --allow-mcp-tool mcp:repo:tool:read_file \
  --allow-interface http.request \
  --allow-network-host api.example.com \
  --enable-http-tools \
  --output artifacts/tool_runtime

MCP, HTTP, and browser adapters are disabled until their specific policy flags are present. Browser support is currently only an adapter boundary reserved for future, richer automation.

Inspect Context

Use communication.context_budget_chars and communication.context_metabolism in an environment spec to tune approximate context size and matrix remodeling:

communication:
  matrix_query_limit: 5
  context_budget_chars: 1600
  context_metabolism:
    enabled: true
    window_ticks: 3
    max_metabolites_per_tick: 4
    max_metabolite_chars: 700
    min_source_records: 2
    source_salience_decay: 0.15

Cells receive bounded matrix context instead of the full tissue history. Inspect tissue_trace.json for llm_effector, context_metabolite_deposited, and context_metabolism events. See communication.md for record lifecycle fields, context packets, and output digest metadata.
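
To make the effect of matrix_query_limit and context_budget_chars concrete, the sketch below selects records under both limits. The selection rule is an assumption: it ranks by a salience field, keeps at most query_limit candidates, and skips any record that would push the total past the character budget. The function name select_context and the record fields text and salience are hypothetical, and Ontocellia's real packing logic may differ.

```python
def select_context(records: list, query_limit: int = 5, budget_chars: int = 1600) -> list:
    """Pick the most salient records that fit within the character budget."""
    # Highest salience first, capped at the query limit.
    ranked = sorted(records, key=lambda record: record["salience"], reverse=True)[:query_limit]
    selected = []
    used = 0
    for record in ranked:
        size = len(record["text"])
        if used + size > budget_chars:
            continue  # too large for the remaining budget; try smaller records
        selected.append(record)
        used += size
    return selected
```

With the defaults above, three records of 1000, 700, and 500 characters in descending salience would yield the first and third: the second would overflow the 1600-character budget, while the third still fits.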

Tune Resource Competition

Resource competition is enabled by default with weak maintenance cost. Configure stronger pressure in an environment spec when you want structure-search or tissue runs to penalize oversized, expensive, or low-energy organizations:

resources:
  population_cap: 6
  maintenance_cost: 0.01
  differentiated_cost: 0.02
  action_intent_cost: 0.015
  contribution_reward: 0.08
  negative_contribution_penalty: 0.08
  over_cap_pressure_weight: 0.25
  allow_quiescence: false
  allow_apoptosis: false

Tissue summaries include resource_competition, and traces include resource_competition events. Structure-search and official benchmark metrics include resource_efficiency, average_cell_energy, and population_pressure.

Tune Developmental Annealing

Developmental annealing controls plasticity over time. Early high temperature favors exploration; later commitment raises fate locks. Repeated validation failure can unlock and reprogram a bounded number of differentiated cells:

annealing:
  warmup_ticks: 3
  stabilization_ticks: 8
  initial_temperature: 1.0
  final_temperature: 0.15
  fate_lock_growth: 0.04
  failure_unlock: 0.12
  repeated_failure_threshold: 2
  reprogramming_pressure_threshold: 0.85
  max_reprogramming_per_tick: 1

Tissue summaries include annealing, and traces include developmental_annealing plus annealing_reprogramming when local reprogramming occurs. Structure-search and official benchmark metrics include annealing_temperature, average_fate_lock, and reprogramming_events.
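
One way to picture the temperature parameters is a schedule that holds initial_temperature during warm-up and then cools linearly to final_temperature over the stabilization window. The actual schedule is not specified in this guide, so the linear shape, the treatment of stabilization_ticks as a duration after warm-up, and the function name annealing_temperature are all assumptions.

```python
def annealing_temperature(
    tick: int,
    warmup_ticks: int = 3,
    stabilization_ticks: int = 8,
    initial_temperature: float = 1.0,
    final_temperature: float = 0.15,
) -> float:
    """Return the assumed temperature at a given tick (linear cooling sketch)."""
    if tick < warmup_ticks:
        return initial_temperature  # exploration phase: keep plasticity high
    if stabilization_ticks <= 0:
        return final_temperature  # no cooling window: commit immediately
    # Fraction of the stabilization window that has elapsed, clamped to 1.0.
    progress = min(1.0, (tick - warmup_ticks) / stabilization_ticks)
    return initial_temperature + (final_temperature - initial_temperature) * progress
```

Under these assumptions the temperature stays at 1.0 through tick 3, passes roughly 0.575 at tick 7, and settles at 0.15 from tick 11 onward.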

LLM Effectors

Mock LLM mode is deterministic:

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --effector mock-llm \
  --output artifacts/mock_llm_tissue

Real providers are optional. API keys are read from environment variables and are not written into trace artifacts.

| Provider | Environment variable | Default base URL |
| --- | --- | --- |
| deepseek | DEEPSEEK_API_KEY | https://api.deepseek.com |
| kimi | MOONSHOT_API_KEY or KIMI_API_KEY | https://api.moonshot.ai/v1 |
| minimax | MINIMAX_API_KEY | https://api.minimax.io/v1 |
| openai | OPENAI_API_KEY | https://api.openai.com/v1 |
| openrouter | OPENROUTER_API_KEY | https://openrouter.ai/api/v1 |
| ollama | OLLAMA_API_KEY | http://localhost:11434/v1 |
| custom-openai-compatible | ONTOCELLIA_CUSTOM_API_KEY | configured in setup |
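
Before starting a run with a real provider, it can help to check that the expected variable is set. The sketch below mirrors the provider table above. The function name resolve_api_key is hypothetical. For kimi, checking MOONSHOT_API_KEY before KIMI_API_KEY is an assumption based on the order in the table, not documented precedence.

```python
import os
from typing import Mapping, Optional

# Provider to candidate environment variables, copied from the table above.
PROVIDER_ENV_VARS = {
    "deepseek": ("DEEPSEEK_API_KEY",),
    "kimi": ("MOONSHOT_API_KEY", "KIMI_API_KEY"),  # lookup order is an assumption
    "minimax": ("MINIMAX_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "openrouter": ("OPENROUTER_API_KEY",),
    "ollama": ("OLLAMA_API_KEY",),
    "custom-openai-compatible": ("ONTOCELLIA_CUSTOM_API_KEY",),
}


def resolve_api_key(provider: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key found for the provider, or None."""
    env = os.environ if environ is None else environ
    for name in PROVIDER_ENV_VARS[provider]:
        value = env.get(name)
        if value:
            return value
    return None
```

A preflight check could then be as short as: if resolve_api_key("deepseek") is None, stop and report which variable is missing.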

DeepSeek setup includes default profile choices such as deepseek-v4-flash and deepseek-v4-pro.

After configuring a default model profile, use the simplified LLM effector:

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --effector llm \
  --output artifacts/configured_llm_tissue

MiniMax token-plan keys may require a regional host:

python -m ontocellia tissue \
  --genome-spec examples/framework/repo_repair_genome.yaml \
  --environment-spec examples/framework/failing_tests_environment.yaml \
  --effector minimax \
  --llm-base-url https://api.minimax.chat/v1 \
  --output artifacts/minimax_tissue

Live provider tests are opt-in:

set -a
source .env.local
set +a
ONTOCELLIA_LIVE_LLM=1 conda run -n ontocellia python -m pytest -q tests/test_llm_live_e2e.py

Experiments

python -m ontocellia experiment \
  --experiment-spec examples/experiments/contact_ablation.yaml \
  --output artifacts/contact_ablation

Experiments write per-variant run directories plus comparison artifacts.

Validation And Schema Docs

python -m ontocellia validate \
  --genome-spec examples/specs/minimal_genome.yaml \
  --environment-spec examples/specs/minimal_environment.yaml

python -m ontocellia schema-docs --output docs/schema

Reference Simulation Runtime

python -m ontocellia run --steps 20 --output artifacts/demo
python -m ontocellia --steps 20 --output artifacts/legacy_demo