This guide covers the current command-line workflows. Commands assume the ontocellia conda environment is active.
conda env create -f environment.yml
conda activate ontocellia
Start Ontocellia without arguments to enter the Soft Lab Console TUI:
python -m ontocellia
You can type a natural language task directly:
Fix failing tests while preserving behavior.
The TUI induces specs, seeds a tissue, runs an initial collaboration pass, and shows agents, action intents, matrix records, handoffs, and a session report. TUI sessions write artifacts to artifacts/tui_sessions/<session-id>/.
Framework tissues start from one stem-origin cell by default. The runtime proliferates first, then differentiates cells into task-specific fates before emitting action intents.
Useful TUI commands:
/setup
/models
/config
/new <task>
/run [ticks]
/step
/agents
/intents
/matrix
/handoffs
/tools
/report
/benchmark
/mock
/clear
/exit
/agents, /intents, /matrix, /handoffs, /tools, /report, and /config refresh the corresponding panels and record the selected view in the event feed. Non-TTY input uses a lightweight boxed shell for script compatibility.
You can also force the Textual/Rich TUI with:
python -m ontocellia tui
The setup flow uses numbered provider and model choices. It stores local model configuration under ~/.ontocellia/. API keys are stored in ~/.ontocellia/secrets.env with user-only file permissions when entered through /setup.
Start the local HTTP/WebSocket server:
python -m ontocellia server --host 127.0.0.1 --port 8765
Core endpoints:
GET /health
GET /projects
GET /projects/{project}/sessions
GET /projects/{project}/tool-policy-profiles
POST /v1/chat/completions
POST /sessions
GET /sessions
GET /sessions/{id}
POST /sessions/{id}/task
POST /sessions/{id}/change-medium
POST /sessions/{id}/run
POST /sessions/{id}/step
POST /sessions/{id}/interventions
GET /sessions/{id}/agents
GET /sessions/{id}/intents
GET /sessions/{id}/matrix
GET /sessions/{id}/handoffs
GET /sessions/{id}/tools
GET /sessions/{id}/tool-approvals
POST /sessions/{id}/tool-approvals
GET /sessions/{id}/artifacts/{name}
WS /sessions/{id}/events
The server writes artifacts under artifacts/server_sessions/<session-id>/. It uses the mock provider by default; pass --real-provider only when you want it to use configured model profiles. WebSocket clients receive an initial snapshot followed by live session and trace events. The browser Web Lab product direction is documented in web-lab-design.md.
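The endpoint list can be exercised from a script. The following is a minimal sketch using only the Python standard library, assuming the server runs on the host and port shown in the start command. The helper name build_request is hypothetical, and the "task" body field is an assumption, since this guide does not describe request payloads. The sketch only builds requests; the sending step is left commented out so it runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # host and port from the server command above


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the listed endpoints."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


health = build_request("GET", "/health")
# The "task" field name is an assumption; check the server's actual schema.
create = build_request("POST", "/sessions", {"task": "Fix failing tests while preserving behavior."})
print(health.get_method(), health.full_url)
print(create.get_method(), create.full_url)

# To send a request against a running server:
# with urllib.request.urlopen(health, timeout=5) as response:
#     print(response.status, response.read().decode("utf-8"))
```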
Natural language entered into an existing session is treated as a culture-medium change. It deposits a matrix record, emits task morphogens, advances the tissue, and preserves the session's lineage rather than replacing the tissue. The first supported interventions are morphogen injection, cell clearing, cell freezing, pause, and resume.
Tool approvals remain safe by default. Pending ToolInvocation records can be approved through the API, but API approvals are forced through a dry-run policy unless the server was started with explicit project policy profiles. Client payloads can narrow allowlists, but they cannot authorize real writes, commands, network calls, browser actions, or MCP calls beyond a named server-side profile.
Example local policy config:
{
  "profiles": {
    "local-dev": {
      "description": "Local repository edits for a trusted development session.",
      "dry_run": false,
      "allowed_interfaces": ["workspace.apply_patch", "pytest.run", "git.diff"],
      "allowed_commands": ["python -m pytest -q"],
      "allowed_write_globs": ["src/**/*.py", "tests/**/*.py"],
      "timeout_seconds": 30
    }
  }
}
Start the server with python -m ontocellia server --tool-policy-config path/to/tool-policy.json. Approval requests then use "policy_profile": "local-dev"; omit it to keep dry-run approval behavior.
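Before pointing the server at a policy file, it can help to review which profiles permit real effects. The sketch below parses the example config and prints one line per profile. It is a review aid only, not how Ontocellia enforces policy. The function name summarize_profiles is hypothetical, and treating a missing dry_run key as dry-run is this sketch's own assumption.

```python
import json

POLICY_TEXT = """
{
  "profiles": {
    "local-dev": {
      "description": "Local repository edits for a trusted development session.",
      "dry_run": false,
      "allowed_interfaces": ["workspace.apply_patch", "pytest.run", "git.diff"],
      "allowed_commands": ["python -m pytest -q"],
      "allowed_write_globs": ["src/**/*.py", "tests/**/*.py"],
      "timeout_seconds": 30
    }
  }
}
"""


def summarize_profiles(policy):
    """Return one review line per profile, flagging those that permit real effects."""
    lines = []
    for name, profile in sorted(policy.get("profiles", {}).items()):
        # A missing dry_run key is treated as dry-run here (an assumption of this sketch).
        mode = "dry-run" if profile.get("dry_run", True) else "REAL EXECUTION"
        commands = profile.get("allowed_commands", [])
        lines.append(f"{name}: {mode}, {len(commands)} allowed command(s)")
    return lines


policy = json.loads(POLICY_TEXT)
summary = summarize_profiles(policy)
print("\n".join(summary))
```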
Non-interactive equivalents are available:
python -m ontocellia config setup
python -m ontocellia config models list
python -m ontocellia config models status
python -m ontocellia config models add
python -m ontocellia config models set deepseek
python -m ontocellia config models test deepseek
python -m ontocellia config validate
python -m ontocellia config get models.default
python -m ontocellia config file
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--steps 4 \
--output artifacts/repo_repair_tissue
Use --stem-cells N only when you want an experiment to start with a larger initial stem pool.
Outputs:
- tissue_summary.json
- tissue_trace.json
python -m ontocellia induce \
--task "Fix failing tests while preserving behavior" \
--domain repo_repair \
--output artifacts/induced
Run the induced tissue:
python -m ontocellia tissue \
--genome-spec artifacts/induced/genome.yaml \
--environment-spec artifacts/induced/environment.yaml \
--effector mock-llm \
--output artifacts/induced_tissue
Organ selection consumes structured validation results. By default, validation hooks remain metadata and are not executed.
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--validation-result examples/framework/validation_failed.json \
--steps 4 \
--output artifacts/organ_selection_tissue
To execute hooks, use the opt-in Validation Hook Runner and explicitly allow each command:
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--steps 4 \
--run-validation-hooks \
--allow-validation-hook "python -m pytest -q" \
--output artifacts/validation_runner_tissue
The runner uses exact command allowlisting and does not execute through a shell. It writes validation_results.json, records hook events in tissue_trace.json, and feeds the resulting OrganValidationResult records back into organ selection.
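Exact allowlisting without a shell is a general pattern, and the following sketch shows one way to do it in plain Python. It illustrates the technique and is not Ontocellia's runner. The function name run_allowlisted is hypothetical, and the example assumes a POSIX-style command line.

```python
import shlex
import subprocess
import sys


def run_allowlisted(command, allowlist, timeout=30):
    """Run `command` only if it exactly matches an allowlist entry, without a shell."""
    if command not in allowlist:
        raise PermissionError(f"command not allowlisted: {command!r}")
    # shlex.split yields an argv list; with shell=False (the default) nothing is
    # interpreted by a shell, so ';', '&&', and '|' are never treated as operators.
    argv = shlex.split(command)
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)


# A harmless stand-in for "python -m pytest -q" so the sketch runs anywhere.
harmless = f"{shlex.quote(sys.executable)} -c pass"
result = run_allowlisted(harmless, allowlist=[harmless])
print(result.returncode)

try:
    run_allowlisted(harmless + " ; echo injected", allowlist=[harmless])
except PermissionError as error:
    print(error)
```

Because the match is on the whole string, a command with any extra argument or suffix is rejected instead of being partially trusted.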
MCP entries live inside the environment spec. Ontocellia maps them into biological interfaces without starting external MCP servers.
mcp:
  servers:
    - id: repo
      tools:
        - name: read_file
          description: Read a workspace file.
          accepts_fates: [explorer, repair]
          input_schema:
            type: object
      resources:
        - id: failing-log
          uri: file://pytest.log
          content: 3 failing tests
          tags: [test_failure, repo]
          position:
            node_id: repair-niche
      prompts:
        - id: repair-protocol
          template: Inspect failure, patch narrow, validate.
          tags: [repair]
Loaded tools appear as mcp:<server>:tool:<name> membrane channels, resources become matrix records, and prompts become induction-factor interfaces. The tissue summary includes mcp_interfaces.
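The channel naming rule can be sketched in a few lines. The example below derives mcp:<server>:tool:<name> identifiers from an already parsed mcp block, using plain dicts so no YAML library is needed. The function name membrane_channels is hypothetical, and the sketch assumes tools are nested under each server entry.

```python
def membrane_channels(mcp_config):
    """Derive mcp:<server>:tool:<name> identifiers from a parsed `mcp` block."""
    channels = []
    for server in mcp_config.get("servers", []):
        for tool in server.get("tools", []):
            channels.append(f"mcp:{server['id']}:tool:{tool['name']}")
    return channels


# Parsed form of the tools portion of the example spec (assumed nesting).
mcp_config = {
    "servers": [
        {"id": "repo", "tools": [{"name": "read_file", "accepts_fates": ["explorer", "repair"]}]}
    ]
}
channels = membrane_channels(mcp_config)
print(channels)  # ['mcp:repo:tool:read_file']
```

These identifiers are the values to pass to --allow-interface and --allow-mcp-tool when enabling MCP tools.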
Mutation selection compares baseline and candidate validation results. It writes mutation candidates, a decision report, and a solidified genome.
python -m ontocellia mutate \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--baseline-validation examples/framework/validation_failed.json \
--candidate-validation examples/framework/validation_passed.json \
--output artifacts/mutation_selection
The input genome is never overwritten. If candidate validation does not improve, solidified_genome.yaml contains the original genome and the report marks the decision as not_selected.
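The selection rule can be pictured as a strict-improvement comparison. The sketch below is illustrative only: the passed/failed fields, the pass-rate metric, and the "selected" label are assumptions, since this guide does not describe the validation result schema. Only the not_selected label and the fallback to the original genome come from the text above.

```python
def decide(baseline, candidate):
    """Pick the candidate only when its pass rate strictly improves on the baseline."""
    def pass_rate(result):
        total = result["passed"] + result["failed"]
        return result["passed"] / total if total else 0.0

    improved = pass_rate(candidate) > pass_rate(baseline)
    return {
        "decision": "selected" if improved else "not_selected",
        "genome": "candidate" if improved else "original",
    }


baseline = {"passed": 7, "failed": 3}
print(decide(baseline, {"passed": 10, "failed": 0}))
print(decide(baseline, {"passed": 7, "failed": 3}))
```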
The deterministic repo repair demo writes induced specs, tissue trace, mock LLM intents, validation evidence, mutation outputs, and a final report.
python -m ontocellia demo \
--task "Fix failing tests while preserving behavior" \
--steps 4 \
--output artifacts/complete_repo_repair_demo
The built-in MiniBench suite measures Ontocellia-native agent tissue capabilities with mock LLM effectors.
python -m ontocellia benchmark \
--suite ontocellia_minibench_v1 \
--effector mock-llm \
--output artifacts/benchmarks/minibench
Outputs:
- benchmark_summary.json
- benchmark_results.csv
- benchmark_report.md
- per-task tissue traces, summaries, and action intents
The TUI also supports /benchmark, which runs the mock MiniBench and prints a score summary.
Structure search compares deterministic tissue variants induced from the same task. It is used to ask which organization fits the current environment better, not which provider is strongest.
python -m ontocellia structure-search \
--task "Fix failing tests while preserving behavior." \
--domain repo_repair \
--effector mock-llm \
--steps 6 \
--with-attribution \
--seed 7 \
--output artifacts/structure_search
Outputs:
- structure_search_summary.json
- structure_trials.csv
- structure_search_report.md
- selected_variant.json
- per-variant tissue summaries, traces, and action intents under variants/
When --with-attribution is enabled, each variant also writes an attribution/ directory and structure_search_summary.json includes selected_variant_explanation.
Attribution can also be run after the fact from saved artifacts:
python -m ontocellia attribute \
--trace artifacts/repo_repair_tissue/tissue_trace.json \
--summary artifacts/repo_repair_tissue/tissue_summary.json \
--output artifacts/repo_repair_tissue/attribution
Add --actions, --tool-results, --execution-results, and --validation-results when those files exist.
Outputs:
- contribution_graph.json
- contribution_summary.json
- cell_contributions.csv
- gene_contributions.csv
- matrix_contributions.csv
- contribution_report.md
For new tissue runs, add --with-attribution to write the same report under the tissue output directory and add an attribution block to tissue_summary.json.
After structure search finds a consistently useful variant, solidification can turn that structure into reusable developmental bias:
python -m ontocellia solidify \
--structure-search artifacts/structure_search/structure_search_summary.json \
--genome-spec examples/framework/repo_repair_genome.yaml \
--min-score 0.65 \
--min-margin 0.03 \
--output artifacts/solidification
Outputs:
- solidified_tendencies.json
- solidification_report.md
- solidified_genome.yaml
The input genome is not modified. When selected, the output genome records the tendency in metadata and adds enhancer-style regulatory elements for matching genes.
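One plausible reading of --min-score and --min-margin is a two-part gate: the best variant must score high enough, and it must beat the runner-up by a clear margin. The sketch below illustrates that reading. It is an assumption about the flags' semantics, not a description of the actual implementation, and the variant names are hypothetical.

```python
def should_solidify(scores, min_score=0.65, min_margin=0.03):
    """Return the winning variant name, or None if either threshold is missed."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score >= min_score and best_score - runner_up >= min_margin:
        return best_name
    return None


print(should_solidify({"layered": 0.72, "flat": 0.60}))  # clear winner
print(should_solidify({"layered": 0.72, "flat": 0.71}))  # margin too small
print(should_solidify({"layered": 0.55, "flat": 0.40}))  # score too low
```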
Longitudinal replay compares Ontocellia's adaptive tissue against simpler baselines across a related task family. It is the deterministic first step toward proving whether structure search and solidification make later sessions more efficient or robust.
python -m ontocellia longitudinal-replay \
--task "Fix failing tests while preserving behavior." \
--task "Fix a regression without broad rewrites." \
--domain repo_repair \
--effector mock-llm \
--steps 6 \
--output artifacts/longitudinal_replay
When no --task values are provided, Ontocellia uses the built-in repo-repair replay family. The runner compares:
- direct_agent
- single_cell
- fixed_tissue
- adaptive_tissue
Outputs:
- longitudinal_replay_summary.json
- longitudinal_trials.csv
- longitudinal_replay_report.md
- solidification/solidification_report.md
- per-task artifacts under tasks/<task-id>/
Official benchmark runs use upstream task shapes and report Ontocellia tissue metrics separately from external scorer status. The default mode for non-BFCL benchmarks is adaptive-tissue.
python -m ontocellia official-benchmark run \
--benchmark tau-bench \
--model-profile deepseek \
--limit 1 \
--mode adaptive-tissue \
--tau-domain airline \
--structure-search \
--with-attribution \
--output artifacts/official_benchmarks/tau_bench/deepseek_smoke
Outputs include:
- official_tasks.jsonl
- official_task_manifest.json
- scoring_status.json
- ontocellia_predictions.jsonl
- official_results.json
- structure_report.json
- adaptation_report.md
- ontocellia_summary.json
- per-task tissue traces under tissue_traces/
Use --task-id for one specific official task, or --full only when you intend to run the full selected benchmark. Use --run-official-scorer when external scorer execution is intentional. Scorer commands must emit machine-readable scores before Ontocellia records official pass/fail. See official-benchmarks.md for Terminal-Bench custom agent, tau-bench bridge, SWE-bench scorer, and custom scorer details.
BFCL is kept as a provider/tool-call baseline:
python -m ontocellia official-benchmark run \
--benchmark bfcl \
--model-profile deepseek \
--limit 50 \
--mode provider-baseline \
--output artifacts/official_benchmarks/bfcl/provider_baseline
By default, tissue emits intents and communication artifacts without performing local effects. Add --execute-actions to route intents through the extracellular tool policy. Dry-run is enabled by default.
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--steps 4 \
--effector mock-llm \
--execute-actions \
--execution-dry-run \
--allow-interface workspace.search \
--allow-interface git.diff \
--output artifacts/execution_dry_run
This writes execution_results.json and deposits execution evidence into the matrix. To allow real local execution, keep the allowlist exact:
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--effector mock-llm \
--execute-actions \
--no-execution-dry-run \
--allow-interface pytest.run \
--allow-command "python -m pytest -q" \
--allow-write "src/**/*.py" \
--output artifacts/execution_allowed
The execution layer only performs work allowed by the active policy. It does not commit, push, install dependencies, or download benchmark data as part of action execution. It writes:
- tool_invocations.json
- tool_results.json
- execution_results.json for compatibility
- matrix records containing execution evidence
Long tool and validation output is handled by output metabolism. Full raw text goes under raw_outputs/; result JSON and matrix records keep bounded digests plus artifact references. See communication.md for context and output metadata details.
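The idea of keeping full text on disk and passing only a bounded digest forward can be sketched as follows. This illustrates the pattern and does not reproduce Ontocellia's record format. The function name metabolize_output, the record field names, and the head-plus-tail truncation are all assumptions; only the raw_outputs/ directory name comes from the text above.

```python
import hashlib
import tempfile
from pathlib import Path


def metabolize_output(text, output_dir, name, max_chars=200):
    """Write the full text under raw_outputs/ and return a short digest record."""
    raw_dir = Path(output_dir) / "raw_outputs"
    raw_dir.mkdir(parents=True, exist_ok=True)
    raw_path = raw_dir / f"{name}.txt"
    raw_path.write_text(text, encoding="utf-8")

    if len(text) <= max_chars:
        digest = text
    else:
        # Keep the head and tail, where commands and final errors usually appear.
        half = max_chars // 2
        digest = text[:half] + "\n...[truncated]...\n" + text[-half:]
    return {
        "digest": digest,
        "total_chars": len(text),
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "artifact": str(raw_path.relative_to(output_dir)),
    }


with tempfile.TemporaryDirectory() as tmp:
    record = metabolize_output("FAILED test_x\n" * 500, tmp, "pytest_run")
    print(record["total_chars"], record["artifact"], len(record["digest"]))
```

The checksum lets a later reader confirm that the referenced raw file is the one the digest was derived from.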
Additional adapter gates are explicit:
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--effector mock-llm \
--execute-actions \
--allow-interface mcp:repo:tool:read_file \
--allow-mcp-tool mcp:repo:tool:read_file \
--allow-interface http.request \
--allow-network-host api.example.com \
--enable-http-tools \
--output artifacts/tool_runtime
MCP, HTTP, and browser adapters are disabled until their specific policy flags are present. Browser support is currently an adapter boundary for future richer automation.
Use communication.context_budget_chars and communication.context_metabolism in an environment spec to tune approximate context size and matrix remodeling:
communication:
  matrix_query_limit: 5
  context_budget_chars: 1600
  context_metabolism:
    enabled: true
    window_ticks: 3
    max_metabolites_per_tick: 4
    max_metabolite_chars: 700
    min_source_records: 2
    source_salience_decay: 0.15
Cells receive bounded matrix context instead of the full tissue history. Inspect tissue_trace.json for llm_effector, context_metabolite_deposited, and context_metabolism events. See communication.md for record lifecycle fields, context packets, and output digest metadata.
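To make the interaction between matrix_query_limit and context_budget_chars concrete, here is a minimal sketch of budgeted selection. The greedy, salience-first strategy and the record fields (id, salience, text) are assumptions for illustration; the actual selection logic is documented in communication.md.

```python
def select_context(records, query_limit=5, budget_chars=1600):
    """Pick the most salient records that fit within the limit and character budget."""
    chosen, used = [], 0
    for record in sorted(records, key=lambda r: r["salience"], reverse=True):
        if len(chosen) >= query_limit:
            break
        size = len(record["text"])
        if used + size > budget_chars:
            continue  # skip records that would overflow; smaller ones may still fit
        chosen.append(record)
        used += size
    return chosen, used


records = [
    {"id": "a", "salience": 0.9, "text": "x" * 900},
    {"id": "b", "salience": 0.8, "text": "y" * 900},
    {"id": "c", "salience": 0.5, "text": "z" * 600},
]
chosen, used = select_context(records)
print([r["id"] for r in chosen], used)  # ['a', 'c'] 1500
```

Record b is more salient than c but does not fit after a, which shows why a tight budget can change which evidence a cell sees.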
Resource competition is enabled by default with weak maintenance cost. Configure stronger pressure in an environment spec when you want structure-search or tissue runs to penalize oversized, expensive, or low-energy organizations:
resources:
  population_cap: 6
  maintenance_cost: 0.01
  differentiated_cost: 0.02
  action_intent_cost: 0.015
  contribution_reward: 0.08
  negative_contribution_penalty: 0.08
  over_cap_pressure_weight: 0.25
  allow_quiescence: false
  allow_apoptosis: false
Tissue summaries include resource_competition, and traces include resource_competition events. Structure-search and official benchmark metrics include resource_efficiency, average_cell_energy, and population_pressure.
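To build intuition for how these weights trade off, the sketch below applies one tick of costs and rewards to a single cell's energy. The formula is an assumption made for illustration: this guide lists the parameters but not how the runtime combines them. The function name tick_energy and its arguments are hypothetical.

```python
RESOURCES = {
    "population_cap": 6,
    "maintenance_cost": 0.01,
    "differentiated_cost": 0.02,
    "action_intent_cost": 0.015,
    "contribution_reward": 0.08,
    "negative_contribution_penalty": 0.08,
    "over_cap_pressure_weight": 0.25,
}


def tick_energy(energy, differentiated, intents, contribution, population, cfg=RESOURCES):
    """Apply one tick of costs and rewards to a cell's energy (illustrative formula)."""
    cost = cfg["maintenance_cost"] + cfg["action_intent_cost"] * intents
    if differentiated:
        cost += cfg["differentiated_cost"]
    # Pressure grows with the fraction by which the population exceeds the cap.
    excess = max(0, population - cfg["population_cap"]) / cfg["population_cap"]
    cost += cfg["over_cap_pressure_weight"] * excess
    if contribution >= 0:
        gain = cfg["contribution_reward"] * contribution
    else:
        gain = cfg["negative_contribution_penalty"] * contribution  # negative value
    return energy - cost + gain


print(round(tick_energy(1.0, True, 2, 1.0, population=6), 3))
print(round(tick_energy(1.0, True, 2, 1.0, population=9), 3))
```

Under this reading, the same cell gains energy at the cap and loses it when the population is 50% over the cap.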
Developmental annealing controls plasticity over time. Early high temperature favors exploration; later commitment raises fate locks. Repeated validation failure can unlock and reprogram a bounded number of differentiated cells:
annealing:
  warmup_ticks: 3
  stabilization_ticks: 8
  initial_temperature: 1.0
  final_temperature: 0.15
  fate_lock_growth: 0.04
  failure_unlock: 0.12
  repeated_failure_threshold: 2
  reprogramming_pressure_threshold: 0.85
  max_reprogramming_per_tick: 1
Tissue summaries include annealing, and traces include developmental_annealing plus annealing_reprogramming when local reprogramming occurs. Structure-search and official benchmark metrics include annealing_temperature, average_fate_lock, and reprogramming_events.
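A temperature schedule consistent with these parameters could hold the initial temperature during warmup and then decay to the final value over the stabilization window. The sketch below assumes a linear decay, which this guide does not confirm; the actual curve may differ. The function name temperature is hypothetical.

```python
def temperature(tick, warmup_ticks=3, stabilization_ticks=8,
                initial_temperature=1.0, final_temperature=0.15):
    """Hold the initial temperature during warmup, then decay linearly to the final value."""
    if tick < warmup_ticks:
        return initial_temperature
    progress = min(1.0, (tick - warmup_ticks) / stabilization_ticks)
    return initial_temperature + (final_temperature - initial_temperature) * progress


for tick in (0, 3, 7, 11, 20):
    print(tick, round(temperature(tick), 3))
```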
Mock LLM mode is deterministic:
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--effector mock-llm \
--output artifacts/mock_llm_tissue
Real providers are optional. API keys are read from environment variables and are not written into trace artifacts.
| Provider | Environment variable | Default base URL |
|---|---|---|
| deepseek | DEEPSEEK_API_KEY | https://api.deepseek.com |
| kimi | MOONSHOT_API_KEY or KIMI_API_KEY | https://api.moonshot.ai/v1 |
| minimax | MINIMAX_API_KEY | https://api.minimax.io/v1 |
| openai | OPENAI_API_KEY | https://api.openai.com/v1 |
| openrouter | OPENROUTER_API_KEY | https://openrouter.ai/api/v1 |
| ollama | OLLAMA_API_KEY | http://localhost:11434/v1 |
| custom-openai-compatible | ONTOCELLIA_CUSTOM_API_KEY | configured in setup |
DeepSeek setup includes default profile choices such as deepseek-v4-flash and deepseek-v4-pro.
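A quick preflight check can confirm that the expected variable is set before starting a real-provider run. The sketch below uses the variable names from the provider table and reports only the variable name, never the key itself. The function name find_key_variable is hypothetical, and checking MOONSHOT_API_KEY before KIMI_API_KEY is an assumed order.

```python
import os

# Environment variable names per provider, taken from the provider table.
PROVIDER_KEYS = {
    "deepseek": ["DEEPSEEK_API_KEY"],
    "kimi": ["MOONSHOT_API_KEY", "KIMI_API_KEY"],
    "minimax": ["MINIMAX_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "ollama": ["OLLAMA_API_KEY"],
    "custom-openai-compatible": ["ONTOCELLIA_CUSTOM_API_KEY"],
}


def find_key_variable(provider, environ=None):
    """Return the name of the first set variable for `provider`, or None."""
    environ = os.environ if environ is None else environ
    for name in PROVIDER_KEYS[provider]:
        if environ.get(name):
            return name
    return None


fake_env = {"KIMI_API_KEY": "placeholder"}
print(find_key_variable("kimi", fake_env))      # KIMI_API_KEY
print(find_key_variable("deepseek", fake_env))  # None
```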
After configuring a default model profile, use the simplified LLM effector:
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--effector llm \
--output artifacts/configured_llm_tissue
MiniMax token-plan keys may require a regional host:
python -m ontocellia tissue \
--genome-spec examples/framework/repo_repair_genome.yaml \
--environment-spec examples/framework/failing_tests_environment.yaml \
--effector minimax \
--llm-base-url https://api.minimax.chat/v1 \
--output artifacts/minimax_tissue
Live provider tests are opt-in:
set -a
source .env.local
set +a
ONTOCELLIA_LIVE_LLM=1 conda run -n ontocellia python -m pytest -q tests/test_llm_live_e2e.py
python -m ontocellia experiment \
--experiment-spec examples/experiments/contact_ablation.yaml \
--output artifacts/contact_ablation
Experiments write per-variant run directories plus comparison artifacts.
python -m ontocellia validate \
--genome-spec examples/specs/minimal_genome.yaml \
--environment-spec examples/specs/minimal_environment.yaml
python -m ontocellia schema-docs --output docs/schema
python -m ontocellia run --steps 20 --output artifacts/demo
python -m ontocellia --steps 20 --output artifacts/legacy_demo