Status: current API tutorial
Last updated: 2026-04-29
Language: English | 繁體中文
This document only describes pipeline mode behavior that is currently implemented and supported. It does not repeat older RFC or design-draft material. The focus is what you can execute now, save now, schedule now, and inspect through history now.
Pipeline mode currently has 3 practical entry points:
- Pass YAML or JSON directly into unified_search(..., pipeline="...")
- Save the config first with manage_pipeline or save_pipeline, then run it with saved:<name>
- Save it first, then schedule recurring runs with schedule_pipeline or manage_pipeline(action="schedule")
unified_search(
    query="",
    pipeline="""
template: pico
params:
  P: ICU patients requiring mechanical ventilation
  I: remimazolam
  C: propofol
  O: sedation adequacy
output:
  format: markdown
  limit: 20
  ranking: balanced
""",
)

Notes:
- unified_search still requires query in the function signature, but once pipeline is set, ordinary search parameters are generally ignored. The safe pattern is query="".
- output_format="json" forces structured JSON.
- output.format: json inside the pipeline also returns structured JSON.
- dry_run=True previews the resolved DAG without external searches.
- stop_at="<step_id>" executes only through that step.
manage_pipeline(
    action="save",
    name="icu_remi_vs_propofol",
    config="""
template: pico
params:
  P: ICU patients requiring mechanical ventilation
  I: remimazolam
  C: propofol
  O: delirium, sedation quality
output:
  limit: 25
  ranking: quality
""",
    tags="icu,sedation,remimazolam",
    description="ICU sedation comparison",
)

unified_search(query="", pipeline="saved:icu_remi_vs_propofol")

Pipeline reports now include filter diagnostics and next-step handoffs. At the bottom of a Markdown report, the recommended continuation tools are:
- get_session_pmids() for the run PMID set
- prepare_export(pmids="last", format="ris") for Zotero/EndNote/Mendeley-style citation handoff
- save_literature_notes(pmids="last", note_format="wiki") for local wiki/Foam-compatible Markdown notes
Use structured JSON when another agent or extension should consume articles directly:
output:
  format: json
  limit: 20
  ranking: quality

The JSON response contains summary, steps, per-step metadata, and structured articles.
| Situation | Recommended entry point |
|---|---|
| You only want to run it once quickly | inline unified_search(..., pipeline="...") |
| You want to reuse the same search strategy | manage_pipeline(action="save") |
| You want history diffs or scheduling | save first, then use saved:<name> or schedule_pipeline |
| You want to load a local YAML file and adjust it | load_pipeline(source="file:path/to/pipeline.yaml") |
Template pipelines cover roughly 80% of normal use cases. The YAML is not sent directly into the executor as-is. It is first expanded into a real step DAG.
| Template | Required parameters | Common optional parameters | Purpose |
|---|---|---|---|
| pico | P, I | C, O, sources, limit | Clinical comparison question |
| comprehensive | query | sources, limit, min_year, max_year | Broad multi-source search |
| exploration | pmid | limit | Explore outward from one seed paper |
| gene_drug | term | sources, limit, min_year, max_year | Gene- or drug-focused search |
- The shortest inline YAML usually uses params
- save_pipeline, manage_pipeline(save), and load_pipeline also accept template_params
- When a saved pipeline is loaded, the system will often output it as template_params

Recommendation:
- Use params for handwritten inline pipelines
- Use template_params for YAML that you want to save long term or review more formally
template: pico
params:
  P: ICU patients requiring sedation
  I: remimazolam
  C: propofol
  O: delirium incidence, time to extubation
  sources: pubmed,europe_pmc
  limit: 30
output:
  ranking: quality

This expands automatically into:

pico -> search_p
     -> search_i
     -> search_c   # Only appears when C is provided
     -> merged -> enriched
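As a rough illustration of that expansion, the sketch below builds a step list for a pico template. The function name expand_pico_template and the params each step receives are assumptions made for this example, not the project's actual expander. Only the step ids, and the rule that search_c appears only when C is provided, come from the diagram above.

```python
def expand_pico_template(params: dict) -> list[dict]:
    """Hypothetical expansion of a pico template into DAG steps."""
    # P and I are required by the pico template; C and O are optional.
    for key in ("P", "I"):
        if not params.get(key):
            raise ValueError(f"pico template requires parameter {key}")

    steps = [{
        "id": "pico",
        "action": "pico",
        "params": {k: params[k] for k in ("P", "I", "C", "O") if params.get(k)},
    }]

    # One search per element; search_c only exists when C was provided.
    search_ids = []
    for element in ("P", "I", "C"):
        if params.get(element):
            step_id = f"search_{element.lower()}"
            steps.append({
                "id": step_id,
                "action": "search",
                "inputs": ["pico"],
                "params": {"element": element},
            })
            search_ids.append(step_id)

    # Merge all searches, then enrich the merged set with metrics.
    steps.append({"id": "merged", "action": "merge", "inputs": search_ids})
    steps.append({"id": "enriched", "action": "metrics", "inputs": ["merged"]})
    return steps


steps = expand_pico_template({"P": "ICU patients requiring sedation", "I": "remimazolam"})
print([step["id"] for step in steps])
```

Without C, the printed ids skip search_c; adding a C value inserts it between search_i and merged.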
template: comprehensive
template_params:
  query: CRISPR gene therapy clinical trials
  sources: pubmed,openalex,europe_pmc
  limit: 30
  min_year: 2020
output:
  ranking: quality

The correct field today is query, not topic.
This runs expand first, then launches the original query and expanded query in parallel, and finally performs merge + metrics.
template: exploration
params:
  pmid: "37076210"
  limit: 25
output:
  ranking: impact

This pulls related, citing, and references from the same seed paper.
template: gene_drug
template_params:
  term: BRCA1 targeted therapy PARP inhibitors
  sources: pubmed,openalex
  limit: 20
  min_year: 2020
output:
  ranking: recency

The correct field today is term, not topic.
You can directly inspect these examples:
- data/pipeline_examples/pico_remimazolam_vs_propofol.yaml
- data/pipeline_examples/comprehensive_crispr_therapy.yaml
- data/pipeline_examples/exploration_seed_paper.yaml
- data/pipeline_examples/gene_drug_brca1.yaml
When templates are not enough, define steps directly.
name: ai_anesthesiology_scan
steps:
  - id: expand
    action: expand
    params:
      topic: artificial intelligence anesthesiology
  - id: search_original
    action: search
    params:
      query: artificial intelligence anesthesiology
      sources: pubmed,openalex
      limit: 60
      min_year: 2020
  - id: search_mesh
    action: search
    inputs: [expand]
    params:
      strategy: mesh
      sources: pubmed,europe_pmc
      limit: 60
      min_year: 2020
  - id: merged
    action: merge
    inputs: [search_original, search_mesh]
    params:
      method: rrf
  - id: enriched
    action: metrics
    inputs: [merged]
  - id: filtered
    action: filter
    inputs: [enriched]
    params:
      min_year: 2021
      has_abstract: true
output:
  format: markdown
  limit: 30
  ranking: quality

| Field | Required? | Meaning |
|---|---|---|
| id | Recommended | It will be auto-fixed if missing, but you should name it yourself |
| action | Required | Only a fixed action set is currently accepted |
| params | Depends on action | Each action expects different parameters |
| inputs | Depends on action | Can only reference steps defined earlier |
| on_error | Optional | skip or abort, default is skip |
| Action | Common params | Meaning |
|---|---|---|
| search | query, sources, limit, min_year, max_year | General literature search |
| pico | P, I, C, O | Build PICO elements and a combined query |
| expand | topic | Perform semantic expansion and MeSH strategy generation |
| details | pmids | Fetch detailed article metadata |
| related | pmid, limit | Find related articles |
| citing | pmid, limit | Find citing articles |
| references | pmid, limit | Find references |
| metrics | none | Add iCite metrics |
| merge | method=union / intersection / rrf | Merge multiple result streams |
| filter | min_year, max_year, article_types, min_citations, has_abstract | Post-processing filters with diagnostics |
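The rrf merge method presumably refers to reciprocal rank fusion, a common way to combine ranked result lists. The sketch below shows the general technique on lists of PMIDs. The function name rrf_merge, the constant k=60, and the tie-breaking rule are assumptions for illustration, and the tool's own merge step may differ.

```python
from collections import defaultdict


def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, pmid in enumerate(ranking, start=1):
            scores[pmid] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by PMID for determinism.
    return sorted(scores, key=lambda pmid: (-scores[pmid], pmid))


# Made-up PMIDs standing in for two upstream search steps.
search_original = ["111", "222", "333"]
search_mesh = ["222", "444", "111"]
print(rrf_merge([search_original, search_mesh]))
```

Articles that rank well in both lists rise to the top, which is why this style of merge suits combining an original query with an expanded one.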
Use globals for step parameter defaults and variables for ${name} placeholders. Step-level params override globals.
name: reusable_remi_pipeline
globals:
  sources: pubmed,europe_pmc
  limit: ${per_step_limit}
  min_year: ${start_year}
variables:
  topic: remimazolam ICU sedation
  per_step_limit: 50
  start_year: 2020
steps:
  - id: search_topic
    action: search
    params:
      query: ${topic}
  - id: filtered
    action: filter
    inputs: [search_topic]
    params:
      article_types: [RCT, systematic review]
      has_abstract: true
output:
  limit: 20
  ranking: quality

article_types accepts canonical values such as randomized-controlled-trial and common aliases such as RCT, randomized controlled trial, systematic review, and meta analysis. Unknown article type requests fail closed with a warning instead of silently disabling the filter. The filter report shows before/after counts, exclusion reasons, mappings, and examples of excluded articles.
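Returning to globals and variables, here is a minimal sketch of how they could be resolved, assuming step params are layered over globals and ${name} placeholders are then substituted. The helper names substitute and resolve_step_params are hypothetical. Applying globals to every step is a simplification, and the real resolver may handle types or unknown variables differently.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def substitute(value, variables: dict):
    """Replace ${name} placeholders in strings; recurse into dicts and lists."""
    if isinstance(value, str):
        whole = PLACEHOLDER.fullmatch(value)
        if whole:
            # A value that is only a placeholder keeps the variable's type (e.g. int).
            return variables[whole.group(1)]
        return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), value)
    if isinstance(value, dict):
        return {key: substitute(item, variables) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, variables) for item in value]
    return value


def resolve_step_params(step: dict, globals_: dict, variables: dict) -> dict:
    """Layer step-level params over globals, then fill in placeholders."""
    merged = {**globals_, **step.get("params", {})}
    return substitute(merged, variables)


variables = {"topic": "remimazolam ICU sedation", "per_step_limit": 50, "start_year": 2020}
globals_ = {"sources": "pubmed,europe_pmc", "limit": "${per_step_limit}", "min_year": "${start_year}"}
step = {"id": "search_topic", "action": "search", "params": {"query": "${topic}"}}
print(resolve_step_params(step, globals_, variables))
```

In this sketch an undefined variable raises KeyError, which is one possible policy rather than documented behavior.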
search does not always need its own query. It can derive the query from an upstream step:
- When upstream is pico, you can use element: P|I|C|O
- When upstream is pico, you can also use use_combined: precision|recall|intervention_outcome|comparison_outcome
- When upstream is expand, you can use strategy: mesh or another strategy name
Use dry_run=True before long pipelines or while editing variables:
unified_search(query="", pipeline="<yaml>", dry_run=True)

Use stop_at to inspect an intermediate result set:

unified_search(query="", pipeline="<yaml>", stop_at="merged")

stop_at is inclusive: the named step runs, and downstream steps are skipped. This is useful when you want to inspect a PICO merge before adding filters or metrics.
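The inclusive behavior of stop_at can be pictured as slicing the ordered step list. The helper steps_to_run below is hypothetical and assumes steps execute in declaration order; the real executor may schedule independent steps differently.

```python
from typing import Optional


def steps_to_run(step_ids: list[str], stop_at: Optional[str] = None) -> list[str]:
    """Return the step ids that execute, up to and including stop_at."""
    if stop_at is None:
        return list(step_ids)
    if stop_at not in step_ids:
        raise ValueError(f"unknown step id: {stop_at}")
    return step_ids[: step_ids.index(stop_at) + 1]


dag = ["expand", "search_original", "search_mesh", "merged", "enriched", "filtered"]
print(steps_to_run(dag, stop_at="merged"))
```

With stop_at="merged", the merged step itself still runs, while enriched and filtered are skipped.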
For a longer multi-step example, see:
data/pipeline_examples/ai_in_anesthesiology.yaml
manage_pipeline is the recommended facade today. Legacy tools still exist, but new tutorials should prefer the facade.
manage_pipeline()
manage_pipeline(action="list")
manage_pipeline(action="list", tag="sedation")
manage_pipeline(action="list", scope="workspace")

manage_pipeline(
    action="save",
    name="weekly_remimazolam",
    config="""
template: comprehensive
template_params:
  query: remimazolam ICU sedation
  sources: pubmed,openalex,europe_pmc
  limit: 30
""",
    tags="sedation,icu",
    description="Weekly remimazolam surveillance",
    scope="workspace",
)

config must parse to a YAML/JSON mapping, not a list or scalar. If a client has trouble quoting multi-line YAML through manage_pipeline(action="save"), call save_pipeline(name=..., config=...) with the same YAML string; both tools use the same validator.
scope behavior:
- workspace: saved under .pubmed-search/pipelines/ inside the project
- global: saved under the user data directory ~/.pubmed-search-mcp/pipelines/
- auto: save to workspace when available, otherwise global
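The scope rules above can be sketched as a small path resolver. The function resolve_pipeline_dir and its workspace_root argument are hypothetical; in particular, treating "workspace available" as "a workspace root was passed in", and raising when workspace scope has no root, are assumptions.

```python
from pathlib import Path
from typing import Optional


def resolve_pipeline_dir(scope: str, workspace_root: Optional[Path] = None) -> Path:
    """Pick the save directory for a scope, following the rules listed above."""
    workspace_dir = None
    if workspace_root is not None:
        workspace_dir = workspace_root / ".pubmed-search" / "pipelines"
    global_dir = Path.home() / ".pubmed-search-mcp" / "pipelines"

    if scope == "workspace":
        if workspace_dir is None:
            raise ValueError("workspace scope requested but no workspace is available")
        return workspace_dir
    if scope == "global":
        return global_dir
    if scope == "auto":
        # Prefer the workspace when one is available, otherwise fall back to global.
        return workspace_dir if workspace_dir is not None else global_dir
    raise ValueError(f"unknown scope: {scope}")


print(resolve_pipeline_dir("auto", Path("/repo")))
print(resolve_pipeline_dir("auto"))
```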
manage_pipeline(action="load", source="weekly_remimazolam")
manage_pipeline(action="load", source="saved:weekly_remimazolam")
manage_pipeline(action="load", source="file:data/pipeline_examples/pico_remimazolam_vs_propofol.yaml")

load_pipeline and manage_pipeline(load) currently support:
- saved names
- saved:<name>
- file:path/to/pipeline.yaml
Direct URL loading is not currently part of the supported contract.
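One way to picture the supported source forms is a small classifier. parse_pipeline_source is a hypothetical helper, and rejecting http(s) URLs explicitly is an assumption about how "not supported" might surface.

```python
def parse_pipeline_source(source: str) -> tuple[str, str]:
    """Classify a load source as ("saved", name) or ("file", path)."""
    if source.startswith(("http://", "https://")):
        raise ValueError("direct URL loading is not supported")
    if source.startswith("file:"):
        return ("file", source[len("file:"):])
    if source.startswith("saved:"):
        return ("saved", source[len("saved:"):])
    # A bare value is treated as a saved pipeline name.
    return ("saved", source)


print(parse_pipeline_source("weekly_remimazolam"))
print(parse_pipeline_source("saved:weekly_remimazolam"))
print(parse_pipeline_source("file:data/pipeline_examples/pico_remimazolam_vs_propofol.yaml"))
```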
manage_pipeline(action="delete", name="weekly_remimazolam")

manage_pipeline(action="history", name="weekly_remimazolam", limit=10)

manage_pipeline(
    action="schedule",
    name="weekly_remimazolam",
    cron="0 9 * * 1",
    diff_mode=True,
    notify=True,
)

| Facade | Legacy tool |
|---|---|
| manage_pipeline(action="save", ...) | save_pipeline(...) |
| manage_pipeline(action="list", ...) | list_pipelines(...) |
| manage_pipeline(action="load", ...) | load_pipeline(...) |
| manage_pipeline(action="delete", ...) | delete_pipeline(...) |
| manage_pipeline(action="history", ...) | get_pipeline_history(...) |
| manage_pipeline(action="schedule", ...) | schedule_pipeline(...) |
- Save the pipeline first
- Use unified_search(query="", pipeline="saved:<name>") for manual execution
- Use schedule_pipeline(...) or manage_pipeline(action="schedule", ...) for recurring runs
- Use get_pipeline_history(name="...") or the facade's history action to inspect run history
schedule_pipeline(name="weekly_remimazolam", cron="0 9 * * 1")
schedule_pipeline(name="monthly_crispr_review", cron="0 8 1 * *")
schedule_pipeline(name="watch_icu_sedation", cron="0 */6 * * *")

Cron format is the standard 5-field form:
minute hour day month weekday
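To make the field order concrete, the sketch below splits an expression into named fields. split_cron is a hypothetical helper that only checks the field count; it does not validate ranges or reflect how the scheduler itself parses cron strings.

```python
CRON_FIELDS = ("minute", "hour", "day", "month", "weekday")


def split_cron(expression: str) -> dict[str, str]:
    """Split a 5-field cron expression into named fields (no range validation)."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


print(split_cron("0 9 * * 1"))    # every Monday at 09:00
print(split_cron("0 8 1 * *"))    # 08:00 on the first day of each month
print(split_cron("0 */6 * * *"))  # every 6 hours, on the hour
```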
To remove a schedule:
schedule_pipeline(name="weekly_remimazolam", cron="")

get_pipeline_history(name="weekly_remimazolam", limit=5)
manage_pipeline(action="history", name="weekly_remimazolam", limit=5)

History shows:
- execution time
- total article count
- how many articles were added compared with the previous run
- how many were removed
- success or failure status
- There is no standalone list_schedules() MCP tool yet
- If you want stable history and diffs, prefer saved pipelines over one-off inline pipelines
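The added and removed counts that history reports can be thought of as set differences between consecutive runs. The sketch below assumes each run is identified by its PMID list; diff_runs and the returned keys are hypothetical, not the stored history format.

```python
def diff_runs(previous_pmids: list[str], current_pmids: list[str]) -> dict:
    """Compare two runs' PMID sets the way a history diff might."""
    previous, current = set(previous_pmids), set(current_pmids)
    return {
        "total": len(current),
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Made-up PMIDs for two consecutive runs of the same saved pipeline.
last_week = ["101", "102", "103"]
this_week = ["102", "103", "104", "105"]
print(diff_runs(last_week, this_week))
```

A diff like this is only meaningful when both runs come from the same saved configuration, which is why saved pipelines are preferred over inline ones for history.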
Auto-fix currently happens mainly during schema parsing and semantic validation. In practice, the system first repairs data shape and then repairs meaning when possible.
| Problem | Input | Auto-fixed result |
|---|---|---|
| action alias | find | search |
| action typo | searc | search |
| template alias | clinical | pico |
| template typo | comprehensiv | comprehensive |
| single-string inputs | inputs: s1 | inputs: [s1] |
| non-dict params | params: "oops" | params: {} |
| missing step id | id: "" | auto-filled as step_1 and similar |
| duplicate step id | search, search | second one becomes search_2 |
| reference to missing step | inputs: [missing] | that reference is removed |
| reference to future step | inputs: [later_step] | that reference is removed |
| invalid on_error | retry | skip |
| invalid output format | xml | markdown |
| mistyped output ranking | impac | impact |
| invalid output limit | 0 or negative | 20 |
output.format: json is valid and is no longer auto-fixed to Markdown.
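A few of the step-level repairs in the table can be sketched with the standard library. This is an illustration under stated assumptions, not the project's validator: the alias map contains only the one alias shown in the table, the fuzzy-match cutoff of 0.8 is a guess, and fix_action and fix_steps are hypothetical names.

```python
import difflib

KNOWN_ACTIONS = ["search", "pico", "expand", "details", "related",
                 "citing", "references", "metrics", "merge", "filter"]
ACTION_ALIASES = {"find": "search"}  # assumption: only the alias from the table


def fix_action(name: str):
    """Resolve an action via alias, then fuzzy match; None means unfixable."""
    name = name.strip().lower()
    if name in KNOWN_ACTIONS:
        return name
    if name in ACTION_ALIASES:
        return ACTION_ALIASES[name]
    close = difflib.get_close_matches(name, KNOWN_ACTIONS, n=1, cutoff=0.8)
    return close[0] if close else None


def fix_steps(steps: list[dict]) -> list[dict]:
    """Repair shape first (ids, inputs, params), then drop invalid references."""
    fixed, seen = [], set()
    for index, step in enumerate(steps, start=1):
        step_id = step.get("id") or f"step_{index}"
        if step_id in seen:
            # Duplicate ids get a numeric suffix, e.g. search -> search_2.
            suffix = 2
            while f"{step_id}_{suffix}" in seen:
                suffix += 1
            step_id = f"{step_id}_{suffix}"
        inputs = step.get("inputs") or []
        if isinstance(inputs, str):
            inputs = [inputs]
        params = step.get("params")
        fixed.append({
            "id": step_id,
            "action": fix_action(step.get("action", "")),
            "params": params if isinstance(params, dict) else {},
            # Only earlier steps may be referenced; missing or future ids are removed.
            "inputs": [ref for ref in inputs if ref in seen],
        })
        seen.add(step_id)
    return fixed


raw = [
    {"id": "search", "action": "find", "params": "oops"},
    {"id": "search", "action": "searc", "inputs": "search"},
    {"id": "", "action": "merge", "inputs": ["search", "missing", "later"]},
    {"id": "later", "action": "metrics", "inputs": ["step_3"]},
]
for step in fix_steps(raw):
    print(step)
```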
| Problem | Why it fails |
|---|---|
| template name is completely unrecognizable | no alias or fuzzy match applies |
| action name is completely unrecognizable | no alias or fuzzy match applies |
| template is missing required parameters | for example, pico without P or I |
| there are no steps and no template | nothing executable remains |
| more than 20 steps | exceeds the system limit |
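The hard-failure conditions listed above could be checked along these lines. The required-parameter map is taken from the template table earlier in this document and the 20-step limit from the table above; the function name hard_errors, the error strings, and the assumption that alias and typo repair has already run are illustrative only.

```python
REQUIRED_TEMPLATE_PARAMS = {
    "pico": ["P", "I"],
    "comprehensive": ["query"],
    "exploration": ["pmid"],
    "gene_drug": ["term"],
}
MAX_STEPS = 20


def hard_errors(config: dict) -> list[str]:
    """Collect problems that no auto-fix can repair (hypothetical checker)."""
    errors = []
    template = config.get("template")
    steps = config.get("steps") or []
    if not template and not steps:
        errors.append("there are no steps and no template")
    if template:
        if template not in REQUIRED_TEMPLATE_PARAMS:
            errors.append(f"unrecognized template: {template}")
        else:
            # Both the params and template_params spellings are accepted.
            params = config.get("template_params") or config.get("params") or {}
            missing = [p for p in REQUIRED_TEMPLATE_PARAMS[template] if not params.get(p)]
            if missing:
                errors.append(f"{template} is missing required parameters: {', '.join(missing)}")
    if len(steps) > MAX_STEPS:
        errors.append(f"more than {MAX_STEPS} steps")
    return errors


print(hard_errors({"template": "pico", "params": {"P": "ICU patients"}}))
```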
template: clinical
template_params:
  P: ICU patients
  I: remimazolam
output:
  format: xml
  limit: 0
  ranking: impac

The system currently auto-fixes it to the equivalent of:
template: pico
template_params:
  P: ICU patients
  I: remimazolam
output:
  format: markdown
  limit: 20
  ranking: impact

- If you want auto-fix, history, and scheduling, save first and run second.
- Use inline template pipelines only for small parameter sets. For review and versioning, save YAML files.
- Start custom DAGs from the smallest runnable graph, then add merge, metrics, and filter incrementally.
- Use scope="workspace" when the pipeline should be shared within a team or repo.
- Use scope="global" when you only want your own reusable search habits across projects.
- Keep Zotero Keeper integration outside PubMed MCP core. PubMed MCP should produce RIS/CSL/JSON/wiki notes; Zotero Keeper or another external client should handle Zotero import, duplicate policy, and library-specific behavior.