Commit 5937469

Author: Admin

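✨ New Features and Key Upgrades (Features)
- Multimodal image recognition launched. Adds two image input capabilities: Report Reading (看报告) and Photo Drug Inquiry (拍照问药). Users can tap + in the chat input bar to upload an image of an examination report, a drug box, or a drug traceability code; the system automatically recognizes the image content and enters the corresponding AI agent flow.
- Image report interpretation. Supports uploading examination/lab report images. The Volcengine multimodal model extracts the report type, test items, values, units, reference ranges, abnormal flags, and report date, then automatically enters report interpretation mode, reusing the existing report_agent and the lab_interpreter tool to generate a professional interpretation.
- Photo drug inquiry and traceability code recognition. Supports uploading drug box or traceability code images. The multimodal model recognizes the drug name, generic name, specification, manufacturer, approval number, expiry date, dosage and administration, prescription status, and similar information, then automatically enters Pharmacy Steward mode (药管家), where pharmacy_agent generates uses, contraindications, precautions, and medication advice.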
1 parent b0a6019 commit 5937469

14 files changed

Lines changed: 1257 additions & 248 deletions

AGENTS.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+
+This is a Smart Health Assistant with a Python FastAPI/LangGraph backend and React/Vite frontend. Backend lives in `backend/`: `main.py` is the API/SSE entrypoint, `agents/` contains the router and specialist agents, `skills/` contains auto-discovered tools, and `rag/` contains retrieval code plus knowledge documents in `rag/documents/`. Backend `test_*.py` files sit at its root.
+
+Frontend lives in `frontend/src/`: `screens/` are app views, `components/` contains domain UI, `store/` holds shared context, `services/` contains API clients, and `types/` defines shared contracts. Assets are in `frontend/public/` and `frontend/src/assets/`. Docs are under `docs/`, `backend/docs/`, and `frontend/docs/`.
+
+## Build, Test, and Development Commands
+
+- `cd backend && uv sync`: install Python 3.13 dependencies.
+- `cd backend && uv run python -m rag.ingest`: build the local RAG index; add `--rebuild` after document edits.
+- `cd backend && uv run uvicorn main:app --reload --port 8000`: start the API.
+- `cd backend && python -m pytest test_*.py`: run pytest-compatible backend tests.
+- `cd frontend && npm install`: install dependencies.
+- `cd frontend && npm run dev`: start Vite on `http://localhost:5173`.
+- `cd frontend && npm run build`: type-check and build assets.
+- `cd frontend && npm run lint`: run ESLint.
+- `cd frontend && node test_pw.cjs`: run the Playwright smoke check.
+
+## Coding Style & Naming Conventions
+
+Use 4-space indentation and type hints for Python. Keep agent modules named by role, such as `clinic.py` or `pharmacy.py`. Add skills as `backend/skills/<skill_name>/SKILL.md`, `skill.py`, and `__init__.py`. Use Pydantic models for schemas.
+
+Use TypeScript, React function components, and PascalCase filenames such as `ChatCardRenderer.tsx`. Keep shared types in `frontend/src/types/index.ts`, domain UI in matching component folders, and Tailwind/CSS aligned with the mobile-first design.
+
+## Testing Guidelines
+
+Backend tests may require `.env` values such as `ARK_API_KEY`. Prefer pytest-compatible `test_*.py` files for deterministic logic, and keep manual async checks executable with `python test_name.py`. For frontend changes, run `npm run lint` and `npm run build`; add Playwright checks for UI flow changes.
+
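As an illustration of the "pytest-compatible `test_*.py` for deterministic logic" guideline, here is a minimal sketch of what such a file might look like. The file name `test_vision_scan_type.py` is hypothetical, and the helper is re-declared inline (mirroring `normalize_scan_type` from `backend/agents/vision.py` in this commit) only so the snippet runs on its own; a real test would presumably import it instead.

```python
# Hypothetical file: backend/test_vision_scan_type.py
# The helper below is copied inline so this sketch is self-contained;
# a real test would import it from the vision module instead.

SCAN_TYPE_TO_AGENT = {
    "report": "report_agent",
    "drug_box": "pharmacy_agent",
    "trace_code": "pharmacy_agent",
}


class VisionInputError(ValueError):
    """Raised when an uploaded image or scan type is invalid."""


def normalize_scan_type(scan_type: str) -> str:
    normalized = scan_type.strip().lower()
    if normalized in SCAN_TYPE_TO_AGENT:
        return normalized
    raise VisionInputError("scan_type must be one of: report, drug_box, trace_code")


def test_normalize_accepts_padded_mixed_case():
    # Whitespace and letter case are normalized before the lookup.
    assert normalize_scan_type("  Drug_Box ") == "drug_box"


def test_normalize_rejects_unknown_type():
    try:
        normalize_scan_type("xray")
    except VisionInputError:
        pass
    else:
        raise AssertionError("expected VisionInputError")


if __name__ == "__main__":
    # Keeps the file runnable with `python test_name.py`, as the guideline asks.
    test_normalize_accepts_padded_mixed_case()
    test_normalize_rejects_unknown_type()
    print("ok")
```

Because the tests use plain `assert` and need no API key, they stay deterministic and can run without the `.env` values mentioned above.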
+## Commit & Pull Request Guidelines
+
+Recent history uses prefixes like `docs:`, `refactor:`, and `fix`, with occasional Chinese summaries. Keep commits focused and name the area. Pull requests should include the change, commands run, linked issues, and screenshots or recordings for visible mobile UI changes.
+
+## Security & Configuration Tips
+
+Do not commit `.env`, API keys, generated vector stores such as `backend/rag/chroma_db/`, virtual environments, or frontend build output. Keep CORS and service URLs aligned with local dev ports.

backend/agents/vision.py

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
+"""
+Vision input adapter for report interpretation and pharmacy photo flows.
+
+This module keeps image recognition separate from medical/pharmacy reasoning:
+the Volcengine multimodal model extracts visible facts, then the existing
+report/pharmacy agents interpret those facts.
+"""
+from __future__ import annotations
+
+import base64
+import os
+import re
+from typing import Literal
+
+
+VisionScanType = Literal["report", "drug_box", "trace_code"]
+
+ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/webp"}
+MAX_IMAGE_BYTES = 8 * 1024 * 1024
+
+SCAN_TYPE_TO_AGENT = {
+    "report": "report_agent",
+    "drug_box": "pharmacy_agent",
+    "trace_code": "pharmacy_agent",
+}
+
+_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
+
+
+class VisionInputError(ValueError):
+    """Raised when an uploaded image or scan type is invalid."""
+
+
+def normalize_scan_type(scan_type: str) -> VisionScanType:
+    normalized = scan_type.strip().lower()
+    if normalized in SCAN_TYPE_TO_AGENT:
+        return normalized  # type: ignore[return-value]
+    raise VisionInputError("scan_type must be one of: report, drug_box, trace_code")
+
+
+def validate_image_upload(content_type: str | None, size_bytes: int) -> None:
+    if content_type not in ALLOWED_IMAGE_TYPES:
+        raise VisionInputError("Only JPEG, PNG, and WebP images are supported")
+    if size_bytes <= 0:
+        raise VisionInputError("Uploaded image is empty")
+    if size_bytes > MAX_IMAGE_BYTES:
+        raise VisionInputError("Uploaded image must be 8 MB or smaller")
+
+
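The upload checks can be exercised with a quick boundary sketch. The constants and `validate_image_upload` are re-declared from the diff so the snippet runs standalone; `check` is a hypothetical helper introduced only for this example.

```python
from __future__ import annotations

ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/webp"}
MAX_IMAGE_BYTES = 8 * 1024 * 1024


class VisionInputError(ValueError):
    """Raised when an uploaded image or scan type is invalid."""


def validate_image_upload(content_type: str | None, size_bytes: int) -> None:
    if content_type not in ALLOWED_IMAGE_TYPES:
        raise VisionInputError("Only JPEG, PNG, and WebP images are supported")
    if size_bytes <= 0:
        raise VisionInputError("Uploaded image is empty")
    if size_bytes > MAX_IMAGE_BYTES:
        raise VisionInputError("Uploaded image must be 8 MB or smaller")


def check(content_type: str | None, size_bytes: int) -> str:
    """Hypothetical helper: return "ok" or the rejection message."""
    try:
        validate_image_upload(content_type, size_bytes)
    except VisionInputError as exc:
        return str(exc)
    return "ok"


print(check("image/png", MAX_IMAGE_BYTES))      # exactly at the cap: accepted
print(check("image/png", MAX_IMAGE_BYTES + 1))  # one byte over the cap: rejected
print(check("image/gif", 1024))                 # MIME type not in the allow-list
print(check(None, 1024))                        # missing Content-Type
print(check("image/jpeg", 0))                   # empty upload
```

Note that the type check is an exact set lookup, so values such as `IMAGE/PNG` or `image/jpeg; charset=binary` would also be rejected unless the caller normalizes the header first.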
+def build_vision_prompt(scan_type: VisionScanType) -> str:
+    if scan_type == "report":
+        return (
+            "请识别这张检查/检验报告图片中的医学信息。\n"
+            "只提取报告类型、检查项目、数值、单位、参考范围、异常标记、报告日期。\n"
+            "不要提取或输出姓名、身份证号、手机号、住址、就诊卡号、病历号、条形码号等个人身份信息。\n"
+            "如果这些信息出现在图片中,请统一写为「已隐藏」。\n"
+            "如果字段看不清,请写「无法确认」。\n"
+            "不要诊断疾病,不要给治疗方案,只输出可用于后续报告解读的结构化内容。"
+        )
+
+    if scan_type == "trace_code":
+        return (
+            "请识别这张药品追溯码或药品包装图片中的可见信息。\n"
+            "提取药品名称、通用名、规格、生产厂家、批准文号、有效期、批号、追溯码可见内容。\n"
+            "如果字段看不清,请写「无法确认」。\n"
+            "不要判断真伪,不要编造监管查询结果,只输出图片中可确认的信息。"
+        )
+
+    return (
+        "请识别这张药品包装图片中的药品信息。\n"
+        "提取药品名称、通用名、规格、生产厂家、批准文号、有效期、用法用量、是否处方药。\n"
+        "如果字段看不清,请写「无法确认」。\n"
+        "不要编造说明书内容,不要给超出图片和药品知识的结论。"
+    )
+
+
+_LABEL_PATTERN = re.compile(
+    r"(姓名|身份证号?|手机号|电话|住址|地址|就诊卡号|病历号|条形码号?|患者ID|门诊号|住院号)"
+    r"\s*[::]\s*[^\n,,;;]+"
+)
+_PHONE_PATTERN = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
+_ID_PATTERN = re.compile(r"(?<![0-9A-Za-z])\d{6}(?:19|20)?\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])\d{3}[\dXx](?![0-9A-Za-z])")
+
+
+def redact_sensitive_text(text: str) -> str:
+    """Best-effort redaction for common PII that a vision model may return."""
+
+    def replace_label(match: re.Match[str]) -> str:
+        label = match.group(1)
+        return f"{label}:已隐藏"
+
+    redacted = _LABEL_PATTERN.sub(replace_label, text)
+    redacted = _PHONE_PATTERN.sub("已隐藏手机号", redacted)
+    redacted = _ID_PATTERN.sub("已隐藏身份证号", redacted)
+    return redacted
+
+
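To see how the three redaction passes interact, the following sketch re-declares the patterns and runs them on made-up sample strings (none of the names or numbers refer to real people). It assumes the label pattern's character classes accept both ASCII and fullwidth colons, commas, and semicolons; that is an assumption of this example, so adjust it if the committed pattern differs.

```python
import re

# Assumption: both ASCII and fullwidth punctuation are accepted after a label.
_LABEL_PATTERN = re.compile(
    r"(姓名|身份证号?|手机号|电话|住址|地址|就诊卡号|病历号|条形码号?|患者ID|门诊号|住院号)"
    r"\s*[::]\s*[^\n,,;;]+"
)
_PHONE_PATTERN = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
_ID_PATTERN = re.compile(
    r"(?<![0-9A-Za-z])\d{6}(?:19|20)?\d{2}(?:0[1-9]|1[0-2])"
    r"(?:0[1-9]|[12]\d|3[01])\d{3}[\dXx](?![0-9A-Za-z])"
)


def redact_sensitive_text(text: str) -> str:
    # Pass 1: "label: value" pairs; pass 2: bare mobile numbers; pass 3: bare ID numbers.
    redacted = _LABEL_PATTERN.sub(lambda m: f"{m.group(1)}:已隐藏", text)
    redacted = _PHONE_PATTERN.sub("已隐藏手机号", redacted)
    redacted = _ID_PATTERN.sub("已隐藏身份证号", redacted)
    return redacted


samples = [
    "姓名: 张三, 电话: 13812345678, 白细胞 11.2 ↑",  # labeled fields, ASCII punctuation
    "姓名:李四;门诊号:A12345",                      # labeled fields, fullwidth punctuation
    "联系 13812345678 证件 110105199003071234",      # unlabeled phone and 18-digit ID
    "旧证件 110105900307123",                        # unlabeled 15-digit legacy-style number
]
for sample in samples:
    print(redact_sensitive_text(sample))
```

The last sample passes through unchanged: the ID pattern always requires a trailing check character, so a bare 15-digit number with no label is not matched. That is consistent with the function's "best-effort" docstring, and is a reason to keep the prompt-level instruction not to output identity fields as the first line of defense.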
+def compose_agent_message(scan_type: VisionScanType, vision_text: str) -> str:
+    safe_text = redact_sensitive_text(vision_text).strip() or "无法确认"
+    if scan_type == "report":
+        return (
+            "用户上传了一张检验报告图片,个人身份信息已隐藏。火山视觉模型识别结果:\n"
+            f"{safe_text}\n\n"
+            "请基于以上内容进行报告解读。"
+        )
+
+    if scan_type == "trace_code":
+        return (
+            "用户上传了一张药品追溯码或药品包装图片。火山视觉模型识别结果:\n"
+            f"{safe_text}\n\n"
+            "请说明可识别出的药品信息、用药注意事项,并提醒用户真伪需以正规追溯平台查询为准。"
+        )
+
+    return (
+        "用户上传了一张药盒图片。火山视觉模型识别结果:\n"
+        f"{safe_text}\n\n"
+        "请说明这个药的用途、用法用量、禁忌、注意事项,以及是否适合当前用户。"
+    )
+
+
+def image_to_data_url(image_bytes: bytes, content_type: str) -> str:
+    encoded = base64.b64encode(image_bytes).decode("ascii")
+    return f"data:{content_type};base64,{encoded}"
+
+
+def _ark_api_key() -> str:
+    return os.environ.get("ARK_API_KEY", "")
+
+
+def _vision_model_id() -> str:
+    return os.environ.get("ARK_VISION_MODEL_ID") or os.environ.get("ARK_MODEL_ID", "doubao-seed-1-6-flash-250828")
+
+
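A small round-trip sketch shows what `image_to_data_url` produces and how much the base64 encoding inflates an upload. The 8-byte PNG signature stands in for a real image, and the size arithmetic is only an estimate of the encoded payload, not a statement about any limit on the ARK side.

```python
import base64


def image_to_data_url(image_bytes: bytes, content_type: str) -> str:
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


# The 8-byte PNG file signature, used here as a tiny stand-in for image bytes.
png_magic = b"\x89PNG\r\n\x1a\n"
url = image_to_data_url(png_magic, "image/png")
print(url)

# Splitting on the first comma separates the header from the base64 payload.
header, payload = url.split(",", 1)
assert header == "data:image/png;base64"
assert base64.b64decode(payload) == png_magic

# Base64 emits 4 characters for every 3 input bytes (rounded up), so an image
# at the 8 MiB upload cap grows by about a third once embedded in the request.
max_bytes = 8 * 1024 * 1024
payload_chars = 4 * ((max_bytes + 2) // 3)
print(payload_chars)
```

Since the whole image travels inline in the JSON request body, the upload cap in `validate_image_upload` also bounds the request size sent to the model endpoint.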
+async def recognize_image(image_bytes: bytes, content_type: str, scan_type: VisionScanType) -> str:
+    """
+    Call Volcengine ARK multimodal model and return redacted visible facts.
+
+    The returned content is user-provided context for downstream agents, never a
+    system prompt.
+    """
+    validate_image_upload(content_type, len(image_bytes))
+
+    api_key = _ark_api_key()
+    if not api_key:
+        raise VisionInputError("ARK_API_KEY is not configured")
+
+    from openai import AsyncOpenAI
+
+    client = AsyncOpenAI(api_key=api_key, base_url=_BASE_URL)
+    response = await client.chat.completions.create(
+        model=_vision_model_id(),
+        temperature=0,
+        messages=[
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": build_vision_prompt(scan_type)},
+                    {
+                        "type": "image_url",
+                        "image_url": {"url": image_to_data_url(image_bytes, content_type)},
+                    },
+                ],
+            }
+        ],
+        extra_body={"thinking": {"type": "disabled"}},
+    )
+
+    content = response.choices[0].message.content if response.choices else ""
+    if isinstance(content, list):
+        content = "\n".join(str(part) for part in content)
+    return redact_sensitive_text(str(content or ""))  # content may be None; avoid returning "None"
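The diff does not show how callers chain these helpers, so the following is only a sketch of one plausible wiring: recognize the image, wrap the text for the downstream agent, and route by scan type. `fake_recognize`, `compose`, and `handle_scan` are hypothetical names for this example; the stub returns canned English text instead of calling the ARK endpoint, and `compose` is a simplified stand-in for `compose_agent_message`.

```python
import asyncio

SCAN_TYPE_TO_AGENT = {
    "report": "report_agent",
    "drug_box": "pharmacy_agent",
    "trace_code": "pharmacy_agent",
}


async def fake_recognize(image_bytes: bytes, content_type: str, scan_type: str) -> str:
    """Hypothetical stand-in for recognize_image: no network call, canned text."""
    return "Report type: CBC\nWBC: 11.2 x10^9/L (ref 3.5-9.5) HIGH"


def compose(scan_type: str, vision_text: str) -> str:
    """Simplified stand-in for compose_agent_message (English wording is illustrative)."""
    safe_text = vision_text.strip() or "cannot confirm"
    return f"The user uploaded a {scan_type} image. Vision result:\n{safe_text}"


async def handle_scan(image_bytes: bytes, content_type: str, scan_type: str,
                      recognize=fake_recognize) -> dict:
    # Step 1: extract visible facts from the image.
    vision_text = await recognize(image_bytes, content_type, scan_type)
    # Step 2: wrap the facts as a user-role message and pick the target agent.
    return {
        "agent": SCAN_TYPE_TO_AGENT[scan_type],
        "role": "user",  # recognized text is user context, never a system prompt
        "content": compose(scan_type, vision_text),
    }


result = asyncio.run(handle_scan(b"\xff\xd8\xff", "image/jpeg", "report"))
print(result["agent"])
print(result["content"])
```

Passing the recognizer in as a parameter keeps the routing logic testable without an `ARK_API_KEY`, which fits the repository's preference for deterministic tests.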

0 commit comments
