feat: update skill-auditor scoring rubric, evaluate script, and SKILL metadata

GiaoLee · claude · GiaoLee · commit c74c7b779333 · 2026-05-18T16:59:35.000+08:00
- Revamp scoring_rubric.md with expanded criteria and section weights
- Enhance evaluate_skill.py with improved evaluation logic
- Update skill-auditor SKILL.md with latest capabilities
- Refresh report_json_schema.md, scientific_veto.md, and academic writing evaluation references
- Minor metadata updates across multiple SKILL.md files in awesome-med-research-skills and scientific-skills

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
diff --git a/awesome-med-research-skills/Data Analysis/differential-expression-analysis/SKILL.md b/awesome-med-research-skills/Data Analysis/differential-expression-analysis/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: differential-expression-analysis
-description: Use when analyzing bulk RNA-seq or microarray expression data to identify differentially expressed genes between two biological groups (case vs control), with volcano plots and heatmap visualization. NOT for: single-cell RNA-seq, methylation analysis, non-expression data.
+description: Use when analyzing bulk RNA-seq or microarray expression data to identify differentially expressed genes between two biological groups (case vs control), with volcano plots and heatmap visualization. NOT for:single-cell RNA-seq, methylation analysis, non-expression data.
 license: MIT
 author: AIPOCH
 ---
diff --git a/awesome-med-research-skills/Evidence Insight/figure-first-paper-reader/SKILL.md b/awesome-med-research-skills/Evidence Insight/figure-first-paper-reader/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: figure-first-paper-reader
-description: Reads a paper figure by figure before re-integrating the full narrative, so the user can identify the core findings quickly and check whether each visual actually supports the authors' main claims. Always separate figure content, figure-linked claim, evidentiary strength, and unsupported interpretation. Never fabricate references, PMIDs, DOIs, figure content, panel labels, result values, or study details that were not actually provided.
+description: "Reads a paper figure by figure before re-integrating the full narrative, so the user can identify the core findings quickly and check whether each visual actually supports the authors' main claims. Always separate figure content, figure-linked claim, evidentiary strength, and unsupported interpretation. Never fabricate references, PMIDs, DOIs, figure content, panel labels, result values, or study details that were not actually provided."
 license: MIT
 author: AIPOCH
 ---
@@ -270,3 +270,4 @@ A strong output from this skill should:
 - and leave the user with a clear sense of whether the paper's story still holds after a figure-first audit.
 
 A weak output merely paraphrases captions, repeats the abstract, or praises figures without checking whether they actually support the claims.
+
diff --git a/awesome-med-research-skills/Evidence Insight/litbase/SKILL.md b/awesome-med-research-skills/Evidence Insight/litbase/SKILL.md
@@ -3,20 +3,6 @@ name: litbase
 description: "Academic paper reading and research development system for biomedical researchers. Finds papers via Semantic Scholar, reads with structured notes, tracks discussion insights, and synthesizes literature into a Research Foundation Document (RFD) for downstream protocol design skills. 8 commands: /setup /feed /read /discuss /recap /update /sync /propose"
 license: MIT
 author: AIPOCH
-metadata:
-  openclaw:
-    optional_bins:
-      - python      # optional: accelerates paper search if available; falls back to inline WebFetch calls
-      - pdftotext   # optional: accelerates PDF text extraction if available; falls back to Claude native PDF reading
-  capability_tiers:
-    tier_a: "Conversation mode — Web Claude or any LLM chat interface; no file system; notes output as Artifacts; state maintained via session card"
-    tier_b: "File mode — Manus or any file-capable agent; file read/write available; no Python/bash required"
-    tier_c: "Full mode — OpenClaw / Claude Code; full file system, bash, and optional Python acceleration"
-  downstream_skills:
-    - clinical-cohort-protocol-designer
-    - translational-study-blueprint
-    - statistical-analysis-plan-writer
-    - protocol-writer
 ---
 > **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
 
@@ -163,3 +149,5 @@ data_dir/                           ← path set in config.json
     proposal/
       YYYY-MM-DD_RFD.md
 ```
+
+
diff --git a/awesome-med-research-skills/Evidence Insight/medical-research-literature-reader-pro/SKILL.md b/awesome-med-research-skills/Evidence Insight/medical-research-literature-reader-pro/SKILL.md
@@ -1,9 +1,8 @@
 ---
 name: medical-research-literature-reader-pro
-description: A medical-research-native literature reading skill for users with clinical, bioinformatics, translational, and basic experimental backgrounds. Use this skill whenever a user wants to read, analyze, critique, or interpret a medical or scientific paper — whether they provide a PDF, abstract, DOI, PMID, or just a title. Triggers include requests like "analyze this paper", "critique this study", "is this a strong paper?", "give me similar studies", "prepare me for journal club", "help me understand this bioinformatics paper", "what are the weaknesses here?", or "turn this into a mind map". Also activate for any downstream deliverables such as journal club kits, comparison tables, PI decision briefs, replication starters, or follow-up experiment designs. Do NOT treat as a generic summarizer — this skill performs structured evidence-type classification, track-specific critical appraisal, interpretation-boundary judgment, and research-grade follow-up generation.
-version: 1.0.0 
-author: AIPOCH
+description: "A medical-research-native literature reading skill for users with clinical, bioinformatics, translational, and basic experimental backgrounds. Use this skill whenever a user wants to read, analyze, critique, or interpret a medical or scientific paper — whether they provide a PDF, abstract, DOI, PMID, or just a title. Triggers include requests like \\\"analyze this paper\\\", \\\"critique this study\\\", \\\"is this a strong paper?\\\", \\\"give me similar studies\\\", \\\"prepare me for journal club\\\", \\\"help me understand this bioinformatics paper\\\", \\\"what are the weaknesses here?\\\", or \\\"turn this into a mind map\\\". Also activate for any downstream deliverables such as journal club kits, comparison tables, PI decision briefs, replication starters, or follow-up experiment designs. Do NOT treat as a generic summarizer — this skill performs structured evidence-type classification, track-specific critical appraisal, interpretation-boundary judgment, and research-grade follow-up generation."
 license: MIT
+author: AIPOCH
 ---
 > **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
 
@@ -234,3 +233,5 @@ This skill is designed to connect with other skills in a research workflow:
 Close every Standard and Expert report with a brief offer of relevant next steps, for example:
 
 > I can also generate a same-type study comparison table, turn this paper into a journal club kit, design follow-up experiments based on the weakest link, or build a replication starter for the computational section. Just let me know.
+
+
diff --git a/scientific-skills/Academic Writing/paper-2-web/SKILL.md b/scientific-skills/Academic Writing/paper-2-web/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: paper-2-web
-description: Use this skill when converting academic papers to promotional and presentation formats, including interactive websites (Paper2Web), presentation videos (Paper2Video), and conference posters (Paper2Poster). This skill is suitable for paper dissemination, conference preparation, creating explorable academic homepages, generating video abstracts, or producing printable posters from LaTeX or PDF source.
+name: paper-web
+description: "Use this skill when converting academic papers to promotional and presentation formats, including interactive websites (Paper2Web), presentation videos (Paper2Video), and conference posters (Paper2Poster). This skill is suitable for paper dissemination, conference preparation, creating explorable academic homepages, generating video abstracts, or producing printable posters from LaTeX or PDF source."
 license: MIT
 author: AIPOCH
 ---
@@ -597,3 +597,4 @@ Result file: paper_2_web_result.md
 Validation summary: PASS/FAIL with brief notes
 Assumptions: explicit list if any
 ```
+
diff --git a/scientific-skills/Data Analysis/3d-molecule-ray-tracer/SKILL.md b/scientific-skills/Data Analysis/3d-molecule-ray-tracer/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: 3d-molecule-ray-tracer
-description: Generate photorealistic rendering scripts for PyMOL and UCSF ChimeraX.
+name: d-molecule-ray-tracer
+description: "Generate photorealistic rendering scripts for PyMOL and UCSF ChimeraX."
 license: MIT
 author: AIPOCH
 ---
@@ -253,7 +253,7 @@ To render:
 - **Current Stage**: Draft
 - **Next Review Date**: 2026-03-15
 - **Known Issues**: None
-- **Planned Improvements**: 
+- **Planned Improvements**:
   - Blender integration
   - AI-assisted composition suggestions
   - Real-time preview mode
@@ -318,3 +318,4 @@ Use the following fixed structure for non-trivial requests:
 7. Next Checks
 
 If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
+
diff --git a/scientific-skills/Data Analysis/diagnostic-study-quality-assessment-quadas-2/SKILL.md b/scientific-skills/Data Analysis/diagnostic-study-quality-assessment-quadas-2/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: diagnostic-study-quality-assessment-quadas-2
-description: Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2").
+name: diagnostic-study-quality-assessment-quadas
+description: "Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., \"Assess this paper using QUADAS-2\")."
 license: MIT
 author: AIPOCH
 ---
@@ -123,3 +123,4 @@ The script will automatically extract the text, which you can then copy and send
 ## References
 
 -   [QUADAS-2 Criteria](references/quadas_2_criteria.md): Detailed signaling questions and judgment guidelines.
+
diff --git a/scientific-skills/Data Analysis/meta-rob2-plot/SKILL.md b/scientific-skills/Data Analysis/meta-rob2-plot/SKILL.md
@@ -1,5 +1,5 @@
 ---
-name: meta-rob2-plot
+name: meta-rob-plot
 description: "Draw ROB2 risk-of-bias plots, including a Traffic Light Plot and a Summary Bar Plot. Input is a CSV file with ROB2 assessments for each study; output are two PNG plot files."
 license: MIT
 author: AIPOCH
@@ -282,3 +282,4 @@ Assumptions: explicit list if any
 - Confirm the supported execution path completed without unresolved errors.
 - Confirm the final deliverable matches the documented format exactly.
 - Confirm assumptions, limitations, and warnings are surfaced explicitly.
+
diff --git a/scientific-skills/Data Analysis/neurokit2/SKILL.md b/scientific-skills/Data Analysis/neurokit2/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: neurokit2
-description: Comprehensive biosignal processing for ECG/PPG/EEG/EDA/RSP/EMG/EOG; use when you need to clean, segment, and extract physiological features for HRV, event-related responses, complexity metrics, or multimodal psychophysiology pipelines.
+name: neurokit
+description: "Comprehensive biosignal processing for ECG/PPG/EEG/EDA/RSP/EMG/EOG; use when you need to clean, segment, and extract physiological features for HRV, event-related responses, complexity metrics, or multimodal psychophysiology pipelines."
 license: MIT
 author: AIPOCH
 ---
@@ -117,9 +117,9 @@ print("Grand average shape:", grand_average.shape)
 ### Processing pipelines (typical pattern)
 Most modalities follow a consistent structure:
 
-1. `*_process(signal, sampling_rate=...)`  
+1. `*_process(signal, sampling_rate=...)`
    Produces a cleaned signal plus intermediate channels (e.g., peaks, phases) and an `info` dict with indices/metadata.
-2. `*_analyze(processed_signals, sampling_rate=...)`  
+2. `*_analyze(processed_signals, sampling_rate=...)`
    Computes summary features and automatically selects an analysis mode based on recording length.
 
 Examples:
@@ -172,4 +172,4 @@ Example:
 indices = nk.complexity(x, sampling_rate=1000)
 apen = nk.entropy_approximate(x)
 dfa = nk.fractal_dfa(x)
-```
+```
diff --git a/scientific-skills/Data Analysis/protocol-deviation-classifier/SKILL.md b/scientific-skills/Data Analysis/protocol-deviation-classifier/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: protocol-deviation-classifier
-description: Determine whether an incident in a clinical trial is a "major deviation.
+description: "Determine whether an incident in a clinical trial is a \"major deviation."
 license: MIT
 author: AIPOCH
 ---
@@ -319,7 +319,7 @@ pip install -r requirements.txt
 - **Current Stage**: Draft
 - **Next Review Date**: 2026-03-06
 - **Known Issues**: None
-- **Planned Improvements**: 
+- **Planned Improvements**:
   - Performance optimization
   - Additional feature support
 
@@ -381,3 +381,4 @@ If the request is simple, you may compress the structure, but still keep assumpt
 - Do not fabricate results, metrics, citations, or downstream conclusions.
 - Use safe fallback behavior when dependencies, credentials, or required inputs are missing.
 - Surface any execution failure with a concise diagnosis and recovery path.
+
diff --git a/scientific-skills/Data Analysis/pydeseq2/SKILL.md b/scientific-skills/Data Analysis/pydeseq2/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: pydeseq2
-description: Differential gene expression analysis for bulk RNA-seq count matrices using a DESeq2-like workflow in Python; use when you need Wald tests, FDR correction, and optional LFC shrinkage for condition/batch/covariate designs.
+name: pydeseq
+description: "Differential gene expression analysis for bulk RNA-seq count matrices using a DESeq2-like workflow in Python; use when you need Wald tests, FDR correction, and optional LFC shrinkage for condition/batch/covariate designs."
 license: MIT
 author: AIPOCH
 ---
@@ -215,4 +215,4 @@ The fitting pipeline typically includes:
 If your repository includes them, use:
 - `references/api_reference.md` for parameter/object details.
 - `references/workflow_guide.md` for extended workflows and troubleshooting.
-- `scripts/run_deseq2_analysis.py` for a CLI-style batch workflow (counts/metadata/design/contrast/output, optional plots).
+- `scripts/run_deseq2_analysis.py` for a CLI-style batch workflow (counts/metadata/design/contrast/output, optional plots).
diff --git a/scientific-skills/Data Analysis/rct-bias-assessment-rob2/SKILL.md b/scientific-skills/Data Analysis/rct-bias-assessment-rob2/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: rct-bias-assessment-rob2
-description: Automates Risk of Bias 2 (ROB2) assessment for RCT papers by analyzing text against specific domains and synthesizing a report. Use when you need to assess the quality of a clinical trial paper or evaluate risk of bias.
+name: rct-bias-assessment-rob
+description: "Automates Risk of Bias 2 (ROB2) assessment for RCT papers by analyzing text against specific domains and synthesizing a report. Use when you need to assess the quality of a clinical trial paper or evaluate risk of bias."
 license: MIT
 author: AIPOCH
 ---
@@ -129,3 +129,4 @@ from scripts.assess_rob2 import clean_text
 # usage
 cleaned_json = clean_text(llm_output)
 ```
+
diff --git a/scientific-skills/Data Analysis/table-1-generator/SKILL.md b/scientific-skills/Data Analysis/table-1-generator/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: table-1-generator
-description: Automated generation of baseline characteristics tables (Table 1) for clinical research papers.
+name: table-generator
+description: "Automated generation of baseline characteristics tables (Table 1) for clinical research papers."
 license: MIT
 author: AIPOCH
 ---
@@ -161,7 +161,7 @@ pip install -r requirements.txt
 - **Current Stage**: Draft
 - **Next Review Date**: 2026-03-06
 - **Known Issues**: None
-- **Planned Improvements**: 
+- **Planned Improvements**:
   - Performance optimization
   - Additional feature support
 
@@ -204,3 +204,4 @@ Use the following fixed structure for non-trivial requests:
 7. Next Checks
 
 If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
+
diff --git a/scientific-skills/Other/co2-tank-monitor/SKILL.md b/scientific-skills/Other/co2-tank-monitor/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: co2-tank-monitor
-description: IoT monitoring simulation to predict CO2 tank depletion and prevent weekend gas outages in cell culture facilities. Monitors cylinder pressure, calculates consumption rates, provides early warnings, and supports automated scheduling via cron.
+name: co-tank-monitor
+description: "IoT monitoring simulation to predict CO2 tank depletion and prevent weekend gas outages in cell culture facilities. Monitors cylinder pressure, calculates consumption rates, provides early warnings, and supports automated scheduling via cron."
 license: MIT
 author: AIPOCH
 ---
@@ -177,3 +177,4 @@ Every final response must make these explicit:
 
 - Cell Culture CO2 Guidelines: https://www.thermofisher.com/cellculture
 - Gas Cylinder Safety: https://www.osha.gov/gascylinders
+
diff --git a/scientific-skills/Other/iso-13485-certification/SKILL.md b/scientific-skills/Other/iso-13485-certification/SKILL.md
@@ -1,6 +1,6 @@
 ---
-name: iso-13485-certification
-description: A toolkit for preparing ISO 13485:2016 certification documentation for medical device QMS. Use when you need to perform a documentation gap analysis, draft or update a Quality Manual, create required procedures/work instructions, build Medical Device Files (MDF), interpret ISO 13485 clauses, or identify missing documents for certification (often triggered by ISO 13485, QMS certification, FDA QMSR, EU MDR, or quality system documentation requests).
+name: iso-certification
+description: "A toolkit for preparing ISO 13485:2016 certification documentation for medical device QMS. Use when you need to perform a documentation gap analysis, draft or update a Quality Manual, create required procedures/work instructions, build Medical Device Files (MDF), interpret ISO 13485 clauses, or identify missing documents for certification (often triggered by ISO 13485, QMS certification, FDA QMSR, EU MDR, or quality system documentation requests)."
 license: MIT
 author: AIPOCH
 ---
@@ -139,4 +139,4 @@ Use `references/mandatory-documents.md` as the source of truth for:
 - How to justify non-applicability
 - What evidence/records each procedure should produce
 
-For detailed clause interpretation, use `references/iso-13485-requirements.md`.
+For detailed clause interpretation, use `references/iso-13485-requirements.md`.
diff --git a/skill-auditor/SKILL.md b/skill-auditor/SKILL.md
diff --git a/skill-auditor/references/report_json_schema.md b/skill-auditor/references/report_json_schema.md
diff --git a/skill-auditor/references/scientific_veto.md b/skill-auditor/references/scientific_veto.md
diff --git a/skill-auditor/references/scoring_rubric.md b/skill-auditor/references/scoring_rubric.md
diff --git a/skill-auditor/references/specialized_evaluation_academic_writing.md b/skill-auditor/references/specialized_evaluation_academic_writing.md
diff --git a/skill-auditor/scripts/evaluate_skill.py b/skill-auditor/scripts/evaluate_skill.py