feat: add Python skill registry validation tooling#718
Conversation
Introduces opt-in quality scoring, security scanning, drift detection, and registry health reporting for the skill ecosystem. - security_scanner.py: 12 security patterns (curl|bash, RCE, hardcoded creds, etc.) - score_skills.py: 0-100 quality score across metadata, docs, and security - detect_drift.py: hash-based drift detection with persistent baseline - generate_registry_report.py: consolidated registry health report - 81 Python tests across 4 suites, all passing - skill-score.v1.schema.json: JSON Schema for score records - docs/contributors/skill-scoring.md: full contributor documentation - tools/SkillRegistry/: .NET 9 standalone tooling (score/scan/report CLI) - 9 new npm scripts (score:skills, security:scan, drift:check, registry:report, etc.)
- score_all_skills, scan_all_skills, and build_current_entries now use
rglob("SKILL.md") instead of iterdir(), discovering skills at any
nesting depth (e.g. skills/security/aws-iam-best-practices/)
- skill_id uses path relative to skills/ to avoid name collisions
- score_skills.py banner suppressed when --json to keep stdout valid JSON
Fixes review feedback from PR sickn33#702 (chatgpt-codex-connector P2 items).
Remove tools/SkillRegistry (.NET 9 CLI) — moving to a separate PR to keep blast radius small per reviewer guidance. Add explicit baseline/output ownership table to skill-scoring.md and list the three generated files (drift-baseline.json, registry-report.json, scores.json) in .gitignore so contributors cannot accidentally commit them. All scoring, security, and drift outputs remain strictly informational and non-blocking.
|
Hey, thanks for the detailed feedback — it genuinely helped me see the blast radius problem. Here's my reasoning: I work primarily with C#, so when I started designing the scoring and scanning logic, I prototyped it in .NET first — it's just the environment where I think most naturally The new branch (feat/skill-registry-python-tooling) contains only the Python scripts, their tests, the JSON schema, and updated docs. I also added the three generated output files to .gitignore and updated skill-scoring.md with an explicit baseline ownership table, since that was another gap you called out. Sorry for the noise on the first submission. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4683231a8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
- Allowlist: change HTML marker to prefix match so both bare
(<!-- security-allowlist -->) and colon forms
(<!-- security-allowlist: reason -->) suppress the line
- SEC002: expand regex to catch curl piped to sh and zsh, not only bash
- SEC006: narrow chmod regex to modes where the other-write bit is set
(last octal digit in {2,3,6,7}), eliminating false positives on safe
modes like 755 and 700
- Tests: add cases for curl|sh, curl|zsh, colon allowlist, world-writable
modes (777/722/0777), and safe modes (755/700/644/750/4755)
Summary
This PR introduces an opt-in AI Skill Registry validation layer that coexists with the existing marketplace architecture without breaking any existing skills or CI gates.
New tools (all informational — never blocking):
security_scanner.py— scans skill body for 12 dangerous command patterns (curl|bash,Invoke-Expression, hardcoded credentials, fork bombs, etc.) with allowlist support andoffensive-skill awareness
score_skills.py— computes a 0–100 quality score per skill across three weighted dimensions: metadata completeness (30%), documentation structure (40%), and security posture (30%)detect_drift.py— detects content drift via normalized SHA-256 hashes compared against a persistent baselinegenerate_registry_report.py— consolidates scoring, security, and drift into a single registry health reportTests: 81 Python tests across 4 suites, all passing locally.
Docs:
docs/contributors/skill-scoring.md— explains scoring dimensions, penalties, bonuses, CLI usage, and security patterns reference. Includes an explicit baseline ownership tableclarifying that
data/drift-baseline.json,data/registry-report.json, anddata/scores.jsonare generated files listed in.gitignoreand never committed in PRs — maintainer-owned onmain.Note: The .NET CLI (
tools/SkillRegistry/) present in the previous submission has been removed. It was an exploratory prototype used to draft the logic before porting to Python. It will besent as a separate design PR if there is interest.
Change Classification
Quality Bar Checklist ✅
docs/contributors/quality-bar.mdanddocs/contributors/security-guardrails.md.npm run validate).CATALOG.md,skills_index.json,data/*.json) in this PR.npm run validate:references— no broken references introduced.data/drift-baseline.json,data/registry-report.json,data/scores.json) are listed in.gitignoreand will never appear in contributor PRs.