Commit 1ca543d
feat: adversarial hardening — expand invisible chars, homoglyphs, and NFD bypass defense (#7)
* ci: add Grippy code review workflow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: install grippy-code-review from GitHub repo
Package is not yet published to PyPI, so install directly from the
Project-Navi/grippy-code-review repository.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address code review feedback on Grippy workflow
- Add concurrency group to cancel superseded runs
- Skip job on fork PRs (secrets unavailable)
- Set persist-credentials: false on checkout
- Pin grippy-code-review to commit SHA
- Use python -I to prevent module shadowing
- Fix action version comments to match repo convention
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: strip C0/C1 control characters (terminal injection defense)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
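A minimal sketch of what C0/C1 stripping can look like, assuming a single regex character class. The name `CONTROL_RE`, the helper `strip_controls`, and the exact ranges are assumptions for illustration, not the project's code. The ranges keep NUL, TAB, LF, and CR, in line with the false-positive checks described later in this commit.

```python
import re

# Hypothetical pattern: C0 controls except NUL (U+0000), TAB (U+0009),
# LF (U+000A) and CR (U+000D), plus DEL (U+007F) and the full C1 range
# (U+0080-U+009F), which includes the single-byte CSI introducer U+009B.
CONTROL_RE = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def strip_controls(text: str) -> str:
    """Remove terminal-significant control characters such as ESC and CSI."""
    return CONTROL_RE.sub("", text)


# ANSI sequences that would clear the screen and recolor output in a terminal.
payload = "ok\x1b[2J\x1b[31mpwned\x9b0m\n"
print(repr(strip_controls(payload)))  # -> 'ok[2J[31mpwned0m\n'
```

Removing the ESC (U+001B) and CSI (U+009B) introducers is what defuses the sequences: the leftover `[2J` and `[31m` text is printed literally instead of being interpreted by the terminal.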
* feat: strip 19 high-confidence invisible chars (math, deprecated, braille, hangul, mongolian)
Add to the stripping pipeline:
- U+2061-U+2064: invisible math operators (function application, times, separator, plus)
- U+206A-U+206F: deprecated Unicode format controls
- U+2800: braille pattern blank
- U+1680: Ogham space mark
- U+115F-U+1160: Hangul Choseong/Jungseong fillers
- U+3164, U+FFA0: Hangul fillers (pre-NFKC forms that normalize to U+1160)
- U+180B-U+180D, U+180F: Mongolian free variation selectors
- U+061C: Arabic letter mark
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
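The ranges listed above can be sketched as one regex character class. `NEW_INVISIBLE_RANGES`, `_char_class`, and `NEW_INVISIBLE_RE` are hypothetical names; the real data lives in `_invisible.py`, whose layout this does not reproduce. The final loop checks the note that U+3164 and U+FFA0 normalize to U+1160 under NFKC, which is presumably why all three are listed: the filler is caught whether stripping runs before or after normalization.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical table of the code points listed in this commit (inclusive ranges).
NEW_INVISIBLE_RANGES = [
    (0x2061, 0x2064),  # invisible math operators
    (0x206A, 0x206F),  # deprecated format controls
    (0x2800, 0x2800),  # braille pattern blank
    (0x1680, 0x1680),  # Ogham space mark
    (0x115F, 0x1160),  # Hangul Choseong/Jungseong fillers
    (0x3164, 0x3164),  # Hangul filler (NFKC -> U+1160)
    (0xFFA0, 0xFFA0),  # halfwidth Hangul filler (NFKC -> U+1160)
    (0x180B, 0x180D),  # Mongolian free variation selectors 1-3
    (0x180F, 0x180F),  # Mongolian free variation selector 4
    (0x061C, 0x061C),  # Arabic letter mark
]


def _char_class(ranges: list[tuple[int, int]]) -> re.Pattern[str]:
    """Compile inclusive code point ranges into a single character class."""
    parts = []
    for start, end in sorted(ranges):  # sorted for a deterministic pattern
        if start == end:
            parts.append(re.escape(chr(start)))
        else:
            parts.append(f"{re.escape(chr(start))}-{re.escape(chr(end))}")
    return re.compile("[" + "".join(parts) + "]")


NEW_INVISIBLE_RE = _char_class(NEW_INVISIBLE_RANGES)

# Both pre-NFKC Hangul fillers fold to U+1160, which is itself in the table.
for ch in ("\u3164", "\uffa0"):
    assert unicodedata.normalize("NFKC", ch) == "\u1160"

print(NEW_INVISIBLE_RE.sub("", "pay\u2062pal\u3164"))  # -> paypal
```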
* fix: NFD decompose before homoglyph scan to defeat NFKC composition bypass
Combining marks could hide mapped homoglyph base characters when NFKC
composed them into precomposed forms not in the map (e.g., Cyrillic
U+0430 + breve -> U+04D1). Now _replace_homoglyphs decomposes to NFD
first to expose base characters, then recomposes to NFC to maintain
idempotency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
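A sketch of the decompose, map, recompose order, modeled on the `_replace_homoglyphs` change described above but not taken from it. `replace_homoglyphs` and `HOMOGLYPHS` are hypothetical; the map is a two-entry stand-in for the real 66-pair table.

```python
import unicodedata

# Hypothetical two-entry map for illustration only.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def replace_homoglyphs(text: str) -> str:
    # 1. NFD splits precomposed letters (e.g. U+04D1) into base + combining
    #    mark, exposing the base character to the map.
    decomposed = unicodedata.normalize("NFD", text)
    # 2. Swap each mapped base character for its Latin look-alike.
    mapped = "".join(HOMOGLYPHS.get(ch, ch) for ch in decomposed)
    # 3. NFC recomposes, so a second pass produces the same string.
    return unicodedata.normalize("NFC", mapped)


# The bypass: NFKC composes Cyrillic a + breve into U+04D1, which is not a map key.
composed = unicodedata.normalize("NFKC", "\u0430\u0306")
assert composed == "\u04d1" and composed not in HOMOGLYPHS

fixed = replace_homoglyphs(composed)
assert fixed == "\u0103"  # LATIN SMALL LETTER A WITH BREVE
assert replace_homoglyphs(fixed) == fixed  # idempotent
```

Once the base is Latin, NFC folds `a` + U+0306 into the single code point U+0103, so a second pass finds nothing left to replace.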
* feat: expand homoglyph map — 12 new pairs (Greek lowercase, Cyrillic extended, dotless i)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
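Characters like these need explicit map entries because none has a compatibility decomposition, so NFKC passes them through unchanged. The pairs in `EXAMPLES` are illustrative picks from the named categories and are not necessarily among the 12 added here.

```python
import unicodedata

# Illustrative look-alikes; not the project's actual homoglyph table.
EXAMPLES = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03bd": "v",  # GREEK SMALL LETTER NU
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
    "\u0131": "i",  # LATIN SMALL LETTER DOTLESS I
}

for ch, latin in EXAMPLES.items():
    # NFKC leaves each one untouched, so normalization alone never folds it.
    assert unicodedata.normalize("NFKC", ch) == ch
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {latin!r}")
```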
* docs: update counts — invisible chars 411→492, homoglyphs 54→66, tests 309→357
Update all documentation to reflect adversarial hardening (Tasks 1-5):
- README.md: comparison table and key differences
- CLAUDE.md: pipeline description and data file descriptions
- whitepaper: abstract, pipeline table, curation section, verification,
comparison, limitations, and commit hash on title page
- _invisible.py: module docstring now lists all 8 categories
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add .gitkeep to docs/plans for future collaboration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version to 0.2.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add CHANGELOG entry for 0.2.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add persona-targeted example scripts (LLM, FastAPI, log sanitizer)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: restructure README — add invisible attack demo, reorder scenarios, link examples
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address audit findings — docs accuracy and packaging
- Fix decode_evasion intermediate comment ("../etc/passwd" → "../../etc/passwd")
- Qualify "never errors" claim with TypeError exception note
- Fix warning log format (remove extra space after logger name)
- Add coverage.xml to .gitignore
- Add Changelog/Documentation URLs to pyproject.toml
- Fix stage comment numbering (Stage 5→6 for escaper, add Stage 5 for re-NFKC)
- Update stale homoglyph counts in test comments (42→66)
- Fix tag block example to encode full "ignore previous instructions"
- Fix examples/README.md dependency comment
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review feedback — deterministic regex, docstring, example
- Sort set joins in regex construction for deterministic pattern order
- Fix docstring: "ASCII equivalents" → "Latin equivalents"
- Use `prompt` variable in llm_pipeline.py example (print final prompt)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
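Sorting matters because iteration order over a Python `set` of strings depends on string hashing, which is randomized per process unless `PYTHONHASHSEED` is fixed. Joining a set directly can therefore yield an equivalent but differently ordered pattern on each run. A sketch with a hypothetical `build_class` helper and `ZERO_WIDTH` set:

```python
from __future__ import annotations

import re

# Hypothetical source set of zero-width/format characters.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def build_class(chars: set[str]) -> re.Pattern[str]:
    # Sorting by code point makes the pattern string identical on every run.
    return re.compile("[" + "".join(re.escape(c) for c in sorted(chars)) + "]")


pattern = build_class(ZERO_WIDTH)
print(pattern.pattern.encode("unicode_escape"))
```

Matching behavior is the same either way; a stable pattern string keeps diffs, snapshots, and pattern-level assertions reproducible.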
* test: add regex coverage regression test and Unicode version assertion
- Verify INVISIBLE_RE matches all 492 intended codepoints (guards against
silent regex regressions from merge conflicts or range edits)
- Verify regex does not false-positive on printable ASCII, TAB, LF, CR, NUL
- Assert Unicode version >= 15 to catch normalization behavior changes
- 361 tests total (up from 357)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: final docs/code audit — fix counts, comments, and test accuracy
Whitepaper:
- Test count 357→361, "eight categories"→"nine categories" (4 locations)
Source code:
- _pipeline.py: module docstring "Five stages"→"Six stages", expand
Stage 2 and Stage 4 descriptions to match actual coverage
- _invisible.py: fix Tag block comment (U+E0001 is LANGUAGE TAG, not
U+E0000), fix "no logic" claim, fix Mongolian FVS "identical"→"analogous"
- path_escaper: document backslash normalization in docstring
Tests:
- Fix stage numbering: "stage_5"→"stage_6" for escaper (2 locations)
- Fix stale "54-pair"→"66-pair" in test_adversarial_homoglyphs.py
- Update "Gaps" class docstrings — chars now covered by map
- Fix "all six zero-width"→"all eight" and add U+200E/U+200F to test strings
- Fix Tag block comment U+E0001→U+E0000 range start
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent: 664f3ad
Commit: 1ca543d
23 files changed
Lines changed: 737 additions & 64 deletions