Commit 64c3ef7
fix(audit): fold curly apostrophe in citation name matching
The citation auditor flagged `de Chaisemartin & D'Haultfœuille` (DOI
10.1162/rest_a_01414) as missing a co-author because Crossref returns
"Xavier D’Haultfœuille" with a curly apostrophe (U+2019) while the
docstring uses a straight one (U+0027). The previous _normalise stripped
U+2019 as punctuation (the character class [^\w\s'-] preserves only
U+0027), so the surname tokenised differently on each side and the
name match failed.
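The mismatch can be reproduced in a few lines. This is only a sketch: `old_normalise` is a hypothetical stand-in for the pre-fix `_normalise`, and it assumes the function lowercases the name and deletes every character matched by `[^\w\s'-]`; the real function may do more.

```python
import re

# Character class quoted in the commit message: anything that is not a
# word character, whitespace, ASCII apostrophe or ASCII hyphen.
_PUNCT = re.compile(r"[^\w\s'-]")


def old_normalise(name: str) -> str:
    """Hypothetical pre-fix normaliser (assumed behaviour, not the real code)."""
    return _PUNCT.sub("", name.lower())


docstring_name = "D'Haultfœuille"       # straight apostrophe, U+0027
crossref_name = "D\u2019Haultfœuille"   # curly apostrophe, U+2019

print(old_normalise(docstring_name))  # d'haultfœuille
print(old_normalise(crossref_name))   # dhaultfœuille
```

Under these assumptions the two spellings of the same surname normalise to different strings, which is enough to make a name comparison fail.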
Add an APOSTROPHE_FOLD table that maps typographic apostrophe /
hyphen variants (U+2019, U+2018, U+02BC, U+2032, U+00B4, U+2010,
U+2011, U+2013) to their ASCII forms before normalisation. After
the fold, a fresh strict audit reports 477 ok / 0 mismatch / 0
unresolved (was 475 ok / 2 mismatch).
1 parent f89a396 commit 64c3ef7
1 file changed
Lines changed: 19 additions & 1 deletion
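The diff body is not visible in this capture, so the following is a minimal sketch of the fold the commit message describes, not the committed code. The name `APOSTROPHE_FOLD` and the eight code points come from the message; the use of `str.maketrans`, the function name `normalise_sketch`, and the lowercase-then-strip step are assumptions.

```python
import re

# Typographic variants listed in the commit message, folded to ASCII.
# The table layout (a str.translate mapping) is an assumption.
APOSTROPHE_FOLD = str.maketrans({
    "\u2019": "'",  # right single quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe
    "\u2032": "'",  # prime
    "\u00b4": "'",  # acute accent
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2013": "-",  # en dash
})

_PUNCT = re.compile(r"[^\w\s'-]")


def normalise_sketch(name: str) -> str:
    """Fold typographic apostrophes/hyphens first, then strip punctuation."""
    folded = name.translate(APOSTROPHE_FOLD)
    return _PUNCT.sub("", folded.lower())


# Both spellings now normalise to the same string.
print(normalise_sketch("Xavier D\u2019Haultfœuille"))  # xavier d'haultfœuille
print(normalise_sketch("Xavier D'Haultfœuille"))       # xavier d'haultfœuille
```

Folding before the punctuation strip matters: once the variants are ASCII, the existing character class keeps them, so no change to the regex itself is needed in this sketch.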
Diff content not captured. The line numbers show new lines 265-279 added (15 lines), original line 277 replaced by new lines 292-294, and new line 296 added.