Skip to content

Escape command name in _remove_command regex#123

Open
Nas01010101 wants to merge 1 commit into
google-research:mainfrom
Nas01010101:fix/remove-command-escape-regex
Open

Escape command name in _remove_command regex#123
Nas01010101 wants to merge 1 commit into
google-research:mainfrom
Nas01010101:fix/remove-command-escape-regex

Conversation

@Nas01010101

Copy link
Copy Markdown

Problem

_remove_command interpolates the user-supplied command directly into a regex base_pattern:

base_pattern = (
    r'\\'
    + command
    + r'(?:\[(?:.*?)\])*\{((?:[^{}]+|\{(?1)\})*)\}(?:\[(?:.*?)\])*'
)

So a command name containing regex metacharacters is matched as a pattern rather than literally. For example, with commands_to_delete=['todo.']:

  • . matches any character, so \todoX{...} is wrongly removed alongside \todo.{...} (over-removal).
  • Quantifiers like */+ in a name make the pattern match the wrong span (or fail to match), so the targeted command may not be removed at all (under-removal).

Fix

Wrap the name with regex.escape() so it is matched literally:

    + regex.escape(command)

This is consistent with how the rest of this module already builds patterns from
dynamic strings (e.g. the filename handling in _replace_includegraphics, which
uses regex.escape in several places).

commands_to_delete / commands_only_to_delete are documented as literal command
names (CLI --help and cleaner_config.yaml), so literal matching is the intended
behavior; this does not change handling of normal alphabetic command names.

Tests

Adds a parameterized regression test (test_remove_command_escapes_regex_metacharacters_in_name)
covering ., *, and + in command names. Each case asserts the look-alike command is
preserved and the targeted command is removed. The cases fail before this change and pass after;
the full unit-test suite remains green.

`_remove_command` interpolated the user-supplied `command` directly into a
regex pattern, so a command name containing regex metacharacters was matched
as a pattern rather than literally. For example, deleting a command named
`todo.` would also strip `\todoX{...}` because `.` matches any character.

Wrap the name with `regex.escape()`, consistent with the other pattern
constructions in this module (e.g. the filename handling in
`_replace_includegraphics`). Add parameterized regression tests covering `.`,
`*`, and `+` in command names; they fail before this change and pass after.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant