[Bug] TypeTag parser lexer rejects identifiers with leading underscore — divergence from Identifier::is_valid #19753

Description

@Esorat

🐛 Bug

The TypeTag string parser lexer (third_party/move/move-core/types/src/parser.rs) rejects identifiers starting with _ (underscore) or $, while Identifier::is_valid accepts them. This creates a Display→Parse roundtrip divergence: a valid StructTag with a _-prefixed module/struct name displays correctly, but the resulting string cannot be parsed back.

To reproduce

use move_core_types::{
    account_address::AccountAddress,
    identifier::Identifier,
    language_storage::{StructTag, TypeTag},
};
use std::str::FromStr;

// Identifier with leading underscore is valid per is_valid()
let module = Identifier::new("_test").unwrap();

let tag = TypeTag::Struct(Box::new(StructTag {
    address: AccountAddress::ONE,
    module,
    name: Identifier::new("S").unwrap(),
    type_params: vec![],
}));

// BCS roundtrip — works fine
let bcs_bytes = bcs::to_bytes(&tag).unwrap();
let from_bcs: TypeTag = bcs::from_bytes(&bcs_bytes).unwrap();
assert_eq!(tag, from_bcs);

// Display — produces valid-looking string
let display = tag.to_string();
// => "0x00000000000000000000000000000001::_test::S"

// Parse — FAILS
TypeTag::from_str(&display).unwrap_err();
// Error: "unrecognized token"

Root cause

identifier.rs:92 accepts _-prefixed (and $-prefixed) identifiers:

pub const fn is_valid(s: &str) -> bool {
    let b = s.as_bytes();
    match b {
        [b'a'..=b'z', ..] | [b'A'..=b'Z', ..] => all_bytes_valid(b, 1),
        [b'_', ..] | [b'$', ..] if b.len() > 1 => all_bytes_valid(b, 1),
        _ => false,
    }
}

parser.rs:181 — lexer only checks is_ascii_alphabetic() for the first character:

c if c.is_ascii_alphabetic() => {  // '_' and '$' are not alphabetic
    let mut r = String::new();
    r.push(c);
    for c in it {
        if identifier::is_valid_identifier_char(c) { ... }
    }
    (name_token(r), len)
},

The inner loop correctly uses is_valid_identifier_char (allows _ and $), but the first-character check is stricter. The Move compiler (move-compiler-v2/.../lexer.rs:499) also allows _ as a valid identifier start character, confirming the parser is out of sync.

The is_valid code was updated to accept _/$ prefixes, but the parser lexer was not updated to match.
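
The mismatch can be modelled without the Move crates. The following is a minimal sketch, not the actual library code: is_valid and is_valid_identifier_char are simplified stand-ins for the functions quoted above (the exact character set is an assumption), and lexer_accepts_start and divergent are hypothetical helpers introduced only for illustration.

```rust
// Stand-in for identifier::is_valid_identifier_char (character set assumed).
fn is_valid_identifier_char(c: char) -> bool {
    matches!(c, '_' | '$' | 'a'..='z' | 'A'..='Z' | '0'..='9')
}

// Simplified model of Identifier::is_valid as quoted in this issue.
fn is_valid(s: &str) -> bool {
    let mut chars = s.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => return false,
    };
    let rest_ok = chars.all(is_valid_identifier_char);
    match first {
        'a'..='z' | 'A'..='Z' => rest_ok,
        // A lone "_" or "$" is rejected; a longer name may start with either.
        '_' | '$' => s.len() > 1 && rest_ok,
        _ => false,
    }
}

// The first-character test the lexer applies today.
fn lexer_accepts_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

// Hypothetical helper: names that is_valid accepts but whose first
// character the lexer refuses to start a name token with.
fn divergent<'a>(candidates: &[&'a str]) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|s| is_valid(s) && !s.chars().next().map_or(false, lexer_accepts_start))
        .collect()
}
```

Under these assumptions, divergent(&["S", "_test", "$x", "_", "1abc"]) keeps only "_test" and "$x": both are valid identifiers that the lexer cannot begin tokenizing.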

Expected fix

c if c.is_ascii_alphabetic() || c == '_' || c == '$' => {
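
To show the effect of the widened first-character check in isolation, here is a rough, self-contained sketch of a name scanner. scan_name and is_ident_char are hypothetical names that do not exist in parser.rs; is_ident_char stands in for identifier::is_valid_identifier_char with an assumed character set, and the real lexer handles many more token kinds.

```rust
// Stand-in for identifier::is_valid_identifier_char (character set assumed).
fn is_ident_char(c: char) -> bool {
    matches!(c, '_' | '$' | 'a'..='z' | 'A'..='Z' | '0'..='9')
}

// Hypothetical helper: scan a name from the front of `input`.
// Returns the name and its length in bytes, or None if `input`
// does not begin with a valid name-start character.
fn scan_name(input: &str) -> Option<(String, usize)> {
    let mut it = input.chars();
    let c = it.next()?;
    // Widened first-character check: '_' and '$' may also start a name.
    if !(c.is_ascii_alphabetic() || c == '_' || c == '$') {
        return None;
    }
    let mut r = String::new();
    r.push(c);
    // Remaining characters use the same predicate as the inner loop above.
    for c in it {
        if is_ident_char(c) {
            r.push(c);
        } else {
            break;
        }
    }
    let len = r.len();
    Some((r, len))
}
```

With this check, scan_name("_test::S") yields the name "_test" with length 5. One thing to watch: a lone "_" is also scanned as a name here, whereas is_valid rejects single-character "_" and "$", so the identifier validation that runs after lexing would still be needed to reject it.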

Affected files

  • third_party/move/move-core/types/src/parser.rs — first-character check
  • The same first-character check is also present in the archived move-language/move upstream
