1 | 1 | # llmcc-rust |
2 | 2 |
3 | | -This crate provides Rust language support for the **llmcc** project. It implements the language-specific logic for parsing, symbol collection, and semantic analysis (binding) of Rust code. |
4 | | - |
5 | | -## Overview |
6 | | - |
7 | | -`llmcc-rust` integrates with the core compiler infrastructure to provide: |
8 | | -- **Parsing**: Uses `tree-sitter-rust` to generate an AST. |
9 | | -- **Symbol Collection**: A first pass to declare all symbols (structs, functions, variables) in the scope graph. |
10 | | -- **Symbol Binding**: A second pass to resolve references, infer types, and build the dependency graph. |
11 | | - |
12 | | -## Architecture |
13 | | - |
14 | | -The analysis pipeline consists of three main stages, orchestrated by the `LangRust` implementation in `src/token.rs`. |
15 | | - |
16 | | -### 1. Parsing (`src/token.rs`) |
17 | | -The entry point for the crate. It defines the `LangRust` struct which implements the `LanguageImpl`. |
18 | | -- Wraps `tree-sitter-rust` to produce a concrete syntax tree. |
19 | | -- Maps Tree-sitter nodes to LLMCC's internal HIR (High-level Intermediate Representation). |
20 | | -- Auto-generates token definitions via `build.rs`. |
21 | | - |
22 | | -### 2. Symbol Collection (`src/collect.rs`) |
23 | | -The **Collection Pass** walks the AST to identify and declare definitions. |
24 | | -- **Visitor**: `CollectorVisitor` traverses the AST. |
25 | | -- **Scopes**: Creates scopes for modules, functions, structs, and blocks. |
26 | | -- **Declarations**: Registers symbols for: |
27 | | - - Primitives (`i32`, `bool`, etc.) |
28 | | - - Modules and Crates (parsing `Cargo.toml` via `src/util.rs`) |
29 | | - - Functions, Structs, Enums, Traits |
30 | | - - Variables (via pattern matching in `let` bindings and parameters) |
31 | | -- **Visibility**: Handles `pub` and `pub(crate)` modifiers to determine global symbol visibility. |
32 | | - |
33 | | -### 3. Symbol Binding (`src/bind/`) |
34 | | -The **Binding Pass** resolves identifiers to their definitions and builds the call graph. This module is split into focused components: |
35 | | - |
36 | | -- **Visitor (`src/bind/visitor.rs`)**: The main driver, `BinderVisitor`, walks the AST again. |
37 | | -- **Resolution (`src/bind/resolution.rs`)**: `SymbolResolver` handles complex name lookups, including: |
38 | | - - Lexical scoping (variables). |
39 | | - - Path resolution (`std::collections::HashMap`). |
40 | | - - Method resolution (looking up methods in `impl` blocks). |
41 | | -- **Inference (`src/bind/inference.rs`)**: `ExprResolver` determines the types of expressions to support accurate method resolution. |
42 | | -- **Linking (`src/bind/linker.rs`)**: `SymbolLinker` connects usage sites to definition sites, forming the dependency graph used by downstream LLM tasks. |
43 | | - |
44 | | -### Utilities (`src/util.rs`) |
45 | | -Helper functions for filesystem and project structure analysis: |
46 | | -- `parse_crate_name`: Extracts crate names from `Cargo.toml`. |
47 | | -- `parse_module_name`: Handles Rust's module system conventions (e.g., `mod.rs`). |
| 3 | +Rust language support for llmcc. |
48 | 4 |
49 | | -## Development |
| 5 | +The public API is intentionally small: use `LangRust` with the generic APIs from `llmcc-core` and `llmcc-resolver`. The collector, binder, inference, and pattern helpers are implementation details of the language adapter. |
| 6 | + |
| 7 | +## Pipeline |
| 8 | + |
| 9 | +`LangRust` implements the core language contract in `src/token.rs`: |
| 10 | + |
| 11 | +- parses Rust source with `tree-sitter-rust` |
| 12 | +- maps tree-sitter nodes and fields to llmcc HIR/block kinds from `src/token_map.toml` |
| 13 | +- creates Rust primitive symbols in the initial global scope |
| 14 | +- dispatches symbol collection and binding to the internal passes |
| 15 | + |
| 16 | +The internal passes are split by responsibility: |
50 | 17 |
51 | | -### Testing |
52 | | -The crate includes extensive unit tests ensuring correct symbol resolution and dependency tracking. |
| 18 | +- `collect.rs`: declares Rust symbols and attaches lexical/semantic scopes |
| 19 | +- `bind.rs`: resolves references, associates symbols with types, and records graph-relevant relationships |
| 20 | +- `infer.rs`: infers local expression/type symbols needed by binding |
| 21 | +- `pattern.rs`: propagates known types through Rust binding patterns |
| 22 | + |
| 23 | +## Conventions |
| 24 | + |
| 25 | +- Keep Rust-specific syntax decisions in this crate, not in `llmcc-core` or `llmcc-resolver`. |
| 26 | +- Prefer collection-time publication of global symbols; binding may run per unit in parallel. |
| 27 | +- Avoid panics for recoverable HIR shape drift. Skip or warn when a tree-sitter node is not shaped as expected. |
| 28 | +- Add token-map entries before implementing visitors for new Rust syntax. |
| 29 | + |
| 30 | +## Development |
53 | 31 |
54 | 32 | ```bash |
55 | | -# Run all tests for this crate |
56 | 33 | cargo test -p llmcc-rust |
| 34 | +cargo clippy -p llmcc-rust --all-targets -- -D warnings |
57 | 35 | ``` |
58 | | - |
59 | | -### Adding New Features |
60 | | -1. **New Syntax**: Update `src/token.rs` (or the build script) if new token types are needed. |
61 | | -2. **New Declarations**: Update `CollectorVisitor` in `src/collect.rs` to register new symbol kinds. |
62 | | -3. **New Resolution Logic**: Update `src/bind/` modules to handle new scoping rules or reference types. |