| 1 | +# Repo-Arch: Local Model Training Report |
| 2 | + |
| 3 | +**Project**: tradingview-mcp-server |
| 4 | +**Date**: 2026-05-14 |
| 5 | +**Run summary**: 47 commits mined | 18 cards | 61 training examples | 200 LoRA iterations |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## 1. Data Mining Results |
| 10 | + |
| 11 | +### Git History |
| 12 | +- **47 commits** mined and classified from the full git history |
| 13 | +- Signal types detected: `fix`, `docs`, `test`, `rationale`, `revert` |
| 14 | +- Mining results are cached for fast re-runs |
| 15 | + |
| 16 | +### 18 Insight Cards Generated |
| 17 | + |
| 18 | +| Card | Type | Confidence | Commits | |
| 19 | +|------|------|------------|---------| |
| 20 | +| src/index.ts | repeated-fix | 0.60 | 2 | |
| 21 | +| src/index.ts ↔ src/tools/screen.ts | co-change | 0.60 | 5 | |
| 22 | +| src/api/types.ts ↔ src/index.ts | co-change | 0.55 | 5 | |
| 23 | +| src/resources/presets.ts | test-gap | 0.52 | 4 | |
| 24 | +| src/index.ts | churn-hotspot | 0.51 | 5 | |
| 25 | +| src/api/types.ts ↔ src/tools/fields.ts | co-change | 0.50 | 4 | |
| 26 | +| src/tools/screen.ts | churn-hotspot | 0.49 | 5 | |
| 27 | +| src/tools/fields.ts | test-gap | 0.49 | 3 | |
| 28 | +| src/tools/screen.ts | test-gap | 0.49 | 3 | |
| 29 | +| package-lock.json | repeated-fix | 0.70 | 3 | |
| 30 | +| README.md | test-gap | 0.70 | 5 | |
| 31 | +| .github/workflows/npm-publish.yml | repeated-fix | 0.60 | 2 | |
| 32 | +| README.md | churn-hotspot | 0.57 | 5 | |
| 33 | +| .gitignore ↔ README.md | co-change | 0.45 | 3 | |
| 34 | +| .claude/commands/run-screener.md | repeated-fix | 0.60 | 2 | |
| 35 | +| src/resources/presets.ts | churn-hotspot | 0.47 | 5 | |
| 36 | +| README.md ↔ package.json | co-change | 0.45 | 3 | |
| 37 | +| .claude/commands/run-screener.md | churn-hotspot | 0.44 | 4 | |
| 38 | + |
| 39 | +**8 source-code cards accepted** for training. |
| 40 | + |
| 41 | +### Key Architectural Insights |
| 42 | + |
| 43 | +1. **src/index.ts** is the highest-churn source file (10 changes in 5 commits) with repeated fixes — the MCP server entry point |
| 44 | +2. **src/tools/screen.ts** is also high-churn (8 changes) and tightly coupled to `src/index.ts` (co-change cluster) |
| 45 | +3. **src/api/types.ts** co-changes with both `src/index.ts` and `src/tools/fields.ts` — the type system bridges the entry point and tool implementations |
| 46 | +4. **src/resources/presets.ts** has test-gap warnings and high churn (7 changes) — likely due to evolving screening strategies |
| 47 | +5. **src/tools/fields.ts** has a test gap — field definitions changed without test updates |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## 2. Retrieval Benchmark (Eval) |
| 52 | + |
| 53 | +| Strategy | Hit Rate | Queries Hit | |
| 54 | +|----------|----------|-------------| |
| 55 | +| **Keyword** | **93.8%** | 15/16 | |
| 56 | +| Embedding | 75.0% | 12/16 | |
| 57 | + |
| 57 | +> **Best strategy**: Keyword search outperforms embedding search for this small codebase (18 cards, 16 queries). With a much larger card set (hundreds or more), embedding search might pull ahead, but that was not measured here. |
| 59 | +
|
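The hit rates in the table follow directly from the hit counts. A minimal sketch of that arithmetic, assuming one boolean outcome per benchmark query (the `QueryResult` shape and the helper names are hypothetical, not part of repo-arch):

```typescript
// Hypothetical shape: one record per benchmark query.
interface QueryResult {
  query: string;
  hit: boolean; // true when the expected card appeared in the results
}

// Hit rate as a percentage, rounded to one decimal place.
function hitRate(results: QueryResult[]): number {
  if (results.length === 0) return 0;
  const hits = results.filter((r) => r.hit).length;
  return Math.round((hits / results.length) * 1000) / 10;
}

// Build placeholder results: `hits` successes out of `total` queries.
function makeResults(hits: number, total: number): QueryResult[] {
  const out: QueryResult[] = [];
  for (let i = 0; i < total; i++) {
    out.push({ query: "q" + i, hit: i < hits });
  }
  return out;
}

const keywordRate = hitRate(makeResults(15, 16)); // 93.8
const embeddingRate = hitRate(makeResults(12, 16)); // 75
```

With only 16 queries, each query moves the hit rate by 6.25 percentage points, so the gap between the two strategies corresponds to three queries.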
| 60 | +--- |
| 61 | + |
| 62 | +## 3. Training Dataset |
| 63 | + |
| 64 | +**61 examples** across 4 categories: |
| 65 | + |
| 66 | +| Category | Count | Description | |
| 67 | +|----------|-------|-------------| |
| 68 | +| QA | 10 | Q&A pairs about project-specific risks | |
| 69 | +| Review Warning | 2 | Diff review warnings from card history | |
| 70 | +| Risk Classification | 12 | Classify files as safe/risky based on history | |
| 71 | +| Negative | 37 | "No historical warnings" for unknown files | |
| 72 | + |
| 73 | +### Dataset Format |
| 74 | +```json |
| 75 | +{ |
| 76 | + "messages": [ |
| 77 | + {"role": "user", "content": "What keeps breaking in src/index.ts?"}, |
| 78 | + {"role": "assistant", "content": "Repeated fixes in: src/index.ts. This file was fixed 2 times."} |
| 79 | + ] |
| 80 | +} |
| 81 | +``` |
| 82 | + |
| 83 | +### Dataset Distribution |
| 84 | +```json |
| 85 | +{ |
| 86 | + "negative": 37, |
| 87 | + "qa": 10, |
| 88 | + "review-warning": 2, |
| 89 | + "risk-classification": 12 |
| 90 | +} |
| 91 | +``` |
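To make the class balance explicit, the sketch below computes category shares from the counts above and checks one example against the chat format. The validation rule (exactly one user turn followed by one assistant turn, both non-empty) is an assumption drawn from the single sample shown, not a documented repo-arch constraint.

```typescript
// Chat-format example, as shown under "Dataset Format".
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}
interface TrainingExample {
  messages: ChatMessage[];
}

// Category counts copied from the distribution above.
const distribution: { [category: string]: number } = {
  negative: 37,
  qa: 10,
  "review-warning": 2,
  "risk-classification": 12,
};

// Sum of all category counts.
function totalExamples(counts: { [category: string]: number }): number {
  let sum = 0;
  for (const key of Object.keys(counts)) sum += counts[key];
  return sum;
}

// Share of one category as a percentage, rounded to one decimal place.
function sharePct(counts: { [category: string]: number }, category: string): number {
  return Math.round((counts[category] / totalExamples(counts)) * 1000) / 10;
}

// Assumed rule: a user turn followed by an assistant turn, both non-empty.
function isValidExample(ex: TrainingExample): boolean {
  return (
    ex.messages.length === 2 &&
    ex.messages[0].role === "user" &&
    ex.messages[1].role === "assistant" &&
    ex.messages[0].content.length > 0 &&
    ex.messages[1].content.length > 0
  );
}

const sample: TrainingExample = {
  messages: [
    { role: "user", content: "What keeps breaking in src/index.ts?" },
    { role: "assistant", content: "Repeated fixes in: src/index.ts. This file was fixed 2 times." },
  ],
};

const datasetTotal = totalExamples(distribution); // 61
const negativeShare = sharePct(distribution, "negative"); // 60.7
const sampleIsValid = isValidExample(sample); // true
```

Negative examples make up roughly 61% of the dataset, which is worth keeping in mind when reading the inference examples that return "No historical warnings found."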
| 92 | + |
| 93 | +--- |
| 94 | + |
| 95 | +## 4. Local Model Training |
| 96 | + |
| 97 | +### Configuration |
| 98 | +- **Base Model**: Qwen/Qwen2.5-Coder-1.5B-Instruct (1.5B params) |
| 99 | +- **Method**: LoRA (Low-Rank Adaptation) |
| 100 | +- **Layers**: 4 trainable |
| 101 | +- **Trainable params**: 0.085% (1.319M / 1.54B) |
| 102 | +- **Learning rate**: 1e-5 |
| 103 | +- **Batch size**: 4 |
| 104 | +- **Total iterations**: 200 (2 cycles of 100) |
| 105 | +- **Trained tokens**: ~25K per cycle |
| 106 | +- **Peak memory**: 4.4 GB (Apple Silicon) |
| 107 | + |
| 108 | +### Loss Convergence |
| 109 | + |
| 110 | +| Iteration | Train Loss | Val Loss | |
| 111 | +|-----------|------------|----------| |
| 112 | +| 1 | — | 6.439 | |
| 113 | +| 10 | 4.831 | 3.958 | |
| 114 | +| 50 | 0.774 | 0.427 | |
| 115 | +| 100 | 0.195 | 0.136 | |
| 116 | +| 200 | 0.085 | 0.192 | |
| 117 | + |
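| 118 | +**Convergence quality**: Training loss fell from 4.831 (iteration 10) to 0.085 (iteration 200), a reduction of about 98%. Validation loss tracked training loss through iteration 100 (0.136), then rose to 0.192 by iteration 200 while training loss kept falling, which is an early sign of overfitting in the second cycle. |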
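The loss table can be checked programmatically. The sketch below, using only the five rows reported above, finds the iteration with the lowest validation loss and the relative drop in training loss between the first and last reported values. The `LossRow` type and function names are illustrative, not part of the training tooling.

```typescript
// One row of the loss table; train loss is not reported at iteration 1.
interface LossRow {
  iteration: number;
  train: number | null;
  val: number;
}

// Values copied from the "Loss Convergence" table.
const lossRows: LossRow[] = [
  { iteration: 1, train: null, val: 6.439 },
  { iteration: 10, train: 4.831, val: 3.958 },
  { iteration: 50, train: 0.774, val: 0.427 },
  { iteration: 100, train: 0.195, val: 0.136 },
  { iteration: 200, train: 0.085, val: 0.192 },
];

// Row with the lowest validation loss.
function bestValRow(rows: LossRow[]): LossRow {
  let best = rows[0];
  for (const r of rows) {
    if (r.val < best.val) best = r;
  }
  return best;
}

// Percentage drop in training loss from the first to the last reported value.
function trainReductionPct(rows: LossRow[]): number {
  const trained: number[] = [];
  for (const r of rows) {
    if (r.train !== null) trained.push(r.train);
  }
  if (trained.length < 2) return 0;
  const first = trained[0];
  const last = trained[trained.length - 1];
  return Math.round((1 - last / first) * 1000) / 10;
}

const bestIteration = bestValRow(lossRows).iteration; // 100
const reduction = trainReductionPct(lossRows); // 98.2
```

On these rows the validation minimum falls at iteration 100, which corresponds to the cycle-1 checkpoint listed under "Adapter Weights".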
| 119 | + |
| 120 | +### Adapter Weights |
| 121 | +``` |
| 122 | +.repo-arch/adapters/repo-arch-97d42df/ |
| 123 | +├── adapters.safetensors (current) |
| 124 | +├── 0000100_adapters.safetensors (cycle-1) |
| 125 | +└── adapter_config.json |
| 126 | +``` |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## 5. Model Inference Examples |
| 131 | + |
| 132 | +### Example 1: Known risk identified ✅ |
| 133 | +``` |
| 134 | +Q: What keeps breaking in src/index.ts? |
| 135 | +A: Repeated fixes in: src/index.ts. This file was fixed 2 times. |
| 136 | + Consider adding regression tests or a deeper refactor to address root cause. |
| 137 | +``` |
| 138 | + |
| 139 | +### Example 2: Different file, risk flagged (the card table lists no repeated-fix card for this file) ⚠️ |
| 140 | +``` |
| 141 | +Q: What risks should I know about when changing src/tools/screen.ts? |
| 142 | +A: Repeated fixes in: src/tools/screen.ts. This file was fixed 2 times. |
| 143 | + Consider adding regression tests or a deeper refactor to address root cause. |
| 144 | +``` |
| 145 | + |
| 146 | +### Example 3: Unknown file, safe default ✅ |
| 147 | +``` |
| 148 | +Q: Review this change touching packages/unknown/pkg.ts. |
| 149 | +A: No historical warnings found. Standard review applies. |
| 150 | +``` |
| 151 | + |
| 152 | +### Example 4: Test-gap query (test gap not surfaced) ⚠️ |
| 153 | +``` |
| 154 | +Q: Is there test coverage for src/tools/fields.ts? |
| 155 | +A: No historical warnings found. Standard review applies. |
| 156 | +``` |
| 157 | +> Note: A test-gap card exists for src/tools/fields.ts, but the model surfaces only fix-related history, so the gap is not reported. |
| 158 | +> The answer is consistent with the absence of a repeated-fix card for this file, but it does not answer the test-coverage question. |
| 159 | + |
| 160 | +--- |
| 161 | + |
| 162 | +## 6. Pi Agent Integration Hooks |
| 163 | + |
| 164 | +### Extension: `/Users/pavel/repos/fiale-plus/tradingview-mcp-server/pi/extensions/repo-arch.ts` |
| 165 | + |
| 166 | +The pi extension registers a `repo-arch` command with argument completions and workflow selection UI: |
| 167 | + |
| 168 | +```typescript |
| 169 | +pi.registerCommand('repo-arch', { |
| 170 | + description: 'Show the self-contained repo-arch CLI workflow', |
| 171 | + getArgumentCompletions: (prefix) => { /* init, flow, review, train, eval, why, similar */ }, |
| 172 | + handler: async (_args, ctx) => { |
| 173 | + // Interactive workflow: select → run command via node child_process |
| 174 | + }, |
| 175 | +}); |
| 176 | +``` |
| 177 | + |
| 178 | +### Context Pack Module (`context-pack.ts`) |
| 179 | + |
| 180 | +Three context injection modes for pi agent hooks: |
| 181 | + |
| 182 | +| Function | Provides | Use Case | |
| 183 | +|------|----------|----------| |
| 184 | +| `whyContextPack` | Cards + signals for one file | "Why should I be careful modifying X?" | |
| 185 | +| `diffContextPack` | Changed files + regression warnings | "Check this diff for risks" | |
| 186 | +| `cardsContextPack` | All cards + metadata | "What does the project's history say?" | |
| 187 | + |
| 188 | +### How to use in pi agent flows |
| 189 | + |
| 190 | +**Before editing a file** — load context: |
| 191 | +```typescript |
| 192 | +const context = repoArch.whyContextPack("src/index.ts", cards, commitCount, signals, []); |
| 193 | +// Injects: fix count, co-change partners, signal summary, and relevant cards |
| 194 | +``` |
| 195 | + |
| 196 | +**Before code review** — load diff context: |
| 197 | +```typescript |
| 198 | +const context = repoArch.diffContextPack("main", "HEAD", changedFiles, warnings); |
| 199 | +// Injects: which files changed, what regression patterns match, risk level per file |
| 200 | +``` |
| 201 | + |
| 202 | +**At session start** — load project memory: |
| 203 | +```typescript |
| 204 | +const context = repoArch.cardsContextPack("all cards", cards, headSha, false); |
| 205 | +// Injects: full card set as project-memory preamble |
| 206 | +``` |
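To illustrate what a file-scoped context pack might assemble, here is a minimal sketch that selects the cards mentioning one file and renders them as a short preamble. This is not the `whyContextPack` implementation: the `InsightCard` shape, `cardsForFile`, and `renderPreamble` are all hypothetical, and the sample data is a subset of the card table in section 1.

```typescript
// Hypothetical card shape; the real repo-arch card schema is not shown in this report.
interface InsightCard {
  files: string[];
  type: "repeated-fix" | "co-change" | "test-gap" | "churn-hotspot";
  confidence: number;
  commits: number;
}

// A few rows copied from the card table in section 1.
const sampleCards: InsightCard[] = [
  { files: ["src/index.ts"], type: "repeated-fix", confidence: 0.6, commits: 2 },
  { files: ["src/index.ts", "src/tools/screen.ts"], type: "co-change", confidence: 0.6, commits: 5 },
  { files: ["src/api/types.ts", "src/index.ts"], type: "co-change", confidence: 0.55, commits: 5 },
  { files: ["src/resources/presets.ts"], type: "test-gap", confidence: 0.52, commits: 4 },
  { files: ["src/index.ts"], type: "churn-hotspot", confidence: 0.51, commits: 5 },
];

// Cards that mention the file, highest confidence first.
// filter() returns a new array, so sorting does not mutate the input.
function cardsForFile(file: string, cards: InsightCard[]): InsightCard[] {
  return cards
    .filter((c) => c.files.indexOf(file) !== -1)
    .sort((a, b) => b.confidence - a.confidence);
}

// Render the selected cards as plain-text lines for prompt injection.
function renderPreamble(file: string, cards: InsightCard[]): string {
  const lines = ["History for " + file + ":"];
  for (const c of cardsForFile(file, cards)) {
    lines.push(
      "- " + c.type + " (" + c.files.join(" <-> ") + "), confidence " +
        c.confidence.toFixed(2) + ", " + c.commits + " commits"
    );
  }
  return lines.join("\n");
}

const indexCards = cardsForFile("src/index.ts", sampleCards);
const preamble = renderPreamble("src/index.ts", sampleCards);
```

Sorting by confidence keeps the strongest signals at the top of the preamble, which matters if the injected context is truncated to fit a token budget.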
| 207 | + |
| 208 | +--- |
| 209 | + |
| 210 | +## 7. Commands to Re-run |
| 211 | + |
| 212 | +```bash |
| 213 | +# Full pipeline (from scratch) |
| 214 | +repo-arch flow run full --repo . |
| 215 | + |
| 216 | +# Quick update (if git history changed) |
| 217 | +repo-arch flow run --repo . |
| 218 | + |
| 219 | +# Continue training |
| 220 | +repo-arch train cycle --repo . |
| 221 | + |
| 222 | +# Resume from latest checkpoint |
| 223 | +repo-arch train resume --repo . |
| 224 | + |
| 225 | +# Inspect current state |
| 226 | +repo-arch flow inspect --repo . |
| 227 | +repo-arch train status --repo . |
| 228 | + |
| 229 | +# Semantic search over project history |
| 230 | +repo-arch similar "what breaks in the MCP server?" --json |
| 231 | + |
| 232 | +# Explain a file |
| 233 | +repo-arch why src/index.ts --json |
| 234 | +``` |