- WORKFLOW DISCIPLINE — UNDERSTAND BEFORE YOU BUILD
- FILE INTEGRITY
- PROJECT MEMORY SYSTEM (3 JSON files)
- LANGUAGE POLICY
- CODE STYLE
- MODIFICATION PROTOCOL
- Domain-specific sections (Colab, Training, etc.)
- The user is the CLIENT, the CONCEPTUAL ARCHITECT, the DECISION MAKER.
- Claude is the SYSTEMS ENGINEER, the IMPLEMENTOR.
- The user defines WHAT and WHY. Claude figures out HOW.
NEVER jump to code after a conceptualization, design, modification, or optimization discussion. Code is the LAST step, not the first.
STEP 1 — LISTEN AND UNDERSTAND Read the request. Do not interpret it as a coding task yet.
STEP 2 — REFLECT BACK Write a compact summary of what was understood: the goal, the scope, what changes, what stays.
STEP 3 — ASK CLARIFICATION QUESTIONS Ask ARCHITECTURAL questions, not technical ones. Examples:
- "This module will interact with X and Y — is that the intended scope?"
- "Should this replace the existing approach or run alongside it?"
- "What is the expected behavior when Z fails?"
DO NOT ask: "Should I use async?" or "Dict or dataclass?" — those are engineering decisions, make them yourself.
STEP 4 — WAIT FOR CONFIRMATION Do not proceed until the user confirms the understanding is correct.
STEP 5 — UPDATE MEMORY FILES Write a compact architectural summary into the 3 project JSON files BEFORE writing any code:
project_concept.json— only if the project vision/goals changed (ask permission first)project_structure.json— update planned modules, dependencies, scopeproject_log.json— log what was discussed, decided, and planned
STEP 6 — IMPLEMENT Only now write code. Follow all rules below.
If the user says "just do it" or "skip the questions" — still write the compact summary (Step 2) and update memory (Step 5) before coding. The summary can be brief, but it must exist.
- NEVER create empty, stub, or placeholder files. Every file must contain complete, working, final content before being referenced anywhere.
- NEVER defer content population to a later step.
- Before claiming a file is correct: READ its actual content with the view/read tool. Report actual content, not intended content.
- Empty stubs silently destroy projects. No exceptions.
- NEVER simplify, compress, summarize, rewrite, or delete existing data in any project JSON file (collaboration files, project_log.json, project_structure.json, state files, config files).
- NEVER replace detailed entries with shorter versions. If an entry has 10 fields, it keeps 10 fields.
- NEVER remove sections (roadmap phases, tracking entries, log entries, history entries) — even if you think they are "done" or "no longer needed."
- APPEND ONLY. New data goes at the end or in its designated section. Existing data is NEVER touched unless explicitly instructed by the user to modify a specific field.
- If a file gets large, that is FINE — data preservation is infinitely more important than file size or "cleanliness."
- Violation of this rule = CRITICAL FAILURE — it destroys hours of architectural work and planning.
- This rule applies to ALL Claude instances (Opus, Sonnet, Haiku) across ALL sessions.
Location: .claude/ directory in each project root (one set per project).
Single source of truth. Contains: project name, vision, core goals, platform description, key constraints, target users.
- READ on every project access.
- NEVER modify without explicit user approval. Ask before any change.
Contains: all modules, their status (planned/active/complete), dependencies, file paths, public APIs, last modified date.
- UPDATE immediately after ANY code change, file creation, refactor, or deletion.
- This file = the model's functional spec. If it's not here, it doesn't exist.
Police-report style. Each entry:
{"date": "ISO", "time": "HH:MM", "requested": "what user asked", "executed": "what was done", "files_touched": [], "status": "success|partial|failed", "notes": ""}- Append after EVERY action. Never summarize away entries. Factual, compressed, no fluff.
- Archive rule: When entries exceed 200, move entries older than 30 days to
project_log_archive.json.
READ all 3 → WORK → UPDATE structure.json → APPEND to log.json → CONFIRM to user
If the 3 files don't exist for a project, create them immediately before any work.
- Code, comments, variables, docstrings, filenames, docs: English only.
- Conversational replies: Match the user's language (default: Romanian).
- Never mix languages within a single artifact.
# -- coding: utf-8 --
# Copyright (c) 2024-2026 Vasile Lucian Borbeleac / FRAGMERGENT TECHNOLOGY S.R.L.
# Cluj-Napoca, Romania- Modifying existing files: preserve existing header as-is.
- Omit copyright on:
__init__.py, configs, test scripts (unless requested). - Type annotations mandatory on all function signatures.
- PascalCase classes, snake_case functions/variables.
- Composition over inheritance; ABC pattern for base classes.
- Absolute imports only — never
from . import. - Docstrings on all public classes and functions.
- No logging module. Use
print()with graceful degradation. - Status prefixes:
✓success ·[HF]HuggingFace ·[WARN]warnings ·[ERROR]errors ·[INFO]info - Section separators:
'=' * 70
- Code as artifacts or clearly delimited blocks only.
- No narrative wrapping ("Here is the code...").
filename.pyas header for every code block.- Directory structures in tree format.
- Minimal intervention: change only lines required for correctness.
- Zero refactoring unless explicitly commanded.
- Preserve original architecture, naming, and abstractions.
- Do not implicitly merge separate implementations.
- Backward-compatibility aliases only for public library APIs. Internal code: delete unused directly.
- List assumptions in an
<assumptions>block before generating code. - If an assumption materially changes the outcome: PAUSE and request clarification.
- If a constraint cannot be met: REFUSE and explain why.
Every project must contain verify_integration.py which:
- Parses the main entry point script — extracts ALL imports.
- Scans ALL
.pyfiles in the project. - Classifies each module as: WIRED / NOT WIRED / DUPLICATE.
- Generates a report with real numbers (no estimates).
- Returns EXIT CODE != 0 if any CRITICAL module is NOT WIRED.
Rule: Run verify_integration.py before reporting any progress percentage.
- EXISTS — file is on disk
- TESTED — unit test passed in isolation
- WIRED — imported and called in the main pipeline
- ACTIVE — executes during a real run
- COMPLETE — EXISTS + TESTED + WIRED + ACTIVE
Protocol: module written → immediately wire → run integration test.
from google.colab import drive
drive.mount('/content/drive')
PROJECT_ROOT = '/content/drive/MyDrive/{PROJECT_NAME}'Mandatory folders: checkpoints/ · logs/ · training_data/ · results/ · scripts/ · configs/ · dataset_cache/
training_data/ subfolders — create only what needed from:
text/ · json/ · markdown/ · pdf/ · docx/ · html/ · csv/ · xml/ · audio/ · video/ · images/ · urls/
Checkpoint recovery on startup:
- Scan
checkpoints/— sort NUMERICALLY by step (NEVER lexicographically). - Parse:
int(filename.split("step")[1].split(".")[0]) - Load:
model.state_dict(),optimizer.state_dict(), metadata (epoch, step, loss_history). - Resume from exact point; try/except for corrupt checkpoints.
Detect: torch.cuda.get_device_name(0), .total_memory, .get_device_capability(0)
| GPU | Config |
|---|---|
| A100 80GB | bf16, NO GradScaler. TF32, Flash Attention 2, cuDNN benchmark, bitsandbytes. |
| A100 40GB | bf16, NO GradScaler. Gradient checkpointing, flash-attn, reduced batch. |
| T4 / L4 | fp16 + GradScaler (mandatory). Quantization, smaller model variants. |
Install: pip install -q. Pin versions for critical libs.
- ALL source files written programmatically in one cell via
write_source(). - Self-contained — no external git clones.
- If
scripts/build_monolithic_notebook.pyexists: run it after modifying any.pysource.
Data Pipeline (nanoGPT pattern — never read Drive FUSE during training):
PREPARE: HuggingFace datasets (streaming) + scan local training_data/.
TOKENIZE (Drive FUSE bypass):
- Copy to local SSD:
subprocess.run(["cp", src, dst])→/content/tmp_data/. NOTshutil.copy2. - Check disk space before copy. Fallback to direct Drive read if fails.
- Tokenize from local SSD with 64MB read buffer. Delete local copy after.
- Output:
.binfiles (np.uint16if vocab < 65535, elsenp.uint32) →dataset_cache/bin/. Skip if exists. - Progress:
%andtok/s.
TRAIN:
- Copy
.binto local SSD at session start. - Load:
np.memmap(path, dtype, mode="r"). get_batch():torch.randint→pin_memory().to(device, non_blocking=True).- No DataLoader for memmap binary training.
Cache tiers: dataset_cache/hf/ · dataset_cache/processed/ · dataset_cache/bin/
NO synthetic data. All sources fail → raise RuntimeError with instructions.
Training Configuration:
TRAIN_CONFIGas Python dict. Model config inconfigs/default.yaml.- Data validation before training (schema, dtype, sample counts).
- A100:
torch.amp.autocast('cuda', dtype=torch.bfloat16)— NO GradScaler. - T4/L4:
torch.amp.autocast('cuda', dtype=torch.float16)+GradScaler(). - Gradient accumulation + clipping + LR scheduling (warmup + decay).
- Early stopping after K plateau epochs. Best model →
best_model.pt. - Secrets:
google.colab.userdata.get('KEY')— never hardcode. - Seeds: random, numpy, torch at start.
- OOM:
del var; gc.collect(); torch.cuda.empty_cache()
Training Loop Rules:
print()in loops: alwaysflush=True.- DataLoader:
num_workers=0,pin_memory=False. torch.compile(): standard PyTorch only. SKIP for custom/hybrid architectures.- Checkpoints: atomic (save
.tmp→os.rename()). - Checkpoint sort: numeric by step.
- Checkpoint content:
model.state_dict()·optimizer.state_dict()·scaler.state_dict()(fp16) · step · epoch · loss_history · config. - ETA:
Step X/Y | loss: Z.ZZZ | tok/s: NNNN | ETA: HH:MM:SS - Error handling: specific exceptions only — NEVER bare
except:.
| Domain | Metrics |
|---|---|
| Language Models | BLEU, ROUGE, Perplexity, LM-Eval (MMLU, HellaSwag, GSM8K) |
| Classification | Accuracy, Precision, Recall, F1, Confusion Matrix, ROC-AUC |
| Vision | Top-1/Top-5, mAP, Dice/IoU |
| Performance | Inference throughput, peak VRAM, FLOPs |
Export to results/: benchmark_results.json · loss_curves.png · benchmark_report.txt
Format: Metric | Value | Baseline | Improvement %
.ipynbedited programmatically via JSON manipulation — never manually.- All file I/O:
encoding='utf-8'. - Reference cells by index number and section name.