Evaluated: January 28, 2026
Resource Type: Analytical report (copied text, not URL)
Target: Claude Code Ultimate Guide
Evaluator: Claude Sonnet 4.5 via /eval-resource skill
Comprehensive analytical report titled "Analyse Mensuelle des Discussions Communautaires Claude Code - Janvier 2026" covering:
- 7 months of community sentiment tracking (July 2025 - January 2026)
- Top 5 technical problems (token consumption, context window, model quality, performance, GitHub issue bug)
- Top 5 feature requests
- Longitudinal data analysis across GitHub, Reddit, Discord, Twitter
- Recommendations for Claude Code documentation
Claimed Coverage: GitHub (4,697 open issues), Reddit sentiment (28-35/100), Discord discussions, Twitter mentions
Initial Score: 5/5 (Critical - Major gap in guide)
Post-Challenge Score: 3/5 (Relevant - Useful complement)
Post-Fact-Check Score: 2/5 (Marginal - Minimal mention or skip)
Downgrade reasons:
- Major factual errors: Version 2.0.61 doesn't exist (confused with v2.1.1)
- Timing errors: Token bug was January 2026, not December 2025
- Inaccurate or unverifiable stats: 4,697 open issues claimed (actual: 5,702); sentiment scores lack methodology
- Ephemeral data: Monthly community reports become obsolete quickly
- Maintenance burden: Would require monthly updates (unsustainable)
Upgrade reasons:
- ✅ Confirmed critical bugs: GitHub issue auto-creation (Issue #13797), token consumption (Issue #16856)
- ✅ Verified with sources: Anthropic postmortem on Aug 2025 model degradation
- ✅ Actionable workarounds: Practical solutions for users
- ✅ Security impact: Privacy risks from accidental public disclosures
- Perplexity Pro searches (4 queries):
  - Token consumption bug v2.0.61
  - GitHub issues count verification
  - Accidental issue creation bug
  - Model quality degradation August 2025
- GitHub API direct queries:
  - `gh api repos/anthropics/claude-code` → Stats verification
  - `gh search issues` → Bug confirmation, wrong repo issues count
  - `gh issue view` → Specific issue details
  - `gh api releases` → Version existence check
| Claim | Status | Reality |
|---|---|---|
| v2.0.61 token bug (Dec 2025) | ❌ FALSE | v2.0.61 doesn't exist; real bug: v2.1.1 (Jan 2026) |
| 4,697 open issues | ❌ FALSE | 5,702 issues (as of Jan 28, 2026) |
| 263 issues labeled "invalid" | ❌ FALSE | 527 issues with "invalid" label |
| GitHub auto-creation bug | ✅ TRUE | Issue #13797 confirmed, 17+ examples found |
| Token consumption issues | ⚠️ PARTIAL | 20+ reports found, but Anthropic denies official bug |
| Model degradation Aug 2025 | ✅ TRUE | Anthropic official postmortem confirms 3 infrastructure bugs |
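The table above can be read as a small ledger in which only claims that survived verification are carried forward. The following is a minimal sketch of that filtering step, not the evaluator's actual tooling. The names `Status`, `Claim`, and `integrable` are hypothetical, and the rule that PARTIAL claims are kept is an assumption based on the token consumption issue having been integrated with caveats.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    TRUE = "true"
    PARTIAL = "partial"
    FALSE = "false"


@dataclass(frozen=True)
class Claim:
    text: str
    status: Status
    reality: str


# Rows copied from the fact-check table above.
CLAIMS = [
    Claim("v2.0.61 token bug (Dec 2025)", Status.FALSE,
          "v2.0.61 doesn't exist; real bug: v2.1.1 (Jan 2026)"),
    Claim("4,697 open issues", Status.FALSE,
          "5,702 issues (as of Jan 28, 2026)"),
    Claim('263 issues labeled "invalid"', Status.FALSE,
          '527 issues with "invalid" label'),
    Claim("GitHub auto-creation bug", Status.TRUE,
          "Issue #13797 confirmed, 17+ examples found"),
    Claim("Token consumption issues", Status.PARTIAL,
          "20+ reports found, but Anthropic denies official bug"),
    Claim("Model degradation Aug 2025", Status.TRUE,
          "Anthropic official postmortem confirms 3 infrastructure bugs"),
]


def integrable(claims):
    """Keep claims rated TRUE or PARTIAL (assumed rule); drop FALSE ones."""
    return [c for c in claims if c.status is not Status.FALSE]


kept = integrable(CLAIMS)
for claim in kept:
    print(f"{claim.status.name:8} {claim.text}")
```

Under this assumed rule, three of the six claims pass the filter, which matches the three issues that were later integrated into the guide.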
✅ Confirmed:
- Anthropic Postmortem (Sept 17, 2025)
- Issue #13797 - GitHub auto-creation bug
- Issue #16856 - Token consumption v2.1.1
- The Register - Holiday bonus context
❌ Not Found:
- No mention of v2.0.61 in any source
- No public documentation of "263 invalid issues" stat
- No verifiable methodology for "sentiment 28-35/100" score
Report claim:
"Since December 2025 (version 2.0.61), users have reported token consumption of 5-20x normal" (translated from the French original)
Reality:
- v2.0.61 does not exist in GitHub releases (only v2.0.73, v2.0.74, v2.0.76 found)
- Real bug: v2.1.1 (published Jan 7, 2026)
- First report: Issue #16856 on January 8, 2026
- Timing: January 2026, not December 2025
Impact: Critical factual error invalidating major section of report
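The two checks behind this finding (does the claimed version exist, and does the claimed month match the first report) can be sketched as follows. This is an illustrative sketch under stated assumptions: `KNOWN_TAGS` holds only the tags named in this evaluation rather than the full release list, and the helper names `version_exists` and `month_matches` are hypothetical.

```python
from datetime import date

# Illustrative sample only: tags named in this evaluation, not a full release list.
# A real check would load tag names from the repository's releases.
KNOWN_TAGS = frozenset({"v2.0.73", "v2.0.74", "v2.0.76", "v2.1.1"})


def version_exists(tag: str, known_tags: frozenset = KNOWN_TAGS) -> bool:
    """Return True if the claimed version appears among the known release tags."""
    return tag in known_tags


def month_matches(claimed: tuple, first_report: date) -> bool:
    """Return True if the claimed (year, month) equals the first report's month."""
    return claimed == (first_report.year, first_report.month)


release_date = date(2026, 1, 7)  # v2.1.1 published
first_report = date(2026, 1, 8)  # Issue #16856 opened

print(version_exists("v2.0.61"))                # False: the report's version
print(version_exists("v2.1.1"))                 # True: the version in Issue #16856
print(month_matches((2025, 12), first_report))  # False: the report said December 2025
print(first_report >= release_date)             # True: report follows the release
```

Checking the first report date against the release date is a cheap sanity test: a bug report that predates the release it blames would itself be suspect.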
| Metric | Report | Reality (Jan 28) | Variance |
|---|---|---|---|
| Open issues | 4,697 | 5,702 | -1,005 (-17.6%) |
| Issues "invalid" | 263 | 527 | -264 (-50%) |
| Wrong repo issues | 116 (44% of 263) | 17+ confirmed | Overestimated |
Impact: Undermines credibility of statistical analysis
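The Variance column appears to be computed as the reported value minus the actual value, with the percentage taken relative to the actual value. A short sketch that reproduces the first two rows under that assumption (the function name `variance` is hypothetical):

```python
def variance(reported: int, actual: int) -> tuple:
    """Return (absolute difference, percent difference relative to actual)."""
    diff = reported - actual
    return diff, diff / actual * 100


# (reported, actual) pairs taken from the table above.
rows = {
    "Open issues": (4697, 5702),
    'Issues "invalid"': (263, 527),
}

for name, (reported, actual) in rows.items():
    diff, pct = variance(reported, actual)
    print(f"{name}: {diff:+,} ({pct:+.1f}%)")
```

At one decimal place the second row comes out at -50.1%, which the table rounds to -50%.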
Report claim: "Sentiment: 28-35/100 (January 2026)"
Problem:
- No methodology disclosed
- No tool/source specified
- Cannot be independently verified
- Likely manual interpretation without systematic measurement
Impact: Non-scientific claim presented as quantitative data
Section 1: Active Critical Issues
- GitHub Issue Auto-Creation Bug (Issue #13797)
  - Verified with 17+ examples
  - Security/privacy risk documented
  - Workarounds provided
  - Examples of accidental disclosures
- Excessive Token Consumption (Issue #16856, v2.1.1)
  - 20+ reports documented
  - Anthropic response quoted
  - Holiday bonus context clarified
  - Workarounds for users
Section 2: Resolved Historical Issues
- Model Quality Degradation (Aug-Sep 2025)
  - Official Anthropic postmortem linked
  - 3 infrastructure bugs detailed
  - Community theories (quantization) debunked
  - Resolution timeline confirmed
Section 3: Resources
- Issue statistics (verified via GitHub API)
- Tracking commands for users
- Official channels list
- Contributing guidelines
Rejected content:
- Version 2.0.61 references (non-existent)
- December 2025 timing for token bug (incorrect)
- Sentiment scores without methodology
- Unverifiable statistics (4,697 issues, 263 invalid)
- Recommendations for Anthropic (out of scope for user guide)
- Monthly update commitment (unsustainable maintenance)
- guide/known-issues.md (NEW, 285 lines)
  - Comprehensive critical bugs tracker
  - Verified sources only
  - Actionable workarounds
  - Security awareness focus
- guide/README.md (1 line added)
  - Added known-issues.md to table of contents
  - Description: "Critical bugs tracker: security issues, token consumption, verified community reports"
- machine-readable/reference.yaml (4 entries added)
  - known_issues: Main file reference
  - known_issues_github_bug: Line 16 (GitHub auto-creation)
  - known_issues_token_consumption: Line 136 (Token usage)
  - known_issues_model_quality_aug2025: Line 231 (Aug 2025 resolved)
- CHANGELOG.md (16 lines added)
  - Documented integration in [Unreleased] > Added
  - Listed all 3 critical issues
  - Noted fact-checking process
  - Verified stats (5,702 issues, 527 invalid labels)
Benefits for the guide:
- Security awareness: Users warned about GitHub auto-creation bug (privacy risk)
- Cost management: Token consumption workarounds documented
- Trust building: Verified facts only, no speculation
- Historical context: Aug 2025 model degradation explained (resolved)
- Actionable guidance: Practical workarounds, not just problem descriptions
Report strengths:
- Comprehensive multi-platform analysis (GitHub, Reddit, Discord, Twitter)
- Longitudinal tracking (7 months)
- Identified real patterns (GitHub bug, token issues, model degradation)
- Detailed recommendations structure
Report weaknesses:
- Version confusion: Mixed up v2.0.61, v2.0.65, v2.1.1
- Unverified stats: 4,697 issues, sentiment scores lack source
- Timing errors: December vs January for token bug
- No primary sources cited: "Mentions 1,250+" without platform breakdown
- Selection bias: Community discussions over-represent problems
- No control group: No comparison with other tools' issue patterns
For future resource evaluations:
- ✅ Always fact-check claims via Perplexity + direct API queries
- ✅ Verify versions exist before documenting bugs
- ✅ Request methodology for statistical claims
- ✅ Cross-reference dates with release timelines
- ✅ Challenge auto-agents to find flaws before integration
- ❌ Don't trust community reports blindly - verify with official sources
Action Taken: PARTIAL INTEGRATION (verified facts only)
Rationale:
- Report contained valuable findings (3 real bugs verified)
- But also contained critical errors (version confusion, stat errors)
- Integration limited to fact-checked content only
- Rejected speculative/unverifiable claims
Confidence Level: Medium (verified sources exist, but report had errors)
Would Recommend This Resource: ❌ NO (too many factual errors, use primary sources instead)
Better Alternative: Direct GitHub Issues search + Anthropic official communications
This evaluation demonstrates the importance of systematic fact-checking before integrating community-sourced content. Even comprehensive analytical reports can contain:
- Version confusion
- Timing errors
- Unverifiable statistics
- Methodology gaps
Best practice: Treat analytical reports as leads to investigate, not facts to copy. Always verify with:
- Primary sources (GitHub Issues, official docs)
- API queries (GitHub API, not web search)
- Official communications (Anthropic blog, status page)
- Multiple independent sources for controversial claims
Result: Successfully extracted 3 verified critical bugs while filtering out errors, maintaining guide credibility.
Evaluation completed: January 28, 2026
Time invested: ~2 hours (research, fact-checking, integration, documentation)
Token cost: ~130K tokens (Perplexity searches, GitHub queries, document creation)