- Experiment ID: 20260118_071841
- Date Range: January 18, 2026 -- February 24, 2026
- Report Date: February 24, 2026
- Status: Phase 1 Complete (Interim Analysis)
This experiment tested whether Meihua Yishu (梅花易數), a traditional Chinese divination system based on the I Ching, provides statistically significant predictive value for binary outcome markets on Polymarket. Two AI agents -- one using the Meihua skill ("Meihua agent") and one without it ("Control agent") -- independently predicted the outcomes of 100 markets. Of these, 79 markets resolved with verifiable outcomes.
Bottom-line result: The Meihua divination system demonstrated no predictive value. The Meihua agent scored 40/79 (50.6%), statistically indistinguishable from a coin flip. The control agent scored 57/79 (72.2%), but this apparent skill was an artifact of following market consensus (96.2% agreement with the market's implied direction). Market consensus alone would have yielded 73.4% accuracy. When the two agents disagreed (37 cases), the control agent won 27 to the Meihua agent's 10 -- a decisive margin.
Permutation testing found no statistically significant correlation between hexagram properties and outcomes (p-values: 0.6059, 0.9433, 0.9482). The same hexagram-and-line combinations appeared across multiple markets with contradictory outcomes, and the poetic/metaphorical nature of the yao ci (爻辭) text allowed post-hoc rationalization for any result. The experiment strongly suggests that hexagram interpretation, at least as operationalized here, does not carry predictive information about binary real-world events.
The experiment employed a two-agent, paired-comparison architecture:
                     MARKET SELECTOR
      (Automated: Polymarket API + random sampling)
                            |
                +-----------+-----------+
                |                       |
        AGENT A (Control)        AGENT B (Meihua)
        - No Meihua skill        - Full Meihua skill
        - Same market info       - Same market info
        - Reasoning only         - Hexagram casting + 體用 analysis
                |                       |
                +-----------+-----------+
                            |
                        RECORDER
        (Timestamped, SHA-256 hashed predictions)
Both agents received identical market data: title, description, current probability, volume, liquidity, days until resolution, and category. The Meihua agent additionally received pre-computed casting inputs (number-based, text-based, direction-based, measurement-based) and the full Meihua Yishu skill for hexagram interpretation.
| Criterion | Threshold |
|---|---|
| Outcome type | Binary (YES/NO) |
| Minimum volume | $1,000 USD |
| Probability range | 15% -- 85% |
| Resolution timeframe | 3 -- 60 days |
| Resolution type | Objective, verifiable |
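The numeric criteria in the table can be expressed as a simple filter. The following is a minimal sketch, not the experiment's actual selector: the `Market` class and its field names are hypothetical, and treating the thresholds as inclusive is an assumption. The "objective, verifiable" criterion is a judgment call and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Market:
    # Hypothetical record; field names are illustrative only.
    title: str
    is_binary: bool
    volume_usd: float
    yes_probability: float  # 0.0 to 1.0
    days_to_resolution: int


def is_eligible(m: Market) -> bool:
    """Apply the numeric selection thresholds (assumed inclusive)."""
    return (
        m.is_binary
        and m.volume_usd >= 1_000
        and 0.15 <= m.yes_probability <= 0.85
        and 3 <= m.days_to_resolution <= 60
    )


example = Market("Hypothetical market", True, 25_000.0, 0.42, 14)
print(is_eligible(example))  # True
```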
100 markets were selected across two batches:
- Batch 1 (Pilot): 30 markets selected January 18, 2026
- Batch 2 (Phase 1): 70 markets added January 25-28, 2026
The primary casting method was number-based (數字起卦), using market data to derive trigrams:
- Upper trigram: Volume last 2 digits mod 8
- Lower trigram: Word count or trader count mod 8
- Moving line: (upper + lower) mod 6 + 1
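The three rules above can be sketched as follows. This is an illustration under stated assumptions rather than the experiment's code: it uses the traditional 先天 trigram numbering (1 乾 through 8 坤) with a remainder of 0 mapped to 8, and it assumes the moving-line sum uses the two trigram numbers. The report does not state either detail. The function names are hypothetical.

```python
# Traditional 先天 (Earlier Heaven) trigram numbering.
TRIGRAMS = {1: "乾", 2: "兌", 3: "離", 4: "震", 5: "巽", 6: "坎", 7: "艮", 8: "坤"}


def trigram_number(n: int) -> int:
    """Reduce n mod 8; a remainder of 0 counts as 8 (assumed convention)."""
    remainder = n % 8
    return 8 if remainder == 0 else remainder


def cast(volume: int, count: int) -> tuple[str, str, int]:
    """Return (upper trigram, lower trigram, moving line 1-6)."""
    upper = trigram_number(volume % 100)   # last two digits of volume
    lower = trigram_number(count)          # word count or trader count
    moving_line = (upper + lower) % 6 + 1  # formula as stated in the report
    return TRIGRAMS[upper], TRIGRAMS[lower], moving_line


print(cast(123456, 17))  # ('坤', '乾', 4)
```

Because the inputs are arbitrary market statistics, two unrelated markets with the same remainders receive the same hexagram and moving line.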
Phase 2 predictions also used a 取象起卦 (image-based) batch method where a single hexagram reading was cast and applied to interpret multiple questions simultaneously.
- Primary test: McNemar's test for paired binary outcomes
- Secondary: Permutation test for hexagram-outcome correlation
- Effect size: Cohen's h
- Significance threshold: alpha = 0.05 (two-tailed)
- Integrity: SHA-256 hashes of all predictions logged before resolution
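The integrity step can be illustrated with Python's standard `hashlib`. This is a sketch only: the record fields and the canonical JSON serialization are assumptions, since the report does not describe what exactly was hashed.

```python
import hashlib
import json


def prediction_hash(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical prediction record; field names are illustrative.
record = {
    "market": "example-market",
    "agent": "meihua",
    "prediction": "NO",
    "timestamp": "2026-01-18T07:18:41Z",
}
digest = prediction_hash(record)
print(len(digest))  # 64 hex characters
```

Publishing the digest before resolution lets anyone later confirm that the stored prediction was not edited after the outcome became known.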
Markets were resolved using Polymarket's official outcomes. Of the 100 selected markets, 79 resolved during the analysis period; 21 remained open (primarily markets tied to the Academy Awards ceremony, the March Fed meeting, and other longer-dated events).
| Agent | Correct | Total Resolved | Accuracy |
|---|---|---|---|
| Control | 57 | 79 | 72.2% |
| Meihua | 40 | 79 | 50.6% |
| Market consensus (>50% implied direction) | 58 | 79 | 73.4% |
| Random baseline (coin flip) | -- | -- | 50.0% |
The control agent's 72.2% is almost entirely attributable to market-following behavior: it agreed with market consensus (predicting YES when the implied probability was above 50%, NO when below) in 96.2% of cases, and its accuracy is within 1.2 percentage points of the 73.4% that a pure market-consensus strategy would achieve.
The Meihua agent's 50.6% is statistically indistinguishable from chance (50%).
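The consensus baseline and the agreement rate described for the control agent can be computed along these lines. The sketch uses toy data, not the experiment's markets, and it assumes a price of exactly 50% falls to NO, which the report does not specify.

```python
def consensus_call(yes_probability: float) -> str:
    """Direction implied by the market price (ties at 0.5 go to NO, an assumption)."""
    return "YES" if yes_probability > 0.5 else "NO"


def agreement_rate(agent_calls: list[str], yes_probabilities: list[float]) -> float:
    """Share of markets where the agent's call matches the market-implied direction."""
    matches = sum(
        call == consensus_call(p) for call, p in zip(agent_calls, yes_probabilities)
    )
    return matches / len(agent_calls)


# Hypothetical toy data for illustration only.
calls = ["YES", "NO", "NO", "YES"]
prices = [0.72, 0.31, 0.55, 0.64]
print(agreement_rate(calls, prices))  # 0.75
```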
| Batch | Control Accuracy | Meihua Accuracy |
|---|---|---|
| Batch 1 (30 markets, 27 resolved) | 18/27 (66.7%) | 12/27 (44.4%) |
| Batch 2 (70 markets, 45 resolved*) | ~74% | 23/45 (51.1%) |
*Some batch 2 markets overlap with batch 1 scoring depending on resolution timing.
| Category | Count | Control Accuracy | Meihua Accuracy |
|---|---|---|---|
| Both predicted same | 42 | 71.4% | 71.4% |
| Agents disagreed | 37 | 73.0% | 27.0% |
When the agents disagreed (37 cases):
| Winner | Count | Percentage |
|---|---|---|
| Control correct, Meihua wrong | 27 | 73.0% |
| Meihua correct, Control wrong | 10 | 27.0% |
The 2x2 contingency table of paired outcomes:
| | Meihua Correct | Meihua Wrong |
|---|---|---|
| Control Correct | 30 | 27 |
| Control Wrong | 10 | 12 |
- Discordant pairs: 27 (Control correct, Meihua wrong) + 10 (Meihua correct, Control wrong) = 37
- Under H0, discordant pairs should split 50/50
- Observed: 27 vs 10, heavily favoring Control
- p-value: < 0.01 (exact binomial test, two-sided)
- Conclusion: The difference is statistically significant, but in the direction opposite to the hypothesis under test: Control outperforms Meihua
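The exact form of McNemar's test reduces to a two-sided binomial test on the discordant pairs. A minimal standard-library sketch, using the 27 vs 10 split reported above (the function name is illustrative):

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact binomial p-value for discordant counts b and c under p = 0.5."""
    n = b + c
    k = min(b, c)
    # Probability of a split at least as lopsided as observed, in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = mcnemar_exact(27, 10)
print(f"{p_value:.4f}")  # about 0.0076
```

Only the discordant pairs enter the calculation; markets where both agents were right or both were wrong carry no information about which agent is better.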
A separate full reinterpretation attempt using 互卦 (mutual hexagram), 變卦 (transformed hexagram), and 通關 (bridge element) analysis was conducted on the Phase 2 data. This more thorough traditional analysis scored worse at 48.1%, primarily due to introducing a YES bias that incorrectly overrode the simpler NO predictions.
| Method | Accuracy | Notes |
|---|---|---|
| Original Phase 2 (simple) | ~50.6% | Strong NO bias |
| Full reinterpretation (互卦+變卦+通關) | 48.1% | Introduced YES bias |
| Control agent | 72.2% | Market consensus follower |
| Market consensus alone | 73.4% | Theoretical baseline |
Permutation tests were conducted to determine whether any hexagram property correlated with prediction accuracy:
| Test | Statistic | p-value | Interpretation |
|---|---|---|---|
| Hexagram energy direction vs. outcome | Chi-square | 0.6059 | Not significant |
| 變卦 energy vs. outcome | Chi-square | 0.9433 | Not significant |
| 爻辭 tone vs. outcome | Chi-square | 0.9482 | Not significant |
All p-values are well above the 0.05 threshold. There is no statistically significant relationship between any hexagram property and the actual outcome.
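A permutation test of this kind can be sketched as follows: compute a chi-square statistic for the observed label/outcome table, then shuffle the outcomes many times to see how often chance alone produces a statistic at least as large. This is an illustrative implementation under assumptions (categorical hexagram labels, a fixed seed, 10,000 shuffles by default), not the experiment's analysis code.

```python
import random
from collections import Counter


def chi_square(labels: list, outcomes: list) -> float:
    """Pearson chi-square statistic for the labels x outcomes contingency table."""
    n = len(labels)
    row, col = Counter(labels), Counter(outcomes)
    cell = Counter(zip(labels, outcomes))
    stat = 0.0
    for r, row_total in row.items():
        for c, col_total in col.items():
            expected = row_total * col_total / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(labels, outcomes, n_perm=10_000, seed=0) -> float:
    """Share of shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square(labels, outcomes)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(labels, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Hypothetical toy data: the label carries no information about the outcome.
labels = ["rising"] * 10 + ["falling"] * 10
outcomes = [1, 0] * 10
print(permutation_p_value(labels, outcomes, n_perm=2_000))  # 1.0
```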
- Cohen's h (Meihua vs. Control): -0.45 (small-to-medium effect, just under Cohen's 0.5 benchmark for a medium effect, favoring Control)
- Cohen's h (Meihua vs. 50% random): +0.01 (negligible effect)
- Interpretation: The Meihua agent performs indistinguishably from random and significantly worse than the control.
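Cohen's h is the difference between two proportions after an arcsine square-root transform. The short sketch below reproduces the two values above from the reported counts (40/79 and 57/79):

```python
from math import asin, sqrt


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


meihua, control = 40 / 79, 57 / 79
print(round(cohens_h(meihua, control), 2))  # -0.45
print(round(cohens_h(meihua, 0.5), 2))      # 0.01
```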
- Control accuracy: 72.2% (95% CI: 61.8% -- 81.1%)
- Meihua accuracy: 50.6% (95% CI: 39.7% -- 61.5%)
- Difference (Control - Meihua): 21.5 percentage points (95% CI: 8.1 -- 34.9)
- The 95% CI for Meihua accuracy fully contains 50%, consistent with chance performance.
| Meihua Confidence Range | N | Accuracy | Expected if Calibrated |
|---|---|---|---|
| 0.40 -- 0.55 | 22 | 54.5% | 47.5% |
| 0.55 -- 0.70 | 31 | 48.4% | 62.5% |
| 0.70 -- 0.90 | 26 | 50.0% | 80.0% |
The Meihua agent's confidence is uncalibrated: accuracy stays near 50% in every confidence band, so high-confidence predictions are no more accurate than low-confidence ones.
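The calibration gap per band is simply observed accuracy minus the accuracy expected at that confidence level. The sketch below uses the figures from the table above; the N-weighted absolute gap is an illustrative summary added here, not a statistic from the original analysis.

```python
# (confidence band, N, observed accuracy, expected accuracy) from the table above
BANDS = [
    ("0.40-0.55", 22, 0.545, 0.475),
    ("0.55-0.70", 31, 0.484, 0.625),
    ("0.70-0.90", 26, 0.500, 0.800),
]


def calibration_gaps(bands) -> dict:
    """Observed minus expected accuracy per band (negative = overconfident)."""
    return {label: round(obs - exp, 3) for label, _, obs, exp in bands}


def weighted_abs_gap(bands) -> float:
    """Average absolute gap, weighted by the number of predictions in each band."""
    total = sum(n for _, n, _, _ in bands)
    return sum(n * abs(obs - exp) for _, n, obs, exp in bands) / total


gaps = calibration_gaps(BANDS)
print(gaps)
```

The gap widens as stated confidence rises, which is the pattern expected when confidence carries no information about correctness.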
| Strategy | Wins | Win% | Total P&L | ROI |
|---|---|---|---|---|
| Anti-Meihua (Contrarian) | 39 | 49.4% | +$2.40 | +3.0% |
| Market Consensus (>50%) | 58 | 73.4% | +$1.44 | +1.8% |
| Always NO | 47 | 59.5% | +$0.88 | +1.1% |
| Control Agent | 57 | 72.2% | +$0.48 | +0.6% |
| Always YES | 32 | 40.5% | -$0.88 | -1.1% |
| Meihua Agent | 40 | 50.6% | -$2.40 | -3.0% |
- Control's 72.2% accuracy translated into only +0.6% ROI because market prices already reflected its predictions
- Meihua was the worst strategy by ROI (-3.0%), worse than Always YES (-1.1%), even though its 50.6% accuracy matched a coin flip
- Anti-Meihua was the best strategy by ROI: betting against Meihua yielded +3.0%
- On disagreement markets (37 cases): Control +3.9% ROI vs Meihua -3.9% ROI
- Transaction costs would erase all profits: the maximum profit was $2.40 on $79 wagered
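The report does not state the stake rule behind the P&L column. One reading consistent with the mirrored rows (Meihua -$2.40 vs Anti-Meihua +$2.40, Always YES -$0.88 vs Always NO +$0.88) is buying one share of the predicted side at the quoted price. The sketch below illustrates that assumed rule with toy numbers; the function name and data are hypothetical.

```python
def share_pnl(side: str, yes_price: float, outcome: str) -> float:
    """Profit from buying one share of `side` at the market price (assumed stake rule)."""
    price = yes_price if side == "YES" else 1.0 - yes_price
    payout = 1.0 if side == outcome else 0.0
    return payout - price


def opposite(side: str) -> str:
    return "NO" if side == "YES" else "YES"


# Hypothetical toy markets: (agent call, YES price, resolved outcome).
bets = [("YES", 0.30, "YES"), ("NO", 0.60, "YES"), ("NO", 0.25, "NO")]
strategy = sum(share_pnl(s, p, o) for s, p, o in bets)
contrarian = sum(share_pnl(opposite(s), p, o) for s, p, o in bets)
print(round(strategy, 2), round(contrarian, 2))  # 0.55 -0.55
```

Under this rule a strategy and its contrarian always have exactly opposite P&L before costs, so fees on both sides make at least one of them, and possibly both, unprofitable.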
| Bucket | N | Control ROI | Meihua ROI |
|---|---|---|---|
| Low (15-30%) | 35 | +10.0% | -0.3% |
| Below-mid (30-50%) | 17 | -19.4% | +2.3% |
| Above-mid (50-70%) | 15 | -8.1% | -12.5% |
| High (70-85%) | 12 | +12.4% | -6.7% |
A critical finding is that identical hexagram + moving line combinations appeared across multiple markets yet produced different outcomes:
噬嗑(21)→頤(27) 第6爻「何校滅耳,凶」 (4 occurrences):
| Market | Outcome | Prediction |
|---|---|---|
| Musk net worth $670B | YES | YES ✅ |
| Tesla beat earnings | YES | NO ❌ |
| DoorDash 900M orders | YES | NO ❌ |
| NVIDIA dip to $176 | NO | YES ❌ |
豐(55)→小過(62) 第3爻 (4 occurrences):
| Market | Outcome | Prediction |
|---|---|---|
| US 7-8 strikes | YES | NO ❌ |
| BTC $86-88K range | NO | NO ✅ |
| Gold >$5100 | NO | NO ✅ |
| Ja Morant traded | NO | NO ✅ |
大有(14)→大畜(26) 第6爻「自天祐之,吉无不利」 (2 occurrences):
| Market | Outcome |
|---|---|
| BTC $88-90K range | YES |
| Australia T20 World Cup | NO |
The line text reads "Heaven protects; nothing is unfavorable," yet the two markets resolved in opposite directions.
The Phase 2 batch method used a single hexagram reading for 70 questions. This approach has a fundamental methodological flaw: traditional Meihua Yishu requires a unique hexagram cast for each individual question, with the casting moment reflecting the specific circumstances of that query. Batch-applying one hexagram to many unrelated questions violates the core principle of 一事一占 (one question, one divination).
The poetic and metaphorical nature of yao ci (爻辭) and hexagram imagery makes it possible to construct a plausible narrative for any outcome:
Example: 噬嗑 (Biting Through), Line 6: "何校滅耳,凶"
- For a YES outcome: "Biting through obstacles -- the force of determination prevails"
- For a NO outcome: "Excessive punishment and destruction -- the situation deteriorates"
Both interpretations are internally coherent. Sufficiently vague symbolic language can be fitted to any outcome after the fact.
| Relationship | Traditional Meaning | N | Accuracy |
|---|---|---|---|
| 用生體 (Resource supports subject) | Very auspicious | 12 | 50.0% |
| 體克用 (Subject controls object) | Auspicious | 15 | 53.3% |
| 比和 (Harmony) | Neutral-positive | 8 | 50.0% |
| 用克體 (Object attacks subject) | Inauspicious | 11 | 45.5% |
| 體生用 (Subject drains into object) | Draining | 9 | 55.6% |
No Ti-Yong relationship category reliably predicts outcomes. All cluster around 50% +/- noise.
A fundamental issue emerged: 凶 (inauspicious) and 吉 (auspicious) are perspective-dependent. When asking "Will Tesla beat earnings?":
- 凶 for the asker → NO
- 凶 for Tesla → Tesla has problems, but might still beat
- 凶 for the market → market drops, but Tesla might still beat
With an AI doing the casting, there is no clear "asker" with a personal stake, making 凶/吉 directionality ambiguous.
Finding 1: Meihua Yishu provides no predictive value for binary market outcomes. 50.6% accuracy across 79 resolved markets is statistically indistinguishable from chance. Permutation tests show no correlation between any hexagram property and outcomes (all p > 0.60).
Finding 2: The control agent's 72.2% accuracy was not genuine forecasting skill. It achieved this by following market consensus 96.2% of the time. Market consensus alone would have scored 73.4%.
Finding 3: Identical hexagrams produce contradictory outcomes. The same hexagram + line combinations appeared multiple times with different results, undermining the premise that hexagram structure encodes predictive information.
Finding 4: More elaborate hexagram analysis does not improve accuracy. Full reinterpretation using 互卦, 變卦, and 通關 scored worse (48.1%) than the simpler initial interpretation (50.6%).
Finding 5: Binary YES/NO is likely the wrong use case for hexagram interpretation. Traditional Meihua Yishu provides qualitative situational guidance, not binary predictions. Forcing hexagrams into a binary framework strips away the nuance practitioners consider essential.
Finding 6: Neither strategy would make money betting. The control's +0.6% ROI and the anti-Meihua's +3.0% ROI would both be erased by transaction costs.
| Limitation | Impact | Severity |
|---|---|---|
| Sample size (79 resolved) below the 200+ recommended for publication | Reduced power to detect small effects | Moderate |
| AI interpretation, not human practitioner | May not capture intuitive elements | Significant |
| Binary outcomes only | Does not test qualitative hexagram value | Significant |
| Phase 2 batch method is non-traditional | May underperform individual casting | Moderate |
| 21 markets still unresolved | Final numbers may shift slightly | Low |
| No pre-registration on OSF | Vulnerability to HARKing criticism | Moderate |
| 凶/吉 perspective ambiguity with AI casting | Directional interpretation unclear | Moderate |
- It does not prove Meihua Yishu is "wrong" in all contexts -- only that it does not predict binary market outcomes better than chance.
- It does not test the qualitative advisory value of hexagram readings.
- It does not test a skilled human practitioner's interpretation, only an AI agent's application.
- It does not rule out Meihua working in domains not tested here (personal decisions, timing, seasonal guidance).
The null hypothesis cannot be rejected for the Meihua agent: there is no detectable predictive signal from Meihua Yishu hexagram analysis for binary prediction market outcomes. The Meihua agent performed at chance level (50.6%) while the control agent, by simply echoing market consensus, achieved 72.2%.
A follow-up experiment has been designed to test whether hexagrams describe situations qualitatively (not binary YES/NO). See QUALITATIVE-EXPERIMENT-PLAN.md for the full protocol. Key design:
- 12 upcoming events with rich narratives
- Each event gets a properly cast hexagram + a random control hexagram
- Blinded interpretation before the event
- Blinded scoring after the event on a 0-5 qualitative match scale
- Wilcoxon signed-rank test for paired comparison
- Do not use hexagrams for binary prediction -- the data is conclusive
- Test qualitative match -- this tests what the system actually claims to do
- Use proper single-question casting -- fix the batch methodology flaw
- Consider human practitioner testing -- AI may not capture intuitive elements
| Date | Event |
|---|---|
| 2026-01-18 | Experiment created; Batch 1 (30 markets) selected; pilot predictions made |
| 2026-01-21 | First resolutions (Bitcoin $98K, Han Duck Soo sentencing) |
| 2026-01-22 | Oscar nomination resolutions (Hawke, Zhao, del Toro) |
| 2026-01-25 | Pilot review (10/30 resolved); Batch 2 (70 markets) added |
| 2026-01-25 | Phase 2 Meihua predictions generated via 取象起卦 batch method |
| 2026-01-28 -- 2026-02-15 | Markets resolve progressively |
| 2026-02-24 | Final analysis: 79/100 resolved, 21 still open |
| Market | End Date | Current YES% | Control | Meihua |
|---|---|---|---|---|
| Fed rate cut March | 2026-03-18 | 3.5% | NO | NO |
| SpaceX chopstick catch | 2026-01-31 | 16.8% | NO | NO |
| Teyana Taylor Best Supporting Actress | 2026-03-15 | 50.0% | YES | YES |
| No change Fed rates March | 2026-03-18 | 96.2% | YES | NO |
| Fed decrease 25bps March | 2026-03-18 | 2.2% | NO | NO |
| Nothing Ever Happens: Interest Rates | 2026-03-20 | 92.0% | NO | NO |
| US collect < $100B revenue 2025 | 2026-02-28 | 96.2% | YES | YES |
| John Cornyn TX Primary | 2026-03-03 | 15.0% | NO | NO |
| Sentimental Value Best Intl Feature | 2026-03-15 | 66.0% | YES | NO |
| One Battle Best Cinematography | 2026-03-15 | 64.0% | NO | NO |
| Frankenstein Best Production Design | 2026-03-15 | 89.5% | YES | YES |
| Secret Agent Best Intl Feature | 2026-03-15 | 27.0% | NO | YES |
| Skarsgard Best Supporting Actor | 2026-03-15 | 30.5% | YES | NO |
| MegaETH airdrop Feb 28 | 2026-03-01 | 0.1% | YES | NO |
| Clinton contempt of Congress | 2026-02-28 | 1.4% | YES | NO |
| Trump says "Monroe" SOTU | 2026-02-24 | 59.0% | YES | NO |
| Trump says "Crypto" SOTU | 2026-01-31 | 16.0% | NO | NO |
| Trump says "Reagan" SOTU | 2026-02-24 | 31.5% | YES | NO |
| Iran strike Israel Feb 28 | 2026-02-28 | 9.5% | NO | NO |
| Italy joins Board of Peace | 2026-02-28 | 4.8% | NO | YES |
| Arsenal Carabao Cup | 2026-03-26 | 55.0% | NO | YES |
| File | Description |
|---|---|
| data/experiment_state.json | Complete experiment state with all predictions, outcomes, hashes |
| data/phase1_meihua_predictions.json | Phase 2 batch Meihua predictions (70 markets) |
| data/prediction_summary.md | Human-readable prediction summary |
| data/prediction_hashes.json | SHA-256 hashes of predictions (integrity verification) |
| data/PILOT-RESULTS.md | Pilot phase detailed results |
| EXPERIMENT-PLAN.md | Full experiment design and methodology |
| FINAL-REPORT.md | This report |
| BETTING-ANALYSIS.md | Detailed betting profitability analysis |
| QUALITATIVE-EXPERIMENT-PLAN.md | Next experiment: qualitative match test |
This report was generated from experiment data as of February 24, 2026. Twenty-one markets remain unresolved and may slightly adjust final accuracy figures when they close. The core conclusions are robust to these pending resolutions given the magnitude of the observed effects.