Meihua Yishu Prediction Validation Experiment: Final Analysis Report

Experiment ID: 20260118_071841 Date Range: January 18, 2026 -- February 24, 2026 Report Date: February 24, 2026 Status: Phase 1 Complete (Interim Analysis)

1. Executive Summary

This experiment tested whether Meihua Yishu (梅花易數), a traditional Chinese divination system based on the I Ching, provides statistically significant predictive value for binary outcome markets on Polymarket. Two AI agents -- one using the Meihua skill ("Meihua agent") and one without it ("Control agent") -- independently predicted the outcomes of 100 markets. Of these, 79 markets resolved with verifiable outcomes.

Bottom-line result: The Meihua divination system demonstrated no predictive value. The Meihua agent scored 40/79 (50.6%), statistically indistinguishable from a coin flip. The control agent scored 57/79 (72.2%), but this apparent skill was an artifact of following market consensus (96.2% agreement with the market's implied direction). Market consensus alone would have yielded 73.4% accuracy. When the two agents disagreed (37 cases), the control agent won 27 to the Meihua agent's 10 -- a decisive margin.

Permutation testing found no statistically significant correlation between hexagram properties and outcomes (p-values: 0.6059, 0.9433, 0.9482). The same hexagram-and-line combinations appeared across multiple markets with contradictory outcomes, and the poetic/metaphorical nature of the yao ci (爻辭) text allowed post-hoc rationalization for any result. The experiment strongly suggests that hexagram interpretation, at least as operationalized here, does not carry predictive information about binary real-world events.

2. Methodology

2.1 Experiment Design

The experiment employed a two-agent, paired-comparison architecture:

                     MARKET SELECTOR
        (Automated: Polymarket API + random sampling)
                          |
              +-----------+-----------+
              |                       |
     AGENT A (Control)        AGENT B (Meihua)
     - No Meihua skill        - Full Meihua skill
     - Same market info        - Same market info
     - Reasoning only          - Hexagram casting + 體用 analysis
              |                       |
              +-----------+-----------+
                          |
                      RECORDER
          (Timestamped, SHA-256 hashed predictions)

Both agents received identical market data: title, description, current probability, volume, liquidity, days until resolution, and category. The Meihua agent additionally received pre-computed casting inputs (number-based, text-based, direction-based, measurement-based) and the full Meihua Yishu skill for hexagram interpretation.

2.2 Market Selection Criteria

Criterion	Threshold
Outcome type	Binary (YES/NO)
Minimum volume	$1,000 USD
Probability range	15% -- 85%
Resolution timeframe	3 -- 60 days
Resolution type	Objective, verifiable

100 markets were selected across two batches:

Batch 1 (Pilot): 30 markets selected January 18, 2026
Batch 2 (Phase 1): 70 markets added January 25-28, 2026

2.3 Casting Methods

The primary casting method was number-based (數字起卦), using market data to derive trigrams:

Upper trigram: Volume last 2 digits mod 8
Lower trigram: Word count or trader count mod 8
Moving line: (upper + lower) mod 6 + 1

Phase 2 predictions also used a 取象起卦 (image-based) batch method where a single hexagram reading was cast and applied to interpret multiple questions simultaneously.

2.4 Statistical Framework

Primary test: McNemar's test for paired binary outcomes
Secondary: Permutation test for hexagram-outcome correlation
Effect size: Cohen's h
Significance threshold: alpha = 0.05 (two-tailed)
Integrity: SHA-256 hashes of all predictions logged before resolution

2.5 Resolution

Markets were resolved using Polymarket's official outcomes. Of 100 selected markets, 79 resolved during the analysis period, 21 remained open (primarily Academy Awards ceremony, Fed March meeting, and longer-dated markets).

3. Results

3.1 Overall Accuracy

Agent	Correct	Total Resolved	Accuracy
Control	57	79	72.2%
Meihua	40	79	50.6%
Market consensus (>50% implied direction)	58	79	73.4%
Random baseline (coin flip)	--	--	50.0%

The control agent's 72.2% is almost entirely attributable to market-following behavior: it agreed with market consensus (predicting YES when probability > 50%, NO when < 50%) in 96.2% of cases. Its accuracy of 72.2% is within 1.2 percentage points of the 73.4% that a pure market-consensus strategy would achieve.

The Meihua agent's 50.6% is statistically indistinguishable from chance (50%).

3.2 Batch-Level Accuracy

Batch	Control Accuracy	Meihua Accuracy
Batch 1 (30 markets, 27 resolved)	18/27 (66.7%)	12/27 (44.4%)
Batch 2 (70 markets, 45 resolved*)	~74%	23/45 (51.1%)

*Some batch 2 markets overlap with batch 1 scoring depending on resolution timing.

3.3 Agreement and Disagreement Analysis

Category	Count	Control Accuracy	Meihua Accuracy
Both predicted same	42	83.3%	83.3%
Agents disagreed	37	59.5%	13.5%

When the agents disagreed (37 cases):

Winner	Count	Percentage
Control correct, Meihua wrong	27	73.0%
Meihua correct, Control wrong	10	27.0%

3.4 McNemar's Test (Discordant Pairs)

The 2x2 contingency table of paired outcomes:

	Meihua Correct	Meihua Wrong
Control Correct	35	22
Control Wrong	5	17

Discordant pairs: 27 (Control correct, Meihua wrong) + 10 (Meihua correct, Control wrong) = 37
Under H0, discordant pairs should split 50/50
Observed: 27 vs 10, heavily favoring Control
p-value: < 0.01 (exact binomial test, two-sided)
Conclusion: Statistically significant difference, but in the WRONG direction -- Control outperforms Meihua

3.5 Full Reinterpretation with Extended Hexagram Analysis

A separate full reinterpretation attempt using 互卦 (mutual hexagram), 變卦 (transformed hexagram), and 通關 (bridge element) analysis was conducted on the Phase 2 data. This more thorough traditional analysis scored worse at 48.1%, primarily due to introducing a YES bias that incorrectly overrode the simpler NO predictions.

Method	Accuracy	Notes
Original Phase 2 (simple)	~50.6%	Strong NO bias
Full reinterpretation (互卦+變卦+通關)	48.1%	Introduced YES bias
Control agent	72.2%	Market consensus follower
Market consensus alone	73.4%	Theoretical baseline

4. Statistical Analysis

4.1 Permutation Tests

Permutation tests were conducted to determine whether any hexagram property correlated with prediction accuracy:

Test	Statistic	p-value	Interpretation
Hexagram energy direction vs. outcome	Chi-square	0.6059	Not significant
變卦 energy vs. outcome	Chi-square	0.9433	Not significant
爻辭 tone vs. outcome	Chi-square	0.9482	Not significant

All p-values are well above the 0.05 threshold. There is no statistically significant relationship between any hexagram property and the actual outcome.

4.2 Effect Size

Cohen's h (Meihua vs. Control): -0.45 (medium-large effect, favoring Control)
Cohen's h (Meihua vs. 50% random): +0.01 (negligible effect)
Interpretation: The Meihua agent performs indistinguishably from random and significantly worse than the control.

4.3 Confidence Intervals

Control accuracy: 72.2% (95% CI: 61.8% -- 81.1%)
Meihua accuracy: 50.6% (95% CI: 39.7% -- 61.5%)
Difference (Control - Meihua): 21.5% (95% CI: 8.1% -- 34.9%)
The 95% CI for Meihua accuracy fully contains 50%, consistent with chance performance.

4.4 Confidence Calibration

Meihua Confidence Range	N	Accuracy	Expected if Calibrated
0.40 -- 0.55	22	54.5%	47.5%
0.55 -- 0.70	31	48.4%	62.5%
0.70 -- 0.90	26	50.0%	80.0%

There is no calibration: high-confidence Meihua predictions are no more accurate than low-confidence ones.

5. Betting Profitability Analysis

5.1 Would These Predictions Make Money?

Strategy	Wins	Win%	Total P&L	ROI
Anti-Meihua (Contrarian)	39	49.4%	+$2.40	+3.0%
Market Consensus (>50%)	58	73.4%	+$1.44	+1.8%
Always NO	47	59.5%	+$0.88	+1.1%
Control Agent	57	72.2%	+$0.48	+0.6%
Always YES	32	40.5%	-$0.88	-1.1%
Meihua Agent	40	50.6%	-$2.40	-3.0%

5.2 Key Betting Insights

Control's 72.2% accuracy = only +0.6% ROI because the market already priced in its predictions
Meihua was the worst strategy — worse than coin flip, worse than always YES
Anti-Meihua was the best strategy — betting against Meihua yielded +3.0% ROI
On disagreement markets (37 cases): Control +3.9% ROI vs Meihua -3.9% ROI
Transaction costs would erase all profits — maximum profit was $2.40 on $79 wagered

5.3 Profitability by Market Probability Bucket

Bucket	N	Control ROI	Meihua ROI
Low (15-30%)	35	+10.0%	-0.3%
Below-mid (30-50%)	17	-19.4%	+2.3%
Above-mid (50-70%)	15	-8.1%	-12.5%
High (70-85%)	12	+12.4%	-6.7%

6. Qualitative Hexagram Analysis

6.1 Repeated Hexagram-Line Combinations

A critical finding is that identical hexagram + moving line combinations appeared across multiple markets yet produced different outcomes:

噬嗑(21)→頤(27) 第6爻「何校滅耳，凶」 (4 occurrences):

Market	Outcome	Prediction
Musk net worth $670B	YES	YES ✅
Tesla beat earnings	YES	NO ❌
DoorDash 900M orders	YES	NO ❌
NVIDIA dip to $176	NO	YES ❌

豐(55)→小過(62) 第3爻 (4 occurrences):

Market	Outcome	Prediction
US 7-8 strikes	YES	NO ❌
BTC $86-88K range	NO	NO ✅
Gold >$5100	NO	NO ✅
Ja Morant traded	NO	NO ✅

大有(14)→大畜(26) 第6爻「自天祐之，吉无不利」 (2 occurrences):

Market	Outcome
BTC $88-90K range	YES
Australia T20 World Cup	NO

"Heaven protects, nothing unfavorable" — yet opposite outcomes.

6.2 The 取象起卦 Batch Problem

The Phase 2 batch method used a single hexagram reading for 70 questions. This approach has a fundamental methodological flaw: traditional Meihua Yishu requires a unique hexagram cast for each individual question, with the casting moment reflecting the specific circumstances of that query. Batch-applying one hexagram to many unrelated questions violates the core principle of 一事一占 (one question, one divination).

6.3 Post-Hoc Rationalization (Barnum Effect)

The poetic and metaphorical nature of yao ci (爻辭) and hexagram imagery makes it possible to construct a plausible narrative for any outcome:

Example: 噬嗑 (Biting Through), Line 6: "何校滅耳，凶"

For a YES outcome: "Biting through obstacles -- the force of determination prevails"
For a NO outcome: "Excessive punishment and destruction -- the situation deteriorates"

Both interpretations are internally coherent. Sufficiently vague symbolic language can be fitted to any outcome after the fact.

6.4 Ti-Yong (體用) Relationship Accuracy

Relationship	Traditional Meaning	N	Accuracy
用生體 (Resource supports subject)	Very auspicious	12	50.0%
體克用 (Subject controls object)	Auspicious	15	53.3%
比和 (Harmony)	Neutral-positive	8	50.0%
用克體 (Object attacks subject)	Inauspicious	11	45.5%
體生用 (Subject drains into object)	Draining	9	55.6%

No Ti-Yong relationship category reliably predicts outcomes. All cluster around 50% +/- noise.

6.5 The 凶/吉 Perspective Problem

A fundamental issue emerged: 凶 (inauspicious) and 吉 (auspicious) are perspective-dependent. When asking "Will Tesla beat earnings?":

凶 for the asker → NO
凶 for Tesla → Tesla has problems, but might still beat
凶 for the market → market drops, but Tesla might still beat

With an AI doing the casting, there is no clear "asker" with a personal stake, making 凶/吉 directionality ambiguous.

7. Key Findings and Limitations

7.1 Key Findings

Finding 1: Meihua Yishu provides no predictive value for binary market outcomes. 50.6% accuracy across 79 resolved markets is statistically indistinguishable from chance. Permutation tests show no correlation between any hexagram property and outcomes (all p > 0.60).

Finding 2: The control agent's 72.2% accuracy was not genuine forecasting skill. It achieved this by following market consensus 96.2% of the time. Market consensus alone would have scored 73.4%.

Finding 3: Identical hexagrams produce contradictory outcomes. The same hexagram + line combinations appeared multiple times with different results, undermining the premise that hexagram structure encodes predictive information.

Finding 4: More elaborate hexagram analysis does not improve accuracy. Full reinterpretation using 互卦, 變卦, and 通關 scored worse (48.1%) than the simpler initial interpretation (50.6%).

Finding 5: Binary YES/NO is likely the wrong use case for hexagram interpretation. Traditional Meihua Yishu provides qualitative situational guidance, not binary predictions. Forcing hexagrams into a binary framework strips away the nuance practitioners consider essential.

Finding 6: Neither strategy would make money betting. The control's +0.6% ROI and the anti-Meihua's +3.0% ROI would both be erased by transaction costs.

7.2 Limitations

Limitation	Impact	Severity
Sample size (79 resolved) below the 200+ recommended for publication	Reduced power to detect small effects	Moderate
AI interpretation, not human practitioner	May not capture intuitive elements	Significant
Binary outcomes only	Does not test qualitative hexagram value	Significant
Phase 2 batch method is non-traditional	May underperform individual casting	Moderate
21 markets still unresolved	Final numbers may shift slightly	Low
No pre-registration on OSF	Vulnerability to HARKing criticism	Moderate
凶/吉 perspective ambiguity with AI casting	Directional interpretation unclear	Moderate

7.3 What This Experiment Does NOT Prove

It does not prove Meihua Yishu is "wrong" in all contexts -- only that it does not predict binary market outcomes better than chance.
It does not test the qualitative advisory value of hexagram readings.
It does not test a skilled human practitioner's interpretation, only an AI agent's application.
It does not rule out Meihua working in domains not tested here (personal decisions, timing, seasonal guidance).

8. Conclusions

8.1 Primary Conclusion

The null hypothesis cannot be rejected for the Meihua agent: there is no detectable predictive signal from Meihua Yishu hexagram analysis for binary prediction market outcomes. The Meihua agent performed at chance level (50.6%) while the control agent, by simply echoing market consensus, achieved 72.2%.

8.2 Next Experiment: Qualitative Match Test

A follow-up experiment has been designed to test whether hexagrams describe situations qualitatively (not binary YES/NO). See QUALITATIVE-EXPERIMENT-PLAN.md for the full protocol. Key design:

12 upcoming events with rich narratives
Each event gets a properly cast hexagram + a random control hexagram
Blinded interpretation before the event
Blinded scoring after the event on a 0-5 qualitative match scale
Wilcoxon signed-rank test for paired comparison

8.3 Recommendations

Do not use hexagrams for binary prediction -- the data is conclusive
Test qualitative match -- this tests what the system actually claims to do
Use proper single-question casting -- fix the batch methodology flaw
Consider human practitioner testing -- AI may not capture intuitive elements

Appendix A: Experiment Timeline

Date	Event
2026-01-18	Experiment created; Batch 1 (30 markets) selected; pilot predictions made
2026-01-21	First resolutions (Bitcoin $98K, Han Duck Soo sentencing)
2026-01-22	Oscar nomination resolutions (Hawke, Zhao, del Toro)
2026-01-25	Pilot review (10/30 resolved); Batch 2 (70 markets) added
2026-01-25	Phase 2 Meihua predictions generated via 取象起卦 batch method
2026-01-28 -- 2026-02-15	Markets resolve progressively
2026-02-24	Final analysis: 79/100 resolved, 21 still open

Appendix B: Unresolved Markets (21)

Market	End Date	Current YES%	Control	Meihua
Fed rate cut March	2026-03-18	3.5%	NO	NO
SpaceX chopstick catch	2026-01-31	16.8%	NO	NO
Teyana Taylor Best Supporting Actress	2026-03-15	50.0%	YES	YES
No change Fed rates March	2026-03-18	96.2%	YES	NO
Fed decrease 25bps March	2026-03-18	2.2%	NO	NO
Nothing Ever Happens: Interest Rates	2026-03-20	92.0%	NO	NO
US collect < $100B revenue 2025	2026-02-28	96.2%	YES	YES
John Cornyn TX Primary	2026-03-03	15.0%	NO	NO
Sentimental Value Best Intl Feature	2026-03-15	66.0%	YES	NO
One Battle Best Cinematography	2026-03-15	64.0%	NO	NO
Frankenstein Best Production Design	2026-03-15	89.5%	YES	YES
Secret Agent Best Intl Feature	2026-03-15	27.0%	NO	YES
Skarsgard Best Supporting Actor	2026-03-15	30.5%	YES	NO
MegaETH airdrop Feb 28	2026-03-01	0.1%	YES	NO
Clinton contempt of Congress	2026-02-28	1.4%	YES	NO
Trump says "Monroe" SOTU	2026-02-24	59.0%	YES	NO
Trump says "Crypto" SOTU	2026-01-31	16.0%	NO	NO
Trump says "Reagan" SOTU	2026-02-24	31.5%	YES	NO
Iran strike Israel Feb 28	2026-02-28	9.5%	NO	NO
Italy joins Board of Peace	2026-02-28	4.8%	NO	YES
Arsenal Carabao Cup	2026-03-26	55.0%	NO	YES

Appendix C: Files and Data

File	Description
`data/experiment_state.json`	Complete experiment state with all predictions, outcomes, hashes
`data/phase1_meihua_predictions.json`	Phase 2 batch Meihua predictions (70 markets)
`data/prediction_summary.md`	Human-readable prediction summary
`data/prediction_hashes.json`	SHA-256 hashes of predictions (integrity verification)
`data/PILOT-RESULTS.md`	Pilot phase detailed results
`EXPERIMENT-PLAN.md`	Full experiment design and methodology
`FINAL-REPORT.md`	This report
`BETTING-ANALYSIS.md`	Detailed betting profitability analysis
`QUALITATIVE-EXPERIMENT-PLAN.md`	Next experiment: qualitative match test

This report was generated from experiment data as of February 24, 2026. Twenty-one markets remain unresolved and may slightly adjust final accuracy figures when they close. The core conclusions are robust to these pending resolutions given the magnitude of the observed effects.

FilesExpand file tree

FINAL-REPORT.md

Latest commit

History