VOYNICH MANUSCRIPT — PROSODIC SPLIT ANALYSIS
======================================================================
Exploiting the 2-word alternation to separate rhythm from content.
Test: words at even positions (0, 2, 4, ...) vs odd positions (1, 3, 5, ...)
======================================================================
KEY FINDINGS
======================================================================
1. THE ALTERNATION IS WEAK AT THE CHARACTER LEVEL
For the biological section (6681 words, ~28,500 characters):
- Character frequency deltas between even/odd: all < 0.4%
- Character entropy: even=4.130 bits, odd=4.161 bits (negligible diff)
- Bigram entropy: even=6.108 bits, odd=6.182 bits
- Mean word length: even=4.26, odd=4.28
The character composition of even vs odd words is essentially identical.
Characters like 'M', 'd', 'g', 'z' flagged as "prosodic markers" by
extreme even/odd ratios are ALL low-frequency characters (26-123 counts),
and several are transcription artifacts (variant glyphs, uncertain readings).
Result: NO character-level prosodic markers exist. The alternation is
NOT carried by specific characters appearing preferentially at even or
odd positions.
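A minimal sketch of the even/odd character comparison described above. The
function names and the toy word list are illustrative assumptions, not the
script that produced the figures in this report.

```python
import math
from collections import Counter

def split_even_odd(words):
    """Return (even-position words, odd-position words), 0-indexed."""
    return words[0::2], words[1::2]

def char_entropy(words):
    """Shannon entropy in bits of the character distribution over all words."""
    counts = Counter("".join(words))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy input for illustration only, NOT manuscript data.
words = ["4ohc9", "oham", "4oh9", "ohan", "sc9", "am"]
even, odd = split_even_odd(words)
print(f"even={char_entropy(even):.3f} bits, odd={char_entropy(odd):.3f} bits")
```

On a real section the word list would come from the transcription file; a
small gap between the two entropies is what "negligible diff" refers to.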
2. THE ALTERNATION EXISTS AT THE WORD-TYPE LEVEL
Vocabulary exclusivity:
Biological: even-exclusive tokens = 27.7%, odd-exclusive = 29.4%
Herbal_A: even-exclusive tokens = 34.9%, odd-exclusive = 35.1%
Recipe_Stars: even-exclusive tokens = 35.5%, odd-exclusive = 36.3%
~28-36% of word tokens appear EXCLUSIVELY at either even or odd
positions. This is higher than chance for a vocabulary of this size,
but the most biased individual words have very low counts (5-24 total).
No single high-frequency word shows strong position preference.
Chi-squared tests (even vs odd distributions):
  Feature                   chi2   df    Result
  2-char prefix             340    343   NOT significant
  2-char suffix             227    210   NOT significant (about 0.8 sd above df)
  Morphological templates    61     70   NOT significant
  Word-pair boundaries      307    322   NOT significant
The chi-squared statistics all land near their degrees of freedom (the
expected value when there is no difference), meaning the even/odd
distributions are statistically INDISTINGUISHABLE at the word level too.
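The two measurements in this section can be sketched as follows. The
definition of "exclusive" used here (tokens whose word type never occurs at
the other parity) is an assumption about how the percentages were computed,
and the helper names are hypothetical.

```python
from collections import Counter

def exclusive_token_share(words):
    """Share of even (odd) tokens whose word type never occurs at odd (even) positions."""
    even, odd = Counter(words[0::2]), Counter(words[1::2])
    even_excl = sum(n for w, n in even.items() if w not in odd)
    odd_excl = sum(n for w, n in odd.items() if w not in even)
    return even_excl / sum(even.values()), odd_excl / sum(odd.values())

def chi_squared_even_odd(words, key):
    """Pearson chi-squared for the 2 x K table of key(word) counts by parity.

    Returns (statistic, degrees of freedom), with df = K - 1.
    """
    even = Counter(key(w) for w in words[0::2])
    odd = Counter(key(w) for w in words[1::2])
    n_even, n_odd = sum(even.values()), sum(odd.values())
    total = n_even + n_odd
    categories = set(even) | set(odd)
    stat = 0.0
    for c in categories:
        column = even[c] + odd[c]
        for observed, row in ((even[c], n_even), (odd[c], n_odd)):
            expected = row * column / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(categories) - 1

# Toy input for illustration only, NOT manuscript data.
words = ["4ohc9", "oham", "4oh9", "ohan", "4ohc9", "oham"]
print(exclusive_token_share(words))
print(chi_squared_even_odd(words, key=lambda w: w[:2]))   # 2-char prefix
```

A statistic close to its degrees of freedom, as in the table above, is what
the null hypothesis of identical even/odd distributions predicts.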
3. THE '4o' PREFIX IS NOT A PROSODIC MARKER
'4o' prefix frequency across sections:
  Section        % of words   even   odd   bias   bias %
  Biological     22.2%        763    721   +42    +1.25%
  Herbal_A        9.2%        471    485   -14    -0.27%
  Recipe_Stars   16.7%        812    828   -16    -0.33%
The '4o' prefix shows NO consistent position preference across sections.
In biological it slightly prefers even; in the other two it slightly
prefers odd. The bias magnitude is ~1% -- noise level.
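As a rough check on "noise level", the counts above can be fed to a
two-proportion z-test. This sketch assumes each section is one continuous
word stream, so that even positions number ceil(N/2); if positions were
reset per line or paragraph, the denominators would differ.

```python
import math

def parity_bias(even_hits, odd_hits, n_words):
    """Rate difference (even minus odd) and two-proportion z-score."""
    n_even = (n_words + 1) // 2   # positions 0, 2, 4, ...
    n_odd = n_words // 2          # positions 1, 3, 5, ...
    diff = even_hits / n_even - odd_hits / n_odd
    pooled = (even_hits + odd_hits) / n_words
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_even + 1 / n_odd))
    return diff, diff / se

# (even count, odd count, section size) as quoted in this report.
sections = {
    "Biological": (763, 721, 6681),
    "Herbal_A": (471, 485, 10430),
    "Recipe_Stars": (812, 828, 9803),
}
for name, (even_hits, odd_hits, n_words) in sections.items():
    diff, z = parity_bias(even_hits, odd_hits, n_words)
    print(f"{name}: diff={diff:+.2%}, z={z:+.2f}")
```

Under these assumptions every |z| stays below 2, in line with reading the
'4o' even/odd bias as noise.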
4. AUTOCORRELATION CONFIRMS A REAL BUT SPECIFIC PATTERN
The autocorrelation analysis reveals the alternation exists, but not
as a simple even/odd vocabulary split. Instead:
BIOLOGICAL SECTION -- Features with lag-2 elevation:
  Feature          lag1     lag2    lag3    lag-2 elevation
  Starts '4o'       0.009   0.055   0.029   +0.036
  Ends '9'          0.000   0.031   0.031   +0.016
  Ends 'am'/'an'   -0.015   0.042   0.018   +0.040
  Contains 'c'      0.003   0.062   0.032   +0.044
The pattern: features that mark word ENDINGS (suffix '9', 'am', 'an'),
word-internal structure ('c' presence) and the '4o' prefix show elevated
autocorrelation at lag 2, while the generic first-character feature does not.
This means: each word tends to share suffix/structural features with the
word 2 positions away, but NOT with the immediately adjacent word.
The word length feature does NOT show lag-2 elevation -- the alternation
is about morphological composition, not word size.
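The lag-2 elevation used here is lag2 - avg(lag1, lag3). A minimal sketch of
that computation on a binary feature sequence follows; the function names
are illustrative, and the Pearson-style normalisation is an assumption about
how the autocorrelation was defined.

```python
def autocorr(x, lag):
    """Autocorrelation of sequence x at the given lag (lag >= 1)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def lag2_elevation(x):
    """lag-2 autocorrelation minus the average of lag 1 and lag 3."""
    return autocorr(x, 2) - (autocorr(x, 1) + autocorr(x, 3)) / 2

# Toy input for illustration only: a perfectly alternating feature.
words = ["oh9", "4oham"] * 50
ends_9 = [int(w.endswith("9")) for w in words]
print(f"lag-2 elevation = {lag2_elevation(ends_9):+.3f}")
```

A perfectly alternating sequence gives an elevation close to +2; the
manuscript values of +0.016 to +0.044 sit far below that, which is why the
report calls the effect weak.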
5. CROSS-SECTION CONSISTENCY
The lag-2 elevation pattern differs across sections:
Biological (strongest):
'4o' start, '9' end, 'am/an' end, 'oh' start, 'c' contains all YES
Recipe_Stars (moderate):
'4o' start, '9' end, 'am/an' end, first char, 'c' contains YES
Herbal_A (weakest):
Only '4o' start and 'oh' start show weak elevation
The alternation is SECTION-DEPENDENT. Biological has the strongest
prosodic structure; Herbal_A has almost none.
6. WHAT THE ALTERNATION IS
The data supports a specific interpretation: the Voynich text alternates
between two morphological ROLES at a 2-word period. Every other word
tends to be:
Position A: More likely to end in 'am', 'an', contain 'c', start with '4o'
Position B: More likely to end in '9', 'y'
But these are TENDENCIES, not hard rules. The effect size is 1-4%
per feature. The alternation is probabilistic, not deterministic.
This is consistent with:
- A grammatical pattern (e.g., noun-modifier or subject-verb alternation)
- A formulaic/liturgical pattern (response structure)
- A table/list with alternating column types
It is NOT consistent with:
- A cipher system that embeds rhythm in specific characters
- Paragraph marks or spacing conventions
- Transcription artifacts
7. PROSODIC MARKER ASSESSMENT
The characters identified at threshold=0.3 as "prosodic markers" were:
'M' -- EVA capital (variant glyph), 29 occurrences, prefers odd
'd' -- standard EVA character, 26 occurrences, prefers odd
'6' -- EVA digit (glyph shape), 21 occurrences, prefers odd
'g' -- standard EVA character, 123 occurrences, prefers even
'z' -- standard EVA character, 37 occurrences, prefers even
'%' -- EVA uncertain reading mark, 23 occurrences, prefers odd
'A' -- EVA capital (variant glyph), 163 occurrences, prefers odd
These are NOT prosodic markers. They are low-frequency characters whose
apparent position bias is statistical noise. 'A' (163 occurrences) has
a log2 ratio of only -0.37, meaning odd positions have ~29% more 'A'
than even -- but on such small counts this is within random fluctuation.
EVA special characters (',' '(' '%' '$' '+' '*' '!') show no systematic
even/odd preference. The comma ',' (plant intrusion / uncertain boundary)
is the most common special char (349 occurrences) with a weak odd bias
(even=157, odd=192, delta=-0.24%) -- barely above noise.
Cross-section comparison of markers at threshold=0.3:
Shared bio+herbal_a: '%', 'd', 'g', 'z'
Shared bio+recipe: '6', 'A'
No character is a prosodic marker in ALL three sections.
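The noise argument can be made concrete with the comma counts quoted above
(even=157, odd=192). This is a sketch using a normal approximation to a
50/50 binomial split; the helper name is hypothetical.

```python
import math

def parity_stats(even, odd):
    """log2(even/odd) and a normal-approximation z-score against a 50/50 split."""
    total = even + odd
    log2_ratio = math.log2(even / odd)
    z = (even - total / 2) / (math.sqrt(total) / 2)
    return log2_ratio, z

log2_ratio, z = parity_stats(157, 192)   # comma counts from this report
print(f"comma: log2 ratio={log2_ratio:+.2f}, z={z:+.2f}")

# A log2 ratio of -0.37 corresponds to odd/even = 2**0.37.
print(f"odd/even at log2 ratio -0.37: {2 ** 0.37:.2f}")
```

With several characters tested at once, a single |z| below 2 is not
evidence of a position preference.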
======================================================================
STRIPPED TEXT FILES
======================================================================
Since no meaningful character-level markers were found, the "content
only" and "rhythm only" split was done at the WORD level:
Files saved:
voynich_bio_full -- all words, dot-separated (35,185 bytes)
voynich_bio_content_only -- '4o' prefix stripped from all words (32,239 bytes)
voynich_bio_even_only -- even-position words only (17,560 bytes)
voynich_bio_odd_only -- odd-position words only (17,624 bytes)
voynich_bio_rhythm_only -- binary stream: 1 if word has '4o', 0 otherwise (6,681 bytes)
Spectral analysis (frequency analysis tool) running on all five.
Results appended below when complete.
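The five streams listed above could be derived from a word list roughly as
follows. This is a sketch: the dictionary keys mirror the file-name
suffixes, and treating "has '4o'" as "starts with '4o'" is an assumption.

```python
def build_streams(words, prefix="4o"):
    """Build the five text streams described in the file list."""
    return {
        "full": ".".join(words),
        # A word equal to the bare prefix becomes an empty string here.
        "content_only": ".".join(
            w[len(prefix):] if w.startswith(prefix) else w for w in words
        ),
        "even_only": ".".join(words[0::2]),
        "odd_only": ".".join(words[1::2]),
        "rhythm_only": "".join(
            "1" if w.startswith(prefix) else "0" for w in words
        ),
    }

# Toy input for illustration only, NOT manuscript data.
streams = build_streams(["4ohc9", "oham", "4o9"])
for name, text in streams.items():
    print(f"{name}: {text}")
```

The rhythm stream holds one character per word, which matches the 6,681-byte
size reported for the 6681-word biological section.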
======================================================================
DEFINITIVE TRANSITION ANALYSIS
======================================================================
The lag-1 vs lag-2 conditional probability test confirms the alternation
is specifically period-2, not simple adjacency clustering:
BIOLOGICAL (6681 words):
  Feature       Base    Lag1_excess   Lag2_excess   Lag2 dominates?
  ends-9        0.499   +0.0003       +0.0313       YES
  ends-am/an    0.129   -0.0145       +0.0416       YES
  starts-4o     0.222   +0.0091       +0.0552       YES
  contains-c    0.383   +0.0028       +0.0616       YES
  starts-oh     0.048   +0.0403       +0.0435       no
The key finding: words ending in '9' show essentially ZERO lag-1 excess
(+0.0003) but SUBSTANTIAL lag-2 excess (+0.0313). This means:
- A word ending in '9' tells you almost NOTHING about whether the NEXT
  word ends in '9' (lag-1 excess is near zero)
- But it PREDICTS that the word 2 positions later will also end in '9',
  with probability 3.1 percentage points above the base rate
This is the textbook signature of period-2 alternation: the immediate
neighbor is anti-correlated or neutral, but the skip-one neighbor is
positively correlated.
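A minimal sketch of the lag excess in the table above, taken here to be
P(feature at i+lag | feature at i) minus the base rate. That reading of the
column headers, and the function names, are assumptions.

```python
def feature_flags(words, predicate):
    """Turn a word list into a 0/1 sequence for one feature."""
    return [int(predicate(w)) for w in words]

def lag_excess(flags, lag):
    """P(flag at i+lag | flag at i) minus the overall base rate."""
    base = sum(flags) / len(flags)
    followers = [flags[i + lag] for i in range(len(flags) - lag) if flags[i]]
    if not followers:
        raise ValueError("feature never occurs with a follower at this lag")
    return sum(followers) / len(followers) - base

# Toy input for illustration only: strict alternation of '9' endings.
words = ["oh9", "4oham", "sc9", "4ohan"] * 25
ends_9 = feature_flags(words, lambda w: w.endswith("9"))
print(f"lag1 excess = {lag_excess(ends_9, 1):+.3f}")
print(f"lag2 excess = {lag_excess(ends_9, 2):+.3f}")
```

Strict alternation gives a strongly negative lag-1 excess and a strongly
positive lag-2 excess; the biological section shows the same sign pattern
for ends-am/an at a much smaller magnitude.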
HERBAL_A (10430 words):
  Feature       Base    Lag1_excess   Lag2_excess   Lag2 dominates?
  ends-9        0.354   +0.1112       +0.0867       no
  ends-am/an    0.141   +0.0143       +0.0127       no
  starts-4o     0.092   +0.0373       +0.0338       no
  contains-c    0.191   +0.1484       +0.1251       no
Herbal_A shows CLUSTERING instead of alternation. Lag-1 excess dominates
for all features. Words ending in '9' strongly predict the NEXT word
also ends in '9' (+11.1% excess). This is run-based structure, not
prosodic alternation. The herbal section has different compositional
rules than the biological section.
RECIPE_STARS (9803 words):
  Feature       Base    Lag1_excess   Lag2_excess   Lag2 dominates?
  ends-9        0.378   +0.0793       +0.0593       no
  ends-am/an    0.187   -0.0351       +0.0380       YES
  contains-c    0.335   +0.0468       +0.0696       YES
  starts-oh     0.063   +0.0126       +0.0317       YES
Recipe_Stars is mixed: suffix 'am/an', 'c'-content and 'oh'-start show
lag-2 dominance (alternation), while '9'-ending shows lag-1 dominance
(clustering). The prosodic alternation exists but coexists with
adjacency patterns.
INTERPRETATION:
The biological section has genuine period-2 prosodic alternation in
suffix morphology ('9', 'am/an') as well as in '4o' starts and 'c'
content. The herbal section does not.
The recipe/stars section has a weaker version of the same pattern.
This section-dependence suggests the alternation reflects CONTENT
organization (the biological section may describe paired structures
or alternating categories) rather than a universal grammatical rule.
======================================================================
SPECTRAL ANALYSIS RESULTS
======================================================================
PENDING -- 22 structural analysis processes competing on 8 cores (load avg 23.6).
The five spectral runs launched but have not completed after 20+ minutes
due to CPU saturation. Files are saved and ready; run when CPU frees:
# (local path removed)
[analysis tool] deep_results/voynich_bio_full
[analysis tool] deep_results/voynich_bio_content_only
[analysis tool] deep_results/voynich_bio_even_only
[analysis tool] deep_results/voynich_bio_odd_only
[analysis tool] deep_results/voynich_bio_rhythm_only
Expected: the even/odd split files should each still show harmonic
structure (the harmonics come from sub-word bigram competition, not
the word-level alternation). The rhythm-only binary stream should
show weaker harmonics since it's just 0s and 1s with ~22% base rate.
======================================================================
AUTOCORRELATION DETAIL DATA
======================================================================
See: deep_results/voynich_prosodic_autocorr.txt
Summary table -- lag-2 elevation (lag2 - avg(lag1, lag3)):
BIOLOGICAL:
Word length: -0.048 (NO)
Starts '4o': +0.036 (YES)
Ends '9': +0.016 (YES)
Ends 'am/an': +0.040 (YES)
Final char: +0.014 (YES)
First char: -0.004 (NO)
Starts 'oh': +0.008 (YES)
Contains 'c': +0.044 (YES)
HERBAL_A:
Word length: -0.026 (NO)
Starts '4o': +0.006 (YES, weak)
Ends '9': -0.014 (NO)
Ends 'am/an': -0.010 (NO)
Final char: -0.015 (NO)
First char: -0.013 (NO)
Starts 'oh': +0.012 (YES)
Contains 'c': -0.014 (NO)
RECIPE_STARS:
Word length: -0.043 (NO)
Starts '4o': +0.008 (YES)
Ends '9': +0.007 (YES)
Ends 'am/an': +0.044 (YES)
Final char: -0.000 (NO)
First char: +0.008 (YES)
Starts 'oh': +0.004 (NO)
Contains 'c': +0.023 (YES)
======================================================================
SHUFFLE PERMUTATION TEST
======================================================================
Testing whether the lag-2 excess is destroyed by word shuffling
(1000 random permutations):
Biological section:
  Feature       Original   Shuffled_mean   Shuffled_std   z-score
  ends-9        +0.0313    -0.0000         0.0123         2.6
  contains-c    +0.0616    +0.0000         0.0125         4.9
  starts-4o     +0.0552    -0.0003         0.0127         4.4
  ends-am/an    +0.0416    +0.0000         0.0119         3.5
ALL features show z > 2.0. The lag-2 pattern is REAL and POSITIONAL.
It is destroyed by shuffling, confirming it is a property of word
ORDER, not vocabulary composition.
The z-scores range from 2.6 (ends-9) to 4.9 (contains-c).
At z=4.9 a normal approximation gives p < 0.000001. With 1000
permutations the empirical p-value cannot go below about 0.001, so the
smaller figure rests on that approximation.
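A sketch of the shuffle test under stated assumptions: the lag excess is
P(feature at i+lag | feature at i) minus the base rate, and shuffling the
0/1 feature sequence stands in for shuffling the words (equivalent for a
single per-word feature). Names and the seed are illustrative.

```python
import random
import statistics

def lag_excess(flags, lag):
    """P(flag at i+lag | flag at i) minus the overall base rate."""
    base = sum(flags) / len(flags)
    followers = [flags[i + lag] for i in range(len(flags) - lag) if flags[i]]
    return sum(followers) / len(followers) - base

def shuffle_test(flags, lag=2, n_perm=1000, seed=0):
    """Compare the observed lag excess with a shuffled null distribution."""
    rng = random.Random(seed)
    observed = lag_excess(flags, lag)
    shuffled = list(flags)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(lag_excess(shuffled, lag))
    mean = statistics.fmean(null)
    std = statistics.stdev(null)
    z = (observed - mean) / std
    # Empirical one-sided p-value; its floor is 1 / (n_perm + 1).
    p_emp = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, mean, std, z, p_emp

# Toy input for illustration only: a strictly alternating feature.
flags = [1, 0] * 100
observed, mean, std, z, p_emp = shuffle_test(flags)
print(f"observed={observed:+.4f} mean={mean:+.4f} std={std:.4f} z={z:.1f} p={p_emp:.4f}")
```

Reporting the empirical p-value next to the z-score keeps the claim tied to
what 1000 permutations can actually resolve.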
======================================================================
CONCLUSION
======================================================================
The 2-word prosodic alternation in the Voynich manuscript is REAL but WEAK.
It manifests as:
- 1-4% probability shifts in suffix choice (am/an vs 9/y) at period 2
- Elevated autocorrelation at lag 2 for morphological features
- No character-level markers -- the alternation is purely at word composition level
It does NOT manifest as:
- Characters that encode rhythm (no char shows > 0.5% position bias)
- Vocabulary segregation (same words appear at both positions)
- Consistent structure across all sections (Herbal_A barely shows it)
The alternation is most naturally explained as structure in the text
itself, most plausibly GRAMMATICAL: the text has a tendency to alternate
between two word-form types, analogous to how natural language alternates
between content words and function words, or between subjects and
predicates. The section-dependence leaves content organization (paired
structures, alternating categories) open as an alternative.
This is a probabilistic linguistic feature, not a cipher artifact.
Critical finding: the alternation shows up in word SUFFIXES. The
morphological ending of a word (am/an/9/y/oe) carries a positional
signal, which is consistent with inflectional morphology in natural
language, where suffixes encode grammatical role. It is not confined to
suffixes, though: the '4o' prefix shows a comparable lag-2 excess
(+0.0552, z=4.4).