---
title: "Affix priming: A large scale online study"
author:
  - name: Roberto Petrosino
    affiliations:
      - ref: NYUAD
    orcid: 0000-0002-8502-3070
    email: roberto.petrosino@nyu.edu
  - name: Jon Sprouse
    affiliations:
      - ref: NYUAD
    orcid: 0000-0003-4674-8092
    email: jon.sprouse@nyu.edu
  - name: Diogo Almeida
    affiliations:
      - ref: NYUAD
    orcid: 0000-0003-4674-8092
    email: diogo@nyu.edu
affiliations:
  - id: NYUAD
    name: New York University Abu Dhabi
    department: Psychology Program, Division of Science
    address: New York University Abu Dhabi
    city: Abu Dhabi
    country: United Arab Emirates
    postal-code: 129188
abstract: "abstract"
keywords:
  - masked repetition priming
  - affix priming
  - stem priming
  - online browser-based experiment
  - power analysis
bibliography: references.bib
editor:
  markdown:
    wrap: 72
---

# Introduction {#sec-intro}

```{r libraries}
#| echo: false
#| warning: false
#| error: false
#| message: false
library(osfr)
library(tidyverse)
library(knitr)
library(gt)
library(gtExtras)
library(rstatix)
library(lmerTest)
library(here)
```

While models of visual word recognition agree that morphologically complex words are decomposed into their morphological constituents in the process of lexical access, there is no consensus on the exact time course of this process [@Taft2004; @BaayenEtal1997] or on the set of linguistic properties it is sensitive to [@RastleEtal2000; @FeldmanEtal1999; @FeldmanEtal2004; @Petrosino2024]. Across languages, it has been reported that a monomorphemic word (called target) is consistently recognized faster when preceded by either (a) a morphologically related word (called prime: e.g., *driver-DRIVE*), even when the latter is masked (i.e., visually presented for so short a time, 40-60 ms, that subjects do not consciously perceive it); or (b) a seemingly, but not actually, morphologically related prime (morphologically “opaque” words; e.g., *brother-BROTH*, where *brother* does not mean “someone who broths”, even though it displays the _-er_ agentive suffix). Crucially, a similar facilitation fails to obtain when the prime is only orthographically similar to the target, with no possible morphological parse [e.g., *brothel-BROTH*, where _-el_ is not an extant English suffix\; @RastleEtal2004]. This pattern of results has been argued to support the procedure of _morphological decomposition_ proposed in @Taft1994's _affix stripping model_, which occurs early in word processing and prior to lexical access (i.e., at least before accessing the meaning of the whole word): it seems to rely on the "morpho-orthographic" shape of morphemes (that is, the phono-orthographic sequence of letter strings associated with a given morpheme, since *-er* triggers decomposition, but *-el* does not), and does not depend on semantic interpretation (since *-er* elicits decomposition even in morphologically opaque, monomorphemic words like *brother*).
In this study, we focus on a corollary argument of the affix stripping model, stating that only stems are activated during decomposition, whereas affixes are just stripped off [@TaftForster1975; @Taft1981]. This predicts that, _with everything else being equal_, the masked _affix_ priming effect size should be weaker than the masked _stem_ priming effect size. The results reported in the literature suggest that this prediction may be correct. On the one hand, masked stem priming has been consistently reported across languages, even with different systems of word formation [e.g., Semitic languages\: @FrostEtal2001; @BoudelaaMarslen-Wilson2001]. On the other hand, masked affix priming, though comparatively understudied, provides less robust results across different languages [among others, English\: @CrepaldiEtal2016; @Petrosino2024; French\: @GiraudoGrainger2003; Italian\: @GiraudoDalMaso2016; Spanish\: @DominguezEtal2010; @DunabeitiaEtal2008; for a review, @AmentaCrepaldi2012]. However, while masked affix priming results seem to be less robust than masked stem priming results, it is hard to ascertain whether they are really null, as the affix stripping model would predict, or just substantially smaller than stem priming effects, and therefore harder to detect without an adequate sample size. A recent study from our lab [@PetrosinoEtal2023] aimed to directly compare masked stem and suffix priming, while recruiting a larger sample ($N_{exp1a}=161$; $N_{exp2}=400$) online, in line with the recent trends in the visual priming literature [among others, *PsychoJS*\: @AngeleEtal2013; *Gorilla*: @Cayado2023; *Labvanced*: @PetrosinoEtal2023; @PetrosinoAlmeida2024].
The results showed that, on the one hand, suffix priming effects were uniformly small and statistically non-significant, no matter whether shorter (33 ms) or longer (60 ms) prime durations were used ($M_{suffix} = 2$ ms); on the other hand, the stem priming effects were substantially larger and statistically significant, and varied as a function of the prime duration ($M_{stem,exp1} = 16$ ms vs $M_{stem,exp2} = 31$ ms), mirroring the dynamics of the identity condition ($M_{identity,exp1} = 25$ ms vs $M_{identity,exp2} = 47$ ms). These results seem to strongly support a differential nature of stems and affixes: at early stages of visual word processing, decomposition occurs on the basis of morpho-orthographic regularities, and eventually triggers lexical activation of the stem, thus leading to priming. However, suffixes do not seem to be activated at all: they are just “stripped off” of the stem, and do not appear to be used in lexical retrieval in the same way stems are, as the latter give rise to priming effects but the former do not. The present contribution builds on these results, and further delves into the two follow-up questions that were mentioned therein. First, we extend our investigation to prefix priming, and ask whether the location of the affix with respect to the stem impinges on decomposition and ultimately priming elicitation. Second, we acknowledge that the bulk of the literature on morphological priming is primarily based on English, a stem-based language where words may surface as phonologically identical to the underlying stems. This property of English is not very common cross-linguistically; in many languages (e.g., Romance languages) stems are instead always bound, in the sense that they never occur without at least one grammatical affix. Such an idiosyncratic property may ultimately hinder a more direct comparison between the affix and stem masked priming responses, and therefore the detection of potential differences between the two.
The present study aims to address both issues, while ensuring reproducibility and reliability of the results reported. To that end, we ran an extensive power analysis that allowed us to quantify the sample size needed to ensure acceptable power ($N=12{,}000$).

<!--->
TO INCLUDE:

1 - "Re" literature (Stockall)
2 - Bayersmann literature and mechanism on affix decomposition (with Grainger)
3 - Gaston et al 2023 (long distance priming)

<!--->

# Methods {#sec-methods}

## Preregistration {#sec-methods-prereg}

We preregistered the goals, design, and analysis plan of the study (including the power analysis used for the sample size assessment) prior to data collection. The preregistration is available on the Open Science Framework (<https://doi.org/10.17605/OSF.IO/EAD4G>).

## Sample size rationale {#sec-methods-power}

We ran an extensive power analysis based on four parameters: sample size, standard deviation, pairwise correlation between the related and unrelated conditions, and effect size. Of these parameters, only sample size and effect size are consistently reported in published studies, whereas the other two are very seldom reported, if ever. For this reason, our power analyses required testing a wide range of combinations of these parameters. From several pilot online and in-lab experiments, we identified the range between 80 and 120 ms (with 10 ms increments) for the standard deviation, and between 0.7 and 0.9 (with 0.1 increments) for the correlation. As for the sample size, we selected a range between 200 and 5,000 participants (with 150-unit increments). Finally, as for the effect size, we focused on the expected effect size of the masked affix priming response. All previous studies reported a null, or close-to-null, raw effect ($\le 7$ ms), so we used the upper-bound limit as a single value in the simulations. The four parameters were used to simulate 10,000 datasets for each combination. For each dataset, we performed a dependent-samples *t*-test, and then calculated power as the proportion of significant tests (i.e., with $p < \alpha$) obtained. The code used for the power simulations, along with the simulated datasets, is available on OSF (<XXXXXXXX>). We identified a sample size of 3,300 participants for each experiment as a feasible sample size that would allow us to reach an acceptable statistical power ($> 80\%$) in most combinations of parameters. Given the known limitations in the timing accuracy and precision of the online stimulus delivery programs currently available [@PetrosinoAlmeida2024], the target sample size was raised to 6,000 for each experiment, so as to ensure the largest sample size possible.
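The simulation logic described in this section can be illustrated with a minimal base-R sketch. It is not the code deposited on OSF: the function name `simulate_power`, the significance level of 0.05, the 600 ms baseline RT, and the reduced grid and number of simulations (chosen so that the example runs quickly) are all assumptions made for illustration.

```r
# Sketch of a simulation-based power analysis for a dependent-samples t-test.
# Assumptions (not from the manuscript): alpha = 0.05, a 600 ms baseline RT,
# normally distributed by-subject means, and a reduced grid / n_sims.
simulate_power <- function(n, sd, rho, effect, n_sims = 1000, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    # Two standard-normal vectors with correlation rho
    z1 <- rnorm(n)
    z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
    unrelated <- 600 + sd * z1
    related <- 600 - effect + sd * z2   # related primes speed up responses
    t.test(unrelated, related, paired = TRUE)$p.value
  })
  # Power = proportion of simulated datasets with a significant test
  mean(p_values < alpha)
}

set.seed(2024)
# Reduced grid; the full grid in the text would be
# n = seq(200, 5000, by = 150), sd = seq(80, 120, by = 10), rho = c(0.7, 0.8, 0.9)
grid <- expand.grid(n = c(200, 3300), sd = c(80, 120), rho = c(0.7, 0.9))
grid$power <- mapply(simulate_power, grid$n, grid$sd, grid$rho,
                     MoreArgs = list(effect = 7, n_sims = 200))
grid
```

Because the test is run on paired data, power depends on the standard deviation of the by-subject differences, $\sigma\sqrt{2(1-\rho)}$, which is why both the standard deviation and the correlation have to be varied when neither is reported in the literature.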

## Participants {#sec-methods-participants}

Twelve thousand participants were recruited on the Prolific online platform (<https://www.prolific.com>). Several criteria were selected to ensure recruitment of native speakers of U.S. English. Participants had to be born in the United States of America, speak English as their first and only language, and have no self-reported language-related disorder. We encouraged participants to avoid any sort of distraction throughout the experiment, and to close any program that may be running in the background. Because the experiment was run online, participants could not be monitored during data collection. Finally, to further reduce variability across participants' devices, we restricted the experiment to be run on Google Chrome only, which is the most used browser worldwide [@w3counterGlobalStats], and reportedly performs better than any other across operating systems [likely thanks to the _Blink_ engine\; see @LukacsGaspar2023].

## Design {#sec-methods-design}

The masked priming procedure for both experiments relied on a lexical decision task (LDT). The factorial design was an 8 (condition: _prefix_, _suffix_, _prefixed root_, _suffixed root_, _identity_, _orthographic_, _semantic_, _nonword-word foil_) x 2 (prime type: _related_ vs _unrelated_) design in experiment 1, and a 7 (condition: _prefixed stem_, _suffixed stem_, _identity_, _stem-from-prefix_, _stem-from-suffix_, _semantic_, _nonword-word foil_) x 2 (prime type: _related_ vs _unrelated_) design in experiment 2. A detailed description of the conditions can be found below. Both factors were manipulated within subjects. The dependent variables were lexical decision latency (RT; in milliseconds) and error rate (in percentages).

## Materials {#sec-methods-materials}

The descriptive statistics of the word items used in both experiments are detailed in XXXXX. A complete list of the word items used in both experiments is reported in the appendix at the end of the paper.

### Experiment 1 {#sec-methods-materials-exp1}

Three hundred and thirty-six English words were selected from the English Lexicon Project database [ELP\; @balota2007], amounting to forty-eight words for each of the seven conditions tested in the experiment. In the four main conditions, primes and targets were all bimorphemic words morphologically related to one another. In the _prefix_ and _suffix_ conditions, prime and target words shared the same prefix (e.g., *unfair-UNCOMMON*) and suffix (e.g., *jogger-FREEZER*), respectively. In the _prefixed root_ and _suffixed root_ conditions, prime and target words shared the same stem but differed in the prefix (*disuse-MISUSE*) and the suffix (*lovable-LOVELESS*), respectively. Three additional conditions were also included to assess the influence of identity, orthographic, and semantic relatedness on the priming response. The _identity_ condition (*scorpion-SCORPION*) consisted of monomorphemic words presented as both prime and target, and was meant to gauge the upper-bound baseline for priming effects. The _orthographic_ condition consisted of words sharing the same leftmost phono-orthographic syllabic unit (*electric-ELECTION*). The _semantic_ condition consisted of semantically related, but morphologically and orthographically unrelated words (*captive-PRISONER*). These pairs were chosen from the Semantic Priming Project database by @HutchisonEtal2013, to further ensure reproducibility of the effects.

Word frequency [HAL\: @LundBurgess1996] and orthographic length could not be controlled across all seven conditions as a whole, so the seven conditions were split into two separate groups: group 1A consisted of the four morphological conditions and the identity condition; group 1B consisted of the orthographic and semantic conditions. In each group, prime and target stimuli were matched as much as possible in word frequency (group 1A, primes: *F*(3,92)=2.08, $p=.11$; group 1A, targets: *F*(5,138)=1.36, $p=.25$; group 1B, primes: *F*(1,46)=0.31, $p=.58$; group 1B, targets: *F*(1,46)=0.76, $p=.39$) and length (group 1A, primes: *F*(3,92)=0.76, $p=.52$; group 1A, targets: *F*(5,138)=1.89, $p=.1$; group 1B, primes: *F*(1,46)=3.07, $p=.09$; group 1B, targets: *F*(1,46)=0.98, $p=.33$). For each condition, a set of twenty-four unrelated primes was also selected from the ELP database, and matched in HAL frequency and length with the respective related primes (*t*s < 1).
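As an illustration of the kind of matching check reported above, the following base-R sketch runs a one-way ANOVA on simulated stimulus properties. The data frame `stimuli`, its column names, and the use of log-transformed frequency are hypothetical; the values are randomly generated and are not the actual stimuli.

```r
set.seed(42)
# Hypothetical stimulus table: 24 primes in each of four morphological conditions.
# Column names and the simulated values are illustrative assumptions.
stimuli <- data.frame(
  condition = factor(rep(c("prefix", "suffix", "prefixed_root", "suffixed_root"),
                         each = 24)),
  log_freq = rnorm(96, mean = 8, sd = 1.5),      # simulated log HAL frequency
  length = sample(5:10, 96, replace = TRUE)      # simulated orthographic length
)

# One-way ANOVAs testing whether the conditions differ in frequency or length
freq_table <- summary(aov(log_freq ~ condition, data = stimuli))[[1]]
length_table <- summary(aov(length ~ condition, data = stimuli))[[1]]

freq_table[["Df"]]        # degrees of freedom: 3 (condition) and 92 (residuals)
freq_table[["F value"]][1]
freq_table[["Pr(>F)"]][1]
```

With four conditions of twenty-four items each, the degrees of freedom are 3 and 92, which is the shape of the statistics reported for the primes of the morphological conditions.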

Finally, a _nonword-word foil_ condition (*ovetarm-SPONGE*) consisted of unrelated non-word primes and word targets, and was meant to further reduce the proportion of related prime-target pairs in the experiment.

Three hundred and eighty-four non-words were also randomly selected from the ELP database, all matching the word stimuli in length (*F*s < 2). Half of these non-words were randomly selected to be presented as targets, of which twenty-four were preceded by an identity prime. The other half was instead used as unrelated primes.

### Experiment 2 {#sec-methods-materials-exp2}

The materials for experiment 2 consisted of two separate groups of conditions, for a total of seven conditions. For the main group of conditions (group 2A), one hundred and forty-four English words were selected from the English Lexicon Project database [ELP\; @balota2007], amounting to forty-eight words for each of the three main conditions tested in the experiment. In the two morphologically related conditions, the prime words were derivationally related to the corresponding target stems. In the _prefixed stem_ condition, the prime was a prefixed form of the target stem (*unsafe-SAFE*); in the _suffixed stem_ condition, the prime was a suffixed form of the target stem (*playable-PLAY*). The _identity_ condition (*split-SPLIT*) consisted of monomorphemic words presented as both prime and target, and was meant to gauge the upper-bound baseline for priming effects. Word frequency [HAL\: @LundBurgess1996] and orthographic length were controlled as much as possible for both primes (HAL: *F*(1,46)=2.68, $p=.11$; length: *F*(1,46)=1.87, $p=.18$) and targets (HAL: *F*(2,69)=0.21, $p=.82$; length: *F*(2,69)=0.16, $p=.85$). As in experiment 1, a _nonword-word foil_ condition (*ovetarm-SPONGE*) was included to further reduce the proportion of related prime-target pairs in the experiment. In this condition, twenty-four word targets were matched in frequency and length with the other targets (*F*s < 2) and paired with unrelated non-word primes.

The three conditions of group 2B were tested only to assess the amount of residual effects in the morphological conditions of experiment 1 (group 1A) that were due to any unwanted (i.e., non-morphological) source. The _stem-from-prefixed_ and _stem-from-suffixed_ conditions included the stems of the pairs used in the _prefix_ and _suffix_ conditions of experiment 1. Naturally, the two conditions could not be controlled in word frequency or length. However, taken as a whole, the primes and targets of the two conditions were matched with a semantic condition in both HAL frequency (primes: *F*(1,70)=2.83, $p=.1$; targets: *F*(1,70)=0.56, $p=.46$) and length (primes: *F*(1,70)=2.7, $p=.11$; targets: *F*(1,70)=0.9, $p=.35$). As with the semantic condition in experiment 1, the pairs of the semantic condition in experiment 2 were chosen from the Semantic Priming Project database by @HutchisonEtal2013 (e.g., *blouse-SHIRT*). For each of the seven conditions described above, a set of twenty-four unrelated primes was also selected from the ELP database, and matched in HAL frequency and length with the respective related primes (*t*s < 1).

## Procedure {#sec-methods-proc}

For each experiment, we prepared two different wordlists that differed only in the relatedness of the prime with respect to the target; other than that, the two lists presented the same set of target words and non-words. In one list, the four conditions (high-frequency, mid-frequency, low-frequency word conditions, and the non-word condition) had 12 target items preceded by themselves (the *related* primetype condition) and the remaining 12 target items preceded by one of the unrelated primes belonging to the same frequency bin (the *unrelated* primetype condition). In the other list, the assignment was reversed. Experiment 1 consisted of 384 pairs; experiment 2 consisted of 336 pairs.

After being recruited, participants were asked to click on a link which redirected them to Labvanced. During the experiment, they were asked to perform a lexical decision task by pressing either the 'J' (for word) or 'F' (for non-word) keys on their keyboard. Each trial consisted of three different stimuli appearing at the center of the screen: a series of hashes (#####) presented for 500 ms, followed by a prime word presented for 33 ms, and finally the target word; the target word disappeared from the screen as soon as a decision was made. The motivation behind the choice of such a short prime duration (as compared to the literature, in which it is usually around 42 ms) is two-fold. First, several previous pilot experiments run on the same platform showed that having a longer prime duration increased the number of trials with a prime duration above the subliminal threshold (usually set at 60 ms), which could trigger experiment-wide strategic influences on the masked priming response. Second, setting such a short prime duration may ultimately maximize subliminal effects, thus ensuring the reliability of the results.

Subjects were also given 6 breaks throughout the experiment. When the experiment was over, the participants were redirected to Prolific in order to validate their submission. The median time to finish the experiment was about 15 minutes. Each participant was paid at the standard rate of USD 9.5/hour.


# Data analysis {#sec-analysis}

Analysis scripts and an abridged version of the data collected can be
found on OSF (<#########>). We performed three different
steps of analyses (in sequential order), with the goal of gaining a
thorough understanding of the data collected
(`N` observations in
total). The first step of analyses is the novel analysis step that is
usually not included in the typical analysis pipeline for RT-based data,
and looks at the distribution of the actual duration of prime words
for each trial for each subject. This additional analysis step is indeed
necessary to fully understand the performance capabilities of the engine
Labvanced relies on. The second and third step of analyses are instead
part of the typical analysis pipeline for RT-based data, looking at
subject performance and RT distribution, respectively.

### Step 1: prime durations {#sec-analysis-performance}

During the experiment, the duration of presentation of the prime word
was recorded online for every trial, as an additional measure necessary
for a thorough assessment of the stimulus delivery engine in terms of
reliability and variance of the duration of the presentation of the
stimuli, and in particular of the prime. The distribution of the prime
durations recorded in the experiment is shown in XXXXX
below. Both the mean (mean =
`N`) and the median (median =
`N`) of the prime duration were slightly
greater than the target prime duration (33 ms). This distribution
suggests that, while overall the web engine presented most trials at the
preset duration, it was not as precise and accurate as a local engine.
This was expected and likely due to the great variation in the
specifications of the devices used by the participants, and may likely
be impossible to control, at least at the current state of development
of the online platforms available at this time. However, in masked
priming, in which the duration of the prime is an essential part of the
design itself, such fluctuations may indeed hinder proper elicitation of
the response. As a way to counteract the potential influence that such
fluctuations might have had on the priming response, we only kept trials
whose prime durations were within a pre-set range from the target
duration of 33 ms. Taking a standard 60-Hz monitor as reference, the
lower and the upper bounds were set at half of a full refresh cycle
(i.e., 8 ms) and at 60 ms respectively, so as to keep trials that would not
fall below or above the subliminal threshold. Out of the
`N` observations
collected, only `N`% of the trials were
out of the range selected, the great majority of which
(`N`%) were above the range set. We
take this as further corroborating the argument that Labvanced is
capable of reliably presenting stimuli at short durations. However, by way
of ensuring the quality of the data collected and therefore reliability
of the results, recording of the actual durations of the prime stimuli
of each trial and removal of the out-of-range trials as a cautionary,
preliminary step in the analysis pipeline is recommended.
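The trial-level filter on prime durations can be sketched in base R as follows. The data frame `trials`, the column name `prime_duration`, and the toy values are hypothetical; the bounds are read literally from the text as 8 ms and 60 ms, and treating them as inclusive is an assumption.

```r
# Hypothetical trial-level data: recorded prime durations in ms (toy values)
trials <- data.frame(
  subject = rep(1:2, each = 5),
  prime_duration = c(33, 34, 50, 67, 5, 33, 17, 33, 83, 35)
)

lower_bound <- 8    # half of a 60-Hz refresh cycle, as stated in the text
upper_bound <- 60   # subliminal threshold, as stated in the text

# Assumption: bounds are inclusive
in_range <- trials$prime_duration >= lower_bound &
  trials$prime_duration <= upper_bound

# Percentage of trials out of range, and share of those that exceed the upper bound
pct_out <- 100 * mean(!in_range)
pct_above <- 100 * mean(trials$prime_duration[!in_range] > upper_bound)

# Keep only in-range trials for the following analysis steps
trials_kept <- trials[in_range, ]
```

Recording the two percentages before dropping the trials makes it possible to report both how many trials were lost and in which direction the timing errors went.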

### Step 2: subject and item performance

Non-word trials were excluded from analysis a priori, as not strictly
relevant to the specific question being asked in the experiment. The
by-item word error rate revealed that `N` words
had an error rate higher than 30%, and they all belonged to the
low-frequency condition. The high number of words with a high error rate
was likely due to their low frequency (see XXXXXX
for further information), so we decided to remove the low-frequency
condition altogether, since removing the whole set of high-error words
(roughly corresponding to `N`% of
the total items of the low-frequency conditions) would have drastically
decreased the number of observations for the low-frequency condition,
thus impinging on the reliability of the estimates of that condition.
After removing the entire low-frequency condition, we re-calculated word
error rates. Only `N` mid-frequency words
(*`N`*) had an error rate higher than 30%, and
were removed from analysis. We then calculated the subject error rates.
Only `N` subject was removed because their
overall error score was higher than 30%. After this cut, we also
calculated $d'$ [@GreenSwets1966] to assess participants' attentiveness
to the lexical decision task. The measure $d'$ is usually calculated as
the difference between the by-subject z-transformed percentages of hit
(i.e., a word correctly recognized as a word) and false alarm (i.e., a
non-word incorrectly recognized as a word) scores. A $d'$ value close to
zero generally indicates a lack of attentiveness/awareness of the
participant to the stimulus. The distribution of $d'$ across
participants was never below 1.5, which suggested that all participants
were actively engaging with the task. Finally,
`N` subjects were removed because the number
of trials was less than half of the trials being presented within the
same condition (i.e., 25). This was an additional cautionary step to
avoid inaccurate estimates. After removing the incorrect responses, a
total of `N`
observations and `N` participants were
included in further analyses.
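The $d'$ computation described above amounts to a difference of two normal quantiles. The base-R sketch below uses a hypothetical helper `d_prime` and made-up hit and false-alarm rates; how rates of exactly 0 or 1 were handled is not stated in the text, so no correction is applied here.

```r
# Hypothetical helper: d' as the difference between z-transformed
# hit and false-alarm rates (rates given as proportions, not percentages).
# Note: rates of exactly 0 or 1 yield infinite values; any correction
# for those cases would be an additional assumption.
d_prime <- function(hit_rate, fa_rate) {
  qnorm(hit_rate) - qnorm(fa_rate)
}

# Toy by-subject rates (made up for illustration)
subjects <- data.frame(
  subject = 1:3,
  hit_rate = c(0.95, 0.80, 0.55),   # words correctly accepted
  fa_rate = c(0.05, 0.20, 0.50)     # non-words incorrectly accepted
)
subjects$d_prime <- d_prime(subjects$hit_rate, subjects$fa_rate)
subjects
```

In this toy example the third subject responds "word" about equally often to words and non-words, so the resulting $d'$ is close to zero.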

### Step 3: RT distribution

Finally, individual trials were excluded if the relative RT was below
200 ms or above 1800 ms. `N`
observations were excluded at this stage of analysis
(`N`% of the
dataset), which led to a total of
`N` observations
that were actually included in the statistical analysis below.

## Results {#sec-exp-results}

For each frequency bin, priming effects were calculated for each subject
by subtracting the by-subject mean RT to the related sub-condition from
the by-subject mean RT to the unrelated sub-condition. Standardized
effect sizes (i.e., Cohen's *d*) were then calculated for each
condition. Finally, by-subject mean priming effects were grand-averaged
across subjects for each condition. XXXXX and
XXXXX below report the descriptive statistics of the
experiment. Both word conditions show non-null priming effects. The
priming magnitudes (and the relative standardized effect sizes) are
different in terms of magnitude and sign. The high and mid-frequency
word conditions show similar positive effects, with the mid-frequency
priming effects being 6 ms greater than the high-frequency priming
effects.

We ran one *t*-test for each frequency condition, which revealed
significant results for both conditions ($p<.001$). A separate paired
*t*-test comparing the priming magnitudes between the high-frequency
and low-frequency conditions was instead non-significant ($p=$
`N`).
The test statistics and *p*-values of both analyses are reported in
XXXXXX.
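The computation of by-subject priming effects, standardized effect sizes, and the corresponding *t*-test can be sketched in base R as follows. The data are simulated and all object names are hypothetical; computing Cohen's *d* as the mean difference divided by the standard deviation of the differences is an assumption, since the text does not specify the variant used.

```r
set.seed(7)
n_subj <- 50

# Simulated by-subject mean RTs (ms) for one condition; values are made up
means <- data.frame(
  subject = seq_len(n_subj),
  unrelated = rnorm(n_subj, mean = 620, sd = 90)
)
means$related <- means$unrelated - rnorm(n_subj, mean = 20, sd = 40)

# Priming effect per subject: unrelated minus related (positive = facilitation)
priming <- means$unrelated - means$related

# Grand-averaged priming effect across subjects
mean_priming <- mean(priming)

# Assumption: Cohen's d for paired data, i.e. mean difference / SD of differences
cohens_d <- mean(priming) / sd(priming)

# Dependent-samples t-test on the same by-subject means
test <- t.test(means$unrelated, means$related, paired = TRUE)
test$statistic
test$p.value
```

Under this definition the *t* statistic equals Cohen's *d* multiplied by the square root of the number of subjects, which gives a quick consistency check between the reported effect sizes and test statistics.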

## Discussion {#sec-exp1-discussion}


# General discussion {#sec-discussion}

# References {.unnumbered}

::: {#refs}
:::

\newpage

# Wordlists {.unnumbered}


### Experiment 1 {.unnumbered}

### Experiment 2 {.unnumbered}