The replication package for the article "Saturated by Commerce: A Computational Analysis of Eighteenth-Century British Political Discourse". This public version of the repository omits many of the datasets, which we cannot share publicly because they are proprietary. If you need the full package for replication purposes, contact iiro.tiihonen@helsinki.fi.
Contains the scripts that produce the preprocessing, data outputs, and downstream results (e.g. figures) related to the article. Here, we provide an overview of the role of each script.
The script Data Initialisation first converts some of the input datasets to the Parquet format, which makes them easier to use later. The script Create Index Reference Document Groups selects a group of ECCO documents that can be used to verify the effectiveness of the topical and thematic indexes used in the article. Keyword Group Aggregation and Normalisation uses the outputs of the preceding scripts to produce (normalised) versions of the index datasets, and additionally produces demonstrative and evaluative results such as the evaluation plot of the article. ESTC Subset Editions From England and Scotland selects the editions published in Britain (England, Scotland, Wales), which are used to filter documents in the other downstream analyses, as our analysis is limited to Britain.
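To make the idea of a normalised index score concrete, the following is a minimal sketch rather than the repository's actual procedure. It assumes that an index is a group of keywords and that normalisation means scaling keyword hits by document length (hits per 1,000 tokens). The keyword groups, the function name `normalised_index_scores`, and the scaling factor are all hypothetical.

```python
from collections import Counter

# Hypothetical keyword groups for illustration only; the real index
# definitions come from the repository's input data.
KEYWORD_GROUPS = {
    "commerce": {"trade", "commerce", "merchant"},
    "liberty": {"liberty", "freedom"},
}


def normalised_index_scores(tokens, groups=KEYWORD_GROUPS, per=1000):
    """Return keyword-group hits per `per` tokens for one document.

    Assumption: normalisation is by document length. The article's
    scripts may normalise differently.
    """
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # An empty document has no meaningful rate; report zeros.
        return {name: 0.0 for name in groups}
    return {
        name: per * sum(counts[word] for word in words) / total
        for name, words in groups.items()
    }


doc = "the merchant defended the liberty of trade and commerce".split()
scores = normalised_index_scores(doc)
```

Scaling by length keeps long and short documents comparable, which matters when the same indexes are computed for both whole documents and document chunks.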
The script ECCO Index Temporal Developments produces the univariate (individual indexes) and tripartite (complete hypotheses) analyses and figures of the article, whereas the scripts ECCO Temporal Correlations … produce the bipartite analyses (connections between any two indexes). The script Keyword Lexical Level Co-Occurrences produces the analyses of individual keywords of interest.
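As an illustration of what a bipartite analysis between two indexes could look like, the sketch below computes a Pearson correlation between two yearly index series, aligned on the years they share. This is an assumption made for illustration: the scripts may use a different measure, windowing, or aggregation. The function names and the example values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def index_correlation(series_a, series_b):
    """Correlate two {year: score} series over the years they share."""
    years = sorted(set(series_a) & set(series_b))
    return pearson([series_a[y] for y in years],
                   [series_b[y] for y in years])


# Hypothetical yearly mean index scores, not taken from the article.
commerce = {1700: 1.0, 1710: 2.0, 1720: 3.0}
liberty = {1700: 2.0, 1710: 4.0, 1720: 6.0, 1730: 9.0}
r = index_correlation(commerce, liberty)
```

Aligning on shared years avoids silently pairing values from different periods when one series covers a longer span than the other.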
Input datasets from other repositories. Contains ESTC and EEBO metadata from the related R packages, as well as geo-information from https://github.com/COMHIS/estc-places. The keyword data for ECCO documents, as well as the manually annotated sample related to it, comes from https://github.com/COMHIS/ecco_keywords.
Files created during processing that are not themselves part of the 'final' datasets analysed.
The datasets underlying the figures and tables of the article. Includes the normalised index scores for documents and document chunks in Parquet format, as well as further processed temporal data on patterns of lexical and index-level co-occurrence.
Figures of the article. Also includes some additional plots that were produced but were not strictly necessary for the final paper.
Add more detailed documentation of the repository.