Update BibTeX files and integrate extracted talks from tenure packet #73

Open: jpfairbanks wants to merge 5 commits into main from jpf/word-citation-extraction

@jpfairbanks

Member

I updated all my talks in the UF Faculty Analytics system, which can produce APA-formatted Word files. Talks from those files can be extracted as CSL JSON and then used on the website.

Copilot AI review requested due to automatic review settings June 24, 2026 12:57

Copilot AI left a comment

Contributor

Pull request overview

Adds an extracted-talks bibliography source and wiring in Quarto, plus updates multiple CSL-JSON bibliography files to incorporate revised metadata generated from UF Faculty Analytics exports.

Changes:

  • Add a Pandoc Lua filter (lua/extract_csl.lua) to extract APA-style talk citations from DOCX into CSL-JSON.
  • Add a new talks bibliography file (assets/bib/extracted_talks.json) and include it in bibtable.qmd / bibliography.qmd.
  • Update existing bibliography JSON files with revised fields (venues, dates, identifiers), and remove the legacy bibliography.bib.
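
As a rough illustration of what extraction into CSL-JSON involves, a captured date such as "March 5" with year "2024" could be mapped onto the CSL-JSON `issued` field, which stores dates as nested `date-parts` arrays. This sketch is not taken from lua/extract_csl.lua: the `MONTHS` table and `to_issued` helper are hypothetical names, and it assumes English month names spelled out in full.

```lua
-- Hypothetical helper: convert a year string and an APA month/day string
-- into a CSL-JSON style "issued" table. Not code from this pull request.
local MONTHS = {
  january = 1, february = 2, march = 3, april = 4,
  may = 5, june = 6, july = 7, august = 8,
  september = 9, october = 10, november = 11, december = 12,
}

local function to_issued(year, month_day)
  -- Take the leading month name and, if present, the first day number.
  local month_name, day = month_day:match("^(%a+)%s*(%d*)")
  local parts = { tonumber(year) }
  local month = month_name and MONTHS[month_name:lower()]
  if month then
    parts[#parts + 1] = month
    if day ~= "" then
      parts[#parts + 1] = tonumber(day)
    end
  end
  -- CSL-JSON represents a single date as one inner array of date parts.
  return { ["date-parts"] = { parts } }
end

local issued = to_issued("2024", "March 5")
print(table.concat(issued["date-parts"][1], "-"))
```

An unrecognized month string falls back to a year-only date rather than raising an error, which seems the safer default for hand-entered talk lists.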

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| lua/extract_csl.lua | New Pandoc Lua filter to extract CSL-JSON entries from DOCX talk lists. |
| csl-output-testing.json | Added a sample extracted CSL-JSON output for testing/inspection. |
| bibtable.qmd | Adds assets/bib/extracted_talks.json to the table bibliography inputs/resources. |
| bibliography.qmd | Adds a new talks bibliography topic and a new rendered section for it. |
| bibliography.bib | Removes the legacy BibTeX file previously referenced by site config. |
| assets/bib/extracted_talks.json | Adds extracted talks dataset used by Quarto citeproc/multibib. |
| assets/bib/cv_talks.json | Removes a miscategorized non-talk entry from talks JSON. |
| assets/bib/cv_proceedings.json | Updates proceedings metadata (status, dates, identifiers, URLs/DOIs, etc.). |
| assets/bib/cv_preprints.json | Removes a preprint entry from the preprints JSON. |
| assets/bib/cv_posters.json | Adjusts poster metadata (but introduces typos/name-field swap). |
| assets/bib/cv_journals.json | Adds/updates journal metadata (but introduces a DOI formatting issue). |


Comment thread bibliography.qmd Outdated
Comment on lines +13 to +15
nocite:
- "@*"
- "@talks:*"
Comment thread bibliography.qmd Outdated
Comment on lines +18 to +25
@@ -19,6 +22,7 @@ resources:
- assets/bib/cv_posters.json
- assets/bib/cv_preprints.json
- assets/bib/apa-cv.csl
- csl-output.json
Comment thread bibliography.qmd
talk: assets/bib/cv_talks.json
poster: assets/bib/cv_posters.json
preprint: assets/bib/cv_preprints.json
talks: assets/bib/extracted_talks.json
Comment thread assets/bib/cv_posters.json Outdated
Comment thread assets/bib/cv_posters.json Outdated
Comment on lines 57 to 59
"family": "Evan",
"given": "Patterson"
}
Comment thread assets/bib/cv_journals.json Outdated
Comment thread assets/bib/cv_proceedings.json Outdated
Comment thread assets/bib/cv_proceedings.json Outdated
Comment thread lua/extract_csl.lua
Comment on lines +56 to +62
  else
    local parts = {}
    for k, v in pairs(val) do
      table.insert(parts, '"' .. json_escape(k) .. '":' .. encode(v))
    end
    return "{" .. table.concat(parts, ",") .. "}"
  end
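
One thing worth noting about this loop: Lua does not specify the traversal order of `pairs`, so the key order of the emitted JSON objects can differ between runs, which tends to produce noisy diffs in committed bibliography files. A minimal sketch of one way to make the output stable is to sort the keys before encoding. The `json_escape` and `encode` functions below are simplified stand-ins written for this example, not the versions in lua/extract_csl.lua, and the sketch assumes object keys are strings.

```lua
-- Simplified stand-in for the filter's json_escape helper (assumption).
local function json_escape(s)
  return (s:gsub('[%c"\\]', function(c)
    if c == '"' then return '\\"' end
    if c == '\\' then return '\\\\' end
    return string.format("\\u%04x", string.byte(c))
  end))
end

-- Simplified encoder that emits object keys in sorted order.
local function encode(val)
  local t = type(val)
  if t == "string" then
    return '"' .. json_escape(val) .. '"'
  elseif t == "number" or t == "boolean" then
    return tostring(val)
  elseif t == "table" then
    if #val > 0 then
      -- Treat tables with a non-empty sequence part as JSON arrays.
      local items = {}
      for _, v in ipairs(val) do
        items[#items + 1] = encode(v)
      end
      return "[" .. table.concat(items, ",") .. "]"
    end
    -- Collect string keys and sort them for a deterministic order.
    local keys = {}
    for k in pairs(val) do
      if type(k) == "string" then
        keys[#keys + 1] = k
      end
    end
    table.sort(keys)
    local parts = {}
    for _, k in ipairs(keys) do
      parts[#parts + 1] = '"' .. json_escape(k) .. '":' .. encode(val[k])
    end
    return "{" .. table.concat(parts, ",") .. "}"
  end
  return "null"
end

print(encode({ title = "T", author = "A", issued = 2024 }))
```

Sorting costs little for records of this size, and it keeps regenerated files byte-for-byte comparable as long as the underlying data has not changed.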
Comment thread lua/extract_csl.lua Outdated
Comment on lines +74 to +91
-- --------------------------------------------------------------------
-- APA presentation patterns (Lua patterns, not full PCRE)
-- --------------------------------------------------------------------
-- The patterns are deliberately permissive – they just need to capture
-- the fields we care about. They are applied to the plain-text content
-- of a paragraph (i.e. after Pandoc has stripped formatting).
-- --------------------------------------------------------------------
local patterns = {
  -- General pattern for numbered APA entries produced by Pandoc from DOCX.
  -- Captures:
  --   1) author string (up to the period before the date parentheses)
  --   2) year
  --   3) month/day string (or just month)
  --   4) title (plain text, may contain commas, ends with a period)
  --   5) event (conference/journal etc.)
  --   6) location (city/state/country)
  "^%s*%d+%.%s*(.-)%s*%((%d%d%d%d),%s*([^%)]+)%)%.%s*(.-)%.%s*([^,]+),%s+(.+)%.$",
}
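
To see what the six captures look like in practice, the pattern above can be applied directly with `string.match`. This is a small sketch, not part of the filter: the pattern is copied verbatim from the snippet, while the sample entry is a made-up citation rather than one from the tenure packet.

```lua
-- Pattern copied from the snippet under review.
local pattern =
  "^%s*%d+%.%s*(.-)%s*%((%d%d%d%d),%s*([^%)]+)%)%.%s*(.-)%.%s*([^,]+),%s+(.+)%.$"

-- Hypothetical numbered APA entry used only for illustration.
local entry =
  "1. Doe, J. (2024, March 5). A sample talk title. Example Conference, Gainesville, FL."

local author, year, month_day, title, event, location = entry:match(pattern)

print(author)     -- Doe, J.
print(year)       -- 2024
print(month_day)  -- March 5
print(title)      -- A sample talk title
print(event)      -- Example Conference
print(location)   -- Gainesville, FL
```

Because the title capture is lazy and stops at the first period from which the rest of the pattern can match, a title containing an internal period (for example "Scaling up. And out") would be cut at that period, with the remainder absorbed into the event capture. Entries of that shape may need a stricter pattern or manual cleanup.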
@github-actions


3 participants