Fix parser line grouping across page breaks #180

Open
jasanpe wants to merge 1 commit into xitanggg:main from jasanpe:fix-page-boundary-lines
jasanpe commented May 28, 2026

Summary

  • Preserve the PDF page number on parsed text items.
  • Flush the current line when text extraction moves to a new page, so a section heading at the top of the next page cannot be merged into the previous page's last line.
  • Add a regression test covering the page-break case and verifying the next-page heading is recognized as its own section.
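The page-break flush described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `TextItem` shape, the `page` and `hasEOL` field names, and the `groupItemsIntoLines` helper are all assumptions made for the example, and the repository's actual grouping logic may differ.

```typescript
// Hypothetical, simplified shape of a parsed text item. The real type in the
// repository may carry more fields (position, font, width, and so on).
interface TextItem {
  text: string;
  page: number; // PDF page number preserved during extraction (assumed 1-based)
  hasEOL: boolean; // true when the extractor marks the end of a line
}

type Line = TextItem[];

// Groups items into lines. A line is closed on an end-of-line marker, and also
// whenever the next item comes from a different page than the current line.
function groupItemsIntoLines(items: TextItem[]): Line[] {
  const lines: Line[] = [];
  let current: Line = [];

  for (const item of items) {
    const last = current[current.length - 1];
    // Page break: flush the unfinished line before starting the new page.
    if (last !== undefined && last.page !== item.page) {
      lines.push(current);
      current = [];
    }
    // Skip whitespace-only items, but still honor their end-of-line marker.
    if (item.text.trim() !== "") {
      current.push(item);
    }
    if (item.hasEOL) {
      if (current.length > 0) lines.push(current);
      current = [];
    }
  }
  // Flush whatever remains after the last item.
  if (current.length > 0) lines.push(current);
  return lines;
}

// Regression-style usage: the last item on page 1 has no end-of-line marker,
// so without the page check it would be merged with the page 2 heading.
const lineTexts = groupItemsIntoLines([
  { text: "Built internal tooling", page: 1, hasEOL: false },
  { text: "EDUCATION", page: 2, hasEOL: true },
  { text: "Some University", page: 2, hasEOL: true },
]).map((line) => line.map((item) => item.text).join(" "));
```

Keying the flush on the page number rather than on vertical position is one possible design choice here: y-coordinates restart on every page, so a heading at the top of a new page is hard to tell apart from a continuation by position alone.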

Fixes #174.

Verification

  • npx jest src/app/lib/parse-resume-from-pdf/group-text-items-into-lines.test.ts --ci --runInBand
  • npm run test:ci -- --runInBand
  • npm run lint (passes with existing react-hooks/exhaustive-deps warning in src/app/lib/redux/hooks.tsx)
  • npm run build (passes with the same existing lint warning)

Note: npx tsc --noEmit currently reports missing module declarations for existing SVG/JPG imports under public/, while next build completes its normal type-check/build path successfully.

Development

Successfully merging this pull request may close these issues.

Parser gets confused when a new section begins on a new page
