feat(pdf): use pdf metadata title as semantic identifier #8

Closed
razvanMiu wants to merge 1 commit into eea from 292759_pdf_title

razvanMiu commented Oct 24, 2025

Description

For PDFs fetched by the Web connector, set the document semantic_identifier to the PDF metadata title when available.

How Has This Been Tested?

  • PDF with title: Confirms semantic_identifier equals metadata title.
  • PDF without title: Confirms fallback to id from URL.
  • Non-PDF content: Confirms no change in behavior.
  • Mixed sitemap/pages: Ensures only PDFs are affected and logging remains clean.
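
The selection rule exercised by the cases above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the function name `choose_semantic_identifier`, the `"Title"` metadata key, and the idea of passing the URL-derived id in as `fallback_id` are all assumptions made for the example.

```python
from typing import Mapping, Optional


def choose_semantic_identifier(
    pdf_metadata: Optional[Mapping[str, object]],
    fallback_id: str,
    title_key: str = "Title",  # assumed key name; adjust to the real parser output
) -> str:
    """Return the PDF metadata title if it is usable, else the fallback id."""
    if pdf_metadata:
        title = pdf_metadata.get(title_key)
        # Treat missing, non-string, or whitespace-only titles as "no title".
        if isinstance(title, str) and title.strip():
            return title.strip()
    return fallback_id


if __name__ == "__main__":
    # Hypothetical URL used only for illustration.
    url_id = "https://example.org/files/report.pdf"
    print(choose_semantic_identifier({"Title": "Annual Report"}, url_id))
    print(choose_semantic_identifier({"Title": "   "}, url_id))
    print(choose_semantic_identifier(None, url_id))
```

Treating a whitespace-only title as missing is a design choice in this sketch, so that such a PDF still gets the URL-derived identifier rather than an empty one. Non-PDF content would never reach this function, which matches the "no change in behavior" case.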

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

github-actions Bot commented Jan 8, 2026

This PR is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions Bot added the Stale label Jan 8, 2026

github-actions Bot commented

This PR was closed because it has been stalled for 90 days with no activity.

github-actions Bot closed this Jan 16, 2026
tiberiuichim deleted the 292759_pdf_title branch March 2, 2026 09:16