@vaind
Contributor

@vaind vaind commented Dec 30, 2025

Summary

I randomly found a broken link in the docs, so I went ahead and checked all of them with lychee. It found 54 broken links. I fixed the straightforward ones right away in #15894 and left the ambiguous ones for SDK maintainers, who know the affected pages better (they are listed in the summary of the weekly or manually triggered workflow run).

This PR adds automated external link checking to catch broken links in documentation using lychee.

What's included

  • GitHub workflow (.github/workflows/lint-external-links.yml)

    • On PRs: checks changed markdown files, fails if broken links found
    • Weekly scheduled: checks all docs, results visible in job summary
    • Manual trigger: run full check anytime via workflow dispatch
  • Pre-commit hook for local validation (warn-only, doesn't block commits)

  • Configuration files

    • lychee.toml - Link checker settings (timeouts, retries, accepted status codes, caching)
    • .lycheeignore - URL patterns to ignore (examples, bot-blocking sites, TLS-incompatible sites)

Caching strategy

Weekly scheduled runs populate the cache, PR checks consume it:

  • Conclusive responses (200, 301, 403, 404) are cached and not re-requested on subsequent runs; a cached 404 is still reported as a broken link
  • Transient errors (429 rate limits, 5xx server errors) are NOT cached, so they get retried
  • Cache lifetime is just under 2 weeks so it survives between weekly runs
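To make the caching rules concrete, here is a TypeScript sketch (TypeScript because the repo's helper scripts already run under bun). It separates two questions that are easy to conflate: whether a status is accepted as a working link, and whether it is cached. The cache rules mirror `cache_exclude_status = "429, 500.."` and `max_cache_age = "335h"` from lychee.toml (335 h is 13 days 23 hours, well over the 168 h between weekly runs). The accept list used here (2xx plus 403 and 418) is an assumption for illustration and the helper names are hypothetical; this models the configuration, it is not lychee's implementation.

```typescript
// Hypothetical model of the lychee.toml accept/cache settings; not lychee's implementation.

type Verdict = "ok" | "broken";

interface Outcome {
  verdict: Verdict; // reported as a working link or as a broken one
  cached: boolean;  // stored in the cache, so the URL is not re-requested next run
}

// Assumed accept list for illustration: 2xx plus 403 (bot blocking) and 418 (teapot).
function isAccepted(status: number): boolean {
  return (status >= 200 && status <= 299) || status === 403 || status === 418;
}

// Mirrors cache_exclude_status = "429, 500..": rate limits and server errors are never cached.
function isCacheable(status: number): boolean {
  return status !== 429 && status < 500;
}

function classify(status: number): Outcome {
  return { verdict: isAccepted(status) ? "ok" : "broken", cached: isCacheable(status) };
}

// max_cache_age = "335h" (13 days 23 hours) versus one week between scheduled runs.
const MAX_CACHE_AGE_HOURS = 335;
const WEEKLY_INTERVAL_HOURS = 7 * 24; // 168

function isFresh(entryAgeHours: number): boolean {
  return entryAgeHours < MAX_CACHE_AGE_HOURS;
}

for (const status of [200, 404, 429, 503]) {
  console.log(status, classify(status));
}
console.log("survives one weekly gap:", isFresh(WEEKLY_INTERVAL_HOURS)); // true
```

Under these assumptions a 404 is reported as broken yet cached, while 429 and 503 are reported and left out of the cache so the next run retries them.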

Current state

There are broken external links in the docs that will need to be fixed separately - this PR just adds the tooling to detect them.

IS YOUR CHANGE URGENT?

  • Urgent deadline (GA date, etc.):
  • Other deadline:
  • None: Not urgent, can wait up to 1 week+

PRE-MERGE CHECKLIST

  • Checked Vercel preview for correctness, including links
  • PR was reviewed and approved by any necessary SMEs (subject matter experts)
  • PR was reviewed and approved by a member of the Sentry docs team

vaind and others added 9 commits December 30, 2025 14:47
Configures lychee link checker with:
- Rate limiting and retry settings
- Custom user agent to avoid bot blocking
- Cache settings to reduce load on external sites
- Ignore patterns for placeholder URLs, localhost, and sites
  that block automated checkers (Twitter, LinkedIn, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Uses lychee to validate external links in documentation.

Triggers:
- Weekly cron (Sunday 2 AM UTC): Creates/updates GitHub issue
- Manual dispatch: Optionally fails on broken links
- Pull requests: Adds non-blocking comment with report

The workflow caches results to reduce load on external sites
and does not block PRs (external link failures are often
transient or false positives).

Add section explaining the relationship between internal link
checking (this script) and external link checking (lychee).

Adds a warn-only pre-commit hook that checks external links in
changed markdown files using lychee. The hook:
- Only runs on docs/ and develop-docs/ markdown files
- Shows warnings but doesn't block commits
- Gracefully handles missing lychee installation

Add instructions for running lychee locally and document the
pre-commit hook behavior.

Remove separate shell script and use inline bash command with
|| true to achieve warn-only behavior.

Replace bash one-liner with TypeScript script for Windows
compatibility. Uses bun like other scripts in the repo.

Use git diff to get list of changed markdown files for PRs,
making the check faster. Full scans still run on schedule and
manual dispatch.

@vercel
vercel bot commented Dec 30, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| develop-docs | Ready | Preview, Comment | Jan 5, 2026 10:43am |
| sentry-docs | Ready | Preview, Comment | Jan 5, 2026 10:43am |

- Add scheme filter to only check http/https (skip root-relative links)
- Accept 403/418 status codes (bot blocking, freedesktop teapot)
- Add ignore patterns for:
  - Bot-blocking sites (npmjs, maven, medium, gitlab, epicgames)
  - Private resources (Notion, private GitHub repos, Zendesk)
  - Unstable docs (freedesktop)

Set base_url to docs.sentry.io so lychee can resolve root-relative
links, then exclude docs.sentry.io from checking (internal links
are already covered by lint-404s).
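A minimal sketch of the resulting link selection, using the standard WHATWG `URL` class (available globally in bun and Node). It assumes the order resolve against base_url, then apply the http/https scheme filter, then exclude docs.sentry.io; `externalTarget` is a hypothetical name, and lychee performs these steps internally from its own configuration.

```typescript
// Hypothetical sketch of how root-relative links end up excluded; not lychee's code.
const BASE_URL = "https://docs.sentry.io";
const INTERNAL_HOST = "docs.sentry.io";

// Returns the absolute URL to check externally, or null if the link is skipped.
function externalTarget(href: string): string | null {
  let url: URL;
  try {
    // Root-relative links such as "/platforms/" resolve against base_url.
    url = new URL(href, BASE_URL);
  } catch {
    return null; // unparsable link
  }
  // Scheme filter: only http and https are checked (mailto:, tel:, etc. are skipped).
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  // Internal pages are already covered by lint-404s.
  if (url.hostname === INTERNAL_HOST) {
    return null;
  }
  return url.href;
}

console.log(externalTarget("/platforms/python/"));        // null
console.log(externalTarget("mailto:docs@example.com"));   // null
console.log(externalTarget("https://example.org/docs"));  // https://example.org/docs
```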

After manually testing ERROR entries from lychee.log:

- bottlepy.org: TLS 1.3 only, incompatible with lychee's native-tls
- help.revise.dev: Cloudflare ECH required, fails even with curl
- dev.getsentry.net: Internal development URLs
- sentry-content-dashboard: Internal dashboard (401)
- godoc.org/pkg.go.dev: Rate-limited (429)

vaind and others added 2 commits December 30, 2025 20:39
Changed from separate entries to using regex optional group (.+@)?
to match private IPs with or without credentials (e.g., [email protected]).
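To see what the optional group changes, here is a TypeScript check against a hypothetical pattern. The actual `.lycheeignore` entries are not quoted in this conversation (lychee treats each entry as a regular expression), so the `192.168.` range and the sample URLs are illustrative only.

```typescript
// Hypothetical .lycheeignore-style pattern; the real entries in this PR may differ.
// (.+@)? makes a "user@" or "user:password@" prefix optional.
const PRIVATE_IP = /^https?:\/\/(.+@)?192\.168\.\d{1,3}\.\d{1,3}/;

const samples = [
  "http://192.168.0.1:9000/api/",     // no credentials: matches
  "http://sentry@192.168.0.1:9000/1", // with credentials: matches thanks to (.+@)?
  "https://example.org/192.168.0.1",  // IP only in the path: does not match
];

for (const url of samples) {
  console.log(PRIVATE_IP.test(url), url);
}
```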

Split into two jobs for clarity:
- check-pr: PRs only, changed files, adds comment
- check-full: Schedule/manual, all files, creates issue

Removed caching (wasn't working with per-commit keys).

vaind and others added 2 commits December 30, 2025 21:07
- Rename .lychee.toml to lychee.toml (default config name)
- Remove --config args since lychee.toml is auto-detected
- Simplify workflow: use '.' instead of listing directories
- Split workflow into separate PR and full-scan jobs
- Update PR job to update existing comment instead of creating new ones
- Update full-scan job to update existing issue instead of creating duplicates
- Add file existence checks before reading reports
- Use appropriate GitHub labels (Bug, Team: Docs, Product Area: Docs)
- Add proper permissions scoping per job

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
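The "update instead of duplicate" behavior for the PR comment (and, analogously, the weekly issue) comes down to a find-or-create decision. The sketch is pure TypeScript so it can be tested without the GitHub API; the hidden marker string, the `IssueComment` shape and `planReportComment` are hypothetical, and the workflow may identify its previous comment differently.

```typescript
// Hypothetical sketch of the find-or-create step; not the workflow's actual code.
interface IssueComment {
  id: number;
  body: string;
}

type Action =
  | { kind: "create"; body: string }
  | { kind: "update"; id: number; body: string };

// Assumed hidden marker used to recognize the bot's earlier report.
const MARKER = "<!-- lychee-link-report -->";

function planReportComment(existing: IssueComment[], report: string): Action {
  const body = `${MARKER}\n${report}`;
  const previous = existing.find((c) => c.body.includes(MARKER));
  // Reuse the earlier comment if there is one, otherwise post a new one.
  return previous
    ? { kind: "update", id: previous.id, body }
    : { kind: "create", body };
}

const first = planReportComment([], "3 broken links");
const second = planReportComment([{ id: 42, body: `${MARKER}\nold report` }], "1 broken link");
console.log(first.kind, second.kind); // create update
```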
# Cache settings
#
# Strategy: Weekly scheduled runs populate the cache, PR checks consume it.
# - Successful responses (200, 301, 403, 404) are cached and skipped on subsequent runs

Bug: The comment in lychee.toml says 404 is a success, but the accept list omits it, causing link checker workflows to fail on 404 responses.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The lychee.toml configuration contains a contradiction regarding the handling of 404 status codes. A comment on lines 49-50 states that 404 responses are considered successful and should be cached. However, the accept list on line 33 does not include 404. Since the lychee tool treats any status code not in the accept list as a broken link, this will cause CI workflows to fail when an external link returns a 404. This behavior contradicts the documented caching strategy and can block pull requests unnecessarily.

💡 Suggested Fix

To align the configuration with the documented intent, add 404 to the accept list in lychee.toml. Alternatively, if 404 responses should be treated as failures, remove the mention of 404 from the comment describing the caching strategy.

Location: lychee.toml#L49

Contributor Author

A 404 is a successful response in the sense that it gives us the information we need (whether the link points to an existing page or not). That doesn't mean it should be accepted as a valid link.

Change from types_or: [markdown] to files pattern so both .md and .mdx
files are checked locally, matching the CI workflow behavior.
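A sketch of the hook's file selection and warn-only behavior as pure functions: keep staged .md and .mdx files under docs/ and develop-docs/ (the scope named in the earlier hook commit), and return exit code 0 whether lychee passes, fails, or is missing. `filesToCheck`, `hookResult` and the messages are hypothetical, and the step that actually spawns lychee from the bun script is omitted.

```typescript
// Hypothetical helpers mirroring the warn-only pre-commit hook; not the PR's script.
const DOC_DIRS = ["docs/", "develop-docs/"];

function filesToCheck(staged: string[]): string[] {
  return staged.filter(
    (path) =>
      DOC_DIRS.some((dir) => path.startsWith(dir)) &&
      (path.endsWith(".md") || path.endsWith(".mdx")),
  );
}

interface HookResult {
  exitCode: 0; // always 0: the hook never blocks a commit
  message: string;
}

// lycheeExitCode is null when lychee is not installed.
function hookResult(lycheeExitCode: number | null): HookResult {
  if (lycheeExitCode === null) {
    return { exitCode: 0, message: "lychee not found; skipping external link check" };
  }
  if (lycheeExitCode !== 0) {
    return { exitCode: 0, message: "warning: broken external links found (commit not blocked)" };
  }
  return { exitCode: 0, message: "external links OK" };
}

// Keeps only the two docs files.
console.log(filesToCheck(["docs/index.mdx", "develop-docs/a.md", "src/app.ts", "README.md"]));
console.log(hookResult(null).message);
```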

vaind added a commit that referenced this pull request Jan 2, 2026

## DESCRIBE YOUR PR
Follow up from #15893
Updates various broken external links found by the link checker:

- Fix Django REST Framework serializer docs URL
- Update Sentry options.py path (master→main, correct directory)
- Fix Transifex translation project URL
- Replace deprecated Flux docs link with GitHub archive
- Fix OpenTelemetry semantic conventions URLs
- Update Mailgun documentation URL
- Fix Ping Identity documentation URL
- Update Flagsmith integration documentation URL
- Fix Apple SDK troubleshooting Swift issue reference
- Fix Xamarin SSL certificate issue reference
- Update Remix meta function documentation links (v1→main)
- Fix Next.js custom server documentation URL

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

## IS YOUR CHANGE URGENT?  

Help us prioritize incoming PRs by letting us know when the change needs
to go live.
- [ ] Urgent deadline (GA date, etc.): <!-- ENTER DATE HERE -->
- [ ] Other deadline: <!-- ENTER DATE HERE -->
- [x] None: Not urgent, can wait up to 1 week+

## SLA

- Teamwork makes the dream work, so please add a reviewer to your PRs.
- Please give the docs team up to 1 week to review your PR unless you've
added an urgent due date to it.
Thanks in advance for your help!

## PRE-MERGE CHECKLIST

*Make sure you've checked the following before merging your changes:*

- [ ] Checked Vercel preview for correctness, including links
- [ ] PR was reviewed and approved by any necessary SMEs (subject matter
experts)
- [ ] PR was reviewed and approved by a member of the [Sentry docs
team](https://github.com/orgs/getsentry/teams/docs)

## LEGAL BOILERPLATE

Look, I get it. The entity doing business as "Sentry" was incorporated
in the State of Delaware in 2015 as Functional Software, Inc. and is
gonna need some rights from me in order to utilize my contributions in
this here PR. So here's the deal: I retain all rights, title and
interest in and to my contributions, and by keeping this boilerplate
intact I confirm that Sentry can use, modify, copy, and redistribute my
contributions, under Sentry's choice of terms.

## EXTRA RESOURCES

- [Sentry Docs contributor guide](https://docs.sentry.io/contributing/)

Co-authored-by: Claude Opus 4.5 <[email protected]>
Comment on lines +49 to +58
# - Successful responses (200, 301, 403, 404) are cached and skipped on subsequent runs
# - Transient errors (429 rate limits, 5xx server errors) are NOT cached, so they get retried
# - Cache lifetime is just under 2 weeks so it survives between weekly runs
#
# This means each weekly run only re-checks:
# 1. Links that failed with transient errors last time
# 2. New links not yet in cache
cache = true
max_cache_age = "335h"
cache_exclude_status = "429, 500.."

Bug: The lychee.toml configuration comment says 404 is accepted, but it's missing from the accept list, causing links with a 404 status to be incorrectly flagged as broken.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

There is a contradiction in the lychee.toml configuration. A comment on lines 49-50 states that 404 responses are treated as successful and are cached. However, the accept list on line 33, which defines which status codes are considered valid, does not include 404. Consequently, any link that returns a 404 status will be incorrectly reported as broken. Since the CI workflow is configured with fail: true, this will cause CI checks to fail for legitimate 404s, creating false positives and blocking pull requests.

💡 Suggested Fix

To align the configuration with the documented intent, add 404 to the accept list in lychee.toml. This will ensure that links returning a 404 status are considered successful and do not cause the CI check to fail.

Location: lychee.toml#L49-L58

Contributor Author

Duplicate of #15893 (comment)

- name: Get changed files
  id: changed
  run: |
    FILES=$(git diff --name-only --diff-filter=AM origin/${{ github.base_ref }}...HEAD -- '*.md' '*.mdx' || true)

Renamed markdown files excluded from PR link checks

The --diff-filter=AM flag only includes files with git status A (Added) or M (Modified), but excludes files with status R (Renamed). When a markdown file is renamed in a PR—even if it's also modified with new broken links—it won't appear in the FILES list and won't be checked. The filter could use --diff-filter=AMR to also include renamed files, ensuring their content is validated for broken external links.
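The difference is easy to see with `git diff --name-status`, which prefixes each path with a status letter; renames appear as `R` plus a similarity score, followed by the old and new paths. This hypothetical parser (not part of the workflow) keeps A, M and R entries for .md/.mdx files and takes the new path for renames, which matches what `--diff-filter=AMR --name-only` would list.

```typescript
// Hypothetical parser for `git diff --name-status` output; illustrates why R matters.
const KEEP = new Set(["A", "M", "R"]);

function markdownToCheck(nameStatus: string): string[] {
  const files: string[] = [];
  for (const line of nameStatus.split("\n")) {
    if (line.trim() === "") continue;
    const [status = "", ...paths] = line.split("\t");
    // "R087" -> "R"; deleted files (D) are skipped since there is nothing to check.
    if (!KEEP.has(status.charAt(0))) continue;
    const path = paths.pop(); // for renames, the last field is the new path
    if (path === undefined) continue;
    if (path.endsWith(".md") || path.endsWith(".mdx")) files.push(path);
  }
  return files;
}

const sample = [
  "M\tdocs/platforms/python/index.mdx",
  "R087\tdocs/old-guide.md\tdocs/new-guide.md",
  "D\tdocs/removed.md",
  "A\tsrc/component.tsx",
].join("\n");

console.log(markdownToCheck(sample));
```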
