Add external link checking with lychee #15893
base: master
Configures lychee link checker with:
- Rate limiting and retry settings
- Custom user agent to avoid bot blocking
- Cache settings to reduce load on external sites
- Ignore patterns for placeholder URLs, localhost, and sites that block automated checkers (Twitter, LinkedIn, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Uses lychee to validate external links in documentation.

Triggers:
- Weekly cron (Sunday 2 AM UTC): Creates/updates GitHub issue
- Manual dispatch: Optionally fails on broken links
- Pull requests: Adds non-blocking comment with report

The workflow caches results to reduce load on external sites and does not block PRs (external link failures are often transient or false positives).
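As a rough model of the trigger handling described above, the hypothetical TypeScript sketch below maps each GitHub Actions event to where the report goes and whether the job fails. The event names are the standard `schedule`, `workflow_dispatch` and `pull_request` triggers; `routeReport`, `ReportAction` and the `failOnBroken` flag (standing in for the optional manual-dispatch input) are illustrative names, not code from this PR. Following the later commit that splits the jobs, manual runs are assumed to report to the same issue as scheduled runs.

```typescript
// Hypothetical model of how the link-check workflow reacts to each trigger.
type EventName = "schedule" | "workflow_dispatch" | "pull_request";

interface ReportAction {
  target: "issue" | "pr-comment"; // where the lychee report is posted
  failJob: boolean; // whether broken links make the job fail
}

function routeReport(
  event: EventName,
  hasBrokenLinks: boolean,
  failOnBroken = false, // stand-in for the optional manual-dispatch input
): ReportAction {
  switch (event) {
    case "schedule":
      // Weekly cron: create or update the tracking issue, never fail.
      return { target: "issue", failJob: false };
    case "workflow_dispatch":
      // Manual run: same issue, but optionally fail when links are broken.
      return { target: "issue", failJob: failOnBroken && hasBrokenLinks };
    case "pull_request":
      // PRs: non-blocking comment, since external failures are often transient.
      return { target: "pr-comment", failJob: false };
  }
}
```

The weekly trigger itself would be expressed in the workflow as a cron string such as `0 2 * * 0` (minute 0, hour 2, day-of-week 0 = Sunday), which GitHub Actions evaluates in UTC.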
Add section explaining the relationship between internal link checking (this script) and external link checking (lychee).
Adds a warn-only pre-commit hook that checks external links in changed markdown files using lychee. The hook:
- Only runs on docs/ and develop-docs/ markdown files
- Shows warnings but doesn't block commits
- Gracefully handles missing lychee installation
Add instructions for running lychee locally and document the pre-commit hook behavior.
Remove separate shell script and use inline bash command with || true to achieve warn-only behavior.
Replace bash one-liner with TypeScript script for Windows compatibility. Uses bun like other scripts in the repo.
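A minimal sketch of the file selection and warn-only policy such a bun-run TypeScript hook could use, assuming pre-commit passes repository-relative paths as arguments. `selectDocFiles`, `summarize` and `HookResult` are hypothetical names; `.mdx` is included per the later commit that switches the hook to a files pattern covering both `.md` and `.mdx`. Spawning lychee itself is left out so the logic stays testable.

```typescript
// Hypothetical helpers for a warn-only pre-commit hook (names are illustrative).
const DOC_ROOTS = ["docs/", "develop-docs/"];
const DOC_EXTENSIONS = [".md", ".mdx"];

// Keep only Markdown/MDX files under the two documentation roots.
function selectDocFiles(paths: string[]): string[] {
  return paths
    .map((p) => p.replace(/\\/g, "/")) // normalise Windows-style separators
    .filter((p) => DOC_ROOTS.some((root) => p.startsWith(root)))
    .filter((p) => DOC_EXTENSIONS.some((ext) => p.endsWith(ext)));
}

interface HookResult {
  exitCode: number;
  message: string;
}

// Warn-only policy: the hook never blocks the commit, whatever lychee reports.
function summarize(lycheeStatus: number | null, lycheeMissing: boolean): HookResult {
  if (lycheeMissing) {
    return { exitCode: 0, message: "lychee is not installed; skipping external link check" };
  }
  if (lycheeStatus === 0) {
    return { exitCode: 0, message: "No broken external links found" };
  }
  return { exitCode: 0, message: "Warning: lychee reported broken external links (commit not blocked)" };
}
```

Always returning exit code 0 is what makes the hook warn-only: pre-commit treats any non-zero exit as a failed hook.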
Use git diff to get list of changed markdown files for PRs, making the check faster. Full scans still run on schedule and manual dispatch.
- Add scheme filter to only check http/https (skip root-relative links)
- Accept 403/418 status codes (bot blocking, freedesktop teapot)
- Add ignore patterns for:
  - Bot-blocking sites (npmjs, maven, medium, gitlab, epicgames)
  - Private resources (Notion, private GitHub repos, Zendesk)
  - Unstable docs (freedesktop)
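The scheme filter can be pictured as the check below. This is a hypothetical TypeScript illustration of the behaviour, not lychee's implementation: a link is only checked when it parses as an absolute URL with an `http:` or `https:` scheme, so root-relative paths and schemes such as `mailto:` are skipped. `shouldCheck` is an invented name.

```typescript
// Hypothetical illustration of a scheme filter (not lychee's own code).
const CHECKED_SCHEMES = new Set(["http:", "https:"]);

function shouldCheck(link: string): boolean {
  try {
    // Without a base URL, relative links such as "/platforms/" throw here,
    // which we treat as "not an external link to check".
    return CHECKED_SCHEMES.has(new URL(link).protocol);
  } catch {
    return false;
  }
}
```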
Set base_url to docs.sentry.io so lychee can resolve root-relative links, then exclude docs.sentry.io from checking (internal links are already covered by lint-404s).
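A sketch of how a base URL turns root-relative links into absolute ones that can then be excluded by host. It uses the standard WHATWG `URL` constructor with a base argument; `resolveForExternalCheck` and `EXCLUDED_HOSTS` are hypothetical and only mirror the idea in this commit, not lychee's internals.

```typescript
// Hypothetical illustration: resolve against a base URL, then drop internal hosts.
const BASE_URL = "https://docs.sentry.io";
const EXCLUDED_HOSTS = new Set(["docs.sentry.io"]);

// Returns the absolute URL to check, or null when the link is internal.
function resolveForExternalCheck(link: string): string | null {
  const resolved = new URL(link, BASE_URL); // absolute links ignore the base
  if (EXCLUDED_HOSTS.has(resolved.hostname)) {
    return null; // internal link: left to the internal 404 checker
  }
  return resolved.href;
}
```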
After manually testing ERROR entries from lychee.log:
- bottlepy.org: TLS 1.3 only, incompatible with lychee's native-tls
- help.revise.dev: Cloudflare ECH required, fails even with curl
- dev.getsentry.net: Internal development URLs
- sentry-content-dashboard: Internal dashboard (401)
- godoc.org/pkg.go.dev: Rate-limited (429)
Changed from separate entries to using regex optional group (.+@)? to match private IPs with or without credentials (e.g., [email protected]).
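To show what the optional group buys, here is a hypothetical pattern in the same spirit. It is not the expression from `.lycheeignore`, and it only covers a few private ranges (localhost, `127.x`, `10.x`, `192.168.x`). The `(.+@)?` group makes the user-info prefix optional, so one regex matches both a bare private host and the same host with credentials.

```typescript
// Hypothetical ignore pattern (not the one in .lycheeignore).
// (.+@)? optionally consumes a "user:password@" prefix before the host.
const PRIVATE_HOST =
  /^https?:\/\/(.+@)?(localhost|127\.\d+\.\d+\.\d+|10\.\d+\.\d+\.\d+|192\.168\.\d+\.\d+)(:\d+)?(\/|$)/;

function isIgnored(url: string): boolean {
  return PRIVATE_HOST.test(url);
}
```

One design caveat: `.+` is greedy and can span `/`, so a URL like `https://example.com/@10.0.0.1/` would also match; a narrower group such as `([^/@]+@)?` avoids that if it ever matters.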
Split into two jobs for clarity:
- check-pr: PRs only, changed files, adds comment
- check-full: Schedule/manual, all files, creates issue

Removed caching (wasn't working with per-commit keys).
- Rename .lychee.toml to lychee.toml (default config name)
- Remove --config args since lychee.toml is auto-detected
- Simplify workflow: use '.' instead of listing directories
- Split workflow into separate PR and full-scan jobs
- Update PR job to update existing comment instead of creating new ones
- Update full-scan job to update existing issue instead of creating duplicates
- Add file existence checks before reading reports
- Use appropriate GitHub labels (Bug, Team: Docs, Product Area: Docs)
- Add proper permissions scoping per job

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
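The "update instead of duplicate" behaviour for the PR comment and the tracking issue boils down to finding a previous report and reusing it. Below is a hypothetical sketch of that decision, assuming the report body carries a hidden HTML marker; `REPORT_MARKER` (name and value), `ExistingItem` and `planReportUpsert` are invented for illustration, and fetching or writing comments is left to the workflow.

```typescript
// Hypothetical upsert decision: reuse an earlier report instead of posting a new one.
const REPORT_MARKER = "<!-- lychee-link-report -->"; // invented marker

interface ExistingItem {
  id: number;
  body: string;
}

type Upsert =
  | { action: "update"; id: number; body: string }
  | { action: "create"; body: string };

function planReportUpsert(existing: ExistingItem[], report: string): Upsert {
  // Prefix the marker so the next run can find this report again.
  const body = `${REPORT_MARKER}\n${report}`;
  const previous = existing.find((item) => item.body.includes(REPORT_MARKER));
  return previous
    ? { action: "update", id: previous.id, body }
    : { action: "create", body };
}
```

In an `actions/github-script` step, the result would typically drive `github.rest.issues.updateComment` or `github.rest.issues.createComment` (and `issues.update` / `issues.create` for the scheduled issue); that wiring is an assumption and not shown in the PR excerpt.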
# Cache settings
#
# Strategy: Weekly scheduled runs populate the cache, PR checks consume it.
# - Successful responses (200, 301, 403, 404) are cached and skipped on subsequent runs
Bug: The comment in lychee.toml says 404 is a success, but the accept list omits it, causing link checker workflows to fail on 404 responses.
Severity: HIGH | Confidence: High
🔍 Detailed Analysis
The lychee.toml configuration contains a contradiction regarding the handling of 404 status codes. A comment on lines 49-50 states that 404 responses are considered successful and should be cached. However, the accept list on line 33 does not include 404. Since the lychee tool treats any status code not in the accept list as a broken link, this will cause CI workflows to fail when an external link returns a 404. This behavior contradicts the documented caching strategy and can block pull requests unnecessarily.
💡 Suggested Fix
To align the configuration with the documented intent, add 404 to the accept list in lychee.toml. Alternatively, if 404 responses should be treated as failures, remove the mention of 404 from the comment describing the caching strategy.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.
Location: lychee.toml#L49
Potential issue: The `lychee.toml` configuration contains a contradiction regarding the handling of `404` status codes. A comment on lines 49-50 states that `404` responses are considered successful and should be cached. However, the `accept` list on line 33 does not include `404`. Since the lychee tool treats any status code not in the `accept` list as a broken link, this will cause CI workflows to fail when an external link returns a 404. This behavior contradicts the documented caching strategy and can block pull requests unnecessarily.
Reference ID: 8102799
A 404 is a "successful" response in the sense that it gives us the information we need (whether the link points to an existing page or not). That does not mean it should be accepted as a valid link.
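The distinction in this reply, between a status being accepted as a valid link and being cacheable as a definitive answer, can be modelled as two independent predicates. In this hypothetical sketch the accepted set (2xx plus the 403/418 codes mentioned earlier in the PR) is an assumption, since the full `accept` list is not shown here; the cache rule mirrors `cache_exclude_status = "429, 500.."` from the config excerpt.

```typescript
// Hypothetical model: "is the link valid?" and "is the answer worth caching?"
// are separate questions.

// Assumed accept list: any 2xx, plus 403 (bot blocking) and 418 (teapot).
function isAccepted(status: number): boolean {
  return (status >= 200 && status <= 299) || status === 403 || status === 418;
}

// Mirrors cache_exclude_status = "429, 500..": rate limits and 5xx are retried.
function isCacheable(status: number): boolean {
  return status !== 429 && status < 500;
}
```

Under these assumptions a 404 is cached (so it is not re-fetched every week) but still reported as broken, which is consistent with the comment listing 404 among cached responses while leaving it out of `accept`.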
Change from types_or: [markdown] to files pattern so both .md and .mdx files are checked locally, matching the CI workflow behavior.
## DESCRIBE YOUR PR

Follow up from #15893

Updates various broken external links found by the link checker:
- Fix Django REST Framework serializer docs URL
- Update Sentry options.py path (master→main, correct directory)
- Fix Transifex translation project URL
- Replace deprecated Flux docs link with GitHub archive
- Fix OpenTelemetry semantic conventions URLs
- Update Mailgun documentation URL
- Fix Ping Identity documentation URL
- Update Flagsmith integration documentation URL
- Fix Apple SDK troubleshooting Swift issue reference
- Fix Xamarin SSL certificate issue reference
- Update Remix meta function documentation links (v1→main)
- Fix Next.js custom server documentation URL

## IS YOUR CHANGE URGENT?

Help us prioritize incoming PRs by letting us know when the change needs to go live.
- [ ] Urgent deadline (GA date, etc.):
- [ ] Other deadline:
- [x] None: Not urgent, can wait up to 1 week+

## SLA

- Teamwork makes the dream work, so please add a reviewer to your PRs.
- Please give the docs team up to 1 week to review your PR unless you've added an urgent due date to it.

Thanks in advance for your help!

## PRE-MERGE CHECKLIST

*Make sure you've checked the following before merging your changes:*
- [ ] Checked Vercel preview for correctness, including links
- [ ] PR was reviewed and approved by any necessary SMEs (subject matter experts)
- [ ] PR was reviewed and approved by a member of the [Sentry docs team](https://github.com/orgs/getsentry/teams/docs)

## LEGAL BOILERPLATE

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

## EXTRA RESOURCES

- [Sentry Docs contributor guide](https://docs.sentry.io/contributing/)

Co-authored-by: Claude Opus 4.5 <[email protected]>
# - Successful responses (200, 301, 403, 404) are cached and skipped on subsequent runs
# - Transient errors (429 rate limits, 5xx server errors) are NOT cached, so they get retried
# - Cache lifetime is just under 2 weeks so it survives between weekly runs
#
# This means each weekly run only re-checks:
# 1. Links that failed with transient errors last time
# 2. New links not yet in cache
cache = true
max_cache_age = "335h"
cache_exclude_status = "429, 500.."
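The `cache_exclude_status` value uses a range syntax. The sketch below is a hypothetical TypeScript parser for it, assuming the forms lychee documents for status ranges: a single code (`429`), an open range (`500..` meaning 500 and above, `..100` meaning below 100), an end-exclusive range (`500..600`) and an end-inclusive range (`500..=599`). `parseStatusRanges` is an invented name; lychee's own parser is written in Rust. For reference, `max_cache_age = "335h"` is one hour short of 14 * 24 = 336 hours, which is why the cache outlives the weekly schedule.

```typescript
// Hypothetical parser for Rust-style status ranges such as "429, 500..".
type StatusPredicate = (status: number) => boolean;

function parseCode(text: string): number {
  const n = Number(text);
  if (!Number.isInteger(n)) throw new Error(`invalid status code: "${text}"`);
  return n;
}

function parseStatusRanges(spec: string): StatusPredicate {
  const predicates: StatusPredicate[] = spec
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part): StatusPredicate => {
      const idx = part.indexOf("..");
      if (idx === -1) {
        const code = parseCode(part); // single status code, e.g. "429"
        return (s) => s === code;
      }
      const startText = part.slice(0, idx);
      let endText = part.slice(idx + 2);
      const inclusive = endText.startsWith("="); // "..=" includes the end
      if (inclusive) endText = endText.slice(1);
      const start = startText === "" ? -Infinity : parseCode(startText);
      const end = endText === "" ? Infinity : parseCode(endText);
      return (s) => s >= start && (inclusive ? s <= end : s < end);
    });
  // A status matches the spec if it matches any comma-separated entry.
  return (status) => predicates.some((p) => p(status));
}
```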
Bug: The lychee.toml configuration comment says 404 is accepted, but it's missing from the accept list, causing links with a 404 status to be incorrectly flagged as broken.
Severity: HIGH | Confidence: High
🔍 Detailed Analysis
There is a contradiction in the lychee.toml configuration. A comment on lines 49-50 states that 404 responses are treated as successful and are cached. However, the accept list on line 33, which defines which status codes are considered valid, does not include 404. Consequently, any link that returns a 404 status will be incorrectly reported as broken. Since the CI workflow is configured with fail: true, this will cause CI checks to fail for legitimate 404s, creating false positives and blocking pull requests.
💡 Suggested Fix
To align the configuration with the documented intent, add 404 to the accept list in lychee.toml. This will ensure that links returning a 404 status are considered successful and do not cause the CI check to fail.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.
Location: lychee.toml#L49-L58
Potential issue: There is a contradiction in the `lychee.toml` configuration. A comment on lines 49-50 states that `404` responses are treated as successful and are cached. However, the `accept` list on line 33, which defines which status codes are considered valid, does not include `404`. Consequently, any link that returns a `404` status will be incorrectly reported as broken. Since the CI workflow is configured with `fail: true`, this will cause CI checks to fail for legitimate `404`s, creating false positives and blocking pull requests.
Reference ID: 8161120
Duplicate of #15893 (comment)
- name: Get changed files
  id: changed
  run: |
    FILES=$(git diff --name-only --diff-filter=AM origin/${{ github.base_ref }}...HEAD -- '*.md' '*.mdx' || true)
Renamed markdown files excluded from PR link checks
The --diff-filter=AM flag only includes files with git status A (Added) or M (Modified), but excludes files with status R (Renamed). When a markdown file is renamed in a PR—even if it's also modified with new broken links—it won't appear in the FILES list and won't be checked. The filter could use --diff-filter=AMR to also include renamed files, ensuring their content is validated for broken external links.
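To illustrate the rename case, the hypothetical sketch below reads `git diff --name-status` output, where a rename appears as `R<score>`, a tab, the old path, a tab and the new path. It keeps added, modified and renamed Markdown/MDX files, using the new path for renames. `changedDocs` is an invented helper, not part of the workflow.

```typescript
// Hypothetical helper: select docs to check from `git diff --name-status` output.
const CHECKED_STATUSES = new Set(["A", "M", "R"]);

function changedDocs(nameStatusOutput: string): string[] {
  const files: string[] = [];
  for (const line of nameStatusOutput.split("\n")) {
    if (line.trim() === "") continue;
    const fields = line.split("\t");
    const status = fields[0].charAt(0); // "R087" -> "R"
    const path = fields[fields.length - 1]; // for renames, the new path
    if (CHECKED_STATUSES.has(status) && /\.mdx?$/.test(path)) {
      files.push(path);
    }
  }
  return files;
}
```

With `--name-only`, as in the workflow step above, git already prints the destination path for a rename, so adding `R` to `--diff-filter` should be enough there.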
Summary
I randomly found a broken link in the docs, so I went ahead and checked all of them with lychee. It found 54 broken links. I've fixed the straightforward ones right away in #15894 and left the ambiguous ones for SDK maintainers, who know the correct targets better (they are listed in the summary of the weekly or manual workflow run).
This PR adds automated external link checking to catch broken links in documentation using lychee.
What's included
- GitHub workflow (`.github/workflows/lint-external-links.yml`)
- Pre-commit hook for local validation (warn-only, doesn't block commits)
- Configuration files
  - `lychee.toml` - Link checker settings (timeouts, retries, accepted status codes, caching)
  - `.lycheeignore` - URL patterns to ignore (examples, bot-blocking sites, TLS-incompatible sites)

Caching strategy
Weekly scheduled runs populate the cache, PR checks consume it:
Current state
There are broken external links in the docs that will need to be fixed separately - this PR just adds the tooling to detect them.