Skip to content

harden: safe ZIP extraction, checksums, and bounded reads for extension/preset downloads#3141

Draft
PascalThuet wants to merge 3 commits into
github:mainfrom
PascalThuet:split/extension-preset-downloads
Draft

harden: safe ZIP extraction, checksums, and bounded reads for extension/preset downloads#3141
PascalThuet wants to merge 3 commits into
github:mainfrom
PascalThuet:split/extension-preset-downloads

Conversation

@PascalThuet

Copy link
Copy Markdown
Contributor

Part of splitting #2442 into smaller, dedicated PRs. Re-derived against current main and adapted to the extensions/ package layout (the original PR predated the module→package refactor #3014).

Stacked on #3140 (bounded HTTP reads). This PR extends the _download_security module introduced there, so its diff currently includes #3140's commit — review/merge #3140 first.

What

  • Extend src/specify_cli/_download_security.py with safe_extract_zip (path-traversal + symlink + per-member/total-size + entry-count bounds), read_zip_member_limited, and verify_sha256.
  • extensions/__init__.py + presets/__init__.py: install_from_zip now extracts via safe_extract_zip (zip-slip + symlink + zip-bomb defense) instead of the path-only containment check; catalog fetches and package downloads use bounded reads; downloads are checked against their catalog sha256 when present.
  • extensions/_commands.py: the inline extension.yml manifest pre-read (before extraction) uses read_zip_member_limited so a crafted manifest can't zip-bomb memory.
  • catalogs.py (CatalogStackBase) + the preset catalog validator: reject hostless URLs before the scheme check.

Deliberately omitted

The command-registration path-traversal guard the original #2442 change carried is already on main (landed via #3088, and more thoroughly) — so agents.py/test_registrar_path_traversal.py are not touched here.

Validation

  • test_download_security.py, test_extensions.py, test_presets.py, integrations/test_integration_catalog.py757 passed.
  • ruff check clean on all changed files.
  • Existing extension/preset tests pass unchanged except for mock/assertion updates: stream mocks made cursor-based (so they exercise the new bounded reads), and one catalog-URL message assertion updated for the host-before-scheme order. integrations/test_integration_catalog.py carries a one-line param fix because it asserts the shared CatalogStackBase behavior changed here (the rest of that file's bounded-read work is a later PR).

Catalog bounded-reads for the workflow/integration/bundler readers, and the tree-wide unbounded-read gate, come in the next PR.

Add a shared _download_security module (read_response_limited,
is_https_or_localhost_http, size constants) and route the GitHub release
and Azure DevOps token network reads through bounded reads so an oversized
response can't exhaust memory.

Add a strict_redirects mode to authentication.open_url: the redirect handler
now rejects any redirect whose target isn't HTTPS (or HTTP to localhost),
composing with the existing per-hop redirect_validator and auth-stripping.
The Azure DevOps token POST is routed through that handler so a 307/308
cannot forward the client_secret body to a non-HTTPS host.
…on/preset downloads

Extend _download_security with safe_extract_zip (path + symlink + per-member
and total size + entry-count bounds), read_zip_member_limited, and
verify_sha256, and adopt them across the extension and preset install paths:

- install_from_zip extracts via safe_extract_zip (zip-slip + symlink + zip-bomb
  defense) instead of the previous path-only containment check.
- catalog JSON fetches and package downloads use bounded reads; the inline
  manifest pre-read in extension update uses read_zip_member_limited.
- downloaded packages are verified against their catalog sha256 when present.
- CatalogStackBase and the preset catalog validator reject hostless URLs
  before the scheme check.

The command-registration path-traversal guard the original change carried is
already on main (github#3088) and is intentionally omitted.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens extension/preset/catalog download and extraction paths by introducing bounded reads, optional SHA-256 verification, and safer ZIP handling, then wiring those protections into the extension/preset install flows and associated tests.

Changes:

  • Add _download_security helpers for strict URL validation, bounded HTTP reads, SHA-256 verification, bounded ZIP-member reads, and safe ZIP extraction (zip-slip/zip-bomb/symlink defenses).
  • Update extension and preset download/install flows to use bounded reads, checksum verification (when sha256 is present), and safe_extract_zip.
  • Update auth redirect handling and tests to reflect strict redirect enforcement and stream-like read(size) semantics.
Show a summary per file
File Description
src/specify_cli/_download_security.py New security primitives: bounded reads, ZIP extraction/member limits, checksum verification, scheme/host predicate.
src/specify_cli/authentication/http.py Strict redirect validation integrated into redirect handler and exposed via open_url(..., strict_redirects=...).
src/specify_cli/authentication/azure_devops.py Bounded token-response reads and strict redirect protection for client-credentials POST.
src/specify_cli/_version.py Latest-release metadata fetch now uses bounded reads and strict redirects.
src/specify_cli/_github_http.py GitHub release metadata resolution now uses bounded reads to avoid unbounded JSON loads.
src/specify_cli/catalogs.py Catalog URL validation rejects hostless URLs before scheme checks.
src/specify_cli/extensions/__init__.py Extension ZIP installs use safe_extract_zip; catalog and ZIP downloads are bounded and optionally checksum-verified.
src/specify_cli/extensions/_commands.py Extension manifest pre-read from ZIP is now bounded (read_zip_member_limited).
src/specify_cli/presets/__init__.py Preset ZIP installs use safe_extract_zip; catalog and ZIP downloads are bounded and optionally checksum-verified.
src/specify_cli/presets/_commands.py Shared URL predicate for download/redirect validation; preset ZIP downloads use bounded reads and strict redirects for GitHub metadata resolution.
tests/test_download_security.py New tests covering bounded reads, checksum verification, safe ZIP extraction, and bounded ZIP member reads.
tests/test_authentication.py Tests updated for strict redirect rejection and bounded read semantics; Azure AD token tests patch opener behavior.
tests/test_github_http.py Tests updated for stream-like reads; add multi-hop GitHub redirect auth-preservation test.
tests/http_helpers.py Stream-backed read() mocks and an autouse fixture to route build_opener().open() through patched urlopen.
tests/test_upgrade.py Add regression test asserting _fetch_latest_release_tag reads via read_response_limited(max_bytes=...).
tests/self_upgrade_helpers.py Re-export the opener-routing fixture for self-upgrade test modules.
tests/test_self_upgrade_detection.py Enable opener-routing fixture to keep existing urlopen patches effective.
tests/test_self_upgrade_execution.py Enable opener-routing fixture to keep existing urlopen patches effective.
tests/test_self_upgrade_verification.py Enable opener-routing fixture to keep existing urlopen patches effective.
tests/test_extensions.py Stream-like read(size) mocks and one assertion update for host-before-scheme URL validation.
tests/test_presets.py Stream-like read(size) mocks and open_url fake signature updated for strict_redirects.
tests/integrations/test_integration_catalog.py One assertion updated for host-before-scheme URL validation.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 22/22 changed files
  • Comments generated: 2

Comment thread src/specify_cli/_download_security.py Outdated
Comment thread src/specify_cli/_download_security.py Outdated
Assisted-by: Codex (model: GPT-5, autonomous)
@mnriem mnriem marked this pull request as draft June 24, 2026 18:29
@mnriem

mnriem commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Converting to draft as #3140 has to land first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants