
Reject deprecated SPDX licenses, but be lenient with committed LICENSE files #754

Open
thomashoneyman wants to merge 6 commits into master from trh/fix-deprecated-licenses


@thomashoneyman (Member)

Fixes #753. In that issue, @philippedev101 noted that license validation has a catch-22 in that:

  • The registry reads licenses specified in the manifest and cross-references them with the licenses discovered in the packaged code by licensee. Deprecated licenses like AGPL-3.0 are allowed, and the licenses must match.
  • The registry publishes to Pursuit via purs publish, which requires only current SPDX identifiers. Deprecated licenses like AGPL-3.0 are rejected.
  • The licensee tool emits deprecated SPDX IDs like AGPL-3.0 when it sees an AGPL license file, rather than the current (canonical) form. If the manifest specifies AGPL-3.0-only, which is correct and compatible with the license file, the registry still rejects it because of the "mismatch".
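To make the catch-22 concrete, here is a toy model of the three checks above. It is only a sketch: `registry_accepts` and `purs_publish_accepts` are hypothetical stand-ins, the registry check is simplified to exact string equality, and `DEPRECATED` is a one-entry illustrative subset of the SPDX deprecated list.

```python
# Hypothetical model of the three checks described above (not registry code).
DEPRECATED = {"AGPL-3.0"}  # tiny illustrative subset of deprecated SPDX IDs


def registry_accepts(manifest_license: str, detected: str) -> bool:
    # Simplified: the manifest license must match what licensee detected.
    return manifest_license == detected


def purs_publish_accepts(manifest_license: str) -> bool:
    # purs publish only accepts current (non-deprecated) SPDX IDs.
    return manifest_license not in DEPRECATED


detected = "AGPL-3.0"  # what licensee reports for an AGPL license file
candidates = ["AGPL-3.0", "AGPL-3.0-only", "AGPL-3.0-or-later"]

# Keep only manifest values that would pass both checks.
satisfying = [
    c for c in candidates
    if registry_accepts(c, detected) and purs_publish_accepts(c)
]
print(satisfying)  # []
```

The deprecated ID passes the registry but fails `purs publish`, while the current IDs fail the registry's comparison, so the list of satisfying values is empty.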

Therefore, publishing would fail in one place or the other, with no value that satisfies both. This PR fixes that by being strict when parsing manifests, but lenient with licenses discovered by licensee:

  • Registry.License.parse stays strict for user-authored licenses (current canonical SPDX only)
  • Detected licenses are canonicalized leniently first (License.canonicalizeDetected), then parsed strictly. By "canonicalize" I mean that a deprecated ID with an unambiguous mapping to a current ID is rewritten to that current ID.
  • Ambiguous deprecated IDs fail explicitly instead of being silently dropped (as happened when e.g. parsing
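A rough Python model of the "canonicalize leniently, then parse strictly" flow follows. The real code is PureScript (`License.canonicalizeDetected`, `Registry.License.parse`); the function names here are hypothetical, and the tables are assumptions: `GNU_FAMILIES` is a subset of deprecated GNU IDs, `RENAMES` holds one illustrative rename, and `wxWindows` is an assumed example of a deprecated ID with no single current replacement.

```python
# Hypothetical Python model; the registry's actual tables may differ.


class LicenseError(ValueError):
    """Raised when a license ID cannot be accepted."""


# Assumed subset: bare ID means "-only", a trailing "+" means "-or-later".
GNU_FAMILIES = {"GPL-2.0", "GPL-3.0", "LGPL-2.0", "LGPL-2.1", "LGPL-3.0", "AGPL-3.0"}

# Illustrative one-to-one rename of a deprecated ID.
RENAMES = {"StandardML-NJ": "SMLNJ"}

# Assumed example of a deprecated ID with no single current replacement.
AMBIGUOUS = {"wxWindows"}

# Tiny stand-in for the full list of current SPDX IDs.
CURRENT_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "SMLNJ"} | {
    family + suffix for family in GNU_FAMILIES for suffix in ("-only", "-or-later")
}


def canonicalize_detected(license_id: str) -> str:
    """Rewrite a deprecated ID to its current form when the mapping is unambiguous."""
    if license_id in RENAMES:
        return RENAMES[license_id]
    if license_id.endswith("+") and license_id[:-1] in GNU_FAMILIES:
        return license_id[:-1] + "-or-later"
    if license_id in GNU_FAMILIES:
        return license_id + "-only"
    if license_id in AMBIGUOUS:
        # Fail loudly instead of silently dropping the value.
        raise LicenseError(f"{license_id} is deprecated with no unambiguous current ID")
    return license_id  # already current, or unknown: the strict parse decides


def parse_strict(license_id: str) -> str:
    """Accept only current SPDX IDs, as for user-authored manifest licenses."""
    if license_id not in CURRENT_IDS:
        raise LicenseError(f"{license_id} is not a current SPDX license ID")
    return license_id


def parse_detected(license_id: str) -> str:
    """Lenient canonicalization first, then the same strict parse."""
    return parse_strict(canonicalize_detected(license_id))


print(parse_detected("AGPL-3.0"))  # AGPL-3.0-only
```

Keeping a single strict parser and putting all leniency in a separate canonicalization step means user-authored licenses never get the lenient treatment.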

Implementation-wise, I made a few changes:

  • License is now stored as a parsed tree (not opaque text), and print always emits the canonical form
  • extractIds now works from that stored tree, so it can no longer fail
  • The deprecated-ID canonicalization logic lives in PureScript with a small deterministic strategy: a tiny rename map plus -only and -or-later rewrites.
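The sketch below shows why storing a parsed tree makes printing and ID extraction total. It is a Python approximation, not the PureScript type: `Leaf`, `And`, `Or`, `render`, and `extract_ids` are hypothetical names, `WITH` exceptions are omitted, and "canonical form" is assumed here to mean consistent spacing plus parentheses only where SPDX precedence requires them (AND binds tighter than OR).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Leaf:
    id: str  # a single SPDX license ID


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, And, Or]


def render(expr: Expr) -> str:
    """Print the tree in one consistent form."""
    if isinstance(expr, Leaf):
        return expr.id
    if isinstance(expr, And):
        return f"{_operand(expr.left)} AND {_operand(expr.right)}"
    return f"{render(expr.left)} OR {render(expr.right)}"


def _operand(expr: Expr) -> str:
    # AND binds tighter than OR, so an OR under an AND needs parentheses.
    text = render(expr)
    return f"({text})" if isinstance(expr, Or) else text


def extract_ids(expr: Expr) -> List[str]:
    """Collect leaf IDs left to right; cannot fail because the tree is already parsed."""
    if isinstance(expr, Leaf):
        return [expr.id]
    return extract_ids(expr.left) + extract_ids(expr.right)


license_expr = And(Leaf("GPL-3.0-only"), Or(Leaf("MIT"), Leaf("Apache-2.0")))
print(render(license_expr))       # GPL-3.0-only AND (MIT OR Apache-2.0)
print(extract_ids(license_expr))  # ['GPL-3.0-only', 'MIT', 'Apache-2.0']
```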

On the app side, we now report clear errors when detected entries cannot be canonicalized, and we no longer ignore bad detected values in, e.g., bower.json manifest files.
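As a rough illustration of "report, don't ignore", here is a hypothetical app-side check in Python. `check_detected` is an invented name; the canonicalizer is passed in as a function returning `None` on failure, and the matching rule (every canonicalized detected license must appear among the manifest's IDs) is an assumption, since the PR does not spell out the exact comparison.

```python
from typing import Callable, List, Optional, Set


def check_detected(
    manifest_ids: Set[str],
    detected: List[str],
    canonicalize: Callable[[str], Optional[str]],
) -> List[str]:
    """Return human-readable errors; an empty list means the check passed."""
    errors: List[str] = []
    canonical: Set[str] = set()
    for raw in detected:
        fixed = canonicalize(raw)
        if fixed is None:
            # Report, rather than skip, values that cannot be canonicalized.
            errors.append(f"Detected license {raw!r} cannot be mapped to a current SPDX ID.")
        else:
            canonical.add(fixed)
    # Assumed rule: every detected license must appear in the manifest.
    for license_id in sorted(canonical - manifest_ids):
        errors.append(f"Detected license {license_id!r} is not in the manifest license.")
    return errors


# Toy canonicalizer: dict.get returns None for anything not in the table.
toy_table = {"AGPL-3.0": "AGPL-3.0-only", "MIT": "MIT"}
print(check_detected({"AGPL-3.0-only"}, ["AGPL-3.0"], toy_table.get))  # []
print(check_detected({"MIT"}, ["wxWindows"], toy_table.get))
```

The second call returns one error naming `wxWindows` instead of passing silently because the unmappable value was dropped.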

@thomashoneyman (Member, Author)

I double-checked against the strict license decoding, and 44 existing manifests fail it. We can't break manifest decoding, of course! Here is a set of packages that would fail strict parsing:

bookhound: LGPL-2.1 AND LGPL-2.1-only
error, jarilo, undefined, yayamll, several idiomatic-node-*: AGPL-3.0
matrices, emmet: LGPL-3.0
smolder, test-unit, sized-vectors, sized-matrices: LGPL-3.0+
nano-id: GPL-3.0 AND MIT
run-external-state: GPL-3.0 AND GPL-3.0-or-later
ordered-set@0.2.0: AGPL-3.0-or-later AND AGPL-3.0

So instead I'm switching to strict-on-publish, lenient-on-decode, so that these existing manifests still parse.
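One way to picture strict-on-publish, lenient-on-decode is two entry points sharing one parser. The Python below is a hypothetical stand-in for `License.parse` (lenient, used when decoding existing manifests) and `parseCanonical` (strict, used for new publishes); the ID sets are small illustrative subsets, the splitter only handles `A AND B` chains, and this sketch keeps deprecated IDs as written, which may not match what the real decoder does.

```python
from typing import List

# Illustrative subsets only; the real lists are the SPDX license list.
DEPRECATED = {"AGPL-3.0", "GPL-3.0", "LGPL-2.1", "LGPL-3.0", "LGPL-3.0+"}
CURRENT = {
    "MIT",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "LGPL-2.1-only", "LGPL-3.0-only", "LGPL-3.0-or-later",
}


def _ids(expr: str) -> List[str]:
    # Only handles "A AND B" chains, which covers the examples listed above.
    return [part.strip() for part in expr.split(" AND ")]


def parse(expr: str) -> List[str]:
    """Lenient decode: accept current and deprecated IDs so old manifests still load."""
    ids = _ids(expr)
    unknown = [i for i in ids if i not in CURRENT and i not in DEPRECATED]
    if unknown:
        raise ValueError(f"unknown SPDX identifier(s): {unknown}")
    return ids


def parse_canonical(expr: str) -> List[str]:
    """Strict publish check: reject any deprecated ID."""
    ids = parse(expr)
    deprecated = [i for i in ids if i in DEPRECATED]
    if deprecated:
        raise ValueError(f"deprecated SPDX identifier(s): {deprecated}")
    return ids


print(parse("GPL-3.0 AND MIT"))  # ['GPL-3.0', 'MIT']
```

With these definitions, `parse_canonical("GPL-3.0 AND MIT")` raises, while `parse_canonical("GPL-3.0-only AND MIT")` succeeds.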

thomashoneyman force-pushed the trh/fix-deprecated-licenses branch from e85e9f6 to 3e045b4 on March 15, 2026 17:29
Keep historical manifests backward-compatible while requiring canonical SPDX for new publishes.
thomashoneyman force-pushed the trh/fix-deprecated-licenses branch from 3e045b4 to ec28727 on March 15, 2026 18:44
thomashoneyman requested a review from f-f on March 16, 2026 14:20
@thomashoneyman (Member, Author)

I'm not 100% happy with this because it adds more code than I'd like, but dealing with licenses is kind of a pain, so that's not a huge surprise. The worst part is the split between License.parse, used for manifests already in the registry that may list deprecated licenses, and parseCanonical, used to accept only new licenses with non-deprecated SPDX identifiers. That makes this a little brittle, because we have to be careful about which one we use. Open to suggestions!



Development

Successfully merging this pull request may close these issues.

License detection outputs deprecated SPDX identifiers, blocking AGPL packages from Pursuit

2 participants