deprecate blosc enums#3963
Conversation
Design for steering BloscCodec users toward literal-string parameters, with the enum classes kept importable but deprecated on member access. Canonical: https://gist.github.com/d-v-b/9fd3fe92f82a24c929129f42a6f11f60 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member access on BloscShuffle / BloscCname now emits DeprecationWarning and returns the equivalent string. BloscCodec stores cname/shuffle as literal strings; passing a real enum.Enum instance to __init__ warns. BloscShuffle.from_int returns a str. Internal call sites in migrate_v3 continue to work because BloscCodec accepts both forms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The BloscShuffle and BloscCname enums are deprecated; update doc examples to the recommended literal-string form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 0000 filename is a placeholder; rename to the PR number when the pull request is opened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The design doc is published as a public gist linked from the PR description; the in-tree copy is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace Sphinx :class: roles and double-backticks with single backticks in the new docstrings added by the blosc enum deprecation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3963 +/- ##
==========================================
+ Coverage 93.26% 93.27% +0.01%
==========================================
Files 87 87
Lines 11721 11739 +18
==========================================
+ Hits 10932 10950 +18
Misses 789 789
🚀 New features to boost your workflow:
|
Add tests for the ValueError paths in _parse_cname / _parse_shuffle and the AttributeError path in the deprecated-enum metaclass, which codecov reported as uncovered. Drop a few "type: ignore" markers that mypy now flags as unused after the init signature widened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-python into deprecate-blosc-enums
mypy's per-file (pre-commit) and project-wide views disagree on whether the deliberately-wrong arguments need a type ignore. Using typing.cast keeps both views happy and is more explicit about what each test is asserting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
closes #3457
Don't think this Pr does that.
Not sure why we can't use Literal in the strings https://zarr--3963.org.readthedocs.build/en/3963/api/zarr/codecs/#zarr.codecs.BloscCodec for the docs.
Any reason we are not using https://github.com/tox-dev/sphinx-autodoc-typehints?
| CName = Literal["lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd"] | ||
| """The codec identifiers used in the blosc codec """ | ||
|
|
||
| CNAME: Final = ("lz4", "lz4hc", "blosclz", "snappy", "zlib", "zstd") |
There was a problem hiding this comment.
There was a problem hiding this comment.
get_args is only useful at runtime, it doesn't help you with type checking. for a small number of literal strings that won't change, writing them out explicitly as a final collection does help with runtime and type checking.
There was a problem hiding this comment.
But this CNAME all-caps isn't used for anything except runtime checking unless I am missing something, no?
There was a problem hiding this comment.
Like, why even define it? get_args(CName) + lru_cache if there is a performance concern
There was a problem hiding this comment.
it's 6 strings that are defined in a spec. given the choice between get_args + lru_cache (two imports, produces something untyped) and just binding the 6 strings to a constant, the static constant is far simpler (no imports, gets typed correctly).
Anyone who wants to iterate over the values defined in the literal type needs a way of making those values. We should have been doing this in our tests, but we were not. I added some tests that exercise this.
|
|
||
|
|
||
| class BloscShuffle(Enum): | ||
| class _DeprecatedStrEnumMeta(type): |
There was a problem hiding this comment.
https://docs.python.org/3/library/enum.html#enum.EnumType or?
Also why not just warn on import instead of warn on usage? Just warn when people even import the class. It's not like it's used anywhere else.
There was a problem hiding this comment.
Warning on use means fewer, more targeted warnings. I should see how many warnings we would get inside zarr-python if we warned on import.
There was a problem hiding this comment.
So I guess the answer here is "too many?"
we aren't using sphinx! |
Rename Shuffle -> BloscShuffleLiteral and CName -> BloscCnameLiteral. The bare Shuffle name collided with numcodecs.Shuffle re-exported from zarr.codecs, which would have caused mkdocstrings cross-refs in the BloscCodec docstring to resolve to the wrong symbol. The Literal suffix also clearly distinguishes the type alias from the deprecated BloscShuffle / BloscCname enum classes. Update the BloscCodec docstring to reference the new names in the Attributes and Parameters sections (Convention A from cast_value.py), with literal values enumerated in prose. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Parametrize the BloscShuffle / BloscCname member-access warning tests into a single test_blosc_enum_member_access_warns. - Parametrize the cname / shuffle reject-unknown tests into a single test_blosc_codec_rejects_unknown driven by **kwargs. - Parametrize the AttributeError-on-unknown-member tests into a single test_blosc_enum_attribute_error_for_unknown_member. - Add a docstring to every new test explaining what behavior it verifies, so reviewers don't have to read the body to understand the intent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
good catch, I updated the PR summary to say "partially addresses" |
Oh right, mkdocs is completely separate - for a moment I was thinking it used sphinx as a backbone, but I guess not |
…trip Backfill missing coverage: previously every test in the blosc suite used only "lz4" or "zstd" for cname and only "bitshuffle" or "shuffle" for shuffle. Add four parametrized tests driven by BLOSC_CNAME / BLOSC_SHUFFLE: - accepts_all_cnames / accepts_all_shuffles: every value in the runtime tuple is accepted by BloscCodec and round-trips on the stored attribute. Catches drift between the BloscCnameLiteral / BloscShuffleLiteral type aliases and their runtime BLOSC_* counterparts. - json_roundtrip_all_cnames / json_roundtrip_all_shuffles: BloscCodec to_dict / from_dict preserves every value. Codec fields are fully specified so the test doesn't trip over tunable-attribute state, which is not part of the JSON form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| The data type size in bytes used for shuffle filtering. | ||
| cname : BloscCname | ||
| The compression algorithm being used (lz4, lz4hc, blosclz, snappy, zlib, or zstd). | ||
| cname : BloscCnameLiteral |
There was a problem hiding this comment.
Once we fully deprecate the enums, the BloscCname identifier will be available, and I would bind it to the literal type.
|
contextual point I should have mentioned earlier: now that we have code in |
ilan-gold
left a comment
There was a problem hiding this comment.
Not sure what the status of warning on importing is, but it seems like it would be a bit less code and not change the BloscShuffle type, for example, but if the warnings would be too aggressive because of how things are being imported, I understand. I don't fully follow, though - what in the code base would import these values? Seems like they should be removed.
Other than that, looks good!
|
|
||
|
|
||
| class BloscShuffle(Enum): | ||
| class _DeprecatedStrEnumMeta(type): |
There was a problem hiding this comment.
So I guess the answer here is "too many?"
Replace the cname/shuffle JSON-roundtrip pair with a single parametrized
test driven by [("cname", v) for v in BLOSC_CNAME] + [("shuffle", v) for
v in BLOSC_SHUFFLE]. They asserted the same property (to_dict / from_dict
preserves every literal) on two independent axes, so a single test
covers both with clear per-case IDs (e.g. cname-lz4, shuffle-bitshuffle).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked parametrize over BLOSC_CNAME and BLOSC_SHUFFLE so the JSON roundtrip exercises every (cname, shuffle) pair (18 cases instead of 9 in a disjoint union). Drops the **kwargs/dict[str, Any] indirection the disjoint form needed, since the cross-product form passes typed arguments directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FYI mkdocstrings + griffe provides the same functionality as sphinx-autodoc-typehints. We still have the types manually included in the docstrings because many of the types are not public yet. see "Run uv run python ci/check_unlinked_types.py" in https://github.com/zarr-developers/zarr-python/actions/runs/25682116308/job/75396132267 |
This PR deprecates the string enums used for the blosc codec.
Literalstrings are used instead. We only use enums in a few places -- Literal strings are much more abundant for representing constant string types -- and I think that's because enums in python are rather cumbersome compared toLiteral[<str>].If this PR has legs, I would follow up with similar deprecations for the other enums (used in the bytes and sharding codecs).
closespartially addresses #3457