fix(toml): swap uiri/toml encoder for tomli_w (issue #439 residual)#566
fix(toml): swap uiri/toml encoder for tomli_w (issue #439 residual)#566ricaskew wants to merge 4 commits intofabiocaccamo:mainfrom
Conversation
…residual) The uiri/toml encoder raises IndexError on strings containing certain characters (notably real control characters like chr(27)/ANSI escape). Issue fabiocaccamo#439 identified this class of bugs and proposed switching to tomli-w. Decode was migrated to stdlib tomllib on 3.11+ previously, but encode still routed through toml.dumps and still crashed. Reproducer on main: >>> benedict({"color": "\033[31m"}).to_toml() IndexError: list index out of range Change: TOMLSerializer.encode() now calls tomli_w.dumps(). Decode path untouched (tomllib on 3.11+, toml on 3.10). toml stays in the [toml] extra guarded by python_version < '3.11' for the 3.10 decode fallback; tomli-w is added unconditionally for encode. Regression tests cover: - ANSI control character (chr(27)) encode + round-trip — was crashing - Issue fabiocaccamo#439's literal-backslash examples — guard against regression - Round-trip on 7 tricky values (control chars, tabs, unicode, quotes) - Nested dict with embedded control chars - Direct serializer encode/decode path tests/serializers/test_toml_serializer.py replaces the prior TODO stubs with 5 real tests. test_io_dict_toml's "extra not installed" test patches tomli_w_installed (the encode dependency) instead of toml_installed. API note: tomli_w.dumps kwargs differ from toml.dumps (no `encoder=` param; gains `multiline_strings` and `indent`). Callers of `.to_toml(**kwargs)` passing uiri-specific kwargs will hit TypeError and should migrate to tomli-w's kwarg surface. Full suite: 800 tests pass (1 pre-existing skip). Fixes the encode-side failure mode documented in fabiocaccamo#439. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes TOML encoding crashes (notably on real control characters like ESC/ANSI sequences) by switching the encode path from uiri/toml to tomli-w, while keeping the existing decode behavior (tomllib on Python 3.11+, toml on 3.10).
Changes:
- Update
TOMLSerializer.encode()to usetomli_w.dumps()and requiretomli-wat runtime for encoding. - Adjust TOML optional-dependency configuration to add
tomli-wand gatetomlto Python < 3.11. - Replace prior TOML serializer TODO tests with concrete regression/round-trip coverage and update the “extra not installed” patch target accordingly.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
benedict/serializers/toml.py |
Swap TOML encoding implementation to tomli-w and add optional import/installed flag. |
pyproject.toml |
Update [project.optional-dependencies].toml to include tomli-w and conditionally include toml for Python < 3.11. |
requirements.txt |
Add tomli-w to the pinned dev/test requirements set. |
tests/serializers/test_toml_serializer.py |
Add regression and round-trip tests covering control characters and other tricky strings. |
tests/dicts/io/test_io_dict_toml.py |
Patch tomli_w_installed (encode dependency) for the “extra not installed” test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_decode_toml(self) -> None: | ||
| # TODO | ||
| pass | ||
| def test_encode_ansi_control_character(self): |
There was a problem hiding this comment.
Test methods in this file omit the -> None return type annotation, while the other serializer test modules consistently include it (e.g., tests/serializers/test_abstract_serializer.py:12). For consistency (and to keep type-checking expectations uniform), please add -> None to these new test methods.
| def test_encode_ansi_control_character(self): | ||
| """Scenario 1 — falsification clause #1. | ||
|
|
||
| `benedict({"color": "\033[31m"}).to_toml()` must not raise. On |
There was a problem hiding this comment.
The docstring includes "\033[31m" inside a Python string literal, so \033 will be interpreted and the docstring will contain a real ESC control character at runtime. This can lead to invisible/control characters in test output and makes the docstring harder to read/copy. Consider escaping it as \\033[31m (or writing it as \x1b[31m) in the docstring example so the docstring remains printable text.
| `benedict({"color": "\033[31m"}).to_toml()` must not raise. On | |
| `benedict({"color": "\\033[31m"}).to_toml()` must not raise. On |
- Add `-> None` return annotation to all new test methods to match the convention in other tests/serializers/ files. - Escape `\033` as `\\033` in the test_encode_ansi_control_character docstring so the docstring renders as printable text. No behavior change; pure consistency polish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@copilot apply changes based on the comments in this thread |
There was a problem hiding this comment.
@ricaskew thanks for the PR, the crash is confirmed and the fix is good.
I have one concern: we are still using uiri/toml for decode on Python 3.10, but this library is abandoned and has known bugs. Since we are already adding tomli_w (from hukkin), I think it is better to finish the migration and remove uiri/toml completely.
My proposal:
decode->tomli(on Python 3.10 only) +tomllib(stdlib on Python 3.11+)encode->tomli_w(already in this PR)
So the dependencies would be:
- on Python 3.10: only
tomli+tomli-w - on Python 3.11+: only stdlib +
tomli-w
Can you update the PR in this way? I'm happy to review again after.
PS. remember to run pre-commit.
Per maintainer review (CHANGES_REQUESTED, fabiocaccamo): replace abandoned uiri/toml dependency with tomli (Python 3.10) + tomllib (Python 3.11+). Encode side stays on tomli_w (no change). Net: uiri/toml fully removed. Fix mypy CI gate: - benedict/serializers/toml.py:52 — explicit str annotation on tomli_w.dumps() return value to satisfy [no-any-return] check. Auxiliary test file rename + typo fix: - tests/dicts/io/test_io_dict_toml.py — patched flag renamed toml_installed -> tomli_installed; "tomlib" -> "tomllib" in skip-message. Verified locally: pre-commit clean (9/9 hooks pass including mypy); pytest tests/serializers/test_toml_serializer.py + tests/dicts/io/test_io_dict_toml.py 21 passed, 1 skipped (skip is correct — tomllib available on Py3.11+).
|
@copilot resolve the merge conflicts in this pull request |
|
@fabiocaccamo it's been around 4 days since the @copilot resolve directive and the conflicts on |
The uiri/toml encoder raises IndexError on strings containing certain characters (notably real control characters like chr(27)/ANSI escape). Issue #439 identified this class of bugs and proposed switching to tomli-w. Decode was migrated to stdlib tomllib on 3.11+ previously, but encode still routed through toml.dumps and still crashed.
Reproducer on main:
Change: TOMLSerializer.encode() now calls tomli_w.dumps(). Decode path untouched (tomllib on 3.11+, toml on 3.10). toml stays in the [toml] extra guarded by python_version < '3.11' for the 3.10 decode fallback; tomli-w is added unconditionally for encode.
Regression tests cover:
tests/serializers/test_toml_serializer.py replaces the prior TODO stubs with 5 real tests. test_io_dict_toml's "extra not installed" test patches tomli_w_installed (the encode dependency) instead of toml_installed.
API note: tomli_w.dumps kwargs differ from toml.dumps (no
encoder=param; gainsmultiline_stringsandindent). Callers of.to_toml(**kwargs)passing uiri-specific kwargs will hit TypeError and should migrate to tomli-w's kwarg surface.Full suite: 800 tests pass (1 pre-existing skip).
Fixes the encode-side failure mode documented in #439.
name: Pull request
about: Submit a pull request for this project
assignees: fabiocaccamo
Describe your changes
?
Related issue
?
Checklist before requesting a review