Skip to content

fix(toml): swap uiri/toml encoder for tomli_w (issue #439 residual)#566

Open
ricaskew wants to merge 4 commits intofabiocaccamo:mainfrom
ricaskew:fix/toml-encoder-swap
Open

fix(toml): swap uiri/toml encoder for tomli_w (issue #439 residual)#566
ricaskew wants to merge 4 commits intofabiocaccamo:mainfrom
ricaskew:fix/toml-encoder-swap

Conversation

@ricaskew
Copy link
Copy Markdown

The uiri/toml encoder raises IndexError on strings containing certain characters (notably real control characters like chr(27)/ANSI escape). Issue #439 identified this class of bugs and proposed switching to tomli-w. Decode was migrated to stdlib tomllib on 3.11+ previously, but encode still routed through toml.dumps and still crashed.

Reproducer on main:

>>> benedict({"color": "\033[31m"}).to_toml()
IndexError: list index out of range

Change: TOMLSerializer.encode() now calls tomli_w.dumps(). Decode path untouched (tomllib on 3.11+, toml on 3.10). toml stays in the [toml] extra guarded by python_version < '3.11' for the 3.10 decode fallback; tomli-w is added unconditionally for encode.

Regression tests cover:

tests/serializers/test_toml_serializer.py replaces the prior TODO stubs with 5 real tests. test_io_dict_toml's "extra not installed" test patches tomli_w_installed (the encode dependency) instead of toml_installed.

API note: tomli_w.dumps kwargs differ from toml.dumps (no encoder= param; gains multiline_strings and indent). Callers of .to_toml(**kwargs) passing uiri-specific kwargs will hit TypeError and should migrate to tomli-w's kwarg surface.

Full suite: 800 tests pass (1 pre-existing skip).

Fixes the encode-side failure mode documented in #439.


name: Pull request
about: Submit a pull request for this project
assignees: fabiocaccamo


Describe your changes
?

Related issue
?

Checklist before requesting a review

  • I have performed a self-review of my code.
  • I have added tests for the proposed changes.
  • I have run the tests and there are not errors.

…residual)

The uiri/toml encoder raises IndexError on strings containing certain
characters (notably real control characters like chr(27)/ANSI escape).
Issue fabiocaccamo#439 identified this class of bugs and proposed switching to
tomli-w. Decode was migrated to stdlib tomllib on 3.11+ previously,
but encode still routed through toml.dumps and still crashed.

Reproducer on main:

    >>> benedict({"color": "\033[31m"}).to_toml()
    IndexError: list index out of range

Change: TOMLSerializer.encode() now calls tomli_w.dumps(). Decode path
untouched (tomllib on 3.11+, toml on 3.10). toml stays in the [toml]
extra guarded by python_version < '3.11' for the 3.10 decode fallback;
tomli-w is added unconditionally for encode.

Regression tests cover:
- ANSI control character (chr(27)) encode + round-trip — was crashing
- Issue fabiocaccamo#439's literal-backslash examples — guard against regression
- Round-trip on 7 tricky values (control chars, tabs, unicode, quotes)
- Nested dict with embedded control chars
- Direct serializer encode/decode path

tests/serializers/test_toml_serializer.py replaces the prior TODO
stubs with 5 real tests. test_io_dict_toml's "extra not installed"
test patches tomli_w_installed (the encode dependency) instead of
toml_installed.

API note: tomli_w.dumps kwargs differ from toml.dumps (no `encoder=`
param; gains `multiline_strings` and `indent`). Callers of
`.to_toml(**kwargs)` passing uiri-specific kwargs will hit TypeError
and should migrate to tomli-w's kwarg surface.

Full suite: 800 tests pass (1 pre-existing skip).

Fixes the encode-side failure mode documented in fabiocaccamo#439.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fabiocaccamo fabiocaccamo added bug Something isn't working enhancement New feature or request dependencies Pull requests that update a dependency file labels Apr 21, 2026
@fabiocaccamo fabiocaccamo moved this to Todo in Open Source Apr 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes TOML encoding crashes (notably on real control characters like ESC/ANSI sequences) by switching the encode path from uiri/toml to tomli-w, while keeping the existing decode behavior (tomllib on Python 3.11+, toml on 3.10).

Changes:

  • Update TOMLSerializer.encode() to use tomli_w.dumps() and require tomli-w at runtime for encoding.
  • Adjust TOML optional-dependency configuration to add tomli-w and gate toml to Python < 3.11.
  • Replace prior TOML serializer TODO tests with concrete regression/round-trip coverage and update the “extra not installed” patch target accordingly.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
benedict/serializers/toml.py Swap TOML encoding implementation to tomli-w and add optional import/installed flag.
pyproject.toml Update [project.optional-dependencies].toml to include tomli-w and conditionally include toml for Python < 3.11.
requirements.txt Add tomli-w to the pinned dev/test requirements set.
tests/serializers/test_toml_serializer.py Add regression and round-trip tests covering control characters and other tricky strings.
tests/dicts/io/test_io_dict_toml.py Patch tomli_w_installed (encode dependency) for the “extra not installed” test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

def test_decode_toml(self) -> None:
# TODO
pass
def test_encode_ansi_control_character(self):
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test methods in this file omit the -> None return type annotation, while the other serializer test modules consistently include it (e.g., tests/serializers/test_abstract_serializer.py:12). For consistency (and to keep type-checking expectations uniform), please add -> None to these new test methods.

Copilot uses AI. Check for mistakes.
def test_encode_ansi_control_character(self):
"""Scenario 1 — falsification clause #1.

`benedict({"color": "\033[31m"}).to_toml()` must not raise. On
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring includes "\033[31m" inside a Python string literal, so \033 will be interpreted and the docstring will contain a real ESC control character at runtime. This can lead to invisible/control characters in test output and makes the docstring harder to read/copy. Consider escaping it as \\033[31m (or writing it as \x1b[31m) in the docstring example so the docstring remains printable text.

Suggested change
`benedict({"color": "\033[31m"}).to_toml()` must not raise. On
`benedict({"color": "\\033[31m"}).to_toml()` must not raise. On

Copilot uses AI. Check for mistakes.
- Add `-> None` return annotation to all new test methods to match
  the convention in other tests/serializers/ files.
- Escape `\033` as `\\033` in the test_encode_ansi_control_character
  docstring so the docstring renders as printable text.

No behavior change; pure consistency polish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fabiocaccamo
Copy link
Copy Markdown
Owner

@copilot apply changes based on the comments in this thread

Copy link
Copy Markdown
Owner

@fabiocaccamo fabiocaccamo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricaskew thanks for the PR, the crash is confirmed and the fix is good.

I have one concern: we are still using uiri/toml for decode on Python 3.10, but this library is abandoned and has known bugs. Since we are already adding tomli_w (from hukkin), I think it is better to finish the migration and remove uiri/toml completely.

My proposal:

  • decode -> tomli (on Python 3.10 only) + tomllib (stdlib on Python 3.11+)
  • encode -> tomli_w (already in this PR)

So the dependencies would be:

  • on Python 3.10: only tomli + tomli-w
  • on Python 3.11+: only stdlib + tomli-w

Can you update the PR in this way? I'm happy to review again after.


PS. remember to run pre-commit.

Per maintainer review (CHANGES_REQUESTED, fabiocaccamo): replace abandoned
uiri/toml dependency with tomli (Python 3.10) + tomllib (Python 3.11+).
Encode side stays on tomli_w (no change). Net: uiri/toml fully removed.

Fix mypy CI gate:
- benedict/serializers/toml.py:52 — explicit str annotation on
  tomli_w.dumps() return value to satisfy [no-any-return] check.

Auxiliary test file rename + typo fix:
- tests/dicts/io/test_io_dict_toml.py — patched flag renamed
  toml_installed -> tomli_installed; "tomlib" -> "tomllib" in skip-message.

Verified locally: pre-commit clean (9/9 hooks pass including mypy);
pytest tests/serializers/test_toml_serializer.py + tests/dicts/io/test_io_dict_toml.py
21 passed, 1 skipped (skip is correct — tomllib available on Py3.11+).
@fabiocaccamo
Copy link
Copy Markdown
Owner

@copilot resolve the merge conflicts in this pull request

@ricaskew
Copy link
Copy Markdown
Author

ricaskew commented May 1, 2026

@fabiocaccamo it's been around 4 days since the @copilot resolve directive and the conflicts on toml.py are still standing. Happy to resolve them manually from this side and push to fix/toml-encoder-swap so you can re-run the matrix on a clean tip — would that be useful, or would you prefer to keep waiting on copilot? Either way works on my end; the local branch on a5fe910 is pre-commit clean (9/9 hooks including mypy) and 21+1 tests pass on the affected files, so once the conflict is resolved CI should run cleanly.

@ricaskew ricaskew removed their assignment May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working dependencies Pull requests that update a dependency file enhancement New feature or request

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

3 participants