Merged
5 changes: 3 additions & 2 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
@@ -11,9 +11,9 @@ Concise, project-specific guidance for AI coding agents working on this repo. Fo
- Support code in `utils/` (`utils.py`, `decode_utils.py`) and enums/models folders. Option dataclasses centralize behavioral switches; never scatter ad-hoc flags.

## 2. Key Behavioral Invariants
- DO NOT mutate caller inputs—copy/normalize (`deepcopy` for mappings, index-projection for sequences) before traversal.
- DO NOT mutate caller inputs—copy/normalize (shallow copy for mappings; deep-copy only when a callable filter may mutate; index-projection for sequences) before traversal.
- Cycle detection in `encode._encode` must raise `ValueError("Circular reference detected")`—preserve side-channel algorithm.
- Depth, list, and parameter limits are security/safety features: respect `depth`, `list_limit`, `parameter_limit`, and `strict_depth` / `raise_on_limit_exceeded` exactly as tests assert.
- Depth, list, and parameter limits are security/safety features: respect `depth`, `max_depth`, `list_limit`, `parameter_limit`, and `strict_depth` / `raise_on_limit_exceeded` exactly as tests assert. `max_depth` is capped to the current recursion limit.
- Duplicate key handling delegated to `Duplicates` enum: COMBINE → list accumulation; FIRST/LAST semantics enforced during merge.
- List format semantics (`ListFormat` enum) change how prefixes are generated; COMMA + `comma_round_trip=True` must emit single-element marker for round-trip fidelity.
- Charset sentinel logic: when `charset_sentinel=True`, prepend sentinel *before* payload; obey override rules when both charset and sentinel present.
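The COMMA round-trip rule in the invariants above — a single-element list must carry a marker so it decodes back to a list rather than a scalar — can be sketched in isolation. This is a standalone illustration with a hypothetical helper name, not the library's actual `ListFormat` generator:

```python
def format_comma_list(key: str, items: list, comma_round_trip: bool = False) -> str:
    """Join list values with commas; when comma_round_trip is set, mark a
    single-element list with '[]' so decoding restores a list, not a scalar."""
    suffix = "[]" if comma_round_trip and len(items) == 1 else ""
    return f"{key}{suffix}=" + ",".join(str(x) for x in items)
```

Without the marker, `a=b` would decode to the scalar `"b"` and the round trip would lose the list shape.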
@@ -39,6 +39,7 @@ Concise, project-specific guidance for AI coding agents working on this repo. Fo
- When altering merge or list/index logic, adjust `Utils.merge` or decoding helpers—never inline merging elsewhere.
- New list or formatting strategies: add Enum member with associated generator/formatter; augment tests to cover serialization/deserialization round trip.
- Performance-sensitive paths: avoid repeated regex compilation or deep copies inside tight loops; reuse existing pre-processing structure (tokenize first, structure later).
- `Utils.merge` is internal and may reuse dict targets for performance; do not assume it preserves caller immutability.

## 6. Testing Strategy
- Mirror existing parametric test style in `tests/unit/*_test.py`.
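The copy/normalize invariant above (shallow copy for mappings, deep copy only when a callable filter may mutate, index projection for sequences) can be sketched as a standalone helper — a simplified reimplementation for illustration, not the library's actual code:

```python
from collections.abc import Mapping
from copy import deepcopy


def normalize_root(value, filter_opt=None):
    """Normalize the encoder's root input without mutating the caller's object."""
    if isinstance(value, Mapping):
        # Shallow copy suffices unless a callable filter might mutate nested state.
        return deepcopy(value) if callable(filter_opt) else dict(value)
    if isinstance(value, (list, tuple)):
        # Project sequences to an index-keyed mapping, leaving the original untouched.
        return {str(i): item for i, item in enumerate(value)}
    # Anything else encodes to nothing.
    return {}
```

The caller's container is never the object that gets traversed, which is what the "DO NOT mutate caller inputs" invariant demands.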
1 change: 0 additions & 1 deletion .gitignore
@@ -116,7 +116,6 @@ venv.bak/

# AI related
.junie
AGENTS.md

# VS Code specific
.history
34 changes: 34 additions & 0 deletions AGENTS.md
@@ -0,0 +1,34 @@
# Repository Guidelines

## Project Structure & Module Organization
- `src/qs_codec/` contains the codec implementation, option models, and helpers; place new modules here and keep exports deliberate.
- `tests/` mirrors the package layout with `test_*.py` files so every feature has a nearby regression check.
- `docs/` builds the Sphinx site; refresh guides when behavior or options change.
- `requirements_dev.txt` pins tooling and `tox.ini` mirrors the CI matrix—update both when adding dependencies.

## Build, Test, and Development Commands
- `python -m pip install -e .[dev]` installs the package alongside linting and typing extras.
- `pytest -v --cov=src/qs_codec` drives the unit suite and produces the coverage XML consumed by codecov.
- `tox -e python3.13` runs tests in an isolated interpreter; swap the env name to target other supported versions.
- `tox -e linters` chains Black, isort, flake8, pylint, bandit, pyright, and mypy to catch style or security drift before review.

## Coding Style & Naming Conventions
- Format code with Black (120-char lines) and order imports with isort's Black profile, both configured in `pyproject.toml`.
- Keep functions and modules in snake_case, reserve PascalCase for classes reflecting `qs` data structures, and type hint public APIs.
- Respect docstring tone and option names from the JavaScript `qs` package to signal parity.

## Testing Guidelines
- Add or extend pytest cases under `tests/`, leaning on parametrization for the different encoder/decoder modes.
- Preserve or raise the coverage level tracked in `coverage.xml`; CI flags regressions.
- Name tests `test_{feature}_{scenario}` and refresh fixtures whenever query-string semantics shift.
- When touching cross-language behavior, run `tests/comparison/compare_outputs.sh` to confirm parity with the Node reference.
- For encoding depth changes, cover `EncodeOptions.max_depth` (positive int/None) and cap-to-recursion behavior.

## Commit & Pull Request Guidelines
- Follow the emoji-prefixed summaries visible in `git log` (e.g., `:arrow_up: Bump actions/setup-python from 5 to 6 (#26)`), using the imperative mood.
- Keep each commit focused; include a short body for impactful changes explaining compatibility or migration notes.
- For PRs, push only after `tox` succeeds, link the driving issue, outline user-facing changes, and note the tests you ran (attach before/after snippets for docs tweaks).

## Security & Compatibility Notes
- Follow `SECURITY.md` for private vulnerability disclosure and avoid posting sensitive details in public threads.
- This port tracks the npm `qs` package; document intentional divergences in both code and docs as soon as they occur.
19 changes: 18 additions & 1 deletion README.rst
@@ -25,7 +25,7 @@ Highlights
- Pluggable hooks: custom ``encoder``/``decoder`` callables; options to sort keys, filter output, and control percent-encoding (keys-only, values-only).
- Nulls & empties: ``strict_null_handling`` and ``skip_nulls``; support for empty lists/arrays when desired.
- Dates: ``serialize_date`` for ISO 8601 or custom (e.g., UNIX timestamp).
- Safety limits: configurable nesting depth, parameter limit, and list index limit; optional strict-depth errors; duplicate-key strategies (combine/first/last).
- Safety limits: configurable decode depth and encode max depth, parameter limit, and list index limit; optional strict-depth errors; duplicate-key strategies (combine/first/last).
- Extras: numeric entity decoding (e.g. ``☺`` → ☺), alternate delimiters/regex, and query-prefix helpers.

Compatibility
@@ -458,6 +458,23 @@ Encoding can be disabled for keys by setting the
qs.EncodeOptions(encode_values_only=True)
) == 'a=b&c[0]=d&c[1]=e%3Df&f[0][0]=g&f[1][0]=h'

Maximum encoding depth
^^^^^^^^^^^^^^^^^^^^^^

You can cap how deep the encoder will traverse by setting the
`max_depth <https://techouse.github.io/qs_codec/qs_codec.models.html#qs_codec.models.encode_options.EncodeOptions.max_depth>`__
option. If unset, the encoder derives a safe limit from the interpreter recursion limit; when set, the effective
limit is capped to the current recursion limit to avoid ``RecursionError``.

.. code:: python

import qs_codec as qs

try:
qs.encode({'a': {'b': {'c': 'd'}}}, qs.EncodeOptions(max_depth=2))
except ValueError as e:
assert str(e) == 'Maximum encoding depth exceeded'

This encoding can also be replaced by a custom ``Callable`` in the
`encoder <https://techouse.github.io/qs_codec/qs_codec.models.html#qs_codec.models.encode_options.EncodeOptions.encoder>`__ option:

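The cap rule documented in the README section above — derive a limit from the interpreter recursion limit when `max_depth` is unset, and clamp an explicit value to that limit — reduces to a small pure function. This mirrors the `_get_max_encode_depth` helper added in this PR, reimplemented standalone for illustration:

```python
import sys

_DEPTH_MARGIN = 50  # safety buffer below the interpreter's recursion limit


def effective_max_depth(max_depth=None):
    """Return the depth the encoder will actually honor."""
    limit = max(0, sys.getrecursionlimit() - _DEPTH_MARGIN)
    return limit if max_depth is None else min(max_depth, limit)
```

A user-supplied `max_depth` larger than the recursion limit is silently clamped, so deep inputs raise the documented `ValueError` rather than an unpredictable `RecursionError`.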
17 changes: 17 additions & 0 deletions docs/README.rst
@@ -409,6 +409,23 @@ Encoding can be disabled for keys by setting the
qs.EncodeOptions(encode_values_only=True)
) == 'a=b&c[0]=d&c[1]=e%3Df&f[0][0]=g&f[1][0]=h'

Maximum encoding depth
^^^^^^^^^^^^^^^^^^^^^^

You can cap how deep the encoder will traverse by setting the
:py:attr:`max_depth <qs_codec.models.encode_options.EncodeOptions.max_depth>` option. If unset, the encoder derives a
safe limit from the interpreter recursion limit; when set, the effective limit is capped to the current recursion
limit to avoid ``RecursionError``.

.. code:: python

import qs_codec as qs

try:
qs.encode({'a': {'b': {'c': 'd'}}}, qs.EncodeOptions(max_depth=2))
except ValueError as e:
assert str(e) == 'Maximum encoding depth exceeded'

This encoding can also be replaced by a custom ``Callable`` in the
:py:attr:`encoder <qs_codec.models.encode_options.EncodeOptions.encoder>` option:

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -29,7 +29,7 @@ Highlights
- Pluggable hooks: custom ``encoder``/``decoder`` callables; options to sort keys, filter output, and control percent-encoding (keys-only, values-only).
- Nulls & empties: ``strict_null_handling`` and ``skip_nulls``; support for empty lists/arrays when desired.
- Dates: ``serialize_date`` for ISO 8601 or custom (e.g., UNIX timestamp).
- Safety limits: configurable nesting depth, parameter limit, and list index limit; optional strict-depth errors; duplicate-key strategies (combine/first/last).
- Safety limits: configurable decode depth and encode max depth, parameter limit, and list index limit; optional strict-depth errors; duplicate-key strategies (combine/first/last).
- Extras: numeric entity decoding (e.g. ``&#9786;`` → ☺), alternate delimiters/regex, and query-prefix helpers.

Compatibility
31 changes: 27 additions & 4 deletions src/qs_codec/encode.py
@@ -13,6 +13,7 @@
Nothing in this module mutates caller objects: inputs are shallow‑normalized and deep‑copied only where safe/necessary to honor options.
"""

import sys
import typing as t
from collections.abc import Sequence as ABCSequence
from copy import deepcopy
@@ -45,21 +45,24 @@ def encode(value: t.Any, options: EncodeOptions = EncodeOptions()) -> str:
The encoded query string (possibly prefixed with "?" if requested), or an empty string when there is nothing to encode.

Notes:
- Caller input is not mutated. When a mapping is provided it is deep-copied; sequences are projected to a temporary mapping.
- Caller input is not mutated. When a mapping is provided it is shallow-copied (deep-copied only when a callable
filter is used); sequences are projected to a temporary mapping.
- If a callable `filter` is provided, it can transform the root object.
- If an iterable filter is provided, it selects which *root* keys to emit.
"""
# Treat `None` as "nothing to encode".
if value is None:
return ""

filter_opt = options.filter

# Normalize the root into a mapping we can traverse deterministically:
# - Mapping -> deepcopy (avoid mutating caller containers)
# - Mapping -> shallow copy (deep-copy only when a callable filter may mutate)
# - Sequence -> promote to {"0": v0, "1": v1, ...}
# - Other -> empty (encodes to "")
obj: t.Mapping[str, t.Any]
if isinstance(value, t.Mapping):
obj = deepcopy(value)
obj = deepcopy(value) if callable(filter_opt) else dict(value)
elif isinstance(value, (list, tuple)):
obj = {str(i): item for i, item in enumerate(value)}
else:
@@ -73,7 +77,6 @@ def encode(value: t.Any, options: EncodeOptions = EncodeOptions()) -> str:

# If an iterable filter is provided for the root, restrict emission to those keys.
obj_keys: t.Optional[t.List[t.Any]] = None
filter_opt = options.filter
if filter_opt is not None:
if callable(filter_opt):
# Callable filter may transform the root object.
@@ -94,6 +97,7 @@ def encode(value: t.Any, options: EncodeOptions = EncodeOptions()) -> str:

# Side channel for cycle detection across recursive calls.
side_channel: WeakKeyDictionary = WeakKeyDictionary()
max_depth = _get_max_encode_depth(options.max_depth)

# Encode each selected root key.
for _key in obj_keys:
@@ -126,6 +130,7 @@ def encode(value: t.Any, options: EncodeOptions = EncodeOptions()) -> str:
encode_values_only=options.encode_values_only,
charset=options.charset,
add_query_prefix=options.add_query_prefix,
_max_depth=max_depth,
)

# `_encode` yields either a flat list of `key=value` tokens or a single token.
@@ -157,6 +162,15 @@ def encode(value: t.Any, options: EncodeOptions = EncodeOptions()) -> str:

# Unique placeholder used as a key within the side-channel chain to pass context down recursion.
_sentinel: WeakWrapper = WeakWrapper({})
# Keep a safety buffer below Python's recursion limit to avoid RecursionError on deep inputs.
_DEPTH_MARGIN: int = 50


def _get_max_encode_depth(max_depth: t.Optional[int]) -> int:
limit = max(0, sys.getrecursionlimit() - _DEPTH_MARGIN)
if max_depth is None:
return limit
return min(max_depth, limit)


def _encode(
Expand All @@ -181,6 +195,8 @@ def _encode(
encode_values_only: bool = False,
charset: t.Optional[Charset] = Charset.UTF8,
add_query_prefix: bool = False,
_depth: int = 0,
_max_depth: t.Optional[int] = None,
) -> t.Union[t.List[t.Any], t.Tuple[t.Any, ...], t.Any]:
"""
Recursive worker that produces `key=value` tokens for a single subtree.
@@ -217,6 +233,11 @@
Returns:
Either a list/tuple of tokens or a single token string.
"""
if _max_depth is None:
_max_depth = _get_max_encode_depth(None)
if _depth > _max_depth:
raise ValueError("Maximum encoding depth exceeded")

# Establish a starting prefix for the top-most invocation (used when called directly).
if prefix is None:
prefix = "?" if add_query_prefix else ""
@@ -425,6 +446,8 @@ def _encode(
allow_dots=allow_dots,
encode_values_only=encode_values_only,
charset=charset,
_depth=_depth + 1,
_max_depth=_max_depth,
)

# Flatten nested results into the `values` list.
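The `_depth`/`_max_depth` plumbing threaded through `_encode` above follows a standard guarded-recursion pattern. In isolation it looks like this — a simplified, hypothetical walker, not the real encoder:

```python
def walk(obj, depth=0, max_depth=5):
    """Recursively count leaves, refusing to descend past max_depth."""
    if depth > max_depth:
        raise ValueError("Maximum encoding depth exceeded")
    if isinstance(obj, dict):
        # Each nested mapping costs one level of depth.
        return sum(walk(v, depth + 1, max_depth) for v in obj.values())
    return 1
```

The key detail, as in the PR, is that the limit is checked at entry and the incremented depth is passed down explicitly, so the failure mode is a deterministic `ValueError` instead of a `RecursionError` deep in the interpreter.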
10 changes: 10 additions & 0 deletions src/qs_codec/models/encode_options.py
@@ -117,6 +117,13 @@ def encoder(self, value: t.Optional[t.Callable[[t.Any, t.Optional[Charset], t.Op
sort: t.Optional[t.Callable[[t.Any, t.Any], int]] = field(default=None)
"""Optional comparator for deterministic key ordering. Must return -1, 0, or +1."""

max_depth: t.Optional[int] = None
"""Maximum nesting depth allowed during encoding.

When ``None``, the encoder derives a safe limit from the interpreter recursion limit (minus a safety margin).
When set, the effective limit is capped to the current recursion limit to avoid ``RecursionError``.
"""

def __post_init__(self) -> None:
"""Normalize interdependent options.

@@ -126,6 +133,9 @@ def __post_init__(self) -> None:
"""
if not hasattr(self, "_encoder") or self._encoder is None:
self._encoder = EncodeUtils.encode
if self.max_depth is not None:
if not isinstance(self.max_depth, int) or isinstance(self.max_depth, bool) or self.max_depth <= 0:
raise ValueError("max_depth must be a positive integer or None")
# Default `encode_dot_in_keys` first, then mirror into `allow_dots` when unspecified.
if self.encode_dot_in_keys is None:
self.encode_dot_in_keys = False
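The `__post_init__` check above rejects booleans explicitly because `bool` is a subclass of `int` in Python, so `isinstance(True, int)` is true and `True` would otherwise pass as `max_depth=1`. A standalone version of the validation:

```python
def validate_max_depth(max_depth):
    """Accept None or a positive int; reject bools and non-positive values."""
    if max_depth is None:
        return
    # bool subclasses int, so it must be excluded before the type check passes it.
    if not isinstance(max_depth, int) or isinstance(max_depth, bool) or max_depth <= 0:
        raise ValueError("max_depth must be a positive integer or None")
```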
72 changes: 4 additions & 68 deletions src/qs_codec/models/weak_wrapper.py
@@ -1,4 +1,4 @@
"""Weakly wrap *any* object with identity equality and deep content hashing."""
"""Weakly wrap *any* object with identity equality and stable hashing."""

from __future__ import annotations

@@ -40,75 +40,12 @@ def _get_proxy(value: t.Any) -> "_Proxy":
return proxy


def _deep_hash(
obj: t.Any,
_seen: t.Optional[set[int]] = None,
_depth: int = 0,
) -> int:
"""Deterministic deep hash with cycle & depth protection.

- Raises ValueError("Circular reference detected") on cycles.
- Raises RecursionError when nesting exceeds 400.
- Produces equal hashes for equal-by-contents containers.
"""
if _depth > 400:
raise RecursionError("Maximum hashing depth exceeded")

if _seen is None:
_seen = set()

# Track only containers by identity for cycle detection
def _enter(o: t.Any) -> int:
oid = id(o)
if oid in _seen:
raise ValueError("Circular reference detected")
_seen.add(oid)
return oid

def _leave(oid: int) -> None:
_seen.remove(oid)

if isinstance(obj, dict):
oid = _enter(obj)
try:
# Compute key/value deep hashes once and sort pairs for determinism
pairs = [(_deep_hash(k, _seen, _depth + 1), _deep_hash(v, _seen, _depth + 1)) for k, v in obj.items()]
pairs.sort()
kv_hashes = tuple(pairs)
return hash(("dict", kv_hashes))
finally:
_leave(oid)

if isinstance(obj, (list, tuple)):
oid = _enter(obj)
try:
elem_hashes = tuple(_deep_hash(x, _seen, _depth + 1) for x in obj)
tag = "list" if isinstance(obj, list) else "tuple"
return hash((tag, elem_hashes))
finally:
_leave(oid)

if isinstance(obj, set):
oid = _enter(obj)
try:
set_hashes = tuple(sorted(_deep_hash(x, _seen, _depth + 1) for x in obj))
return hash(("set", set_hashes))
finally:
_leave(oid)

# Fallback for scalars / unhashables
try:
return hash(obj)
except TypeError:
return hash(repr(obj))


class WeakWrapper:
"""Wrapper suitable for use as a WeakKeyDictionary key.

- Holds a *strong* reference to the proxy (keeps proxy alive while wrapper exists).
- Exposes a weakref to the proxy via `_wref` so tests can observe/force GC.
- Equality is proxy identity; hash is a deep hash of the underlying value.
- Equality is proxy identity; hash is the proxy identity (stable across mutations).
"""

__slots__ = ("_proxy", "_wref", "__weakref__")
@@ -145,6 +82,5 @@ def __eq__(self, other: object) -> bool:
return self._proxy is other._proxy

def __hash__(self) -> int:
"""Return a deep hash of the wrapped value."""
# Uses your existing deep-hash helper (not shown here).
return _deep_hash(self.value)
"""Return a stable hash based on the proxy identity."""
return hash(self._proxy)