Parameter limit by dralley · Pull Request #7803 · pulp/pulpcore

dralley · 2026-06-16T20:54:55Z

📜 Checklist

Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
A changelog entry or entries has been added for any significant changes
Follows the Pulp policy on AI Usage
(For new features) - User documentation and test coverage has been added

dralley · 2026-06-16T21:21:36Z

Follow-up to #6784

This is probably a candidate for using the "run plugin CI against pulpcore main branch" automation.

mdellweg · 2026-06-17T09:29:08Z

+    values = list(values)
+    if len(values) < POSTGRES_MAX_QUERY_PARAMS:
+        return Q(**{f"{field_name}__in": values})
+    return Q(**{f"{field_name}__any_array": values})


Would there be a downside if we always used this array method?

This is something I wanted to investigate a bit more before undrafting. This is a draft because it's pretty much just "what Claude said", and I wanted to at least look at some query plans and compare before going with it.

I was trying to make a reliable unit test as well but, unfortunately, the unit test in #7801 is not reliable for reasons that are not entirely clear to me.

The two SQL strategies are:

IN ($1, $2, ..., $N) — N separate bind parameters, one per value
= ANY($1) — one bind parameter containing a PostgreSQL array
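To make the difference concrete, here is a minimal sketch that renders both predicate shapes and counts the bind parameters each one needs. The helpers `render_in` and `render_any` and the column name `pulp_id` are hypothetical, chosen only for illustration; they are not part of pulpcore or Django, and the `$n` placeholders simply mirror the notation above.

```python
def render_in(column, values):
    """Render `column IN ($1, ..., $N)`: one bind parameter per value."""
    values = list(values)
    placeholders = ", ".join(f"${i}" for i in range(1, len(values) + 1))
    return f"{column} IN ({placeholders})", values


def render_any(column, values):
    """Render `column = ANY($1)`: the whole list travels as one array parameter."""
    return f"{column} = ANY($1)", [list(values)]


ids = list(range(70_000))  # more values than the 65,535 bind-parameter limit
in_sql, in_params = render_in("pulp_id", ids)
any_sql, any_params = render_any("pulp_id", ids)

print(len(in_params))   # number of bind parameters for the IN form
print(len(any_params))  # number of bind parameters for the = ANY form
```

The IN form needs as many bind parameters as there are values, while the = ANY form needs one no matter how long the list is.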
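Parsing/planning: With IN, PostgreSQL must parse N parameter placeholders and carry all N of them through analysis and planning. (For a plain value list, PostgreSQL typically rewrites IN into = ANY over an N-element array constructor rather than a literal OR-tree, but that expression still contains N parameter nodes.) At 100K values, that is a noticeable parse and planning cost. With = ANY(array) and a single array parameter, the query structure is always the same single ScalarArrayOpExpr node regardless of list size.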

Prepared statement caching: IN produces a different query shape for each different N, so the plan can't be reused across different list sizes. = ANY(array) always has the same shape — one parameter — so the plan is reusable.
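The "query shape" point can be sketched by treating the SQL text as the cache key. This is a simplification of how statement caching actually works, and the helper names below are hypothetical, but it shows why the IN form produces one distinct statement per list size.

```python
def in_sql(column, n):
    """SQL text for an IN predicate with n placeholders."""
    placeholders = ", ".join(f"${i}" for i in range(1, n + 1))
    return f"{column} IN ({placeholders})"


def any_sql(column):
    """SQL text for an = ANY predicate; it does not depend on the list size."""
    return f"{column} = ANY($1)"


list_sizes = [1, 2, 3, 10, 100]

# Collect the distinct statement texts each strategy would send.
in_shapes = {in_sql("pulp_id", n) for n in list_sizes}
any_shapes = {any_sql("pulp_id") for _ in list_sizes}

print(len(in_shapes), len(any_shapes))
```

Every distinct list size yields a new IN statement text, whereas the = ANY text is identical for all of them.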

Index usage: Both use btree indexes equally well.

Small lists: For tiny lists (1-10 items), IN is marginally cheaper because there's no array construction. The difference is negligible in practice.

So why not always use = ANY?

There's no strong PostgreSQL-level reason not to. The threshold in safe_in() is mostly conservatism:

__in is Django's standard, battle-tested lookup — it handles querysets (subqueries), empty lists, None values, and all the edge cases Django has polished over years. A custom lookup is more code to maintain.
__in works across all database backends. = ANY(array) is PostgreSQL-specific.
For Pulp specifically (always PostgreSQL), you could always use = ANY for Python lists and it'd be fine.
If you want to simplify, you could drop the threshold and always use any_array for concrete lists. The threshold just avoids the custom path when there's no benefit.

PostgreSQL's wire protocol limits bind parameters to 65,535 per statement. When Django ORM's filter(field__in=python_list) generates WHERE field IN ($1, $2, ..., $65536+), it exceeds this limit when using server-side cursors (.iterator()).

This introduces a safe_in() utility that uses a custom Django lookup (= ANY(%s)) for large lists, passing the entire list as a single PostgreSQL array parameter regardless of size. For small lists, the standard __in lookup is used unchanged.

Applied safe_in() to all vulnerable code paths in pulpcore:
- RepositoryVersion.get_content(), added(), removed()
- import_repository_version() content mapping

Also updated the test to use .iterator() so it reliably exercises the server-side cursor path that triggers the parameter limit.

Assisted-By: claude-opus-4.6
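The threshold dispatch described in this message can be sketched without Django by returning the keyword arguments that would be passed to `Q(...)`. This is only an illustration under stated assumptions: `choose_lookup` is a hypothetical name, and the value given to `POSTGRES_MAX_QUERY_PARAMS` here is assumed from the 65,535 protocol limit above, not taken from the PR's code.

```python
# Assumed value, based on the protocol limit quoted in the commit message.
POSTGRES_MAX_QUERY_PARAMS = 65535


def choose_lookup(field_name, values, max_params=POSTGRES_MAX_QUERY_PARAMS):
    """Return filter kwargs using `__in` for short lists, `__any_array` otherwise."""
    values = list(values)  # materialize so len() works for any iterable
    if len(values) < max_params:
        return {f"{field_name}__in": values}
    return {f"{field_name}__any_array": values}


small = choose_lookup("pk", range(10))
large = choose_lookup("pk", range(70_000))

print(list(small))  # lookup chosen for the short list
print(list(large))  # lookup chosen for the long list
```

In the PR itself, the reviewed snippet wraps the same kwargs in a Django `Q` object, and `any_array` is the custom lookup that renders `= ANY(%s)`.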

github-actions Bot added multi-commit no-changelog no-issue labels Jun 16, 2026

dralley mentioned this pull request Jun 16, 2026

Add a test for the postgresql parameter limit workaround #7801

Merged

4 tasks

dralley added backport-3.85 backport-3.105 backport-3.113 labels Jun 16, 2026

pulpbot removed the backport-3.113 label Jun 16, 2026

mdellweg reviewed Jun 17, 2026

dralley force-pushed the parameter-limit branch from 687be47 to 98074bd Compare June 23, 2026 13:13

github-actions Bot removed multi-commit no-changelog labels Jun 23, 2026

dralley wants to merge 1 commit into pulp:main from dralley:parameter-limit

3 participants
