pytest-capquery: Catch N+1 disasters and profile SQL queries in your test suite #14344
Replies: 5 comments 8 replies
-
The part about telling it the expected queries and having to clear a store seems terrifying to me. In addition, this will miss compound N+1, so it's both asserting implementation and missing nuances. I wish there was ORM-level detection of N+1 situations. AFAIK there are situations where joined loads are bad, and pulling the data into the session via a prior query removes the identity-map loads in the loop without expanding the width of the query unreasonably.
-
Hi Ronny, thank you for the quick and insightful feedback! I truly appreciate the perspective of a maintainer on the trade-offs of this approach. Here is a detailed breakdown of my thoughts on the points you raised:

**1. Implementation Assertions as Performance Guard-rails**

I understand the traditional concern about "testing implementation." However, in high-stakes, high-availability environments, I view SQL execution not just as an implementation detail, but as mission-critical I/O. A "green" behavioral test that hides a silent N+1 regression is a failure that eventually hits the cloud infrastructure bill and the user experience in production. That is the intention behind the plugin's assertions.

**2. Compound N+1 and Developer Ownership**

Regarding "compound N+1," I believe the responsibility lies with the engineer to be explicit about the expected query footprint. A senior engineer should ideally promote new queries to production only after validating them against a query plan.

**3. Strategy Neutrality (Identity Map & Joins)**

The project doesn't advocate for a specific ORM strategy, such as forced joined loads.

**4. State Management and Roadmap**

Your point about "clearing a store" being terrifying is completely valid. State leakage between tests is a major concern in large suites. My immediate roadmap includes moving toward stricter context-manager-based isolation. The goal is to ensure that query capturing is ephemeral and tied strictly to the lifecycle of the context or the fixture, following the same "capture and release" philosophy as native pytest output capturing.

I'm building this to bring the same level of observability we have for logs and stdout to the database layer, and I'd love to continue this discussion as the tool evolves!
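The "capture and release" idea can be illustrated with the standard library alone. The following is a hedged sketch under stated assumptions, not the plugin's actual code: the `capture_queries` helper is hypothetical, and plain `sqlite3` with its `set_trace_callback` driver hook stands in for a SQLAlchemy engine.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def capture_queries(conn):
    """Record every statement the driver executes, only for this block's lifetime."""
    captured = []
    conn.set_trace_callback(captured.append)  # driver-level statement hook
    try:
        yield captured
    finally:
        conn.set_trace_callback(None)  # release: nothing leaks past the block

# autocommit mode avoids implicit BEGIN statements showing up in the trace
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")

with capture_queries(conn) as queries:
    conn.execute("INSERT INTO users (status) VALUES ('active')")

conn.execute("SELECT * FROM users")  # outside the block: not captured
assert queries == ["INSERT INTO users (status) VALUES ('active')"]
```

Because the hook is installed and removed inside the context manager, nothing survives past the block, which is exactly the ephemeral behavior described above.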
-
I wanted to circle back and sincerely thank you for the candid feedback. You were completely right: relying on a global `.clear()` method for state management was a terrifying anti-pattern and highly prone to leakage. I took your critiques to heart and just released v0.2.0, completely redesigning the core architecture to be more Pythonic, localized, and aligned with standard pytest philosophies. Here is how the plugin addresses the issues raised:

- **No more global state clearing:** I completely removed the need for a global `.clear()` call.
- **Loose assertions (addressing fragility):** to address your valid concern that enforcing specific SQL is too fragile for some use cases, I added a way to enforce only the volume of queries at the boundary level, without pinning the exact statements.

Here is what the new Pythonic workflow looks like:

```python
def test_update_user(db_session, capquery):
    # Setup happens outside the capture block (ignored by the asserter)
    user = db_session.query(User).first()

    # Only queries inside this context manager are tracked for this assertion
    with capquery.capture() as phase:
        user.status = "active"
        db_session.commit()

    phase.assert_executed_queries(
        "BEGIN",
        ("UPDATE users SET status=? WHERE users.id = ?", ("active", 1)),
        "COMMIT",
    )
```

Thank you again for taking the time to review the initial concept and pushing back on the design flaws. Your feedback directly resulted in a significantly more reliable and resilient tool. I would genuinely appreciate any further suggestions or thoughts on this new context-manager approach if you have the bandwidth!
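For readers curious how a chronological assertion of this kind can be expressed, here is a minimal standalone sketch. The helper name mirrors the plugin's `assert_executed_queries` but the implementation is hypothetical, not the plugin's internals: each expectation is either a bare SQL string or a `(sql, params)` tuple, compared in order against a captured timeline.

```python
def assert_executed_queries(captured, *expected):
    """Compare a captured [(sql, params), ...] timeline against expectations, in order."""
    assert len(captured) == len(expected), (
        f"expected {len(expected)} queries, got {len(captured)}"
    )
    for (got_sql, got_params), want in zip(captured, expected):
        # A bare string means "match the SQL, ignore the parameters"
        want_sql, want_params = want if isinstance(want, tuple) else (want, None)
        assert got_sql == want_sql, f"{got_sql!r} != {want_sql!r}"
        if want_params is not None:
            assert got_params == want_params

# Usage against a hand-built timeline:
timeline = [
    ("BEGIN", None),
    ("UPDATE users SET status=? WHERE users.id = ?", ("active", 1)),
    ("COMMIT", None),
]
assert_executed_queries(
    timeline,
    "BEGIN",
    ("UPDATE users SET status=? WHERE users.id = ?", ("active", 1)),
    "COMMIT",
)
```

Comparing strictly in order is what makes the assertion chronological: a reordered or extra statement fails even if the same SQL text appears somewhere in the timeline.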
-
Really interesting approach to treating SQL execution as a first-class test concern. The context-manager pattern (`capquery.capture()`) looks like a clean way to keep assertions localized.

One area worth considering: how this interacts with query generation tools. When SQL is generated dynamically (whether from ORM relationship loading, query builders, or natural language-to-SQL tools), the query footprint can be non-deterministic across runs. For example, an LLM-backed query generator might produce a JOIN on one run and a subquery on another, both correct but with different execution profiles.

Disclosure: I work on ai2sql.io, a natural language to SQL tool. The intersection of SQL generation and testing is a problem space we think about a lot, so this project resonates.
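One way to keep such tests stable when the exact SQL varies run-to-run is to assert the *shape* of the footprint (statement kinds and counts) rather than exact strings. A small sketch under that assumption; the `query_shape` helper is hypothetical, not part of any tool mentioned here.

```python
from collections import Counter

def query_shape(statements):
    """Reduce a list of SQL strings to a Counter of leading keywords (SELECT, UPDATE, ...)."""
    return Counter(s.lstrip().split(None, 1)[0].upper() for s in statements)

# Two runs of a generator emitting different but equivalent SQL:
run_a = ["SELECT u.*, o.* FROM users u JOIN orders o ON o.user_id = u.id"]
run_b = ["SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)"]

# Both satisfy the same budget: exactly one SELECT, so no N+1 fan-out,
# even though the statement texts differ.
assert query_shape(run_a) == query_shape(run_b) == Counter({"SELECT": 1})
```

A shape-level budget still fails loudly on an N+1 regression (the SELECT count explodes) while tolerating a planner or generator choosing a different but equivalent statement.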
-
I wanted to circle back after completing a development loop focused entirely on Developer Experience (DX). Your feedback really got me thinking about how to reduce the friction of maintaining these tests, so I prioritized making the workflow as seamless as possible. To solve the maintenance burden, I've introduced a Jest-inspired automated snapshot workflow. Here is how it looks in practice now:

**1. Pytest fixture Setup**

```python
import pytest
from pytest_capquery.plugin import CapQueryWrapper

@pytest.fixture(scope="function")
def postgres_capquery(postgres_engine, capquery_context):
    with CapQueryWrapper(postgres_engine, snapshot_manager=capquery_context) as captured:
        yield captured
```

**2. The Snapshot Approach**

```python
def test_update_user(postgres_session, postgres_capquery):
    with postgres_capquery.capture(assert_snapshot=True):
        user = postgres_session.query(User).filter_by(id=1).first()
        user.status = "active"
        postgres_session.commit()
```

**3. Auto-Generation via CLI**

```shell
pytest --capquery-update
```

This automatically creates a snapshot of the captured queries for each test.

**4. Frictionless Manual Assertions**

```python
# Auto-generated output dropped in stdout on failure for easy copy-paste:
phase.assert_executed_queries(
    "BEGIN",
    ("SELECT ...", (1,)),
    ("UPDATE ...", ("active", 1)),
    "COMMIT",
)
```

I am incredibly grateful for your insights. I completely understand and respect your point about the risks of tightly coupling tests to the implementation layer. However, the core philosophy of this project is to explicitly document the executed queries. By treating SQL as a first-class citizen in the test suite, we free developers from the ORM black box, make it easy for database administrators to join the development and code-review loop, and strictly prevent silent N+1 regressions from reaching production. Thanks again for jumping into the thread and pushing me to improve the tool!
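Jest-style snapshot logic of this kind is straightforward to sketch. The following is a hedged, hypothetical illustration (the `check_snapshot` helper and the JSON file layout are assumptions, not the plugin's real implementation): with `update=True`, standing in for the `--capquery-update` path, the captured queries are written as the baseline; otherwise the run is compared against the stored snapshot.

```python
import json
import tempfile
from pathlib import Path

def check_snapshot(name, queries, snapshot_dir, update=False):
    """Write the query footprint as a baseline, or compare against the stored one."""
    path = Path(snapshot_dir) / f"{name}.json"
    if update or not path.exists():
        # Record the current footprint as the new baseline
        path.write_text(json.dumps(queries, indent=2))
        return
    expected = json.loads(path.read_text())
    assert queries == expected, f"query footprint changed: {queries!r} != {expected!r}"

# First run records the footprint; later runs enforce it.
with tempfile.TemporaryDirectory() as d:
    check_snapshot("test_update_user", ["BEGIN", "UPDATE users ...", "COMMIT"], d, update=True)
    check_snapshot("test_update_user", ["BEGIN", "UPDATE users ...", "COMMIT"], d)
```

The appeal of the workflow is that a regression shows up as a readable diff between the stored and observed query lists, and accepting an intentional change is a single re-run with the update flag.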
-
Your green CI pipeline might be lying to you. 🚨
It tells you the code works, but it’s quietly hiding the N+1 database disaster that will bring down your production environment next week.
As Python & SQLAlchemy developers, we spend hours writing tests to assert our application’s final state, but we treat the database layer like a complete black box. We test what the application does, but completely ignore how it does it.
This abstraction carries a real business cost. 💸
Every inefficient query and silent lazy-load that slips into the main branch directly inflates your cloud bill and degrades the user experience.
I got tired of this, so I built and open-sourced pytest-capquery. 🛠️
🎯 What it does
`pytest-capquery` treats SQL queries as first-class citizens in your pytest suite. By intercepting the SQLAlchemy engine at the driver level, it records a strict, chronological timeline of your execution footprint. Instead of just checking whether a function returns `True`, you can rigorously assert deterministic I/O. If an N+1 regression slips in, the build fails instantly. 💥

🐛 The N+1 Problem in Action
Let's say a developer forgets to use `joinedload` on a simple query: each iteration over the relationship then issues its own SELECT. If someone drops the `joinedload` optimization, `pytest-capquery` exposes the exact lazy-loading queries.

✅ The Fix
When you optimize the query, your test ensures the database behaves exactly as intended.
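The two footprints can be demonstrated with plain `sqlite3` from the standard library. This is a hedged illustration of the N+1 pattern itself, not the plugin's API: the lazy loop issues 1 + N statements, while a single JOIN keeps the footprint constant.

```python
import sqlite3

# isolation_level=None keeps sqlite3 in autocommit mode, so no implicit
# BEGIN statements pollute the trace counts below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO orders (user_id) VALUES (1), (2), (3);
""")

counted = []
conn.set_trace_callback(counted.append)  # driver-level statement hook

# Lazy pattern: one extra query per user -> a 1 + N footprint
users = conn.execute("SELECT id FROM users").fetchall()
for (uid,) in users:
    conn.execute("SELECT * FROM orders WHERE user_id = ?", (uid,)).fetchall()
assert len(counted) == 1 + len(users)

# Eager pattern: a single JOIN keeps the footprint constant at 1
counted.clear()
conn.execute(
    "SELECT u.id, o.id FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()
assert len(counted) == 1
```

The lazy version's query count grows linearly with the row count, which is exactly why a test that pins the footprint catches the regression long before production traffic does.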
Stop blaming the ORM for performance bottlenecks and start profiling your tests! 📈 Lock down your database performance, drastically increase your software resilience, and stop merging regressions.
👇 Check out the project and let me know what you think:
```shell
pip install pytest-capquery
```