Proposal: store and serve full thumbnail URLs instead of hitting server#1331
Proposal: store and serve full thumbnail URLs instead of hitting server#1331mihow wants to merge 13 commits into
Conversation
✅ Deploy Preview for antenna-preview canceled.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e8ec983 to
704661c
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/models.py`:
- Around line 2299-2302: thumbnail_urls() routes rows with an empty path
(row.path == "") through the cold redirect, but
find_or_generate_thumbnail_for_label() still treats existing rows as valid if
the width matches and thus never regenerates blank-path rows; update
find_or_generate_thumbnail_for_label() to treat rows with an empty/blank path as
missing/invalid so they fall through to regeneration (e.g., bypass the "width
matches" short-circuit when row.path is blank), and apply the same logic in the
corresponding check referenced around the code near lines 2354-2355 so
blank-path rows are rebuilt instead of returned unchanged.
- Around line 1954-1972: _thumbnail_warm_url currently builds URLs by
concatenating _thumbnail_storage_url_base() (default_storage.url("")) with a
quoted key, which can break for backends that normalize/sign names; change
_thumbnail_warm_url to call default_storage.url(name) directly (e.g. pass
path.lstrip("/") to default_storage.url) so the storage backend generates the
correct URL/encoding/normalization; remove reliance on
_thumbnail_storage_url_base in that function (you can keep
_thumbnail_storage_url_base for other callers/tests but do not use it inside
_thumbnail_warm_url).
In `@ami/main/tests.py`:
- Around line 338-339: Replace the naive datetime usage in the test regen
trigger setup: instead of calling datetime.datetime.now() to set
self.first_capture.last_modified, import and use Django's timezone.now() (from
django.utils import timezone) so the timestamp is timezone-aware when
USE_TZ=True; update the assignment in the test where
self.first_capture.last_modified is set and keep the subsequent save() call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 15cdd819-9675-42a9-a29c-f618f0a63d55
📒 Files selected for processing (4)
ami/main/api/serializers.pyami/main/api/views.pyami/main/models.pyami/main/tests.py
| @functools.lru_cache(maxsize=1) | ||
| def _thumbnail_storage_url_base() -> str: | ||
| """Storage URL prefix shared by every thumbnail row. | ||
|
|
||
| ``default_storage.url('')`` is constant for the process lifetime once Django | ||
| settings are loaded, so cache it module-wide. The warm-path URL builder then | ||
| does one string concat per row instead of routing through the storage | ||
| backend's per-call URL machinery (boto3's ``_normalize_name`` / | ||
| ``_clean_name`` + dict lookups on every call). | ||
|
|
||
| Tests that swap ``DEFAULT_FILE_STORAGE`` at runtime should call | ||
| ``_thumbnail_storage_url_base.cache_clear()`` in their setup. | ||
| """ | ||
| return default_storage.url("") | ||
|
|
||
|
|
||
| def _thumbnail_warm_url(path: str) -> str: | ||
| """Direct storage URL for a cached thumbnail blob (warm path).""" | ||
| return _thumbnail_storage_url_base() + urllib.parse.quote(path.lstrip("/"), safe="/") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Storage-related settings ==\n'
rg -n --glob '*.py' --glob '*.env*' --glob '*.yml' --glob '*.yaml' \
'DEFAULT_FILE_STORAGE|STORAGES\s*=|AWS_QUERYSTRING_AUTH|GS_QUERYSTRING_AUTH|AWS_QUERYSTRING_EXPIRE|GS_EXPIRATION|AWS_S3_CUSTOM_DOMAIN|MEDIA_URL'
printf '\n== Custom storage classes / url() overrides ==\n'
rg -n --glob '*.py' 'class .*Storage|def url\(self,\s*name'Repository: RolnickLab/antenna
Length of output: 2337
Avoid building _thumbnail_warm_url() via default_storage.url("") + <quoted key>; rely on storage.url(name) for backend-agnostic correctness.
Even though config/settings/production.py sets AWS_QUERYSTRING_AUTH = False (so the “signed URL” failure mode is unlikely in production), _thumbnail_warm_url() still assumes url("") is always a stable prefix equivalent to url(name) (including normalization/encoding) for every storage backend/environment. Prefer letting the configured storage generate the URL.
Suggested fix
-@functools.lru_cache(maxsize=1)
-def _thumbnail_storage_url_base() -> str:
- """Storage URL prefix shared by every thumbnail row.
-
- ``default_storage.url('')`` is constant for the process lifetime once Django
- settings are loaded, so cache it module-wide. The warm-path URL builder then
- does one string concat per row instead of routing through the storage
- backend's per-call URL machinery (boto3's ``_normalize_name`` /
- ``_clean_name`` + dict lookups on every call).
-
- Tests that swap ``DEFAULT_FILE_STORAGE`` at runtime should call
- ``_thumbnail_storage_url_base.cache_clear()`` in their setup.
- """
- return default_storage.url("")
-
-
def _thumbnail_warm_url(path: str) -> str:
"""Direct storage URL for a cached thumbnail blob (warm path)."""
- return _thumbnail_storage_url_base() + urllib.parse.quote(path.lstrip("/"), safe="/")
+ return default_storage.url(path.lstrip("/"))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ami/main/models.py` around lines 1954 - 1972, _thumbnail_warm_url currently
builds URLs by concatenating _thumbnail_storage_url_base()
(default_storage.url("")) with a quoted key, which can break for backends that
normalize/sign names; change _thumbnail_warm_url to call
default_storage.url(name) directly (e.g. pass path.lstrip("/") to
default_storage.url) so the storage backend generates the correct
URL/encoding/normalization; remove reliance on _thumbnail_storage_url_base in
that function (you can keep _thumbnail_storage_url_base for other callers/tests
but do not use it inside _thumbnail_warm_url).
|
Claude says: Adjacent perf question that came up while landing the progressive-JPEG change in WebPPros
Cons
AVIFPros
Cons
Verdict: AVIF probably isn't worth the encoder-time cost for this PR's use case. Revisit in ~2027 if encoder libraries get faster. Suggested shape: pluggable formatIf we do this, the smallest reasonable surface is two new keys on THUMBNAILS = {
"STORAGE_PREFIX": "thumbnails/",
"FORMAT": "webp", # or "jpeg", default "jpeg" for dev/test, "webp" in production
"QUALITY": 80, # PIL accepts 0-100; WebP sweet spot ~80, JPEG ~82
"METHOD": 4, # WebP-only encode effort 0-6 (default 4); ignored for JPEG
"SIZES": {"small": {"width": 240}, "medium": {"width": 1024}},
}
Open questions worth pinning down before opening the follow-up PR:
Happy to open a separate ticket for this if you'd rather keep #1331 focused on the queryset/model rework + the progressive-JPEG win that's already in. Otherwise I can land the WebP flip as a third commit on this branch. |
loppear
left a comment
There was a problem hiding this comment.
I'm probably good with the intention here to split the cost differently between list & image retrieval - lookup thumbnails during API response by join, and skip serving through django redirect if we've already generated it. A lot of this is incidental to making that tradeoff, however.
| return _thumbnail_storage_url_base() + urllib.parse.quote(path.lstrip("/"), safe="/") | ||
|
|
||
|
|
||
| def _thumbnail_route_url(pk: int, label: str, request: Request | None) -> str: |
There was a problem hiding this comment.
This already exists as ami.base.serializers.reverse_with_params as used in the code this PR refactors?
There was a problem hiding this comment.
Claude says: Adopted in 545873a. Dropped the bespoke route helper, using reverse_with_params directly. Kept the explicit api: namespace so the model method works request-less (mgmt commands, template tags) as well as via DRF context.
|
|
||
|
|
||
| @functools.lru_cache(maxsize=1) | ||
| def _thumbnail_storage_url_base() -> str: |
There was a problem hiding this comment.
I'm not sure any of this obfuscation to avoid calling storage.url is worthwhile?
There was a problem hiding this comment.
Claude says: Fair point — the lru_cache + base-prefix helpers were chasing a speculative boto3 overhead we never measured. Dropped in 545873a; warm path is now just default_storage.url(row.path) inline. The string-concat-vs-storage-backend distinction is below noise on the gallery hot path.
| self.assertGreater(len(captures), 1, "Test fixture must produce >1 capture to exercise N+1.") | ||
|
|
||
| with CaptureQueriesContext(connection) as ctx: | ||
| response = self.client.get(f"/api/v2/captures/?project_id={self.project.pk}") |
There was a problem hiding this comment.
This test couldn't fail before, right? The N queries this is all predicated on is over the retrieval of every image url redirect, not in just the captures list response.
There was a problem hiding this comment.
Claude says: You're right — there was no pre-existing N+1 in the parent branch's captures-list serializer; the N+1 that test guarded was one this PR introduced by reading obj.thumbnails.all() from the warm-row check. Dropped that test and the queryset-method prefetch test in 545873a. Rewrote PR body to lead with what this actually wins (80 HTTP round-trips per gallery page, not a DB query count change). Full reasoning in #1331 (comment).
| out: dict[str, str] = {} | ||
| for label, spec in sizes.items(): | ||
| row = existing_by_label.get(label) | ||
| if row and row.path and row.width == spec["width"]: |
There was a problem hiding this comment.
In order to determine whether to skip the redirect, this check needs to reproduce the find_or_generate_thumbnail_for_label check, looking at last_modified (source_changed).
There was a problem hiding this comment.
Claude says: Correctness fix in 545873a. thumbnail_urls now mirrors the generator's source_changed predicate (row.last_modified < source.last_modified) and falls back to the route URL when the source has been re-uploaded, so the next browser fetch regenerates. Added test_serializer_falls_back_to_route_url_when_source_changed to pin it.
| # HEAD per request is expensive on remote storage, and the cached row's | ||
| # presence is the authoritative warm signal. The narrow case of an orphan | ||
| # row (DB row points at a storage path whose blob was deleted out-of-band) | ||
| # is handled by the orphan-thumbnail reconciler (separate follow-up |
There was a problem hiding this comment.
Probably worth it (I don't think there's a path for orphaning now), but maybe a comment that we're not handling orphans rather than that we're implementing this other thing.
There was a problem hiding this comment.
Claude says: Reframed in 545873a. Comment now says we explicitly do NOT detect or repair the orphan case here — missing blob shows broken <img> until the row is removed and the next request regenerates — instead of claiming there's a reconciler that doesn't exist yet.
|
Claude says: On a re-read prompted by Luke's review, my initial analysis behind this PR was wrong on a fundamental point. Flagging it here before going any further on rework, because it changes the merge calculus. What the PR body claimed
The I framed this as fixing a pre-existing N+1 in the captures-list response. What the code actually doesPre-PR (parent branch, def get_thumbnails(self, obj):
return {
label: reverse_with_params(
"sourceimagethumbnail-detail",
args=(obj.pk,),
request=self.context.get("request"),
params={"label": label},
)
for label in settings.THUMBNAILS["SIZES"]
}No There is no pre-existing N+1 against Luke called this out at #1331 (comment) and was right. What the PR actually winsThe real cost the PR removes is 80 HTTP round-trips per gallery page view through the
Each of the 80 redirect requests goes through DRF auth + That's a real cost. It's an HTTP-level fan-out problem, not a DB-level N+1. Different shape, different remedy space. The PR body framed it as the latter. Correctness regression I missedThe original architecture has the redirect viewset acting as an implicit "is this thumbnail still fresh?" check on every browser request — This PR's warm path skips that check entirely. Luke flagged this at #1331 (comment). Concrete failure mode:
User sees a stale thumbnail indefinitely. Pre-PR, this self-healed on the next page view. Alternative approaches worth weighing before pushing rework
My read is that option 1 (measure first) followed by option 2 if needed is probably the right path, and this PR may not be worth landing. The structural cost (lost source-changed check, frontend coupling to direct S3 URL shape, complexity of warm/cold branching in the serializer) is high for a perf gain we never validated. Up to you on whether to keep iterating here or close in favor of measurement-first. I'd lean toward close; the architecture trade isn't favorable without a real bottleneck measurement. |
…equest HEAD Skip Django entirely for the gallery thumbnail fetch when a cached SourceImageThumbnail row matches the configured size and the source hasn't been re-uploaded. Each capture card previously took two round-trips through SourceImageThumbnailViewSet (DRF auth + find_or_generate + a default_storage.exists() HEAD + 302), with no Cache-Control on the redirect so the browser repeated the fan-out every page view. A 40-card gallery = 80 round-trips dropped. URL build lives on SourceImage.thumbnail_urls(request), prefetch contract on SourceImageQuerySet.with_thumbnails(). Single source of truth callable from any surface, both directly testable. Warm/cold predicate mirrors find_or_generate_thumbnail_for_label including source-changed staleness, so a re-uploaded image self-heals through the cold path. Generator drops default_storage.exists() from the regen guard: HEAD-per-request on remote storage was expensive, the cached row is the warm signal. Missing-blob case is documented but not repaired here (would require a separate orphan reconciler). Also: progressive JPEG + optimize + quality=82 on encode for faster perceived load on the medium thumbnails. Co-Authored-By: Claude <noreply@anthropic.com>
46d5fe5 to
545873a
Compare
PIL.Image.thumbnail() preserves aspect ratio by shrinking, so the output width is often a pixel or two below the configured spec (e.g. spec=240 → actual=239). The pre-existing strict ==/!= predicate in the generator meant every cold hit on such rows regenerated forever, and the warm-path emitter on PR #1331 would never fire for them. Surfaced by arctia pass-2 measurement on UKCEH project 1 / event 823: all small thumbs cached at width=239 → cold-emit for every card. Tolerance of 2px applied identically in both find_or_generate's regen gate and thumbnail_urls' warm check. Co-Authored-By: Claude <noreply@anthropic.com>
|
Claude says: Arctia two-pass measurement landed. Revising my own framing — earlier write-up overclaimed. Important caveat about arctia (didn't catch this initially): arctia uses So the on-arctia win isn't "skip Django for warm thumbnails" — it's skip the redirect viewset's DRF auth + Where the "Django bypassed" framing does hold is on prod-like deployments using S3 storage with public bucket URLs: Numbers (with corrected framing)Test data: UKCEH project 1, event 823, 40 captures × 2 labels pre-warmed.
On the wall-clock 5× numberThe 5× is a single concurrency point, not a fixed ratio. My test fanned out 80 parallel Bug surfaced during measurement → fix
|
| row is not None | ||
| and self.last_modified is not None | ||
| and row.last_modified is not None | ||
| and row.last_modified < self.last_modified |
There was a problem hiding this comment.
Can source_changed be an annotation in the queryset method instead of looping through the thumbnails here? If not, rename row/rows to thumb/thumbs for clarity.
There was a problem hiding this comment.
Claude says: Done in 1b54cc8a. Annotation route worked cleanly.
SourceImageQuerySet.with_thumbnails() now builds the prefetch queryset and annotates is_source_changed per thumbnail row:
thumbnail_model = apps.get_model("main", "SourceImageThumbnail")
thumbnails_qs = thumbnail_model.objects.annotate(
is_source_changed=models.Case(
models.When(
last_modified__lt=models.F("source_image__last_modified"),
then=models.Value(True),
),
default=models.Value(False),
output_field=models.BooleanField(),
)
)
return self.prefetch_related(models.Prefetch("thumbnails", queryset=thumbnails_qs))The reverse FK source_image__last_modified is reachable from the prefetched queryset, so no OuterRef/Subquery was needed. NULL < x is NULL in SQL, which the CASE resolves as the False default — same behaviour the inline Python check had (it returned False when either side was None).
thumbnail_urls() is now pure URL-formatting: reads thumb.is_source_changed straight off the prefetched row. No more staleness loop. Renamed row/rows → thumb/thumbs while I was in there.
Contract is now strict: callers MUST go through with_thumbnails() — without it, thumbnail_urls() raises AttributeError on thumb.is_source_changed. Production callers (the serializer) already do; the direct-instance test was updated to re-fetch through with_thumbnails().
| ).order_by("timestamp") | ||
| queryset = ( | ||
| queryset.select_related("event", "deployment", "deployment__data_source") | ||
| # Required by SourceImage.thumbnail_urls — drops N+1 on SourceImageThumbnailSerializer. |
There was a problem hiding this comment.
Are they actually N+1 queries or N+1 requests? N+1 is a misleading statement that seems to be persisting.
There was a problem hiding this comment.
Claude says: Scrubbed in 1b54cc8a. Inline views.py comments + serializers.py docstring + models.py with_thumbnails/thumbnail_urls/get_was_processed docstrings all now say "each row triggers a separate SELECT on main_sourceimagethumbnail" (or "fires its own DB query" for the get_was_processed case). N+1 phrasing dropped.
| ) | ||
| if not thumb or thumb.width != size["width"] or source_changed or not default_storage.exists(thumb.path): | ||
| # No ``default_storage.exists(thumb.path)`` here: HEAD-per-request on remote storage | ||
| # is expensive, and the cached row is the warm signal. We do NOT currently detect or |
There was a problem hiding this comment.
Remove reactive/corrective language, the all caps "NOT"
There was a problem hiding this comment.
Claude says: Rewritten in 1b54cc8a. New text:
# Cached row is the warm signal — no HEAD against storage on the regen check.
# Orphan rows (DB row whose blob was deleted out of band) show a broken ``<img>``
# until something removes the row and the next request regenerates.
No "NOT", flat behaviour description.
| # dimensions down by a pixel or two vs the spec (e.g. spec=240 → actual | ||
| # 239). Use a small tolerance so existing rows aren't treated as stale | ||
| # by every warm-check; the same tolerance is applied in the regen gate. | ||
| width_match = row is not None and abs(row.width - spec["width"]) <= _THUMBNAIL_WIDTH_TOLERANCE |
There was a problem hiding this comment.
Discuss some alternatives to this approach of width matching with tolerance.
There was a problem hiding this comment.
Claude says: Five options to weigh; current tolerance is the cheapest interim.
a. Store spec width, not PIL's output. Generator writes row.width = spec["width"] instead of img.size[0]. row.width == spec["width"] then works without tolerance. Trade: row.width becomes "what we asked PIL to produce" rather than "what PIL actually produced." Anything that reads row.width for layout (e.g. a hypothetical <img width=…> server-side) would be off by 1–2px.
b. Store the spec key as a denorm field. Add generated_for_spec (CharField, e.g. "small@240") or full spec-dict on SourceImageThumbnail at generation time. Predicate becomes row.generated_for_spec == settings.THUMBNAILS["SIZES"][label].canonical_key. Migration cost; simplest semantic; future-proof if THUMBNAILS["SIZES"] ever grows fields beyond width/height.
c. Generation-version field. THUMBNAILS["VERSION"] bump in settings, stored on row, regen when row.version != current. Decouples regen from any per-row attribute. Adds a versioning burden — bump-on-edit discipline lives in settings, easy to forget.
d. One-direction predicate (row.width >= spec["width"]). PIL only shrinks, so anything ≥ spec is fresh-enough by definition. No tolerance constant. Doesn't handle the case where spec shrinks to a smaller size — e.g. old row at 1024, new spec 800, would falsely be "fresh" and visually too big.
e. Current tolerance (abs(row.width - spec["width"]) <= 2). No schema change, single module constant, symmetric in both directions. Arbitrary 2-px threshold, opaque without context.
My read: (a) is cleanest semantically and matches how the row should be labelled ("this thumbnail represents what we wanted to generate, not what PIL emitted bit-for-bit"). It's a one-line generator change + no migration. (b) is the right long-term answer if THUMBNAILS["SIZES"] ever grows beyond width/height. The tolerance is fine as an interim.
If you want me to swap to (a) in this PR I'm happy to — it's a small follow-up commit. Or land the current tolerance as the unblocking fix and queue (a)/(b) as a follow-up issue. Your call.
There was a problem hiding this comment.
@loppear what do you think about options a) or d) here?
There was a problem hiding this comment.
Claude says: Went with option (a). The generator now records the requested spec width on the row (parent branch commit 012efbc, with a regression test that simulates Pillow's rounding), and the 2px tolerance is gone from both the warm-check and the regen gate (56bc46a on this branch — strict equality everywhere). Legacy rows that stored encoder output read as stale, fall back to the route URL, and self-heal with a single regeneration. Option (b) (a generated_for_spec field) remains the long-term answer if thumbnail specs ever grow beyond width; queued as a follow-up rather than another migration in this pair of PRs.
|
@loppear thanks for reviewing this one. the agent went a bit overboard on my requests. But there are a few changes I like. I think we can consider these all optimizations and merge your main thumbs PR soon. |
…to feat/thumbnails-serializer-direct-urls
…-feedback rewrites * SourceImageQuerySet.with_thumbnails() now annotates is_source_changed onto the prefetched thumbnail rows in SQL (CASE / WHEN last_modified < source's), so SourceImage.thumbnail_urls() is pure URL formatting — no Python staleness loop. Per @mihow review note on #1331. * thumbnail_urls() rename row/rows -> thumb/thumbs for consistency with the rest of the codebase. * Drop 'N+1' phrasing from views.py, serializers.py, and models.py comments and docstrings — replaced with 'each row triggers a separate SELECT' / 'each row fires its own DB query' which is the actual mechanism. * Drop 'NOT' caps + reactive framing from the HEAD-removal comment in find_or_generate_thumbnail_for_label; rewrite as flat behaviour description. * Update test_thumbnail_urls_model_method_emits_warm_and_route to re-fetch through with_thumbnails() so the strict annotation contract holds. 24/24 ami.main.tests.TestImageThumbnailViews pass. Co-Authored-By: Claude <noreply@anthropic.com>
…to feat/thumbnails-serializer-direct-urls
… main merge) into feat/thumbnails-serializer-direct-urls
…a fix) into feat/thumbnails-serializer-direct-urls
…EAD, progressive JPEG, redirect caching, blank-path guard) into feat/thumbnails-serializer-direct-urls Co-Authored-By: Claude <noreply@anthropic.com> # Conflicts: # ami/main/models.py # ami/main/tests.py
…ec width The generator records the requested spec width on the row (see parent branch), so the warm-check can use strict equality instead of the 2px tolerance that compensated for PIL's aspect-ratio rounding. Legacy rows that stored encoder output read as stale, fall back to the route URL, and self-heal with a single regeneration. Co-Authored-By: Claude <noreply@anthropic.com>
The test pinned a Pillow encoder flag rather than behavior we own; it served its purpose during development. Co-Authored-By: Claude <noreply@anthropic.com>
…thumbnails-serializer-direct-urls Co-Authored-By: Claude <noreply@anthropic.com>
Same pass as the parent branch: keep the contract statements (prefetch required, annotation required, strict width semantics) and drop the review-era justification prose. Co-Authored-By: Claude <noreply@anthropic.com>
…to feat/thumbnails-serializer-direct-urls Co-Authored-By: Claude <noreply@anthropic.com>
Stacked on #1306. Base branch:
feat/thumbnail-source-images. Several pieces of the original proposal were promoted to the parent PR on 2026-06-10 so it deploys safely on its own; what remains here is only the warm/cold URL classification (+223/−20 over four files).Summary
If a thumbnail has already been generated, and the source image hasn't changed, list responses return the direct file-storage URL instead of the Django redirect route. Cold or stale thumbnails keep the existing redirect route, which generates them on first request.
Here you can see the small thumbnail has already been fetched, but the medium one has not.

Two separate costs — what this fixes and what it must not introduce
Earlier drafts of this PR (and some review discussion) conflated these, so spelling them out:
Browser HTTP round-trips — what this PR removes. With the parent alone, every thumbnail
<img>is a request throughSourceImageThumbnailViewSet: DRF auth,get_object(), a thumbnail-row check, then a 302 to storage. This is not a database N+1 — it is per-image HTTP fan-out, paid by every client. A 40-card gallery page is 80 of those requests. The parent has since reduced the per-request cost (no storage existence check, and the 302 is now browser-cached for five minutes), but each client still pays the full fan-out on first view and again whenever the cache expires. This PR removes it for warm rows: a measurement on a dev deployment showed Django requests per warm gallery view going from 80 to 0.Database N+1 — what this PR must avoid introducing. The parent's serializer never touches the thumbnail table: it builds route URLs from
obj.pkplus settings, so there is no pre-existing N+1 to fix (an earlier draft wrongly claimed there was — see this comment and Luke's review thread). This PR makes the serializer readobj.thumbnails.all()per row to decide warm vs cold, which would be a new N+1 on its own.SourceImageQuerySet.with_thumbnails()prefetches it as one extra query per page, and the tests pin that contract.In short: the win is fewer HTTP requests to Django; the prefetch is self-defense against a regression this PR would otherwise create, not a query-count improvement over the parent.
Why generation stays lazy (and why the cached row matters)
Captures have a sparse access pattern: a project can hold on the order of a million captures, but users review occurrence crops — captures themselves are opened only when troubleshooting or checking the context around a detection, so only a few hundred per project are ever viewed. That rules out pre-warming thumbnails at upload/sync time (it would spend storage and compute on images nobody opens) and with it the simpler "deterministic URL, no lookup" design, which only works when the blob is guaranteed to exist.
With generation lazy, something must answer "has this thumbnail been generated, at the current spec, and is it still fresh?" per capture and label — that is exactly what the
SourceImageThumbnailrow records, and why this PR reads it (via the prefetch below) instead of constructing URLs blindly.What changes
ami/main/models.pySourceImageQuerySet.with_thumbnails()— prefetchesthumbnailswith anis_source_changedannotation (thumbnaillast_modifiedpredates the source's), pushing the staleness predicate into the prefetch SQL so the URL builder stays pure formatting.SourceImage.thumbnail_urls(request)— per-label{label: url}builder, the single source of truth for the warm/cold decision. Warm = a cached row exists with a non-blank path, a width equal to the requested spec width, and no source change →default_storage.url(row.path). Anything else → route URL viareverse_with_params, exactly as the parent serializer built it. The width check is strict equality because the generator (since the spec-width fix in Thumbnails API for SourceImages #1306) records the requested width on the row; legacy rows that stored the encoder's output read as stale here and self-heal through one regeneration.ami/main/api/serializers.pySourceImageThumbnailSerializer.get_thumbnailsbecomes a one-line passthrough:ami/main/api/views.pySourceImageViewSet.get_queryset→.with_thumbnails().EventViewSet.get_queryset's innerexample_capturesPrefetch → same.List of changes
default_storage.url(row.path)when the cached row matches the configured width and isn't source-changed<img src=>); no FE change.SourceImageThumbnailViewSet+ concurrency-safe upsertis_source_changedannotation compares row vs sourcelast_modified; stale rows fall back to the route URLwith_thumbnails()prefetch is the contract; viewsets wire it; tests assert ittest_thumbnail_urls_model_method_emits_warm_and_routeexercisesSourceImage.thumbnail_urlsdirectlyContract change
thumbnails.{small,medium}URL shape changes for cached rows fromhttp://<host>/api/v2/captures/thumbnails/<id>/?label=smallto e.g.https://<bucket>.s3.amazonaws.com/thumbnails/capture_<id>/small.jpg.Frontend treats the value as opaque (see
ui/src/data-services/models/capture.ts:170-182,ui/src/pages/captures/capture-columns.tsx). No frontend change required.On deployments using
FileSystemStorage(e.g. local dev),default_storage.url()returns a relative/media/...URL that Django still serves — the "bypass Django" win applies fully only where storage URLs are external (S3/MinIO). Compose with a CDN later viaAWS_S3_CUSTOM_DOMAINand these URLs front-load automatically.Promoted to #1306 (no longer in this diff)
These shipped on the parent so it is solo-deployable: removal of the per-request
default_storage.exists()check, progressive JPEG encoding,Cache-Control: private, max-age=300on the redirect, the blank-path regen guard, and the spec-width fix (replacing the 2px width tolerance an earlier revision of this branch carried).Luke's review threads addressed
reverse_with_paramsalready exists — adopted.lru_cachehelper is obfuscation — dropped; warm path just callsdefault_storage.url(row.path).is_source_changedannotation and warm-path check.What this does NOT change
SourceImageThumbnailViewSet) and its concurrency-safe upsert path are unchanged.Known follow-ups
<img>until the row is removed (consequence of theexists()removal, which lives in Thumbnails API for SourceImages #1306). Planned as a periodic integrity-check task; separate ticket.Event.first_capture()atami/main/models.py:1277is a per-row method that runs a fresh query; pre-existing, unchanged.Verification
<img>requests go direct to storage; zero requests to/api/v2/captures/thumbnails/....🤖 Generated with Claude Code