Show stats panel in occurrence list sidebar by mihow · Pull Request #1308 · RolnickLab/antenna

mihow · 2026-05-15T23:34:42Z

Summary

Frontend consumer for the /occurrences/stats/model-agreement/ endpoint added in #1307. Adds a Stats panel at the top of the occurrence list sidebar, above the filter sections.

- New OccurrenceStats component (ui/src/pages/occurrences/occurrence-stats.tsx)
- Wired into occurrences.tsx, threading the same active filter array the list view sends to useOccurrences, so the stats always match the current result set (taxon, deployment, date, verification status, default filters, etc.)
- Bars rendered, top to bottom:
  - Verified occurrences: verified_pct with the raw verified_count alongside (e.g. 0% (121)), so a small-but-nonzero set that rounds to 0% still surfaces the count.
  - Agreement (exact taxon): agreed_exact_pct, with agreed_exact_count / verified_with_prediction_count and inline Wilson 95% CI.
  - Agreement (any rank): agreed_any_rank_pct, same shape (exact matches plus disagreements whose LCA is at any real taxonomic rank).
  - Agreement (≥ <RANK>): agreed_coarser_rank_pct, only rendered when the caller passes ?agreement_coarsest_rank=<RANK> and the backend echoes it. No CI in the BE response yet, so the bar shows just the point estimate.
  - Cohen's κ (beyond chance): signed [-1, 1] bar, zero centred.
- Loading skeleton; renders nothing on error so it never blocks the list.

Stacked on the backend branch — base is feat/human-model-agreement-endpoint (#1307), not main. Rebase/retarget to main once #1307 merges.

Wilson CI rendered inline (not on its own row)

The Wilson 95% CI is folded into each agreement bar instead of sitting on a separate row. The bar is a single 0–100% track with:

- a translucent CI band (bg-primary/40) from low to high
- 2px-wide CI bound caps (whiskers) at the low / high edges
- a 3px-wide dark vertical marker at the point estimate, slightly taller than the track

This puts the uncertainty visually adjacent to the number it qualifies — the bar is the CI, the marker is the point estimate — so a wide band immediately reads as "shaky number" and a tight band as "confident", without the reader having to cross-reference a separate row.

Filter parity

The panel reuses the list view's filters array verbatim and converts it to query params with the same active/error rules as getFetchUrl (value?.length && !error). The endpoint accepts the full occurrence-list filter set (#1307), so the numbers stay consistent with the visible results.
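
The active/error rule can be illustrated outside the UI code. The following is a Python transliteration for illustration only: the real logic lives in the TypeScript getFetchUrl helper, and the `Filter` shape, its field names, and `active_query_string` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlencode


@dataclass
class Filter:
    """Hypothetical filter record; the field names are assumptions."""
    field: str
    value: Optional[str] = None
    error: Optional[str] = None


def active_query_string(filters: List[Filter]) -> str:
    """Keep only filters that have a non-empty value and no validation error."""
    pairs = [(f.field, f.value) for f in filters if f.value and not f.error]
    return urlencode(pairs)


filters = [
    Filter("deployment", "12"),
    Filter("taxon", ""),  # empty value: skipped
    Filter("date_start", "2026-13-01", error="invalid date"),  # errored: skipped
    Filter("apply_defaults", "false"),
]
print(active_query_string(filters))
```

Because the list request and the stats request are built from the same filtered pairs, a filter that is dropped from one is dropped from the other.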

Test plan

tsc --noEmit — no errors in touched files
eslint + prettier clean on new/modified files
Live browser render verified against project 18 (Vermont Atlas of Life) via the worktree dev server proxied to the #1307 backend. All four core bars render: VERIFIED OCCURRENCES 0% (121), AGREEMENT (EXACT TAXON) 90% (90 of 100) with 95% CI 83–94%, AGREEMENT (ANY RANK) 94% (94 of 100) with 95% CI 88–97%, COHEN'S κ 0.84. The coarser-rank bar is hidden when ?agreement_coarsest_rank is not supplied, as verified by direct API call.
Live filter reactivity verified: toggling Default Filters off set ?apply_defaults=false and the Stats panel re-queried with the same param. Same filter array drives both list and stats.

Toolchain note for reviewers

The worktree ui/ has no node_modules. Installing under the host's Node 22 breaks the dev server (nova-ui-kit dereferences a React-18 internal removed in React 19 at tailwind-config eval). Use the repo-pinned Node 18 (.nvmrc → 18.12.0): nvm use 18.12.0 && yarn install && yarn start. Under Node 18 it boots cleanly.

Design notes

The "agreement rate" is the share of human-verified occurrences where the human pick matched the model's pick. Three calibration ideas are baked into this panel:

Raw counts beside the percentage — Verified occurrences shows 0% (121), making the rounded-to-zero percentage readable as "121 of ~24k" rather than literally zero. Each agreement bar also shows K of N so the reader instantly sees how many verifications the rate is built on.

Hard cutoff vs. confidence interval — rather than a yes/no "enough data" line, the Wilson 95% CI shows how shaky the number is. A Wilson score interval behaves well at small samples, so when few occurrences are verified the band is wide and as more get verified it tightens. This is more honest than picking a magic threshold like "30 verifications" (which is a textbook rule of thumb that only holds if verifications are a random sample — they aren't, people verify the unusual / uncertain / eye-catching ones first).
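
To make the "wide at small n, tight at large n" behaviour concrete, here is a minimal sketch of a Wilson score interval. It uses the standard textbook formula with z = 1.96; the function signature and the handling of a zero total are assumptions, not necessarily what the backend helper does.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if total <= 0:
        # Assumption for this sketch: an empty sample is an error.
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Same 90% point estimate, three sample sizes: the band narrows as n grows.
for k, n in [(9, 10), (90, 100), (900, 1000)]:
    low, high = wilson_interval(k, n)
    print(f"{k}/{n}: {low:.0%} to {high:.0%}")
```

For 90 of 100 the interval comes out at roughly 83% to 94%, in line with the figures quoted in the test plan.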

Plain agreement vs. agreement beyond chance — plain agreement % has a blind spot: if 95% of moths in a project are one common species, human and model "agree" most of the time just by both guessing the common one — that's luck, not skill. Cohen's κ subtracts that expected-by-chance agreement; κ of 1.0 = perfect, 0 = no better than guessing, negative = worse than chance. Same caveat as the CI: it still only describes the occurrences people chose to verify, not the whole project.
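
The "luck, not skill" case can be worked through with a small sketch of Cohen's κ. This is the standard two-rater formula, κ = (p_o − p_e) / (1 − p_e); the function signature and the choice to return None for undefined cases are assumptions for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> Optional[float]:
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same items")
    n = len(rater_a)
    if n == 0:
        return None  # nothing to compare
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater picked labels independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return None  # both raters used a single identical label; kappa is undefined
    return (observed - expected) / (1 - expected)


# A model that always answers "common" agrees with the human 95% of the time,
# yet shows no skill beyond chance.
human = ["common"] * 95 + ["rare"] * 5
model = ["common"] * 100
print(cohens_kappa(human, model))
```

Here plain agreement is 95% while κ is 0, which is the blind spot the panel's κ bar is meant to expose.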

…ry params

- Rename `agreed_under_order_*` → `agreed_any_rank_*` to match the endpoint's dropped ORDER threshold (0565f06).
- Add optional `agreement_coarsest_rank` + `agreed_coarser_rank_*` fields to the response type (not consumed yet; UI follows in #1308).
- Widen `filters` to accept arrays and append repeated query params so multi-value filters (e.g. `algorithm`, `not_algorithm`, which the backend reads via `request.query_params.getlist(...)`) survive.

Per CodeRabbit review.

Co-Authored-By: Claude <noreply@anthropic.com>

useModelAgreement.ts belongs with the frontend consumer (#1308), not the backend endpoint PR. Keeps #1307 backend-only. Co-Authored-By: Claude <noreply@anthropic.com>

Typed React Query wrapper for /occurrences/stats/model-agreement/. Owned by this UI PR (#1308); the backend PR (#1307) is now backend-only. Co-Authored-By: Claude <noreply@anthropic.com>

Pure-Python LCA over (taxon_id, rank, parents_json) tuples. Returns the deepest shared TaxonRank or None. Used by the upcoming human-model-agreement stat to bucket agreement at-or-finer-than ORDER. Plan: docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md Side-research: docs/claude/planning/occurrence-filter-driven-exports.md Co-Authored-By: Claude <noreply@anthropic.com>
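
A minimal sketch of a lowest-common-ancestor lookup over such tuples follows. The four-member `TaxonRank` ladder and the shape of the parents list (pairs of id and rank) are assumptions for illustration; the project's enum and `parents_json` layout may differ.

```python
from enum import IntEnum
from typing import List, Optional, Tuple


class TaxonRank(IntEnum):
    """Hypothetical rank ladder, coarse to fine; the real enum has more members."""
    ORDER = 1
    FAMILY = 2
    GENUS = 3
    SPECIES = 4


# Assumed shape: (taxon_id, rank, parents), with parents as (id, rank) pairs.
Taxon = Tuple[int, TaxonRank, List[Tuple[int, TaxonRank]]]


def lca_rank_between(a: Taxon, b: Taxon) -> Optional[TaxonRank]:
    """Return the rank of the deepest taxon present in both lineages, or None."""
    lineage_a = dict(a[2])
    lineage_a[a[0]] = a[1]  # a taxon is part of its own lineage
    lineage_b = dict(b[2])
    lineage_b[b[0]] = b[1]
    shared = [rank for taxon_id, rank in lineage_a.items() if taxon_id in lineage_b]
    return max(shared) if shared else None


parents = [(1, TaxonRank.ORDER), (2, TaxonRank.FAMILY), (3, TaxonRank.GENUS)]
atalanta = (4, TaxonRank.SPECIES, parents)
cardui = (5, TaxonRank.SPECIES, parents)
beetle = (9, TaxonRank.SPECIES, [(8, TaxonRank.ORDER)])

print(lca_rank_between(atalanta, cardui))    # sister species share a genus
print(lca_rank_between(atalanta, atalanta))  # identical taxa share the species itself
print(lca_rank_between(atalanta, beetle))    # no shared ancestor in these lineages
```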

… queryset Pure aggregation; caller wires apply_default_filters + OccurrenceFilter. Annotates best machine prediction, prefetches non-withdrawn identifications, batches Taxon fetch for parents_json, buckets exact / under-order / above-order. Co-Authored-By: Claude <noreply@anthropic.com>

Adds HumanModelAgreementSerializer and the human_model_agreement action on OccurrenceStatsViewSet. Extracts OccurrenceViewSet's filter backends + filterset_fields into a module-level tuple so OccurrenceStatsViewSet can reuse the same OccurrenceFilter pass-through (deployment, event, taxa lists, verified, score thresholds, apply_defaults=false, etc). The top_identifiers action keeps its current behavior — filter_queryset is only invoked by actions that opt in. Co-Authored-By: Claude <noreply@anthropic.com>

Adds 6 HTTP-level tests: missing project_id 400, draft 404, empty zeros, happy-path exact match, deployment filter pass-through, apply_defaults=false score-threshold bypass. Also adds DjangoFilterBackend to OccurrenceStatsViewSet.filter_backends so filterset_fields (event, deployment, determination__rank, ...) actually take effect. Without DjangoFilterBackend, filterset_fields are silently ignored and ?deployment=N returns the unfiltered set. Co-Authored-By: Claude <noreply@anthropic.com>

Mirrors useTopIdentifiers's useAuthorizedQuery pattern. Accepts an arbitrary filter map so the occurrence list page can thread its filter state through unchanged (deployment, event, taxon, score thresholds, apply_defaults). Co-Authored-By: Claude <noreply@anthropic.com>

… review fixes Captures: review findings from Copilot + CodeRabbit, perf bench evidence (43k rows → 159s timeout on apply_defaults=false), and the planned changes for the next session (rename to model-agreement, push aggregation into SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap + test gaps). Co-Authored-By: Claude <noreply@anthropic.com>

…ion to SQL

Addresses review feedback on PR #1307:

Rename (drop "human"):
- URL: /occurrences/stats/human-model-agreement/ -> /model-agreement/
- Function: human_model_agreement_for_project -> model_agreement_for_project
- Serializer: HumanModelAgreementSerializer -> ModelAgreementSerializer
- Viewset action + url_path: human_model_agreement -> model_agreement
- FE hook: useHumanModelAgreement -> useModelAgreement (file + symbol)
- FE type: Response -> ModelAgreementResponse (fixes DOM Response shadow)
- Test class: TestHumanModelAgreementForProject -> TestModelAgreementForProject

SQL push-down (Copilot+CodeRabbit perf flag):
- Replace list(qs) full-row materialization with annotated aggregate().
- Annotate best_user_taxon_id via Subquery over Identification (BEST_IDENTIFICATION_ORDER). Drop the prefetch + select_related("taxon") on identifications since only taxon_id is read.
- aggregate() Count(filter=Q(...)) for total/verified/exact/no-prediction.
- For under-order disagreement: group disagreement set by distinct (user_taxon, machine_taxon) pair before LCA. Each pair's LCA runs once.
- Bench against project 18 (43,149 occurrences): pre-rework apply_defaults=false curl timed out at 159s; post-rework 1.96s unfiltered / 3.4s with bypass (93,019 occurrences post-filter).

Denominator fix (Copilot):
- agreed_*_pct now divides by verified_with_prediction_count instead of verified_count. A verified occurrence with no machine prediction can't agree or disagree; including it in the denominator drags the rate down without representing actual model disagreement.
- Surface no_prediction_count + verified_with_prediction_count as sibling fields so consumers can see how many such occurrences exist.

UNKNOWN rank bug (Copilot):
- TaxonRank.UNKNOWN sorts after SPECIES in OrderedEnum definition order, so without explicit exclusion UNKNOWN >= ORDER is True and a shared UNKNOWN ancestor would wrongly count as under-order agreement. Filter UNKNOWN out of lca_rank_between's candidate ranks. Add regression test.

Tests:
- New: test_unknown_rank_excluded_from_lca (LCA regression)
- New: test_agreement_under_order_bucket (HTTP coverage for sister-species case, previously only exact-match shortcut was exercised)
- Updated: happy-path asserts verified_with_prediction_count and no_prediction_count.

22/22 backend tests green: docker compose exec django python manage.py test ami.main.tests.TestLcaRankBetween ami.main.tests.TestModelAgreementForProject ami.main.tests.TestOccurrenceStatsViewSet

Co-Authored-By: Claude <noreply@anthropic.com>
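
The UNKNOWN ordering pitfall is easy to reproduce in isolation. The sketch below uses a minimal stand-in `OrderedEnum` that compares members by definition order and a shortened `TaxonRank`; both are assumptions for illustration rather than the project's classes, and `deepest_real_rank` is a hypothetical helper name.

```python
from enum import Enum


class OrderedEnum(Enum):
    """Minimal stand-in that compares members by definition order."""

    def _position(self) -> int:
        return list(type(self)).index(self)

    def __ge__(self, other):
        if type(self) is type(other):
            return self._position() >= other._position()
        return NotImplemented

    def __gt__(self, other):
        if type(self) is type(other):
            return self._position() > other._position()
        return NotImplemented


class TaxonRank(OrderedEnum):
    ORDER = "ORDER"
    FAMILY = "FAMILY"
    GENUS = "GENUS"
    SPECIES = "SPECIES"
    UNKNOWN = "UNKNOWN"  # defined last, so it compares as finer than SPECIES


def deepest_real_rank(shared_ranks):
    """Pick the finest shared rank after dropping UNKNOWN from the candidates."""
    candidates = [rank for rank in shared_ranks if rank is not TaxonRank.UNKNOWN]
    return max(candidates) if candidates else None


# The pitfall: UNKNOWN passes a ">= ORDER" gate purely because of where it is defined.
print(TaxonRank.UNKNOWN >= TaxonRank.ORDER)
# With the exclusion, a shared UNKNOWN ancestor no longer wins.
print(deepest_real_rank([TaxonRank.ORDER, TaxonRank.UNKNOWN]))
```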

Replace the .aggregate() over the full filtered queryset with a two-step approach:

1. SQL Count('pk') for total_occurrences (no joins, no subqueries).
2. Fetch the verified set (occurrences with at least one non-withdrawn ident) with both best_user_taxon_id and best_machine_prediction_taxon_id annotated, then bucket counts + LCA in Python.

Why: the previous version evaluated two correlated subqueries (best user identification + best machine prediction) on every row of the filtered queryset. For typical projects, >95% of occurrences have no identification; those rows ran the user-ident subquery only to discover NULL, then ran the (much more expensive) machine-prediction subquery on detections that won't contribute to any agreement bucket. Scoping the subqueries to the verified set avoids that waste.

Bench (cold, cache invalidated):

| Project | Total | Verified | Pre | Post |
| --- | --- | --- | --- | --- |
| P#85 SEC-SEQ | 36,253 | 13,140 | - | 1.18s |
| P#20 BCI | 40,958 | 1,351 | - | 0.92s |
| P#84 Pennsylvania | 18,407 | 251 | - | 0.56s |
| P#24 Atlantic Forestry | 2,797 | 274 | - | 0.50s |
| P#18 Vermont | 43,149 | 45 | ~928ms | 0.35s |
| P#23 Insectarium Montreal | 20,393 | 74 | - | 0.43s |

Warm via django-cachalot: 122–343ms across all projects.

For P#85 (highest absolute identification count in the system), the cost is dominated by apply_default_filters' score-threshold join, not the subqueries. apply_defaults=false actually runs faster (0.69s cold, 179,466 total / 13,140 verified) because the classification join is skipped.

Co-Authored-By: Claude <noreply@anthropic.com>

… param

Replaces hardcoded `lca >= TaxonRank.ORDER` agreement gate with two layers:

- Always returned: `agreed_any_rank_*`, i.e. exact matches plus any non-null LCA at a real rank (UNKNOWN excluded). The upstream filter (e.g. a Lepidoptera include list) is what bounds the meaningful scope, not a hardcoded threshold in this function.
- Optional `?agreement_coarsest_rank=FAMILY`: when supplied, response also includes `agreed_coarser_rank_*` (exact + LCAs at or below the threshold). The applied rank is echoed in `agreement_coarsest_rank`; null when absent.

Also addresses CodeRabbit feedback on the existing branch:

- Dedupe base queryset before counting (joins from default-filter chain can inflate Occurrence rows).
- Bound `*_pct` FloatFields to [0.0, 1.0] in the serializer.

Param validation: invalid rank → 400; UNKNOWN rejected as not meaningful. Tests cover any-rank fallback, threshold filtering, invalid + UNKNOWN rejection, and threshold echo.

Co-Authored-By: Claude <noreply@anthropic.com>
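
The two layers can be sketched as a counting function over per-occurrence LCA results. The `TaxonRank` ladder, the `agreement_counts` name, and the `_count` suffixes on the any-rank and coarser-rank keys are assumptions for illustration; "at or below the threshold" is read here as the threshold rank or any finer rank.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class TaxonRank(IntEnum):
    """Hypothetical rank ladder, coarse to fine."""
    ORDER = 1
    FAMILY = 2
    GENUS = 3
    SPECIES = 4


def agreement_counts(
    exact: int,
    disagreement_lcas: List[Optional[TaxonRank]],
    coarsest: Optional[TaxonRank] = None,
) -> Dict[str, object]:
    """Bucket exact matches and disagreement LCAs into the two agreement layers."""
    result: Dict[str, object] = {
        "agreed_exact_count": exact,
        # Any non-null LCA counts, whatever its rank.
        "agreed_any_rank_count": exact + sum(lca is not None for lca in disagreement_lcas),
        "agreement_coarsest_rank": None,
    }
    if coarsest is not None:
        result["agreement_coarsest_rank"] = coarsest.name
        # Only LCAs at the threshold rank or finer count toward this layer.
        result["agreed_coarser_rank_count"] = exact + sum(
            lca is not None and lca >= coarsest for lca in disagreement_lcas
        )
    return result


lcas = [TaxonRank.GENUS, TaxonRank.GENUS, TaxonRank.FAMILY, TaxonRank.ORDER] + [None] * 6
print(agreement_counts(90, lcas))
print(agreement_counts(90, lcas, coarsest=TaxonRank.FAMILY))
```

In this made-up sample the ORDER-level LCA counts toward the any-rank layer but not toward the FAMILY threshold.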

Session-scratchpad doc — belongs in local notes, not the merged branch. Co-Authored-By: Claude <noreply@anthropic.com>

- 2026-05-14-human-model-agreement-endpoint.md: design narrative; superseded by code + PR description.
- occurrence-filter-driven-exports.md: side-research stub Copilot flagged as out-of-scope. Promoted to a PR-description follow-up item.

Co-Authored-By: Claude <noreply@anthropic.com>

create_detections assigns the classification taxon via .order_by("?"), so the previous test picked a random machine taxon and then required a sister species under the same genus. Random non-species picks (ORDER / FAMILY / GENUS) have no sister, flaking ~50% of runs. Pin both the machine prediction and the human ID to two fixed Vanessa species, so the LCA is always GENUS (any-rank bucket, not exact) and the test is deterministic. Co-Authored-By: Claude <noreply@anthropic.com>

feat(occurrence-stats): add Wilson CI + Cohen's kappa to model-agreement

Both derive from the verified_rows already in memory, with no extra query.

- wilson_interval(): 95% Wilson score CI on agreed_exact_pct and agreed_any_rank_pct (agreed_*_ci_low / _ci_high). Wilson stays inside [0,1] and is honest at the small n typical of verified sets, where the normal approximation breaks down.
- cohens_kappa(): exact-taxon agreement beyond chance (cohens_kappa field, range [-1, 1]). Null when no doubly-classified occurrences or expected agreement is 1.0. Discounts the agreement you'd get for free in a project dominated by one common species.

Adds 5 nullable response fields. Backwards-compatible (additive only). 9 pure-Python unit tests + 2 HTTP field-presence tests.

Co-Authored-By: Claude <noreply@anthropic.com>

refactor(stats): move wilson_interval + cohens_kappa to ami/utils/stats

Both are generic statistical helpers; they don't depend on Django or any domain model. Lifts them out of ami/main/models_future/occurrence.py so other endpoints/jobs that need binomial CIs or chance-corrected agreement can import them without dragging in the occurrence module. Same implementations, just relocated.

Renamed parameter names on cohens_kappa from (human, model) to (rater_a, rater_b) so the helper reads as generic rather than human-vs-model specific. Tests already use isolated `from ami.utils.stats import …` imports (updated all 9 sites in ami/main/tests.py).

Co-Authored-By: Claude <noreply@anthropic.com>

Adds ResponseSchemaMetadata (ami/base/metadata.py) — a SimpleMetadata subclass that emits the response serializer's field schema (type, label, help_text, bounds) under actions.GET. DRF's default SimpleMetadata only emits field schema for write methods (POST / PUT), so read-only stats endpoints previously returned only name + description on OPTIONS. Wires it into OccurrenceStatsViewSet and passes serializer_class= to each @action decorator so view.get_serializer() resolves to the per-action response serializer during OPTIONS resolution. Result: frontends can fetch OPTIONS once per stats endpoint and key tooltips / labels by field name. Stat copy lives next to the serializer definition; interpretation copy stays in the FE bundle next to the visualization. Documented in docs/claude/reference/api-stats-pattern.md. Co-Authored-By: Claude <noreply@anthropic.com>

Identification.taxon is nullable — a comment-only verification has a machine prediction but no human label to compare. Previously such rows landed in the agreement denominator (verified_with_prediction_count) but never in any numerator, silently dragging agreed_*_pct down. Adds a comparable cohort: verified occurrences with BOTH a machine prediction and a human taxon. All agreed_*_pct and the Wilson CIs now divide by comparable_count instead of verified_with_prediction_count, so numerator and denominator describe the same set. Cohen's kappa already used this cohort (both_present_pairs), so it is unchanged. Surfaces two new fields so consumers can see why comparable_count differs from verified_count: - comparable_count — denominator for agreed_*_pct - verified_without_taxon_count — verified, has prediction, no human taxon Co-Authored-By: Claude <noreply@anthropic.com>
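
The cohort arithmetic can be sketched over plain (human taxon id, machine taxon id) pairs, one per verified occurrence. The row shape, the `cohort_counts` name, and returning 0.0 for an empty comparable cohort are assumptions for illustration; the count field names follow the commit messages.

```python
from typing import Dict, List, Optional, Tuple

# Assumed row shape: (human_taxon_id, machine_taxon_id); either side may be None.
Row = Tuple[Optional[int], Optional[int]]


def cohort_counts(rows: List[Row]) -> Dict[str, float]:
    """Split verified occurrences into cohorts and compute exact agreement."""
    with_prediction = [row for row in rows if row[1] is not None]
    comparable = [row for row in with_prediction if row[0] is not None]
    exact = sum(human == machine for human, machine in comparable)
    return {
        "verified_count": len(rows),
        "no_prediction_count": len(rows) - len(with_prediction),
        "verified_with_prediction_count": len(with_prediction),
        "verified_without_taxon_count": len(with_prediction) - len(comparable),
        "comparable_count": len(comparable),
        "agreed_exact_count": exact,
        "agreed_exact_pct": exact / len(comparable) if comparable else 0.0,
    }


rows = [
    (1, 1),     # exact match
    (2, 2),     # exact match
    (3, 4),     # disagreement
    (None, 5),  # comment-only verification: prediction but no human taxon
    (6, None),  # human taxon but no machine prediction
]
stats = cohort_counts(rows)
print(stats["comparable_count"], stats["agreed_exact_pct"])
```

With the earlier denominator (verified_with_prediction_count, 4 here) the rate would read 2/4; dividing by the comparable cohort gives 2/3, because the comment-only row no longer sits in the denominator.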

Replaces the manual try/except rank parsing with a ChoiceField run through SingleParamSerializer, matching the project's standard boundary-validation pattern. Closes a gap where ?agreement_coarsest_rank= (blank) silently no-opped instead of returning the documented 400 for an invalid rank. DRF treats blank fields in QueryDict (HTML) input as absent, so the value is passed in a plain dict to force "" through validation. Unknown ranks and UNKNOWN (absent from the choice list) also 400 at the boundary, and the param stays case-insensitive via an explicit uppercase. drf-spectacular reads the ChoiceField choices into the OpenAPI schema as an enum, so /api/v2/docs/ now lists the valid rank values. Co-Authored-By: Claude <noreply@anthropic.com>

successes > total (or negative) makes the variance term negative and crashes deeper in math.sqrt with an opaque domain error. Since wilson_interval is a public helper in ami/utils/stats, guard the inputs and raise a clear ValueError at the boundary instead. No production caller can currently hit this — agreed_* counts are always a subset of the comparable denominator — but the helper shouldn't depend on that. Co-Authored-By: Claude <noreply@anthropic.com>
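
A small sketch shows why the unguarded formula fails and what a boundary check could look like. `wilson_radicand` isolates the term under the square root of the standard Wilson formula; `validate_counts`, its message, and its treatment of a zero total are assumptions, not the helper's actual code.

```python
import math


def wilson_radicand(successes: int, total: int, z: float = 1.96) -> float:
    """The term under the square root in the Wilson score interval."""
    p = successes / total
    return p * (1 - p) / total + z * z / (4 * total * total)


def validate_counts(successes: int, total: int) -> None:
    """Boundary guard: reject counts that cannot describe a binomial sample."""
    if total < 0 or successes < 0 or successes > total:
        raise ValueError(
            f"need 0 <= successes <= total, got successes={successes}, total={total}"
        )


# With successes > total, p exceeds 1 and the radicand goes negative.
radicand = wilson_radicand(11, 10)
print(radicand)
try:
    math.sqrt(radicand)
except ValueError as exc:
    print("unguarded:", exc)  # the opaque error raised by math.sqrt

try:
    validate_counts(11, 10)
except ValueError as exc:
    print("guarded:", exc)
```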

Adds an OccurrenceStats panel above the filter sections on the occurrence list page. Consumes the /occurrences/stats/model-agreement/ endpoint, threading the same active filter array the list view sends so the numbers always reflect the current result set. Shows two metrics: verified occurrences % and human-model agreement rate % (rank-level / under-order agreement). Co-Authored-By: Claude <noreply@anthropic.com>

One-line field rename in the occurrence stats panel to match the backend's dropped ORDER threshold. Hook type rename + multi-value filter support landed on the base branch (4a92c0b on #1307). Co-Authored-By: Claude <noreply@anthropic.com>

`StatBar` takes an optional `count` rendered as "0% (121)". Wired into the Verified occurrences bar so a small-but-nonzero verified set that rounds to 0% still surfaces the underlying count. Co-Authored-By: Claude <noreply@anthropic.com>

Two new horizontal bars below the existing verified / agreement-rate bars: - 'Agreement 95% CI (Wilson)' — RangeBar showing the Wilson CI as a filled segment between low and high (wide bar = shaky number, narrow bar = tight). Value reads '87–97%'. '—' when no verified-with-pred set. - 'Cohen's κ (beyond chance)' — SignedBar over [-1, 1] with the zero midpoint marked. Positive fills right, negative fills left. Value reads '0.41'. '—' when undefined (empty or single-category set). Hook type extended with the five new fields (agreed_*_ci_low/high + cohens_kappa). Loading skeleton bumped to 4 placeholders. Co-Authored-By: Claude <noreply@anthropic.com>

…nline

Stats panel now renders three agreement bars side-by-side instead of one generic agreement row plus a separate CI range bar:

- Agreement (exact taxon): agreed_exact_*
- Agreement (any rank): agreed_any_rank_* (LCA at any real rank)
- Agreement (≥ <rank>): agreed_coarser_rank_* (only when the caller passes ?agreement_coarsest_rank=<RANK>; otherwise hidden)

Wilson 95% CI is folded into each agreement bar instead of sitting on its own row. The bar is a single 0–100% track with:

- a translucent CI band (bg-primary/40) from low to high
- 2px-wide CI bound caps (whiskers) at low/high
- a 3px tall dark vertical marker for the point estimate

This puts the uncertainty visually adjacent to the number it qualifies (the bar IS the CI, the marker IS the point), so the CI is no longer easy to overlook. Each agreement row also surfaces raw counts ("90 of 100"). Cohen's κ keeps its existing signed bar.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(occurrence-stats): add lca_rank_between helper Pure-Python LCA over (taxon_id, rank, parents_json) tuples. Returns the deepest shared TaxonRank or None. Used by the upcoming human-model-agreement stat to bucket agreement at-or-finer-than ORDER. Plan: docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md Side-research: docs/claude/planning/occurrence-filter-driven-exports.md Co-Authored-By: Claude <noreply@anthropic.com> * feat(occurrence-stats): aggregate human-model agreement over filtered queryset Pure aggregation; caller wires apply_default_filters + OccurrenceFilter. Annotates best machine prediction, prefetches non-withdrawn identifications, batches Taxon fetch for parents_json, buckets exact / under-order / above-order. Co-Authored-By: Claude <noreply@anthropic.com> * feat(occurrence-stats): wire human-model-agreement action Adds HumanModelAgreementSerializer and the human_model_agreement action on OccurrenceStatsViewSet. Extracts OccurrenceViewSet's filter backends + filterset_fields into a module-level tuple so OccurrenceStatsViewSet can reuse the same OccurrenceFilter pass-through (deployment, event, taxa lists, verified, score thresholds, apply_defaults=false, etc). The top_identifiers action keeps its current behavior — filter_queryset is only invoked by actions that opt in. Co-Authored-By: Claude <noreply@anthropic.com> * test(occurrence-stats): HTTP coverage for human-model-agreement action Adds 6 HTTP-level tests: missing project_id 400, draft 404, empty zeros, happy-path exact match, deployment filter pass-through, apply_defaults=false score-threshold bypass. Also adds DjangoFilterBackend to OccurrenceStatsViewSet.filter_backends so filterset_fields (event, deployment, determination__rank, ...) actually take effect. Without DjangoFilterBackend, filterset_fields are silently ignored and ?deployment=N returns the unfiltered set. 
Co-Authored-By: Claude <noreply@anthropic.com> * feat(ui): useHumanModelAgreement hook for occurrence stats Mirrors useTopIdentifiers's useAuthorizedQuery pattern. Accepts an arbitrary filter map so the occurrence list page can thread its filter state through unchanged (deployment, event, taxon, score thresholds, apply_defaults). Co-Authored-By: Claude <noreply@anthropic.com> * docs(prompts): handoff for PR #1307 rework — rename + SQL push-down + review fixes Captures: review findings from Copilot + CodeRabbit, perf bench evidence (43k rows → 159s timeout on apply_defaults=false), and the planned changes for the next session (rename to model-agreement, push aggregation into SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap + test gaps). Co-Authored-By: Claude <noreply@anthropic.com> * refactor(occurrence-stats): rename to model-agreement + push aggregation to SQL Addresses review feedback on PR #1307: Rename (drop "human"): - URL: /occurrences/stats/human-model-agreement/ -> /model-agreement/ - Function: human_model_agreement_for_project -> model_agreement_for_project - Serializer: HumanModelAgreementSerializer -> ModelAgreementSerializer - Viewset action + url_path: human_model_agreement -> model_agreement - FE hook: useHumanModelAgreement -> useModelAgreement (file + symbol) - FE type: Response -> ModelAgreementResponse (fixes DOM Response shadow) - Test class: TestHumanModelAgreementForProject -> TestModelAgreementForProject SQL push-down (Copilot+CodeRabbit perf flag): - Replace list(qs) full-row materialization with annotated aggregate(). - Annotate best_user_taxon_id via Subquery over Identification (BEST_IDENTIFICATION_ORDER). Drop the prefetch + select_related("taxon") on identifications since only taxon_id is read. - aggregate() Count(filter=Q(...)) for total/verified/exact/no-prediction. - For under-order disagreement: group disagreement set by distinct (user_taxon, machine_taxon) pair before LCA. Each pair's LCA runs once. 
- Bench against project 18 (43,149 occurrences): pre-rework apply_defaults=false curl timed out at 159s; post-rework 1.96s unfiltered / 3.4s with bypass (93,019 occurrences post-filter). Denominator fix (Copilot): - agreed_*_pct now divides by verified_with_prediction_count instead of verified_count. A verified occurrence with no machine prediction can't agree or disagree; including it in the denominator drags the rate down without representing actual model disagreement. - Surface no_prediction_count + verified_with_prediction_count as sibling fields so consumers can see how many such occurrences exist. UNKNOWN rank bug (Copilot): - TaxonRank.UNKNOWN sorts after SPECIES in OrderedEnum definition order, so without explicit exclusion UNKNOWN >= ORDER is True and a shared UNKNOWN ancestor would wrongly count as under-order agreement. Filter UNKNOWN out of lca_rank_between's candidate ranks. Add regression test. Tests: - New: test_unknown_rank_excluded_from_lca (LCA regression) - New: test_agreement_under_order_bucket (HTTP coverage for sister-species case, previously only exact-match shortcut was exercised) - Updated: happy-path asserts verified_with_prediction_count and no_prediction_count. 22/22 backend tests green: docker compose exec django python manage.py test ami.main.tests.TestLcaRankBetween ami.main.tests.TestModelAgreementForProject ami.main.tests.TestOccurrenceStatsViewSet Co-Authored-By: Claude <noreply@anthropic.com> * docs(plan): add text lang to fenced block (markdownlint MD040) Co-Authored-By: Claude <noreply@anthropic.com> * perf(occurrence-stats): scope agreement subqueries to verified set Replace the .aggregate() over the full filtered queryset with a two-step approach: 1. SQL Count('pk') for total_occurrences (no joins, no subqueries). 2. Fetch the verified set (occurrences with at least one non-withdrawn ident) with both best_user_taxon_id and best_machine_prediction_taxon_id annotated, then bucket counts + LCA in Python. 
Why: the previous version evaluated two correlated subqueries (best user identification + best machine prediction) on every row of the filtered queryset. For typical projects, >95% of occurrences have no identification — those rows ran the user-ident subquery only to discover NULL, then ran the (much more expensive) machine-prediction subquery on detections that won't contribute to any agreement bucket. Scoping the subqueries to the verified set avoids that waste. Bench (cold, cache invalidated): Project Total Verified Pre Post P#85 SEC-SEQ 36,253 13,140 — 1.18s P#20 BCI 40,958 1,351 — 0.92s P#84 Pennsylvania 18,407 251 — 0.56s P#24 Atlantic Forestry 2,797 274 — 0.50s P#18 Vermont 43,149 45 ~928ms 0.35s P#23 Insectarium Montreal 20,393 74 — 0.43s Warm via django-cachalot: 122–343ms across all projects. For P#85 (highest absolute identification count in the system), the cost is dominated by apply_default_filters' score-threshold join, not the subqueries. apply_defaults=false actually runs faster (0.69s cold, 179,466 total / 13,140 verified) because the classification join is skipped. Co-Authored-By: Claude <noreply@anthropic.com> * feat(occurrence-stats): drop ORDER threshold; add coarsest_rank query param Replaces hardcoded `lca >= TaxonRank.ORDER` agreement gate with two layers: - Always returned: `agreed_any_rank_*` — exact matches plus any non-null LCA at a real rank (UNKNOWN excluded). The upstream filter (e.g. a Lepidoptera include list) is what bounds the meaningful scope, not a hardcoded threshold in this function. - Optional `?agreement_coarsest_rank=FAMILY`: when supplied, response also includes `agreed_coarser_rank_*` (exact + LCAs at or below the threshold). The applied rank is echoed in `agreement_coarsest_rank`; null when absent. Also addresses CodeRabbit feedback on the existing branch: - Dedupe base queryset before counting (joins from default-filter chain can inflate Occurrence rows). - Bound `*_pct` FloatFields to [0.0, 1.0] in the serializer. 

Param validation: invalid rank → 400; UNKNOWN rejected as not meaningful. Tests cover any-rank fallback, threshold filtering, invalid + UNKNOWN rejection, and threshold echo.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(ui): align model-agreement hook with BE rename + multi-value query params

- Rename `agreed_under_order_*` → `agreed_any_rank_*` to match the endpoint's dropped ORDER threshold (0565f06).
- Add optional `agreement_coarsest_rank` + `agreed_coarser_rank_*` fields to the response type (not consumed yet; UI follows in #1308).
- Widen `filters` to accept arrays and append repeated query params so multi-value filters (e.g. `algorithm`, `not_algorithm`, which the backend reads via `request.query_params.getlist(...)`) survive.

Per CodeRabbit review.

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(docs): drop NEXT_SESSION_PROMPT.md from PR

Session-scratchpad doc; belongs in local notes, not the merged branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(docs): drop session-scratchpad planning docs from PR

- 2026-05-14-human-model-agreement-endpoint.md: design narrative; superseded by code + PR description.
- occurrence-filter-driven-exports.md: side-research stub Copilot flagged as out-of-scope. Promoted to a PR-description follow-up item.

Co-Authored-By: Claude <noreply@anthropic.com>

* test(occurrence-stats): make any-rank bucket test deterministic

create_detections assigns the classification taxon via .order_by("?"), so the previous test picked a random machine taxon and then required a sister species under the same genus. Random non-species picks (ORDER / FAMILY / GENUS) have no sister, flaking ~50% of runs.

Pin both the machine prediction and the human ID to two fixed Vanessa species, so the LCA is always GENUS (any-rank bucket, not exact) and the test is deterministic.
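
The sister-species case above can be sketched in a few lines. This is a minimal illustration only, not the code in the branch: `Rank`, `lca_rank`, and the two lineages are hypothetical stand-ins (the commit does not name which Vanessa species the test pins), and lineages are modelled as plain dicts rather than Taxon rows.

```python
from enum import IntEnum
from typing import Dict, Optional


class Rank(IntEnum):
    """Hypothetical stand-in for TaxonRank, ordered coarse to fine."""
    ORDER = 1
    FAMILY = 2
    GENUS = 3
    SPECIES = 4
    UNKNOWN = 5  # sorts after SPECIES, mirroring the bug described earlier


Lineage = Dict[Rank, str]


def lca_rank(a: Lineage, b: Lineage) -> Optional[Rank]:
    """Finest real rank at which both lineages name the same taxon.

    UNKNOWN is skipped so a shared UNKNOWN ancestor never counts as agreement.
    """
    shared = [
        rank
        for rank in Rank
        if rank is not Rank.UNKNOWN and rank in a and a[rank] == b.get(rank)
    ]
    return max(shared) if shared else None


# Illustrative lineages: two species in the genus Vanessa.
machine = {
    Rank.ORDER: "Lepidoptera",
    Rank.FAMILY: "Nymphalidae",
    Rank.GENUS: "Vanessa",
    Rank.SPECIES: "Vanessa atalanta",
}
human = {**machine, Rank.SPECIES: "Vanessa cardui"}

genus_level = lca_rank(machine, human)   # shared down to GENUS, not SPECIES
exact = lca_rank(machine, machine)       # identical lineages meet at SPECIES
unknown_only = lca_rank({Rank.UNKNOWN: "x"}, {Rank.UNKNOWN: "x"})  # no real rank shared
```

With both sides pinned, the result no longer depends on which taxon a random pick returns, which is the point of the deterministic test.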

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(occurrence-stats): move FE hook to UI PR #1308

useModelAgreement.ts belongs with the frontend consumer (#1308), not the backend endpoint PR. Keeps #1307 backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(occurrence-stats): add Wilson CI + Cohen's kappa to model-agreement

Both derive from the verified_rows already in memory, so no extra query is needed.
- wilson_interval(): 95% Wilson score CI on agreed_exact_pct and agreed_any_rank_pct (agreed_*_ci_low / _ci_high). Wilson stays inside [0,1] and is honest at the small n typical of verified sets, where the normal approximation breaks down.
- cohens_kappa(): exact-taxon agreement beyond chance (cohens_kappa field, range [-1, 1]). Null when there are no doubly-classified occurrences or when expected agreement is 1.0. Discounts the agreement you'd get for free in a project dominated by one common species.

Adds 5 nullable response fields. Backwards-compatible (additive only). 9 pure-Python unit tests + 2 HTTP field-presence tests.

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor(stats): move wilson_interval + cohens_kappa to ami/utils/stats

Both are generic statistical helpers: they don't depend on Django or any domain model. Lifting them out of ami/main/models_future/occurrence.py lets other endpoints/jobs that need binomial CIs or chance-corrected agreement import them without dragging in the occurrence module.

Same implementations, just relocated. Renamed the cohens_kappa parameters from (human, model) to (rater_a, rater_b) so the helper reads as generic rather than human-vs-model specific.

Tests already use isolated `from ami.utils.stats import …` imports (updated all 9 sites in ami/main/tests.py).
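
For readers unfamiliar with chance-corrected agreement, the kappa helper described above could look roughly like the following. This is a sketch under assumptions, not the contents of ami/utils/stats: only the name `cohens_kappa`, the `(rater_a, rater_b)` parameter names, and the two null cases come from the commit messages; the signature, types, and length check are guesses.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def cohens_kappa(
    rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]
) -> Optional[float]:
    """Cohen's kappa for two raters labelling the same items, in [-1, 1].

    Returns None when there are no items or when chance agreement is 1.0
    (both raters always use the same single label), since kappa is
    undefined in those cases.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("rater_a and rater_b must label the same items")
    n = len(rater_a)
    if n == 0:
        return None

    # Observed agreement: share of items where the two labels match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement: chance of a match if each rater drew labels
    # independently from their own marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        return None
    return (observed - expected) / (1 - expected)


# Three of four labels match (observed 0.75), but half of that is expected
# by chance given the marginals, so kappa is 0.5.
example = cohens_kappa(["x", "x", "y", "y"], ["x", "x", "y", "x"])
```

A project dominated by one species pushes `expected` towards 1, which is why raw percent agreement can look strong while kappa stays low.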

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(stats): expose response schema via OPTIONS metadata

Adds ResponseSchemaMetadata (ami/base/metadata.py), a SimpleMetadata subclass that emits the response serializer's field schema (type, label, help_text, bounds) under actions.GET. DRF's default SimpleMetadata only emits field schema for write methods (POST / PUT), so read-only stats endpoints previously returned only name + description on OPTIONS.

Wires it into OccurrenceStatsViewSet and passes serializer_class= to each @action decorator so view.get_serializer() resolves to the per-action response serializer during OPTIONS resolution.

Result: frontends can fetch OPTIONS once per stats endpoint and key tooltips / labels by field name. Stat copy lives next to the serializer definition; interpretation copy stays in the FE bundle next to the visualization. Documented in docs/claude/reference/api-stats-pattern.md.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(stats): exclude taxon-less verifications from agreement denominator

Identification.taxon is nullable: a comment-only verification has a machine prediction but no human label to compare. Previously such rows landed in the agreement denominator (verified_with_prediction_count) but never in any numerator, silently dragging agreed_*_pct down.

Adds a comparable cohort: verified occurrences with BOTH a machine prediction and a human taxon. All agreed_*_pct and the Wilson CIs now divide by comparable_count instead of verified_with_prediction_count, so numerator and denominator describe the same set. Cohen's kappa already used this cohort (both_present_pairs), so it is unchanged.

Surfaces two new fields so consumers can see why comparable_count differs from verified_count:
- comparable_count: denominator for agreed_*_pct
- verified_without_taxon_count: verified, has prediction, no human taxon

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(stats): validate agreement_coarsest_rank via ChoiceField

Replaces the manual try/except rank parsing with a ChoiceField run through SingleParamSerializer, matching the project's standard boundary-validation pattern.

Closes a gap where ?agreement_coarsest_rank= (blank) silently no-opped instead of returning the documented 400 for an invalid rank. DRF treats blank fields in QueryDict (HTML) input as absent, so the value is passed in a plain dict to force "" through validation. Unknown ranks and UNKNOWN (absent from the choice list) also 400 at the boundary, and the param stays case-insensitive via an explicit uppercase.

drf-spectacular reads the ChoiceField choices into the OpenAPI schema as an enum, so /api/v2/docs/ now lists the valid rank values.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(stats): wilson_interval rejects successes outside [0, total]

successes > total (or negative) makes the variance term negative and crashes deeper in math.sqrt with an opaque domain error. Since wilson_interval is a public helper in ami/utils/stats, guard the inputs and raise a clear ValueError at the boundary instead.

No production caller can currently hit this (agreed_* counts are always a subset of the comparable denominator), but the helper shouldn't depend on that.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Michael Bunsen <michael@mixedneeds.com>
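
To make the input guard from the last commit concrete, here is a rough sketch of a Wilson score interval with that check in place. It is an illustration under assumptions rather than the helper in ami/utils/stats: the name `wilson_interval` and the ValueError behaviour come from the commit messages, while the signature, the `z` default, and returning None for an empty denominator are guesses.

```python
import math
from typing import Optional, Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def wilson_interval(
    successes: int, total: int, z: float = Z_95
) -> Optional[Tuple[float, float]]:
    """Wilson score interval (low, high) for a binomial proportion.

    Returns None when total is 0 (assumed here to map to a null field).
    Raises ValueError when successes falls outside [0, total]; without
    this check p * (1 - p) goes negative and math.sqrt fails with an
    unhelpful domain error.
    """
    if total == 0:
        return None
    if not 0 <= successes <= total:
        raise ValueError(
            f"successes must be within [0, total]; got {successes} of {total}"
        )

    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom

    # Clamp to absorb floating-point drift at the 0 and 1 edges.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# 5 agreements out of 10 comparable occurrences: a wide interval around 0.5.
low, high = wilson_interval(5, 10)
```

At n = 10 the interval spans roughly 0.24 to 0.76, which is the kind of wide band the inline CI bar in the UI is meant to make visible.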

mihow force-pushed the feat/occurrence-stats-ui branch from 326cd68 to 4ae69ec Compare May 21, 2026 00:52

mihow force-pushed the feat/occurrence-stats-ui branch 3 times, most recently from d621ac3 to 3692eba Compare May 21, 2026 01:13

mihow mentioned this pull request May 21, 2026

Endpoint for stats about verified occurrences #1307

Merged

6 tasks

mihow force-pushed the feat/occurrence-stats-ui branch from 3692eba to d0669ee Compare May 21, 2026 01:18

mihow force-pushed the feat/occurrence-stats-ui branch from 5e5252d to 50c5ff9 Compare May 22, 2026 04:36

mihow force-pushed the feat/human-model-agreement-endpoint branch from f958a38 to c4a4171 Compare May 26, 2026 01:10

mihow force-pushed the feat/occurrence-stats-ui branch from 50c5ff9 to 1241967 Compare May 26, 2026 01:10

mihow force-pushed the feat/occurrence-stats-ui branch from 3a5e022 to ef2cf01 Compare May 26, 2026 19:58

mihow force-pushed the feat/human-model-agreement-endpoint branch from 9347277 to e476333 Compare May 27, 2026 01:11

mihow force-pushed the feat/occurrence-stats-ui branch from ef2cf01 to 2391505 Compare May 27, 2026 01:12

mihow changed the title ~~feat(ui): live stats panel in occurrence list sidebar~~ Show stats panel in occurrence list sidebar May 27, 2026

mihow marked this pull request as draft May 27, 2026 13:20

mihow and others added 4 commits May 27, 2026 06:25

mihow and others added 13 commits May 27, 2026 06:27

docs(plan): add text lang to fenced block (markdownlint MD040)

da2a232

Co-Authored-By: Claude <noreply@anthropic.com>

chore(docs): drop NEXT_SESSION_PROMPT.md from PR

7c144b0

Session-scratchpad doc — belongs in local notes, not the merged branch. Co-Authored-By: Claude <noreply@anthropic.com>

chore(occurrence-stats): move FE hook to UI PR #1308

b74b3cd

useModelAgreement.ts belongs with the frontend consumer (#1308), not the backend endpoint PR. Keeps #1307 backend-only. Co-Authored-By: Claude <noreply@anthropic.com>

mihow force-pushed the feat/human-model-agreement-endpoint branch from e476333 to 336c1fe Compare May 27, 2026 13:29

mihow force-pushed the feat/occurrence-stats-ui branch from 2391505 to 237a013 Compare May 27, 2026 13:29

mihow and others added 10 commits May 28, 2026 15:35

feat(ui): add useModelAgreement hook for occurrence stats

047bd40

Typed React Query wrapper for /occurrences/stats/model-agreement/. Owned by this UI PR (#1308); the backend PR (#1307) is now backend-only. Co-Authored-By: Claude <noreply@anthropic.com>

mihow force-pushed the feat/occurrence-stats-ui branch from 237a013 to 3ecc891 Compare May 28, 2026 22:44

Base automatically changed from feat/human-model-agreement-endpoint to main May 29, 2026 03:54

mihow wants to merge 27 commits into main from feat/occurrence-stats-ui


netlify Bot commented May 15, 2026 • edited

✅ Deploy Preview for antenna-preview canceled.

coderabbitai Bot commented May 15, 2026 • edited

Review skipped
