Skip to content

📝 CodeRabbit Chat: Implement requested code changes#1325

Closed
coderabbitai[bot] wants to merge 18 commits into
mainfrom
coderabbitai/chat/336c1fe
Closed

📝 CodeRabbit Chat: Implement requested code changes#1325
coderabbitai[bot] wants to merge 18 commits into
mainfrom
coderabbitai/chat/336c1fe

Conversation

@coderabbitai

@coderabbitai coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Code changes was requested by @mihow.

The following files were modified:

  • ami/base/serializers.py
  • ami/main/api/views.py

mihow and others added 18 commits May 27, 2026 06:25
Pure-Python LCA over (taxon_id, rank, parents_json) tuples. Returns
the deepest shared TaxonRank or None. Used by the upcoming
human-model-agreement stat to bucket agreement at-or-finer-than ORDER.

Plan: docs/claude/planning/2026-05-14-human-model-agreement-endpoint.md
Side-research: docs/claude/planning/occurrence-filter-driven-exports.md

Co-Authored-By: Claude <noreply@anthropic.com>
… queryset

Pure aggregation; caller wires apply_default_filters + OccurrenceFilter.
Annotates best machine prediction, prefetches non-withdrawn identifications,
batches Taxon fetch for parents_json, buckets exact / under-order / above-order.

Co-Authored-By: Claude <noreply@anthropic.com>
Adds HumanModelAgreementSerializer and the human_model_agreement action
on OccurrenceStatsViewSet. Extracts OccurrenceViewSet's filter backends +
filterset_fields into a module-level tuple so OccurrenceStatsViewSet can
reuse the same OccurrenceFilter pass-through (deployment, event, taxa lists,
verified, score thresholds, apply_defaults=false, etc).

The top_identifiers action keeps its current behavior — filter_queryset
is only invoked by actions that opt in.

Co-Authored-By: Claude <noreply@anthropic.com>
Adds 6 HTTP-level tests: missing project_id 400, draft 404, empty zeros,
happy-path exact match, deployment filter pass-through, apply_defaults=false
score-threshold bypass.

Also adds DjangoFilterBackend to OccurrenceStatsViewSet.filter_backends so
filterset_fields (event, deployment, determination__rank, ...) actually take
effect. Without DjangoFilterBackend, filterset_fields are silently ignored
and ?deployment=N returns the unfiltered set.

Co-Authored-By: Claude <noreply@anthropic.com>
Mirrors useTopIdentifiers's useAuthorizedQuery pattern. Accepts an
arbitrary filter map so the occurrence list page can thread its filter
state through unchanged (deployment, event, taxon, score thresholds,
apply_defaults).

Co-Authored-By: Claude <noreply@anthropic.com>
… review fixes

Captures: review findings from Copilot + CodeRabbit, perf bench evidence
(43k rows → 159s timeout on apply_defaults=false), and the planned changes
for the next session (rename to model-agreement, push aggregation into
SQL/ORM, fix UNKNOWN rank LCA + denominator + verified_by_me anon gap +
test gaps).

Co-Authored-By: Claude <noreply@anthropic.com>
…ion to SQL

Addresses review feedback on PR #1307:

Rename (drop "human"):
- URL: /occurrences/stats/human-model-agreement/ -> /model-agreement/
- Function: human_model_agreement_for_project -> model_agreement_for_project
- Serializer: HumanModelAgreementSerializer -> ModelAgreementSerializer
- Viewset action + url_path: human_model_agreement -> model_agreement
- FE hook: useHumanModelAgreement -> useModelAgreement (file + symbol)
- FE type: Response -> ModelAgreementResponse (fixes DOM Response shadow)
- Test class: TestHumanModelAgreementForProject -> TestModelAgreementForProject

SQL push-down (Copilot+CodeRabbit perf flag):
- Replace list(qs) full-row materialization with annotated aggregate().
- Annotate best_user_taxon_id via Subquery over Identification
  (BEST_IDENTIFICATION_ORDER). Drop the prefetch + select_related("taxon")
  on identifications since only taxon_id is read.
- aggregate() Count(filter=Q(...)) for total/verified/exact/no-prediction.
- For under-order disagreement: group disagreement set by distinct
  (user_taxon, machine_taxon) pair before LCA. Each pair's LCA runs once.
- Bench against project 18 (43,149 occurrences): pre-rework apply_defaults=false
  curl timed out at 159s; post-rework 1.96s unfiltered / 3.4s with bypass
  (93,019 occurrences post-filter).

Denominator fix (Copilot):
- agreed_*_pct now divides by verified_with_prediction_count instead of
  verified_count. A verified occurrence with no machine prediction can't
  agree or disagree; including it in the denominator drags the rate down
  without representing actual model disagreement.
- Surface no_prediction_count + verified_with_prediction_count as sibling
  fields so consumers can see how many such occurrences exist.

UNKNOWN rank bug (Copilot):
- TaxonRank.UNKNOWN sorts after SPECIES in OrderedEnum definition order,
  so without explicit exclusion UNKNOWN >= ORDER is True and a shared
  UNKNOWN ancestor would wrongly count as under-order agreement. Filter
  UNKNOWN out of lca_rank_between's candidate ranks. Add regression test.

Tests:
- New: test_unknown_rank_excluded_from_lca (LCA regression)
- New: test_agreement_under_order_bucket (HTTP coverage for sister-species
  case, previously only exact-match shortcut was exercised)
- Updated: happy-path asserts verified_with_prediction_count and
  no_prediction_count.

22/22 backend tests green:
  docker compose exec django python manage.py test
    ami.main.tests.TestLcaRankBetween
    ami.main.tests.TestModelAgreementForProject
    ami.main.tests.TestOccurrenceStatsViewSet

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Replace the .aggregate() over the full filtered queryset with a two-step
approach:
  1. SQL Count('pk') for total_occurrences (no joins, no subqueries).
  2. Fetch the verified set (occurrences with at least one non-withdrawn
     ident) with both best_user_taxon_id and best_machine_prediction_taxon_id
     annotated, then bucket counts + LCA in Python.

Why: the previous version evaluated two correlated subqueries (best user
identification + best machine prediction) on every row of the filtered
queryset. For typical projects, >95% of occurrences have no identification
— those rows ran the user-ident subquery only to discover NULL, then ran
the (much more expensive) machine-prediction subquery on detections that
won't contribute to any agreement bucket. Scoping the subqueries to the
verified set avoids that waste.

Bench (cold, cache invalidated):

  Project                          Total    Verified   Pre      Post
  P#85 SEC-SEQ                     36,253   13,140     —        1.18s
  P#20 BCI                         40,958    1,351     —        0.92s
  P#84 Pennsylvania                18,407      251     —        0.56s
  P#24 Atlantic Forestry            2,797      274     —        0.50s
  P#18 Vermont                     43,149       45     ~928ms   0.35s
  P#23 Insectarium Montreal        20,393       74     —        0.43s

Warm via django-cachalot: 122–343ms across all projects.

For P#85 (highest absolute identification count in the system), the cost
is dominated by apply_default_filters' score-threshold join, not the
subqueries. apply_defaults=false actually runs faster (0.69s cold,
179,466 total / 13,140 verified) because the classification join is
skipped.

Co-Authored-By: Claude <noreply@anthropic.com>
… param

Replaces hardcoded `lca >= TaxonRank.ORDER` agreement gate with two layers:

- Always returned: `agreed_any_rank_*` — exact matches plus any non-null LCA
  at a real rank (UNKNOWN excluded). The upstream filter (e.g. a Lepidoptera
  include list) is what bounds the meaningful scope, not a hardcoded
  threshold in this function.
- Optional `?agreement_coarsest_rank=FAMILY`: when supplied, response also
  includes `agreed_coarser_rank_*` (exact + LCAs at or below the threshold).
  The applied rank is echoed in `agreement_coarsest_rank`; null when absent.

Also addresses CodeRabbit feedback on the existing branch:
- Dedupe base queryset before counting (joins from default-filter chain can
  inflate Occurrence rows).
- Bound `*_pct` FloatFields to [0.0, 1.0] in the serializer.

Param validation: invalid rank → 400; UNKNOWN rejected as not meaningful.
Tests cover any-rank fallback, threshold filtering, invalid + UNKNOWN
rejection, and threshold echo.

Co-Authored-By: Claude <noreply@anthropic.com>
…ry params

- Rename `agreed_under_order_*` → `agreed_any_rank_*` to match the endpoint's
  dropped ORDER threshold (0565f06).
- Add optional `agreement_coarsest_rank` + `agreed_coarser_rank_*` fields to
  the response type (not consumed yet — UI follows in #1308).
- Widen `filters` to accept arrays and append repeated query params so
  multi-value filters (e.g. `algorithm`, `not_algorithm` — backend reads via
  `request.query_params.getlist(...)`) survive. Per CodeRabbit review.

Co-Authored-By: Claude <noreply@anthropic.com>
Session-scratchpad doc — belongs in local notes, not the merged branch.

Co-Authored-By: Claude <noreply@anthropic.com>
- 2026-05-14-human-model-agreement-endpoint.md — design narrative; superseded
  by code + PR description.
- occurrence-filter-driven-exports.md — side-research stub Copilot flagged as
  out-of-scope. Promoted to a PR-description follow-up item.

Co-Authored-By: Claude <noreply@anthropic.com>
create_detections assigns the classification taxon via .order_by("?"),
so the previous test picked a random machine taxon and then required a
sister species under the same genus. Random non-species picks (ORDER /
FAMILY / GENUS) have no sister, flaking ~50% of runs.

Pin both the machine prediction and the human ID to two fixed Vanessa
species, so the LCA is always GENUS (any-rank bucket, not exact) and the
test is deterministic.

Co-Authored-By: Claude <noreply@anthropic.com>
useModelAgreement.ts belongs with the frontend consumer (#1308), not the
backend endpoint PR. Keeps #1307 backend-only.

Co-Authored-By: Claude <noreply@anthropic.com>
Both derive from the verified_rows already in memory — no extra query.

- wilson_interval(): 95% Wilson score CI on agreed_exact_pct and
  agreed_any_rank_pct (agreed_*_ci_low / _ci_high). Wilson stays inside
  [0,1] and is honest at the small n typical of verified sets, where the
  normal approximation breaks down.
- cohens_kappa(): exact-taxon agreement beyond chance (cohens_kappa
  field, range [-1, 1]). Null when no doubly-classified occurrences or
  expected agreement is 1.0. Discounts the agreement you'd get for free
  in a project dominated by one common species.

Adds 5 nullable response fields. Backwards-compatible (additive only).
9 pure-Python unit tests + 2 HTTP field-presence tests.

Co-Authored-By: Claude <noreply@anthropic.com>
Both are generic statistical helpers — they don't depend on Django or any
domain model. Lifting them out of ami/main/models_future/occurrence.py so
other endpoints/jobs that need binomial CIs or chance-corrected agreement
can import them without dragging in the occurrence module.

Same implementations, just relocated. Renamed parameter names on
cohens_kappa from (human, model) to (rater_a, rater_b) so the helper
reads as generic rather than human-vs-model specific.

Tests already use isolated `from ami.utils.stats import …` imports
(updated all 9 sites in ami/main/tests.py).

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai coderabbitai Bot requested a review from mihow May 28, 2026 00:18
@netlify

netlify Bot commented May 28, 2026

Copy link
Copy Markdown

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit 3d522db
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a1789df8daeec000838fe1d

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7610e23d-e83e-473e-afea-57d1b8256f92

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Comment @coderabbitai help to get the list of available commands and usage tips.

Base automatically changed from feat/human-model-agreement-endpoint to main May 29, 2026 03:54
@mihow

mihow commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Claude says: Closing this — it's superseded by #1307, which merged on 2026-05-29 (the same day this PR was opened). The endpoint, ModelAgreementSerializer, model_agreement_for_project, and the filter backends this diff would add are all already on main, and the branch is now conflicting against it.

The one genuinely new idea here is the EnumChoiceField helper in ami/base/serializers.py — a reusable enum-backed ChoiceField that accepts member names case-insensitively and supports excluding sentinel values like UNKNOWN. That abstraction didn't land with #1307.

For reference, the rank param is currently validated inline in model_agreement (ami/main/api/views.py, around line 1463):

valid_ranks = [r.name for r in TaxonRank if r != TaxonRank.UNKNOWN]
raw_rank = request.query_params.get("agreement_coarsest_rank")
# Build a plain dict (not the QueryDict) so a blank value is validated as a real "".
# DRF treats blank fields in HTML/QueryDict input as absent, which would let
# ?agreement_coarsest_rank= silently no-op. Uppercase to stay case-insensitive.
rank_data = {} if raw_rank is None else {"agreement_coarsest_rank": raw_rank.upper()}
coarsest_rank_param = SingleParamSerializer[str].clean(
    param_name="agreement_coarsest_rank",
    field=serializers.ChoiceField(choices=valid_ranks, required=False, allow_blank=False),
    data=rank_data,
)
coarsest_rank = TaxonRank[coarsest_rank_param] if coarsest_rank_param else None

This works and gives strict 400s for blank, unknown, and UNKNOWN ranks, plus drf-spectacular reads the choices into the OpenAPI schema. EnumChoiceField would just DRY this up — worth doing once a second enum-backed query param appears.

Potential follow-up (good first issue): extract EnumChoiceField into ami/base/serializers.py as a fresh PR off current main, swap this call site to use it, and add a couple of unit tests covering the case-insensitive name match, the exclude rejection, and the blank-value 400.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant