feat(experimentation): results aggregation query and payload builder by gagantrivedi · Pull Request #7781 · Flagsmith/flagsmith

gagantrivedi · 2026-06-15T10:11:48Z

I have read the Contributing Guide.
I have added information to docs/ if required so people know about the feature. (deferred — internal; docs land with the results UI.)
I have filled in the "Changes" section below.
I have filled in the "How did you test this code" section below.

Changes

Contributes to the experimentation results scorecard (v0.2): the ClickHouse aggregation and pure payload builder that feed the stats kernel (#7769). Model, endpoints, task and the ORM orchestrator (metric specs + per-environment expected shares) follow in the next PR.

Results query — one pass: a shared exposures CTE (first-exposure dedup, quarantine, half-open window) joined to post-exposure metric events, conditionally aggregated per metric into (n, sum, sum_squares) plus per-variant identity counts for SRM. Aggregations: occurrence / count / sum / mean; the join is window-bounded so ClickHouse range-scans on the sort key.
build_results_summary — pure: compare_to_control per treatment + srm_p_value on the counts. Inference is withheld below the data floor (n ≥ 50/arm, ≥ 5 conversions/arm for occurrence) and SRM below 100 identities; chance-to-win is flipped for lower-is-better metrics. Raw per-variant stats are kept; means/status/ordering are derived client-side.
Dataclasses MetricSpec / ResultsAggregates / MetricResult / ResultsSummary (reusing the kernel's VariantStats / Inference); asdict is the wire shape.
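The data floor and the lower-is-better flip described above can be sketched roughly as follows. This is an illustration, not the PR's code: `VariantStatsSketch`, `meets_data_floor` and `orient_chance_to_win` are hypothetical names, and only the thresholds come from the description. It assumes that for occurrence metrics `sum` counts converting identities, and that "flipping" means taking the complement of the probability.

```python
from dataclasses import asdict, dataclass
from typing import Sequence

# Thresholds as stated in the PR description.
MIN_N_PER_ARM = 50
MIN_CONVERSIONS_PER_ARM = 5


@dataclass(frozen=True)
class VariantStatsSketch:
    """Hypothetical stand-in for the kernel's VariantStats."""

    n: int
    sum: float
    sum_squares: float


def meets_data_floor(arms: Sequence[VariantStatsSketch], aggregation: str) -> bool:
    """Return True only if every arm has enough data to report inference."""
    if any(arm.n < MIN_N_PER_ARM for arm in arms):
        return False
    if aggregation == "occurrence":
        # Assumption: for occurrence metrics, `sum` is the conversion count.
        return all(arm.sum >= MIN_CONVERSIONS_PER_ARM for arm in arms)
    return True


def orient_chance_to_win(p_treatment_higher: float, lower_is_better: bool) -> float:
    """Assumption: flipping is the complement of P(treatment > control)."""
    return 1.0 - p_treatment_higher if lower_is_better else p_treatment_higher


control = VariantStatsSketch(n=120, sum=9.0, sum_squares=9.0)
treatment = VariantStatsSketch(n=118, sum=4.0, sum_squares=4.0)

print(meets_data_floor([control, treatment], "occurrence"))  # False: 4 conversions < 5
print(meets_data_floor([control, treatment], "sum"))  # True: only n is checked
print(asdict(control))  # plain dict, usable as a wire shape
```

Keeping the gate separate from the statistics lets the builder stay pure: it receives aggregates and returns plain dataclasses, with no database access.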

How did you test this code?

Unit (faked client): row mapping, per-aggregation expressions, windowed post-exposure join, no-metrics path; builder data floors, lower-is-better flip, SRM balanced/imbalanced/not-computable, exact wire shape via asdict.
Ran the query against a local ClickHouse with seeded data — confirmed the numbers and caught a real JOIN-ON error the substring tests couldn't.
pytest tests/unit/experimentation/ — 285 passed; ruff + mypy strict clean.
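The faked-client approach mentioned above might look something like the sketch below. The `QueryClient` protocol, its `run` method, `FakeClient` and `fetch_variant_stats` are all hypothetical names chosen for illustration; the real client interface and row layout in the PR may differ.

```python
from typing import Any, Mapping, Protocol, Sequence

# Assumed row layout: (variant, n, sum, sum_squares).
Row = tuple[str, int, float, float]


class QueryClient(Protocol):
    """Hypothetical minimal interface the query code depends on."""

    def run(self, sql: str, params: Mapping[str, Any]) -> Sequence[Row]: ...


class FakeClient:
    """Records every call and returns canned rows instead of hitting ClickHouse."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self.rows = list(rows)
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def run(self, sql: str, params: Mapping[str, Any]) -> Sequence[Row]:
        self.calls.append((sql, dict(params)))
        return self.rows


def fetch_variant_stats(
    client: QueryClient, sql: str, params: Mapping[str, Any]
) -> dict[str, dict[str, float]]:
    """Map result rows to per-variant raw stats."""
    return {
        variant: {"n": int(n), "sum": float(total), "sum_squares": float(squares)}
        for variant, n, total, squares in client.run(sql, params)
    }


def test_rows_are_mapped_by_variant() -> None:
    client = FakeClient([("control", 120, 9.0, 9.0), ("treatment", 118, 14.0, 14.0)])
    stats = fetch_variant_stats(client, "SELECT ... avgIf(...) ...", {"experiment_id": 1})

    assert stats["treatment"] == {"n": 118, "sum": 14.0, "sum_squares": 14.0}
    # Substring assertion: checks the SQL text, not ClickHouse semantics.
    sent_sql, sent_params = client.calls[0]
    assert "avgIf" in sent_sql
    assert sent_params == {"experiment_id": 1}
```

A test of this shape verifies row mapping and the presence of expressions in the SQL text, but it cannot tell whether ClickHouse accepts the query, which is why the manual run against a seeded instance was still needed.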

Note

CI has no ClickHouse (parked), so the query's CH-specific semantics are asserted by SQL substring and were checked manually against a local instance; the mean/avgIf path stays the least-covered until ClickHouse-in-CI lands.

vercel · 2026-06-15T10:11:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
docs	Ready	Preview, Comment	Jun 16, 2026 8:02am

2 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
flagsmith-frontend-preview	Ignored	Preview	Jun 16, 2026 8:02am
flagsmith-frontend-staging	Ignored	Preview	Jun 16, 2026 8:02am

codecov · 2026-06-15T10:18:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.57%. Comparing base (4ec3d45) to head (1cdb491).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #7781    +/-   ##
========================================
  Coverage   98.57%   98.57%            
========================================
  Files        1462     1462            
  Lines       56566    56762   +196     
========================================
+ Hits        55759    55955   +196     
  Misses        807      807



gagantrivedi · 2026-06-16T07:39:57Z

/gemini review


gemini-code-assist

Code Review

This pull request introduces the core backend logic for calculating experiment results and Sample Ratio Mismatch (SRM) from warehouse data. It adds new dataclasses, ClickHouse query generation for different metric aggregations (occurrence, count, sum, mean), and statistical inference logic comparing treatments to control with minimum thresholds. It also includes comprehensive unit tests. Feedback from the reviewer highlights two key areas for improvement: a performance optimization in the ClickHouse query to restrict joined events to the experiment window by adding m.timestamp >= %(window_start)s to the LEFT JOIN ON clause, and a robustness improvement to normalize expected_shares so they sum to 1.0 before calculating the SRM p-value to prevent potential mathematical errors or false-positive mismatches.
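To illustrate why the reviewer's normalisation suggestion matters, here is a hedged sketch of a chi-square goodness-of-fit SRM check. It is not the kernel's `srm_p_value`: the function names are hypothetical, the 100-identity floor is taken from the PR description, and the chi-square survival function uses the standard closed-form recurrence for integer degrees of freedom so the example needs only the standard library.

```python
import math
from typing import Optional, Sequence

MIN_IDENTITIES_FOR_SRM = 100  # floor quoted in the PR description


def normalise_shares(shares: Sequence[float]) -> list[float]:
    """Rescale shares so they sum to 1.0 (e.g. [50, 50] -> [0.5, 0.5])."""
    total = sum(shares)
    if total <= 0 or any(share < 0 for share in shares):
        raise ValueError("shares must be non-negative with a positive sum")
    return [share / total for share in shares]


def chi2_sf(x: float, df: int) -> float:
    """Upper tail of the chi-square distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)  # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # exact tail for df = 1
    while a < df / 2.0:
        # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
        q += half**a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def srm_p_value_sketch(
    counts: Sequence[int], expected_shares: Sequence[float]
) -> Optional[float]:
    """Return the SRM p-value, or None when it is not computable."""
    total = sum(counts)
    if len(counts) < 2 or len(counts) != len(expected_shares):
        return None
    if total < MIN_IDENTITIES_FOR_SRM:
        return None
    shares = normalise_shares(expected_shares)
    if any(share == 0 for share in shares):
        return None  # an expected count of zero makes the statistic undefined
    statistic = sum(
        (observed - total * share) ** 2 / (total * share)
        for observed, share in zip(counts, shares)
    )
    return chi2_sf(statistic, df=len(counts) - 1)


print(srm_p_value_sketch([500, 500], [50, 50]))  # 1.0: perfectly balanced
print(srm_p_value_sketch([600, 400], [50, 50]) < 0.001)  # True: clear mismatch
print(srm_p_value_sketch([30, 40], [0.5, 0.5]))  # None: below the identity floor
```

Without the normalisation step, shares given as `[50, 50]` would produce expected counts of 50,000 per arm for 1,000 identities, and a balanced split would be reported as a severe mismatch.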


github-actions · 2026-06-16T08:38:01Z

Docker builds report

Image	Build Status	Security report
`ghcr.io/flagsmith/flagsmith-e2e:pr-7781`	Finished ✅	Skipped
`ghcr.io/flagsmith/flagsmith-api-test:pr-7781`	Finished ✅	Skipped
`ghcr.io/flagsmith/flagsmith-frontend:pr-7781`	Finished ✅	Results ✅
`ghcr.io/flagsmith/flagsmith-api:pr-7781`	Finished ✅	Results ✅
`ghcr.io/flagsmith/flagsmith:pr-7781`	Finished ✅	Results ✅
`ghcr.io/flagsmith/flagsmith-private-cloud:pr-7781`	Finished ✅	Results ✅

github-actions · 2026-06-16T08:44:39Z

Playwright Test Results (oss - depot-ubuntu-latest-16)

1 passed

Details

1 test across 1 suite
32 seconds
1cdb491
🔄 Run: #17533 (attempt 1)

Playwright Test Results (oss - depot-ubuntu-latest-arm-16)

1 passed

Details

1 test across 1 suite
36.7 seconds
1cdb491
🔄 Run: #17533 (attempt 1)

Playwright Test Results (private-cloud - depot-ubuntu-latest-16)

2 failed
3 passed

Details

5 tests across 5 suites
17.7 seconds
1cdb491
📦 Artifacts: View test results and HTML report
🔄 Run: #17533 (attempt 1)

Failed tests

firefox › tests/environment-permission-test.pw.ts › Environment Permission Tests › Environment-level permissions control access to features, identities, and segments @enterprise
firefox › tests/versioning-tests.pw.ts › Versioning tests - Create, edit, and compare feature versions @oss

Playwright Test Results (private-cloud - depot-ubuntu-latest-16)

2 passed

Details

2 tests across 2 suites
46.1 seconds
1cdb491
🔄 Run: #17533 (attempt 2)

Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16)

3 passed

Details

3 tests across 3 suites
39.7 seconds
1cdb491
🔄 Run: #17533 (attempt 3)

github-actions · 2026-06-16T08:45:19Z

Visual Regression

19 screenshots compared. See report for details.
View full report

github-actions Bot added api Issue related to the REST API feature New feature or request labels Jun 15, 2026

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 15, 2026

Base automatically changed from feat/experiment-stats-kernel to main June 16, 2026 06:42

gagantrivedi added 3 commits June 16, 2026 12:19

feat(experimentation): results aggregation query and payload builder

468a853

refactor(experimentation): bundle results aggregates and harden decode

9467a33

fix(experimentation): move post-exposure attribution out of JOIN ON

4694326

ClickHouse rejects an ON clause mixing left and right columns in an inequality; verified against ClickHouse 24.8 with seeded data.
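One way to keep the cross-table inequality out of the ON clause is to apply it inside the conditional aggregate instead. The sketch below builds such a query as a Python string. Every table and column name in it is hypothetical, the exact placement used in the PR is not shown in this conversation, and the SQL has not been executed against ClickHouse here.

```python
# Hypothetical table and column names throughout; not the PR's schema.

# Equality on the join key plus right-table-only bounds against query
# parameters: these stay in ON and prune out-of-window events early.
JOIN_ON = (
    "m.identity_key = e.identity_key"
    " AND m.timestamp >= %(window_start)s"
    " AND m.timestamp < %(window_end)s"
)

# Cross-table inequality (left column vs right column): kept out of ON
# and applied per aggregate instead.
POST_EXPOSURE = "m.timestamp >= e.first_exposed_at"


def build_results_sql() -> str:
    """Assemble a one-pass query: exposures CTE joined to metric events."""
    return f"""
WITH exposures AS (
    SELECT
        identity_key,
        argMin(variant, timestamp) AS first_variant,
        min(timestamp) AS first_exposed_at
    FROM exposure_events
    WHERE timestamp >= %(window_start)s AND timestamp < %(window_end)s
    GROUP BY identity_key
)
SELECT
    e.first_variant AS variant,
    uniqExact(e.identity_key) AS identities,
    countIf(m.event_name = %(metric_event)s AND {POST_EXPOSURE}) AS event_count
FROM exposures AS e
LEFT JOIN metric_events AS m
    ON {JOIN_ON}
GROUP BY e.first_variant
""".strip()


print(build_results_sql())
```

Putting the predicate in the `countIf` condition rather than in a WHERE clause keeps exposed identities with no qualifying events in the result, so the per-variant identity counts used for SRM are unaffected.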

gagantrivedi force-pushed the feat/experiment-results-query branch from bdfef40 to 4694326 Compare June 16, 2026 06:51

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 16, 2026

refactor(experimentation): share the exposures CTE between queries

1092621

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 16, 2026

docs(experimentation): regenerate events catalogue for shifted line numbers

edb3588

vercel Bot deployed to Preview – docs June 16, 2026 07:41 View deployment

github-actions Bot added the docs Documentation updates label Jun 16, 2026

gemini-code-assist Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread api/experimentation/services.py

Comment thread api/experimentation/services.py

github-actions Bot added feature New feature or request and removed feature New feature or request docs Documentation updates labels Jun 16, 2026

perf(experimentation): prune pre-window events from the metric join

965efc8

vercel Bot deployed to Preview – docs June 16, 2026 07:54 View deployment

github-actions Bot added docs Documentation updates and removed feature New feature or request labels Jun 16, 2026

github-actions Bot added feature New feature or request and removed docs Documentation updates labels Jun 16, 2026

docs(experimentation): trim the shared CTE comment to non-derivable facts

1cdb491

github-actions Bot added the docs Documentation updates label Jun 16, 2026

vercel Bot deployed to Preview – docs June 16, 2026 08:02 View deployment

github-actions Bot added feature New feature or request and removed feature New feature or request docs Documentation updates labels Jun 16, 2026

gagantrivedi marked this pull request as ready for review June 16, 2026 08:36

gagantrivedi requested review from a team as code owners June 16, 2026 08:36

gagantrivedi requested review from emyller and removed request for a team June 16, 2026 08:36

flagsmith-engineering Bot assigned emyller Jun 16, 2026

github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 16, 2026

gagantrivedi requested review from Zaimwa9 and removed request for a team and emyller June 16, 2026 08:41

flagsmith-engineering Bot assigned Zaimwa9 Jun 16, 2026

gagantrivedi unassigned emyller and Zaimwa9 Jun 16, 2026

gagantrivedi mentioned this pull request Jun 16, 2026

feat(experimentation): experiment results model, task and endpoints #7796

Draft

4 tasks

feat(experimentation): results aggregation query and payload builder #7781
gagantrivedi wants to merge 7 commits into main from feat/experiment-results-query
