feat(experimentation): results aggregation query and payload builder#7781

Open
gagantrivedi wants to merge 7 commits into main from feat/experiment-results-query

Conversation

@gagantrivedi gagantrivedi commented Jun 15, 2026

Member
  • I have read the Contributing Guide.
  • I have added information to docs/ if required so people know about the feature. (deferred — internal; docs land with the results UI.)
  • I have filled in the "Changes" section below.
  • I have filled in the "How did you test this code" section below.

Changes

Contributes to the experimentation results scorecard (v0.2): the ClickHouse aggregation and pure payload builder that feed the stats kernel (#7769). Model, endpoints, task and the ORM orchestrator (metric specs + per-environment expected shares) follow in the next PR.

  • Results query: a single pass. A shared exposures CTE (first-exposure dedup, quarantine, half-open window) is joined to post-exposure metric events and conditionally aggregated per metric into (n, sum, sum_squares), plus per-variant identity counts for SRM. Supported aggregations are occurrence, count, sum and mean. The join is window-bounded so that ClickHouse can range-scan on the sort key.
  • build_results_summary: a pure function that runs compare_to_control per treatment and srm_p_value on the identity counts. Inference is withheld below the data floor (n ≥ 50 per arm, plus ≥ 5 conversions per arm for occurrence metrics), and SRM is withheld below 100 identities. Chance-to-win is flipped for lower-is-better metrics. Raw per-variant stats are kept; means, status and ordering are derived client-side.
  • Dataclasses MetricSpec / ResultsAggregates / MetricResult / ResultsSummary (reusing the kernel's VariantStats / Inference); asdict is the wire shape.
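
For readers unfamiliar with the (n, sum, sum_squares) shape, here is a minimal sketch of how those sufficient statistics yield a mean and variance, and how the stated data floor could be checked. `ArmStats` and `meets_data_floor` are hypothetical names for illustration only; the real code reuses the kernel's VariantStats, whose fields and methods may differ.

```python
from dataclasses import asdict, dataclass

MIN_N_PER_ARM = 50           # floor quoted in the description above
MIN_CONVERSIONS_PER_ARM = 5  # extra floor for occurrence metrics


@dataclass(frozen=True)
class ArmStats:  # hypothetical stand-in, not the kernel's VariantStats
    n: int
    sum: float
    sum_squares: float

    @property
    def mean(self) -> float:
        return self.sum / self.n

    @property
    def variance(self) -> float:
        # Unbiased sample variance recovered from the three aggregates.
        return (self.sum_squares - self.sum**2 / self.n) / (self.n - 1)


def meets_data_floor(arms: list[ArmStats], is_occurrence: bool) -> bool:
    """Return True only if every arm clears the minimum sample thresholds."""
    for arm in arms:
        if arm.n < MIN_N_PER_ARM:
            return False
        # Assumption: for occurrence metrics each identity contributes 0 or 1,
        # so the per-arm sum equals the number of conversions.
        if is_occurrence and arm.sum < MIN_CONVERSIONS_PER_ARM:
            return False
    return True


control = ArmStats(n=100, sum=20.0, sum_squares=20.0)
wire = asdict(control)  # only the raw fields; derived values stay out of the payload
```

Because `mean` and `variance` are properties rather than fields, `asdict` emits only the raw aggregates, which matches the idea of deriving means client-side.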

How did you test this code?

  • Unit (faked client): row mapping, per-aggregation expressions, windowed post-exposure join, no-metrics path; builder data floors, lower-is-better flip, SRM balanced/imbalanced/not-computable, exact wire shape via asdict.
  • Ran the query against a local ClickHouse with seeded data — confirmed the numbers and caught a real JOIN-ON error the substring tests couldn't.
  • pytest tests/unit/experimentation/ — 285 passed; ruff + mypy strict clean.
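
As an illustration of the "faked client" approach, a test double along these lines records what was sent and returns canned rows. This is a sketch under assumptions: `FakeClickHouseClient`, `map_rows` and the row layout are hypothetical and are not taken from the PR's test suite.

```python
from typing import Any

Row = tuple[str, int, int, float, float]  # (variant, identities, n, sum, sum_squares)


class FakeClickHouseClient:
    """Hypothetical test double: records each call and returns canned rows."""

    def __init__(self, rows: list[Row]) -> None:
        self.rows = rows
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def execute(self, query: str, params: dict[str, Any]) -> list[Row]:
        self.calls.append((query, params))
        return self.rows


def map_rows(rows: list[Row]) -> dict[str, dict[str, float]]:
    """Key the flat result rows by variant so each arm's stats can be looked up."""
    return {
        variant: {"identities": identities, "n": n, "sum": total, "sum_squares": squares}
        for variant, identities, n, total, squares in rows
    }


def test_row_mapping() -> None:
    client = FakeClickHouseClient([("control", 120, 118, 30.0, 30.0)])
    rows = client.execute("SELECT 1", {"window_start": "2026-06-01 00:00:00"})
    assert map_rows(rows)["control"]["n"] == 118
    # The fake also exposes what was sent, which is what substring assertions inspect.
    assert client.calls[0][1]["window_start"] == "2026-06-01 00:00:00"
```

A fake like this checks mapping and parameters but never parses SQL, which is consistent with the JOIN-ON error only surfacing against a real ClickHouse.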

Note

CI has no ClickHouse (parked), so the query's CH-specific semantics are asserted by SQL substring and were checked manually against a local instance; the mean/avgIf path stays the least-covered until ClickHouse-in-CI lands.
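
To make "asserted by SQL substring" concrete, the sketch below builds one conditional-aggregate expression per aggregation and checks the generated text. The mapping, the column names `m.value` and `m.event_name`, and the parameter name are assumptions for illustration; only the `-If` aggregate combinators (countIf, sumIf, avgIf) are real ClickHouse features, and the PR's actual expressions may differ.

```python
# Hypothetical templates; {cond} is filled with the per-metric event filter.
AGGREGATION_SQL = {
    "occurrence": "toUInt8(countIf({cond}) > 0)",
    "count": "countIf({cond})",
    "sum": "sumIf(m.value, {cond})",
    "mean": "avgIf(m.value, {cond})",
}


def metric_expression(aggregation: str, event_param: str) -> str:
    """Return the SQL fragment for one metric, using a named query parameter."""
    cond = f"m.event_name = %({event_param})s"
    return AGGREGATION_SQL[aggregation].format(cond=cond)


def test_mean_uses_avg_if() -> None:
    sql = metric_expression("mean", "metric_0_event")
    # A substring check pins the text of the query, not its behaviour in ClickHouse.
    assert "avgIf(m.value, m.event_name = %(metric_0_event)s)" in sql
```

A test of this kind passes as long as the string is unchanged, so it cannot catch a query that is textually as intended but semantically wrong.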

vercel Bot commented Jun 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| docs | Ready | Preview, Comment | Jun 16, 2026 8:02am |

2 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| flagsmith-frontend-preview | Ignored | Preview | Jun 16, 2026 8:02am |
| flagsmith-frontend-staging | Ignored | Preview | Jun 16, 2026 8:02am |

@github-actions github-actions Bot added api Issue related to the REST API feature New feature or request labels Jun 15, 2026

codecov Bot commented Jun 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.57%. Comparing base (4ec3d45) to head (1cdb491).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #7781    +/-   ##
========================================
  Coverage   98.57%   98.57%            
========================================
  Files        1462     1462            
  Lines       56566    56762   +196     
========================================
+ Hits        55759    55955   +196     
  Misses        807      807            

@github-actions github-actions Bot added feature New feature or request and removed feature New feature or request labels Jun 15, 2026
Base automatically changed from feat/experiment-stats-kernel to main June 16, 2026 06:42
@gagantrivedi gagantrivedi force-pushed the feat/experiment-results-query branch from bdfef40 to 4694326 Compare June 16, 2026 06:51
@gagantrivedi

Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces the core backend logic for computing experiment results and Sample Ratio Mismatch (SRM) from warehouse data. It adds new dataclasses, ClickHouse query generation for four metric aggregations (occurrence, count, sum, mean), and statistical inference that compares each treatment to control subject to minimum data thresholds. It also includes unit tests. The review raises two points: a performance optimization that restricts joined events to the experiment window by adding m.timestamp >= %(window_start)s to the LEFT JOIN ON clause, and a robustness improvement that normalizes expected_shares to sum to 1.0 before computing the SRM p-value, which avoids mathematical errors and false-positive mismatches.
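
The normalization point can be illustrated with a small, dependency-free sketch. It assumes SRM is tested with a Pearson chi-square goodness-of-fit test, which is the usual choice but is not confirmed by this thread; `normalize_shares`, `chi_square_sf` and `srm_p_value_sketch` are hypothetical names and are not the PR's `srm_p_value`.

```python
import math


def normalize_shares(shares: list[float]) -> list[float]:
    """Rescale expected shares so they sum to 1.0."""
    if any(s <= 0 for s in shares):
        raise ValueError("expected shares must be strictly positive")
    total = sum(shares)
    return [s / total for s in shares]


def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{m} y^(j - 1/2) / Gamma(j + 1/2)
    tail = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for j in range(1, df // 2 + 1):
        tail += math.exp(-y) * term
        term *= y / (j + 0.5)
    return tail


def srm_p_value_sketch(counts: list[int], expected_shares: list[float]) -> float:
    """Chi-square goodness-of-fit p-value for observed vs expected arm sizes."""
    shares = normalize_shares(expected_shares)
    total = sum(counts)
    statistic = sum(
        (observed - total * share) ** 2 / (total * share)
        for observed, share in zip(counts, shares)
    )
    return chi_square_sf(statistic, df=len(counts) - 1)
```

Without the normalization step, shares passed as percentages such as [50, 50] would inflate the expected counts a hundredfold and report a mismatch for a perfectly balanced split; with it, `srm_p_value_sketch([500, 500], [50, 50])` treats the split as 50/50.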

Comment thread api/experimentation/services.py
Comment thread api/experimentation/services.py
@github-actions github-actions Bot added feature New feature or request and removed feature New feature or request docs Documentation updates labels Jun 16, 2026
@github-actions github-actions Bot added docs Documentation updates and removed feature New feature or request labels Jun 16, 2026
@github-actions github-actions Bot added feature New feature or request and removed docs Documentation updates labels Jun 16, 2026
@github-actions github-actions Bot added the docs Documentation updates label Jun 16, 2026
@github-actions github-actions Bot added feature New feature or request and removed feature New feature or request docs Documentation updates labels Jun 16, 2026
@gagantrivedi gagantrivedi marked this pull request as ready for review June 16, 2026 08:36
@gagantrivedi gagantrivedi requested review from a team as code owners June 16, 2026 08:36
@gagantrivedi gagantrivedi requested review from emyller and removed request for a team June 16, 2026 08:36

github-actions Bot commented Jun 16, 2026

Contributor

Docker builds report

| Image | Build Status | Security report |
| ghcr.io/flagsmith/flagsmith-e2e:pr-7781 | Finished ✅ | Skipped |
| ghcr.io/flagsmith/flagsmith-api-test:pr-7781 | Finished ✅ | Skipped |
| ghcr.io/flagsmith/flagsmith-frontend:pr-7781 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith-api:pr-7781 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith:pr-7781 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith-private-cloud:pr-7781 | Finished ✅ | Results |

@gagantrivedi gagantrivedi requested review from Zaimwa9 and removed request for a team and emyller June 16, 2026 08:41

github-actions Bot commented Jun 16, 2026

Contributor

Playwright Test Results (oss - depot-ubuntu-latest-16)

passed  1 passed

Details

stats  1 test across 1 suite
duration  32 seconds
commit  1cdb491
info  🔄 Run: #17533 (attempt 1)

Playwright Test Results (oss - depot-ubuntu-latest-arm-16)

passed  1 passed

Details

stats  1 test across 1 suite
duration  36.7 seconds
commit  1cdb491
info  🔄 Run: #17533 (attempt 1)

Playwright Test Results (private-cloud - depot-ubuntu-latest-16)

failed  2 failed
passed  3 passed

Details

stats  5 tests across 5 suites
duration  17.7 seconds
commit  1cdb491
info  📦 Artifacts: View test results and HTML report
🔄 Run: #17533 (attempt 1)

Failed tests

firefox › tests/environment-permission-test.pw.ts › Environment Permission Tests › Environment-level permissions control access to features, identities, and segments @enterprise
firefox › tests/versioning-tests.pw.ts › Versioning tests - Create, edit, and compare feature versions @oss

Playwright Test Results (private-cloud - depot-ubuntu-latest-16)

passed  2 passed

Details

stats  2 tests across 2 suites
duration  46.1 seconds
commit  1cdb491
info  🔄 Run: #17533 (attempt 2)

Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16)

passed  3 passed

Details

stats  3 tests across 3 suites
duration  39.7 seconds
commit  1cdb491
info  🔄 Run: #17533 (attempt 3)

@github-actions

Contributor

Visual Regression

19 screenshots compared. See report for details.

Labels

api Issue related to the REST API feature New feature or request
