feat(experimentation): results aggregation query and payload builder#7781
gagantrivedi wants to merge 7 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #7781 +/- ##
========================================
Coverage 98.57% 98.57%
========================================
Files 1462 1462
Lines 56566 56762 +196
========================================
+ Hits 55759 55955 +196
Misses 807 807
ClickHouse rejects an ON clause mixing left and right columns in an inequality; verified against ClickHouse 24.8 with seeded data.
bdfef40 to 4694326
/gemini review
Code Review
This pull request introduces the core backend logic for calculating experiment results and Sample Ratio Mismatch (SRM) from warehouse data. It adds new dataclasses, ClickHouse query generation for different metric aggregations (occurrence, count, sum, mean), and statistical inference logic comparing treatments to control with minimum thresholds. It also includes comprehensive unit tests. Feedback from the reviewer highlights two key areas for improvement: a performance optimization in the ClickHouse query to restrict joined events to the experiment window by adding m.timestamp >= %(window_start)s to the LEFT JOIN ON clause, and a robustness improvement to normalize expected_shares so they sum to 1.0 before calculating the SRM p-value to prevent potential mathematical errors or false-positive mismatches.
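The normalization suggestion above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the SRM check is a Pearson chi-square goodness-of-fit test (a common choice; the actual `srm_p_value` may differ), and the names `normalize_shares`, `chi2_sf` and `srm_p_value_sketch` are hypothetical. The 100-identity floor is taken from the PR description.

```python
from math import erfc, exp, gamma, sqrt

# Floor below which SRM is withheld (value taken from the PR description).
SRM_MIN_IDENTITIES = 100


def normalize_shares(expected_shares):
    """Rescale expected shares so they sum to 1.0 (hypothetical helper)."""
    total = sum(expected_shares)
    if total <= 0 or any(s <= 0 for s in expected_shares):
        raise ValueError("expected shares must all be positive")
    return [s / total for s in expected_shares]


def chi2_sf(x, df):
    """Chi-square survival function for integer df >= 1, stdlib only.

    Uses Q_1(x) = erfc(sqrt(x/2)), Q_2(x) = exp(-x/2) and the recurrence
    Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    k = 1 if df % 2 else 2
    q = erfc(sqrt(x / 2)) if k == 1 else exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return min(q, 1.0)


def srm_p_value_sketch(counts, expected_shares):
    """Return the SRM p-value, or None when below the identity floor.

    Assumption: Pearson chi-square goodness of fit with len(counts) - 1
    degrees of freedom; not necessarily what the PR implements.
    """
    if len(counts) < 2 or len(counts) != len(expected_shares):
        raise ValueError("need matching counts and shares for >= 2 variants")
    total = sum(counts)
    if total < SRM_MIN_IDENTITIES:
        return None
    # Normalizing first keeps the expected counts summing to `total`.
    shares = normalize_shares(expected_shares)
    stat = sum(
        (obs - total * share) ** 2 / (total * share)
        for obs, share in zip(counts, shares)
    )
    return chi2_sf(stat, len(counts) - 1)
```

Without the normalization step, shares that sum to, say, 0.99 produce expected counts that no longer add up to the observed total, which inflates the statistic even for a perfectly balanced split.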
Docker builds report
Playwright Test Results (oss - depot-ubuntu-latest-16)
Playwright Test Results (oss - depot-ubuntu-latest-arm-16)
Playwright Test Results (private-cloud - depot-ubuntu-latest-16)
Failed tests
firefox › tests/environment-permission-test.pw.ts › Environment Permission Tests › Environment-level permissions control access to features, identities, and segments @enterprise
Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16)
Visual Regression
19 screenshots compared.
docs/ if required so people know about the feature. (deferred: internal; docs land with the results UI.)
Changes
Contributes to the experimentation results scorecard (v0.2): the ClickHouse aggregation and pure payload builder that feed the stats kernel (#7769). Model, endpoints, task and the ORM orchestrator (metric specs + per-environment expected shares) follow in the next PR.
(n, sum, sum_squares) plus per-variant identity counts for SRM. Aggregations: occurrence / count / sum / mean; the join is window-bounded so ClickHouse range-scans on the sort key.
build_results_summary (pure): compare_to_control per treatment + srm_p_value on the counts. Inference is withheld below the data floor (n ≥ 50/arm, ≥ 5 conversions/arm for occurrence) and SRM below 100 identities; chance-to-win is flipped for lower-is-better metrics. Raw per-variant stats are kept; means/status/ordering are derived client-side.
MetricSpec / ResultsAggregates / MetricResult / ResultsSummary (reusing the kernel's VariantStats / Inference); asdict is the wire shape.
How did you test this code?
asdict.
pytest tests/unit/experimentation/: 285 passed; ruff + mypy strict clean.
Note
CI has no ClickHouse (parked), so the query's CH-specific semantics are asserted by SQL substring and were checked manually against a local instance; the mean/avgIf path stays the least-covered until ClickHouse-in-CI lands.
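For readers following the Changes section, the data floor and the derived statistics it describes could look roughly like the sketch below. Everything here is illustrative: `ArmStats` is a hypothetical stand-in for the kernel's `VariantStats`, the helper names are invented, and two assumptions are made that the PR does not confirm: that for `occurrence` metrics the per-variant sum equals the number of conversions, and that "flipped" chance-to-win means `1 - p`.

```python
from dataclasses import dataclass

# Thresholds quoted in the PR description.
MIN_N_PER_ARM = 50
MIN_CONVERSIONS_PER_ARM = 5


@dataclass(frozen=True)
class ArmStats:
    """Hypothetical stand-in for per-variant (n, sum, sum_squares)."""

    n: int
    total: float
    sum_squares: float

    @property
    def mean(self) -> float:
        return self.total / self.n if self.n else 0.0

    @property
    def variance(self) -> float:
        # Unbiased sample variance recovered from the sufficient statistics.
        if self.n < 2:
            return 0.0
        raw = (self.sum_squares - self.total**2 / self.n) / (self.n - 1)
        return max(raw, 0.0)  # guard against tiny negative rounding error


def meets_data_floor(arms: list[ArmStats], aggregation: str) -> bool:
    """True when every arm has enough data for inference to be reported."""
    for arm in arms:
        if arm.n < MIN_N_PER_ARM:
            return False
        # Assumption: for occurrence metrics, `total` counts conversions.
        if aggregation == "occurrence" and arm.total < MIN_CONVERSIONS_PER_ARM:
            return False
    return True


def orient_chance_to_win(p: float, lower_is_better: bool) -> float:
    """Assumption: 'flipped' means taking the complement, 1 - p."""
    return 1.0 - p if lower_is_better else p
```

Keeping only `(n, sum, sum_squares)` per variant is enough to recover the mean and variance later, which is consistent with the description's point that means are derived client-side from the raw stats.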