Is your feature request related to a problem or challenge?
The current statistics_with_args / StatisticsArgs design (#21815) embeds cache lookup
and child traversal directly inside each operator's statistics_with_args override via args.compute_child_statistics(...). This means:
- Each operator must be aware of caching mechanics, coupling local propagation logic to the traversal strategy
- Evolving the traversal or cache model requires touching every operator implementation
Describe the solution you'd like
Introduce a stateless statistics_from_inputs method on ExecutionPlan (defaulting to Statistics::new_unknown) that expresses only local propagation logic from pre-computed child statistics:
    fn statistics_from_inputs(
        &self,
        input_stats: &[Arc<Statistics>],
        partition: Option<usize>,
    ) -> Result<Arc<Statistics>> {
        Ok(Arc::new(Statistics::new_unknown(self.schema().as_ref())))
    }
The external StatisticsContext owns traversal and cache management, calling statistics_from_inputs after resolving child statistics. statistics_with_args remains the public API and is unchanged.
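To make the split concrete, here is a minimal, self-contained sketch of how a context could own traversal and caching while operators only implement local propagation. It is a toy model, not the DataFusion API: `Stats`, `Plan`, `Scan`, `Union`, the `compute` method, and the pointer-keyed cache are all hypothetical stand-ins chosen for illustration, and the `partition` argument and `Result` return type are omitted for brevity.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Toy stand-in for `Statistics`: only a row count, `None` when unknown.
#[derive(Debug, Clone, PartialEq)]
pub struct Stats {
    pub num_rows: Option<usize>,
}

/// Toy stand-in for `ExecutionPlan` (hypothetical, not the real trait).
pub trait Plan {
    fn children(&self) -> Vec<Arc<dyn Plan>>;

    /// Local propagation only: no traversal and no cache access.
    /// The default mirrors the "unknown statistics" fallback.
    fn statistics_from_inputs(&self, _input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        Arc::new(Stats { num_rows: None })
    }
}

/// Leaf operator with a known row count.
pub struct Scan {
    pub rows: usize,
}

impl Plan for Scan {
    fn children(&self) -> Vec<Arc<dyn Plan>> {
        vec![]
    }

    fn statistics_from_inputs(&self, _input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        Arc::new(Stats { num_rows: Some(self.rows) })
    }
}

/// Operator that concatenates its inputs.
pub struct Union {
    pub inputs: Vec<Arc<dyn Plan>>,
}

impl Plan for Union {
    fn children(&self) -> Vec<Arc<dyn Plan>> {
        self.inputs.clone()
    }

    fn statistics_from_inputs(&self, input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        // Sum of child row counts; unknown if any child is unknown.
        let num_rows = input_stats.iter().map(|s| s.num_rows).sum::<Option<usize>>();
        Arc::new(Stats { num_rows })
    }
}

/// Hypothetical context that owns traversal and the cache.
#[derive(Default)]
pub struct StatisticsContext {
    // Keyed by node address: a simplification that is only valid
    // while the plan nodes stay alive.
    cache: HashMap<usize, Arc<Stats>>,
    /// Number of nodes whose statistics were actually computed.
    pub computed: usize,
}

impl StatisticsContext {
    pub fn compute(&mut self, plan: &Arc<dyn Plan>) -> Arc<Stats> {
        let key = Arc::as_ptr(plan) as *const () as usize;
        if let Some(hit) = self.cache.get(&key) {
            return Arc::clone(hit);
        }
        // Resolve children first, then let the operator propagate locally.
        let input_stats: Vec<Arc<Stats>> =
            plan.children().iter().map(|c| self.compute(c)).collect();
        let stats = plan.statistics_from_inputs(&input_stats);
        self.computed += 1;
        self.cache.insert(key, Arc::clone(&stats));
        stats
    }
}
```

In this sketch, changing the cache key or the traversal order only touches `StatisticsContext::compute`; `Scan` and `Union` stay as they are. A `Union` over the same `Scan` node twice resolves the scan once and serves the second lookup from the cache.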
Benefits
- Non-breaking: statistics_from_inputs has a safe default; statistics_with_args is unchanged
- Operators that override statistics_from_inputs automatically benefit from any future improvements to the traversal/caching strategy without code changes
- Operators become easier to test in isolation (no need to construct StatisticsArgs or a plan tree)
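As a rough illustration of the testing benefit, the sketch below unit-tests a single operator by handing it child statistics directly. `Stats` and `Limit` are hypothetical toy types rather than DataFusion's, and the method is an inherent method here instead of a trait method to keep the example short.

```rust
use std::sync::Arc;

/// Toy stand-in for `Statistics`: only a row count, `None` when unknown.
#[derive(Debug, PartialEq)]
pub struct Stats {
    pub num_rows: Option<usize>,
}

/// Hypothetical operator that returns at most `fetch` rows.
pub struct Limit {
    pub fetch: usize,
}

impl Limit {
    /// Local propagation from already-computed child statistics.
    pub fn statistics_from_inputs(&self, input_stats: &[Arc<Stats>]) -> Arc<Stats> {
        // Cap the child's row count at `fetch`; in this toy model an
        // unknown child row count stays unknown.
        let num_rows = input_stats
            .first()
            .and_then(|s| s.num_rows)
            .map(|n| n.min(self.fetch));
        Arc::new(Stats { num_rows })
    }
}

#[test]
fn limit_caps_row_count() {
    let limit = Limit { fetch: 5 };

    // No plan tree and no cache: just pass the child statistics in.
    let child = Arc::new(Stats { num_rows: Some(100) });
    assert_eq!(limit.statistics_from_inputs(&[child]).num_rows, Some(5));

    let unknown = Arc::new(Stats { num_rows: None });
    assert_eq!(limit.statistics_from_inputs(&[unknown]).num_rows, None);
}
```

The test needs neither a child operator nor any traversal state, which is the isolation described in the last bullet.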
Describe alternatives you've considered
Keep the current statistics_with_args design as-is, with each operator handling caching via args.compute_child_statistics(...). This works correctly, but it tightly couples propagation logic to traversal mechanics, which makes the cache model hard to evolve.
Additional context
Suggested by @2010YOUY01 in this comment as a follow-up for #21815