Reverse order scans#7777

Draft
ch-sc wants to merge 4 commits into vortex-data:develop from ch-sc:reverse-order-scans

@ch-sc ch-sc commented May 4, 2026

Summary

Reverse order scans are an optimization for queries like ORDER BY timestamp DESC LIMIT n where the data is stored ordered by timestamp ASC. Such read patterns are common in time-series workloads, where callers want the most recent rows. With the current implementation, users have to fall back on naive approaches: fully scan a Vortex file and then either buffer all rows and reverse the output, or sort all rows of the file. Both are unnecessarily expensive.

If a file is already written in sorted order, a scan in the opposite direction can be answered by iterating chunks from last to first and reversing the rows within each chunk, which avoids both sorting and buffering. This PR implements this by reversing the chunk ranges in the scan layer and lazily reversing the Vortex array representation of each chunk.
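The idea can be sketched on plain vectors. This is a minimal illustration, assuming each chunk is a `Vec<i64>` of ascending timestamps; `newest_first` is a hypothetical helper, not part of Vortex.

```rust
/// Return up to `limit` rows of an ascending, chunked column in descending
/// order, without sorting and without buffering the whole column.
/// Hypothetical helper for illustration only.
fn newest_first(chunks: &[Vec<i64>], limit: usize) -> Vec<i64> {
    chunks
        .iter()
        .rev() // global reversal: visit the last chunk first
        .flat_map(|chunk| chunk.iter().rev().copied()) // per-chunk reversal
        .take(limit) // stop as soon as LIMIT n is satisfied
        .collect()
}
```

Because the iterator chain is lazy, chunks before the point where the limit is reached are never touched, which is where the saving over scan-then-sort comes from.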

Closes: #7787

Implementation

The work spans two layers: the scan orchestration layer (vortex-layout) and the array encoding layer (vortex-array).

Scan layer (vortex-layout)

ScanBuilder gains a with_reversed(bool) builder method. When set:

  • RepeatedScan::execute collects the chunk ranges and iterates them in reverse order (last chunk first). This is the global reversal — chunk order is flipped for free by reversing a Vec of ranges.
  • The map_fn closure wraps the user-supplied function to call array.reverse() on each chunk before passing it downstream. This is the per-chunk reversal — row order within each chunk is flipped.

Reversed scans are always ordered (they produce a strict global sequence), so ordered = true is implied.
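The two reversals and the implied ordering might look roughly like the following. This is a toy model under stated assumptions: `ToyScanOptions`, `plan_ranges`, and `wrap_map_fn` are hypothetical names, chunks are plain `Vec<i64>`, and none of this is the actual ScanBuilder or RepeatedScan code.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the scan options (not the real ScanBuilder).
#[derive(Debug, Default, Clone, Copy)]
struct ToyScanOptions {
    ordered: bool,
    reversed: bool,
}

impl ToyScanOptions {
    fn with_ordered(mut self, ordered: bool) -> Self {
        self.ordered = ordered;
        self
    }

    fn with_reversed(mut self, reversed: bool) -> Self {
        self.reversed = reversed;
        self
    }

    /// A reversed scan produces a strict global sequence, so it is ordered
    /// even if `ordered` was never set explicitly.
    fn is_ordered(&self) -> bool {
        self.ordered || self.reversed
    }
}

/// Global reversal: the same chunk ranges, visited last to first.
fn plan_ranges(mut ranges: Vec<Range<u64>>, opts: ToyScanOptions) -> Vec<Range<u64>> {
    if opts.reversed {
        ranges.reverse();
    }
    ranges
}

/// Per-chunk reversal: wrap the user-supplied function so that each chunk
/// is reversed before it is passed downstream.
fn wrap_map_fn<T, F>(opts: ToyScanOptions, user_fn: F) -> impl Fn(Vec<i64>) -> T
where
    F: Fn(Vec<i64>) -> T,
{
    move |mut chunk: Vec<i64>| {
        if opts.reversed {
            chunk.reverse();
        }
        user_fn(chunk)
    }
}
```

In the PR the per-chunk step is lazy (`array.reverse()` returns a wrapper); the eager `Vec::reverse` here only keeps the sketch short.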

Array layer (vortex-array) — ReversedArray

ReversedArray is a new lazy wrapper encoding. It is constructed by ArrayRef::reverse() and immediately runs through the optimizer. The optimizer fires structural reduce rules at construction time, before any data is read:

Reduce rules:

| Pattern | Result | Cost |
| --- | --- | --- |
| Reversed(Reversed(x)) | x | Zero: both wrappers cancel |
| Reversed(Dict(codes, values)) | Dict(Reversed(codes), values) | Reverse only the codes array; values dictionary reused |
| Reversed(Chunked([c₀, c₁, …, cₙ])) | Chunked([reverse(cₙ), …, reverse(c₁), reverse(c₀)]) | Chunk order flipped; each chunk wrapped in Reversed and re-optimized recursively |

The Dict rule is the most important one. Reversing a Dict means reversing only the codes, not the values.
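The three rewrites can be modelled on a small enum. This is a sketch only: `Toy` and this free function `reverse` are hypothetical and far simpler than the real encodings and optimizer, but they show why no data needs to be read to apply the rules.

```rust
/// Hypothetical miniature of an array tree (not the vortex-array types).
#[derive(Debug, Clone, PartialEq)]
enum Toy {
    Primitive(Vec<i64>),
    Dict { codes: Box<Toy>, values: Vec<String> },
    Chunked(Vec<Toy>),
    Reversed(Box<Toy>),
}

/// Build a reversed view of `x`, applying the structural rules eagerly.
fn reverse(x: Toy) -> Toy {
    match x {
        // Reversed(Reversed(x)) => x
        Toy::Reversed(inner) => *inner,
        // Reversed(Dict(codes, values)) => Dict(Reversed(codes), values)
        Toy::Dict { codes, values } => Toy::Dict {
            codes: Box::new(reverse(*codes)),
            values, // the dictionary itself is reused untouched
        },
        // Reversed(Chunked([c0..cn])) => Chunked([reverse(cn)..reverse(c0)])
        Toy::Chunked(chunks) => Toy::Chunked(chunks.into_iter().rev().map(reverse).collect()),
        // No rule applies: stay lazy and wrap.
        other => Toy::Reversed(Box::new(other)),
    }
}
```

Only tree nodes are rebuilt here; the `Primitive` buffers and the `values` dictionary are moved, never copied or reordered.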

Execute kernels:

| Canonical type | Path |
| --- | --- |
| Primitive | Iterates the typed buffer backwards: O(n), sequential, auto-vectorizable |
| Bool | Reads bits in reverse via BitBuffer::value_unchecked: O(n), no intermediate allocation |
| Struct | Calls field.reverse() on each child; per-field optimizer rules still fire |
| All others | Falls back to take(reversed_indices) |
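For illustration, the primitive, bool, and fallback paths could be sketched on plain slices as below. All function names are hypothetical, and the bool sketch assumes an LSB-first packed bitmap, which may not match the real BitBuffer layout.

```rust
/// Primitive path: walk the typed buffer backwards in a single pass.
fn reverse_primitive<T: Copy>(values: &[T]) -> Vec<T> {
    values.iter().rev().copied().collect()
}

/// Bool path: read bits back to front from a packed bitmap and write them
/// straight into the output bitmap (assumes LSB-first bit order).
fn reverse_bits(bits: &[u8], len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        let src = len - 1 - i; // logical index read from the input
        if (bits[src / 8] >> (src % 8)) & 1 == 1 {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Generic gather used by the fallback path.
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Fallback path: materialize the indices n-1, ..., 1, 0 and gather.
fn reverse_via_take<T: Clone>(values: &[T]) -> Vec<T> {
    let reversed_indices: Vec<usize> = (0..values.len()).rev().collect();
    take(values, &reversed_indices)
}
```

The fallback is correct for any type that supports a gather, but it pays for an index vector that the specialized paths avoid.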

API Changes

New surface in vortex-array:

  • ArrayRef::reverse() -> VortexResult<ArrayRef> — reverse any array lazily
  • Reversed / ReversedArray — the new encoding type (public, can be pattern-matched)
  • ReverseReduce trait + ReverseReduceAdaptor struct — extension point for custom encodings

New surface in vortex-layout:

  • ScanBuilder::with_reversed(bool) -> Self
  • ScanBuilder::reversed() -> bool

No breaking changes. All changes are additive.

Testing

vortex-array/src/arrays/reversed/tests.rs covers 13 cases for PrimitiveArray, BoolArray, DictArray, StructArray, and ChunkedArray.

ch-sc added 4 commits May 4, 2026 12:20
…on.io>

I, Christoph Schulze <christoph.schulze@polygon.io>, hereby add my Signed-off-by to this commit: 0e64d5e
I, Christoph Schulze <christoph.schulze@polygon.io>, hereby add my Signed-off-by to this commit: 96a951e

Signed-off-by: Christoph Schulze <christoph.schulze@polygon.io>
@connortsui20
Contributor

Hi, thanks for the PR!

Could you create a discussion for this? It's not clear to me that this is how we would want to implement this.

I agree it might be nice to have the functionality to reverse scan. However, we might not want to implement this as an array encoding.

We also consider Vortex to be a "scalar" query engine, where we essentially always know where values are located in a column (row indices), and thus ORDER BY and JOINs are not supported in Vortex. So this might not fit our model, but at the same time since it is literally just a reversal of the direction in which we scan, maybe this can fit?

Regardless, this likely needs some discussion before we can move forward. Let us know if you have any questions!

codspeed-hq Bot commented May 4, 2026

Merging this PR will degrade performance by 24.99%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 1 regressed benchmark
✅ 1168 untouched benchmarks
⏩ 138 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Simulation | new_bp_prim_test_between[i64, 32768] | 177.3 µs | 236.4 µs | -24.99% |

Comparing ch-sc:reverse-order-scans (70ebbce) with develop (44a6367)

Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

@ch-sc ch-sc mentioned this pull request May 5, 2026
Author

ch-sc commented May 5, 2026

Hi @connortsui20, thanks for the feedback. I created issue #7787 to discuss the implementation details further.

> We also consider Vortex to be a "scalar" query engine, where we essentially always know where values are located in a column (row indices), and thus ORDER BY and JOINs are not supported in Vortex. So this might not fit our model, but at the same time since it is literally just a reversal of the direction in which we scan, maybe this can fit?

I totally see where you are coming from. I think this really is a data access optimization which can be applied when data properties line up. It doesn't add sorting capabilities to Vortex. There might be things that I have overlooked though - I'm fairly new to Vortex.

Contributor

connortsui20 commented May 5, 2026

Hi @ch-sc, just a heads up that I converted the issue to a discussion. Also, if you haven't already, please feel free to join the public Vortex Slack! I'm going to post the discussion there since I feel other people might have some thoughts on this.

In the meantime, I'm going to make this PR a draft.

@connortsui20 connortsui20 marked this pull request as draft May 5, 2026 10:44