[TASK-87] Optimize copying in CPP bindings #330

Open
fresh-borzoni wants to merge 4 commits into apache:main from fresh-borzoni:cpp-memory-optimize-opaque-types

@fresh-borzoni
Contributor

@fresh-borzoni fresh-borzoni commented Feb 15, 2026

Summary

Closes #87

Replace the value-copying FFI layer with opaque Rust-backed types, giving zero-copy reads and cheaper writes in the C++ bindings.

Changes

  • Opaque types: replace the Datum/FfiDatum/FfiGenericRow/FfiScanRecord copying infrastructure with three opaque Rust types: GenericRowInner (writes), ScanResultInner (scan reads), LookupResultInner (lookup reads)
  • Zero-copy reads: the new RowView class borrows directly from Rust scan results; the LookupResult class replaces the (bool found, GenericRow out) out-parameter pattern
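To illustrate the difference between the copying and borrowing styles described above, here is a minimal C++17 sketch. Everything in it is a hypothetical stand-in: `ScanBuffer` plays the role of a Rust-owned scan result, and `RowViewSketch` and `LookupResultSketch` only imitate the shape of the RowView and LookupResult classes named in this PR. It is not the actual binding code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Rust-owned result: one contiguous byte buffer
// plus (offset, length) pairs locating each string field.
struct ScanBuffer {
    std::string bytes;
    std::vector<std::pair<std::size_t, std::size_t>> spans;

    void push(std::string_view value) {
        spans.emplace_back(bytes.size(), value.size());
        bytes.append(value);
    }
};

// Copying style: every field access materializes a new owned std::string.
inline std::string copy_field(const ScanBuffer& buf, std::size_t i) {
    const auto& [off, len] = buf.spans[i];
    return buf.bytes.substr(off, len);
}

// Borrowing style: the view owns nothing and points into the buffer.
// Views (and the string_views they return) must not outlive the buffer,
// and the buffer must not be modified while they are in use.
class RowViewSketch {
public:
    explicit RowViewSketch(const ScanBuffer& buf) : buf_(&buf) {}

    std::size_t field_count() const { return buf_->spans.size(); }

    std::string_view get_string(std::size_t i) const {
        const auto& [off, len] = buf_->spans[i];
        return std::string_view(buf_->bytes.data() + off, len);
    }

private:
    const ScanBuffer* buf_;
};

// Result-object style lookup: one value answers "was it found?" and gives
// access to the row, instead of a (bool found, Row out) pair.
class LookupResultSketch {
public:
    LookupResultSketch() = default;  // represents "not found"
    explicit LookupResultSketch(ScanBuffer row) : row_(std::move(row)) {}

    bool found() const { return row_.has_value(); }

    // Precondition: found() is true.
    RowViewSketch view() const { return RowViewSketch(*row_); }

private:
    std::optional<ScanBuffer> row_;
};
```

In this sketch `get_string` returns a pointer into the existing buffer, while `copy_field` allocates for every call; that per-field allocation is the kind of cost the opaque-type approach is meant to avoid.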

Benchmark Results

Environment: Apple M1 Max, 32 GB RAM, macOS 15.7.1, local Fluss cluster
Workload: 1M rows per run, averaged over 100 runs

Two scan configurations:

  • 8 mixed fields: int, string, float, int, date, time, timestamp, timestamp_ltz
  • 4×100B strings: int + 4 string fields (100 bytes each, ~381 MB raw data)
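The ~381 MB figure for the string configuration can be reproduced with a short calculation. The sketch below assumes that the figure counts only the string payload (1M rows × 4 fields × 100 bytes) and that "MB" is meant in binary units (MiB); both are assumptions, and the helper names are made up for illustration.

```cpp
#include <cstdint>

// Raw payload of the string columns only (the int column is not counted here).
constexpr std::uint64_t raw_string_bytes(std::uint64_t rows,
                                         std::uint64_t fields,
                                         std::uint64_t bytes_per_field) {
    return rows * fields * bytes_per_field;
}

// Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

With these assumptions, `raw_string_bytes(1000000, 4, 100)` is 400,000,000 bytes, and `to_mib` of that is about 381.5, which matches the quoted figure.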

Each scan polls all 1M records from the server, accumulates them in memory,
then reads every field. Reported time covers both phases (poll + field access).

Memory is measured as C++ heap delta (malloc_zone_statistics size_in_use)
before and after accumulating all records.

Throughput

| Metric | main | opaque types | Change |
|---|---|---|---|
| Append fire-and-forget | 1,634 ms | 1,176 ms | -28% |
| Append with ack | 1,662 ms | 1,109 ms | -33% |
| Scan (8 mixed fields) | 639 ms | 222 ms | -65% |
| Scan (4×100B strings) | 1,115 ms | 349 ms | -69% |

Memory (held during scan)

| Metric | main | opaque types | Change |
|---|---|---|---|
| Scan (8 mixed fields) | 915 MB | 119 MB | -87% |
| Scan (4×100B strings) | 1,344 MB | 471 MB | -65% |
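The Change column in both tables is consistent with a relative difference against main, rounded to the nearest percent. A small sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cmath>

// Relative change of `candidate` against `baseline`, in percent.
// Negative values mean the candidate is lower (faster or smaller).
inline double percent_change(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}

// The same value rounded to the nearest whole percent.
inline long rounded_percent_change(double baseline, double candidate) {
    return std::lround(percent_change(baseline, candidate));
}
```

For example, `rounded_percent_change(1634, 1176)` gives -28 for the fire-and-forget append, and `rounded_percent_change(915, 119)` gives -87 for the memory held during the 8-field scan.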

Memory figures reflect worst-case accumulation. In typical streaming usage memory stays flat, but per-batch overhead is still reduced since the intermediate FfiDatum copy layer is eliminated entirely.

@fresh-borzoni
Contributor Author

@zhaohaidao @luoyuxia PTAL 🙏

@luoyuxia
Contributor

cc @leekeiabstraction

@luoyuxia
Contributor

@fresh-borzoni Thanks for the PR. Could you share benchmark results here that show the performance improvement?

Copilot AI left a comment

Pull request overview

This PR refactors the C++ bindings FFI layer to remove per-row value copying and instead use opaque Rust-backed types for writes and reads, enabling zero-copy scan reads and a more efficient lookup API.

Changes:

  • Replace FfiDatum/FfiGenericRow/FfiScanRecords copying with opaque Rust types (GenericRowInner, ScanResultInner, LookupResultInner) exposed via cxx.
  • Add zero-copy read APIs in C++ (RowView, ScanRecords iterator) and replace lookup (found, out_row) with LookupResult.
  • Add AppendArrowBatch to write Arrow RecordBatches via the Arrow C Data Interface.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| bindings/cpp/src/types.rs | Updates type-id mappings (adds Char/Binary) and introduces resolve_row_types / owned conversion helpers for the new opaque-row write/read paths. |
| bindings/cpp/src/table.cpp | Implements C++ wrappers for the new opaque types (GenericRow, RowView, ScanRecords, LookupResult) and updates writers/scanners to use them. |
| bindings/cpp/src/lib.rs | Replaces row/scan/lookup copy-based FFI structs with opaque Rust types and accessor methods; adds Arrow batch import for append. |
| bindings/cpp/src/ffi_converter.hpp | Removes old GenericRow/ScanRecords conversions and adds support for custom_properties in table descriptors/info. |
| bindings/cpp/include/fluss.hpp | Updates the public C++ API: write-only GenericRow, new RowView/ScanRecords/LookupResult, new ChangeType, and custom_properties. |
| bindings/cpp/examples/kv_example.cpp | Migrates lookup example to LookupResult and demonstrates name-based getters. |
| bindings/cpp/examples/example.cpp | Migrates scan iteration to the new ScanRecords API and adds an AppendArrowBatch example. |

@fresh-borzoni
Contributor Author

fresh-borzoni commented Feb 15, 2026

Addressed comments.
Added a simple benchmark table to demonstrate the difference. It was run against my local Fluss cluster, so the numbers are only indicative.

Contributor

@leekeiabstraction leekeiabstraction left a comment

Thank you for the PR! I'm not used to C++, so I'm a bit out of my depth here, but I left some comments.

Contributor

@zhaohaidao zhaohaidao left a comment

@fresh-borzoni Thanks for your PR. I left some minor comments.

@fresh-borzoni
Contributor Author

@leekeiabstraction @zhaohaidao
Ty for the review. Addressed comments.
PTAL 🙏

@fresh-borzoni fresh-borzoni force-pushed the cpp-memory-optimize-opaque-types branch from 5975580 to 9fd91b3 on February 15, 2026 15:37
Contributor

@leekeiabstraction leekeiabstraction left a comment

Thank you for the revision! I left a couple of comments.

@fresh-borzoni
Contributor Author

@leekeiabstraction @zhaohaidao Ty for the review!

Addressed comments.
PTAL 🙏

@fresh-borzoni fresh-borzoni force-pushed the cpp-memory-optimize-opaque-types branch from 47ad6df to 0bb4679 on February 15, 2026 23:52
Contributor

@zhaohaidao zhaohaidao left a comment

@fresh-borzoni Thank you. I left only one comment.

@fresh-borzoni
Contributor Author

@zhaohaidao Thanks for the review. Answered comment. PTAL 🙏

@zhaohaidao
Contributor

> @zhaohaidao Thanks for the review. Answered comment. PTAL 🙏

Thanks for your explanation. LGTM

Development

Successfully merging this pull request may close these issues.

Optimize copying during C++ and Rust data conversion.

4 participants