Make block_size flexible#235
Make block_size flexible#235razdoburdin wants to merge 11 commits intointel:rfsaliev/cpp-runtime-bindingfrom
Conversation
ahuber21
left a comment
There was a problem hiding this comment.
Great! This should do what we want it to do. I'd refine the tests a bit and maybe improve the syntax. We can brainstorm together what the cleanest implementation would look like.
| ) { | ||
| auto threadpool = default_threadpool(); | ||
|
|
||
| auto blocksize_bytes = svs::lib::PowerOfTwo(blocksize_exp); |
There was a problem hiding this comment.
Since this is a runtime value, we'd probably need some validation. Do you think it could be a good idea to create a struct similar to PowerOfTwo and use that as function arg rather than the plain int. We could perform some boundary checks in the constructor.
There was a problem hiding this comment.
Yes, that make sense.
What values of exponents are acceptable? In the best of my understanding, blocksize_bytes should be in range [4KB, 1GB] (based on possible page sizes
) , right?There was a problem hiding this comment.
Yes, that sounds reasonable. @mihaic did you ever experiment with even bigger hugepages?
There was a problem hiding this comment.
Since Xeon only goes to 1 GiB, we can have that as the limit.
There was a problem hiding this comment.
I have added check on object construction
bindings/cpp/tests/runtime_test.cpp
Outdated
| std::vector<size_t> labels(test_n); | ||
| std::iota(labels.begin(), labels.end(), 0); | ||
|
|
||
| int block_size_exp = 17; // block_size = 2^block_size_exp |
There was a problem hiding this comment.
Beyond testing if passing the param works, we should also test if the block size actually changed.
There was a problem hiding this comment.
I have added getter, to check the actual value of blocksize
| // Abstract interface for Dynamic Vamana-based indexes. | ||
| struct SVS_RUNTIME_API DynamicVamanaIndex : public VamanaIndex { | ||
| virtual Status add(size_t n, const size_t* labels, const float* x) noexcept = 0; | ||
| virtual Status add(size_t n, const size_t* labels, const float* x, int blocksize_exp = 30) noexcept = 0; |
There was a problem hiding this comment.
I'm wondering if there's a cleaner way to communicate this value to init_impl(). Maybe @rfsaliev has an opinion?
rfsaliev
left a comment
There was a problem hiding this comment.
Thank you for the PR, there is some functionality to be implemented.
| struct SVS_RUNTIME_API DynamicVamanaIndex : public VamanaIndex { | ||
| virtual Status add(size_t n, const size_t* labels, const float* x) noexcept = 0; | ||
| virtual Status | ||
| add(size_t n, const size_t* labels, const float* x, int blocksize_exp = 30 |
There was a problem hiding this comment.
Seems like this approach manages block size for an index created from scratch.
How will we manage block size for an index loaded from a stream?
There was a problem hiding this comment.
blocksize is directly loaded from stream together with all the data, so for indexes being already saved the default blocksize would be used (as far as it has been already saved).
I have added a check in runtime tests to verify that blocksize is writed and saved correctly.
bindings/cpp/src/svs_runtime_utils.h
Outdated
| static StorageType init( | ||
| const svs::data::ConstSimpleDataView<float>& data, | ||
| Pool& pool, | ||
| svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes) |
There was a problem hiding this comment.
SQDataset<>::compress() accepts customized allocator via allocator argument...
bindings/cpp/src/svs_runtime_utils.h
Outdated
| static StorageType init( | ||
| const svs::data::ConstSimpleDataView<float>& data, | ||
| Pool& pool, | ||
| svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes) |
There was a problem hiding this comment.
LVQDataset<>::compress() accepts customized allocator via allocator argument...
bindings/cpp/src/svs_runtime_utils.h
Outdated
| static StorageType init( | ||
| const svs::data::ConstSimpleDataView<float>& data, | ||
| Pool& pool, | ||
| svs::lib::PowerOfTwo SVS_UNUSED(blocksize_bytes), |
There was a problem hiding this comment.
LeanDataset<>::compress() accepts customized allocator via allocator argument...
ethanglaser
left a comment
There was a problem hiding this comment.
We can fix CI with a new LTO static binary produced after merging this into rfsaliev/cpp-runtime-binding
|
@razdoburdin please re-open PR to main branch now that #208 was merged |
This PR adds ability to set custom index blocksize Reopening of #235 The new parameter structure for build method of `DynamicVamanaIndex` and `DynamicVamanaIndexLeanVec` is introduced. Example of building a LeanVec index with block_size = 2^12 ``` svs::runtime::v0::DynamicVamanaIndex* index = nullptr; svs::runtime::v0::VamanaIndex::BuildParams build_params{64}; svs::runtime::v0::VamanaIndex::DynamicIndexParams dynamic_index_params{12}; svs::runtime::v0::Status status = svs::runtime::v0::DynamicVamanaIndexLeanVec::build( &index, test_d, svs::runtime::v0::MetricType::L2, svs::runtime::v0::StorageKind::LeanVec4x4, 32, build_params, {}, dynamic_index_params ); ``` --------- Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com> Co-authored-by: Andreas Huber <9201869+ahuber21@users.noreply.github.com>
This small PR adds an ability to set block_size value while building index. If parameter isn't set the default value of 2^30 is used.