feat: support mla disagg pd pull mode via mooncake transfer on mlu device.#1167
feat: support mla disagg pd pull mode via mooncake transfer on mlu device.#1167phantomlei3 wants to merge 4 commits intojd-opensource:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request enables Disaggregated Prefill/Decode (PD) support for the MLU backend by integrating and refactoring the Mooncake transfer engine. Key changes include the introduction of a descriptor-based memory registration system (using BufferDesc and MemInfo) and a new GetRegisteredMemory RPC to synchronize memory layouts between nodes, which is essential for supporting complex architectures like MLA. Additionally, the PR enhances validation logic in LLMEngine::pull_kv_blocks, automates device_ip resolution, and adds comprehensive unit tests for the transfer engine. Review feedback identifies several opportunities to align with the project's style guide, specifically regarding the use of emplace_back for vector operations and the annotation of constant arguments in function calls.
c8d119d to
f024a9f
Compare
| return os.str(); | ||
| } | ||
|
|
||
| std::vector<proto::BufferDesc> to_proto_buffer_descs( |
There was a problem hiding this comment.
The transfer_engine is designed to facilitate data transfer between specified source and destination addresses. It should remain agnostic to the higher-level semantics of the data being transferred, whether it's KV cache, model weights, or any other type.
| size_per_block_ = size_per_block; | ||
| num_layers_ = reg_layers; | ||
| local_memory_info_ = std::move(mem_info); | ||
| core_.set_registered_memory_info(addrs, lens, size_per_block_, num_layers_); |
There was a problem hiding this comment.
these engine core modifications been tested on NPU?
There was a problem hiding this comment.
Not tested due to the fact that NPU is currently not using Mooncake as disagg PD backend. But I think it is going t o be fine on NPU since MLU and NPU has no difference on engine part via Mooncake. We can test it in the future if NPU has switched to Mooncake
| } | ||
|
|
||
| Transport::SegmentHandle handle; | ||
| handle = engine_->openSegment(remote_addr); |
There was a problem hiding this comment.
Previously, during open_session, the decode node would establish a connection with the P-node,
informing the P-node of its own address (addr_) to enable KV cache transfer, with cluster_id set to 0.
However, it now appears to directly open_segment. Where does the remote_addr come from in this new setup?
There was a problem hiding this comment.
remote_addr is not generated inside open_session(). It comes from the peer cache-info registration path: each worker exposes its Mooncake transfer address via get_cache_info(), the scheduler stores those addresses in instance metadata, and link_cluster() passes the registered peer address into open_session().
The reason for changing the flow is that the new MLU PD path needs more than just session bootstrap. After the session is established, we also need to fetch the peer's registered memory metadata and validate the memory layout before issuing block moves, especially for the new per-buffer descriptor path (key/value/index) and asymmetric capacity cases. That is why the local side now opens the peer segment directly and then calls GetRegisteredMemory().
We still keep the reverse OpenSession(addr_) RPC step when cluster_id != 0, so the peer can establish the reverse session as before. The new part here is the explicit remote registered-memory sync/validation, not removing the previous bootstrap mechanism.
f024a9f to
609b8d6
Compare
609b8d6 to
95427f6
Compare
| std::string hostname = host_ip_ + ":" + std::to_string(listen_port_); | ||
| #if defined(USE_NPU) | ||
| int32_t device_id = dev.index(); | ||
| hostname += ":" + get_transfer_engine_device_name(device_id); |
There was a problem hiding this comment.
Npu does not need to add device name now.
| const std::vector<uint64_t>& src_blocks, | ||
| const std::vector<uint64_t>& dst_blocks, | ||
| const std::vector<int64_t>& layer_ids); | ||
| const std::vector<int64_t>& group_ids); |
There was a problem hiding this comment.
what is group ids?
No description provided.