
feat: support mla disagg pd pull mode via mooncake transfer on mlu device. #1167

Closed
phantomlei3 wants to merge 4 commits into jd-opensource:main from phantomlei3:feat/pd-support-mlu

Conversation

@phantomlei3 (Collaborator):

No description provided.

@gemini-code-assist (Contributor, bot) left a comment:

Code Review

This pull request enables Disaggregated Prefill/Decode (PD) support for the MLU backend by integrating and refactoring the Mooncake transfer engine. Key changes include the introduction of a descriptor-based memory registration system (using BufferDesc and MemInfo) and a new GetRegisteredMemory RPC to synchronize memory layouts between nodes, which is essential for supporting complex architectures like MLA. Additionally, the PR enhances validation logic in LLMEngine::pull_kv_blocks, automates device_ip resolution, and adds comprehensive unit tests for the transfer engine. Review feedback identifies several opportunities to align with the project's style guide, specifically regarding the use of emplace_back for vector operations and the annotation of constant arguments in function calls.

Comment threads (outdated): 4 on xllm/core/framework/kv_cache/mooncake_transfer_engine.cpp, 2 on xllm/core/framework/kv_cache/mooncake_transfer_engine_test.cpp
JimHsiung previously approved these changes Apr 3, 2026
@phantomlei3 phantomlei3 force-pushed the feat/pd-support-mlu branch from c8d119d to f024a9f Compare April 7, 2026 08:32
@XuZhang99 XuZhang99 changed the title feat: support mlu mla pd pull with mooncake transfer. feat: support mla disagg pd pull mode via mooncake transfer on mlu device. Apr 7, 2026
return os.str();
}

std::vector<proto::BufferDesc> to_proto_buffer_descs(
Collaborator:
The transfer_engine is designed to facilitate data transfer between specified source and destination addresses. It should remain agnostic to the higher-level semantics of the data being transferred, whether it's KV cache, model weights, or any other type.

size_per_block_ = size_per_block;
num_layers_ = reg_layers;
local_memory_info_ = std::move(mem_info);
core_.set_registered_memory_info(addrs, lens, size_per_block_, num_layers_);
Collaborator:
Have these engine core modifications been tested on NPU?

Collaborator (author):

Not tested, because NPU currently does not use Mooncake as its disagg PD backend. But it should be fine on NPU, since MLU and NPU share the same engine-side path through Mooncake. We can test it in the future once NPU switches to Mooncake.

}

Transport::SegmentHandle handle;
handle = engine_->openSegment(remote_addr);
Collaborator:
Previously, during open_session, the decode node would establish a connection with the P-node,
informing the P-node of its own address (addr_) to enable KV cache transfer, with cluster_id set to 0.
However, it now appears to directly open_segment. Where does the remote_addr come from in this new setup?

Collaborator (author):

remote_addr is not generated inside open_session(). It comes from the peer cache-info registration path: each worker exposes its Mooncake transfer address via get_cache_info(), the scheduler stores those addresses in instance metadata, and link_cluster() passes the registered peer address into open_session().

The reason for changing the flow is that the new MLU PD path needs more than just session bootstrap. After the session is established, we also need to fetch the peer's registered memory metadata and validate the memory layout before issuing block moves, especially for the new per-buffer descriptor path (key/value/index) and asymmetric capacity cases. That is why the local side now opens the peer segment directly and then calls GetRegisteredMemory().

We still keep the reverse OpenSession(addr_) RPC step when cluster_id != 0, so the peer can establish the reverse session as before. The new part here is the explicit remote registered-memory sync/validation, not removing the previous bootstrap mechanism.

@phantomlei3 phantomlei3 force-pushed the feat/pd-support-mlu branch from f024a9f to 609b8d6 Compare April 8, 2026 09:13
@phantomlei3 phantomlei3 force-pushed the feat/pd-support-mlu branch from 609b8d6 to 95427f6 Compare April 8, 2026 13:32
@phantomlei3 phantomlei3 closed this Apr 9, 2026
std::string hostname = host_ip_ + ":" + std::to_string(listen_port_);
#if defined(USE_NPU)
int32_t device_id = dev.index();
hostname += ":" + get_transfer_engine_device_name(device_id);
Collaborator:
NPU no longer needs the device name appended here.

    const std::vector<uint64_t>& src_blocks,
    const std::vector<uint64_t>& dst_blocks,
-   const std::vector<int64_t>& layer_ids);
+   const std::vector<int64_t>& group_ids);
Collaborator:
What are group_ids?

3 participants