@ngxson (Collaborator) commented Dec 13, 2025

Fix #17989

Related discussion: #16736 (comment)

| Argument | Explanation |
| --- | --- |
| `--kv-unified, -kvu` | use single unified KV buffer shared across all sequences (default: enabled if number of slots is auto) (env: `LLAMA_ARG_KV_UNIFIED`) |
| `-np, --parallel N` | number of server slots (default: -1, -1 = auto) (env: `LLAMA_ARG_N_PARALLEL`) |
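
To illustrate how the two defaults interact, here is a minimal Python sketch of the resolution rule as the help text describes it. It is not the server's actual code: `resolve_slots` and `auto_slots` are hypothetical names, and the number of slots chosen in auto mode is left as a caller-supplied placeholder because the help text does not state it. The sketch also assumes that an explicit `-np` value leaves the unified KV buffer disabled unless `--kv-unified` is passed.

```python
from typing import Optional, Tuple

N_PARALLEL_AUTO = -1  # sentinel from the help text: -1 = auto


def resolve_slots(n_parallel: int,
                  kv_unified: Optional[bool],
                  auto_slots: int) -> Tuple[int, bool]:
    """Return (number of slots, whether the KV buffer is unified).

    n_parallel: value of -np / --parallel (-1 means auto).
    kv_unified: True/False if the user set --kv-unified explicitly,
                None if the flag was left unset.
    auto_slots: hypothetical slot count used in auto mode (an assumption,
                not taken from the help text).
    """
    is_auto = n_parallel == N_PARALLEL_AUTO

    # An explicit -np value is used as-is; only auto picks the placeholder.
    n_slots = auto_slots if is_auto else n_parallel

    # Unified KV defaults to enabled only when the slot count is auto.
    unified = is_auto if kv_unified is None else kv_unified

    return n_slots, unified
```

Under these assumptions, `resolve_slots(1, None, auto_slots=4)` keeps the single slot that was requested, while `resolve_slots(-1, None, auto_slots=4)` falls back to the auto slot count with the unified buffer enabled.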

@ggerganov (Member) left a comment

Thanks for taking care of this 👍

@ngxson merged commit 7b1db3d into ggml-org:master on Dec 16, 2025
78 checks passed

Successfully merging this pull request may close these issues.

Misc. bug: "--parallel 1" initializes 4 slots, while docs say default is 1