Releases: ggml-org/llama.cpp
b6050
Fix params bug in diffusion example (#14993)
b6049
llama : allow other bufts when overriding to CPU, add --no-repack opt…
b6048
Vulkan: Fix minor debug mode issues (#14899)
* vulkan: fix debug mode issues
* vulkan: remove broken check_results GGML_OP_SET_ROWS support
b6047
mtmd : support MiniCPM-V 4.0 (#14983)
* support minicpm-v 4
* add md
* support MiniCPM-o 4.0
* add default location
* temp rm MiniCPM-o 4.0
* fix code
* fix "minicpmv_projector" default path
b6045
server : implement universal assisted decoding (#12635)
* llama-server : implement universal assisted decoding
* Erase prompt tail for kv-cache
* set vocab_dft_compatible in common_speculative
* rename ctx_main to ctx_tgt
* move vocab_dft_compatible to spec struct
* clear mem_dft, remove mem
* detokenize id_last for incompatible models
* update comment
* add --spec-replace flag
* accept special tokens when translating between draft/main models
* Escape spec-replace
* clamp draft result size to params.n_draft
* fix comment
* clean up code
* restore old example
* log common_speculative_are_compatible in speculative example
* fix
* Update common/speculative.cpp (Co-authored-by: Georgi Gerganov <[email protected]>)

Co-authored-by: Georgi Gerganov <[email protected]>
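The entry above describes translating a draft model's speculative output into the main model's token space when the two vocabularies are incompatible. Below is a toy Python sketch of that idea only, not llama.cpp code: detokenize the draft tokens to text, apply `--spec-replace`-style string substitutions, then retokenize with the target vocabulary. All vocabularies and helper names here are hypothetical stand-ins.

```python
# Conceptual sketch of "universal" assisted decoding between models with
# incompatible vocabularies. The tokenizers below are toy stand-ins, not
# the llama.cpp implementation.

def draft_detokenize(ids):
    # toy draft vocab: token id -> text piece
    vocab = {0: "Hel", 1: "lo", 2: " world"}
    return "".join(vocab[i] for i in ids)

def target_tokenize(text):
    # toy target vocab: greedy longest-match over known pieces
    vocab = {"Hello": 10, " world": 11}
    ids, i = [], 0
    while i < len(text):
        for piece, tid in sorted(vocab.items(), key=lambda kv: -len(kv[0])):
            if text.startswith(piece, i):
                ids.append(tid)
                i += len(piece)
                break
        else:
            raise ValueError(f"untokenizable text at {i}: {text[i:]!r}")
    return ids

def translate_draft(ids, replacements):
    # `replacements` plays the role of --spec-replace source/target pairs,
    # applied on the detokenized text before retokenizing.
    text = draft_detokenize(ids)
    for src, dst in replacements:
        text = text.replace(src, dst)
    return target_tokenize(text)

print(translate_draft([0, 1, 2], replacements=[]))  # [10, 11]
```

The design point is that text, not token IDs, is the common currency: the draft model's IDs never have to line up with the target model's.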
b6044
llama : merge build_moe_ffn_from_probs function into build_moe_ffn (#…
b6043
server : add openai-style logit_bias support (#14946)
Signed-off-by: Lukas Straub <[email protected]>
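Since llama-server exposes an OpenAI-compatible API, the new `logit_bias` field follows the OpenAI request shape: a map from token-ID strings to a bias in [-100, 100]. A minimal sketch of such a request body, assuming the `/v1/chat/completions` route; the token ID `15043` is a placeholder and is model-specific.

```python
import json

# Sketch of an OpenAI-style request body using logit_bias.
# Token IDs depend on the loaded model's vocabulary; "15043" is
# illustrative only.
payload = {
    "model": "any",  # llama-server serves the loaded model regardless of name
    "messages": [{"role": "user", "content": "Say hello."}],
    # OpenAI convention: -100 effectively bans a token, +100 strongly
    # favors it.
    "logit_bias": {"15043": -100},
}

body = json.dumps(payload)
print(body)
```

This body would be POSTed to the server as JSON; the bias is applied to the named tokens' logits before sampling.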
b6042
Add LLaDA 8b Diffusion model (#14771)
* Add support for Llada-8b: diffusion model
* Add README
* Fix README and convert_hf_to_gguf
* convert_hf_to_gguf.py: address review comments
* Make everything in a single example
* Remove model-specific sampling
* Remove unused argmax
* Remove braced initializers, improve README.md a bit
* Add diffusion-specific gguf params in set_vocab, remove setting rope_theta and rms_norm_eps
* Remove adding the mask token
* Move add_add_bos_token to set_vocab
* use add_bool in gguf_writer.py
b6041
CANN: Improve loading efficiency after converting weights to NZ forma…
b6040
graph : reduce splits for recurrent and hybrid models (#14825)
* graph : avoid creating redundant s_copy views
* graph : comment the s_copy views