Releases: ggml-org/llama.cpp

b6050 (31 Jul 18:29, commit 7845240)

Fix params bug in diffusion example (#14993)

b6049 (31 Jul 17:45, commit d6818d0)

llama : allow other bufts when overriding to CPU, add --no-repack opt…

b6048 (31 Jul 17:49, commit e08a988)

Vulkan: Fix minor debug mode issues (#14899)

* vulkan: fix debug mode issues

* vulkan: remove broken check_results GGML_OP_SET_ROWS support

b6047 (31 Jul 15:59, commit 952a47f)

mtmd : support MiniCPM-V 4.0 (#14983)

* support minicpm-v 4

* add md

* support MiniCPM-o 4.0

* add default location

* temp rm MiniCPM-o 4.0

* fix code

* fix "minicpmv_projector" default path

b6045 (31 Jul 14:36, commit 94933c8)

server : implement universal assisted decoding (#12635)

* llama-server : implement universal assisted decoding

* Erase prompt tail for kv-cache

* set vocab_dft_compatible in common_speculative

* rename ctx_main to ctx_tgt

* move vocab_dft_compatible to spec struct

* clear mem_dft, remove mem

* detokenize id_last for incompatible models

* update comment

* add --spec-replace flag

* accept special tokens when translating between draft/main models

* Escape spec-replace

* clamp draft result to size to params.n_draft

* fix comment

* clean up code

* restore old example

* log common_speculative_are_compatible in speculative example

* fix

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

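The entry above describes letting a draft model with an incompatible vocabulary assist the target model by detokenizing draft tokens and re-tokenizing them for the target. A toy sketch of that round-trip idea, using fake whitespace tokenizers (all names here are illustrative, not llama.cpp API):

```python
# Toy illustration of translating draft-model token ids into the
# target model's vocabulary by round-tripping through text: the
# general idea behind "universal" assisted decoding. The tokenizers
# are stand-ins, not real llama.cpp tokenizers.

class ToyTokenizer:
    def __init__(self, vocab):
        self.id_to_tok = dict(enumerate(vocab))
        self.tok_to_id = {t: i for i, t in self.id_to_tok.items()}

    def detokenize(self, ids):
        return " ".join(self.id_to_tok[i] for i in ids)

    def tokenize(self, text):
        return [self.tok_to_id[t] for t in text.split()]

def translate_draft(draft_ids, draft_tok, target_tok):
    """Map draft-model token ids to target-model token ids by
    detokenizing and re-tokenizing."""
    return target_tok.tokenize(draft_tok.detokenize(draft_ids))

draft = ToyTokenizer(["hello", "world", "foo"])
target = ToyTokenizer(["world", "hello", "bar"])

# Draft proposed [0, 1] -> "hello world"; in the target vocab
# those same tokens have ids [1, 0].
print(translate_draft([0, 1], draft, target))  # [1, 0]
```

The real implementation must also handle special tokens and tokenizations that do not align one-to-one, which is why the target model re-verifies every translated draft token.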

b6044 (31 Jul 14:33, commit c1dacaa)

llama : merge build_moe_ffn_from_probs function into build_moe_ffn (#…

b6043 (31 Jul 14:20, commit a9f77a8)

server : add openai-style logit_bias support (#14946)

Signed-off-by: Lukas Straub <[email protected]>
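In the OpenAI API style, `logit_bias` maps token ids (as strings) to a bias in [-100, 100], where -100 effectively bans a token and +100 strongly favors it. A minimal sketch of a request body using it; the token id is a made-up example (real ids depend on the model's tokenizer), and the endpoint path is the usual OpenAI-compatible one:

```python
import json

# Build an OpenAI-style chat request that suppresses one token.
# "15043" is a hypothetical token id used only for illustration.
payload = {
    "model": "local",
    "messages": [{"role": "user", "content": "Say hello"}],
    "logit_bias": {"15043": -100},  # -100 effectively bans the token
}

body = json.dumps(payload)
# This body would be POSTed with any HTTP client to the server's
# OpenAI-compatible chat endpoint, e.g. /v1/chat/completions.
print(body)
```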

b6042 (31 Jul 14:09, commit 8a4a856)

Add LLaDA 8b Diffusion model (#14771)

* Add support for Llada-8b: diffusion model

* Add README

* Fix README and convert_hf_to_gguf

* convert_hf_to_gguf.py: address review comments

* Make everything in a single example

* Remove model-specific sampling

* Remove unused argmax

* Remove braced initializers, improve README.md a bit

* Add diffusion specific gguf params in set_vocab, remove setting rope_theta and rms_norm_eps

* Remove adding the mask token

* Move add_add_bos_token to set_vocab

* use add_bool in gguf_writer.py
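Diffusion language models like LLaDA generate text by iterative demasking rather than left-to-right sampling: start from a fully masked sequence and, at each step, commit the most confident predictions. A toy sketch of that loop with a fake scoring function (none of this is llama.cpp code):

```python
import random

MASK = "<mask>"

def toy_model(tokens):
    """Stand-in for the diffusion model: for each masked position,
    return a (predicted token, confidence) pair. Here both are fake."""
    return {i: (f"tok{i}", random.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_generate(length, steps):
    """Start fully masked; each step, unmask roughly length/steps of
    the highest-confidence predictions until nothing is masked."""
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in tokens:
        preds = toy_model(tokens)
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)
        for i in best[:per_step]:
            tokens[i] = preds[i][0]
    return tokens

print(diffusion_generate(length=8, steps=4))
```

The real example replaces `toy_model` with the model's per-position logits and adds model-specific schedules; the earlier b6050 entry fixes a parameter bug in that example.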

b6041 (31 Jul 13:53, commit 11490b3)

CANN: Improve loading efficiency after converting weights to NZ forma…

b6040 (31 Jul 06:28, commit 66625a5)

graph : reduce splits for recurrent and hybrid models (#14825)

* graph : avoid creating redundant s_copy views

* graph : comment the s_copy views