Releases · ggml-org/llama.cpp

19 Dec 12:28

98c1c7a

b7480 Latest

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
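
Deployment scripts that hard-code one archive type will break when the Linux assets switch format. A minimal sketch, assuming the script already has the downloaded file on disk, that picks the reader from the file name. `extract_release` is a hypothetical helper name for illustration, not part of llama.cpp or its tooling:

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> None:
    """Unpack a release archive, choosing the reader from the file name.

    Hypothetical helper: handles both the current .zip assets and the
    announced .tar.gz assets, so a script keeps working across the switch.
    """
    dest.mkdir(parents=True, exist_ok=True)
    name = archive.name
    if name.endswith((".tar.gz", ".tgz")):
        # Only unpack archives from a source you trust: extractall writes
        # whatever member paths the archive contains.
        with tarfile.open(archive, "r:gz") as tf:
            tf.extractall(dest)
    elif name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive format: {name}")
```

Dispatching on the suffix rather than on the platform means the same script can handle a release that ships both formats, as the macOS assets in this release do.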

presets: refactor, allow cascade presets from different sources, add global section (#18169)

presets: refactor, allow cascade presets from different sources
update docs
fix neg arg handling
fix empty mmproj
also filter out server-controlled args before to_ini()
skip loading custom_models if not specified
fix unset_reserved_args
fix crash on windows

Assets 28

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-12-19T12:28:45Z

cudart-llama-bin-win-cuda-13.1-x64.zip
sha256:f96935e7e385e3b2d0189239077c10fe8fd7e95690fea4afec455b1b6c7e3f18
384 MB 2025-12-19T12:28:56Z

llama-b7480-bin-310p-openEuler-aarch64.tar.gz
sha256:844a94ca5a4e431a21f4775454f1dd59441249467e925a010503a10584aef4e5
41.8 MB 2025-12-19T12:29:07Z

llama-b7480-bin-310p-openEuler-x86.tar.gz
sha256:df913e301b92d1e16402a225bdcf4558812612e124da3f3b9121cca04f403322
45.8 MB 2025-12-19T12:29:09Z

llama-b7480-bin-910b-openEuler-aarch64.tar.gz
sha256:b82fcff877ea8888e1b89d2b65ed1c45e5631c26fc5ab46beda2d556cefc4f04
41.8 MB 2025-12-19T12:29:11Z

llama-b7480-bin-910b-openEuler-x86.tar.gz
sha256:10a9be98dac9a35dbbec62c77ffabbc1a8eb5e1f0e421f98b2a4c0e829552e9f
45.8 MB 2025-12-19T12:29:12Z

llama-b7480-bin-macos-arm64.tar.gz
sha256:06e61f053739bce155e5562f0b261a05c5102343ce279e89f930bc201ffbb3d1
15.9 MB 2025-12-19T12:29:14Z

llama-b7480-bin-macos-arm64.zip
sha256:23e7609d6edd7d0ecc9dfb72223eb5cf59871ab91c4106c6337c85e587ada8d7
15.9 MB 2025-12-19T12:29:15Z

llama-b7480-bin-macos-x64.tar.gz
sha256:450bc3340fd4debfba9cb420ffff18e72c840e9dcf236fa3773d300a311d9fcf
40.9 MB 2025-12-19T12:29:16Z

llama-b7480-bin-macos-x64.zip
sha256:d399e59858658ab369c68b51fe3a77441f1034e61dc1d740bba635cf5bdbbd7e
40.9 MB 2025-12-19T12:29:18Z

Source code (zip)
2025-12-19T11:08:20Z

Source code (tar.gz)
2025-12-19T11:08:20Z
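
Each binary asset above is listed with a `sha256:` digest, so a download can be checked before it is unpacked. A minimal sketch of that check; the helper names `sha256_of` and `matches_published` are hypothetical, and the file path in the usage comment is a placeholder:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published: str) -> bool:
    """Compare a file against a digest written as 'sha256:<hex>' or bare hex."""
    expected = published.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of(path) == expected


# Usage (placeholder path; the digest is the one listed for this asset):
# ok = matches_published(
#     Path("llama-b7480-bin-macos-arm64.tar.gz"),
#     "sha256:06e61f053739bce155e5562f0b261a05c5102343ce279e89f930bc201ffbb3d1",
# )
```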

19 Dec 06:44

github-actions

b7476

cdbada8

b7476

vulkan: Add perf logger mode with concurrency (#17944)

This implements a variation of the perf logger in which, rather than timing
each operation individually (with what is effectively a barrier in between),
the timing boundaries are placed where we already synchronize, so that we time
the groups of work that normally overlap. This helps show whether individual
operations need to be optimized or whether the group is already running
efficiently.

GGML_VK_PERF_LOGGER_CONCURRENT=1 enables the new mode (when
GGML_VK_PERF_LOGGER is also set).

GGML_VK_SYNC_LOGGER=1 replaces the ENABLE_SYNC_LOGGING compile time switch.
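
The switches above are plain environment variables, so they can be set per process from a launcher script. A minimal sketch; `vk_perf_logger_env` is a hypothetical helper, and using the value "1" for `GGML_VK_PERF_LOGGER` is an assumption (the note only says it must be set):

```python
import os
from typing import Mapping, Optional


def vk_perf_logger_env(
    concurrent: bool = True,
    sync_logger: bool = False,
    base: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build an environment for a child process with the Vulkan perf logger on.

    Hypothetical helper. The variable names come from the release note; the
    value "1" for GGML_VK_PERF_LOGGER is assumed.
    """
    env = dict(os.environ if base is None else base)
    env["GGML_VK_PERF_LOGGER"] = "1"
    if concurrent:
        # The concurrent mode only takes effect when the base logger is set too.
        env["GGML_VK_PERF_LOGGER_CONCURRENT"] = "1"
    if sync_logger:
        env["GGML_VK_SYNC_LOGGER"] = "1"
    return env


# Usage (binary and model paths are placeholders):
# import subprocess
# subprocess.run(["./llama-bench", "-m", "model.gguf"], env=vk_perf_logger_env())
```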

Assets 28

19 Dec 00:47

github-actions

b7475

8ea958d

b7475

model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)

ASR with LFM2-Audio-1.5B
Set rope_theta
Fix comment
Remove rope_theta setting
Address PR feedback
rename functions to conformer
remove some redundant ggml_cont
fix missing tensor
add prefix "a." for conv tensors
remove redundant reshape
clean up
add test model

Co-authored-by: Tarek Dakhran

Assets 28

18 Dec 16:20

github-actions

b7472

4d1316c

b7472

arg: fix ASAN error on sampler_type_names empty (#18167)

Assets 28

18 Dec 15:01

github-actions

b7470

54189c0

b7470

remove i_major_dual (#18157)

Co-authored-by: zhang hui

Assets 28

17 Dec 16:45

github-actions

b7446

5c0d188

b7446

llama.android : Rewrite Android binding (w/o cpu_features dep) (#17413)

UI: implement basic UI components
util: implement performance monitor; wrap it with a viewmodel
util: implement user preferences utility
UI: implement core flow's screens
UI: add a new MainActivity; update manifest
[WIP] DI: implement simple local vm factory provider
UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark
UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark
UI: split a nested parent settings screen into separate child settings screens
UI: polish system prompt setup UI
Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject
DB: setup Room database
data: introduce repo for System Prompt; flow data from Room to VM
bugfix: properly handle user's quitting conversation screen while tokens in generation
UI: rename ModeSelection to ModelLoading for better clarity
UI: update app name to be more Arm
UI: polish conversation screen
data: code polish
UI: code polish
bugfix: handle user quitting on model loading
UI: locks user in alert dialog when model is unloading
vm: replace token metrics stubs with actual implementation
UI: refactor top app bars
nit: combine temperatureMetrics and useFahrenheit
DI: introduce Hilt plugin + processor + lib dependencies
DI: make app Hilt injectable
DI: make viewmodels Hilt injectable
DI: replace manual DI with Hilt DI
UI: optimize AppContent's composing
bugfix: wait for model to load before navigating to benchmark screen; use NavigationActions instead of raw navController
UI: navigation with more natural animated transitions
DI: Optimize AppModule
Feature: Introduce ModelRepository and ModelsManagementViewModel; update AppModule
UI: polish UI for ModelsManagementScreen; inject ModelsManagementViewModel
DI: abstract the protocol of SystemPromptRepository; update AppModule
data: [WIP] prepare for ModelRepository refactor & impl
data: introduce Model entity and DAO; update DI module
UI: replace Models Management screen's stubbing with instrumentation
UI: polish sort order menu
data: import local model with file picker
bugfix: use List instead of Collection for ModelDao's deletion
data: add a util file for extracting file name & size and model metadata
UI: enrich ModelManagementState; extract filename to show correct importing UI
UI: implement multiple models deletion; update Models Management screen
UI: handle back navigation when user is in multi-selection mode
util: extract file size formatting into ModelUtils
UI: add a confirmation step when user picks a file; refactor model import overlay into AlertDialog
UI: extract a shared ModelCard component
UI: replace model selection screen's data stubbing; add empty view
nit: tidy SystemPromptViewModel
Util: split FileUtils from ModelUtils; extract copy methods into FileUtils
data: pass through getModelById from ModelDao into ModelRepository
core: extract conversation and benchmark logics into InferenceManager; add logs and missing state updates in stub InferenceEngine
vm: split mono MainViewModel into separate individual ViewModels
vm: merge SystemPromptViewModel into ModelLoadingViewModel
core: break down InferenceManager due to Interface Segregation Principle
UI: show model card in Model Loading screen
UI: show model card in Conversation screen
UI: unify Model Card components
core: swap in LLamaAndroid and mark stub engine for testing only
data: allow canceling the ongoing model import
UI: update UI ongoing model import's cancellation
LLama: update engine state after handling the cancellation of sendUserPrompt
VM: handle the cancellation of ongoing token generation
LLama: refactor loadModel by splitting the system prompt setting into a separate method
feature: check for available space before copying local model
UI: centralize the AppScaffold and modularize its configs
UI: refactor BottomBarConfig.ModelsManagement APIs
UI: combine TopBarConfig and BottomBarConfig into each route's ScaffoldConfig
UI: replace ugly optional as casts in AppScaffold with extension functions
UI: fix the typo totalGb in StorageMetrics
UI: remove code duplication in sort menu
LLama: add ModelUnloadingState to engine State; add missing state checks in stub engine; fix instrumentation engine's error messages
UI: refactor back handling by removing centralized BackHandlerSetup and UnloadModelConfirmationDialog from AppContent
UI: implement BenchmarkScreen's individual back handling
LLama: add a new Initializing state; add two extension properties; rename LibraryLoaded state to Initialized
UI: Introduce an abstract ViewModel to handle additional model unloading logics
UI: expose a single facade ModelUnloadDialogHandler; move UnloadModelState into ModelUnloadingViewModel.kt
UI: migrate ModelLoadingScreen onto ModelLoadingViewModel; update & refine ModelLoadingScreen
UI: migrate ConversationViewModel onto ModelLoadingViewModel; update & refine ConversationScreen
nit: extract app name into a constant value; remove unused onBackPressed callbacks
UI: update AppContent to pass in correct navigation callbacks
nit: polish ModelLoadingScreen UI
core: throw Exception instead of returning null if model fails to load
navigation: sink model loading state management from AppContent down into ModelLoadingScreen; pass ModelLoadingMetrics to Benchmark and Conversation screens
gguf: add GGUF metadata data holder and its corresponding extractor implementation
DB: introduce Kotlin serialization extension's library and plugin; add Room runtime library
GGUF: make GgufMetadata serializable in order to be compatible with Room
nit: refactor data.local package structure
nit: rename lastUsed field to dateLastUsed; add dateAdded field
UI: refactor ModelCard UI to show GGUF metadata
UI: update ModelSelectionScreen with a preselect mechanism
UI: polish model card
nit: allow deselect model on Model Selection screen
nit: revert accidental committing of debug code
UI: polish ModelLoading screen
util: extract formatting helper functions from FileUtils into a new FormatUtils
UI: polish model cards on Benchmark and Conversation screens to show model loading metrics
UI: show a Snack bar to warn user that system prompt is not always supported
UI: handle back press on Model Selection screen
UI: finally support theme modes; remove hardcoded color schemes, default to dynamic color scheme implementation
feature: support searching on Model Selection screen
nit: move scaffold related UI components into a separate package
UI: extract InfoView out into a separate file for reusability
data: move Model related actions (query, filter, sort) into ModelInfo file
UI: animate FAB on model preselection states
feature: support filtering in Model Management screen
ui: show empty models info in Model Management screen
ui: add filter off icon to "Clear filters" menu item
[WIP] ui: polish Benchmark screen; implement its bottom app bar
ui: polish Benchmark screen; implement its bottom app bar's rerun and share
nit: disable mode selection's radio buttons when loading model
feature: implement Conversation screen's bottom app bar
pkg: restructure BottomAppBars into separate files in a child package
pkg: restructure TopBarApps into separate files in a child package
pkg: restructure system metrics into a separate file
UI: polish Conversation screen
data: update system prompt presets
UI: allow hide or show model card on Conversation & Benchmark screens; fix message arrangement
data: update & enhance system prompt presets
deps: introduce Retrofit2
data: implement HuggingFace data model, data source with Retrofit API
data: update Model data repository to support fetching HuggingFace models
[WIP] UI: replace the HuggingFace stub in Model Management screen with actual API call
UI: map language codes into country Emojis
ui: add "clear results" action to Benchmark screen
nit: print current pp & tg in llama-bench
UI: disable landscape mode; prevent duplicated benchmark running
llama: migrate C/CXX flags into CMakeList
[WIP] llama: ABI split builds five .so artifacts.

However, all five .so artifacts perform at the SVE level

[WIP] llama: ABI split where five tiers are built sequentially.
[WIP] llama: disable OpenMP in ABI split since most SoCs are big.LITTLE
[WIP] llama: enable KleidiAI and disable tier 4 due to +sve+sve2 bug caused by ggml_add_cpu_backend_variant_impl as explained below

if (NOT SME_ENABLED MATCHES -1)
...
    set(PRIVATE_ARCH_FLAGS "-fno-tree-vectorize;${PRIVATE_ARCH_FLAGS}+sve+sve2")
...

core: add Google's cpu_features as a submodule
core: implement cpu_detector native lib
core: swap out hardcoded LlamaAndroid library loading
core: add back OpenMP due to huge perf loss on TG128
misc: reorg the pkg structure
misc: rename LlamaAndroid related class to InferenceEngine prefixes
[WIP] lib: move GgufMetadata into the lib submodule
lib: expose GgufMetadataReader as interface only
lib: replace the naive & plain SharedPreferences with DataStore implementation
lib: hide the internal implementations, only expose a facade and interfaces
lib: expose Arm features
di: add a stub TierDetection; provide both actual impl and stub in AppModule
UI: add visualizer UI for Arm features
misc: UI polish
lib: refactored InferenceEngineLoader; added a NONE Llama Tier
UI: ...

Assets 28

17 Dec 12:25

github-actions

b7445

4b2a477

b7445

arg: allow -kvu flag for llama-perplexity (#18117)

The -kvu (--kv-unified) flag is required for the hellaswag and winogrande
benchmarks, which use coupled sequences. Without a unified KV cache,
these benchmarks fail with:

split_equal: sequential split is not supported when there are
coupled sequences in the input batch (you may need to use the -kvu flag)

This change adds LLAMA_EXAMPLE_PERPLEXITY to the allowed examples for
the -kvu argument, enabling its use with llama-perplexity.
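
With this change, a benchmark run can pass -kvu directly to llama-perplexity. A minimal sketch that assembles such a command line; `perplexity_cmd` is a hypothetical helper, the `-m`, `-f`, `--hellaswag` and `--winogrande` options are assumed from the tool's usual options rather than taken from this note, and the file names are placeholders:

```python
import shlex
from typing import List


def perplexity_cmd(model: str, data: str, benchmark: str) -> List[str]:
    """Return an argv list for a coupled-sequence benchmark run.

    Hypothetical helper. Only -kvu is taken from the release note; the other
    options are assumptions about llama-perplexity's command line.
    """
    if benchmark not in ("hellaswag", "winogrande"):
        raise ValueError(f"unknown benchmark: {benchmark}")
    return [
        "llama-perplexity",
        "-m", model,
        "-f", data,
        f"--{benchmark}",
        "-kvu",  # unified KV cache, needed because these tasks couple sequences
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for inspection; nothing is executed here.
    print(shlex.join(perplexity_cmd("model.gguf", "hellaswag_val.txt", "hellaswag")))
```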

Assets 28

17 Dec 07:31

github-actions

b7444

5806286

b7444

ggml : use WARP_SIZE/2 for argmax reduction offset (#18092)

Assets 28

17 Dec 05:23

github-actions

b7442

d0794e8

b7442

llama-fit-params: force disable mlock (#18103)

Assets 28

17 Dec 04:56

github-actions

b7441

9dcac6c

b7441

llama-fit-params: lower ctx size for multi GPU (#18101)

Assets 28
