Releases: ggml-org/llama.cpp
b7480
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
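A deployment script that has to cope with both the current .zip archives and the upcoming .tar.gz archives could pick the extractor from the file name. The following is a minimal sketch using only the Python standard library; the function name `unpack_release` and the archive names in the usage note are hypothetical, not part of llama.cpp.

```python
import shutil
from pathlib import Path


def unpack_release(archive: Path, dest: Path) -> None:
    """Extract a release archive, accepting both .zip and .tar.gz/.tgz."""
    name = archive.name.lower()
    if name.endswith(".zip"):
        fmt = "zip"
    elif name.endswith((".tar.gz", ".tgz")):
        fmt = "gztar"
    else:
        raise ValueError(f"unsupported archive format: {archive.name}")
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive handles both formats once the format is known
    shutil.unpack_archive(str(archive), str(dest), format=fmt)
```

Calling `unpack_release(Path("release.tar.gz"), Path("/opt/llama"))` and `unpack_release(Path("release.zip"), Path("/opt/llama"))` would then go through the same code path, so the script keeps working across the format switch.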
presets: refactor, allow cascade presets from different sources, add global section (#18169)
- presets: refactor, allow cascade presets from different sources
- update docs
- fix neg arg handling
- fix empty mmproj
- also filter out server-controlled args before to_ini()
- skip loading custom_models if not specified
- fix unset_reserved_args
- fix crash on windows
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7476
vulkan: Add perf logger mode with concurrency (#17944)
This implements a variation of the perf logger: rather than timing each operation individually, with what is effectively a barrier in between, it places the timing boundaries where synchronization already happens and times the groups of work that normally overlap. This can help show whether individual operations need to be optimized or whether the group is already running efficiently.
GGML_VK_PERF_LOGGER_CONCURRENT=1 enables the new mode (when GGML_VK_PERF_LOGGER is also set).
GGML_VK_SYNC_LOGGER=1 replaces the ENABLE_SYNC_LOGGING compile time switch.
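The environment variables above can be assembled programmatically before launching a llama.cpp binary. This is a sketch under assumptions: the helper name `vulkan_perf_logger_env` is hypothetical, and the value "1" for GGML_VK_PERF_LOGGER is assumed, since the notes only say the variable must be set.

```python
import os


def vulkan_perf_logger_env(perf_logger=True, concurrent=False,
                           sync_logger=False, base=None):
    """Return an environment dict with the Vulkan logging switches applied."""
    if concurrent and not perf_logger:
        # The concurrent mode only applies when GGML_VK_PERF_LOGGER is also set.
        raise ValueError("concurrent mode requires perf_logger=True")
    env = dict(os.environ if base is None else base)
    if perf_logger:
        env["GGML_VK_PERF_LOGGER"] = "1"  # value "1" is an assumption
    if concurrent:
        env["GGML_VK_PERF_LOGGER_CONCURRENT"] = "1"
    if sync_logger:
        env["GGML_VK_SYNC_LOGGER"] = "1"
    return env
```

The returned dict can be passed as the `env` argument of `subprocess.run` when starting whichever Vulkan-enabled binary is being profiled.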
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7475
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
- ASR with LFM2-Audio-1.5B
- Set rope_theta
- Fix comment
- Remove rope_theta setting
- Address PR feedback
- rename functions to conformer
- remove some redundant ggml_cont
- fix missing tensor
- add prefix "a." for conv tensors
- remove redundant reshape
- clean up
- add test model
Co-authored-by: Tarek Dakhran
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7472
arg: fix ASAN error on sampler_type_names empty (#18167)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7470
remove i_major_dual (#18157)
Co-authored-by: zhang hui
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7446
llama.android : Rewrite Android binding (w/o cpu_features dep) (#17413)
- UI: implement basic UI components
- util: implement performance monitor; wrap it with a viewmodel
- util: implement user preferences utility
- UI: implement core flow's screens
- UI: add a new MainActivity; update manifest
- [WIP] DI: implement simple local vm factory provider
- UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark
- UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark
- UI: split a nested parent settings screen into separate child settings screens
- UI: polish system prompt setup UI
- Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject
- DB: setup Room database
- data: introduce repo for System Prompt; flow data from Room to VM
- bugfix: properly handle user's quitting conversation screen while tokens in generation
- UI: rename ModeSelection to ModelLoading for better clarity
- UI: update app name to be more Arm
- UI: polish conversation screen
- data: code polish
- UI: code polish
- bugfix: handle user quitting on model loading
- UI: locks user in alert dialog when model is unloading
- vm: replace token metrics stubs with actual implementation
- UI: refactor top app bars
- nit: combine temperatureMetrics and useFahrenheit
- DI: introduce Hilt plugin + processor + lib dependencies
- DI: make app Hilt injectable
- DI: make viewmodels Hilt injectable
- DI: replace manual DI with Hilt DI
- UI: optimize AppContent's composing
- bugfix: wait for model to load before navigating to benchmark screen; use NavigationActions instead of raw navController
- UI: navigation with more natural animated transitions
- DI: Optimize AppModule
- Feature: Introduce ModelRepository and ModelsManagementViewModel; update AppModule
- UI: polish UI for ModelsManagementScreen; inject ModelsManagementViewModel
- DI: abstract the protocol of SystemPromptRepository; update AppModule
- data: [WIP] prepare for ModelRepository refactor & impl
- data: introduce Model entity and DAO; update DI module
- UI: replace Models Management screen's stubbing with instrumentation
- UI: polish sort order menu
- data: import local model with file picker
- bugfix: use List instead of Collection for ModelDao's deletion
- data: add a util file for extracting file name & size and model metadata
- UI: enrich ModelManagementState; extract filename to show correct importing UI
- UI: implement multiple models deletion; update Models Management screen
- UI: handle back navigation when user is in multi-selection mode
- util: extract file size formatting into ModelUtils
- UI: add a confirmation step when user picks a file; refactor model import overlay into AlertDialog
- UI: extract a shared ModelCard component
- UI: replace model selection screen's data stubbing; add empty view
- nit: tidy SystemPromptViewModel
- Util: split FileUtils from ModelUtils; extract copy methods into FileUtils
- data: pass through getModelById from ModelDao into ModelRepository
- core: extract conversation and benchmark logics into InferenceManager; add logs and missing state updates in stub InferenceEngine
- vm: split mono MainViewModel into separate individual ViewModels
- vm: merge SystemPromptViewModel into ModelLoadingViewModel
- core: break down InferenceManager due to Interface Segregation Principle
- UI: show model card in Model Loading screen
- UI: show model card in Conversation screen
- UI: unify Model Card components
- core: swap in LLamaAndroid and mark stub engine for testing only
- data: allow canceling the ongoing model import
- UI: update UI ongoing model import's cancellation
- LLama: update engine state after handling the cancellation of sendUserPrompt
- VM: handle the cancellation of ongoing token generation
- LLama: refactor loadModel by splitting the system prompt setting into a separate method
- feature: check for available space before copying local model
- UI: centralize the AppScaffold and modularize its configs
- UI: refactor BottomBarConfig.ModelsManagement APIs
- UI: combine TopBarConfig and BottomBarConfig into each route's ScaffoldConfig
- UI: replace ugly optional as casts in AppScaffold with extension functions
- UI: fix the typo totalGb in StorageMetrics
- UI: remove code duplication in sort menu
- LLama: add ModelUnloadingState to engine State; add missing state checks in stub engine; fix instrumentation engine's error messages
- UI: refactor back handling by removing centralized BackHandlerSetup and UnloadModelConfirmationDialog from AppContent
- UI: implement BenchmarkScreen's individual back handling
- LLama: add a new Initializing state; add two extension properties; rename LibraryLoaded state to Initialized
- UI: Introduce an abstract ViewModel to handle additional model unloading logics
- UI: expose a single facade ModelUnloadDialogHandler; move UnloadModelState into ModelUnloadingViewModel.kt
- UI: migrate ModelLoadingScreen onto ModelLoadingViewModel; update & refine ModelLoadingScreen
- UI: migrate ConversationViewModel onto ModelLoadingViewModel; update & refine ConversationScreen
- nit: extract app name into a constant value; remove unused onBackPressed callbacks
- UI: update AppContent to pass in correct navigation callbacks
- nit: polish ModelLoadingScreen UI
- core: throw Exception instead of returning null if model fails to load
- navigation: sink model loading state management from AppContent down into ModelLoadingScreen; pass ModelLoadingMetrics to Benchmark and Conversation screens
- gguf: add GGUF metadata data holder and its corresponding extractor implementation
- DB: introduce Kotlin serialization extension's library and plugin; add Room runtime library
- GGUF: make GgufMetadata serializable in order to be compatible with Room
- nit: refactor data.local package structure
- nit: rename lastUsed field to dateLastUsed; add dateAdded field
- UI: refactor ModelCard UI to show GGUF metadata
- UI: update ModelSelectionScreen with a preselect mechanism
- UI: polish model card
- nit: allow deselect model on Model Selection screen
- nit: revert accidental committing of debug code
- UI: polish ModelLoading screen
- util: extract formatting helper functions from FileUtils into a new FormatUtils
- UI: polish model cards on Benchmark and Conversation screens to show model loading metrics
- UI: show a Snack bar to warn user that system prompt is not always supported
- UI: handle back press on Model Selection screen
- UI: finally support theme modes; remove hardcoded color schemes, default to dynamic color scheme implementation
- feature: support searching on Model Selection screen
- nit: move scaffold related UI components into a separate package
- UI: extract InfoView out into a separate file for reusability
- data: move Model related actions (query, filter, sort) into ModelInfo file
- UI: animate FAB on model preselection states
- feature: support filtering in Model Management screen
- ui: show empty models info in Model Management screen
- ui: add filter off icon to "Clear filters" menu item
- [WIP] ui: polish Benchmark screen; implement its bottom app bar
- ui: polish Benchmark screen; implement its bottom app bar's rerun and share
- nit: disable mode selection's radio buttons when loading model
- feature: implement Conversation screen's bottom app bar
- pkg: restructure BottomAppBars into separate files in a child package
- pkg: restructure TopBarApps into separate files in a child package
- pkg: restructure system metrics into a separate file
- UI: polish Conversation screen
- data: update system prompt presets
- UI: allow hide or show model card on Conversation & Benchmark screens; fix message arrangement
- data: update & enhance system prompt presets
- deps: introduce Retrofit2
- data: implement HuggingFace data model, data source with Retrofit API
- data: update Model data repository to support fetching HuggingFace models
- [WIP] UI: replace the HuggingFace stub in Model Management screen with actual API call
- UI: map language codes into country Emojis
- ui: add "clear results" action to Benchmark screen
- nit: print current pp & tg in llama-bench
- UI: disable landscape mode; prevent duplicated benchmark running
- llama: migrate C/CXX flags into CMakeList
- [WIP] llama: ABI split builds five .so artifacts. However, all .so are performing on SVE level
- [WIP] llama: ABI split where five tiers are built sequentially.
- [WIP] llama: disable OpenMP in ABI split since most SoCs are big.LITTLE
- [WIP] llama: enable KleidiAI and disable tier 4 due to the +sve+sve2 bug caused by ggml_add_cpu_backend_variant_impl, as explained below
  if (NOT SME_ENABLED MATCHES -1)
  ...
  set(PRIVATE_ARCH_FLAGS "-fno-tree-vectorize;${PRIVATE_ARCH_FLAGS}+sve+sve2")
  ...
- core: add Google's cpu_features as a submodule
- core: implement cpu_detector native lib
- core: swap out hardcoded LlamaAndroid library loading
- core: add back OpenMP due to huge perf loss on TG128
- misc: reorg the pkg structure
- misc: rename LlamaAndroid related class to InferenceEngine prefixes
- [WIP] lib: move GgufMetadata into the lib submodule
- lib: expose GgufMetadataReader as interface only
- lib: replace the naive & plain SharedPreferences with DataStore implementation
- lib: hide the internal implementations, only expose a facade and interfaces
- lib: expose Arm features
- di: add a stub TierDetection; provide both actual impl and stub in AppModule
- UI: add visualizer UI for Arm features
- misc: UI polish
- lib: refactored InferenceEngineLoader; added a NONE Llama Tier
- UI: ...
b7445
arg: allow -kvu flag for llama-perplexity (#18117)
The -kvu (--kv-unified) flag is required for hellaswag and winogrande benchmarks which use coupled sequences. Without unified KV cache, these benchmarks fail with:
split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
This change adds LLAMA_EXAMPLE_PERPLEXITY to the allowed examples for the -kvu argument, enabling its use with llama-perplexity.
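A benchmark wrapper could add the flag whenever it runs one of the coupled-sequence benchmarks. The sketch below only builds the argument list. The helper name `perplexity_cmd` and the file names in the usage note are hypothetical, and the `--hellaswag`, `--winogrande` and `--*-tasks` options are assumed from llama-perplexity's existing command line, so it is worth checking `llama-perplexity --help` for the build in use.

```python
def perplexity_cmd(model, data_file, benchmark, tasks=None,
                   binary="llama-perplexity"):
    """Build an argv list for a hellaswag or winogrande run with -kvu."""
    if benchmark not in ("hellaswag", "winogrande"):
        raise ValueError(f"unknown benchmark: {benchmark}")
    cmd = [binary, "-m", model, "-f", data_file, f"--{benchmark}"]
    if tasks is not None:
        # Assumed option name: --hellaswag-tasks / --winogrande-tasks
        cmd += [f"--{benchmark}-tasks", str(tasks)]
    # Both benchmarks use coupled sequences, so request the unified KV cache.
    cmd.append("-kvu")
    return cmd
```

For example, `perplexity_cmd("model.gguf", "hellaswag_val_full.txt", "hellaswag", tasks=400)` yields a list that can be handed to `subprocess.run`.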
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7444
ggml : use WARP_SIZE/2 for argmax reduction offset (#18092)
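The commit title only says that the argmax reduction offset is now derived from WARP_SIZE. As a general illustration, not the actual ggml kernel, the following pure-Python simulation of a butterfly (XOR-shuffle) reduction shows why the first offset has to be half the warp width for every lane to end up with the global maximum. The function name and the lane model are assumptions made for this sketch.

```python
def butterfly_argmax(values, first_offset):
    """Simulate an XOR-shuffle argmax across len(values) lanes.

    Each lane holds (value, index). At every step, lane i compares its pair
    with the pair held by lane i ^ offset and keeps the larger value
    (ties go to the lower index). The offset is halved until it reaches 0.
    """
    lanes = [(v, i) for i, v in enumerate(values)]
    offset = first_offset
    while offset > 0:
        lanes = [
            max(lanes[i], lanes[i ^ offset], key=lambda p: (p[0], -p[1]))
            for i in range(len(lanes))
        ]
        offset //= 2
    return lanes


if __name__ == "__main__":
    warp_size = 64  # e.g. a 64-wide wavefront; 32 on many other GPUs
    data = list(range(warp_size))
    full = butterfly_argmax(data, warp_size // 2)
    partial = butterfly_argmax(data, 16)
    print(full[0], partial[0])
```

With 64 lanes and a first offset of 32, every lane converges on the same (value, index) pair. Starting at a fixed 16 only reduces within each group of 32 lanes, so lane 0 never sees values held by lanes 32 to 63.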
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7442
llama-fit-params: force disable mlock (#18103)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7441
llama-fit-params: lower ctx size for multi GPU (#18101)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: