Releases: ggml-org/llama.cpp
b7475
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
- ASR with LFM2-Audio-1.5B
- Set rope_theta
- Fix comment
- Remove rope_theta setting
- Address PR feedback
- rename functions to conformer
- remove some redundant ggml_cont
- fix missing tensor
- add prefix "a." for conv tensors
- remove redundant reshape
- clean up
- add test model
Co-authored-by: Tarek Dakhran [email protected]
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7472
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
arg: fix ASAN error on sampler_type_names empty (#18167)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7470
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
remove i_major_dual (#18157)
Co-authored-by: zhang hui [email protected]
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7446
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
llama.android : Rewrite Android binding (w/o cpu_features dep) (#17413)
- UI: implement basic UI components
- util: implement performance monitor; wrap it with a viewmodel
- util: implement user preferences utility
- UI: implement core flow's screens
- UI: add a new MainActivity; update manifest
- [WIP] DI: implement simple local vm factory provider
- UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark
- UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark
- UI: split a nested parent settings screen into separate child settings screens
- UI: polish system prompt setup UI
- Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject
- DB: setup Room database
- data: introduce repo for System Prompt; flow data from Room to VM
- bugfix: properly handle user's quitting conversation screen while tokens are in generation
- UI: rename `ModeSelection` to `ModelLoading` for better clarity
- UI: update app name to be more Arm
- UI: polish conversation screen
- data: code polish
- UI: code polish
- bugfix: handle user quitting on model loading
- UI: locks user in alert dialog when model is unloading
- vm: replace token metrics stubs with actual implementation
- UI: refactor top app bars
- nit: combine temperatureMetrics and useFahrenheit
- DI: introduce Hilt plugin + processor + lib dependencies
- DI: make app Hilt injectable
- DI: make viewmodels Hilt injectable
- DI: replace manual DI with Hilt DI
- UI: optimize AppContent's composing
- bugfix: wait for model to load before navigating to benchmark screen; use NavigationActions instead of raw navController
- UI: navigation with more natural animated transitions
- DI: Optimize AppModule
- Feature: Introduce ModelRepository and ModelsManagementViewModel; update AppModule
- UI: polish UI for ModelsManagementScreen; inject ModelsManagementViewModel
- DI: abstract the protocol of SystemPromptRepository; update AppModule
- data: [WIP] prepare for ModelRepository refactor & impl
- data: introduce Model entity and DAO; update DI module
- UI: replace Models Management screen's stubbing with instrumentation
- UI: polish sort order menu
- data: import local model with file picker
- bugfix: use List instead of Collection for ModelDao's deletion
- data: add a util file for extracting file name & size and model metadata
- UI: enrich ModelManagementState; extract filename to show correct importing UI
- UI: implement multiple models deletion; update Models Management screen
- UI: handle back navigation when user is in multi-selection mode
- util: extract file size formatting into ModelUtils
- UI: add a confirmation step when user picks a file; refactor model import overlay into AlertDialog
- UI: extract a shared ModelCard component
- UI: replace model selection screen's data stubbing; add empty view
- nit: tidy SystemPromptViewModel
- Util: split FileUtils from ModelUtils; extract copy methods into FileUtils
- data: pass through getModelById from ModelDao into ModelRepository
- core: extract conversation and benchmark logics into InferenceManager; add logs and missing state updates in stub InferenceEngine
- vm: split mono MainViewModel into separate individual ViewModels
- vm: merge SystemPromptViewModel into ModelLoadingViewModel
- core: break down InferenceManager due to Interface Segregation Principle
- UI: show model card in Model Loading screen
- UI: show model card in Conversation screen
- UI: unify Model Card components
- core: swap in LLamaAndroid and mark stub engine for testing only
- data: allow canceling the ongoing model import
- UI: update UI for the ongoing model import's cancellation
- LLama: update engine state after handling the cancellation of sendUserPrompt
- VM: handle the cancellation of ongoing token generation
- LLama: refactor loadModel by splitting the system prompt setting into a separate method
- feature: check for available space before copying local model
- UI: centralize the AppScaffold and modularize its configs
- UI: refactor BottomBarConfig.ModelsManagement APIs
- UI: combine TopBarConfig and BottomBarConfig into each route's ScaffoldConfig
- UI: replace ugly optional as casts in AppScaffold with extension functions
- UI: fix the typo `totalGb` in `StorageMetrics`
- UI: remove code duplication in sort menu
- LLama: add ModelUnloadingState to engine State; add missing state checks in stub engine; fix instrumentation engine's error messages
- UI: refactor back handling by removing centralized BackHandlerSetup and UnloadModelConfirmationDialog from AppContent
- UI: implement BenchmarkScreen's individual back handling
- LLama: add a new Initializing state; add two extension properties; rename LibraryLoaded state to Initialized
- UI: Introduce an abstract ViewModel to handle additional model unloading logics
- UI: expose a single facade ModelUnloadDialogHandler; move UnloadModelState into ModelUnloadingViewModel.kt
- UI: migrate ModelLoadingScreen onto ModelLoadingViewModel; update & refine ModelLoadingScreen
- UI: migrate ConversationViewModel onto ModelLoadingViewModel; update & refine ConversationScreen
- nit: extract app name into a constant value; remove unused onBackPressed callbacks
- UI: update AppContent to pass in correct navigation callbacks
- nit: polish ModelLoadingScreen UI
- core: throw Exception instead of returning null if model fails to load
- navigation: sink model loading state management from AppContent down into ModelLoadingScreen; pass ModelLoadingMetrics to Benchmark and Conversation screens
- gguf: add GGUF metadata data holder and its corresponding extractor implementation (see the GGUF metadata sketch after this list)
- DB: introduce Kotlin serialization extension's library and plugin; add Room runtime library
- GGUF: make GgufMetadata serializable in order to be compatible with Room
- nit: refactor data.local package structure
- nit: rename lastUsed field to dateLastUsed; add dateAdded field
- UI: refactor ModelCard UI to show GGUF metadata
- UI: update ModelSelectionScreen with a preselect mechanism
- UI: polish model card
- nit: allow deselect model on Model Selection screen
- nit: revert accidental committing of debug code
- UI: polish ModelLoading screen
- util: extract formatting helper functions from FileUtils into a new FormatUtils
- UI: polish model cards on Benchmark and Conversation screens to show model loading metrics
- UI: show a Snack bar to warn user that system prompt is not always supported
- UI: handle back press on Model Selection screen
- UI: finally support theme modes; remove hardcoded color schemes, default to dynamic color scheme implementation
- feature: support searching on Model Selection screen
- nit: move scaffold related UI components into a separate package
- UI: extract InfoView out into a separate file for reusability
- data: move Model related actions (query, filter, sort) into ModelInfo file
- UI: animate FAB on model preselection states
- feature: support filtering in Model Management screen
- ui: show empty models info in Model Management screen
- ui: add filter off icon to "Clear filters" menu item
- [WIP] ui: polish Benchmark screen; implement its bottom app bar
- ui: polish Benchmark screen; implement its bottom app bar's rerun and share
- nit: disable mode selection's radio buttons when loading model
- feature: implement Conversation screen's bottom app bar
- pkg: restructure BottomAppBars into separate files in a child package
- pkg: restructure TopBarApps into separate files in a child package
- pkg: restructure system metrics into a separate file
- UI: polish Conversation screen
- data: update system prompt presets
- UI: allow hide or show model card on Conversation & Benchmark screens; fix message arrangement
- data: update & enhance system prompt presets
- deps: introduce Retrofit2
- data: implement HuggingFace data model, data source with Retrofit API
- data: update Model data repository to support fetching HuggingFace models
- [WIP] UI: replace the HuggingFace stub in Model Management screen with actual API call
- UI: map language codes into country Emojis
- ui: add "clear results" action to Benchmark screen
- nit: print current pp & tg in llama-bench
- UI: disable landscape mode; prevent duplicated benchmark running
- llama: migrate C/CXX flags into CMakeList
- [WIP] llama: ABI split builds five .so artifacts. However, all .so are performing at the SVE level
- [WIP] llama: ABI split where five tiers are built sequentially.
- [WIP] llama: disable OpenMP in ABI split since most SoCs are big.LITTLE
- [WIP] llama: enable KleidiAI and disable tier 4 due to a `+sve+sve2` bug caused by `ggml_add_cpu_backend_variant_impl`, as explained below:
  ```cmake
  if (NOT SME_ENABLED MATCHES -1)
  ...
  set(PRIVATE_ARCH_FLAGS "-fno-tree-vectorize;${PRIVATE_ARCH_FLAGS}+sve+sve2")
  ...
  ```
- core: add Google's cpu_features as a submodule
- core: implement cpu_detector native lib
- core: swap out hardcoded LlamaAndroid library loading
- core: add back OpenMP due to huge perf loss on TG128
- misc: reorg the pkg structure
- misc: rename LlamaAndroid related class to InferenceEngine prefixes
- [WIP] lib: move GgufMetadata into the lib submodule
- lib: expose GgufMetadataReader as interface only
- lib: replace the naive & plain SharedPreferences with DataStore implementation
- lib: hide the internal implementations, only expose a facade and interfaces
- lib: expose Arm features
- di: add a stub TierDetection; provide both actual impl and stub in AppModule
- UI: add visualizer UI for Arm features
- misc: UI polish
- lib: refactored InferenceEngineLoader; added a `NONE` Llama Tier (a runtime tier-selection sketch follows this list)
- UI: ...
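The GGUF-related commits above add a metadata holder and extractor so the app's model cards can display fields read from a model's GGUF header. The app's reader is written in Kotlin; purely as an illustration of the idea, here is a minimal C++ sketch that pulls the same kind of metadata through ggml's C GGUF API (assuming a recent ggml where this API is declared in `gguf.h`; the key names follow the usual GGUF conventions):

```cpp
// Sketch: dump a few GGUF header fields that a model card might show.
// Link against ggml; the GGUF API used here is assumed from recent ggml (gguf.h).
#include "gguf.h"

#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    struct gguf_init_params params = {
        /*.no_alloc =*/ true,    // metadata only, do not allocate tensor data
        /*.ctx      =*/ nullptr,
    };

    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        std::fprintf(stderr, "failed to open %s\n", argv[1]);
        return 1;
    }

    // Look up a couple of conventional GGUF keys.
    const char * keys[] = { "general.architecture", "general.name" };
    for (const char * key : keys) {
        const int64_t id = gguf_find_key(ctx, key);
        if (id >= 0 && gguf_get_kv_type(ctx, id) == GGUF_TYPE_STRING) {
            std::printf("%s = %s\n", key, gguf_get_val_str(ctx, id));
        }
    }

    std::printf("kv pairs: %lld, tensors: %lld\n",
                (long long) gguf_get_n_kv(ctx), (long long) gguf_get_n_tensors(ctx));

    gguf_free(ctx);
    return 0;
}
```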
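Several of the later llama/core commits concern building multiple arm64 .so tiers and choosing one at load time, with the final binding shipping without the cpu_features dependency (per the PR title). The sketch below shows one way such runtime tier selection can be done from Linux hwcaps; the tier-to-library mapping and the file names are hypothetical and are not taken from the PR's InferenceEngineLoader:

```cpp
// Minimal sketch (not the PR's actual loader): pick an arm64 .so tier from
// Linux hwcaps, without any cpu_features dependency.
// The library names below are hypothetical placeholders.
#include <string>
#include <cstdio>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP, AT_HWCAP2
#include <asm/hwcap.h>  // HWCAP_* / HWCAP2_* feature bits
#endif

static std::string pick_tier() {
#if defined(__aarch64__) && defined(__linux__)
    const unsigned long hwcap  = getauxval(AT_HWCAP);
    const unsigned long hwcap2 = getauxval(AT_HWCAP2);
#ifdef HWCAP2_SME
    if (hwcap2 & HWCAP2_SME)    return "libllama_sme.so";      // hypothetical top tier
#endif
#ifdef HWCAP2_SVE2
    if (hwcap2 & HWCAP2_SVE2)   return "libllama_sve2.so";
#endif
#ifdef HWCAP_SVE
    if (hwcap  & HWCAP_SVE)     return "libllama_sve.so";
#endif
#ifdef HWCAP_ASIMDDP
    if (hwcap  & HWCAP_ASIMDDP) return "libllama_dotprod.so";
#endif
#endif
    return "libllama_baseline.so";  // NONE tier: portable fallback build
}

int main() {
    std::printf("selected tier: %s\n", pick_tier().c_str());
    return 0;
}
```

If a given HWCAP macro is missing from the headers in use, the corresponding check simply compiles out and the next lower tier is considered.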
b7445
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
arg: allow -kvu flag for llama-perplexity (#18117)
The -kvu (--kv-unified) flag is required for hellaswag and winogrande
benchmarks which use coupled sequences. Without unified KV cache,
these benchmarks fail with:
split_equal: sequential split is not supported when there are
coupled sequences in the input batch (you may need to use the -kvu flag)
This change adds LLAMA_EXAMPLE_PERPLEXITY to the allowed examples for
the -kvu argument, enabling its use with llama-perplexity.
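For anyone embedding llama.cpp rather than using the CLI, the same behavior can be requested when creating the context. The sketch below is illustrative only: it assumes the kv_unified field on llama_context_params and the current model/context creation entry points in llama.h (names have changed across releases), plus a placeholder model path:

```cpp
// Sketch: create a context with a unified KV cache, the programmatic
// counterpart of passing -kvu / --kv-unified on the command line.
// Assumes the kv_unified field exists in llama_context_params (recent llama.h).
#include "llama.h"

#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf" /* placeholder path */, mparams);
    if (!model) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx      = 4096;
    cparams.n_seq_max  = 4;     // e.g. several coupled sequences, as in the hellaswag benchmark
    cparams.kv_unified = true;  // single KV buffer shared by all sequences (what -kvu toggles)

    llama_context * ctx = llama_init_from_model(model, cparams);
    if (!ctx) {
        std::fprintf(stderr, "failed to create context\n");
        llama_model_free(model);
        return 1;
    }

    // ... evaluate batches that contain coupled sequences here ...

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```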
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7444
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
ggml : use WARP_SIZE/2 for argmax reduction offset (#18092)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7442
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
llama-fit-params: force disable mlock (#18103)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7441
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
llama-fit-params: lower ctx size for multi GPU (#18101)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7440
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
llama-fit-params: fix underflow for dense models (#18095)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7439
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
llama-fit-params: QoL impr. for prints/errors (#18089)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12)
- Windows x64 (CUDA 13)
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: