Releases: ggml-org/llama.cpp

b7475

19 Dec 00:47
8ea958d

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
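
A deployment script can handle both formats during the transition by choosing the extractor from the file extension. The following is a minimal Python sketch, assuming the archive has already been downloaded from a trusted source; the helper name `extract_release` is hypothetical and not part of llama.cpp:

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: str, dest: str) -> str:
    """Extract a release archive that is either .zip or .tar.gz.

    Returns the detected format. Hypothetical helper for illustration only;
    it should be used on archives from a trusted source.
    """
    path = Path(archive)
    Path(dest).mkdir(parents=True, exist_ok=True)
    name = path.name.lower()
    if name.endswith(".zip"):
        # Current Linux release format.
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
        return "zip"
    if name.endswith((".tar.gz", ".tgz")):
        # Announced upcoming Linux release format.
        with tarfile.open(path, mode="r:gz") as tf:
            tf.extractall(dest)
        return "tar.gz"
    raise ValueError(f"unsupported archive format: {path.name}")
```

Dispatching on the extension keeps the script working before and after the switch, and the explicit `ValueError` surfaces any further format change instead of silently skipping extraction.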

model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)

  • ASR with LFM2-Audio-1.5B

  • Set rope_theta

  • Fix comment

  • Remove rope_theta setting

  • Address PR feedback

  • rename functions to conformer

  • remove some redundant ggml_cont

  • fix missing tensor

  • add prefix "a." for conv tensors

  • remove redundant reshape

  • clean up

  • add test model


Co-authored-by: Tarek Dakhran

macOS/iOS:

Linux:

Windows:

openEuler:

b7472

18 Dec 16:20
4d1316c

arg: fix ASAN error on sampler_type_names empty (#18167)

b7470

18 Dec 15:01
54189c0

remove i_major_dual (#18157)

Co-authored-by: zhang hui

b7446

17 Dec 16:45
5c0d188

llama.android : Rewrite Android binding (w/o cpu_features dep) (#17413)

  • UI: implement basic UI components

  • util: implement performance monitor; wrap it with a viewmodel

  • util: implement user preferences utility

  • UI: implement core flow's screens

  • UI: add a new MainActivity; update manifest

  • [WIP] DI: implement simple local vm factory provider

  • UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark

  • UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark

  • UI: split a nested parent settings screen into separate child settings screens

  • UI: polish system prompt setup UI

  • Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject

  • DB: setup Room database

  • data: introduce repo for System Prompt; flow data from Room to VM

  • bugfix: properly handle user's quitting conversation screen while tokens in generation

  • UI: rename ModeSelection to ModelLoading for better clarity

  • UI: update app name to be more Arm

  • UI: polish conversation screen

  • data: code polish

  • UI: code polish

  • bugfix: handle user quitting on model loading

  • UI: locks user in alert dialog when model is unloading

  • vm: replace token metrics stubs with actual implementation

  • UI: refactor top app bars

  • nit: combine temperatureMetrics and useFahrenheit

  • DI: introduce Hilt plugin + processor + lib dependencies

  • DI: make app Hilt injectable

  • DI: make viewmodels Hilt injectable

  • DI: replace manual DI with Hilt DI

  • UI: optimize AppContent's composing

  • bugfix: wait for model to load before navigating to benchmark screen; use NavigationActions instead of raw navController

  • UI: navigation with more natural animated transitions

  • DI: Optimize AppModule

  • Feature: Introduce ModelRepository and ModelsManagementViewModel; update AppModule

  • UI: polish UI for ModelsManagementScreen; inject ModelsManagementViewModel

  • DI: abstract the protocol of SystemPromptRepository; update AppModule

  • data: [WIP] prepare for ModelRepository refactor & impl

  • data: introduce Model entity and DAO; update DI module

  • UI: replace Models Management screen's stubbing with instrumentation

  • UI: polish sort order menu

  • data: import local model with file picker

  • bugfix: use List instead of Collection for ModelDao's deletion

  • data: add a util file for extracting file name & size and model metadata

  • UI: enrich ModelManagementState; extract filename to show correct importing UI

  • UI: implement multiple models deletion; update Models Management screen

  • UI: handle back navigation when user is in multi-selection mode

  • util: extract file size formatting into ModelUtils

  • UI: add a confirmation step when user picks a file; refactor model import overlay into AlertDialog

  • UI: extract a shared ModelCard component

  • UI: replace model selection screen's data stubbing; add empty view

  • nit: tidy SystemPromptViewModel

  • Util: split FileUtils from ModelUtils; extract copy methods into FileUtils

  • data: pass through getModelById from ModelDao into ModelRepository

  • core: extract conversation and benchmark logics into InferenceManager; add logs and missing state updates in stub InferenceEngine

  • vm: split mono MainViewModel into separate individual ViewModels

  • vm: merge SystemPromptViewModel into ModelLoadingViewModel

  • core: break down InferenceManager due to Interface Segregation Principle

  • UI: show model card in Model Loading screen

  • UI: show model card in Conversation screen

  • UI: unify Model Card components

  • core: swap in LLamaAndroid and mark stub engine for testing only

  • data: allow canceling the ongoing model import

  • UI: update UI for ongoing model import's cancellation

  • LLama: update engine state after handling the cancellation of sendUserPrompt

  • VM: handle the cancellation of ongoing token generation

  • LLama: refactor loadModel by splitting the system prompt setting into a separate method

  • feature: check for available space before copying local model

  • UI: centralize the AppScaffold and modularize its configs

  • UI: refactor BottomBarConfig.ModelsManagement APIs

  • UI: combine TopBarConfig and BottomBarConfig into each route's ScaffoldConfig

  • UI: replace ugly optional as casts in AppScaffold with extension functions

  • UI: fix the typo totalGb in StorageMetrics

  • UI: remove code duplication in sort menu

  • LLama: add ModelUnloadingState to engine State; add missing state checks in stub engine; fix instrumentation engine's error messages

  • UI: refactor back handling by removing centralized BackHandlerSetup and UnloadModelConfirmationDialog from AppContent

  • UI: implement BenchmarkScreen's individual back handling

  • LLama: add a new Initializing state; add two extension properties; rename LibraryLoaded state to Initialized

  • UI: Introduce an abstract ViewModel to handle additional model unloading logics

  • UI: expose a single facade ModelUnloadDialogHandler; move UnloadModelState into ModelUnloadingViewModel.kt

  • UI: migrate ModelLoadingScreen onto ModelLoadingViewModel; update & refine ModelLoadingScreen

  • UI: migrate ConversationViewModel onto ModelLoadingViewModel; update & refine ConversationScreen

  • nit: extract app name into a constant value; remove unused onBackPressed callbacks

  • UI: update AppContent to pass in correct navigation callbacks

  • nit: polish ModelLoadingScreen UI

  • core: throw Exception instead of returning null if model fails to load

  • navigation: sink model loading state management from AppContent down into ModelLoadingScreen; pass ModelLoadingMetrics to Benchmark and Conversation screens

  • gguf: add GGUF metadata data holder and its corresponding extractor implementation

  • DB: introduce Kotlin serialization extension's library and plugin; add Room runtime library

  • GGUF: make GgufMetadata serializable in order to be compatible with Room

  • nit: refactor data.local package structure

  • nit: rename lastUsed field to dateLastUsed; add dateAdded field

  • UI: refactor ModelCard UI to show GGUF metadata

  • UI: update ModelSelectionScreen with a preselect mechanism

  • UI: polish model card

  • nit: allow deselect model on Model Selection screen

  • nit: revert accidental committing of debug code

  • UI: polish ModelLoading screen

  • util: extract formatting helper functions from FileUtils into a new FormatUtils

  • UI: polish model cards on Benchmark and Conversation screens to show model loading metrics

  • UI: show a Snack bar to warn user that system prompt is not always supported

  • UI: handle back press on Model Selection screen

  • UI: finally support theme modes; remove hardcoded color schemes, default to dynamic color scheme implementation

  • feature: support searching on Model Selection screen

  • nit: move scaffold related UI components into a separate package

  • UI: extract InfoView out into a separate file for reusability

  • data: move Model related actions (query, filter, sort) into ModelInfo file

  • UI: animate FAB on model preselection states

  • feature: support filtering in Model Management screen

  • ui: show empty models info in Model Management screen

  • ui: add filter off icon to "Clear filters" menu item

  • [WIP] ui: polish Benchmark screen; implement its bottom app bar

  • ui: polish Benchmark screen; implement its bottom app bar's rerun and share

  • nit: disable mode selection's radio buttons when loading model

  • feature: implement Conversation screen's bottom app bar

  • pkg: restructure BottomAppBars into separate files in a child package

  • pkg: restructure TopAppBars into separate files in a child package

  • pkg: restructure system metrics into a separate file

  • UI: polish Conversation screen

  • data: update system prompt presets

  • UI: allow hide or show model card on Conversation & Benchmark screens; fix message arrangement

  • data: update & enhance system prompt presets

  • deps: introduce Retrofit2

  • data: implement HuggingFace data model, data source with Retrofit API

  • data: update Model data repository to support fetching HuggingFace models

  • [WIP] UI: replace the HuggingFace stub in Model Management screen with actual API call

  • UI: map language codes into country Emojis

  • ui: add "clear results" action to Benchmark screen

  • nit: print current pp & tg in llama-bench

  • UI: disable landscape mode; prevent duplicated benchmark running

  • llama: migrate C/CXX flags into CMakeList

  • [WIP] llama: ABI split builds five .so artifacts. However, all .so are performing on SVE level

  • [WIP] llama: ABI split where five tiers are built sequentially.

  • [WIP] llama: disable OpenMP in ABI split since most SoCs are big.LITTLE

  • [WIP] llama: enable KleidiAI and disable tier 4 due to +sve+sve2 bug caused by ggml_add_cpu_backend_variant_impl as explained below

    if (NOT SME_ENABLED MATCHES -1)
    ...
        set(PRIVATE_ARCH_FLAGS "-fno-tree-vectorize;${PRIVATE_ARCH_FLAGS}+sve+sve2")
    ...

  • core: add Google's cpu_features as a submodule

  • core: implement cpu_detector native lib

  • core: swap out hardcoded LlamaAndroid library loading

  • core: add back OpenMP due to huge perf loss on TG128

  • misc: reorg the pkg structure

  • misc: rename LlamaAndroid related class to InferenceEngine prefixes

  • [WIP] lib: move GgufMetadata into the lib submodule

  • lib: expose GgufMetadataReader as interface only

  • lib: replace the naive & plain SharedPreferences with DataStore implementation

  • lib: hide the internal implementations, only expose a facade and interfaces

  • lib: expose Arm features

  • di: add a stub TierDetection; provide both actual impl and stub in AppModule

  • UI: add visualizer UI for Arm features

  • misc: UI polish

  • lib: refactored InferenceEngineLoader; added a NONE Llama Tier

  • UI: ...

b7445

17 Dec 12:25
4b2a477

arg: allow -kvu flag for llama-perplexity (#18117)

The -kvu (--kv-unified) flag is required for hellaswag and winogrande
benchmarks which use coupled sequences. Without unified KV cache,
these benchmarks fail with:

split_equal: sequential split is not supported when there are
coupled sequences in the input batch (you may need to use the -kvu flag)

This change adds LLAMA_EXAMPLE_PERPLEXITY to the allowed examples for
the -kvu argument, enabling its use with llama-perplexity.
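
As an illustration of the resulting usage, the sketch below assembles a llama-perplexity command line and appends -kvu only for the two coupled-sequence benchmarks named above. It assumes the tool's existing `-m`, `-f`, `--hellaswag` and `--winogrande` options; the helper name `perplexity_command` and the file names are placeholders, not part of llama.cpp:

```python
import shlex
from typing import List, Optional

# Benchmarks that, per the note above, use coupled sequences and need -kvu.
COUPLED_BENCHMARKS = ("hellaswag", "winogrande")


def perplexity_command(model: str, data: str,
                       benchmark: Optional[str] = None) -> List[str]:
    """Build an argv list for llama-perplexity (hypothetical helper)."""
    cmd = ["llama-perplexity", "-m", model, "-f", data]
    if benchmark is None:
        return cmd  # plain perplexity run, no unified KV cache required
    if benchmark not in COUPLED_BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark}")
    # Select the benchmark and request the unified KV cache it depends on.
    cmd += [f"--{benchmark}", "-kvu"]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute real model and dataset paths.
    argv = perplexity_command("model.gguf", "hellaswag_val_full.txt", "hellaswag")
    print(shlex.join(argv))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues in paths.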

b7444

17 Dec 07:31
5806286

ggml : use WARP_SIZE/2 for argmax reduction offset (#18092)
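
The release note gives only the commit title, so the following is not the ggml kernel. It is a plain-Python simulation of a common warp-level butterfly reduction, showing how a first offset of WARP_SIZE/2 that is halved each step lets every lane end up with the global argmax. The 32-lane warp width and the function name `warp_argmax` are assumptions made for illustration:

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp


def warp_argmax(values):
    """Simulate an XOR-shuffle argmax reduction across one warp.

    Each lane holds a (value, index) pair. At every step a lane compares its
    pair with the pair of lane (lane ^ offset); offset starts at WARP_SIZE // 2
    and is halved until it reaches 0. Ties resolve to the lower index.
    """
    assert len(values) == WARP_SIZE
    lanes = [(v, i) for i, v in enumerate(values)]
    offset = WARP_SIZE // 2
    while offset > 0:
        # All lanes exchange at once, so results go into a fresh list.
        updated = []
        for lane in range(WARP_SIZE):
            v, i = lanes[lane]
            pv, pi = lanes[lane ^ offset]
            if pv > v or (pv == v and pi < i):
                v, i = pv, pi
            updated.append((v, i))
        lanes = updated
        offset //= 2
    return lanes
```

With offsets 16, 8, 4, 2, 1 the exchange pattern covers all 32 lanes in five steps, so each entry of the returned list is the same (maximum value, index) pair.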

b7442

17 Dec 05:23
d0794e8

llama-fit-params: force disable mlock (#18103)

b7441

17 Dec 04:56
9dcac6c

llama-fit-params: lower ctx size for multi GPU (#18101)

b7440

17 Dec 03:32
0e49a7b

llama-fit-params: fix underflow for dense models (#18095)

b7439

17 Dec 03:32
4164596

llama-fit-params: QoL impr. for prints/errors (#18089)
