model: support GLM4V vision encoder #18042
Conversation
@ngxson Can this branch be used with GLM 4.6V (the 106B one)? I can assist with testing if desired.

@tarruda just tested, it should work with the latest commit (feel free to give it a try)

Will do. Did you publish any GGUF weights?
```cpp
case LLM_ARCH_GLM4:
    return model->hparams.use_mrope() ? LLAMA_ROPE_TYPE_MROPE : LLAMA_ROPE_TYPE_NORM;
case LLM_ARCH_GLM4_MOE:
    return model->hparams.use_mrope() ? LLAMA_ROPE_TYPE_MROPE : LLAMA_ROPE_TYPE_NEOX;
```
Because the two models (vision and non-vision) are mostly the same except for the rope mode, I was lazy and did not duplicate it into a new arch (which would involve quite a lot of copy-pasted code).
I hope we can eventually de-duplicate some of this code via #18051.
In the meantime, let me know if you're OK with keeping this hack, or if a new arch is still preferable @ggerganov @CISC

No, because there is a chance we will change the arch name.
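For context, here is a minimal, self-contained sketch of the kind of hparams-based dispatch discussed in this thread. The `use_mrope()` helper and the `rope_sections` field below are assumptions used for illustration only; they are not copied from this PR.

```cpp
// Sketch only: choose the rope type from the converted hparams instead of
// introducing a separate vision arch. A checkpoint converted with M-RoPE
// sections set (the vision variant, by assumption) gets MROPE; a text-only
// checkpoint falls back to the arch's usual rope type.
#include <array>
#include <cstdint>

enum rope_type { ROPE_TYPE_NORM, ROPE_TYPE_NEOX, ROPE_TYPE_MROPE };

struct hparams_sketch {
    // Per-dimension M-RoPE section sizes; all zero for a text-only model.
    std::array<int32_t, 4> rope_sections = {0, 0, 0, 0};

    bool use_mrope() const {
        for (int32_t s : rope_sections) {
            if (s != 0) {
                return true;
            }
        }
        return false;
    }
};

// Mirrors the switch in the diff above: same arch, different rope mode.
rope_type pick_rope_type_glm4_moe(const hparams_sketch & hp) {
    return hp.use_mrope() ? ROPE_TYPE_MROPE : ROPE_TYPE_NEOX;
}
```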
Sorry for the interruption. I have quantized a Q4 model, but the current PR does not yet support the vision module.
Yes it does?

At first glance, it seems to be an easy model to support, as the HF implementation is pretty much the same as Qwen2.5VL.
However, there are some very subtle differences that even some LLMs will miss (I tried both Grok and Gemini 3, and both missed the first two points):
The embedding output was tested against HF transformers and confirmed to match.
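As a rough illustration of what "confirmed to match" can mean in practice, here is a hedged sketch of comparing an embedding dumped from HF transformers against one produced by this implementation. Using max absolute difference plus cosine similarity is an assumption for illustration, not the exact procedure used for this PR.

```cpp
// Sketch only: numeric comparison of two embedding vectors (e.g. one dumped
// from HF transformers, one from this implementation). Reports the maximum
// absolute difference and the cosine similarity.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct embed_diff {
    float max_abs_diff;
    float cosine_sim;
};

embed_diff compare_embeddings(const std::vector<float> & a, const std::vector<float> & b) {
    embed_diff d = {0.0f, 0.0f};
    double dot = 0.0, na = 0.0, nb = 0.0;
    const size_t n = std::min(a.size(), b.size());
    for (size_t i = 0; i < n; ++i) {
        d.max_abs_diff = std::max(d.max_abs_diff, std::fabs(a[i] - b[i]));
        dot += (double) a[i] * b[i];
        na  += (double) a[i] * a[i];
        nb  += (double) b[i] * b[i];
    }
    d.cosine_sim = (float) (dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12));
    return d;
}

int main() {
    // Hypothetical vectors standing in for the two dumped embeddings.
    std::vector<float> ref  = {0.12f, -0.98f, 0.33f};
    std::vector<float> test = {0.12f, -0.98f, 0.33f};
    const embed_diff d = compare_embeddings(ref, test);
    printf("max_abs_diff = %g, cosine_sim = %g\n", d.max_abs_diff, d.cosine_sim);
    return 0;
}
```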
Important
RoPE ordering is corrected upon conversion - no backend changes are needed in this PR.
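To make the note above concrete, below is a hedged sketch of the kind of reordering that "corrected upon conversion" can refer to: moving the rotated dimensions of a Q/K projection from an interleaved pair layout to a half-split layout per head. Whether GLM4V requires exactly this permutation is an assumption; the actual fix lives in the conversion code of this PR.

```cpp
// Sketch only: reorder the rows of a row-major Q/K weight so that, within each
// head, the RoPE dimensions go from interleaved pairs (x0, y0, x1, y1, ...)
// to a half-split layout (x0, x1, ..., y0, y1, ...).
#include <algorithm>
#include <cstddef>
#include <vector>

// weight has shape [n_heads * head_dim, n_cols], stored row-major.
std::vector<float> interleaved_to_half_split(const std::vector<float> & weight,
                                             size_t n_heads, size_t head_dim, size_t n_cols) {
    std::vector<float> out(weight.size());
    for (size_t h = 0; h < n_heads; ++h) {
        for (size_t r = 0; r < head_dim; ++r) {
            // First half of the output rows comes from the even source rows,
            // second half from the odd source rows.
            const size_t src = (r < head_dim / 2) ? 2 * r : 2 * (r - head_dim / 2) + 1;
            const float * src_row = &weight[(h * head_dim + src) * n_cols];
            std::copy(src_row, src_row + n_cols, &out[(h * head_dim + r) * n_cols]);
        }
    }
    return out;
}
```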
Testing
https://huggingface.co/zai-org/GLM-4.6V-Flash
I'm using the ./tools/mtmd/test-1.jpeg already included in this repo:
```
llama-mtmd-cli -m ..... -mm ..... --image ./tools/mtmd/test-1.jpeg -p "extract all texts from this image" --temp 0 -n 1024
```
Output: