Conversation

@ngxson ngxson commented Dec 14, 2025

At first glance, it seems to be an easy model to support, as the HF implementation is pretty much the same as Qwen2.5VL.

However, there are some very subtle differences that even some LLMs will miss (I tried both Grok and Gemini 3, and both missed the first two points):

  1. For the text model, the M-RoPE ordering is non-Neox. Because ggml's M-RoPE uses Neox ordering by default, we need to convert the weights to Neox ordering during conversion. This is by far the most complicated change needed to support this model
  2. Learned position embedding interpolation uses bicubic instead of bilinear
  3. A norm layer is added right after the patch bias
  4. RMS norm is used

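To make point 1 concrete, here is a minimal numpy sketch (function names are hypothetical, not the PR's actual conversion code) of why a one-time row permutation of the Q/K projection weights lets a Neox-style RoPE kernel reproduce the interleaved rotation the model was trained with: Neox rotates pairs `(x[i], x[i + d/2])`, while the interleaved ("normal") style rotates `(x[2i], x[2i+1])`.

```python
# Sketch only: hypothetical names, not the PR's actual conversion code.
import numpy as np

def interleaved_to_neox(w: np.ndarray, head_dim: int) -> np.ndarray:
    """Permute rows within each head: even rotary dims first, then odd."""
    rows, cols = w.shape
    out = w.reshape(rows // head_dim, head_dim // 2, 2, cols)  # (heads, d/2, 2, cols)
    out = out.transpose(0, 2, 1, 3)                            # (heads, 2, d/2, cols)
    return out.reshape(rows, cols)

def rope_interleaved(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Rotate pairs (x[2i], x[2i+1]) by theta[i] (non-Neox / 'normal' ordering)."""
    y = np.empty_like(x)
    c, s = np.cos(theta), np.sin(theta)
    y[0::2] = x[0::2] * c - x[1::2] * s
    y[1::2] = x[0::2] * s + x[1::2] * c
    return y

def rope_neox(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Rotate pairs (x[i], x[i + d/2]) by theta[i] (Neox ordering)."""
    d2 = x.shape[0] // 2
    y = np.empty_like(x)
    c, s = np.cos(theta), np.sin(theta)
    y[:d2] = x[:d2] * c - x[d2:] * s
    y[d2:] = x[:d2] * s + x[d2:] * c
    return y
```

Because the permutation is baked into the weights once at conversion time, no runtime kernel change is needed.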
The embedding output was tested against HF transformers and confirmed to match.
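For reference, point 4's RMS norm is the standard formulation, sketched here in numpy (the actual change presumably lives in the ggml graph):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # RMSNorm: scale by the root mean square; unlike LayerNorm there is
    # no mean subtraction and (in this formulation) no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```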

Important

RoPE ordering is corrected during conversion - no backend changes are required in this PR.

Testing

https://huggingface.co/zai-org/GLM-4.6V-Flash

I'm using ./tools/mtmd/test-1.jpeg, which is already included in this repo:

llama-mtmd-cli -m ..... -mm ..... --image ./tools/mtmd/test-1.jpeg -p "extract all texts from this image" --temp 0 -n 1024

Output:

"All the News That's Fit to Print"

Then the newspaper title: "The New York Times"

"LATE CITY EDITION"

"VOL. CXLVIII, No. 40,711"

"NEW YORK, MONDAY, JULY 21, 1969"

"10 CENTS"

Then the main headline: "MEN WALK ON MOON"

Next: "ASTRONAUTS LAND ON PLAIN; COLLECT ROCKS, PLANT FLAG"

Then a section: "Voice From Moon: 'Eagle Has Landed'"

Then the article by John Noble Wilford: "A Powdery Surface Is Closely Explored"

Now, let's transcribe each part carefully, including smaller text.

First, the top left box:

"All the News  
That's Fit to Print"

Then the newspaper header:

"The New York Times"

"LATE CITY EDITION"

"VOL. CXLVIII, No. 40,711"

"NEW YORK, MONDAY, JULY 21, 1969"

"10 CENTS"

@github-actions github-actions bot added the model, examples, python, ggml, and Apple Metal labels on Dec 14, 2025
@github-actions github-actions bot added the testing label on Dec 15, 2025
@ngxson ngxson removed the testing, ggml, and Apple Metal labels on Dec 15, 2025

tarruda commented Dec 15, 2025

@ngxson Can this branch be used with GLM 4.6V (the 106B one)? I can assist with testing if desired.


ngxson commented Dec 15, 2025

@tarruda just tested, it should work with the latest commit (feel free to give it a try)

@ngxson ngxson marked this pull request as ready for review December 15, 2025 14:30

tarruda commented Dec 15, 2025

> @tarruda just tested, it should work with the latest commit (feel free to give it a try)

Will do. Did you publish any GGUF weights?

Comment on lines +7840 to +7843
case LLM_ARCH_GLM4:
return model->hparams.use_mrope() ? LLAMA_ROPE_TYPE_MROPE : LLAMA_ROPE_TYPE_NORM;
case LLM_ARCH_GLM4_MOE:
return model->hparams.use_mrope() ? LLAMA_ROPE_TYPE_MROPE : LLAMA_ROPE_TYPE_NEOX;
ngxson (Collaborator Author) commented on the diff:

Because the two models (vision and non-vision) are mostly the same except for the RoPE mode, I was reluctant to duplicate it into a new arch (which would involve quite a lot of copy-paste code).

I hope that we can somewhat allow de-duplicating some code via #18051

In the meantime, lmk if you're OK with keeping this hack, or if a new arch is still preferable @ggerganov @CISC


ngxson commented Dec 15, 2025

> Will do. Did you publish any GGUF weights?

No, because there is a chance we will change the arch name.


IIIIIllllIIIIIlllll commented Dec 16, 2025

> @tarruda just tested, it should work with the latest commit (feel free to give it a try)

> Will do. Did you publish any GGUF weights?

Sorry for the interruption.

I have quantized a Q4 model, but the current PR does not yet support the vision module.

Edited: 2025-12-16 10:37
Sorry, I have too many local branches and I got confused.


CISC commented Dec 16, 2025

> Sorry for the interruption.

> I have quantized a Q4 model, but the current PR does not yet support the vision module.

Yes it does?

@IIIIIllllIIIIIlllll

It works well (the 106B one) :)

(screenshot attached)

@ngxson ngxson merged commit 3d86c6c into ggml-org:master Dec 16, 2025
73 of 80 checks passed