mtmd: add Eagle2-VL vision and projector support #17360
base: master
Conversation
ngxson left a comment:
Can you explicitly confirm if part of the PR is generated by AI? I feel very suspicious about some redundant code
While you said in the PR description that you tested it, you haven't even mentioned a link to the model, or how you tested it.
tools/mtmd/clip.cpp (Outdated)

```cpp
learned_pos_embd,
nullptr);

// keep runtime quiet in normal runs; shapes are correct by construction
```
some indentations seem off here
tools/mtmd/clip.cpp (Outdated)

```cpp
if (model.mm_0_b) {
    embeddings = ggml_add(ctx0, embeddings, model.mm_0_b);
}

embeddings = ggml_gelu(ctx0, embeddings);

GGML_ASSERT(model.mm_2_w != nullptr);
// keep [n_in, n_tokens] layout for the second matmul as well
embeddings = ggml_reshape_2d(ctx0, embeddings, embeddings->ne[0], embeddings->ne[1]);
embeddings = ggml_cont_2d(ctx0, embeddings, embeddings->ne[0], embeddings->ne[1]);
// Weights are canonicalized at conversion time to [n_in, n_out]; multiply directly.
embeddings = ggml_mul_mat(ctx0, model.mm_2_w, embeddings);
if (model.mm_2_b) {
    embeddings = ggml_add(ctx0, embeddings, model.mm_2_b);
}
```
better replacing this whole block with build_ffn
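For context, the sequence in the snippet above (matmul, optional bias, GELU, matmul, optional bias) is the standard two-layer FFN pattern that a helper like build_ffn factors out. In numpy terms the computation is roughly the following; shapes and names here are hypothetical, not the PR's code:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, similar to what ggml uses
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def projector_mlp(x, w0, b0, w2, b2):
    # x: (n_tokens, n_in); w0: (n_hidden, n_in); w2: (n_out, n_hidden)
    h = x @ w0.T + b0        # first Linear (mm.0)
    h = gelu(h)              # activation
    return h @ w2.T + b2     # second Linear (mm.2)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
y = projector_mlp(x,
                  rng.standard_normal((16, 8)), np.zeros(16),
                  rng.standard_normal((6, 16)), np.zeros(6))
print(y.shape)  # (4, 6)
```

Factoring the pattern into one helper call removes the manual bias checks and the redundant reshape/cont pair the reviewer flagged.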
```python
mlp_pos = name.find("mlp1.")
if mlp_pos != -1:
    mlp_suffix = name[mlp_pos + len("mlp1."):]
    # Skip LayerNorm (mlp1.0.*)
    if mlp_suffix.startswith("0."):
        return []
    # Map first Linear (mlp1.1.*) -> mm.0.*
    if mlp_suffix.startswith("1."):
        new_name = "mm.0." + mlp_suffix[2:]
        if new_name.endswith(".weight"):
```
I think all of this code is redundant. This model: https://huggingface.co/nvidia/Eagle2-1B has a simple .mlp.fc1 and .mlp.fc2 MLP; there is no nested mlp1.1.* as you described.
convert_hf_to_gguf.py (Outdated)

```python
]

# 5) Conv3D patch embed -> two Conv2D kernels
if name.endswith("patch_embed.proj.weight") and data_torch.ndim == 5:
```
are you sure about this? seems like bad copy-paste code from QwenVL
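For reference, the Qwen2-VL-style transformation being questioned splits a temporal patch-embedding kernel along its time axis into two 2D kernels. A minimal numpy sketch of that split, with hypothetical shapes, not the PR's code:

```python
import numpy as np

# Hypothetical Conv3D patch-embed weight: (out_ch, in_ch, t, h, w), with t == 2
w3d = np.arange(3 * 3 * 2 * 4 * 4, dtype=np.float32).reshape(3, 3, 2, 4, 4)

# Split along the temporal axis into two Conv2D kernels of shape (out_ch, in_ch, h, w)
w2d_a = w3d[:, :, 0]
w2d_b = w3d[:, :, 1]

print(w2d_a.shape, w2d_b.shape)  # (3, 3, 4, 4) (3, 3, 4, 4)
```

Whether Eagle2-VL checkpoints actually ship a 5-D patch-embed weight is exactly what the reviewer is asking the author to verify.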
Pull request overview
This PR adds Eagle2-VL multimodal model support (1B/2B variants) to the MTMD pipeline. The implementation introduces a dedicated projector type with a 2-layer MLP architecture (LayerNorm → Linear → GELU → Linear) that operates on spatially-merged vision tokens. The changes are self-contained and follow established MTMD patterns for projector implementations.
Key Changes
- New Eagle2VL projector type with metadata-driven spatial merge (default 2×2) and learned absolute position embeddings
- Python converter handles HuggingFace checkpoint normalization, including projector weight canonicalization to [n_in, n_out] layout and QKV tensor splitting
- Runtime graph builder implements ViT encoder with RMS normalization, spatial merge, and 2-layer MLP projector using existing helper functions
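The QKV tensor splitting mentioned in the converter bullet typically separates a fused attention weight into three projections. A simplified numpy sketch, assuming a fused [3*n_embd, n_embd] layout (an illustration, not the PR's code):

```python
import numpy as np

n_embd = 8
# Hypothetical fused attention weight, Q/K/V stacked along the output dimension
fused = np.arange(3 * n_embd * n_embd, dtype=np.float32).reshape(3 * n_embd, n_embd)

# Split into separate Q, K, V weights
wq, wk, wv = np.split(fused, 3, axis=0)

print(wq.shape, wk.shape, wv.shape)  # (8, 8) (8, 8) (8, 8)
```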
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| gguf-py/gguf/constants.py | Adds EAGLE2VL projector type constant to VisionProjectorType enum |
| tools/mtmd/clip-impl.h | Registers PROJECTOR_TYPE_EAGLE2VL enum and string mapping "eagle2vl" |
| tools/mtmd/clip.cpp | Implements build_eagle2vl() graph builder, parameter loading, preprocessing integration, and embedding dimension calculation |
| convert_hf_to_gguf.py | Adds Eagle2VLVisionModel converter with metadata extraction, tensor name normalization, projector weight canonicalization, and QKV splitting |
Co-authored-by: Copilot <[email protected]>
This PR adds initial support for the Eagle2-VL multimodal models (1B / 2B) in the MTMD pipeline.
The update introduces a dedicated converter path and runtime builder for the Eagle2-VL vision tower and its 2-layer projector.
All changes are fully self-contained and do not affect any existing model architectures.
Converter (convert_hf_to_gguf.py)
- Adds Eagle2VLVisionModel, writing VisionProjectorType=EAGLE2VL into GGUF metadata.
- Records the spatial merge size (spatial_merge_size, default: 2×2).
- Canonicalizes projector weights (mm.0, mm.2) to [n_in, n_out]; supports optional biases.

GGUF (gguf-py/gguf/constants.py)
- Adds the projector type EAGLE2VL.

Runtime (tools/mtmd/clip.cpp)
- Implements the build_eagle2vl() vision path: spatial merge followed by the 2-layer MLP projector (mm.0 → GELU → mm.2) using canonical [n_in, n_out] weights.
- Routes PROJECTOR_TYPE_EAGLE2VL to the new builder.
- Derives the output embedding dimension from mm_2_w->ne[1].

Integration & Compatibility
Validation
Tested locally on Eagle2-VL 1B and 2B checkpoints:
Scope
This PR focuses on Eagle2-VL (1B / 2B).
Support for additional Eagle2 variants (e.g., 9B) will be handled in a follow-up.
Closes #16704