
[quantization] Introduce wrapper for Qwen3VLVisionBlock#500

Merged
dayo09 merged 1 commit into Samsung:main from dvsav:quant_vision_block on Mar 5, 2026

Conversation

@dvsav
Contributor

@dvsav dvsav commented Feb 19, 2026

This change introduces the QuantQwen3VLVisionBlock wrapper to support post-training quantization of the Qwen3VLVisionBlock module.

Why?

The Qwen3VLVisionBlock module is used in the image-encoder part of the Qwen model.
Attempting to quantize Qwen3VLVisionBlock via PTQ raises the exception PTQQuantizer: no quantization wrapper for Qwen3VLVisionBlock.

What

This change introduces:

  • Class QuantQwen3VLVisionBlock (tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_block.py).
  • Unit tests: class TestQuantQwen3VLVisionBlock (test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py), skipped if the transformers package is not installed.
  • New entry in _CORE_MODULES (tico/quantization/wrapq/wrappers/registry.py).
  • Example of Qwen3VLVisionBlock quantization and conversion to Circle (tico/quantization/wrapq/examples/qwen/quantize_vision_block.py).

Unit Tests

A run of the new unit tests is shown below along with coverage information (irrelevant files are replaced with an ellipsis ...):

$ coverage run -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py -v
======================================================================================= test session starts ========================================================================================
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python3
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 7 items                                                                                                                                                                                  

test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py::TestQuantQwen3VLVisionBlock::test_different_num_patches            PASSED                                               [ 14%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py::TestQuantQwen3VLVisionBlock::test_forward_diff                     PASSED                                               [ 28%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py::TestQuantQwen3VLVisionBlock::test_mode_transitions                 PASSED                                               [ 42%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py::TestQuantQwen3VLVisionBlock::test_observer_count                   PASSED                                               [ 57%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py::TestQuantQwen3VLVisionBlock::test_output_shape                     PASSED                                               [ 71%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py::TestQuantQwen3VLVisionBlock::test_registration_in_registry         PASSED                                               [ 85%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_vision_block.py::TestQuantQwen3VLVisionBlock::test_residual_connection_preservation PASSED                                               [100%]

================================================================================== 7 passed, 2 warnings in 7.21s ===================================================================================
$ coverage report -m
Name                                                                    Stmts   Miss  Cover   Missing
-----------------------------------------------------------------------------------------------------
...
tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_block.py             42      0   100%
...
-----------------------------------------------------------------------------------------------------
TOTAL                                                                   10670   6671    37%

Example Script

$ python3 tico/quantization/wrapq/examples/qwen/quantize_vision_block.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.139611
│ PEIR       : 8.666391 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 5.4┤                                            │
    │                                        ••  │
    │                                     •• •   │
 3.6┤                                  ••••••    │
    │                                 ••••••     │
    │                              ••••••        │
    │                           ••••••••         │
 1.9┤                          •••••••           │
    │                       ••••••••             │
    │                     ••••••••               │
 0.1┤                   ••••••••                 │
    │                 ••••••••                   │
    │              •••••••••                     │
    │             ••••••••                       │
-1.6┤           ••••••••                         │
    │          ••••••                            │
    │       ••••••••                             │
-3.4┤      ••••••                                │
    │    •••••                                   │
    │  •••••                                     │
    │                                            │
-5.1┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -5.1       -2.5        0.1       2.8       5.4 


Circle model saved as 'quantized_vision_block.circle'
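The error summary printed above reports a mean absolute difference and a PEIR value. A plausible way to compute such a summary is sketched below; the exact formula tico uses is not shown in this PR, so treat the PEIR definition here (peak absolute error divided by the dynamic range of the reference output, in percent) as an assumption.

```python
import torch

def quant_error_summary(ref: torch.Tensor, quant: torch.Tensor) -> tuple[float, float]:
    """Return (mean |diff|, PEIR %) between a float reference output
    and a quantized output.

    PEIR is assumed to be max|ref - quant| over the reference's
    dynamic range (max - min), expressed as a percentage.
    """
    diff = (ref - quant).abs()
    mean_abs = diff.mean().item()
    interval = (ref.max() - ref.min()).item()  # reference dynamic range
    peir = diff.max().item() / interval * 100.0
    return mean_abs, peir
```

For example, comparing a reference of [0.0, 10.0] against a quantized [1.0, 10.0] gives a mean |diff| of 0.5 and a PEIR of 10%.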

@dvsav
Contributor Author

dvsav commented Mar 3, 2026

Reference Code

Below is the source code of Qwen3VLVisionBlock:

# transformers/models/qwen3_vl/modeling_qwen3_vl.py
class Qwen3VLVisionBlock(GradientCheckpointingLayer):
    def __init__(self, config, attn_implementation: str = "sdpa") -> None:
        super().__init__()
        self.norm1 = nn.LayerNorm(config.hidden_size, eps=1e-6)
        self.norm2 = nn.LayerNorm(config.hidden_size, eps=1e-6)
        self.attn = Qwen3VLVisionAttention(config=config)
        self.mlp = Qwen3VLVisionMLP(config=config)

    def forward(
        self,
        hidden_states: torch.Tensor,
        cu_seqlens: torch.Tensor,
        rotary_pos_emb: torch.Tensor | None = None,
        position_embeddings: tuple[torch.Tensor, torch.Tensor] | None = None,
        **kwargs,
    ) -> torch.Tensor:
        hidden_states = hidden_states + self.attn(
            self.norm1(hidden_states),
            cu_seqlens=cu_seqlens,
            rotary_pos_emb=rotary_pos_emb,
            position_embeddings=position_embeddings,
            **kwargs,
        )
        hidden_states = hidden_states + self.mlp(self.norm2(hidden_states))
        return hidden_states
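A wrapper for this block must mirror the two residual additions exactly, which is what the test_residual_connection_preservation unit test above checks. The sketch below is illustrative only: it reuses the wrapped block's norm1/attn/norm2/mlp submodules and inserts a placeholder fake-quantize step on each branch output, leaving the skip connections untouched. The fq() function is an assumption, not tico's actual fake-quant implementation.

```python
import torch
import torch.nn as nn

class QuantVisionBlockSketch(nn.Module):
    """Illustrative wrapper around a Qwen3VLVisionBlock-style module.

    Mirrors the residual structure of the reference forward while
    inserting a placeholder fake-quantize step on each branch output.
    """
    def __init__(self, block: nn.Module) -> None:
        super().__init__()
        self.block = block  # must expose norm1, attn, norm2, mlp

    @staticmethod
    def fq(x: torch.Tensor) -> torch.Tensor:
        # Placeholder fake-quant: snap to an 8-bit grid over roughly [-8, 8].
        scale = 16.0 / 255.0
        return (x / scale).round().clamp(-128, 127) * scale

    def forward(self, hidden_states: torch.Tensor, **kwargs) -> torch.Tensor:
        # Residual 1: attention branch. Only the branch output is
        # fake-quantized; the skip connection itself stays exact.
        attn_out = self.block.attn(self.block.norm1(hidden_states), **kwargs)
        hidden_states = hidden_states + self.fq(attn_out)
        # Residual 2: MLP branch, same pattern.
        mlp_out = self.block.mlp(self.block.norm2(hidden_states))
        hidden_states = hidden_states + self.fq(mlp_out)
        return hidden_states
```

Keeping the additions outside the fake-quant step preserves the identity path of each residual, which is generally important for keeping the quantization error of deep stacks of such blocks bounded.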


TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
@dvsav dvsav force-pushed the quant_vision_block branch from ff088fc to 4607679 Compare March 3, 2026 16:18
@dvsav dvsav marked this pull request as ready for review March 3, 2026 16:26
@dayo09 dayo09 requested review from dayo09 and mhs4670go March 4, 2026 06:09
Contributor

@mhs4670go mhs4670go left a comment


LGTM

Contributor

@dayo09 dayo09 left a comment


LGTM :-D

FYI, we will need to split the heads of VLM vision attention blocks like we did in the llama attention blocks.

@dayo09 dayo09 merged commit b57a455 into Samsung:main Mar 5, 2026
7 checks passed
@dvsav dvsav deleted the quant_vision_block branch March 5, 2026 11:15
