Support Quantization #209

qinyiqun · 2026-01-29T08:24:17Z

No description provided.

PanZezhong1725 · 2026-01-29T09:00:00Z

csrc/models/llama/llama_attention.cpp

    auto v_reshaped = v->view({seq_len, num_key_value_heads_, head_dim_});

-    if (use_qk_norm_) {
+    if (model_config_->get<std::string>("model_type") == "qwen3") {


We need to factor the shared parts out of this llama model; checking whether the model type is qwen3 inside the llama attention is too odd.
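One way to read the suggestion above is sketched below. This is a minimal illustration under stated assumptions, not the project's actual code: `AttentionBase`, `LlamaAttention`, `Qwen3Attention`, `normalize_qk`, and `make_attention` are hypothetical names, and strings stand in for tensors. The shared forward path lives in a base class, the QK-norm step becomes a virtual hook, and the model type string is inspected only once, when the layer is constructed.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the shared attention code; strings replace tensors
// so the sketch stays self-contained.
class AttentionBase {
public:
    virtual ~AttentionBase() = default;

    // Shared forward path. The architecture-specific step is a virtual hook,
    // so no model-type check is needed here.
    std::string forward(const std::string &qk) const {
        return "attn(" + normalize_qk(qk) + ")";
    }

protected:
    // Default behaviour: no QK normalization.
    virtual std::string normalize_qk(const std::string &qk) const { return qk; }
};

// Llama keeps the default hook.
class LlamaAttention : public AttentionBase {};

// Qwen3 overrides the hook to apply a (placeholder) QK norm.
class Qwen3Attention : public AttentionBase {
protected:
    std::string normalize_qk(const std::string &qk) const override {
        return "rmsnorm(" + qk + ")";
    }
};

// The model type is checked once, at construction time (hypothetical factory).
std::unique_ptr<AttentionBase> make_attention(const std::string &model_type) {
    if (model_type == "qwen3") {
        return std::make_unique<Qwen3Attention>();
    }
    return std::make_unique<LlamaAttention>();
}
```

With this shape, the llama attention code never mentions qwen3; a new architecture would add a subclass and a factory branch rather than another condition in the forward path.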

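Support NV W8, 1 batch, 1 TP

Add JSON support

InfiniLM: add quantization layer and global config

Add quant config support in a relatively elegant way

Restructure some code and remove unused code

Follow InfiniCore changes

Remove all model_config and use global_config throughout

Follow the latest InfiniLM code changes

Change function parameter order

Rename global config to model config

Refactor: add new API alongside legacy interfaces with deprecation warnings

Add W4 InfiniCore-related content and move Quantization config into InfiniCore

Add W4 InfiniCore-related content and move Quantization config into InfiniCore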

qy_page_131: add qy device success qy inference_server.py

Signed-off-by: Ceng23333 <[email protected]>

qinyiqun requested a review from a team January 29, 2026 08:24

PanZezhong1725 reviewed Jan 29, 2026


PanZezhong1725 force-pushed the demo131 branch from be5878b to 4340dff Compare January 30, 2026 05:50

qinyiqun force-pushed the demo131_quant branch 2 times, most recently from 0faef9b to 2cd04c1 Compare February 5, 2026 09:14

qinyiqun force-pushed the demo131_quant branch 2 times, most recently from 36d173e to 4bf20d6 Compare February 10, 2026 06:21

wooway777 force-pushed the demo131 branch from 60d6545 to ee59b3f Compare February 10, 2026 10:26

wooway777 and others added 13 commits February 11, 2026 01:07

issue/204 - support graph in server scripts

dc100cd

issue/208 - adapt to ali ppu

558c460

issue/175 - qy device support

3ec83da

qy_page_131: add qy device success qy inference_server.py

Issue/170 - Add HYGON support and improve device type handling.

248a9b6

Issue/193: feats for deployment

2976c8b

Signed-off-by: Ceng23333 <[email protected]>

skip responding eos token

6d25eb3

Signed-off-by: Ceng23333 <[email protected]>

issue/143 use add_rmsnorm, nt flash attn, nt kv caching

c7352e2

issue/204 - support graph in server scripts

668944b

issue/208 - adapt to ali ppu

e54aaeb

rebase main

2892172

issue/216 feat: support static kv cache in server

fe8db3f

fix llm server cache config

4f64019

wooway777 force-pushed the demo131_quant branch from 4bf20d6 to 4f64019 Compare February 11, 2026 01:50

wooway777 added 3 commits February 11, 2026 01:58

demo131 - resolve mishandled conflicts

9e3d413

demo131 - further adjust attn and caching logic

675df6b

demo131 - resolve merge requirements

cb5075e

wooway777 self-requested a review February 11, 2026 02:29

wooway777 approved these changes Feb 11, 2026


wooway777 merged commit 71c7058 into InfiniTensor:demo131 Feb 11, 2026


7 participants
