@qinyiqun
Contributor

No description provided.

@qinyiqun qinyiqun requested a review from a team January 29, 2026 08:24
auto v_reshaped = v->view({seq_len, num_key_value_heads_, head_dim_});

if (use_qk_norm_) {
    if (model_config_->get<std::string>("model_type") == "qwen3") {
Collaborator

We need to factor the shared parts out of this llama model. Checking whether the model type is qwen3 inside the llama attention is too odd.
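
One possible shape for that refactor, as a minimal sketch only: a shared base class owns the common attention path, and a virtual hook carries the model-specific QK normalization, so the string comparison on `model_type` moves into a single factory. Every name here (`AttentionBase`, `LlamaAttention`, `Qwen3Attention`, `apply_qk_norm`, `make_attention`) is hypothetical and not taken from InfiniLM. The step strings stand in for real tensor operations.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical shared base: holds the logic common to llama-style attention.
class AttentionBase {
public:
    virtual ~AttentionBase() = default;

    // Shared forward path. Model-specific behaviour goes through the hook,
    // so there is no model_type string comparison in here.
    std::vector<std::string> forward() const {
        std::vector<std::string> steps{"project_qkv"};
        apply_qk_norm(steps);
        steps.push_back("attention");
        return steps;
    }

protected:
    // Default hook: no QK normalization.
    virtual void apply_qk_norm(std::vector<std::string> &) const {}
};

// Plain llama attention keeps the default (no-op) hook.
class LlamaAttention : public AttentionBase {};

// Assumed here: the qwen3 variant is the one that needs QK normalization.
class Qwen3Attention : public AttentionBase {
protected:
    void apply_qk_norm(std::vector<std::string> &steps) const override {
        steps.push_back("qk_norm");
    }
};

// Hypothetical factory: the only place that inspects the model type.
std::unique_ptr<AttentionBase> make_attention(const std::string &model_type) {
    if (model_type == "qwen3") {
        return std::make_unique<Qwen3Attention>();
    }
    return std::make_unique<LlamaAttention>();
}
```

With this layout, adding another llama-derived model would mean adding a subclass and one factory branch rather than another conditional inside the shared attention code.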

@qinyiqun qinyiqun force-pushed the demo131_quant branch 2 times, most recently from 0faef9b to 2cd04c1 Compare February 5, 2026 09:14
@qinyiqun qinyiqun force-pushed the demo131_quant branch 2 times, most recently from 36d173e to 4bf20d6 Compare February 10, 2026 06:21
wooway777 and others added 13 commits February 11, 2026 01:07
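Support NV w8, 1 batch, 1 TP

Add JSON support

InfiniLM: add quantization layers and global config

Add quant config support in a more elegant way

Restructure some code and remove unused code

Follow InfiniCore changes

Remove all model_config and use global_config everywhere

Follow the latest InfiniLM code changes

Change the function parameter order

Rename global config to model config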

Refactor: add new API alongside legacy interfaces with deprecation warnings

Add w4 InfiniCore-related content and move Quantization config into InfiniCore

Add w4 InfiniCore-related content and move Quantization config into InfiniCore
qy_page_131: add qy device

success qy inference_server.py
@wooway777 wooway777 self-requested a review February 11, 2026 02:29
@wooway777 wooway777 merged commit 71c7058 into InfiniTensor:demo131 Feb 11, 2026
7 participants