update: support new Llama API + assess OpenGVLab/OmniQuant#113 by Tfloow · Pull Request #114 · OpenGVLab/OmniQuant

Tfloow · 2026-04-11T11:48:11Z

See #113 for the issue.

This PR also adds support for the new Llama API. It is a work in progress and needs code review to make sure I don't break anything else in OmniQuant.

Tfloow · 2026-04-11T11:48:53Z

Also, I now pass weights_only=False to torch.load so that the cached testloader/dataloader loads properly.
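For reviewers, a minimal sketch of the loading change described above. Recent PyTorch releases default torch.load to weights_only=True, which refuses to unpickle objects outside a small allow-list. The helper name load_cached_loader and the cache path are hypothetical, not names taken from this PR.

```python
import torch


def load_cached_loader(path):
    """Load a cached dataloader object saved earlier with torch.save.

    weights_only=False re-enables full unpickling, so cached objects that
    are not plain tensors can be restored. Only use it on cache files you
    created yourself, since unpickling can execute arbitrary code.
    """
    return torch.load(path, weights_only=False)
```

The trade-off is security for convenience: this is reasonable for a local cache written by the same script, but not for files downloaded from an untrusted source.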

Tfloow · 2026-04-11T12:14:33Z

Benchmark results:

python main.py --model meta-llama/Llama-3.2-1B --epochs 0 --output_dir ./log --eval_ppl --wbits 4 --abits 16 --group_size 128 --lwc

Llama-3.2-1B on WikiText2, with --epochs 0 and --lwc:

Previous version: PPL = 11.66
This PR: PPL = 11.56

That is a PPL reduction of roughly 1% (about 0.9%), obtained by skipping the unnecessary activation quantization when --abits 16 is set.
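To illustrate the idea behind the numbers above, here is a small self-contained sketch of a fake quantizer that becomes a no-op at 16 bits. It uses plain Python lists rather than tensors, and fake_quantize is a hypothetical name, not the quantizer class used in OmniQuant; it only shows why bypassing the quantize/dequantize round trip removes a small error.

```python
def fake_quantize(values, n_bits):
    """Uniform asymmetric fake quantization (quantize, then dequantize).

    Assumption for this sketch: n_bits >= 16 means "keep full precision",
    so the input is returned unchanged instead of being rounded.
    """
    if n_bits >= 16:
        # Skip the round trip entirely; no rounding error is introduced.
        return list(values)

    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels
    if scale == 0:
        # All values are identical; nothing to quantize.
        return list(values)

    # Map to integer levels, round, and map back to the original range.
    return [round((v - lo) / scale) * scale + lo for v in values]


activations = [0.1, -0.37, 0.92, 0.5]
unchanged = fake_quantize(activations, 16)  # identical to the input
rounded = fake_quantize(activations, 4)     # carries rounding error
```

With the early return, the 16-bit path costs nothing and cannot perturb the activations, which is consistent with the small PPL gain reported above.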

update: support new Llama API + assess OpenGVLab#113

6fc42bf

update: removed c4 from dataset in evaluate (deprecated)

bde2419

Tfloow mentioned this pull request Apr 13, 2026

I encounter a error: "AttributeError: 'LlamaAttention' object has no attribute 'rotary_emb'",when i run code with llama-1-7b. It happened in int_llama_layer.py: self.rotary_emb = copy.deepcopy(org_module.rotary_emb) #104

Open
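The error in #104 comes from copying an attribute that newer attention modules no longer carry. As a rough sketch of a defensive pattern (not necessarily what this PR does), the copy can fall back to None when the attribute is missing. The two classes below are stand-ins for old-style and new-style attention modules, and copy_rotary_emb is a hypothetical helper.

```python
import copy


class OldStyleAttention:
    """Stand-in for an attention module that owns its rotary embedding."""

    def __init__(self):
        self.rotary_emb = {"base": 10000.0}


class NewStyleAttention:
    """Stand-in for an attention module without a rotary_emb attribute."""


def copy_rotary_emb(org_module):
    """Return a deep copy of org_module.rotary_emb, or None if it is absent."""
    rotary = getattr(org_module, "rotary_emb", None)
    return copy.deepcopy(rotary) if rotary is not None else None
```

When the result is None, the caller would need to obtain position embeddings some other way, for example by receiving them as an argument to the forward pass.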

update: proper support for the Cache API

4a23054

Tfloow wants to merge 3 commits into OpenGVLab:main from Tfloow:llama-and-act-update
