update: support new Llama API + assess OpenGVLab/OmniQuant#113 #114

Open
Tfloow wants to merge 3 commits into OpenGVLab:main from Tfloow:llama-and-act-update

Tfloow commented Apr 11, 2026

See #113 for the issue.

This PR also adds support for the new Llama API. It is a work in progress: I need some code review to make sure I don't break anything else in OmniQuant.

Tfloow commented Apr 11, 2026

Also, I set weights_only=False in the torch.load calls so that the cached testloader/dataloader can be loaded properly.
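For reviewers, here is a minimal sketch of the loading pattern described above. It assumes the cached loader is an arbitrary pickled Python object rather than a plain tensor; the `SimpleNamespace` stand-in, the `load_cached_loader` helper, and the cache file name are all hypothetical and are not taken from OmniQuant. Recent PyTorch releases default `torch.load` to `weights_only=True`, which restricts unpickling to tensors and a small allowlist of types, so objects like this may need the flag set explicitly.

```python
import os
import tempfile
from types import SimpleNamespace

import torch


def load_cached_loader(path):
    # weights_only=False lets torch.load unpickle arbitrary Python objects.
    # Use it only on cache files you created yourself: unpickling untrusted
    # files can execute arbitrary code.
    return torch.load(path, weights_only=False)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical cache file; the real cache path and object type may differ.
    cache_path = os.path.join(tmp, "testloader.cache")
    fake_loader = SimpleNamespace(input_ids=torch.arange(8).reshape(1, 8))
    torch.save(fake_loader, cache_path)

    restored = load_cached_loader(cache_path)
```

The trade-off is security: `weights_only=False` is fine for locally generated caches, but it should not be used on files downloaded from untrusted sources.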

Tfloow commented Apr 11, 2026

Benchmark results:

python main.py --model meta-llama/Llama-3.2-1B --epochs 0 --output_dir ./log --eval_ppl --wbits 4 --abits 16 --group_size 128 --lwc 

On Llama-3.2-1B, evaluated on WikiText2 with --epochs 0 and --lwc:

  • Previous version: PPL = 11.66
  • This PR: PPL = 11.56

That is roughly a 1% PPL reduction (11.66 to 11.56), obtained by skipping the unnecessary quantization of activations when --abits 16 is set.
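To illustrate why skipping the activation quantizer at 16 bits can matter, here is a small, dependency-free sketch. It is not OmniQuant's quantizer: `fake_quantize` is a generic uniform min-max fake quantizer, and `maybe_quantize_activations` is a hypothetical guard that treats 16 bits or more as "keep full precision". Even with 65535 levels, quantize-then-dequantize still perturbs each value slightly, while the guarded path is exact.

```python
def fake_quantize(values, n_bits):
    """Generic uniform min-max fake quantization (quantize, then dequantize)."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels
    if scale == 0:
        return list(values)
    return [round((v - lo) / scale) * scale + lo for v in values]


def maybe_quantize_activations(values, abits):
    # Hypothetical guard: at 16 bits or more, pass activations through untouched.
    if abits >= 16:
        return list(values)
    return fake_quantize(values, abits)


acts = [0.1234567, -1.5, 2.718281828, 0.0, 3.14159265]

always = fake_quantize(acts, 16)                 # quantizer always applied
guarded = maybe_quantize_activations(acts, 16)   # quantizer bypassed at 16 bits

max_err_always = max(abs(a - b) for a, b in zip(acts, always))
max_err_guarded = max(abs(a - b) for a, b in zip(acts, guarded))

print(f"max error, always quantized: {max_err_always:.3e}")
print(f"max error, guarded:          {max_err_guarded:.3e}")

# Relative PPL change reported in this thread: (11.66 - 11.56) / 11.66
print(f"relative PPL reduction: {(11.66 - 11.56) / 11.66:.2%}")
```

The per-value error of the always-on path is tiny, but it is applied at every quantized activation in every layer, which may explain the small perplexity gap reported above.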
