Add GPTQ to prototype #3517
base: main
Conversation
- Add GPTQ quantization algorithm implementation in torchao/prototype/gptq
- Add ObserverTensor for activation tracking during calibration
- Add unified GPTQConfig that handles both observation and quantization phases
- Add gptq_quantize function for weight quantization with Hessian
- Add comprehensive test suite for GPTQ and ObserverTensor
- Add example script demonstrating GPTQ quantization workflow
- All Int4 helper functions are self-contained in the gptq module
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3517
Note: Links to docs will display an error until the doc builds have completed. ❌ 12 New Failures as of commit 5b99ce2 with merge base c4273fe.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -0,0 +1,453 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
nit: put in test/prototype/gptq folder to match codebase convention
from .observer import ObserverTensor


@dataclass
nit: put logic in some_file.py instead of __init__.py
dead = torch.diag(H) == 0
H[dead, dead] = 1
W[:, dead] = 0

damp = percdamp * torch.mean(torch.diag(H))
diag = torch.arange(columns, device=device)
H[diag, diag] += damp
H = torch.linalg.cholesky(H)
H = torch.cholesky_inverse(H)
H = torch.linalg.cholesky(H, upper=True)
Hinv = H
nit: add some comments linking to the right page of the paper
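Per the review nit, here is a hedged sketch of what this dampen-then-factor step does (GPTQ, Frantar et al. 2022; the 1%-of-mean-diagonal dampening matches the paper's recommended `percdamp`). The function and variable names below are illustrative, not the exact torchao implementation:

```python
import torch

def precondition_hessian(H: torch.Tensor, percdamp: float = 0.01) -> torch.Tensor:
    # Illustrative helper name; mirrors the snippet above.
    columns = H.shape[0]
    damp = percdamp * torch.mean(torch.diag(H))
    diag = torch.arange(columns, device=H.device)
    H[diag, diag] += damp                 # dampen the diagonal for numerical stability
    L = torch.linalg.cholesky(H)          # H = L L^T
    Hinv = torch.cholesky_inverse(L)      # H^{-1} recovered from the factor
    # GPTQ consumes the upper-triangular Cholesky factor of H^{-1} row by row
    return torch.linalg.cholesky(Hinv, upper=True)

H = 2.0 * torch.eye(4)
Hinv_chol = precondition_hessian(H.clone())
```

The Cholesky-inverse route avoids forming `H^{-1}` via a general inverse, which is both cheaper and more stable for the symmetric positive-definite Hessian.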
all_qparams = []

for W_quantize_block, block_start in zip(
nit: add comments as needed to explain how this code is following the algorithm in the paper
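For reference, the per-column update this blocked loop implements can be sketched as follows (GPTQ, Frantar et al. 2022, Algorithm 1). The plain round-to-nearest `quantize` and fixed scale are illustrative stand-ins for torchao's int4 helpers:

```python
import torch

def quantize(w, scale):
    # Illustrative int4 round-to-nearest quantizer (grid [-8, 7])
    return torch.clamp(torch.round(w / scale), -8, 7) * scale

def gptq_block(W, Hinv, scale):
    # W: [rows, columns]; Hinv: upper-triangular Cholesky factor of H^{-1}
    W = W.clone()
    Q = torch.zeros_like(W)
    for i in range(W.shape[1]):
        w = W[:, i]
        q = quantize(w, scale)
        Q[:, i] = q
        # Propagate the quantization error to the not-yet-quantized columns,
        # weighted by the inverse-Hessian row -- the heart of GPTQ.
        err = (w - q) / Hinv[i, i]
        W[:, i:] -= err.unsqueeze(1) * Hinv[i, i:].unsqueeze(0)
    return Q

W = torch.randn(2, 4)
Hinv = torch.linalg.cholesky(torch.eye(4), upper=True)  # identity Hessian
Q = gptq_block(W, Hinv, scale=0.1)
```

With an identity Hessian the error propagation is a no-op and the result reduces to plain round-to-nearest; a real calibration Hessian is what lets later columns compensate for earlier columns' rounding error.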
def _calculate_hessian(inputs, device=None):
    """Calculate Hessian matrix from input activations for GPTQ.

    DEPRECATED: This function is kept for backward compatibility in tests only.
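A hedged sketch of what a Hessian helper like this computes: `H = 2 * X^T X` accumulated as a running average over calibration inputs, where each row of `X` is one token's activation vector. Names below are illustrative, not the exact torchao implementation:

```python
import torch

def calculate_hessian(inputs, columns):
    # columns = the layer's in_features
    H = torch.zeros(columns, columns)
    n = 0
    for x in inputs:                      # x: [tokens, columns]
        x = x.reshape(-1, columns).float()
        H *= n / (n + x.shape[0])         # rescale the running average
        n += x.shape[0]
        H += (2.0 / n) * x.t() @ x        # add this batch's contribution
    return H

X = torch.randn(8, 4)
H = calculate_hessian([X], columns=4)
```

The running average keeps `H` on the same scale regardless of how many calibration batches are observed, so the dampening fraction applied later behaves consistently.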
backward compatibility with what?
from torchao.utils import TorchAOBaseTensor


class ObserverTensor(TorchAOBaseTensor):
the logic here is specific to GPTQ, can we ensure the name specifies that
This PR adds GPTQ support to torchao.prototype.gptq. It is exposed via a new config, GPTQConfig, which has two steps, "observe" and "convert".

When quantize_(model, GPTQConfig(step="observe")) is run, observer tensors are attached to the weight tensors; these track linear / torch.bmm ops and update the Hessian matrix based on the observed inputs.

When quantize_(model, GPTQConfig(step="convert")) is run, we find any observer tensors, take the Hessian, and do int4 GPTQ quantization to compute the weights. The core of this logic is in gptq_quantize. Currently int4 is hardcoded, but if we enable dequantization and the ability to create a tensor with existing qparams, we should be able to do this for any config.

Also included is an example script, gptq_example.py, that does sequential / non-sequential quantization on hellaswag as a simple example.
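The observe-then-convert flow described above can be sketched in a self-contained way. The real API is quantize_(model, GPTQConfig(step=...)) from this PR; the observer stub and the round-to-nearest convert step below are illustrative stand-ins, not torchao code:

```python
import torch
import torch.nn as nn

class ObserverStub:
    # Stand-in for the PR's ObserverTensor: accumulates Hessian statistics.
    def __init__(self, in_features):
        self.H = torch.zeros(in_features, in_features)

    def observe(self, x):                  # x: [tokens, in_features]
        self.H += 2.0 * x.t() @ x

linear = nn.Linear(4, 2, bias=False)
obs = ObserverStub(4)

# "observe" step: run calibration data through the model
for _ in range(3):
    x = torch.randn(8, 4)
    obs.observe(x)
    linear(x)

# "convert" step: here just round-to-nearest on an int4 grid;
# real GPTQ would use obs.H to compensate rounding error per column.
scale = linear.weight.abs().max() / 7
with torch.no_grad():
    linear.weight.copy_(
        torch.clamp(torch.round(linear.weight / scale), -8, 7) * scale
    )
```

Splitting calibration and conversion into two quantize_ calls lets the same calibrated model be converted under different settings without re-running the calibration data.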