Adds int4 Quantization Support #21435
Conversation
Codecov Report
@@ Coverage Diff @@
## master #21435 +/- ##
==========================================
+ Coverage 74.94% 82.80% +7.86%
==========================================
Files 565 565
Lines 55224 55505 +281
Branches 8610 8662 +52
==========================================
+ Hits 41386 45962 +4576
+ Misses 11880 7429 -4451
- Partials 1958 2114 +156
Thanks for the PR! The code generally looks good to me. What is the performance profile? How did you benchmark the change?
I hadn't benchmarked the code yet. I've now created two micro-benchmarks and linked them in the PR description; please take a look!
There was an issue with the original benchmarking script. I've fixed it, and we're now seeing significantly better results for GPU memory usage.
We can expect further speedups once we support quantization in more layers.
Thanks for the PR! The code looks good. The new results look reasonable.
/gemini review
Code Review
This pull request introduces support for int4 quantization in the Dense layer, including packing/unpacking utilities and LoRA compatibility. The changes are well-structured, and the addition of int4 quantization is a valuable enhancement.
Original PR #21435 by JyotinderSingh: keras-team/keras#21435


Summary
This PR introduces support for int4 weight-only quantization for the Dense layer. The implementation includes the necessary logic for packing and unpacking int4 values, performing the quantized matrix multiplication, and ensuring compatibility with features like LoRA. The code currently implements a W4A8 quantization scheme (int4 weights, int8 activations).
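For illustration, here is a minimal usage sketch. It assumes the new mode plugs into the existing quantization entry points the same way "int8" does (the model-level quantize call and the random data are illustrative, not part of this PR's diff):

```python
import numpy as np
import keras

# A small model whose Dense layers will be quantized post-training.
model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10),
])

# Weight-only int4 quantization (W4A8): float kernels are quantized to int4
# and stored packed, two 4-bit values per int8 byte.
model.quantize("int4")

preds = model(np.random.rand(2, 64).astype("float32"))
print(preds.shape)  # (2, 10)
```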
Description
The core changes include:
- Support for an int4 quantization mode.
- Packing and unpacking utilities (a NumPy sketch of the scheme follows this list):
  - pack_int4 takes an int8 tensor (representing int4 values) and packs two 4-bit values into a single int8 byte.
  - unpack_int4 performs the reverse operation, unpacking the int8 tensor back into an int8 tensor of int4 values.
- Dense layer modifications:
  - _int4_build: builds a packed kernel of int8 dtype and a kernel_scale variable. The original input dimension is saved in _orig_input_dim to handle unpacking correctly.
  - _int4_call: defines the forward pass for the int4-quantized layer. It uses a custom_gradient to perform the matrix multiplication with the unpacked kernel and correctly computes the gradients with respect to the original inputs.
  - The quantize method now handles mode="int4": it quantizes the float weights to int4 values and then packs them using pack_int4.
  - The enable_lora method determines the input dimension for the LoRA matrices of an int4-quantized layer by using the saved _orig_input_dim.
  - The _get_kernel_with_merged_lora method unpacks the int4 kernel before merging the LoRA weights, followed by re-quantization and re-packing.
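To make the packing scheme concrete, here is a small NumPy sketch of the idea behind pack_int4/unpack_int4, followed by the quantize-then-pack step the quantize method performs. The bit layout (even positions in the low nibble), the packing axis, and the per-output-channel abs-max scaling are illustrative assumptions; the actual utilities take an axis argument and may differ in detail:

```python
import numpy as np

def pack_int4_sketch(x):
    """Pack int4 values (held in an int8 array, range [-8, 7]) along the last axis."""
    if x.shape[-1] % 2:  # pad to an even length so values pair up
        x = np.concatenate([x, np.zeros_like(x[..., :1])], axis=-1)
    low = (x[..., 0::2] & 0x0F).astype(np.uint8)   # even positions -> low nibble
    high = (x[..., 1::2] & 0x0F).astype(np.uint8)  # odd positions  -> high nibble
    return ((high << 4) | low).view(np.int8)       # two values per int8 byte

def unpack_int4_sketch(packed, orig_len):
    """Recover the int4 values (as an int8 array) from their packed form."""
    u = packed.view(np.uint8)
    low = (u & 0x0F).astype(np.int8)
    high = (u >> 4).astype(np.int8)
    # Sign-extend: nibble values 8..15 encode the negative int4 range.
    low = np.where(low >= 8, low - 16, low)
    high = np.where(high >= 8, high - 16, high)
    out = np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)
    return out[..., :orig_len].astype(np.int8)

# Round-trip check.
vals = np.array([[-8, 7, 3, -1]], dtype=np.int8)
assert np.array_equal(unpack_int4_sketch(pack_int4_sketch(vals), 4), vals)

# Quantize-then-pack, conceptually what Dense.quantize("int4") does:
# per-output-column abs-max scale, round to int4, then pack along the input axis.
kernel = np.random.randn(6, 4).astype("float32")        # (input_dim, units)
kernel_scale = np.max(np.abs(kernel), axis=0) / 7.0     # (units,)
kernel_int4 = np.clip(np.round(kernel / kernel_scale), -8, 7).astype(np.int8)
packed_kernel = pack_int4_sketch(kernel_int4.T).T       # (ceil(input_dim / 2), units)
```

Packing along the input axis halves the stored kernel's footprint; saving _orig_input_dim is presumably what lets the layer drop the padding value again when the original input dimension is odd.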
Testing
- Tests for int4 quantization in dense_test.py cover basic correctness, serialization (saving/loading models), behavior with LoRA enabled, and various edge cases.
- Tests for the pack_int4 and unpack_int4 functions in quantizers_test.py ensure they work correctly for various tensor shapes and axes.
Benchmarking
Note: Results collected with warmed-up GPUs and pre-loaded models and kernels.
Limitation
The current implementation performs a kernel unpack on every forward pass (to expand the int4 kernel from its packed int8 representation, where each byte stores two nibbles). This means we give up some of the memory savings at runtime and incur a performance penalty.
We may be able to work around this in the future by writing custom kernels that operate directly on the packed int4 representation.
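For reference, a rough sketch of what the W4A8 forward pass amounts to, showing where the per-call unpack cost comes from. The names, the axis-0 packing layout, and the per-row abs-max activation quantization are illustrative assumptions, not the PR's exact code (which lives in Dense._int4_call and wraps the matmul in a custom_gradient for training):

```python
import numpy as np

def unpack_rows_sketch(packed, orig_rows):
    """Unpack an int8 kernel packed two int4 values per byte along axis 0."""
    u = packed.view(np.uint8)
    low = (u & 0x0F).astype(np.int8)
    high = (u >> 4).astype(np.int8)
    # Sign-extend nibbles 8..15 back into the negative int4 range.
    low = np.where(low >= 8, low - 16, low).astype(np.int8)
    high = np.where(high >= 8, high - 16, high).astype(np.int8)
    full = np.empty((2 * packed.shape[0], packed.shape[1]), dtype=np.int8)
    full[0::2], full[1::2] = low, high
    return full[:orig_rows]

def int4_dense_forward_sketch(inputs, packed_kernel, kernel_scale, orig_input_dim):
    # 1. The packed kernel is expanded to a full-width int8 kernel on every
    #    call -- this is the runtime overhead described above.
    kernel = unpack_rows_sketch(packed_kernel, orig_input_dim)
    # 2. W4A8: dynamically quantize the activations to int8 (per-row abs-max).
    inputs_scale = np.maximum(np.max(np.abs(inputs), axis=-1, keepdims=True), 1e-7) / 127.0
    inputs_q = np.round(inputs / inputs_scale).astype(np.int8)
    # 3. Integer matmul, then de-quantize the accumulator back to float.
    acc = inputs_q.astype(np.int32) @ kernel.astype(np.int32)
    return acc.astype(np.float32) * inputs_scale * kernel_scale

# Shapes only, with random packed weights: packed (ceil(6 / 2), 4), scale (4,).
packed = np.random.randint(-128, 128, size=(3, 4), dtype=np.int8)
scale = np.full((4,), 0.05, dtype=np.float32)
x = np.random.randn(2, 6).astype("float32")
print(int4_dense_forward_sketch(x, packed, scale, orig_input_dim=6).shape)  # (2, 4)
```

Custom kernels that consume the packed representation directly, as suggested above, would remove step 1 and keep the memory footprint at the packed size for the whole call.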
Further work