Support INT4 Dequant onto GPU for Seq INT TBE look up #3584

faran928 · 2025-01-17T01:46:53Z

Summary:
Seq INT4 -> INT4 STBE lookup is already supported in this diff stack: https://www.internalfb.com/diff/D61305978 .

This diff supports:

1. Dequantization of the INT4 -> INT4 STBE lookup on CUDA for all float types
2. Extension of the dequantization of the INT4 -> INT4 STBE lookup on CPU to BF16

The main gap on the GPU side is handling the dequantization when the scale and bias of the INT4 quantized tensor are stored at the front. On CPU, the only change needed is adding BF16 dequantization based on dtype.
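To illustrate the layout difference described above, here is a minimal pure-Python sketch of row-wise INT4 dequantization that handles the scale and bias at either the front or the back of a row. It is not the FBGEMM kernel: the helper names, the 4-byte header (FP16 scale followed by FP16 bias), and the low-nibble-first packing order are all assumptions made for illustration.

```python
import struct

QPARAM_BYTES = 4  # assumption: FP16 scale + FP16 bias per row


def quantize_row_int4(values):
    """Hypothetical helper: pack floats as [scale][bias][nibbles] (scale/bias in front)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15.0 or 1.0
    # Round-trip through FP16 so quantization uses the values that are actually stored.
    scale, bias = struct.unpack("<2e", struct.pack("<2e", scale, lo))
    if scale == 0.0:
        scale = 1.0
    q = [max(0, min(15, round((v - bias) / scale))) for v in values]
    if len(q) % 2:
        q.append(0)  # pad to a whole byte
    # Assumption: the even-indexed element sits in the low nibble.
    packed = bytes(q[i] | (q[i + 1] << 4) for i in range(0, len(q), 2))
    return struct.pack("<2e", scale, bias) + packed


def dequantize_row_int4(row, dim, scale_bias_at_front=True):
    """Dequantize one row; only the header location differs between the two layouts."""
    if scale_bias_at_front:
        header, payload = row[:QPARAM_BYTES], row[QPARAM_BYTES:]
    else:
        header, payload = row[-QPARAM_BYTES:], row[:-QPARAM_BYTES]
    scale, bias = struct.unpack("<2e", header)
    out = []
    for byte in payload:
        out.append((byte & 0x0F) * scale + bias)  # low nibble
        out.append((byte >> 4) * scale + bias)    # high nibble
    return out[:dim]  # drop the padding element, if any


values = [0.0, 1.5, 3.0, 7.5]
front_row = quantize_row_int4(values)
back_row = front_row[QPARAM_BYTES:] + front_row[:QPARAM_BYTES]
restored_front = dequantize_row_int4(front_row, len(values))
restored_back = dequantize_row_int4(back_row, len(values), scale_bias_at_front=False)
```

The per-element math is identical in both layouts; only the offset at which the scale and bias are read changes, which is the case the GPU path has to account for.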

This will enable us to reduce the network overhead to the remote embedding server as well as the D2H data transfer on the GPU host.
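As a rough illustration of why keeping rows in INT4 until they reach the GPU shrinks the data moved, the sketch below compares per-row byte counts. The embedding dimension of 128 and the 4-byte scale/bias header are illustrative assumptions, not figures from this diff.

```python
def row_bytes_int4(dim, qparam_bytes=4):
    """Bytes for one INT4 row: two elements per byte plus the scale/bias header (assumed 4 bytes)."""
    return (dim + 1) // 2 + qparam_bytes


def row_bytes_float(dim, bytes_per_elem):
    """Bytes for one dequantized row, e.g. 2 for FP16/BF16 or 4 for FP32."""
    return dim * bytes_per_elem


dim = 128  # hypothetical embedding dimension
int4_bytes = row_bytes_int4(dim)
fp16_bytes = row_bytes_float(dim, 2)
fp32_bytes = row_bytes_float(dim, 4)
print(int4_bytes, fp16_bytes, fp32_bytes)
```

Under these assumptions an INT4 row is 68 bytes, against 256 bytes for FP16 and 512 bytes for FP32.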

Differential Revision: D68187234

facebook-github-bot · 2025-01-17T01:47:02Z

This pull request was exported from Phabricator. Differential Revision: D68187234

netlify · 2025-01-17T01:48:37Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`cd85b52`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/678a987b0976080008d5a7c4
😎 Deploy Preview	https://deploy-preview-3584--pytorch-fbgemm-docs.netlify.app

facebook-github-bot · 2025-01-17T01:50:32Z

This pull request was exported from Phabricator. Differential Revision: D68187234

facebook-github-bot · 2025-01-17T17:50:57Z

This pull request was exported from Phabricator. Differential Revision: D68187234

facebook-github-bot added the cla signed label Jan 17, 2025

facebook-github-bot added the fb-exported label Jan 17, 2025

faran928 force-pushed the export-D68187234 branch from 004b36b to 867b7f7 Compare January 17, 2025 01:50

faran928 force-pushed the export-D68187234 branch from 867b7f7 to cd85b52 Compare January 17, 2025 17:50
