Conversation

@dbaranchuk
Contributor

No description provided.

Comment on lines +364 to +369
assert state.CBt is None, "CBt should not be stored in state"
CB = state.CB.half()
SCB = state.SCB.unsqueeze(1).half()
SCBt = state.SCBt.unsqueeze(1).half()
Bt = (CB * SCB).t().contiguous()
CBt = (Bt / SCBt).t().to(torch.int8)
Contributor

[opinion] this block of code looks like the most memory-intensive; perhaps we can optimize some of it with in-place divisions / transpositions (please ping me if you're interested)
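A minimal sketch of what the in-place variant could look like, assuming `SCB` holds one scale per row of `CB` and `SCBt` one scale per column, as the shapes in the hunk suggest. The helper name `requantize_in_place` and the `dtype` parameter are hypothetical and not part of the PR. Since the two transposes in the hunk cancel out, the column scaling can be applied by broadcasting along the other axis, which leaves a single floating-point working buffer:

```python
import torch


def requantize_in_place(CB, SCB, SCBt, dtype=torch.float32):
    """Hypothetical helper: same arithmetic as the reviewed hunk, fewer temporaries.

    CB:   int8 tensor of shape (rows, cols)
    SCB:  per-row scales, shape (rows,)
    SCBt: per-column scales, shape (cols,)
    """
    # The only full-size floating-point allocation.
    buf = CB.to(dtype)
    # Multiply each row by its scale, in place (the hunk's `CB * SCB`).
    buf.mul_(SCB.to(dtype).unsqueeze(1))
    # Divide each column by its scale, in place. Broadcasting over dim 0
    # replaces the transpose / contiguous / transpose round trip.
    buf.div_(SCBt.to(dtype).unsqueeze(0))
    # Same cast as the hunk: truncates toward zero, it does not round.
    return buf.to(torch.int8)


CB = torch.tensor([[127, -64], [32, 127]], dtype=torch.int8)
SCB = torch.tensor([2.0, 4.0])
SCBt = torch.tensor([4.0, 8.0])
print(requantize_in_place(CB, SCB, SCBt))
```

On a GPU one would presumably pass `dtype=torch.float16` to match the `.half()` calls in the hunk. Note that `.to(torch.int8)` truncates rather than rounds, so any version of this block, in-place or not, inherits that behaviour.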

@TimDettmers
Collaborator

This version of the backward pass had too high an error to be useful, so we will not merge it into main. See #33 for the new memory-efficient backward pass, which uses fp16 computation and seems to work fine.

Titus-von-Koeller pushed a commit that referenced this pull request May 24, 2024
Support extract_outliers, quantize_4bit and dequantize_4bit with Device Abstraction PR.
3 participants