Conversation

@justheuristic (Collaborator) commented on Dec 12, 2022:

huggingface-hub==0.11.1
transformers==4.25.1
protobuf>=3.20.3,<4.0dev
hivemind==1.1.3
@justheuristic (Collaborator, Author) replied:

also gonna bump it, but it's a separate PR

@@ -0,0 +1,74 @@
"""
@justheuristic (Collaborator, Author) commented:
This file is not new: it was renamed from model.py, but git does not detect the rename, so it shows up as a new file in the diff.

@borzunov (Collaborator) left a review:
We've found some bugs, pending their resolution.


for i in range(0, num_embeddings, self.chunk_size):
    # cast one chunk of the embedding matrix to float32 and project the hidden states onto it
    chunk = word_embeddings[i : i + self.chunk_size].float()
    output[..., i : i + self.chunk_size] = F.linear(hidden_states, chunk)
A project member commented:
Not sure if this is worth doing, but maybe you can do torch.matmul(hidden_states, chunk, out=output[..., i : i + self.chunk_size]) to avoid allocating memory for the intermediate result?

@justheuristic (Collaborator, Author) replied:
Tried exactly that, but to no avail:
On GPU, F.linear appears to have better support for some optimizations like TF32 (enabled by default).
On CPU, this has no effect.
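For context, here is a minimal runnable sketch of the two variants discussed above. The tensor sizes and variable names are illustrative, not the actual Petals code, and the in-place alternative is shown only as a comment since the thread reports it gave no benefit:

import torch
import torch.nn.functional as F

# illustrative sizes, not the real model dimensions
batch, seq_len, hidden_dim, vocab_size, chunk_size = 2, 8, 64, 1000, 256

hidden_states = torch.randn(batch, seq_len, hidden_dim)
word_embeddings = torch.randn(vocab_size, hidden_dim)
output = torch.empty(batch, seq_len, vocab_size)

for i in range(0, vocab_size, chunk_size):
    chunk = word_embeddings[i : i + chunk_size].float()
    # variant kept in the PR: F.linear allocates an intermediate result,
    # but on GPU it can benefit from optimizations such as TF32 matmuls
    output[..., i : i + chunk_size] = F.linear(hidden_states, chunk)
    # suggested in-place alternative (note matmul needs the transposed chunk,
    # and out= may require a contiguous destination); per the reply above,
    # it showed no improvement on CPU:
    # torch.matmul(hidden_states, chunk.t(), out=output[..., i : i + chunk_size])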

Comment on lines +72 to +73
key_past = key_cache.flatten(0, 1)[:, :, :prefix_length] # [batch * num_heads, head_dim, kv_length]
value_past = value_cache.flatten(0, 1)[:, :prefix_length, :] # [batch * num_heads, kv_length, head_dim]
A project member commented:

Can't you just directly reshape the past tensors to these shapes like you've done in src/petals/server/handler.py?

@justheuristic (Collaborator, Author) replied:

Nope, we can't (see the sketch below for the conflicting layouts):

  • hypo_ids need the caches in shape [2, batch_size, ...]
  • training needs key in shape [batch_size * heads, ..., length] and value in shape [..., length, :], which makes them non-concatenable
  • the handler needs them to be concatenable into a single tensor
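A minimal sketch of the conflict, assuming key_cache is stored as [batch, num_heads, head_dim, max_length] and value_cache as [batch, num_heads, max_length, head_dim] (layouts inferred from the shape comments in the diff; sizes and names are illustrative):

import torch

# illustrative sizes
batch, num_heads, head_dim, max_length, prefix_length = 2, 4, 3, 16, 5

# assumed storage layout, inferred from the shape comments in the diff:
# keys keep head_dim before the length axis, values the other way around
key_cache = torch.zeros(batch, num_heads, head_dim, max_length)
value_cache = torch.zeros(batch, num_heads, max_length, head_dim)

# the views used in the two lines under discussion
key_past = key_cache.flatten(0, 1)[:, :, :prefix_length]      # [batch * num_heads, head_dim, kv_length]
value_past = value_cache.flatten(0, 1)[:, :prefix_length, :]  # [batch * num_heads, kv_length, head_dim]

# because the axis order differs between keys and values, they cannot be
# stacked into a single [2, batch, ...] tensor without transposing one of them,
# while the handler's single-tensor cache wants exactly that stacked layout
print(key_past.shape)    # torch.Size([8, 3, 5])
print(value_past.shape)  # torch.Size([8, 5, 3])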

@justheuristic merged commit b04982c into main on Dec 13, 2022
@justheuristic deleted the bump branch on Dec 13, 2022, 08:03
