Skip to content

DeepSeek_v3 support#1735

Merged
regisss merged 21 commits into
huggingface:mainfrom
srajabos:DeepSeek_v3
Feb 21, 2025
Merged

DeepSeek_v3 support#1735
regisss merged 21 commits into
huggingface:mainfrom
srajabos:DeepSeek_v3

Conversation

@srajabos
Copy link
Copy Markdown
Contributor

What does this PR do?

DeepSeek v3 support on OH

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@srajabos srajabos marked this pull request as draft January 30, 2025 05:39
@regisss
Copy link
Copy Markdown
Collaborator

regisss commented Jan 30, 2025

@srajabos FYI there is an open PR to add Deepseek V3 to Transformers: huggingface/transformers#35926

We won't be able to rely on the Transformers implementation before Transformers v4.49 is released, but I thought this might be interesting to you.

@srajabos
Copy link
Copy Markdown
Contributor Author

@regiss, I'll keep this as draft until verified with Transformers v4.49.
Thanks for the update.

@anishagartia
Copy link
Copy Markdown

anishagartia commented Feb 5, 2025

@srajabos FYI there is an open PR to add Deepseek V3 to Transformers: huggingface/transformers#35926

We won't be able to rely on the Transformers implementation before Transformers v4.49 is released, but I thought this might be interesting to you.

Deepseek V3 (and hence R1) requriements.txt says the minimum version of transformer required is 4.46.3
OH currently is uses 4.45.2 per requirements.txt
Could be a easier step to enable 4.46.3 on Gaudi than wait for 4.49. Then adding the model files to it could work.

@srajabos
Copy link
Copy Markdown
Contributor Author

srajabos commented Feb 5, 2025

@anishagartia, currently we are adding the model files and optimizing for Gaudi. Once we have performant data the plan is to get it in. Thanks for the link.

@srajabos srajabos marked this pull request as ready for review February 12, 2025 16:15
@srajabos
Copy link
Copy Markdown
Contributor Author

@yao-matrix @gyou2021 @IT-Forrest - kindly review the code.

Copy link
Copy Markdown
Contributor

@ssarkar2 ssarkar2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[explanatory] are just comments to help follow the hpu code. no changes required for those comments. sorry for spamming comments in this category, thought it might be useful for future readers going thru the change and for others looking ot port similar models

[clarifications] some question from my end. Sometimes these are marked with [minor] if they are minor nitpicks

Comment thread tests/test_text_generation_example.py Outdated
from habana_frameworks.torch.hpex.kernels import FusedSDPA
except ImportError:
print("Not using HPU fused scaled dot-product attention kernel.")
FusedSDPA = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[explanatory] Import hpu fused ops


def forward(self, hidden_states):
if hidden_states.device.type == "hpu" and FusedRMSNorm:
# mixed dtypes are not good for FusedRMSNorm, both inputs need to have same dtype
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[explanatory] use fused ops

self.register_buffer("inv_freq", inv_freq, persistent=False)

# Build here to make `torch.jit.trace` work.
self.max_seq_len_cached = max_position_embeddings
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[explanatory] make it static (max_position_embeddings ) instead of updating depending on longest eq_len seen till now: "seq_len > self.max_seq_len_cached"


def apply_customized_rope(q, k, cos, sin, position_ids):
if q.device.type == "hpu" and FusedRoPE:
return FusedRoPE.apply(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[explanatory] fused hpu op

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[clarification][minor] Could we call apply_customized_rope here?

def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
return tensor.view(bsz, seq_len, self.num_heads, self.v_head_dim).transpose(1, 2).contiguous()

def split_kv_b_proj(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[clarification] this is present only in deepseek attention (v2/v3). Can we add some comment about this?

self.q_absorb = kv_b_proj_weight[:, : self.qk_nope_head_dim, :].unsqueeze(0).transpose(0, 1)
self.out_absorb = kv_b_proj_weight[:, self.qk_nope_head_dim :, :].unsqueeze(0)

def compress_kv(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[clarification] this is present only in deepseek attention (v2/v3). Can we add some comment about this? In the original deepseek code this is not a function, any particular reason of functionify-ing this? just want to clarify if making this a function is a stylistic choice or there is some reason

key_states, value_states, self.layer_idx, cache_kwargs
)
# optimization
if use_flash_attention and FusedSDPA is not None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[explanatory] hpu specific, similar to other modelling files in OH


past_key_values_length = 0
if past_key_values is not None:
past_key_values_length = past_key_values[0][0].shape[2]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[explanatory] hpu kv cache management, similar to other OH models

Copy link
Copy Markdown
Contributor

@ssarkar2 ssarkar2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

# Maximum number of experts supported by dynamic MoE op (mixture_of_experts)
SLICE_MAX_EXPERT = 80

# import hpu fused ops
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] stray comment

# Build here to make `torch.jit.trace` work.

# make it static (max_position_embeddings) instead of updating depending on
# longest eq_len seen till now: seq_len > self.max_seq_len_cached
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] eq_len -> seq_len

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ssarkar2 . Let me clean those up

Comment thread examples/text-generation/README.md
Copy link
Copy Markdown
Collaborator

@regisss regisss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I left a couple of comments

Comment thread optimum/habana/transformers/modeling_utils.py Outdated
Comment thread tests/test_text_generation_example.py Outdated
@libinta libinta added the run-test Run CI for PRs from external contributors label Feb 20, 2025
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown
Collaborator

@regisss regisss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@regisss regisss merged commit c5a715c into huggingface:main Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-test Run CI for PRs from external contributors

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants