-
Notifications
You must be signed in to change notification settings - Fork 568
[API] Update head_first parameter default to False
#324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -302,7 +302,7 @@ def chunk_fwd_h( | |
| h0: torch.Tensor, | ||
| output_final_state: bool, | ||
| offsets: Optional[torch.Tensor] = None, | ||
| head_first: bool = True, | ||
| head_first: bool = False, | ||
| chunk_size: int = 64, | ||
| split_size: Optional[int] = None, | ||
| states_in_fp32: bool = False | ||
|
|
@@ -364,7 +364,7 @@ def chunk_bwd_dh( | |
| dht: torch.Tensor, | ||
| scale: float, | ||
| offsets: Optional[torch.Tensor] = None, | ||
| head_first: bool = True, | ||
| head_first: bool = False, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Default parameter value changed from The default value of This is part of a larger change across the codebase standardizing on This change may impact existing code that relies on the default value. Ensure that all callers either explicitly set this parameter or are updated to work with the new tensor dimension ordering. |
||
| chunk_size: int = 64, | ||
| split_size: Optional[int] = None, | ||
| states_in_fp32: bool = False | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -301,7 +301,7 @@ def fused_recurrent_delta_rule_fwd( | |
| initial_state: torch.Tensor, | ||
| output_final_state: bool, | ||
| offsets: Optional[torch.LongTensor] = None, | ||
| head_first: bool = True | ||
| head_first: bool = False | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainDefault value changed from The default value of 🏁 Script executed: #!/bin/bash
# Identify potential usage of fused_recurrent_delta_rule_fwd without explicitly specifying head_first
rg "fused_recurrent_delta_rule_fwd\s*\(" --type py | grep -v "head_first\s*="Length of output: 251 Action: Address breaking change due to updated default for In the file
Please review and either update the call sites to explicitly pass the correct |
||
| ) -> Tuple[torch.Tensor, torch.Tensor]: | ||
| if head_first: | ||
| B, H, T, K, V = *k.shape, v.shape[-1] | ||
|
|
@@ -359,7 +359,7 @@ def fused_recurrent_delta_rule_bwd( | |
| scale: float, | ||
| initial_state: torch.Tensor, | ||
| offsets: Optional[torch.LongTensor] = None, | ||
| head_first: bool = True | ||
| head_first: bool = False | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainDefault value changed from The default value of 🏁 Script executed: #!/bin/bash
# Identify potential usage of fused_recurrent_delta_rule_bwd without explicitly specifying head_first
rg "fused_recurrent_delta_rule_bwd\s*\(" --type py | grep -v "head_first\s*="Length of output: 253 Action Required: Explicitly Specify The verification confirms that the call sites for
|
||
| ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]: | ||
| if head_first: | ||
| B, H, T, K, V = *k.shape, v.shape[-1] | ||
|
|
@@ -438,7 +438,7 @@ def forward( | |
| initial_state: torch.Tensor, | ||
| output_final_state: bool, | ||
| offsets: Optional[torch.LongTensor] = None, | ||
| head_first: bool = True, | ||
| head_first: bool = False, | ||
| use_qk_l2norm_in_kernel: bool = False | ||
| ): | ||
| q_orig = q | ||
|
|
@@ -501,7 +501,7 @@ def fused_recurrent_delta_rule( | |
| initial_state: torch.Tensor = None, | ||
| output_final_state: bool = False, | ||
| cu_seqlens: Optional[torch.LongTensor] = None, | ||
| head_first: bool = True, | ||
| head_first: bool = False, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainDefault value changed from The default value of 🏁 Script executed: #!/bin/bash
# Identify potential usage of fused_recurrent_delta_rule without explicitly specifying head_first
rg "fused_recurrent_delta_rule\s*\(" --type py | grep -v "head_first\s*="Length of output: 830 Critical: Adjust Default Value for The default for the Action Required:
|
||
| use_qk_l2norm_in_kernel: bool = False | ||
| ) -> Tuple[torch.Tensor, torch.Tensor]: | ||
| r""" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -327,7 +327,7 @@ def fwd_prepare_wy_repr( | |
| beta: torch.Tensor, | ||
| offsets: Optional[torch.LongTensor], | ||
| indices: Optional[torch.LongTensor], | ||
| head_first: bool = True, | ||
| head_first: bool = False, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Default parameter value changed from The default value of This is part of a larger change across the codebase standardizing on This change may impact existing code that relies on the default value. Ensure that all callers either explicitly set this parameter or are updated to work with the new tensor dimension ordering. |
||
| chunk_size: int = 64 | ||
| ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: | ||
| if head_first: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default parameter value changed from
TruetoFalseThe default value of
head_firstparameter has been changed fromTruetoFalse. This affects how tensor dimensions are interpreted in thechunk_fwd_hfunction. Whenhead_firstisFalse, the function expects input tensors to have shape[B, T, H, K]rather than[B, H, T, K].This is part of a larger change across the codebase standardizing on
head_first=Falseas the default format.This change may impact existing code that relies on the default value. Ensure that all callers either explicitly set this parameter or are updated to work with the new tensor dimension ordering.