add long sequence strategies #8076
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is ...

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #8076      +/-   ##
===========================================
- Coverage    56.56%   55.41%    -1.16%
===========================================
  Files          589      600      +11
  Lines        89964    91642    +1678
===========================================
- Hits         50889    50782     -107
- Misses       39075    40860    +1785

☔ View full report in Codecov by Sentry.
llm/finetune_generation.py
Outdated
@@ -152,7 +152,7 @@ def main():
    )
    if hasattr(model_config, "use_flash_attention"):
        model_config.use_flash_attention = model_args.use_flash_attention
This file shouldn't need to be modified, should it?
llm/llama/sft_argument.json
Outdated
"zero_padding": false, | ||
"use_flash_attention": false | ||
} |
This JSON doesn't need to be modified either.
class AttentionWithLinearBias(nn.Layer):
    """
    init_args: bool_attention_mask, num_heads, dtype, tensor_parallel_degree
                + self._get_interleave(2 * closest_power_of_2)[0::2][: n - closest_power_of_2]
            )
    def forward(self, bool_attention_mask: Tensor, num_heads: int, dtype: paddle.dtype, tensor_parallel_degree=1):
What is the purpose of passing in tensor_parallel_degree?
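One plausible reason for passing tensor_parallel_degree (an assumption, not confirmed in this thread): under tensor parallelism the attention heads are sharded across ranks, so each rank only needs the ALiBi bias rows for its local heads. A minimal sketch of that slicing; the helper name local_alibi_slopes and the explicit rank argument are hypothetical:

def local_alibi_slopes(slopes, tensor_parallel_degree=1, rank=0):
    # With tensor parallelism, each rank owns len(slopes) // tensor_parallel_degree
    # attention heads, so it only needs that contiguous slice of the per-head slopes.
    per_rank = len(slopes) // tensor_parallel_degree
    return slopes[rank * per_rank : (rank + 1) * per_rank]

# Example: the ALiBi slopes for 8 heads (start = ratio = 0.5) split over 4 ranks.
slopes = [2 ** -(i + 1) for i in range(8)]
print(local_alibi_slopes(slopes, tensor_parallel_degree=4, rank=1))  # [0.125, 0.0625]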
    def _get_interleave(self, n):
        def _get_interleave_power_of_2(n):
            start = 2 ** (-(2 ** -(math.log2(n) - 3)))
            ratio = start
Aren't ratio and start equal here? Can the variable be reused?
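To make the point concrete: start and ratio hold the same value, so the geometric sequence can be written with a single variable. An illustrative standalone rewrite, not the PR's final code:

import math

def _get_interleave_power_of_2(n):
    # ALiBi slopes for a power-of-two head count form a geometric sequence
    # whose first term and common ratio are both `start`, so slope i is
    # simply start ** (i + 1) and the separate `ratio` variable can be dropped.
    start = 2 ** (-(2 ** -(math.log2(n) - 3)))
    return [start ** (i + 1) for i in range(n)]

print(_get_interleave_power_of_2(8))  # [0.5, 0.25, 0.125, ..., 0.00390625]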
""" | ||
try: | ||
import_class = importlib.import_module(f"paddlenlp.transformers.LongSequenceStrategies.{strategy_type}") | ||
except ValueError: |
Shouldn't this be ModuleNotFoundError?
            strategy_class = getattr(import_class, stratety_name)
            strategy_instance = strategy_class(**init_args)
            return strategy_instance
        except AttributeError:
If strategy_class is what fails to resolve, is the error raised really an AttributeError?
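Both review points in one place: importlib.import_module raises ModuleNotFoundError (a subclass of ImportError) when the module is missing, and getattr raises AttributeError only when a module that did import lacks the requested class. A sketch of the loading pattern with the two failures caught separately; the lowercase module path and the error messages are assumptions, not the PR's actual code:

import importlib

def load_strategy(strategy_type, strategy_name, **init_args):
    # A missing strategy module raises ModuleNotFoundError, not ValueError.
    try:
        module = importlib.import_module(
            f"paddlenlp.transformers.long_sequence_strategies.{strategy_type}"
        )
    except ModuleNotFoundError as e:
        raise ValueError(f"Unknown strategy type: {strategy_type}") from e

    # A module that imports but lacks the requested class raises AttributeError
    # here, so the getattr failure is the one that maps to a bad strategy name.
    try:
        strategy_class = getattr(module, strategy_name)
    except AttributeError as e:
        raise ValueError(f"Unknown strategy name: {strategy_name}") from e

    return strategy_class(**init_args)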
@@ -0,0 +1,49 @@ | |||
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved. |
File names and directory names should be lowercase.
PR types
PR changes
Models, APIs
Description
Decouple the long sequence strategies from the models.
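To illustrate the decoupling idea (a minimal sketch; the class name and the zero bias are placeholders, not the PR's API): the model holds only a strategy object and calls it to obtain the attention bias, so swapping ALiBi for another long-sequence scheme does not touch the model code.

import paddle
from paddle import nn

class ZeroBiasStrategy(nn.Layer):
    """Illustrative stand-in for a pluggable long-sequence strategy."""

    def forward(self, bool_attention_mask, num_heads, dtype, tensor_parallel_degree=1):
        batch_size = bool_attention_mask.shape[0]
        seq_len = bool_attention_mask.shape[-1]
        # A real strategy (e.g. AttentionWithLinearBias above) would compute
        # ALiBi slopes here; a zero bias keeps the sketch minimal.
        return paddle.zeros([batch_size, num_heads, 1, seq_len], dtype=dtype)

# The attention layer only sees the strategy interface, not its implementation.
strategy = ZeroBiasStrategy()
mask = paddle.ones([2, 1, 1, 128], dtype="bool")
bias = strategy(mask, num_heads=32, dtype="float32")  # shape [2, 32, 1, 128]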