Add support of tp_size != WORLD_SIZE to bloom #428
regisss left a comment
Interesting use case, does it bring any speedup to use a TP size different from the number of devices?
I left a couple of comments because we should not manage this in adapt_transformers_to_gaudi.
Besides, I guess we would need to modify this line in the text-generation example, right?
We should not manage it in this file but rather in modeling_bloom.py
Is this issue only in BLOOM? I guess it will also apply to LLaMA. Should we do a generic fix or a targeted one?
I would rather have it defined in the modeling of all the models that are compatible with DeepSpeed-inference. It may be a bit redundant but that way we'll have better control over the scope of these changes.
class GaudiBloomForCausalLM(BloomForCausalLM):
    inference_tp_size = None
I would rather use a TP_SIZE env variable than adding a class attribute here
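A minimal sketch of the environment-variable approach suggested above. The helper name `resolve_tp_size` and the fallback order (`TP_SIZE`, then `WORLD_SIZE`, then a default) are assumptions made for illustration; they do not come from the PR.

```python
import os


def resolve_tp_size(default_world_size: int = 1) -> int:
    """Hypothetical helper: choose the tensor-parallel size for inference.

    TP_SIZE wins when set; otherwise fall back to WORLD_SIZE, then to the default.
    """
    raw = os.environ.get("TP_SIZE") or os.environ.get("WORLD_SIZE")
    tp_size = int(raw) if raw else default_world_size
    if tp_size < 1:
        raise ValueError(f"tp_size must be >= 1, got {tp_size}")
    return tp_size
```

With this shape, a launcher that runs 8 processes but wants unsharded inference would export `TP_SIZE=1` before starting.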
Thanks @regisss. To clarify: we are using optimum-habana to improve performance (which it does), but the issue here is functional, not performance-related. With the code as is, we see tensor size mismatches that cause errors when running on more than one card. The DS chat code doesn't expect this generation inference to run in tensor parallel (the DS chat model combines inference and training of different models as part of its run).
Regarding the line from the text-generation example, I'm not sure, as we did not modify it in our attempts to fix the issue, but this general assumption may not hold in all cases.
Regarding the current design, which sets the value using adapt_transformers_to_gaudi:
Upside: no APIs are changed, and the changes can be made in modeling_bloom only, making it the most focused solution.
@NirSonnenschein Ah okay, I understand better the use case! Thanks for the summary of the pros and cons for both solutions. I mostly agree with you, except that I don't think
    def __init__(self, config: BloomConfig, inference_tp_size: int=None):
        super().__init__(config)
        self.inference_tp_size = inference_tp_size
Then, you would just need to give the argument
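The constructor-argument pattern in this comment can be sketched without installing transformers. In the sketch below, `BaseModel` is a stand-in for `BloomForCausalLM`, and `ModelWithTPSize` and `effective_tp_size` are hypothetical names; treating `None` as "fall back to the world size" is an assumption, not something the thread states.

```python
from typing import Optional


class BaseModel:
    """Stand-in for BloomForCausalLM so the sketch runs without transformers."""

    def __init__(self, config: dict):
        self.config = config


class ModelWithTPSize(BaseModel):
    """Hypothetical subclass mirroring the suggested constructor signature."""

    def __init__(self, config: dict, inference_tp_size: Optional[int] = None):
        super().__init__(config)
        # None means "not configured" (assumption: callers then use the world size).
        self.inference_tp_size = inference_tp_size

    def effective_tp_size(self, world_size: int) -> int:
        """Return the configured TP size, or the world size when unset."""
        if self.inference_tp_size is None:
            return world_size
        return self.inference_tp_size
```

Keeping the value on the instance rather than on the class means two models in the same process can use different TP sizes.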
Add support for cases where tp_size != world size. We encountered a case during development of DS chat step 3 where the TP size for inference needs to be different from the world size. The code in modeling_bloom.py assumes that the TP size is always equal to the world size in inference; this change allows it to be configured differently.
@regisss
regisss left a comment
LGTM! I just spotted a typo in a method name, let's correct it and then it will be good to merge :)
thanks @regisss,
@NirSonnenschein Can you also run the style formatter from the root of the repo as follows, please?
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@regisss thanks, it wanted a space after the comma, fixed.
Co-authored-by: tianyuan211 <77777807+tianyuan211@users.noreply.github.com> Co-authored-by: Yuan Tian <tian.yuan@intel.com>
What does this PR do?
Add support for cases where tp_size != world size.
We encountered a case during development of DS chat step 3 where the TP size for inference needs to be different from the world size. The code in modeling_bloom.py assumes that the TP size is always equal to the world size in inference; this change allows it to be configured differently.
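To illustrate how such an assumption can lead to a size mismatch, here is a small arithmetic sketch. It is not the code from modeling_bloom.py: the helper names, the head count of 32, and the `rank % tp_size` layout are all assumptions chosen for the example.

```python
def heads_per_rank(num_heads: int, tp_size: int) -> int:
    """Number of attention heads each tensor-parallel shard holds."""
    if num_heads % tp_size != 0:
        raise ValueError("num_heads must be divisible by tp_size")
    return num_heads // tp_size


def head_slice(num_heads: int, rank: int, tp_size: int) -> range:
    """Head indices owned by `rank`, assuming ranks cycle through TP shards."""
    per_rank = heads_per_rank(num_heads, tp_size)
    tp_rank = rank % tp_size  # assumption: simple modulo rank layout
    start = tp_rank * per_rank
    return range(start, start + per_rank)


# 8 processes, but the model is not sharded for inference (tp_size == 1).
world_size, tp_size, num_heads = 8, 1, 32
assumed = heads_per_rank(num_heads, world_size)  # slicing by world size
actual = heads_per_rank(num_heads, tp_size)      # heads the model really has per rank
```

Here `assumed` is 4 while `actual` is 32, so any per-head tensor sliced by world size would not match the unsharded attention weights.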