Add support of tp_size != WORLD_SIZE to bloom by NirSonnenschein · Pull Request #428 · huggingface/optimum-habana

NirSonnenschein · 2023-09-27T07:32:26Z

Add support for cases where tp_size != world size.
We encountered a case during development of DS chat step 3 where the TP size for inference needs to differ from the world size. The code in modeling_bloom.py assumes that the TP size always equals the world size during inference; this change makes it configurable.

HuggingFaceDocBuilderDev · 2023-09-27T07:53:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

regisss

Interesting use case, does it bring any speedup to use a TP size different from the number of devices?

I left a couple of comments because we should not manage this in adapt_transformers_to_gaudi.

Besides, I guess we would need to modify this line in the text-generation example right?

optimum-habana/examples/text-generation/run_generation.py, line 309 in 1487597:

ds_inference_kwargs["tensor_parallel"] = {"tp_size": world_size}
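To make the question concrete, here is a minimal sketch of how that line could accept a TP size other than the world size. The helper name `build_ds_inference_kwargs`, its `tp_size` parameter, and the divisibility check are assumptions made for illustration; they are not part of run_generation.py or of this PR.

```python
import os
from typing import Any, Dict, Optional


def build_ds_inference_kwargs(tp_size: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: build DeepSpeed-inference kwargs with a configurable TP size."""
    # WORLD_SIZE is the variable commonly set by distributed launchers.
    world_size = int(os.environ.get("WORLD_SIZE", "1"))

    # Keep the existing behavior (TP size == world size) unless a value is given.
    effective_tp_size = world_size if tp_size is None else tp_size

    # Assumption for this sketch: the TP size must evenly divide the world size.
    if effective_tp_size < 1 or world_size % effective_tp_size != 0:
        raise ValueError(
            f"tp_size={effective_tp_size} is not compatible with WORLD_SIZE={world_size}"
        )

    return {"tensor_parallel": {"tp_size": effective_tp_size}}
```

Calling `build_ds_inference_kwargs()` with no argument reproduces the current line, while `build_ds_inference_kwargs(tp_size=1)` would disable tensor parallelism for inference.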

regisss · 2023-09-27T08:27:44Z

We should not manage it in this file but rather in modeling_bloom.py

Is this issue only in bloom? I guess it will also affect LLaMA. Should we do a generic fix or a targeted one?

I would rather have it defined in the modeling code of every model that is compatible with DeepSpeed-inference. It may be a bit redundant, but that way we'll have better control over the scope of these changes.

regisss · 2023-09-27T08:30:53Z



 class GaudiBloomForCausalLM(BloomForCausalLM):
+    inference_tp_size = None


I would rather use a TP_SIZE env variable than add a class attribute here.

NirSonnenschein · 2023-09-27T20:05:09Z

> Interesting use case, does it bring any speedup to use a TP size different from the number of devices?
>
> I left a couple of comments because we should not manage this in adapt_transformers_to_gaudi.
>
> Besides, I guess we would need to modify this line in the text-generation example right?
>
> optimum-habana/examples/text-generation/run_generation.py, line 309 in 1487597:
>
> ds_inference_kwargs["tensor_parallel"] = {"tp_size": world_size}

Thanks @regisss. To clarify: we are using optimum-habana to improve performance (which it does), but the issue here is functional, not performance-related. With the code as is, we see tensor size mismatches that result in errors when running on more than one card. The DS chat code doesn't expect this generation inference to run in tensor parallel (DS chat combines inference and training of different models in a single run).

Regarding the line from the text-generation example: I'm not sure, as we did not modify it while fixing our issue, but the general assumption behind it may not hold in all cases.
ds_inference_kwargs["tensor_parallel"] = {"tp_size": world_size}

Regarding the current design, which sets the value through adapt_transformers_to_gaudi, these are some of the considerations raised in our internal discussions:

1. The assumption in the code that tp_size = WORLD_SIZE is not always correct (particularly not in our case). However, the existing working flows rely on it, and we didn't want to change that, only add support for our use case.
2. Our current workaround for this issue does use env variables. This has some innate downsides and one big upside:
a. They are process-global, and there may be a flow where both the new scenario and the classic one are required, which could complicate things in the future.
b. They can be (and often are) set in an entirely different frame from the code reading them, which can lead to weird issues if they are changed mid-run or in an unexpected place (e.g. threads).
c. Adding a global env variable like this sets up a new external configuration. It needs to be very specific, or it will be expected to affect other parts of optimum-habana or other models that we had no intention of modifying (e.g. if a TP_SIZE env variable were supported, it might be expected to allow changing the TP size in other cases, which is not our intention).

Upside: no APIs are changed, and the changes can be confined to modeling_bloom, making it the most focused solution.

adapt_transformers_to_gaudi is used because it is the main setup API called, and it offers a "synchronous" way to configure this option without altering the current flows (we want all flows other than the special case to run as normal). Once stored, the value can be used by the affected flows. This keeps the change fairly contained to the area that needs it.
We originally considered propagating this parameter normally through the function calls, but that would add it to some functions on the way to the alibi generation function where it may not make sense, and would require supporting it in other use cases, which may complicate the code. Hence the attempt to make the fix as focused as possible.

regisss · 2023-09-28T14:59:24Z

@NirSonnenschein Ah okay, I understand better the use case!

Thanks for the summary of the pros and cons for both solutions. I mostly agree with you, except that I don't think adapt_transformers_to_gaudi is the place to enable this. Would overriding the __init__ method of BloomForCausalLM in GaudiBloomForCausalLM be a good solution for you? Something like:

def __init__(self, config: BloomConfig, inference_tp_size: int=None):
    super().__init__(config)
    self.inference_tp_size = inference_tp_size

Then, you would just need to give the argument inference_tp_size=N to the from_pretrained method when initializing the model. WDYT?
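A self-contained sketch of this suggestion follows. `BloomConfig` and `BloomForCausalLM` below are simplified stand-ins written for this example, not the transformers classes, and the stand-in `from_pretrained` only mimics the forwarding of extra keyword arguments to `__init__` that the suggestion relies on; it loads no weights.

```python
from typing import Optional


class BloomConfig:
    """Stand-in for the real BloomConfig so the sketch runs on its own."""


class BloomForCausalLM:
    """Stand-in for the real BloomForCausalLM (assumption: greatly simplified)."""

    def __init__(self, config: BloomConfig):
        self.config = config

    @classmethod
    def from_pretrained(cls, config: BloomConfig, **model_kwargs):
        # Simplified: only forwards extra keyword arguments to __init__.
        return cls(config, **model_kwargs)


class GaudiBloomForCausalLM(BloomForCausalLM):
    def __init__(self, config: BloomConfig, inference_tp_size: Optional[int] = None):
        super().__init__(config)
        # None means "keep the default behavior" (TP size == world size).
        self.inference_tp_size = inference_tp_size


# The extra keyword argument ends up as an instance attribute.
model = GaudiBloomForCausalLM.from_pretrained(BloomConfig(), inference_tp_size=1)
print(model.inference_tp_size)
```

With this design the setting lives on each model instance rather than in process-global state, which avoids the env-variable downsides listed earlier in the thread.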


NirSonnenschein · 2023-10-18T09:02:29Z

@regisss
Sorry for the delay. I updated the branch with the following changes:

- Configuration is no longer done through adapt_transformers_to_gaudi, which is left unmodified.
- The value is set via GaudiBloomForCausalLM directly.
- The TP size is restricted to either the world size or 1, to rule out edge cases that may not be supported correctly.
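The changes listed above could look roughly like the sketch below. This is not the merged code: the class is a stand-in with no model logic, and the names `set_inference_tp_size` and `effective_tp_size` are hypothetical. Only the `inference_tp_size` class attribute and the "1 or world size" restriction come from this thread.

```python
import os
from typing import Optional


class GaudiBloomForCausalLM:
    """Stand-in class holding only the TP-size setting (no model logic)."""

    # None means "not configured": fall back to TP size == world size.
    inference_tp_size: Optional[int] = None

    @classmethod
    def set_inference_tp_size(cls, tp_size: int) -> None:
        # Hypothetical setter name; enforces the restriction described above.
        world_size = int(os.environ.get("WORLD_SIZE", "1"))
        if tp_size not in (1, world_size):
            raise ValueError(
                f"tp_size must be 1 (no TP) or the world size ({world_size}), got {tp_size}"
            )
        cls.inference_tp_size = tp_size


def effective_tp_size() -> int:
    """Hypothetical helper: the TP size that inference code would use."""
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    configured = GaudiBloomForCausalLM.inference_tp_size
    return world_size if configured is None else configured
```

Flows that never call the setter keep the previous behavior, since the unset attribute falls back to the world size.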

regisss

LGTM! I just spotted a typo in a method name, let's correct it and then it will be good to merge :)

NirSonnenschein · 2023-10-23T07:14:40Z

thanks @regisss,
hopefully this can now move forward.

regisss · 2023-10-23T14:26:15Z

@NirSonnenschein Can you also run the style formatter from the root of the repo as follows please?

pip install --upgrade black ruff
make style

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

NirSonnenschein · 2023-10-23T16:12:01Z

@regisss thanks, it wanted a space after the comma, fixed.

Co-authored-by: tianyuan211 <77777807+tianyuan211@users.noreply.github.com> Co-authored-by: Yuan Tian <tian.yuan@intel.com>

NirSonnenschein requested a review from a user September 27, 2023 07:32

NirSonnenschein requested a review from regisss as a code owner September 27, 2023 07:32

vivekgoe added the run-test label (Run CI for PRs from external contributors) Sep 27, 2023

regisss reviewed Sep 27, 2023

NirSonnenschein force-pushed the inference_tp_size_issue branch from 32324d0 to 404257e Compare October 18, 2023 08:13

NirSonnenschein force-pushed the inference_tp_size_issue branch from 404257e to f5690e0 Compare October 18, 2023 08:17

regisss reviewed Oct 20, 2023

Comment thread optimum/habana/transformers/models/bloom/modeling_bloom.py Outdated

ghost approved these changes Oct 23, 2023

Update optimum/habana/transformers/models/bloom/modeling_bloom.py

636f1a8

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

NirSonnenschein force-pushed the inference_tp_size_issue branch from 0f45d7f to 636f1a8 Compare October 23, 2023 16:10

regisss added and removed the run-test label (Run CI for PRs from external contributors) Oct 23, 2023

regisss approved these changes Oct 23, 2023

regisss merged commit 64b1499 into huggingface:main Oct 23, 2023

vivekgoe pushed a commit to vivekgoe/optimum-habana that referenced this pull request Oct 25, 2023

Add support of tp_size != WORLD_SIZE to bloom (huggingface#428)

b08f6f9



