only enable query key scaling during fp16 #7946
Conversation
```diff
@@ -1544,6 +1544,11 @@ def build_transformer_config(self) -> TransformerConfig:

         attention_softmax_in_fp32 = False  # not currently used in NeMo unless apply_query_key_layer_scaling is True
         apply_query_key_layer_scaling = self.cfg.get('apply_query_key_layer_scaling', False)

         if apply_query_key_layer_scaling and not model_parallel_config.fp16:
             logging.warning("apply_query_key_layer_scaling is only enabled when using FP16, setting it to False")
```
**Reviewer:** I don't think `model_parallel_config.fp16` is the right check, though. That arg is only set for fp16 with megatron_amp_O2. Maybe we should just check `trainer.precision`?
**Author:** Can I check against `self.torch_dtype`?

```python
self.torch_dtype = utils_funcs.torch_dtype_from_precision(self.cfg.precision)  # Mixed precision datatype
```
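A minimal sketch of the check being proposed, assuming `torch_dtype_from_precision` resolves the trainer precision setting (e.g. `16`, `"16-mixed"`, `"bf16-mixed"`, `32`) to a torch dtype; the helper below is illustrative, not NeMo's actual code:

```python
import torch

def should_apply_qk_layer_scaling(requested: bool, torch_dtype: torch.dtype) -> bool:
    # Query-key layer scaling guards against fp16 overflow in the attention
    # logits; bf16 and fp32 have enough dynamic range, so it is skipped there.
    return requested and torch_dtype == torch.float16
```

With a helper like this, the guard in `build_transformer_config` reduces to `apply_query_key_layer_scaling = should_apply_qk_layer_scaling(apply_query_key_layer_scaling, self.torch_dtype)`.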
**Author:** Fixed in 25b7349.
```python
if fp16_enabled:
    os.environ["NVTE_APPLY_QK_LAYER_SCALING"] = "1"
else:
    logging.warning("apply_query_key_layer_scaling is only enabled when using FP16, setting it to False")
```
**Reviewer:** Should we set the env var to `"0"` here as well?
**Author:** Good point, otherwise it will error. Fixed in 366cf61.
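A minimal sketch of the resulting logic after that fix. The wrapper function is hypothetical (only the two env-var assignments and the warning come from the snippets above):

```python
import logging
import os

def configure_qk_layer_scaling(fp16_enabled: bool) -> None:
    if fp16_enabled:
        os.environ["NVTE_APPLY_QK_LAYER_SCALING"] = "1"
    else:
        logging.warning(
            "apply_query_key_layer_scaling is only enabled when using FP16, setting it to False"
        )
        # Explicitly disable scaling; a value inherited from the environment
        # could otherwise leave it on and trigger the error mentioned above.
        os.environ["NVTE_APPLY_QK_LAYER_SCALING"] = "0"
```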
**Reviewer:** LGTM. Thanks!
Squashed commit history:

* only enable query key scaling during fp16
* add warning
* fixup! only enable query key scaling during fp16
* remove var from Jenkins file
* fix test by setting TE var
* set to 0 if disabled
What does this PR do?
Enable query key scaling only during fp16, since that is the only case in which TransformerEngine applies it: https://github.com/NVIDIA/TransformerEngine/blob/666539f36275fa9c0fbc99f9ea50f2d6e29e336f/transformer_engine/pytorch/attention.py#L940
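For reference, a minimal sketch of the gating this PR mirrors on the NeMo side. The helper name is hypothetical and this is not TransformerEngine's actual code; it only illustrates the behavior the linked line implies (the env var is honored solely for fp16 inputs):

```python
import os
import torch

def qk_layer_scaling_active(query_dtype: torch.dtype) -> bool:
    # TransformerEngine reads NVTE_APPLY_QK_LAYER_SCALING from the environment
    # and applies query-key layer scaling only for fp16 attention inputs.
    env_flag = bool(int(os.getenv("NVTE_APPLY_QK_LAYER_SCALING", "0")))
    return env_flag and query_dtype == torch.float16
```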