In-framework inference fixes #10698

janekl · 2024-10-01T16:45:42Z

What does this PR do ?

This PR fixes two issues found during r2.0.0 release testing; the fixes are to be cherry-picked to the release branch. The repro below uses an older NeMo checkpoint from FP8 SFT:

python /opt/NeMo/tests/export/nemo_export.py \
  --checkpoint_dir /opt/checkpoints/LLAMA2-7B-base/LLAMA2-7B-fp8-sft.nemo \
  --model_type llama \
  --min_tps 1 \
  --max_output_len 24 \
  --test_deployment False \
  --in_framework True \
  --model_name llama

Issues are:

Loading old checkpoints - solved with 86408cc

[rank0]:   File "/opt/megatron-lm/megatron/core/dist_checkpointing/dict_utils.py", line 191, in dict_list_map_inplace
[rank0]:     x[k] = dict_list_map_inplace(f, v)
[rank0]:   File "/opt/megatron-lm/megatron/core/dist_checkpointing/dict_utils.py", line 191, in dict_list_map_inplace
[rank0]:     x[k] = dict_list_map_inplace(f, v)
[rank0]:   File "/opt/megatron-lm/megatron/core/dist_checkpointing/dict_utils.py", line 195, in dict_list_map_inplace
[rank0]:     return f(x)
[rank0]:   File "/opt/megatron-lm/megatron/core/dist_checkpointing/strategies/common.py", line 118, in load_sharded_object
[rank0]:     raise CheckpointingException(err_msg) from e
[rank0]: megatron.core.dist_checkpointing.core.CheckpointingException: Object shard /tmp/tmpuuv6zoo9/model_weights/model.decoder.layers.self_attention.core_attention._extra_state/shard_0_32.pt not found

In-framework inference issues for FP8 - solved with 56d6e6f (FP8 needs to be disabled for in-framework inference)

[rank0]:     mixed_qkv, _ = self.linear_qkv(hidden_states)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/opt/megatron-lm/megatron/core/extensions/transformer_engine.py", line 336, in forward
[rank0]:     out = super().forward(x, is_first_microbatch=_is_first_microbatch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 1216, in forward
[rank0]:     out = fwd_fn(*args)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 101, in forward
[rank0]:     assert_dim_for_fp8_exec(inputmat)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/utils.py", line 237, in assert_dim_for_fp8_exec
[rank0]:     and tensor.size(1) % 16 == 0
[rank0]: AssertionError: FP8 execution requires 2D input matrices with height divisible by 8 and width divisible by 16, but got tensor with dims=[12, 4096]
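To make the failure concrete, here is a minimal sketch of the condition described in the assertion message above. The helper `fp8_dims_ok` is hypothetical and only mirrors the wording of the message (2D input, height divisible by 8, width divisible by 16); it is not the Transformer Engine implementation.

```python
def fp8_dims_ok(shape):
    """Hypothetical helper: mirrors the condition stated in the assertion message."""
    return len(shape) == 2 and shape[0] % 8 == 0 and shape[1] % 16 == 0


# The shape from the traceback: 12 rows is not a multiple of 8.
print(fp8_dims_ok((12, 4096)))  # False

# A height that is a multiple of 8 satisfies the stated condition.
print(fp8_dims_ok((16, 4096)))  # True
```

During in-framework inference the height depends on the prompt batch and sequence length, so it cannot be assumed to be a multiple of 8.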

Collection: NLP

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

janekl · 2024-10-01T16:57:19Z

nemo/deploy/nlp/megatronllm_deployable.py

@@ -142,6 +157,11 @@ def _load_from_nemo_checkpoint(self, nemo_checkpoint_filepath: str, num_devices:
            # had to override these to make Nemotron3-22B work, see sample_sequence_batch() in text_generation_utils.py
            custom_config.activations_checkpoint_granularity = None
            custom_config.activations_checkpoint_method = None
+            custom_config.dist_ckpt_load_strictness = StrictHandling.LOG_ALL.value


@thomasdhc @mikolajblaz @dimapihtar do you have any idea why the first error mentioned in the PR description appears on the r2.0.0 branch, while main works fine with the same repro?

Discussed with @mikolajblaz offline: this is likely due to the different TE versions used in the two containers tested.

Correct. If you don't want to import MCore, you can set the string 'log_all' here.

I think it's fine to import it, as it's required here anyway. It also makes it more transparent to me what's going on.
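The two options discussed in this thread can be sketched as follows. This is an illustration only: the import path `megatron.core.dist_checkpointing.validation` is an assumption about where `StrictHandling` lives, and the fallback string `'log_all'` is taken from the review comment above.

```python
# Sketch: resolve the strictness value with or without Megatron-Core installed.
try:
    # Assumed import path for the enum used in the diff.
    from megatron.core.dist_checkpointing.validation import StrictHandling

    strictness = StrictHandling.LOG_ALL.value
except ImportError:
    # Plain-string alternative suggested in the review.
    strictness = "log_all"

print(strictness)
```

Using the enum keeps the setting discoverable and guards against typos, which matches the preference stated in the last reply.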

janekl · 2024-10-01T16:59:19Z

nemo/deploy/nlp/megatronllm_deployable.py

+            if custom_config.get("fp8", False):
+                # Need to disable FP8 for in-framework inference due to shape constraints imposed by TE,
+                # see https://github.com/NVIDIA/TransformerEngine/blob/v1.8/transformer_engine/pytorch/utils.py#L229
+                custom_config.fp8 = False


I think in-framework FP8 inference is not supported
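The override in the diff above can be illustrated in isolation. This is a minimal sketch using a plain `dict` as a stand-in for the model config; the function name `disable_fp8_for_in_framework_inference` and the sample keys are hypothetical.

```python
def disable_fp8_for_in_framework_inference(config: dict) -> dict:
    """Return a copy of the config with FP8 switched off if it was enabled."""
    updated = dict(config)  # leave the caller's config untouched
    if updated.get("fp8", False):
        # TE imposes shape constraints on FP8 execution that inference inputs may not meet.
        updated["fp8"] = False
    return updated


# Stand-in for a config restored from an FP8 SFT checkpoint (hypothetical values).
trained_config = {"fp8": True, "hidden_size": 4096}
inference_config = disable_fp8_for_in_framework_inference(trained_config)
print(inference_config["fp8"])  # False
```

A config without an `fp8` key, or with `fp8` already falsy, passes through unchanged.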


janekl changed the title ~~Jlasek/infer in framework bugfix in main~~ In-framework inference fixes Oct 1, 2024

janekl commented Oct 1, 2024


janekl added r2.0.0 Run CICD labels Oct 1, 2024

janekl requested a review from oyilmaz-nvidia October 1, 2024 17:27

janekl and others added 6 commits October 2, 2024 16:05

Fix loading legacy checkpoints

5d6ccf7

Signed-off-by: Jan Lasek <[email protected]>

Fix inference issues FP8-trained models

01e680d

Signed-off-by: Jan Lasek <[email protected]>

Apply isort and black reformatting

1938b38

Signed-off-by: janekl <[email protected]>

Comment on TE shape contraints during inference

87bcae6

Signed-off-by: Jan Lasek <[email protected]>

Simplify import error handling

8e1e1c8

Signed-off-by: Jan Lasek <[email protected]>

Comment on issues

b29975f

Signed-off-by: Jan Lasek <[email protected]>

janekl force-pushed the jlasek/infer_in_framework_bugfix_in_main branch from a369858 to b29975f Compare October 2, 2024 14:05

janekl added Run CICD and removed Run CICD labels Oct 2, 2024

oyilmaz-nvidia approved these changes Oct 2, 2024
