
@athitten
Contributor

@athitten athitten commented Nov 20, 2025

Adds minor fixes required for deployment of Nemotron.

  1. attention_backend from run_config.yaml in the MBridge ckpt can sometimes be read as a str, so this adds code in nemo_deploy/llm/inference/inference_base.py to convert it to the AttnBackend enum expected by MCore.
  2. Checks that tp_comm_overlap_cfg exists in the config before checking that it is not None; accessing it directly raises an AttributeError when tp_comm_overlap_cfg is not in the config.
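
For illustration, the two fixes can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code in this PR: the `AttnBackend` class here is a local stand-in for the MCore enum (its member names are illustrative), and `coerce_attention_backend` and `has_tp_comm_overlap_cfg` are hypothetical helper names. The sketch also assumes the string read from run_config.yaml is either a bare member name such as `auto` or a qualified one such as `AttnBackend.auto`.

```python
from enum import Enum
from types import SimpleNamespace


class AttnBackend(Enum):
    """Local stand-in for the MCore enum; member names are illustrative."""

    flash = 1
    fused = 2
    unfused = 3
    local = 4
    auto = 5


def coerce_attention_backend(value):
    """Return an AttnBackend member whether given the enum or a string (hypothetical helper)."""
    if isinstance(value, AttnBackend):
        return value
    if isinstance(value, str):
        # Accept both "auto" and "AttnBackend.auto" by keeping the part after the last dot.
        name = value.rsplit(".", 1)[-1]
        return AttnBackend[name]  # raises KeyError for an unknown member name
    raise TypeError(f"Unsupported attention_backend type: {type(value).__name__}")


def has_tp_comm_overlap_cfg(config):
    """True only if the attribute exists and is not None (hypothetical helper)."""
    # getattr with a default avoids AttributeError when the attribute is absent.
    return getattr(config, "tp_comm_overlap_cfg", None) is not None


# Usage with a simple namespace standing in for the loaded model config.
cfg = SimpleNamespace(attention_backend="AttnBackend.flash")
cfg.attention_backend = coerce_attention_backend(cfg.attention_backend)
overlap_enabled = has_tp_comm_overlap_cfg(cfg)
```

Using `getattr` with a default folds the existence check and the None check into one expression; an equivalent form is `hasattr(config, "tp_comm_overlap_cfg") and config.tp_comm_overlap_cfg is not None`.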

@copy-pr-bot

copy-pr-bot bot commented Nov 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Abhishree <[email protected]>
@athitten
Contributor Author

/ok to test b3566b4

@athitten
Contributor Author

/ok to test b3566b4
