feat: MLFlow Integration for experiment tracking#534
feat: MLFlow Integration for experiment tracking#534therealnaveenkamal wants to merge 12 commits intoNVIDIA-NeMo:mainfrom
Conversation
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
SahilJain314
left a comment
There was a problem hiding this comment.
Thanks for contributing! @terrykong to comment on the feature broadly
terrykong
left a comment
There was a problem hiding this comment.
thanks @therealnaveenkamal for the contribution! appreciate your help making nemo-rl better!
left some comments below
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
terrykong
left a comment
There was a problem hiding this comment.
a couple other comments
|
Sure @terrykong , working on your comments. I'll ping you here once I'm ready. Thanks for your support. |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
|
@terrykong I think I've addressed all your comments. please review and let me know if further edits are required. |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
|
@terrykong / @SahilJain314 not sure why the workflows have failed. do I have to do anything from my side? the lint & the copyright checks passed when I run the workflows locally. |
|
@therealnaveenkamal I believe the copyright one is orthogonal to your change so you can ignore that (it doesn't gate our CI, but it should CC: @chtruong814) The linter one fails. can you install the precommit hooks and run it on your branch? |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
e064aec to
9064a45
Compare
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
|
@terrykong thank you. resolved the lint errors. |
|
thanks @therealnaveenkamal. The CI still says there's some unresolved errors: |
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
@terrykong sorry my bad, I didn't do a lint check after resolving merge conflicts. i think everything should be good now. thanks. |
Signed-off-by: Terry Kong <terryk@nvidia.com>
|
the linter is failing due to an error that usually indicates the |
update uv.lock to account for the new dependency
|
Thanks @terrykong, merged. |
|
Hi @therealnaveenkamal . I think the CI is having some issues with your PR (CC @chtruong814 ). I'll open a new PR, resolve the issues, and make sure to credit your contribution |
|
Thanks for the support @terrykong !! Looking forward to contributing more. |
|
Appreciate doing most of the heavy lifting @therealnaveenkamal ! I just pushed it over the finish. I'll close this since #697 has landed |


What does this PR do ?
Add MLflow integration for experiment tracking and model management in NeMo RL
Issues
Closes #514 - Support for MLflow
This PR addresses the community request for MLflow support alongside existing TensorBoard and Weights & Biases logging options.
Usage
Before your PR is "Ready for review"
Pre checks:
Additional Information
This PR adds comprehensive MLflow integration including:
sft_mlflow.yaml) for SFT training with MLflowThe integration follows the existing logger architecture and provides a seamless experience for users who want to use MLflow for experiment tracking alongside or instead of TensorBoard and Weights & Biases.