mlflow_mixin should nest mlflow runs. This prevents the Ray Tune + MLflow scenario from working on Azure ML. #19909
Comments
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
@amogkam - It seems that you wrote the original code for this feature. Can you please take a look?
Hi @bstollnitz, I am in a similar scenario as you: MLflow + Ray Tune + Azure, where the child runs are not automatically logged as nested runs in the Azure experiments. The current workaround I have is:

```python
import mlflow
from ray.tune.integration.mlflow import mlflow_mixin


@mlflow_mixin
def train_func(config):
    # Grab the run that mlflow_mixin has already started for this trial;
    # it acts as the parent of the nested run below.
    run_id = mlflow.active_run().info.run_id
    with mlflow.start_run(nested=True):
        # <your_own_child_run_code>
        # Tag the child run with its parent so Azure ML groups it correctly.
        mlflow.set_tag("mlflow.parentRunId", run_id)
```

Basically I checked the source code of mlflow_mixin to figure this out.
@maggielkx This also worked for me, thank you very much :) It would be good if mlflow_mixin handled this nesting automatically.
Any update on this?
Ray Component
Ray Tune
What happened + What you expected to happen
I tried to use Ray Tune + MLflow + Azure ML by following the "MLflow Mixin API" approach detailed in these docs: https://docs.ray.io/en/latest/tune/tutorials/tune-mlflow.html#mlflow-mixin-api, and then running training on Azure. Typically Azure understands MLflow nested runs and is able to show separate graphs for the metrics in each child run. However, when I add Ray Tune to the mix, the metric readings from all Tune trials get dumped into a single graph on a single run. This makes the Ray Tune + MLflow + Azure ML scenario unusable.
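For context, here is a minimal sketch of the kind of setup the linked docs describe (the experiment name, search space, and toy metric below are placeholders, not my exact repro):

```python
import mlflow
from ray import tune
from ray.tune.integration.mlflow import mlflow_mixin


@mlflow_mixin
def train_fn(config):
    for step in range(5):
        loss = config["lr"] * step  # toy metric just to have something to log
        mlflow.log_metric("loss", loss, step=step)
        tune.report(loss=loss)


tune.run(
    train_fn,
    config={
        "lr": tune.grid_search([0.001, 0.01]),
        # The mixin reads its MLflow settings from this key.
        "mlflow": {
            "experiment_name": "ray_tune_example",
            "tracking_uri": mlflow.get_tracking_uri(),
        },
    },
)
```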
Setting nested=True when the mlflow run is started in the mlflow_mixin might fix this issue.
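For background on why nested=True matters, here is a plain-MLflow sketch (no Ray involved): a run started with nested=True carries the mlflow.parentRunId tag, which is what Azure ML uses to group child runs under their parent.

```python
import mlflow

# A run started with nested=True is automatically tagged with
# "mlflow.parentRunId", pointing at the enclosing run.
with mlflow.start_run() as parent:
    with mlflow.start_run(nested=True) as child:
        mlflow.log_metric("loss", 0.1)

child_tags = mlflow.get_run(child.info.run_id).data.tags
assert child_tags["mlflow.parentRunId"] == parent.info.run_id
```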
Versions / Dependencies
python==3.8.10
ray[tune]==1.6.0
mlflow==1.20.2
azureml-core==1.34.0
azureml-pipeline==1.34.0
azureml-mlflow==1.34.0
azureml-defaults
Reproduction script
Here's the minimal scenario that reproduces the issue: https://docs.ray.io/en/latest/tune/tutorials/tune-mlflow.html#mlflow-mixin-api
A simple verification would be to look at the mlruns output and make sure that the tune trial runs all have a parent ID. With this in place, the scenario should work on Azure ML.
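For example, with the default file-based tracking store, a quick check could look like this (a sketch; the ./mlruns location is an assumption about where the runs were written):

```python
from pathlib import Path

# Each run in the file store lives under mlruns/<experiment_id>/<run_id>/,
# and a nested run has a tags/mlflow.parentRunId file containing its parent's ID.
for tag_file in Path("mlruns").glob("*/*/tags/mlflow.parentRunId"):
    run_id = tag_file.parent.parent.name
    parent_id = tag_file.read_text().strip()
    print(f"trial run {run_id} has parent {parent_id}")
```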
In case you want to verify on Azure ML, here are instructions on how to submit a training job: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-cli
Please feel free to reach out if you'd like me to verify on Azure ML.
Anything else
No response