
mlflow_mixin should nest mlflow runs. This prevents the Ray Tune + MLflow scenario from working on Azure ML. #19909

Open
1 of 2 tasks
bstollnitz opened this issue Oct 30, 2021 · 5 comments
Labels
bug Something that is supposed to be working; but isn't P2 Important issue, but not time-critical tune Tune-related issues

Comments

@bstollnitz

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Tune

What happened + What you expected to happen

I tried to use Ray Tune + MLflow + Azure ML by following the "MLflow Mixin API" approach detailed in these docs: https://docs.ray.io/en/latest/tune/tutorials/tune-mlflow.html#mlflow-mixin-api, and then running training on Azure. Typically Azure understands MLflow nested runs and is able to show separate graphs for the metrics in each child run. However, if I add Ray Tune to the mix, the metric readings from all tune trials get dumped into a single graph on a single run. This makes the Ray Tune + MLflow + Azure ML scenario unusable.

Setting nested=True when the mlflow run is started in the mlflow_mixin might fix this issue.
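
For reference, this is the plain-MLflow nested-run pattern (without Ray Tune) that Azure ML renders correctly; a minimal sketch with placeholder parameter and metric names:

import mlflow

with mlflow.start_run():
    for trial in range(3):
        # Each child run is started with nested=True, so MLflow records the
        # mlflow.parentRunId tag that Azure ML uses to group runs together.
        with mlflow.start_run(nested=True):
            mlflow.log_param("trial_index", trial)
            mlflow.log_metric("loss", 1.0 / (trial + 1))  # placeholder metric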

Versions / Dependencies

python==3.8.10
ray[tune]==1.6.0
mlflow==1.20.2
azureml-core==1.34.0
azureml-pipeline==1.34.0
azureml-mlflow==1.34.0
azureml-defaults

Reproduction script

Here's the minimal scenario that reproduces the issue: https://docs.ray.io/en/latest/tune/tutorials/tune-mlflow.html#mlflow-mixin-api
A simple verification would be to look at the mlruns output and make sure that the tune trial runs all have a parent ID. With this in place, the scenario should work on Azure ML.
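
One way to do that check programmatically; a minimal sketch assuming a local ./mlruns directory and a placeholder experiment name:

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="file:./mlruns")  # assumption: local file store
experiment = client.get_experiment_by_name("mixin_example")  # placeholder experiment name
for run in client.search_runs([experiment.experiment_id]):
    # Every tune trial run should carry this tag if nesting is set up correctly.
    print(run.info.run_id, run.data.tags.get("mlflow.parentRunId"))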

In case you want to verify on Azure ML, here are instructions on how to submit a training job: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-cli
Please feel free to reach out if you'd like me to verify on Azure ML.

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@bstollnitz bstollnitz added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Oct 30, 2021
@stale

stale bot commented Feb 27, 2022

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Feb 27, 2022
@bstollnitz
Author

@amogkam - It seems that you wrote the original code for this feature. Can you please take a look?

@stale stale bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 7, 2022
@maggielkx

maggielkx commented Mar 11, 2022

Hi @bstollnitz, I am in a similar scenario: MLflow + Ray Tune + Azure, where the child runs are not automatically logged in Azure experiments. The current workaround I have is:

from ray.tune.integration.mlflow import mlflow_mixin
import mlflow

@mlflow_mixin
def train_func(config):
    # The run started by the mixin acts as the parent run for this trial.
    run_id = mlflow.active_run().info.run_id
    with mlflow.start_run(nested=True):
        # <your_own_child_run_code>
        # Tag the child run so that Azure ML links it back to its parent.
        mlflow.set_tag("mlflow.parentRunId", run_id)

Basically, I checked the source code of tune.run() and the mlflow_mixin function, then printed the mlflow attributes at the beginning of my own train_func(), and I noticed that although the nested run is set to True, Ray cannot override the mlflow instance that Azure uses under the hood. Therefore I set the parent run ID tag manually so that Azure recognizes each sub-run. Hope it helps!
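
For completeness, here is a sketch of how a trainable like the one above is typically launched with the mixin, following the Ray 1.x docs linked in the issue; the experiment name below is a placeholder:

from ray import tune
import mlflow

analysis = tune.run(
    train_func,
    config={
        # Required by @mlflow_mixin: tells Ray which MLflow experiment/server to use.
        "mlflow": {
            "experiment_name": "my_experiment",  # placeholder name
            "tracking_uri": mlflow.get_tracking_uri(),
        },
        # ... your hyperparameter search space goes here as well ...
    },
)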

@amogkam amogkam added tune Tune-related issues P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 18, 2022
@WaterKnight1998

WaterKnight1998 commented Jul 12, 2022

@maggielkx This also worked for me, thank you very much :)

It would be good if tune.run(...).best_config returned the ID of the MLflow run. I tried logging the run ID in tune.report but it didn't work :(
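
Not a proper fix, but one possible way to recover the child runs afterwards is MLflow's search API; a sketch assuming the mlflow.parentRunId tag from the workaround above is set and that you recorded the parent run ID somewhere:

import mlflow

parent_run_id = "..."  # the run ID of the parent (mixin) run, recorded elsewhere

# Returns a pandas DataFrame with one row per child run of that parent.
# By default this searches the currently active experiment.
children = mlflow.search_runs(
    filter_string=f"tags.`mlflow.parentRunId` = '{parent_run_id}'"
)
print(children[["run_id", "status"]])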

@steveepreston

steveepreston commented Sep 26, 2024

Any update on this?
Because @mlflow_mixin is deprecated, and it seems setup_mlflow() doesn't support nested=True.
