
[tune] (RFC) Exposing a first-class tracker/logger #4423

Closed
richardliaw opened this issue Mar 20, 2019 · 9 comments
@richardliaw
Contributor

richardliaw commented Mar 20, 2019

General Motivation

TL;DR: Minimally invasive utility for logging experiment results that integrates seamlessly with Tune.

One common barrier to adoption is that users already have a developed workflow by the time they need to use a hyperparameter tuning framework.

As @gehring mentions in #4414,

I believe there would be a strong interest in being able to use tune without having to fit experiments within ray/tune, allowing quick transitions from running something by hand (without ray) to running large scale hyper-parameter optimization.

The proposed solution is to expose a tracking/logging mechanism as a first-class API (a minimal sketch follows the list below). This mechanism will do the following:

  1. Write metrics, e.g., track.log_metrics(**metrics).
  2. Write/load artifacts.
  3. Enable automatic syncing to some URI.
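
For concreteness, a minimal self-contained sketch of such a tracker is shown below. Everything in it (the class name, methods, and the JSON-lines format) is illustrative only and is not the actual ray.tune.track API:

import json
import os
import shutil


class MinimalTracker:
    """Illustrative sketch: log metrics as JSON lines and copy artifacts."""

    def __init__(self, logdir="./results", sync_uri=None):
        # sync_uri (e.g. "s3://bucket/exp") is only recorded here; actual
        # syncing would be handled by a background process or hook (item 3).
        self.logdir = logdir
        self.sync_uri = sync_uri
        os.makedirs(logdir, exist_ok=True)
        self._metrics_path = os.path.join(logdir, "result.json")

    def log_metrics(self, **metrics):
        # item 1: write metrics, one JSON record per call
        with open(self._metrics_path, "a") as f:
            f.write(json.dumps(metrics) + "\n")

    def save_artifact(self, path):
        # item 2: copy an artifact (model file, image, ...) into the log directory
        shutil.copy(path, self.logdir)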

Note on integration with Ray:

This will replace the reporter object in the Tune Function API. The tricky case is when users scatter this logger throughout their codebase; the logging calls then don't correspond nicely to the notion of a Tune training iteration.

Broad Requirements

  • Minimally invasive: No logic rewriting should be needed in order to use this tool; it should be bolt-on.
  • Consistency: No logic rewriting needed to run a Tune experiment. Switching from one framework to another (i.e., single-node SVM to Spark, single-process TF to Tune, etc.) should not require me to rewrite logging. If I decide to add a new metric during a new experiment run, it shouldn’t require me to restructure my directory or restart the entire project.

@gehring puts it nicely:

[Ideally], the tracking API supports calls from outside of tune and a ray cluster. This would be the most powerful and flexible solution but would require careful design of the tracking API.

Implementation Notes

  • Currently, there is a WIP PR for this ([tune] Initial track integration #4362). We are following the implementation of Track, which uses a singleton to achieve a cleaner API. It would be good to get some feedback on this.

  • The file directory will look like the following (matching Tune's directory setup):

Project/
  Experiment1/
    Trial1/
    TrialN/ 
  Experiment2/
    ...
  Experiment3/
    ...

Workflow

This is the ideal/proposed workflow. Notice that nothing in the function changes between the local and Ray versions. This is achieved by (in spirit) introducing a special wrapper around the function.

Local Version

This should log a metric to disk, in the same format as Tune.

from ray.tune import track


def hello_world(config):
    print("hello world")
    track.metric(result="hello")

track.init()
hello_world({})

and

Ray Execution

import ray
from ray import tune
from ray.tune import track

ray.init()


def hello_world(config):
    print("hello world")
    track.metric(result="hello")


tune.run(hello_world)

Does this actually belong in Ray?

Unclear, but for now, given its tight integration with Tune, it makes sense for it to live here.

cc @noahgolmant @gehring @ericl @vlad17

@ericl
Contributor

ericl commented Mar 20, 2019

The high level motivation here makes a lot of sense to me.

One question is about track.init(): should this have a good default and auto-init instead? I think one big design flaw with Python logging is that it requires you to call configure somewhere, which may be unclear if you have a large codebase with several modules.
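
For what it's worth, a lazy auto-init under the singleton assumption could look roughly like the sketch below; this is illustrative only, not the proposed implementation, and the names are made up:

import os

_session = None


def init(logdir=None):
    # explicit configuration simply overrides the defaults
    global _session
    _session = {"logdir": logdir or os.path.expanduser("~/ray_results")}


def metric(**metrics):
    if _session is None:  # the user never called init(); fall back to defaults
        init()
    print("would log", metrics, "to", _session["logdir"])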

@gehring
Contributor

gehring commented Mar 20, 2019

@richardliaw I won't have a lot of time to dive into the current tracker implementation until the end of the week, so maybe this is already clearly addressed somewhere. How is checkpointing going to be handled? I ask because repeated results can be introduced if you are not careful with how trials are restored, and they can be difficult to resolve after the fact.

An easy solution, and one I've seen implemented before, is to have a stateful tracker instance which is saved/restored at the same time as the rest of the experiment's state, ensuring that the tracker and the experiment are always in sync. In this case, the tracker can give an increasing index to every write. This allows the tracker (if buffering results) and/or the listener to discard results that were saved but invalidated by an unexpected termination, by backtracking to the restored index.

If the goal is to be minimally intrusive, a relatively safe way of ensuring the user saves/restores the experiment and the tracker together would be to give the user a tracker instance which can be pickled like any other Python object, and to document this very clearly.

I would expect it to be much more likely for someone to forget, or not realize it is necessary, to checkpoint the tracker if it were a module-level call like from ray.tune import track; track.save(...). What do you think?
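
A bare-bones sketch of that stateful, picklable tracker idea (names made up for illustration; after a restore, any logged records with an index at or beyond the restored counter can be discarded):

import pickle


class StatefulTracker:
    """Illustrative sketch: every write gets a monotonically increasing index,
    and the tracker is saved/restored together with the experiment state."""

    def __init__(self):
        self._next_index = 0

    def log(self, **metrics):
        record = {"index": self._next_index, **metrics}
        self._next_index += 1
        return record  # a real tracker would write this to disk or a backend

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def restore(path):
        with open(path, "rb") as f:
            return pickle.load(f)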

@richardliaw
Contributor Author

richardliaw commented Mar 21, 2019

Thanks for the comments!

One question is about: track.init(), should this have a good default instead and auto-init? I think one big design flaw with python logging is that it requires you to call configure somewhere -- which may be unclear if you have a large codebase with several modules.

Assuming a singleton design, we can default and auto-init to log to ~/ray_results. In my mind, users can use init() if needed (e.g., to specify S3 sync up/down, other logging hooks, etc.). I guess in the general case, calling configure is not needed.

How is checkpointing going to be handled?

For this, I'm assuming you're referring to cases where you're on (say) a single machine and your experiment somehow fails, or you're using this as the new Function API, and you need to resume from a checkpoint. One option is to force users to rely on track to checkpoint their current workflow. Note that from the Function API perspective, this is a new feature.

i.e., something like

def main(args):
    if track.can_load():
        model = track.load()["model"]
    else:
        model = Model()

    for i in range(track.epochs):
        avg_train_loss = train(...)
        avg_test_loss = test(...)
        # this should also flush the train_loss stored by track.metric
        track.checkpoint(epoch=i, avg_train_loss=avg_train_loss,
                         avg_test_loss=avg_test_loss, model=model)

Let me know your thoughts!

@vlad17

vlad17 commented Mar 21, 2019

Just weighing in over here from my arm chair.

I think the track goals are well-set and address my concerns with other "tracker" libraries (except track :) ).

A module-level track.report (checkpoint is a bit overloaded with actor checkpointing) is indeed a nice process-global replacement for the reporter. I remember aggregating reports was annoying when distributing training, too. It'd be nice for cluster-wide report()s to be available from Ray, though I haven't thought about this a lot and the consumer API here seems nontrivial.

One issue I see coming up is dealing with model saving in a distributed environment. What does it mean to save models from different worker machines with the same path? Report the same metrics on them? Do you want to handle this case? Will this all be backed by best-effort writes to S3? That might result in conflicts, or at the very least non-obvious semantics.

Finally, one observation is that this is already solved by RLlib's save/restore, which (I think) just uses actor checkpointing and has those semantics in a distributed setting. As a user, should I be dumping model files + experiment state through track.save() or TorchModel/TFModel? The latter seems more ergonomic, or at the very least there needs to be an answer for which API one should use.

Artifacts are still useful for saving images, but honestly it seems like they should just be checkpoint()ed like all the other diagnostics, maybe as a separate argument that accepts a list of local file paths. The reason track needed model-file artifacts was because TorchModel didn't exist.
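
To make the "separate argument that accepts a list of local file paths" idea concrete, here is a self-contained sketch; the function and its signature are hypothetical, not part of track:

import json
import os
import shutil


def checkpoint(logdir, step, artifacts=(), **diagnostics):
    # Hypothetical: diagnostics and artifact files are recorded together in one
    # checkpoint directory, so restoring one restores the other.
    ckpt_dir = os.path.join(logdir, "checkpoint_{}".format(step))
    os.makedirs(ckpt_dir, exist_ok=True)
    for path in artifacts:  # e.g. images or other local files
        shutil.copy(path, ckpt_dir)
    with open(os.path.join(ckpt_dir, "diagnostics.json"), "w") as f:
        json.dump({"step": step, **diagnostics}, f)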

@gehring
Contributor

gehring commented Mar 22, 2019

For this, I'm assuming you're referring to cases where you're on (say) a single machine, your experiment somehow fails, or you're using this as the new Function API, and you need to resume from a checkpoint.

Essentially, yes, that was what I was thinking about, but in the context of preemptible VMs.

One option is to force users to rely on track to checkpoint their current workflow.

At first glance, I think that seems pretty reasonable!

Thinking out loud: what if you have no guarantee that the trials are restored on the same machine? Should tune be responsible for transferring the checkpoint files, and, if so, through what API and (potentially) backend-specific mechanism?

Assuming a singleton design, we can default and auto-init to log ~/ray_results. In my mind, users can use init() if needed (i.e., specify S3 sync up/down, specify other logging hooks, etc). I guess in the general case, calling configure is not needed.

How about extending the trial executor API to be responsible for setting the configuration through some general mechanism, e.g., a config file/string of some sort? Related to my previous point, this could happen at the same time that the checkpoint state is set up, since it would likely be backend dependent.

With a new trial executor (e.g., a Kubernetes trial executor) in mind, it would be nice if users didn't need to know the details of how trials are launched/restored, allowing them to use exactly the same code regardless of the backend (i.e., no special init required if relying on a trial executor).

Maybe extending the trial executor is not the preferred solution, but I do think some new or old dev API should be provided to enable automatic configuration of track even when running outside a ray cluster.

@gehring
Contributor

gehring commented Mar 22, 2019

For this, I'm assuming you're referring to cases where you're on (say) a single machine, your experiment somehow fails, or you're using this as the new Function API, and you need to resume from a checkpoint.

I forgot to say that I think the question is important even outside the context of the Function API. For track to be an attractive solution, it is important to provide some reliability mechanism, since saving/restoring is common in large-scale experiments.

I'm not arguing that it should be responsible for it (though it might not be a bad idea, as you proposed), but it should be able to offer some consistency guarantees. I would expect users to be reluctant to use track if there is a chance that invalid/dirty results are logged. Fixing the results after the fact is quite difficult to do reliably, if not impossible.

@richardliaw
Contributor Author

richardliaw commented Mar 25, 2019

OK, thanks all for feedback.

One issue I see coming up is dealing with model saving in a distributed environment.

Presumably we're talking about individual trials, not something like a single trial running data-parallel. Each trial should save to its own trial directory, and artifacts will be saved within it.

As a user, should I be dumping model files + experiment state through track.save() or TorchModel/TFModel? The latter seem more ergonomic, or at least there needs to be an answer here for which api should use.

The idea is to eventually have an API something like track.save(object, artifact_name, save_fn=pickle.dump). This should allow Track to be aware of objects.
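
A rough sketch of how such a save_fn-based call could behave (hypothetical; the real signature is not settled here):

import os
import pickle


def save(obj, artifact_name, save_fn=pickle.dump, logdir="."):
    # Hypothetical sketch: Track delegates serialization to a user-supplied
    # save_fn, so it stays framework-agnostic while still tracking the artifact.
    with open(os.path.join(logdir, artifact_name), "wb") as f:
        save_fn(obj, f)


# e.g. save(model, "model.pkl") for pickleable objects, or
# save(model, "model.pt", save_fn=lambda m, f: torch.save(m.state_dict(), f))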

Thinking out loud: what if you have no guarantees that the trials are restored on the same machine? Should tune be responsible with transferring the checkpoint files, and, if so, by what API and (potentially) backend specific mechanism?

I'm assuming this is about using Track with Tune in the cluster setting. Tune is responsible for transferring the checkpoint files. Checkpoint files should go within the logdir of the trial, and Tune already uses rsync to transfer files in this setting.

Assuming a singleton design, we can default and auto-init to log ~/ray_results. In my mind, users can use init() if needed (i.e., specify S3 sync up/down, specify other logging hooks, etc). I guess in the general case, calling configure is not needed.

How about extending the trial executor API to be responsible of setting the configuration through some general mechanism, e.g., a config file/string of some sort?

Hm, to clarify, this response is for the setting of "using Track individually, outside the context of Tune". If using Tune, then configurations can be set through tune.run (or the trial_executor).

I would expect users to be reluctant to use track if there is a chance that invalid/dirty results are logged. Fixing the results after the fact is quite difficult to do reliably, if not impossible.

Yeah, that's true. Stop-and-resume for track is not specified in the discussion above, but I think we can get a first cut in and then think about the APIs to expose for that... I suspect something like track.save, which saves artifacts and checkpoints, track.load, which loads them, and track.restore, which resets the state of the experiment logs to some specified state, should be expressive enough.
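
Purely as illustration, "reset the state of the experiment logs to some specified state" could look like the following on top of an indexed JSON-lines log (none of this is an existing API):

import json


def restore_log_to(path, last_valid_index):
    # Drop any records written after the restored index so no dirty/duplicate
    # results survive a crash-and-resume.
    with open(path) as f:
        records = [json.loads(line) for line in f]
    kept = [r for r in records if r.get("index", -1) <= last_valid_index]
    with open(path, "w") as f:
        for r in kept:
            f.write(json.dumps(r) + "\n")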

@gehring
Contributor

gehring commented Mar 31, 2019

Hm, to clarify this response is for the setting of "using Track individually outside the context of using Tune". If using Tune, then configurations can be set through tune.run (or the trial_executor).

Thanks for the clarification! I think making it the trial executor's responsibility would be the most flexible, since a Kubernetes executor would require a very different setup and forcing tune.run to accommodate future backends might add unnecessary complexity.

Yeah, that's true. Stop-and-resume for track is not specified in the discussion above, but I think we can get a first cut in and then think about the APIs to expose for that...

I think that is reasonable. What you have here is already quite good! Maybe the checkpointing tools should be their own standalone API. It might be worth discussing once I get around to writing a Kubernetes trial executor (most likely after I finish my first pass at the TF 2.0 model stuff).

@richardliaw richardliaw self-assigned this Apr 2, 2019
@richardliaw
Contributor Author

Closed in #4362.
