[Bug] Custom OmegaConf resolvers registered at module level aren't propagated to submitit launcher processes #861

Closed
2 tasks done
calebho opened this issue Aug 11, 2020 · 7 comments · Fixed by #899
Labels
bug Something isn't working

@calebho
Contributor

calebho commented Aug 11, 2020

🐛 Bug

Description

Not sure whether this is expected behavior; if it is, it should be documented somewhere. If you register a custom OmegaConf resolver at module scope, it isn't propagated to the launcher processes, so accessing a config node that uses the custom resolver raises an exception.

Checklist

  • I checked on the latest version of Hydra
  • I created a minimal repro

To reproduce

# foo.py
import hydra
from omegaconf import OmegaConf


def my_custom_resolver():
    return "foo"


OmegaConf.register_resolver("my_custom_resolver", my_custom_resolver)


@hydra.main(config_name="foo")
def main(cfg):
    assert cfg.x == "foo"


if __name__ == "__main__":
    main()

# foo.yaml
x: ${my_custom_resolver:}

Regular run works fine:

$ python foo.py

Multirun with submitit throws an exception:

$ python foo.py -m hydra/launcher=submitit_local
[2020-08-10 19:56:14,485][HYDRA] Submitit 'local' sweep output dir : multirun/2020-08-10/19-56-14
[2020-08-10 19:56:14,487][HYDRA] 	#0 :
Traceback (most recent call last):
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra/_internal/utils.py", line 344, in <lambda>
    overrides=args.overrides,
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 132, in multirun
    return sweeper.sweep(arguments=task_overrides)
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 135, in sweep
    results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra_plugins/hydra_submitit_launcher/submitit_launcher.py", line 149, in launch
    return [j.results()[0] for j in jobs]
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra_plugins/hydra_submitit_launcher/submitit_launcher.py", line 149, in <listcomp>
    return [j.results()[0] for j in jobs]
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/submitit/core/core.py", line 291, in results
    raise job_exception  # pylint: disable=raising-bad-type
submitit.core.utils.FailedJobError: Job (task=0) failed during processing with trace:
----------------------
Traceback (most recent call last):
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/submitit/core/submission.py", line 47, in process_job
    result = delayed.result()
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/submitit/core/utils.py", line 123, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra_plugins/hydra_submitit_launcher/submitit_launcher.py", line 80, in __call__
    job_subdir_key="hydra.sweep.subdir",
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/hydra/core/utils.py", line 123, in run_job
    ret.return_value = task_function(task_cfg)
  File "foo.py", line 14, in main
    assert cfg.x == "foo"
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 315, in __getattr__
    self._format_and_raise(key=key, value=None, cause=e)
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
    type_override=type_override,
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/omegaconf/_utils.py", line 669, in format_and_raise
    _raise(ex, cause)
  File "/private/home/calebh/miniconda3/envs/bench-detectron2/lib/python3.7/site-packages/omegaconf/_utils.py", line 583, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.UnsupportedInterpolationType: Unsupported interpolation type my_custom_resolver
	full_key: x
	reference_type=Optional[Dict[Any, Any]]
	object_type=dict

----------------------
You can check full logs with 'job.stderr(0)' and 'job.stdout(0)'or at paths:
  - /private/home/calebh/scratch/hydra-bug/multirun/2020-08-10/19-56-14/.submitit/34113/34113_0_log.err
  - /private/home/calebh/scratch/hydra-bug/multirun/2020-08-10/19-56-14/.submitit/34113/34113_0_log.out

Moving OmegaConf.register_resolver inside main works:

--- foo.py	2020-08-10 19:51:10.401891000 -0700
+++ foo_fixed.py	2020-08-10 19:58:19.693452000 -0700
@@ -6,11 +6,9 @@
     return "foo"


-OmegaConf.register_resolver("my_custom_resolver", my_custom_resolver)
-
-
 @hydra.main(config_name="foo")
 def main(cfg):
+    OmegaConf.register_resolver("my_custom_resolver", my_custom_resolver)
     assert cfg.x == "foo"

Expected Behavior

System information

  • Hydra Version:
    pip list | grep hydra
    hydra-core              1.0.0rc2
    hydra-submitit-launcher 1.0.0rc4
    
  • Python version : 3.7.7
  • Virtual environment type and version : conda 4.8.3
  • Operating system : Ubuntu 18.04.3

Additional context

I think this is an issue with how submitit serializes the Python interpreter state for the worker processes. It might require a non-trivial change in submitit's design.
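To illustrate the mechanism without Hydra or submitit, the sketch below uses a plain dict, RESOLVERS, as a hypothetical stand-in for OmegaConf's process-global resolver registry. Clearing it simulates a fresh worker interpreter in which the module-level registration never ran. The snapshot_registry and restore_registry helpers are also hypothetical: they sketch one possible design in which the launcher serializes the registry in the parent process, ships it with the job, and restores it in the worker before calling the task function. This is an assumption for illustration, not a description of what Hydra or submitit actually do.

```python
import pickle
from typing import Callable, Dict

# Hypothetical stand-in for OmegaConf's process-global resolver registry.
RESOLVERS: Dict[str, Callable[[], str]] = {}


def my_custom_resolver() -> str:
    return "foo"


def resolve(name: str) -> str:
    """Call a registered resolver, failing like an unknown interpolation type."""
    if name not in RESOLVERS:
        raise LookupError(f"Unsupported interpolation type {name}")
    return RESOLVERS[name]()


def snapshot_registry() -> bytes:
    # Parent process (hypothetical): serialize current registrations for the job.
    return pickle.dumps(dict(RESOLVERS))


def restore_registry(blob: bytes) -> None:
    # Worker process (hypothetical): reinstall registrations before the task runs.
    RESOLVERS.update(pickle.loads(blob))


# Parent: module-level registration, as in foo.py.
RESOLVERS["my_custom_resolver"] = my_custom_resolver
payload = snapshot_registry()

# Simulate a fresh worker interpreter: the module-level registration never ran.
RESOLVERS.clear()
try:
    resolve("my_custom_resolver")
except LookupError as err:
    print("worker without restore:", err)

restore_registry(payload)
print("worker after restore:", resolve("my_custom_resolver"))
```

The standard pickle module stores module-level functions by reference (module and qualified name), so under this design the worker must still be able to import whatever module defines each resolver.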

@calebho calebho added the bug Something isn't working label Aug 11, 2020
@omry omry added this to the 1.0.0 milestone Aug 11, 2020
@omry
Collaborator

omry commented Aug 11, 2020

Thanks for reporting, will look into it.

@calebho
Contributor Author

calebho commented Aug 14, 2020

@samuelstanton You can move the register_resolver call inside main, as I mentioned in the bug report.

@thoth291
Contributor

@calebho, simply moving it inside main will not work perfectly: it breaks the basic launcher in multirun mode, presumably because main then runs several times in the same process and registering the same resolver a second time fails.
The same behavior also occurs with the joblib launcher, so it's not specific to submitit; maybe renaming the issue would help.

I had a similar issue a few days ago and worked around it by doing this in my main:

if OmegaConf.get_resolver("my_custom_resolver") is None:
    OmegaConf.register_resolver("my_custom_resolver", my_custom_resolver)

I forgot to report it, which would have saved you some time. Sorry.
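The basic-launcher failure and the guard above can be reproduced in a few lines. This sketch assumes a registry that refuses to register a name twice, which is presumably why registering unconditionally inside main fails when the basic launcher calls main several times in one process. The registry functions and the ensure_resolver helper are hypothetical stand-ins; with OmegaConf itself, the guard is the get_resolver check shown in the comment above.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for OmegaConf's global resolver registry.
_RESOLVERS: Dict[str, Callable[..., str]] = {}


def register_resolver(name: str, fn: Callable[..., str]) -> None:
    # Assumed behavior: registering an existing name is an error.
    if name in _RESOLVERS:
        raise ValueError(f"resolver {name} is already registered")
    _RESOLVERS[name] = fn


def get_resolver(name: str) -> Optional[Callable[..., str]]:
    return _RESOLVERS.get(name)


def ensure_resolver(name: str, fn: Callable[..., str]) -> None:
    # Idempotent: safe whether or not several jobs run in the same process.
    if get_resolver(name) is None:
        register_resolver(name, fn)


def my_custom_resolver() -> str:
    return "foo"


def main() -> str:
    # Registering inside the task function means it also runs in worker processes.
    ensure_resolver("my_custom_resolver", my_custom_resolver)
    return _RESOLVERS["my_custom_resolver"]()


# Simulate the basic launcher running three jobs in one process.
results = [main() for _ in range(3)]
print(results)
```

Calling register_resolver directly inside main instead of ensure_resolver would raise on the second simulated job.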

@Queuecumber
Contributor

Can we revisit this? I'm not using multirun or the submitit plugin, because the plugin is currently missing some functionality I want. Instead, I call submitit directly with either the local executor or the SLURM executor, and I get the following error:

omegaconf.errors.UnsupportedInterpolationType: Unsupported interpolation type hydra
        full_key: data.stats
        reference_type=Any
        object_type=dict

I think it's related to this issue.

@jieru-hu
Contributor

Could you provide a minimal script to reproduce this? Thank you.

@Queuecumber
Contributor

Here's a script:

import time
from pathlib import Path
from typing import Iterator, Tuple

import hydra
from omegaconf import DictConfig
from submitit import Job
from yaspin import yaspin


def single_job(cfg: DictConfig) -> None:
    print(cfg.data.stats)


def follow(job: Job) -> Iterator[str]:
    with open(job.paths.stdout) as fo:
        with open(job.paths.stderr) as fe:
            while True:
                lo = fo.readline()

                if lo:
                    yield lo

                le = fe.readline()
                if le:
                    yield le

                if job.state != 'RUNNING' and not (le or lo):
                    break
                else:
                    time.sleep(0.1)


@hydra.main(config_name="../configs/test.yaml")
def main(cfg: DictConfig) -> None:
    executor = hydra.utils.instantiate(cfg.runner.executor)
    executor.update_parameters(**cfg.runner.params)
    job = executor.submit(single_job, cfg)

    try:
        with yaspin(text="Waiting for jobs to start"):
            while job.state != "RUNNING":
                time.sleep(1)

        with yaspin(text=f"Attaching to master process logs: {job.paths.stdout.with_suffix('')}"):
            while not job.paths.stdout.exists():
                time.sleep(1)

        loglines = follow(job)
        for line in loglines:
            print(line, end='', flush=True)
    except KeyboardInterrupt:
        d = input('Quitting ... also cancel job? [Y/n]')

        if d.lower() == 'y' or d == '':
            job.cancel()
            print('Job canceled')
        else:
            print('Job not canceled')


if __name__ == "__main__":
    main()

and a config file:

data:
  stats: ${hydra:runtime.cwd}/stats/cstats.pt

runner:
  executor:
    _target_: submitit.LocalExecutor
    folder: ${now:%a.%m-%d-%y.%H:%M:%S}
  params: {}

The full exception is:

(qgac) mehrlich@learnfair0273:~/compression-robust$ python scripts/test.py
submitit INFO (2020-08-21 11:44:13,782) - Starting with JobEnvironment(job_id=9394, hostname=learnfair0273, local_rank=0(1), node=0(1), global_rank=0(1))
submitit ERROR (2020-08-21 11:44:13,833) - Submitted job triggered an exception
submitit INFO (2020-08-21 11:44:13,782) - Loading pickle: /private/home/mehrlich/compression-robust/outputs/2020-08-21/11-44-13/Fri.08-21-20.11:44:13/9394_submitted.pkl
Traceback (most recent call last):
submitit ERROR (2020-08-21 11:44:13,833) - Submitted job triggered an exception
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/submitit/core/submission.py", line 65, in submitit_main
    process_job(args.folder)
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/submitit/core/submission.py", line 58, in process_job
    raise error
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/submitit/core/submission.py", line 47, in process_job
    result = delayed.result()
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/submitit/core/utils.py", line 123, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "scripts/test.py", line 12, in single_job
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 297, in __getattr__
    self._format_and_raise(key=key, value=None, cause=e)
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
    type_override=type_override,
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/omegaconf/_utils.py", line 675, in format_and_raise
    _raise(ex, cause)
  File "/private/home/mehrlich/.conda/envs/qgac/lib/python3.7/site-packages/omegaconf/_utils.py", line 591, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.UnsupportedInterpolationType: Unsupported interpolation type hydra
        full_key: data.stats
        reference_type=Any
        object_type=dict

I just installed from git master (pip install --upgrade git+https://github.com/facebookresearch/hydra.git).
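A plausible reading of this second error: ${hydra:...} is resolved lazily, when single_job accesses cfg.data.stats, and that happens in the submitit worker, where the resolver and the Hydra runtime state set up by @hydra.main in the parent process are presumably not available. One workaround is to resolve the config in the parent before calling executor.submit. The sketch below shows that idea with a hypothetical, stdlib-only resolve_eagerly function and a hypothetical hydra resolver that returns a made-up working directory; it handles only the ${name:arg} form used in this config.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical resolvers that exist only in the parent (Hydra) process.
# The working directory below is made up for illustration.
PARENT_RESOLVERS: Dict[str, Callable[[str], str]] = {
    "hydra": lambda key: {"runtime.cwd": "/home/user/project"}[key],
}

# Matches custom-resolver interpolations such as ${hydra:runtime.cwd}.
_INTERP = re.compile(r"\$\{(\w+):([^}]*)\}")


def resolve_eagerly(node: Any, resolvers: Dict[str, Callable[[str], str]]) -> Any:
    """Return a plain copy of node with every ${name:arg} replaced."""
    if isinstance(node, dict):
        return {k: resolve_eagerly(v, resolvers) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_eagerly(v, resolvers) for v in node]
    if isinstance(node, str):
        return _INTERP.sub(lambda m: resolvers[m.group(1)](m.group(2)), node)
    return node


cfg = {"data": {"stats": "${hydra:runtime.cwd}/stats/cstats.pt"}}

# In the parent, before submitting: resolve while the resolvers still exist.
plain_cfg = resolve_eagerly(cfg, PARENT_RESOLVERS)

# The worker only needs plain values; no resolver lookup happens there.
print(plain_cfg["data"]["stats"])
```

With OmegaConf itself, the corresponding step would be something like OmegaConf.create(OmegaConf.to_container(cfg, resolve=True)) before executor.submit(single_job, ...).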

@omry
Collaborator

omry commented Aug 21, 2020

Please open a new issue with a minimal repro, and strip anything that is not relevant.
