[Not sure if bug/feature] Where does submitit error output go? #2664
Yeah, it's an annoying behavior of the submitit launcher. Here's the recipe I'm using myself:

```python
import sys
import traceback

import hydra
from omegaconf import DictConfig


@hydra.main(...)
def main(cfg: DictConfig):
    try:
        run(cfg)
    except Exception:
        traceback.print_exc(file=sys.stderr)  # full traceback goes to the .err log
        raise
```

With this, you'll now see errors in the stderr log.
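If you'd rather not repeat the try/except in every entry point, the same recipe can be packaged as a decorator. This is a minimal sketch that uses only the standard library; `print_traceback_to_stderr` is a hypothetical name, not part of Hydra or submitit.

```python
import functools
import sys
import traceback
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def print_traceback_to_stderr(fn: F) -> F:
    """Hypothetical helper: print the full traceback to stderr, then re-raise."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except Exception:
            # Write the traceback where the job's .err log will capture it.
            traceback.print_exc(file=sys.stderr)
            sys.stderr.flush()
            raise  # keep the original failure semantics for the launcher

    return cast(F, wrapper)
```

For example, inside a Hydra entry point you could call `print_traceback_to_stderr(run)(cfg)` instead of writing the try/except by hand. The exception is re-raised unchanged, so the launcher still sees the job as failed.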
Yeah, that's what I'm doing, but is this the only way to do this? It doesn't make much sense.
Had this exact issue and found this, which was really helpful. Any clue why this is not the default behavior? The current state of things makes some errors really confusing.
I am adding a +1 to this. This is quite a strange issue.

I'd add that @Ubadub's solution works! However, I recommend this version, which also catches `BaseException` and flushes both streams:

```python
import hydra
from omegaconf import DictConfig


@hydra.main(...)
def main(cfg: DictConfig):
    import sys
    import traceback

    # This main is used to circumvent a bug in Hydra
    # See https://github.com/facebookresearch/hydra/issues/2664
    try:
        actual_main(cfg)
    except BaseException:
        traceback.print_exc(file=sys.stderr)
        raise
    finally:
        # Flush both streams so buffered output reaches the log files
        sys.stdout.flush()
        sys.stderr.flush()
```
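The same idea can also be written as a context manager, which keeps the `BaseException` coverage and the final flush in one reusable place. A minimal standard-library sketch; `report_failures` is a hypothetical name, not a Hydra or submitit API.

```python
import contextlib
import sys
import traceback
from typing import Iterator


@contextlib.contextmanager
def report_failures() -> Iterator[None]:
    """Hypothetical context manager mirroring the recipe above."""
    try:
        yield
    except BaseException:
        # BaseException also covers KeyboardInterrupt and SystemExit.
        traceback.print_exc(file=sys.stderr)
        raise
    finally:
        # Runs on success and failure alike, so buffered output is not lost.
        sys.stdout.flush()
        sys.stderr.flush()
```

Inside `main` you would write `with report_failures(): actual_main(cfg)`. One side effect of catching `BaseException`: a deliberate `sys.exit(...)`, which raises `SystemExit`, also gets a traceback printed before it propagates.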
A different solution: #2863
Hi,

I am using the submitit plugin to run a `--multirun` sweep. Some of my jobs errored: based on the logging messages my code prints to the log file in the sweep subdir, I can tell they are exiting prematurely. However, at the bottom of the log file, submitit nevertheless claims that the `Job completed successfully`.

I'm opening this issue to ask where I can find the actual stack trace, and also why the log message erroneously claims the job completed successfully.
I do see two files: `.submitit/[JOBNAME_JOBNUMBER]/[JOBNAME_JOBNUMBER]_log.out` and `.submitit/[JOBNAME_JOBNUMBER]/[JOBNAME_JOBNUMBER]_log.err`. The former appears to be identical to the main log file that gets placed in the sweep subdir after the run completes. The latter, confusingly, does contain some error messages, but only ones produced by third-party libraries (in particular, I am using the HuggingFace `transformers` library, which prints certain warning messages that appear in the stderr file). I am used to seeing this output when running a job directly in the console, but it's unclear to me why that output appears there while nothing else I would normally expect to see in stderr does. My working hypothesis is that the error file contains messages printed directly to `sys.stderr`, but not error messages that result from raised exceptions.

When all the runs complete, I do see a small, relatively unhelpful error message that usually consists of just the error message, without a stack trace or a line number. And even if many jobs fail, only one error message appears to be produced.
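One plausible explanation for the terse final message, which I haven't verified against submitit's source, is that a failed job's exception is sent back to the launching process in pickled form. Pickling an exception keeps its type and arguments but not its `__traceback__`, so re-raising it on the other side loses the original frames and line numbers. This standard-library sketch demonstrates that pickle behavior within a single process; `fail` and `roundtrip` are illustrative names only.

```python
import pickle
import traceback


def fail() -> None:
    # Stand-in for a job that crashes deep inside user code.
    raise RuntimeError("something went wrong")


def roundtrip(exc: BaseException) -> BaseException:
    """Simulate shipping an exception to another process with pickle."""
    return pickle.loads(pickle.dumps(exc))


try:
    fail()
except RuntimeError as exc:
    original = exc

# In the process that raised it, the full traceback is still available.
print("".join(traceback.format_exception(type(original), original, original.__traceback__)))

# After a pickle round trip only the type and args survive.
received = roundtrip(original)
print(repr(received))          # RuntimeError('something went wrong')
print(received.__traceback__)  # None
```

This is also why printing the traceback inside the job, as in the recipes in the comments, recovers the line numbers: the text is written out before the exception ever leaves the process.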
Is this customizable behavior? Is there some file or setting I'm missing? How can I recover a full, usable stack trace, like what I would have seen had I run the command without `--multirun` and without the submitit launcher?
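Once tracebacks are written to stderr (for example with one of the try/except recipes in the comments), you can gather them from every job instead of relying on the single message shown at the end of the sweep. A minimal sketch, assuming the `.submitit/[JOBNAME_JOBNUMBER]/[JOBNAME_JOBNUMBER]_log.err` layout described above; `collect_tracebacks` is a hypothetical helper, not part of submitit.

```python
from pathlib import Path
from typing import Dict, List

TRACEBACK_HEADER = "Traceback (most recent call last):"


def collect_tracebacks(submitit_dir: Path) -> Dict[str, List[str]]:
    """Map each *_log.err file under submitit_dir to the traceback blocks it contains.

    Hypothetical helper (not part of submitit). A block starts at a
    "Traceback (most recent call last):" line and runs until the next such
    line or the end of the file, so stderr output after it is included.
    """
    found: Dict[str, List[str]] = {}
    for err_file in sorted(submitit_dir.glob("*/*_log.err")):
        blocks: List[str] = []
        current: List[str] = []
        for line in err_file.read_text(errors="replace").splitlines():
            if line.startswith(TRACEBACK_HEADER):
                if current:
                    blocks.append("\n".join(current))
                current = [line]
            elif current:
                current.append(line)
        if current:
            blocks.append("\n".join(current))
        if blocks:  # jobs whose .err file has no traceback are skipped
            found[str(err_file)] = blocks
    return found
```

For example, pointing `collect_tracebacks` at the sweep's `.submitit` directory returns a dict from each failing job's `.err` file to its traceback blocks; chained exceptions show up as several blocks, and the last one is usually the error that ended the job.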