PyTorch Lightning Integration #413
Hi Jonas, resuming experiments is not supported in the current version of Neptune. A new major version of Neptune is coming out soon which will support resuming experiments, so please keep an eye on new Neptune releases. In the meantime, the error you're seeing appears to be caused by an attempt to create a channel in an experiment where a channel of that name already exists. This is probably due to the way you modified Neptune. BTW, was there any missing functionality in Neptune which made you tweak your own version?
Hello, Best Jonas
Hi Jonas, sorry to hear you needed to tweak the Neptune-PTL integration to use a proxy. We've added a fix for this to our backlog.
Having the ability to resume experiments is very much needed :) It is often the case that a machine dies in the middle of a run, and a clean API to restart from the failing point is helpful. Thanks!
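The failure-and-resume pattern being asked for here can be sketched independently of Neptune: persist progress to a checkpoint on disk, and on restart continue from the last saved step instead of from zero. Everything below (file name, function names) is illustrative and not part of the Neptune or Lightning API.

```python
import json
import os

CKPT = "checkpoint.json"

def load_step(path=CKPT):
    # Resume from the last recorded step, or start fresh if no checkpoint exists.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["step"]
    return 0

def save_step(step, path=CKPT):
    # Record progress after each completed step.
    with open(path, "w") as f:
        json.dump({"step": step}, f)

def train(total_steps, path=CKPT):
    start = load_step(path)
    for step in range(start, total_steps):
        # ... one training step would run here ...
        save_step(step + 1, path)
    return load_step(path)
```

If the machine dies mid-run, calling `train` again picks up from the checkpoint rather than repeating completed steps; a real setup would checkpoint model weights and optimizer state the same way.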
Hi Jonas, just to clarify: resuming experiments is possible in the current version of Neptune - you can log metrics to an old experiment, etc. What the upcoming version will add on top of that is also updating the state of the experiment when it is resumed.
@PiotrJander What's the best practice to do so using PyTorch Lightning? Thanks!
Hi @tdvginz It is possible to resume PyTorch Lightning experiments by passing the ID of an existing experiment to the `NeptuneLogger`:

```python
from pytorch_lightning.loggers.neptune import NeptuneLogger

neptune_logger = NeptuneLogger(
    api_key="...",
    project_name="shared/pytorch-lightning-integration",
    experiment_id="PYTOR-163701",
)
```

The logger is then passed to the `Trainer` via its `logger` argument, as with any other Lightning logger.
Thanks a lot for your kind response. Best Jonas
@PiotrJander hi, I'm currently looking into a similar issue: using Neptune with PyTorch Lightning to log experiments run on spot (preemptible) instances. As far as I understand the docs, the new API does not support the PL integration yet, does it? And if I were to use the old one, a resumed experiment would not have its status updated, nor its stdout and stderr logs or hardware consumption. In a reply above you've written that the new API fixes updating the status of a resumed experiment. How about hardware consumption metrics, though? Would these resume tracking in a resumed experiment in the new API? Best,
Support for the new API in the PyTorch Lightning integration is in progress. You can monitor the status of this feature here: Lightning-AI/pytorch-lightning#6867. Best
Hi, thanks for the quick reply. How about GPU usage monitoring in a resumed experiment? Is this supported in the new API? How about logging stdout & stderr? Best,
Yes, it is. Resuming is fully supported in the new API. :)
Awesome, can't wait! Thanks!
@PrzemekPobrotyn we've released the PyTorch Lightning integration as a separate package for now:

Note that this is an update of the current integration to the new API, not a full re-write. It does take advantage of some of the new features like the hierarchical structure, but we will be iterating on the "UX" of the integration, so please note that there may be breaking changes once we release the next (1.0+) version of this integration.
Hello,
When reloading an experiment and continuing to train the network, Neptune fails when logging to existing channels.
Also worth mentioning: I am behind a proxy and have modified the NeptuneLogger of PyTorch Lightning:
Best Jonas
Here is the Traceback: