Add TRAINS experiment manager support #1122
Merged
15 commits
3a868d8
Add allegro.ai TRAINS experiment manager support
ac1ffb4
improve docstring and type hinting, fix the bug in log_metrics, add s…
S-aiueo32 c2fd85b
complete missing docstring of constructor's arguments
S-aiueo32 196700a
fix docs
S-aiueo32 1f53519
Merge pull request #1 from S-aiueo32/master
bmartinn 5d40030
Merge remote-tracking branch 'pytorch-lightning.git/master'
12c5104
pep8
b6ab8b9
pep8
05ce77b
remove redundant typing
3a72b74
remove deprecated interface
5256015
add TrainsLogger test
7294d97
add TrainsLogger PR in CHANGELOG
81b2f9c
add id/name property documentation
a1f671a
change logging as log
430de28
Merge branch 'master' into master
bmartinn
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,3 +32,4 @@ dependencies: | |
- comet_ml>=1.0.56 | ||
- wandb>=0.8.21 | ||
- neptune-client>=0.4.4 | ||
- trains>=0.13.3 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,283 @@ | ||
""" | ||
Log using `allegro.ai TRAINS <https://github.com/allegroai/trains>`_ | ||
|
||
.. code-block:: python | ||
|
||
from pytorch_lightning.loggers import TrainsLogger | ||
trains_logger = TrainsLogger( | ||
project_name="pytorch lightning", | ||
task_name="default", | ||
) | ||
trainer = Trainer(logger=trains_logger) | ||
|
||
|
||
Use the logger anywhere in your LightningModule as follows: | ||
|
||
.. code-block:: python | ||
|
||
def training_step(...): | ||
# example | ||
self.logger.experiment.whatever_trains_supports(...) | ||
|
||
def any_lightning_module_function_or_hook(...): | ||
self.logger.experiment.whatever_trains_supports(...) | ||
|
||
""" | ||
|
||
import logging | ||
from argparse import Namespace | ||
from pathlib import Path | ||
from typing import Any, Dict, Optional, Union | ||
|
||
import PIL | ||
import numpy as np | ||
import pandas as pd | ||
import torch | ||
|
||
try: | ||
import trains | ||
except ImportError: | ||
raise ImportError('You want to use `TRAINS` logger which is not installed yet,' | ||
' install it with `pip install trains`.') | ||
|
||
from .base import LightningLoggerBase, rank_zero_only | ||
|
||
|
||
class TrainsLogger(LightningLoggerBase): | ||
"""Logs using TRAINS | ||
|
||
Args: | ||
project_name: The name of the experiment's project. Defaults to None. | ||
task_name: The name of the experiment. Defaults to None. | ||
task_type: The type of the experiment. Defaults to 'training'. | ||
reuse_last_task_id: Start with the previously used task id. Defaults to True. | ||
output_uri: Default location for output models. Defaults to None. | ||
auto_connect_arg_parser: Automatically grab the ArgParser | ||
and connect it with the task. Defaults to True. | ||
auto_connect_frameworks: If True, automatically patch to trains backend. Defaults to True. | ||
auto_resource_monitoring: If True, machine vitals will be | ||
sent alongside the task scalars. Defaults to True. | ||
""" | ||
|
||
def __init__( | ||
self, project_name: Optional[str] = None, task_name: Optional[str] = None, | ||
task_type: str = 'training', reuse_last_task_id: bool = True, | ||
output_uri: Optional[str] = None, auto_connect_arg_parser: bool = True, | ||
auto_connect_frameworks: bool = True, auto_resource_monitoring: bool = True) -> None: | ||
super().__init__() | ||
self._trains = trains.Task.init( | ||
project_name=project_name, task_name=task_name, task_type=task_type, | ||
reuse_last_task_id=reuse_last_task_id, output_uri=output_uri, | ||
auto_connect_arg_parser=auto_connect_arg_parser, | ||
auto_connect_frameworks=auto_connect_frameworks, | ||
auto_resource_monitoring=auto_resource_monitoring | ||
) | ||
|
||
@property | ||
def experiment(self) -> trains.Task: | ||
r"""Actual TRAINS object. To use TRAINS features do the following. | ||
|
||
Example: | ||
.. code-block:: python | ||
self.logger.experiment.some_trains_function() | ||
|
||
""" | ||
return self._trains | ||
|
||
@property | ||
def id(self) -> Union[str, None]: | ||
""" | ||
ID is a uuid (string) representing this specific experiment in the entire system. | ||
""" | ||
if not self._trains: | ||
return None | ||
return self._trains.id | ||
|
||
@rank_zero_only | ||
def log_hyperparams(self, params: Union[Dict[str, Any], Namespace]) -> None: | ||
"""Log hyperparameters (numeric values) in TRAINS experiments | ||
|
||
Args: | ||
params: | ||
The hyperparameters that were passed to the model. | ||
""" | ||
if not self._trains: | ||
return None | ||
if not params: | ||
return | ||
if isinstance(params, dict): | ||
self._trains.connect(params) | ||
else: | ||
self._trains.connect(vars(params)) | ||
|
||
@rank_zero_only | ||
def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None: | ||
"""Log metrics (numeric values) in TRAINS experiments. | ||
This method will be called by Trainer. | ||
|
||
Args: | ||
metrics: | ||
The dictionary of the metrics. | ||
If the key contains "/", it will be split by the delimiter, | ||
then the elements will be logged as "title" and "series" respectively. | ||
step: Step number at which the metrics should be recorded. Defaults to None. | ||
""" | ||
if not self._trains: | ||
return None | ||
|
||
if step is None: | ||
step = self._trains.get_last_iteration() | ||
|
||
for k, v in metrics.items(): | ||
if isinstance(v, str): | ||
logging.warning("Discarding metric with string value {}={}".format(k, v)) | ||
continue | ||
if isinstance(v, torch.Tensor): | ||
v = v.item() | ||
parts = k.split('/') | ||
if len(parts) <= 1: | ||
series = title = k | ||
else: | ||
title = parts[0] | ||
series = '/'.join(parts[1:]) | ||
self._trains.get_logger().report_scalar( | ||
title=title, series=series, value=v, iteration=step) | ||
|
||
@rank_zero_only | ||
def log_metric(self, title: str, series: str, value: float, step: Optional[int] = None) -> None: | ||
"""Log metrics (numeric values) in TRAINS experiments. | ||
This method will be called by the users. | ||
|
||
Args: | ||
title: The title of the graph to log, e.g. loss, accuracy. | ||
series: The series name in the graph, e.g. classification, localization. | ||
value: The value to log. | ||
step: Step number at which the metrics should be recorded. Defaults to None. | ||
""" | ||
if not self._trains: | ||
return None | ||
|
||
if step is None: | ||
step = self._trains.get_last_iteration() | ||
|
||
if isinstance(value, torch.Tensor): | ||
value = value.item() | ||
self._trains.get_logger().report_scalar( | ||
title=title, series=series, value=value, iteration=step) | ||
|
||
@rank_zero_only | ||
def log_text(self, text: str) -> None: | ||
"""Log console text data in TRAINS experiment | ||
|
||
Args: | ||
text: The value of the log (data-point). | ||
""" | ||
if not self._trains: | ||
return None | ||
|
||
self._trains.get_logger().report_text(text) | ||
|
||
@rank_zero_only | ||
def log_image( | ||
self, title: str, series: str, | ||
image: Union[str, np.ndarray, PIL.Image.Image, torch.Tensor], | ||
step: Optional[int] = None) -> None: | ||
"""Log Debug image in TRAINS experiment | ||
|
||
Args: | ||
title: The title of the debug image, e.g. "failed", "passed". | ||
series: The series name of the debug image, e.g. "Image 0", "Image 1". | ||
image: | ||
Debug image to log. Can be one of the following types: | ||
Torch, Numpy, PIL image, path to image file (str) | ||
If Numpy or Torch, the image is assumed to be the following: | ||
shape: CHW | ||
color space: RGB | ||
value range: [0., 1.] (float) or [0, 255] (uint8) | ||
step: | ||
Step number at which the metrics should be recorded. Defaults to None. | ||
""" | ||
if not self._trains: | ||
return None | ||
|
||
if step is None: | ||
step = self._trains.get_last_iteration() | ||
|
||
if isinstance(image, str): | ||
self._trains.get_logger().report_image( | ||
title=title, series=series, local_path=image, iteration=step) | ||
else: | ||
if isinstance(image, torch.Tensor): | ||
image = image.cpu().numpy() | ||
if isinstance(image, np.ndarray): | ||
image = image.transpose(1, 2, 0) | ||
self._trains.get_logger().report_image( | ||
title=title, series=series, image=image, iteration=step) | ||
|
||
@rank_zero_only | ||
def log_artifact( | ||
self, name: str, | ||
artifact: Union[str, Path, Dict[str, Any], pd.DataFrame, np.ndarray, PIL.Image.Image], | ||
metadata: Optional[Dict[str, Any]] = None, delete_after_upload: bool = False) -> None: | ||
"""Save an artifact (file/object) in TRAINS experiment storage. | ||
|
||
Args: | ||
name: Artifact name. Note: it will overwrite a previous artifact | ||
if the name already exists. | ||
artifact: Artifact object to upload. Currently supports: | ||
- str / pathlib.Path are treated as the path to the artifact file to upload. | ||
If a wildcard or a folder is passed, a zip file containing the | ||
local files will be created and uploaded | ||
- dict will be stored as .json file and uploaded | ||
- pandas.DataFrame will be stored as .csv.gz (compressed CSV file) and uploaded | ||
- numpy.ndarray will be stored as .npz and uploaded | ||
- PIL.Image will be stored to .png file and uploaded | ||
metadata: | ||
Simple key/value dictionary to store on the artifact. Defaults to None. | ||
delete_after_upload: | ||
If True, the local artifact will be deleted (only applies if artifact is a | ||
local file). Defaults to False. | ||
""" | ||
if not self._trains: | ||
return None | ||
|
||
self._trains.upload_artifact( | ||
name=name, artifact_object=artifact, metadata=metadata, | ||
delete_after_upload=delete_after_upload | ||
) | ||
|
||
def save(self) -> None: | ||
pass | ||
|
||
@rank_zero_only | ||
def finalize(self, status: str) -> None: | ||
if not self._trains: | ||
return None | ||
self._trains.close() | ||
self._trains = None | ||
|
||
@property | ||
def name(self) -> Union[str, None]: | ||
""" | ||
Name is a human readable non-unique name (str) of the experiment. | ||
""" | ||
if not self._trains: | ||
return None | ||
return self._trains.name | ||
|
||
@property | ||
def version(self) -> Union[str, None]: | ||
if not self._trains: | ||
return None | ||
return self._trains.id | ||
|
||
def __getstate__(self) -> Union[str, None]: | ||
if not self._trains: | ||
return None | ||
return self._trains.id | ||
|
||
def __setstate__(self, state: str) -> None: | ||
self._rank = 0 | ||
self._trains = None | ||
if state: | ||
self._trains = trains.Task.get_task(task_id=state) |
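The `log_metrics` method above splits each metric key on "/" into a graph title and a series name. A minimal standalone sketch of that rule follows; `split_metric_key` is a hypothetical helper name used only for illustration, since the PR inlines this logic rather than exposing it as a function.

```python
from typing import Tuple


def split_metric_key(key: str) -> Tuple[str, str]:
    """Return (title, series) for a metric key, mirroring log_metrics.

    Hypothetical helper for illustration; not part of the PR.
    """
    parts = key.split('/')
    if len(parts) <= 1:
        # No delimiter: the key serves as both title and series.
        return key, key
    # The first segment is the title; the rest stays joined as the series.
    return parts[0], '/'.join(parts[1:])


print(split_metric_key('loss'))
print(split_metric_key('loss/train'))
print(split_metric_key('loss/val/epoch'))
```

Under this rule, keys such as `loss/train` and `loss/val` share the title `loss` and differ only in series, so they would presumably be drawn on the same graph, whereas `train/loss` and `val/loss` would get separate titles.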
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,4 +2,5 @@ neptune-client>=0.4.4 | |
comet-ml>=1.0.56 | ||
mlflow>=1.0.0 | ||
test_tube>=0.7.5 | ||
wandb>=0.8.21 | ||
wandb>=0.8.21 | ||
trains>=0.13.3 |
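Because `trains` is listed here alongside the other logger packages rather than as a core dependency, the logger module in this PR wraps `import trains` in a `try`/`except ImportError` and re-raises with an install hint. A generic sketch of that guard is shown below; `require_optional` is a hypothetical helper, not part of the PR or of PyTorch Lightning.

```python
import importlib
from types import ModuleType
from typing import Optional


def require_optional(module_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import module_name, or raise ImportError with an install hint.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Fall back to the module name when the pip package name is the same.
        package = pip_name or module_name
        raise ImportError(
            f'`{module_name}` is not installed; '
            f'install it with `pip install {package}`.'
        ) from err


# A standard-library module is always importable, so this succeeds.
json_module = require_optional('json')
```

Raising at import time, as the PR does, fails fast with a clear message; a helper like this one would instead defer the failure to the point where the logger is first requested.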
Sure, fixed.