Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiments Running and Management #246

Merged
merged 26 commits into from
Jul 15, 2024

Conversation

melo-gonzo
Copy link
Collaborator

This PR adds functionality to run and manage experiments in matsciml. Currently, the options for experimentation are to create a script for each experiment run and manually update parameters, or to use the pytorch lightning cli + yaml files. The first approach is time-consuming and clumsy, and the second approach does not handle multi task and multi data experiments in matsciml. This PR aims to bridge the gap between functional experimental pipelines without the headache of managing a ton of scripts. It takes inspiration from the lightning cli by managing the different aspects of an experiment with yaml files {trainer, model, dataset, experiment}.yaml and allows for cli updates by specifying a chain of parameters to traverse and update. Experimentation effectively comes down to managing these yaml files and greatly reduces the barrier to entry towards running complex experiments with multi task and multi data origins. See the README for more details.

@melo-gonzo
Copy link
Collaborator Author

Note that I updated matsciml/models/dgl/gaanet/tests/test_gala.py and matsciml/models/base.py to match changes from #228.

Copy link
Collaborator

@laserkelvin laserkelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put in some minor comments, but the two high level things that I'll bring up for discussion are:

  1. Maybe we can finally use the registry, so instead of maintaining individual maps in task, dataset, and models, just rely on matsciml.common.registry to do the mapping?
  2. I'm not sure if I missed it, but it might be good to refactor your object creation functions (i.e. mapping args into something you pull out of a mapping) into matsciml.common.registry. The "improvement" so to speak would be to also include arg/kwarg validation, similar to this and I think also in MACEWrapper. I think you want to raise an exception if someone passes something that isn't recognized, and this would be a good place to put it (even as a staticmethod of Registry).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to move this things into the main datasets module?

You could repackage these functions as a from_config class method, then you don't need to include __init__.py in this folder. It could mess up pip installs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to keep this separate from the datasets module since it is also relying on other experiment related utils such as the command line parsing. If there ends up being other use cases where this functionality would be helpful from the dataset module we can do a refactor then.

experiments/README.md Outdated Show resolved Hide resolved
experiments/datasets/materials_project.yaml Outdated Show resolved Hide resolved
experiments/task_config/task_config.py Outdated Show resolved Hide resolved
@melo-gonzo
Copy link
Collaborator Author

bb140fb

I added the class argument verification check in this commit. I'm not sure it makes sense to refactor the object creation function instantiate_arg_dict and the argument checker verify_class_args into the Registry class, because these functions also are used to spin up lightning classes such as callbacks and loggers, and other random functions that may not be a part of the registered objects, such as matsciml.datasets.utils.element_types`, and lots of the MACE moules.

experiments/README.md Outdated Show resolved Hide resolved
import yaml


yaml_dir = yaml_dir = os.path.dirname(os.path.abspath(__file__))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this optional for you, but just looks cleaner if you use pathlib instead:

from pathlib import Path
yaml_dir = Path(__file__)

for filename in yaml_dir.rglob("*.yaml"):
    ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed 759a4fc

experiments/datasets/tests/test_data_module_creation.py Outdated Show resolved Hide resolved
from torch.nn import LayerNorm


yaml_dir = yaml_dir = os.path.dirname(os.path.abspath(__file__))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for datasets, use pathlib instead of os.path

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed 759a4fc

import yaml


yaml_dir = yaml_dir = os.path.dirname(os.path.abspath(__file__))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for pathlib

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed 759a4fc

experiments/training_script.py Outdated Show resolved Hide resolved
matsciml/models/base.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@laserkelvin laserkelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making all the changes, LGTM!

@laserkelvin laserkelvin merged commit 651a2b5 into IntelLabs:main Jul 15, 2024
3 checks passed
@melo-gonzo melo-gonzo deleted the experiments-refactor-branch branch October 1, 2024 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants