Experiments Running and Management #246

melo-gonzo · 2024-06-27T16:50:11Z

This PR adds functionality to run and manage experiments in matsciml. Currently, there are two options for experimentation: write a script for each experiment run and update parameters by hand, or use the PyTorch Lightning CLI with yaml files. The first approach is time-consuming and clumsy, and the second does not handle multi-task and multi-data experiments in matsciml. This PR aims to provide functional experiment pipelines without the headache of managing a ton of scripts. It takes inspiration from the Lightning CLI by managing the different aspects of an experiment with yaml files ({trainer, model, dataset, experiment}.yaml), and it allows CLI updates by specifying a chain of parameters to traverse and update. Experimentation effectively comes down to managing these yaml files, which greatly lowers the barrier to entry for running complex experiments with multiple tasks and datasets. See the README for more details.
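
To make the "chain of parameters to traverse and update" idea concrete, here is a minimal sketch of how such an update could work on a config loaded from yaml. The function name `update_nested` and the example keys are hypothetical and are not taken from the PR's code.

```python
from typing import Any


def update_nested(config: dict, key_chain: list[str], value: Any) -> dict:
    """Walk `key_chain` through nested dicts and set the final key to `value`."""
    node = config
    for key in key_chain[:-1]:
        # Create intermediate levels on demand so new branches can be added.
        node = node.setdefault(key, {})
    node[key_chain[-1]] = value
    return config


# Hypothetical trainer config, as it might look after loading a yaml file.
trainer_config = {"generic": {"min_epochs": 15, "max_epochs": 100}}

# A CLI override could be parsed into a key chain plus a new value.
update_nested(trainer_config, ["generic", "max_epochs"], 5)
```

Only the addressed leaf changes; sibling keys such as `min_epochs` are left untouched, so the yaml file stays the single source of defaults.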

melo-gonzo · 2024-06-27T17:26:50Z

Note that I updated matsciml/models/dgl/gaanet/tests/test_gala.py and matsciml/models/base.py to match changes from #228.

laserkelvin

I've put in some minor comments, but the two high-level things I'll bring up for discussion are:

Maybe we can finally use the registry, so instead of maintaining individual maps in task, dataset, and models, just rely on matsciml.common.registry to do the mapping?
I'm not sure if I missed it, but it might be good to refactor your object creation functions (i.e. mapping args into something you pull out of a mapping) into matsciml.common.registry. The "improvement" so to speak would be to also include arg/kwarg validation, similar to this and I think also in MACEWrapper. I think you want to raise an exception if someone passes something that isn't recognized, and this would be a good place to put it (even as a staticmethod of Registry).
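
As an illustration of the arg/kwarg validation idea, the following is a minimal sketch built on `inspect.signature` from the standard library. The helper `check_kwargs` and the class `ToyEncoder` are hypothetical; this is not the `Registry` API or the check used elsewhere in matsciml.

```python
import inspect
from typing import Any, Callable


def check_kwargs(target: Callable, kwargs: dict[str, Any]) -> None:
    """Raise if `kwargs` contains names that `target` does not accept."""
    params = inspect.signature(target).parameters
    # A **kwargs catch-all means any keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return
    unknown = sorted(set(kwargs) - set(params))
    if unknown:
        raise TypeError(f"{target.__name__} got unrecognized arguments: {unknown}")


class ToyEncoder:
    """Hypothetical stand-in for a model class pulled out of a mapping."""

    def __init__(self, hidden_dim: int, num_layers: int = 2) -> None:
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers


# Passes silently; a misspelled key such as "hiden_dim" would raise TypeError.
check_kwargs(ToyEncoder, {"hidden_dim": 8})
```

Failing before the object is constructed keeps the error message close to the config entry that caused it.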

laserkelvin · 2024-07-01T19:23:19Z

experiments/datasets/data_module_config.py

Do you want to move these things into the main datasets module?

You could repackage these functions as a from_config class method, then you don't need to include __init__.py in this folder. It could mess up pip installs.

I think I'd prefer to keep this separate from the datasets module since it is also relying on other experiment related utils such as the command line parsing. If there ends up being other use cases where this functionality would be helpful from the dataset module we can do a refactor then.

experiments/README.md

experiments/datasets/materials_project.yaml

experiments/task_config/task_config.py

melo-gonzo · 2024-07-11T21:23:13Z

bb140fb

I added the class argument verification check in this commit. I'm not sure it makes sense to refactor the object creation function instantiate_arg_dict and the argument checker verify_class_args into the Registry class, because these functions are also used to spin up Lightning classes such as callbacks and loggers, as well as other functions that may not be part of the registered objects, such as matsciml.datasets.utils.element_types and many of the MACE modules.
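
The point about objects that live outside the registry can be sketched as follows: an object is resolved from a dotted import path rather than from a registry lookup, so any importable class or function can be created from config. This is only an illustration under assumptions; `instantiate_from_path` is a hypothetical name, the `class_path` key is assumed by analogy with Lightning CLI configs, and the real instantiate_arg_dict may work differently.

```python
import importlib
from typing import Any


def instantiate_from_path(spec: dict[str, Any]) -> Any:
    """Import the object named by `class_path` and call it with `init_args`."""
    module_name, _, attr_name = spec["class_path"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**spec.get("init_args", {}))


# A standard-library class stands in for a callback, logger, or utility function.
frac = instantiate_from_path(
    {
        "class_path": "fractions.Fraction",
        "init_args": {"numerator": 1, "denominator": 3},
    }
)
```

Because the lookup goes through the import system, it covers targets that were never registered, which is the reason given above for keeping these helpers outside Registry.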

experiments/README.md

laserkelvin · 2024-07-15T21:39:25Z

experiments/datasets/__init__.py

+import yaml
+
+
+yaml_dir = yaml_dir = os.path.dirname(os.path.abspath(__file__))


I'll make this optional for you, but just looks cleaner if you use pathlib instead:

    from pathlib import Path
    yaml_dir = Path(__file__)
    for filename in yaml_dir.rglob("*.yaml"):
        ...

fixed 759a4fc
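
For reference, a runnable sketch of the pathlib approach is below. It globs from a directory, so inside an `__init__.py` the starting point would presumably be `Path(__file__).parent` rather than the file path itself. The helper name `find_yaml_files` is hypothetical and is not necessarily what commit 759a4fc contains.

```python
from pathlib import Path


def find_yaml_files(directory: Path) -> list[Path]:
    """Return every .yaml file under `directory`, searched recursively."""
    return sorted(directory.rglob("*.yaml"))


# Inside a package __init__.py, the directory would be the file's parent:
#     yaml_dir = Path(__file__).parent
#     for filename in find_yaml_files(yaml_dir):
#         ...
```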

experiments/datasets/tests/test_data_module_creation.py

laserkelvin · 2024-07-15T21:43:08Z

experiments/models/__init__.py

+from torch.nn import LayerNorm
+
+
+yaml_dir = yaml_dir = os.path.dirname(os.path.abspath(__file__))


Same comment as for datasets, use pathlib instead of os.path

fixed 759a4fc

laserkelvin · 2024-07-15T21:53:55Z

experiments/trainer_config/__init__.py

+import yaml
+
+
+yaml_dir = yaml_dir = os.path.dirname(os.path.abspath(__file__))


Same comment for pathlib

fixed 759a4fc

experiments/training_script.py

matsciml/models/base.py

laserkelvin

Thanks for making all the changes, LGTM!

melo-gonzo added 15 commits June 6, 2024 12:08

refactor: continued

d5b1011

Merge branch 'main' of https://github.com/IntelLabs/matsciml into experiments-refactor

a9f4e00

refactor: updating experiment pipeline

e359a64

fix: forcing some imports

690502d

feat: updates to training script

9b0ffe8

fix: updating logger and callbacks params

fafe4db

fix: update tests

8a21133

feat: adding cli argument parsing

98da45d

feat: model configs and some bug fixes

e684c8c

docs/fix: adding readme, updating task args setting

c42f803

fix: check if additional task args are present

4acb1cd

docs: adding a few doc strings and type hints

686d066

deps/fix: add experiments folder to install, fix update trainer args in cli.

459b052

refactor: gitignore

20f2549

Merge branch 'main' of https://github.com/IntelLabs/matsciml into experiments-refactor-branch

38804a1

melo-gonzo requested a review from laserkelvin June 27, 2024 16:50

fix: update gala test file

e73aef1

laserkelvin requested changes Jul 1, 2024

melo-gonzo added 5 commits July 9, 2024 14:56

fix: typo

81b8e06

fix: purge devset paths from mp data config

2bad280

feat: add class init_args verification

bb140fb

Merge branch 'main' of https://github.com/IntelLabs/matsciml into experiments-refactor-branch

35014e3

feat: purge the task_map and use registry instead.

f436403

Merge branch 'main' of https://github.com/IntelLabs/matsciml into experiments-refactor-branch

4a246dd

laserkelvin requested changes Jul 15, 2024

melo-gonzo added 3 commits July 15, 2024 15:20

fix: remove old task map references, update os.path to pathlib

759a4fc

feat: adding help to argparser

35e44e9

refactor: use long instead of int

8e11437

refactor: purge os from init files

cb51078

laserkelvin approved these changes Jul 15, 2024

laserkelvin merged commit 651a2b5 into IntelLabs:main Jul 15, 2024
3 checks passed

melo-gonzo deleted the experiments-refactor-branch branch October 1, 2024 16:54
