Unified config manager for toml and command line #76

Merged (1 commit) on Feb 24, 2024

gnadathur
Contributor

Summary:

This PR implements a unified config manager.

- Command line args and toml file args are now unified.
- Defaults can be loaded from either.

Options like `training.batchsize` will be available as
`config.training.batchsize`, where `config` is a config manager object.
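
To make the attribute-style access concrete, here is a minimal sketch, not the PR's actual implementation. It assumes dotted flags such as `--training.batchsize`; the `build_config` helper, the flag names, and the default values are all hypothetical and chosen only for illustration.

```python
import argparse
from types import SimpleNamespace


def build_config(argv):
    """Parse dotted command-line flags into a two-level attribute namespace."""
    parser = argparse.ArgumentParser()
    # Hypothetical options and defaults, chosen only for illustration.
    parser.add_argument("--training.batchsize", type=int, default=8)
    parser.add_argument("--job.dump_folder", type=str, default="./outputs")
    args = parser.parse_args(argv)

    # Group "section.option" keys into {"section": {"option": value}}.
    two_level = {}
    for dotted_name, value in vars(args).items():
        section, option = dotted_name.split(".", 1)
        two_level.setdefault(section, {})[option] = value

    # Expose each section as a namespace so config.training.batchsize works.
    return SimpleNamespace(
        **{section: SimpleNamespace(**opts) for section, opts in two_level.items()}
    )


config = build_config(["--training.batchsize", "32"])
print(config.training.batchsize, config.job.dump_folder)
```

Options not given on the command line fall back to the parser defaults, so `build_config([])` would still expose `config.training.batchsize`.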

Test Plan:
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.4.0 -- /home/gnadathur/local/a/pytorch-env/bin/python
cachedir: .pytest_cache
rootdir: /data/users/gnadathur/a/torchtrain
configfile: pyproject.toml
plugins: cov-4.1.0
collecting ... collected 5 items

test/test_job_config.py::TestJobConfig::test_command_line_args PASSED [ 20%]
test/test_job_config.py::TestJobConfig::test_command_line_args_with_override PASSED [ 40%]
test/test_job_config.py::TestJobConfig::test_job_config_file PASSED [ 60%]
test/test_job_config.py::TestJobConfig::test_job_config_file_with_override PASSED [ 80%]
test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist PASSED [100%]

---------- coverage: platform linux, python 3.10.13-final-0 ----------
Coverage XML written to file coverage.xml

============================= slowest 20 durations =============================
0.01s call test/test_job_config.py::TestJobConfig::test_job_config_file_with_override
0.00s call test/test_job_config.py::TestJobConfig::test_job_config_file
0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args
0.00s call test/test_job_config.py::TestJobConfig::test_command_line_args_with_override
0.00s call test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist
0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args
0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args
0.00s setup test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist
0.00s setup test/test_job_config.py::TestJobConfig::test_command_line_args_with_override
0.00s teardown test/test_job_config.py::TestJobConfig::test_command_line_args_with_override
0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file_with_override
0.00s setup test/test_job_config.py::TestJobConfig::test_job_config_file
0.00s teardown test/test_job_config.py::TestJobConfig::test_job_file_does_not_exist
0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file
0.00s teardown test/test_job_config.py::TestJobConfig::test_job_config_file_with_override
============================== 5 passed in 0.10s ===============================

@facebook-github-bot added the CLA Signed label Feb 23, 2024
@@ -67,13 +68,14 @@ def partition_fn(name, module, device_mesh):


 # Uses PTD FSDP AC wrapper
-def checkpoint_wrapper(module, config):
+# TODO: why is config needed here?
+def checkpoint_wrapper(module, job_config: JobConfig):

Contributor

this is so that we could later add an option "selective_ac" and do either full AC or selective AC


 if run_profiler:
-    dump_dir = config["global"]["dump_folder"]
-    save_trace_dir = config["profiling"]["save_traces_folder"]
+    dump_dir = config.job.dump_folder

Contributor

nice!

with open(config_file, "rb") as f:
    args_dict = tomllib.load(f)
for k, v in args_dict.items():
    class_type = type(k.title(), (), v)

Contributor

What does the title() call here mean? IIUC, k here is a dict and does not have a title method? Wondering if this is toml specific, and whether this would error out if we don't use toml.

Contributor Author

title() just makes sure that the name of the class is title case. For example, training would be of type Training.
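
A small sketch of what that line does, under stated assumptions: the dict below is illustrative and only mimics the two-level shape that `tomllib.load` returns for `[training]` and `[job]` tables. Iterating `.items()` yields each section name as a `str`, which is why `k.title()` is available.

```python
# Illustrative two-level dict; the keys and values are assumptions,
# not the project's real config.
args_dict = {
    "training": {"batchsize": 8, "steps": 100},
    "job": {"dump_folder": "./outputs"},
}

sections = {}
for k, v in args_dict.items():
    # k is the section name (a str), so k.title() turns "training" into
    # "Training". type(name, bases, namespace) builds a new class whose
    # class attributes are the entries of v.
    class_type = type(k.title(), (), v)
    sections[k] = class_type()

print(type(sections["training"]).__name__)  # Training
print(sections["training"].batchsize)       # 8
```

Because the values become class attributes, every option in a section is readable as `sections["training"].batchsize` without any custom `__getattr__`.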

Contributor

Ohh got it

    args_dict = self._args_to_two_level_dict(args)
else:
    with open(config_file, "rb") as f:
        args_dict = tomllib.load(f)

Contributor

In the case where the toml file does not have all the defaults, we currently would not populate the missing fields, iiuc?

Contributor Author

Yes, this assumes the toml file is complete and that we don't want to mix in defaults. If we want to implicitly pull in missing defaults, we can do that, but I was not sure if that's a good idea.

Contributor

We can probably do a follow-up later if we find that's useful :)
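
For reference, one way such a follow-up could look. This is a sketch only and not something this PR does: `DEFAULTS`, `merge_with_defaults`, and the option values are hypothetical, and the `tomli` fallback import is an assumption for Python versions older than 3.11.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# Hypothetical defaults; the PR does not define these values.
DEFAULTS = {
    "training": {"batchsize": 8, "steps": 100},
    "job": {"dump_folder": "./outputs"},
}


def merge_with_defaults(defaults, overrides):
    """Return a two-level dict where values from overrides win over defaults."""
    # Copy each section so the defaults themselves are never mutated.
    merged = {section: dict(opts) for section, opts in defaults.items()}
    for section, opts in overrides.items():
        merged.setdefault(section, {}).update(opts)
    return merged


# An incomplete TOML file: only one training option is given.
partial_toml = """
[training]
batchsize = 32
"""

args_dict = merge_with_defaults(DEFAULTS, tomllib.loads(partial_toml))
print(args_dict)
```

The trade-off raised in the thread still applies: merging silently hides the fact that a toml file is incomplete, which is why the PR keeps the two sources separate.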

Contributor

@wanchaol left a comment

Great work! lgtm

@gnadathur
Contributor Author

Lint passes when run manually on the PR: ran flake8 --config=.flake on the files

@gnadathur merged commit a85b9d4 into main Feb 24, 2024
2 of 3 checks passed
@gnadathur deleted the cm_redo branch February 25, 2024 01:06
lessw2020 pushed a commit that referenced this pull request Apr 18, 2024
Co-authored-by: gnadathur <[email protected]>
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request Aug 17, 2024
Co-authored-by: gnadathur <[email protected]>