AssertionError: 'coefficient' is not defined in the energy loss config {'fn': 'mae'}. #1028

Open
yxwang1215 opened this issue Feb 25, 2025 · 4 comments

@yxwang1215

Hi,

When I tried to follow the tutorial and run "python main.py --mode train --config-yml configs/is2re/10k/schnet/schnet.yml", this error occurred:

2025-02-25 10:53:14 (INFO): Loading model: schnet
2025-02-25 10:53:14 (INFO): Loaded SchNetWrap with 541697 parameters.
2025-02-25 10:53:15 (INFO): Loading dataset: lmdb
2025-02-25 10:53:15 (WARNING): Could not find dataset metadata.npz files in '[PosixPath('/data/wyx/OC20/is2re/10k/train/data.lmdb')]'
2025-02-25 10:53:15 (WARNING): Disabled BalancedBatchSampler because num_replicas=1.
2025-02-25 10:53:15 (WARNING): Failed to get data sizes, falling back to uniform partitioning. BalancedBatchSampler requires a dataset that has a metadata attributed with number of atoms.
2025-02-25 10:53:15 (INFO): rank: 0: Sampler created...
2025-02-25 10:53:15 (INFO): Created BalancedBatchSampler with sampler=<fairchem.core.common.data_parallel.StatefulDistributedSampler object at 0x7fab5042b5f0>, batch_size=64, drop_last=False
2025-02-25 10:53:15 (WARNING): Could not find dataset metadata.npz files in '[PosixPath('/data/wyx/OC20/is2re/all/val_id/data.lmdb')]'
2025-02-25 10:53:15 (WARNING): Disabled BalancedBatchSampler because num_replicas=1.
2025-02-25 10:53:15 (WARNING): Failed to get data sizes, falling back to uniform partitioning. BalancedBatchSampler requires a dataset that has a metadata attributed with number of atoms.
2025-02-25 10:53:15 (INFO): rank: 0: Sampler created...
2025-02-25 10:53:15 (INFO): Created BalancedBatchSampler with sampler=<fairchem.core.common.data_parallel.StatefulDistributedSampler object at 0x7fab5042ba40>, batch_size=64, drop_last=False
2025-02-25 10:53:15 (INFO): normalizers checkpoint for targets ['energy'] have been saved to: /home/wyx/AI4sci/fairchem/checkpoints/2025-02-25-10-52-48/normalizers.pt
2025-02-25 10:53:15 (INFO): Normalization values for output energy: mean=-1.525913953781128, rmsd=2.279365062713623.
Traceback (most recent call last):
  File "/home/wyx/AI4sci/fairchem/main.py", line 8, in <module>
    main()
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/_cli.py", line 135, in main
    runner_wrapper(config)
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/_cli.py", line 58, in runner_wrapper
    Runner()(config)
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/_cli.py", line 37, in __call__
    with new_trainer_context(config=config) as ctx:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/common/utils.py", line 1102, in new_trainer_context
    trainer = trainer_cls(**trainer_config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/trainers/ocp_trainer.py", line 109, in __init__
    super().__init__(
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/trainers/base_trainer.py", line 220, in __init__
    self.load(inference_only)
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/trainers/base_trainer.py", line 248, in load
    self.load_loss()
  File "/home/wyx/anaconda3/envs/fair-chem/lib/python3.12/site-packages/fairchem/core/trainers/base_trainer.py", line 683, in load_loss
    "coefficient" in loss[target]
AssertionError: 'coefficient' is not defined in the energy loss config {'fn': 'mae'}.

How can I solve this problem?
Thank you!

@blbhc

blbhc commented Mar 4, 2025

I also encountered this problem, and I solved it using the following method:

1. Find the base.yml file in the folder of the dataset you are using, for example: Fairchem/configs/is2re/10k/base.yml

2. In the loss_functions section, add a coefficient entry to the energy loss, as shown below:

loss_functions:
- energy:
    coefficient: 4  # add this
    fn: mae
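
If your config also defines a forces loss (for example the S2EF configs), the same rule applies: every entry under loss_functions needs its own coefficient. A rough sketch of what that might look like; the coefficient values here are only illustrative, not tuned defaults, and fn names such as l2mae should be checked against your installed fairchem version:

loss_functions:
- energy:
    fn: mae
    coefficient: 4      # weight of the energy term in the total loss
- forces:
    fn: l2mae
    coefficient: 100    # weight of the forces term (illustrative value)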

@misko
Collaborator

misko commented Mar 11, 2025

Thank you @blbhc, that looks right to me! @yxwang1215, did this help resolve the issue?

@brunosamp-usp

brunosamp-usp commented Mar 20, 2025

I am still getting the same error, even after I applied the correction suggested by @blbhc. Is there any other way to fix this issue? This is the command I am running:

python main.py --mode train --cpu --config-yml /home/brunoss/programs/fairchem/configs/s2ef/2M/equiformer_v2/equiformer_v2_N@12_L@[email protected]

The error:

Traceback (most recent call last):
  File "/home/brunoss/programs/fairchem/main.py", line 8, in <module>
    main()
  File "/home/brunoss/programs/fairchem/src/fairchem/core/_cli.py", line 129, in main
    runner_wrapper(config)
  File "/home/brunoss/programs/fairchem/src/fairchem/core/_cli.py", line 58, in runner_wrapper
    Runner()(config)
  File "/home/brunoss/programs/fairchem/src/fairchem/core/_cli.py", line 37, in __call__
    with new_trainer_context(config=config) as ctx:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brunoss/miniconda3/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/brunoss/programs/fairchem/src/fairchem/core/common/utils.py", line 1103, in new_trainer_context
    trainer = trainer_cls(**trainer_config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brunoss/programs/fairchem/src/fairchem/core/trainers/ocp_trainer.py", line 109, in __init__
    super().__init__(
  File "/home/brunoss/programs/fairchem/src/fairchem/core/trainers/base_trainer.py", line 220, in __init__
    self.load(inference_only)
  File "/home/brunoss/programs/fairchem/src/fairchem/core/trainers/base_trainer.py", line 246, in load
    self.load_datasets()
  File "/home/brunoss/programs/fairchem/src/fairchem/core/trainers/base_trainer.py", line 366, in load_datasets
    self.train_sampler = self.get_sampler(
                         ^^^^^^^^^^^^^^^^^
  File "/home/brunoss/programs/fairchem/src/fairchem/core/trainers/base_trainer.py", line 314, in get_sampler
    return BalancedBatchSampler(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/brunoss/programs/fairchem/src/fairchem/core/common/data_parallel.py", line 173, in __init__
    raise error
  File "/home/brunoss/programs/fairchem/src/fairchem/core/common/data_parallel.py", line 170, in __init__
    dataset = _ensure_supported(dataset)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brunoss/programs/fairchem/src/fairchem/core/common/data_parallel.py", line 113, in _ensure_supported
    raise UnsupportedDatasetError(
fairchem.core.datasets.base_dataset.UnsupportedDatasetError: BalancedBatchSampler requires a dataset that has a metadata attributed with number of atoms.

@blbhc

blbhc commented Mar 22, 2025

@brunosamp-usp Perhaps you could try modifying the equiformer_v2_N@12_L@[email protected] file as follows:

optim:
  batch_size:                   4         # 6
  eval_batch_size:              4         # 6
  load_balancing: neighbors     # Modify this line
  num_workers: 8
  lr_initial:                   0.0004    # [0.0002, 0.0004], eSCN uses 0.0008 for batch size 96

I'm not sure whether this will solve your problem; the 2M dataset is too large for me to download, so I couldn't reproduce the error myself, but it's worth a try.

By the way, this problem has been brought up before, so it may be worth checking that discussion: #868
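
One more thing worth checking: the UnsupportedDatasetError says the dataset has no metadata with per-sample atom counts, which usually means the metadata.npz file next to your LMDB is missing (the warnings at the top of the original log mention this too). If I remember correctly, fairchem ships a helper script for generating it (something like make_lmdb_sizes.py under src/fairchem/core/scripts), used roughly like this; the exact script path and flags may differ in your version, so please double-check against the repo:

# generate metadata.npz (with natoms per sample) for an existing LMDB dataset
python src/fairchem/core/scripts/make_lmdb_sizes.py --data-path /path/to/your/lmdb/dir --num-workers 8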
