The LBANN Distconv adapter for layers mandates that only the first input tensor to a distconv-enabled layer can be a non-DiHydrogen tensor. We raise an error if any other input tensor requires a copy to a DiHydrogen tensor. The following checks are done:
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L329
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L646
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L787
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L812
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L836
https://github.com/LLNL/lbann/blob/3b0ea84e2e0b86d14f466d9abe7c60e8b026e84a/src/layers/data_type_distconv_adapter.cpp#L861
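For readers who do not want to follow the links, the restriction can be pictured roughly as below. This is only a standalone sketch of the rule as described in this issue, not the code in `data_type_distconv_adapter.cpp`: `ParentInput`, `needs_copy_to_dihydrogen`, and `check_parent_copies` are hypothetical names made up for illustration, and the real checks may be structured quite differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a parent (input) tensor; not an LBANN type.
struct ParentInput {
  // True if this parent is a non-DiHydrogen tensor and would need a copy.
  bool needs_copy_to_dihydrogen;
};

// Sketch of the rule: only parent 0 may require a copy to a DiHydrogen
// tensor. Any later parent that needs a copy is reported as an error.
void check_parent_copies(const std::vector<ParentInput>& parents) {
  for (std::size_t i = 1; i < parents.size(); ++i) {
    if (parents[i].needs_copy_to_dihydrogen) {
      throw std::runtime_error(
        "parent " + std::to_string(i) +
        " requires a copy to a DiHydrogen tensor; only parent 0 may");
    }
  }
}
```

Under this sketch, a layer whose second parent also needs a copy would throw, while a layer where only the first parent needs one would pass.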
While these worked for the original DC layers (Convolution, MSE, ReLU), newer DC layers such as Scatter, Gather, and MatMul generally have more than one input that may need to be copied to DiHydrogen tensors, so ideally we should support multiple parent tensors requiring a copy. Simply removing the checks resulted in failing CI tests.
A possible workaround, using an Identity layer as a copy layer, also has issues: #2126