
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base' #30

Open
SkullFang opened this issue Nov 1, 2022 · 1 comment

@SkullFang

Neither torch 1.6.0 nor torch 1.8.1 works; both raise the error in the title.

Traceback (most recent call last):
File "main.py", line 14, in <module>
from sentence_transformers import models, losses
File "/root/ConSERT/sentence_transformers/__init__.py", line 3, in <module>
from .datasets import SentencesDataset, SentenceLabelDataset, ParallelSentencesDataset
File "/root/ConSERT/sentence_transformers/datasets/__init__.py", line 1, in <module>
from .sampler import *
File "/root/ConSERT/sentence_transformers/datasets/sampler/__init__.py", line 1, in <module>
from .LabelSampler import *
File "/root/ConSERT/sentence_transformers/datasets/sampler/LabelSampler.py", line 6, in <module>
from ...datasets import SentenceLabelDataset
File "/root/ConSERT/sentence_transformers/datasets/SentenceLabelDataset.py", line 8, in <module>
from .. import SentenceTransformer
File "/root/ConSERT/sentence_transformers/SentenceTransformer.py", line 11, in <module>
import transformers
File "/root/ConSERT/transformers/__init__.py", line 22, in <module>
from .integrations import ( # isort:skip
File "/root/ConSERT/transformers/integrations.py", line 58, in <module>
from .file_utils import is_torch_tpu_available
File "/root/ConSERT/transformers/file_utils.py", line 140, in <module>
from apex import amp # noqa: F401
File "/root/miniconda3/lib/python3.8/site-packages/apex/__init__.py", line 27, in <module>
from . import transformer
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/__init__.py", line 4, in <module>
from apex.transformer import pipeline_parallel
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/__init__.py", line 1, in <module>
from apex.transformer.pipeline_parallel.schedules import get_forward_backward_func
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/__init__.py", line 3, in <module>
from apex.transformer.pipeline_parallel.schedules.fwd_bwd_no_pipelining import (
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/fwd_bwd_no_pipelining.py", line 10, in <module>
from apex.transformer.pipeline_parallel.schedules.common import Batch
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/common.py", line 9, in <module>
from apex.transformer.pipeline_parallel.p2p_communication import FutureTensor
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/p2p_communication.py", line 25, in <module>
from apex.transformer.utils import split_tensor_into_1d_equal_chunks
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/utils.py", line 11, in <module>
torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'

This error comes from apex itself:
NVIDIA/apex#1526

The installed apex does not match my torch version. Can you tell me which torch version you used?
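For context, the failing line in apex's `utils.py` unconditionally aliases a private PyTorch op that older releases (such as 1.6/1.8) do not expose. A minimal sketch of a guarded version of that alias is below; it uses `types.SimpleNamespace` to stand in for `torch.distributed` so the sketch runs without PyTorch installed, and the guard logic itself is an illustrative workaround, not apex's official fix.

```python
import types

# Stand-in for torch.distributed so this sketch runs without PyTorch.
# On a real system you would apply the same guard to torch.distributed.
dist = types.SimpleNamespace()

# apex 's utils.py does this unconditionally:
#     torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
# which raises AttributeError when the private name is missing.
# A guarded version only creates the alias if the private op exists:
if not hasattr(dist, "all_gather_into_tensor"):
    base = getattr(dist, "_all_gather_base", None)  # None on old torch builds
    if base is not None:
        dist.all_gather_into_tensor = base

# Neither name exists on our stand-in, so no alias is created and no error is raised.
print(hasattr(dist, "all_gather_into_tensor"))
```

Patching apex this way only papers over the import error; matching the apex checkout to the installed torch version (as suggested below in this thread) is the cleaner fix.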

@zhou-wb

zhou-wb commented Nov 4, 2022

Try installing an older version of apex:

git clone https://github.com/NVIDIA/apex
cd apex
git checkout 22.05-dev
pip install -v --disable-pip-version-check --no-cache-dir ./
