
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base' #30

Open
SkullFang opened this issue Nov 1, 2022 · 1 comment

@SkullFang

Neither torch 1.6.0 nor torch 1.8.1 works; both raise the error in the title.

Traceback (most recent call last):
File "main.py", line 14, in <module>
from sentence_transformers import models, losses
File "/root/ConSERT/sentence_transformers/__init__.py", line 3, in <module>
from .datasets import SentencesDataset, SentenceLabelDataset, ParallelSentencesDataset
File "/root/ConSERT/sentence_transformers/datasets/__init__.py", line 1, in <module>
from .sampler import *
File "/root/ConSERT/sentence_transformers/datasets/sampler/__init__.py", line 1, in <module>
from .LabelSampler import *
File "/root/ConSERT/sentence_transformers/datasets/sampler/LabelSampler.py", line 6, in <module>
from ...datasets import SentenceLabelDataset
File "/root/ConSERT/sentence_transformers/datasets/SentenceLabelDataset.py", line 8, in <module>
from .. import SentenceTransformer
File "/root/ConSERT/sentence_transformers/SentenceTransformer.py", line 11, in <module>
import transformers
File "/root/ConSERT/transformers/__init__.py", line 22, in <module>
from .integrations import ( # isort:skip
File "/root/ConSERT/transformers/integrations.py", line 58, in <module>
from .file_utils import is_torch_tpu_available
File "/root/ConSERT/transformers/file_utils.py", line 140, in <module>
from apex import amp # noqa: F401
File "/root/miniconda3/lib/python3.8/site-packages/apex/__init__.py", line 27, in <module>
from . import transformer
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/__init__.py", line 4, in <module>
from apex.transformer import pipeline_parallel
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/__init__.py", line 1, in <module>
from apex.transformer.pipeline_parallel.schedules import get_forward_backward_func
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/__init__.py", line 3, in <module>
from apex.transformer.pipeline_parallel.schedules.fwd_bwd_no_pipelining import (
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/fwd_bwd_no_pipelining.py", line 10, in <module>
from apex.transformer.pipeline_parallel.schedules.common import Batch
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/common.py", line 9, in <module>
from apex.transformer.pipeline_parallel.p2p_communication import FutureTensor
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/p2p_communication.py", line 25, in <module>
from apex.transformer.utils import split_tensor_into_1d_equal_chunks
File "/root/miniconda3/lib/python3.8/site-packages/apex/transformer/utils.py", line 11, in <module>
torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'

This error comes from apex itself:
NVIDIA/apex#1526

The installed apex does not match my torch version. Can you tell me which torch version you used?
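For context, the failing line in apex's `utils.py` unconditionally aliases a private PyTorch op that older releases (such as 1.6/1.8) do not expose. A minimal sketch of a guarded version of that alias is below; it uses `types.SimpleNamespace` to stand in for `torch.distributed` so the sketch runs without PyTorch installed, and the guard logic itself is an illustrative workaround, not apex's official fix.

```python
import types

# Stand-in for torch.distributed so this sketch runs without PyTorch.
# On a real system you would apply the same guard to torch.distributed.
dist = types.SimpleNamespace()

# apex 's utils.py does this unconditionally:
#     torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
# which raises AttributeError when the private name is missing.
# A guarded version only creates the alias if the private op exists:
if not hasattr(dist, "all_gather_into_tensor"):
    base = getattr(dist, "_all_gather_base", None)  # None on old torch builds
    if base is not None:
        dist.all_gather_into_tensor = base

# Neither name exists on our stand-in, so no alias is created and no error is raised.
print(hasattr(dist, "all_gather_into_tensor"))
```

Patching apex this way only papers over the import error; matching the apex checkout to the installed torch version (as suggested below in this thread) is the cleaner fix.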

@zhou-wb

zhou-wb commented Nov 4, 2022

Try installing an older version of apex:

git clone https://github.com/NVIDIA/apex
cd apex
git checkout 22.05-dev
pip install -v --disable-pip-version-check --no-cache-dir ./
