PyTorch RuntimeError when enabling mixed precision in transformer (roberta-base) #13512

ferranconde opened this issue May 30, 2024 · 0 comments

Hi! I've been using spaCy over the last few weeks to fine-tune a roberta-base model for NER. So far, the experience has been great and I'm able to train and use the fine-tuned models without any issues.

I now wanted to enable mixed precision to speed up the training process. However, when I do that, I get the following error:

File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
    torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.

Toggling mixed_precision back to false results in successful training.

Traceback

/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
ℹ Saving to output directory: spacy_trained_pipeline_en
ℹ Using GPU: 0

=========================== Initializing pipeline ===========================
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 0.0
E    #       LOSS TRANS...  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  -------------  --------  ------  ------  ------  ------
⚠ Aborting and saving the final best model. Encountered exception:
RuntimeError('current_scale must be a float tensor.')
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/usr/local/lib/python3.10/dist-packages/spacy/cli/_util.py", line 87, in setup_cli
    command(prog_name=COMMAND)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 783, in main
    return _main(
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 225, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 54, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 84, in train
    train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
  File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 135, in train
    raise e
  File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 118, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 236, in train_while_improving
    proc.finish_update(optimizer)  # type: ignore[attr-defined]
  File "spacy/pipeline/trainable_pipe.pyx", line 252, in spacy.pipeline.trainable_pipe.TrainablePipe.finish_update
  File "/usr/local/lib/python3.10/dist-packages/thinc/model.py", line 342, in finish_update
    shim.finish_update(optimizer)
  File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch.py", line 180, in finish_update
    self._grad_scaler.update()
  File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
    torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.

To me, this hints that grad_scaler_config is somehow not reaching PyTorch in the form it expects, but I'm not sure what I'm doing wrong.
I'm following the example config from spacy-transformers.TransformerModel.v3.
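
Out of curiosity, I checked how PyTorch infers dtypes from plain Python numbers, since my config passes init_scale as an integer (32768 rather than 32768.0). This is just a guess at where a non-float scale tensor could come from, using plain torch and nothing spaCy-specific:

import torch

# PyTorch infers int64 from a Python int and float32 from a Python float,
# both for torch.tensor and for torch.full:
print(torch.tensor(32768).dtype)        # torch.int64
print(torch.tensor(32768.0).dtype)      # torch.float32
print(torch.full((1,), 32768).dtype)    # torch.int64
print(torch.full((1,), 32768.0).dtype)  # torch.float32

So if the value from grad_scaler_config ends up as the fill value of the scaler's internal scale tensor, an integer 32768 would produce an int64 tensor, which would match the "current_scale must be a float tensor" message. I haven't confirmed that this is what actually happens inside thinc, though.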

My config file, trf_config.cfg

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "en"
pipeline = ["transformer","ner"]
batch_size = 64
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
vectors = {"@vectors":"spacy.Vectors.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"

[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
# mixed_precision = false
mixed_precision = true
grad_scaler_config = {"init_scale": 32768}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.transformer.model.tokenizer_config]
use_fast = true

[components.transformer.model.transformer_config]

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 200
buffer = 256
get_length = null

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]
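
To try to take spaCy out of the equation: my understanding from the thinc docs is that grad_scaler_config is forwarded to thinc's PyTorchGradScaler (the class in the traceback). I haven't verified this exact script, but based on the mixed-precision example in the thinc docs I'd expect something like the following to exercise the same code path; the init_scale value mirrors my grad_scaler_config above:

import torch
from thinc.api import Adam, PyTorchGradScaler, PyTorchWrapper_v2, require_gpu

require_gpu()

# Wrap a trivial PyTorch module with mixed precision enabled, passing the
# same init_scale as grad_scaler_config = {"init_scale": 32768} above.
model = PyTorchWrapper_v2(
    torch.nn.Linear(4, 2),
    mixed_precision=True,
    grad_scaler=PyTorchGradScaler(enabled=True, init_scale=32768),
)
model.initialize()

# One forward/backward pass, then finish_update, which is where the
# grad scaler's update() raises in my traceback.
X = model.ops.alloc2f(8, 4)
Y, backprop = model.begin_update(X)
backprop(model.ops.alloc2f(8, 2))
model.finish_update(Adam())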

How to reproduce the behaviour

I'm running the training on Google Colab, using a Tesla T4 runtime:

!nvidia-smi -L
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

GPU 0: Tesla T4 (UUID: GPU-0c3e659f-2933-c77e-7694-6112031f1cef)

I've tried not executing the line !export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but it doesn't make a difference.
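
One caveat I realized while testing: a !export line in a notebook cell runs in its own throwaway subshell, so it never affects later cells in the first place. Setting the variable from Python would be the way to actually reach the spacy train child process; given that removing the export made no difference, I don't think it's related, but noting it for completeness:

import os

# Applies to this notebook process and to child processes such as the
# `!python -m spacy train ...` cell below; `!export` does not.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"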

I've also made sure that I call spacy train with --gpu-id 0.

Here are the exact steps of the Colab notebook I use:

Colab notebook

!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

!pip install spacy[cuda12x,transformers] transformers[sentencepiece]
!pip freeze | grep cupy

cupy-cuda12x==12.2.0

!python -m spacy download en_core_web_trf
!nvidia-smi -L
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

GPU 0: Tesla T4 (UUID: GPU-0c3e659f-2933-c77e-7694-6112031f1cef)

!pip3 freeze | grep torch

torch @ https://download.pytorch.org/whl/cu121/torch-2.3.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=0a12aa9aa6bc442dff8823ac8b48d991fd0771562eaa38593f9c8196d65f7007
torchaudio @ https://download.pytorch.org/whl/cu121/torchaudio-2.3.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=38b49393f8c322dcaa29d19e5acbf5a0b1978cf1b719445ab670f1fb486e3aa6
torchsummary==1.5.1
torchtext==0.18.0
torchvision @ https://download.pytorch.org/whl/cu121/torchvision-0.18.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=13e1b48dc5ce41ccb8100ab3dd26fdf31d8f1e904ecf2865ac524493013d0df5

!python -m spacy train ./trf_config.cfg --output ./spacy_trained_pipeline_en --paths.train "train.spacy" --paths.dev "dev.spacy" --gpu-id 0
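
For completeness, here's a quick way to dump the relevant library versions from within the same runtime (thinc doesn't show up in the pip freeze output above):

import spacy, thinc, torch, transformers

# Print the versions of the libraries involved in the failing call.
print("spacy:", spacy.__version__)
print("thinc:", thinc.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)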

Could you please give me a hand? Thanks a lot!

Info about spaCy

  • spaCy version: 3.7.4
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Pipelines: en_core_web_trf (3.7.3), en_core_web_sm (3.7.1)