PyTorch RuntimeError when enabling mixed precision in transformer (roberta-base) #13512

ferranconde opened this issue May 30, 2024 · 0 comments

Hi! I've been using spaCy over the last few weeks to fine-tune a roberta-base model for NER. So far, the experience has been great and I'm able to train and use the fine-tuned models without any issues.

I now wanted to enable mixed precision to speed up the training process. However, when I do that, I get the following error:

File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
    torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.

Toggling mixed_precision back to false results in successful training.

Traceback

/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
ℹ Saving to output directory: spacy_trained_pipeline_en
ℹ Using GPU: 0

=========================== Initializing pipeline ===========================
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 0.0
E    #       LOSS TRANS...  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  -------------  --------  ------  ------  ------  ------
⚠ Aborting and saving the final best model. Encountered exception:
RuntimeError('current_scale must be a float tensor.')
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/usr/local/lib/python3.10/dist-packages/spacy/cli/_util.py", line 87, in setup_cli
    command(prog_name=COMMAND)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 783, in main
    return _main(
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 225, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 54, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/spacy/cli/train.py", line 84, in train
    train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
  File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 135, in train
    raise e
  File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 118, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/usr/local/lib/python3.10/dist-packages/spacy/training/loop.py", line 236, in train_while_improving
    proc.finish_update(optimizer)  # type: ignore[attr-defined]
  File "spacy/pipeline/trainable_pipe.pyx", line 252, in spacy.pipeline.trainable_pipe.TrainablePipe.finish_update
  File "/usr/local/lib/python3.10/dist-packages/thinc/model.py", line 342, in finish_update
    shim.finish_update(optimizer)
  File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch.py", line 180, in finish_update
    self._grad_scaler.update()
  File "/usr/local/lib/python3.10/dist-packages/thinc/shims/pytorch_grad_scaler.py", line 171, in update
    torch._amp_update_scale_(
RuntimeError: current_scale must be a float tensor.

To me, this hints that grad_scaler_config is somehow not reaching PyTorch in the form it expects, but I'm not sure what I'm doing wrong.
I'm following the example config from spacy-transformers.TransformerModel.v3.
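
Out of curiosity, I checked how PyTorch infers dtypes from plain Python numbers, since my config passes init_scale as an integer (32768 rather than 32768.0). This is just a guess at where a non-float scale tensor could come from, using plain torch and nothing spaCy-specific:

import torch

# PyTorch infers int64 from a Python int and float32 from a Python float,
# both for torch.tensor and for torch.full:
print(torch.tensor(32768).dtype)        # torch.int64
print(torch.tensor(32768.0).dtype)      # torch.float32
print(torch.full((1,), 32768).dtype)    # torch.int64
print(torch.full((1,), 32768.0).dtype)  # torch.float32

So if the value from grad_scaler_config ends up as the fill value of the scaler's internal scale tensor, an integer 32768 would produce an int64 tensor, which would match the "current_scale must be a float tensor" message. I haven't confirmed that this is what actually happens inside thinc, though.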

My config file, trf_config.cfg

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "en"
pipeline = ["transformer","ner"]
batch_size = 64
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
vectors = {"@vectors":"spacy.Vectors.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"

[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
# mixed_precision = false
mixed_precision = true
grad_scaler_config = {"init_scale": 32768}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.transformer.model.tokenizer_config]
use_fast = true

[components.transformer.model.transformer_config]

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 200
buffer = 256
get_length = null

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]
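
To try to take spaCy out of the equation: my understanding from the thinc docs is that grad_scaler_config is forwarded to thinc's PyTorchGradScaler (the class in the traceback). I haven't verified this exact script, but based on the mixed-precision example in the thinc docs I'd expect something like the following to exercise the same code path; the init_scale value mirrors my grad_scaler_config above:

import torch
from thinc.api import Adam, PyTorchGradScaler, PyTorchWrapper_v2, require_gpu

require_gpu()

# Wrap a trivial PyTorch module with mixed precision enabled, passing the
# same init_scale as grad_scaler_config = {"init_scale": 32768} above.
model = PyTorchWrapper_v2(
    torch.nn.Linear(4, 2),
    mixed_precision=True,
    grad_scaler=PyTorchGradScaler(enabled=True, init_scale=32768),
)
model.initialize()

# One forward/backward pass, then finish_update, which is where the
# grad scaler's update() raises in my traceback.
X = model.ops.alloc2f(8, 4)
Y, backprop = model.begin_update(X)
backprop(model.ops.alloc2f(8, 2))
model.finish_update(Adam())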

How to reproduce the behaviour

I'm running the training on Google Colab, using a Tesla T4 runtime:

!nvidia-smi -L
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

GPU 0: Tesla T4 (UUID: GPU-0c3e659f-2933-c77e-7694-6112031f1cef)

I've tried not executing the line !export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but it doesn't make a difference.
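
One caveat I realized while testing: a !export line in a notebook cell runs in its own throwaway subshell, so it never affects later cells in the first place. Setting the variable from Python would be the way to actually reach the spacy train child process; given that removing the export made no difference, I don't think it's related, but noting it for completeness:

import os

# Applies to this notebook process and to child processes such as the
# `!python -m spacy train ...` cell below; `!export` does not.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"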

I've also made sure that I call spacy train with --gpu-id 0.

Here are the exact steps of the Colab notebook I use:

Colab notebook

!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

!pip install spacy[cuda12x,transformers] transformers[sentencepiece]
!pip freeze | grep cupy

cupy-cuda12x==12.2.0

!python -m spacy download en_core_web_trf
!nvidia-smi -L
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

GPU 0: Tesla T4 (UUID: GPU-0c3e659f-2933-c77e-7694-6112031f1cef)

!pip3 freeze | grep torch

torch @ https://download.pytorch.org/whl/cu121/torch-2.3.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=0a12aa9aa6bc442dff8823ac8b48d991fd0771562eaa38593f9c8196d65f7007
torchaudio @ https://download.pytorch.org/whl/cu121/torchaudio-2.3.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=38b49393f8c322dcaa29d19e5acbf5a0b1978cf1b719445ab670f1fb486e3aa6
torchsummary==1.5.1
torchtext==0.18.0
torchvision @ https://download.pytorch.org/whl/cu121/torchvision-0.18.0%2Bcu121-cp310-cp310-linux_x86_64.whl#sha256=13e1b48dc5ce41ccb8100ab3dd26fdf31d8f1e904ecf2865ac524493013d0df5

!python -m spacy train ./trf_config.cfg --output ./spacy_trained_pipeline_en --paths.train "train.spacy" --paths.dev "dev.spacy" --gpu-id 0
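
For completeness, here's a quick way to dump the relevant library versions from within the same runtime (thinc doesn't show up in the pip freeze output above):

import spacy, thinc, torch, transformers

# Print the versions of the libraries involved in the failing call.
print("spacy:", spacy.__version__)
print("thinc:", thinc.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)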

Could you please give me a hand? Thanks a lot!

Info about spaCy

  • spaCy version: 3.7.4
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Pipelines: en_core_web_trf (3.7.3), en_core_web_sm (3.7.1)