Expected is_sm80 to be true, but got false #101

Closed
awaelchli opened this issue Apr 5, 2023 · 14 comments

@awaelchli
Contributor

awaelchli commented Apr 5, 2023

I tried running the finetuning scripts on a 3090 GPU and got this error:

/home/adrian/repositories/lightning-llama/lit_llama/model.py:43: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:31.)
  ).to(complex_dtype)
Traceback (most recent call last):
  File "/home/adrian/repositories/lightning-llama/finetune_adapter.py", line 201, in <module>
    main()
  File "/home/adrian/repositories/lightning-llama/finetune_adapter.py", line 67, in main
    train(fabric, model, optimizer, train_data, val_data)
  File "/home/adrian/repositories/lightning-llama/finetune_adapter.py", line 97, in train
    fabric.backward(loss / gradient_accumulation_steps)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 365, in backward
    self._precision.backward(tensor, module, *args, **kwargs)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/lightning/fabric/plugins/precision/amp.py", line 70, in backward
    super().backward(tensor, model, *args, **kwargs)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/lightning/fabric/plugins/precision/precision.py", line 81, in backward
    tensor.backward(*args, **kwargs)
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/adrian/anaconda3/envs/lit-llama/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected is_sm80 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

This was on the branch of #100, where I added the EmptyInitOnDevice() context manager. It looks like the conversion to complex_dtype caused problems in the backward pass.

Both

python finetune_lora.py

and

python finetune_adapter.py

fail with this error.

@lantiga
Collaborator

lantiga commented Apr 5, 2023

@t-vi does this ring any bells?

@t-vi
Contributor

t-vi commented Apr 5, 2023

unfortunately not, but I'll be sure to dig into it.

@lantiga
Collaborator

lantiga commented Apr 5, 2023

If we could get rid of that complex op in the RoPE implementation and still match the results, it would unblock a ton (see test_rope.py).

@t-vi
Contributor

t-vi commented Apr 5, 2023

There is a known and fixed upstream bug about this check, maybe try a nightly?

@t-vi
Contributor

t-vi commented Apr 5, 2023

but I can expand the rope to use reals if that helps. gets rid of the stupid warning, too.

@awaelchli
Contributor Author

Thanks @t-vi
Indeed nightly worked! So that seems unrelated to the complex issue then? It might just show up in that line the first time?

@lantiga
Collaborator

lantiga commented Apr 6, 2023

Closing as nightly has solved it and we reference the workaround in the README.

@AurelienSaussay

AurelienSaussay commented Apr 14, 2023

Hi all, I was still encountering this error with PyTorch nightly (as of 2023-04-13) on an A10 while running LoRA finetuning.

As a temporary fix, I found that disabling the flash attention backend for the scaled dot-product attention calculation around the loss function resolved the issue. In finetune_lora.py, simply replace lines 93-96 with:

        with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16) as autocast, torch.backends.cuda.sdp_kernel(enable_flash=False) as disable:
            input_ids, targets = get_batch(fabric, train_data)
            logits = model(input_ids)
            loss = loss_fn(logits, targets)
            fabric.backward(loss)

@lantiga
Collaborator

lantiga commented Apr 14, 2023

Oh interesting, thanks for bringing this up @AurelienSaussay

@lantiga
Collaborator

lantiga commented Apr 14, 2023

I'm imagining the same issue comes up with LLaMA-Adapter on A10, can you confirm?

@lantiga
Collaborator

lantiga commented Apr 14, 2023

Also, the autocast part should already be taken care of by model, optimizer = fabric.setup(model, optimizer), and flash attention can be disabled globally too, which avoids modifying the training loop:

torch.backends.cuda.enable_flash_sdp(False)

Can you confirm, @awaelchli?

@awaelchli
Contributor Author

Yes, I downgraded to torch 2.0 and was able to prevent the issue with torch.backends.cuda.enable_flash_sdp(False) as well.

@lantiga
Collaborator

lantiga commented Apr 14, 2023

So let's add this line (commented) to the scripts and mention in the README to uncomment that line if that error comes up.
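
The workaround discussed above could be applied conditionally rather than by uncommenting a line. The following is only a sketch under assumptions: it assumes the failing `is_sm80` check corresponds to CUDA compute capability 8.0 (the thread does not confirm this; the RTX 3090 and A10 both report 8.6), and it assumes PyTorch 2.0 or later, where `torch.backends.cuda.enable_flash_sdp` exists. The helper names `is_sm80` and `maybe_disable_flash_sdp` are hypothetical and not part of the repository.

```python
from typing import Optional, Tuple


def is_sm80(capability: Tuple[int, int]) -> bool:
    """Return True if (major, minor) is compute capability 8.0.

    Assumption: this mirrors what the failing PyTorch check tests for.
    """
    return tuple(capability) == (8, 0)


def maybe_disable_flash_sdp() -> Optional[bool]:
    """Disable the flash SDP backend on GPUs that are not sm80.

    Returns None if torch or CUDA is unavailable, False if flash attention
    was left enabled, and True if it was disabled.
    """
    try:
        import torch  # imported lazily so the helper above works without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability()  # e.g. (8, 6) on an RTX 3090
    if is_sm80(capability):
        return False
    # Same global switch suggested in the thread.
    torch.backends.cuda.enable_flash_sdp(False)
    return True
```

Calling `maybe_disable_flash_sdp()` once at the top of a finetuning script, before the first backward pass, would have the same effect as the global switch on affected GPUs while leaving flash attention enabled elsewhere.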

@Lingeswaran-S

Lingeswaran-S commented Jul 17, 2023

Can anyone help me with this?

{'eval_interval': 600, 'save_interval': 1000, 'eval_iters': 100, 'log_interval': 1, 'devices': 1, 'learning_rate': 0.003, 'batch_size': 128.0, 'micro_batch_size': 2, 'gradient_accumulation_iters': 64.0, 'epoch_size': 50000, 'num_epochs': 5, 'max_iters': 125000, 'weight_decay': 0.02, 'warmup_steps': 781.0}
Global seed set to 1337
Loading model 'checkpoints/stabilityai/stablelm-tuned-alpha-3b/lit_model.pth' with {'org': 'stabilityai', 'name': 'stablelm-tuned-alpha-3b', 'block_size': 4096, 'vocab_size': 50254, 'padding_multiple': 512, 'padded_vocab_size': 50688, 'n_layer': 16, 'n_head': 32, 'n_embd': 4096, 'rotary_percentage': 0.25, 'parallel_residual': True, 'bias': True, 'n_query_groups': 32, 'shared_attention_norm': False, '_norm_class': 'LayerNorm', 'norm_eps': 1e-05, '_mlp_class': 'GptNeoxMLP', 'intermediate_size': 16384, 'condense_ratio': 1, 'adapter_prompt_length': 10, 'adapter_start_layer': 2}
Number of trainable parameters: 2125248
Number of non trainable parameters: 3637051392
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/lingeswaran/0/AI/privateChat/testFalcon/lit-gpt/finetune/adapter_v2.py:305 in │
│ │
│ 302 │ │
│ 303 │ from jsonargparse.cli import CLI │
│ 304 │ │
│ ❱ 305 │ CLI(setup) │
│ 306 │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/env/lib/python3.10/site-packages/jsonargparse/_cli │
│ .py:85 in CLI │
│ │
│ 82 │ │ │ return parser │
│ 83 │ │ cfg = parser.parse_args(args) │
│ 84 │ │ cfg_init = parser.instantiate_classes(cfg) │
│ ❱ 85 │ │ return _run_component(component, cfg_init) │
│ 86 │ │
│ 87 │ subcommands = parser.add_subcommands(required=True) │
│ 88 │ comp_dict = {c.name: c for c in components} │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/env/lib/python3.10/site-packages/jsonargparse/_cli │
│ .py:147 in _run_component │
│ │
│ 144 def _run_component(component, cfg): │
│ 145 │ cfg.pop("config", None) │
│ 146 │ if not inspect.isclass(component): │
│ ❱ 147 │ │ return component(**cfg) │
│ 148 │ subcommand = cfg.pop("subcommand") │
│ 149 │ if not subcommand: │
│ 150 │ │ return component(**cfg) │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/lit-gpt/finetune/adapter_v2.py:82 in setup │
│ │
│ 79 │ logger = step_csv_logger(out_dir.parent, out_dir.name, flush_logs_every_n_steps=log │
│ 80 │ fabric = L.Fabric(devices=fabric_devices, strategy=strategy, precision=precision, lo │
│ 81 │ fabric.print(hparams) │
│ ❱ 82 │ fabric.launch(main, data_dir, checkpoint_dir, out_dir) │
│ 83 │
│ 84 │
│ 85 def main(fabric: L.Fabric, data_dir: Path, checkpoint_dir: Path, out_dir: Path): │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/env/lib/python3.10/site-packages/lightning/fabric/ │
│ fabric.py:789 in launch │
│ │
│ 786 │ │ │ │ f"To use the {type(self.strategy).__name__} strategy, .launch() need │
│ 787 │ │ │ │ " that contains the code to launch in processes." │
│ 788 │ │ │ ) │
│ ❱ 789 │ │ return self._wrap_and_launch(function, self, *args, **kwargs) │
│ 790 │ │
│ 791 │ def call(self, hook_name: str, *args: Any, **kwargs: Any) -> None: │
│ 792 │ │ """Trigger the callback methods with the given name and arguments. │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/env/lib/python3.10/site-packages/lightning/fabric/ │
│ fabric.py:871 in _wrap_and_launch │
│ │
│ 868 │ │ to_run = partial(self._wrap_with_setup, to_run) │
│ 869 │ │ if (launcher := self._strategy.launcher) is not None: │
│ 870 │ │ │ return launcher.launch(to_run, *args, **kwargs) │
│ ❱ 871 │ │ return to_run(*args, **kwargs) │
│ 872 │ │
│ 873 │ def _wrap_with_setup(self, to_run: Callable, *args: Any, **kwargs: Any) -> Any: │
│ 874 │ │ self._strategy.setup_environment() │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/env/lib/python3.10/site-packages/lightning/fabric/ │
│ fabric.py:876 in _wrap_with_setup │
│ │
│ 873 │ def _wrap_with_setup(self, to_run: Callable, *args: Any, **kwargs: Any) -> Any: │
│ 874 │ │ self._strategy.setup_environment() │
│ 875 │ │ with _replace_dunder_methods(DataLoader, "dataset"), _replace_dunder_methods(Bat │
│ ❱ 876 │ │ │ return to_run(*args, **kwargs) │
│ 877 │ │
│ 878 │ def _move_model_to_device(self, model: nn.Module, optimizers: List[Optimizer]) -> nn │
│ 879 │ │ initial_device = next(model.parameters(), torch.tensor(0)).device │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/lit-gpt/finetune/adapter_v2.py:121 in main │
│ │
│ 118 │ model, optimizer = fabric.setup(model, optimizer) │
│ 119 │ │
│ 120 │ train_time = time.time() │
│ ❱ 121 │ train(fabric, model, optimizer, train_data, val_data, checkpoint_dir, out_dir, speed │
│ 122 │ fabric.print(f"Training time: {(time.time()-train_time):.2f}s") │
│ 123 │ │
│ 124 │ # Save the final checkpoint at the end of training │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/lit-gpt/finetune/adapter_v2.py:140 in train │
│ │
│ 137 │ speed_monitor: SpeedMonitor, │
│ 138 ) -> None: │
│ 139 │ tokenizer = Tokenizer(checkpoint_dir) │
│ ❱ 140 │ max_seq_length, longest_seq_length, longest_seq_ix = get_max_seq_length(train_data) │
│ 141 │ │
│ 142 │ validate(fabric, model, val_data, tokenizer, longest_seq_length) # sanity check │
│ 143 │
│ │
│ /home/lingeswaran/0/AI/privateChat/testFalcon/lit-gpt/finetune/adapter_v2.py:283 in │
│ get_max_seq_length │
│ │
│ 280 def get_max_seq_length(data: List[Dict]) -> Tuple[int, int, int]: │
│ 281 │ # find out the minimum max_seq_length required during fine-tuning (saves memory!) │
│ 282 │ lengths = [len(d["input_ids"]) for d in data] │
│ ❱ 283 │ max_seq_length = max(lengths) │
│ 284 │ longest_seq_ix = lengths.index(max_seq_length) │
│ 285 │ # support easy override at the top of the file │
│ 286 │ return ( │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: max() arg is an empty sequence
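
The traceback above ends in `max(lengths)` with an empty list, which means the `data` passed to `get_max_seq_length` contained no samples. Why the training set was empty is not visible from the log. As a sketch only, a guard like the one below would surface the cause earlier. `longest_sample` is a hypothetical helper, not the repository's function, and the error message wording is an assumption.

```python
from typing import Dict, List, Tuple


def longest_sample(data: List[Dict]) -> Tuple[int, int]:
    """Return (length, index) of the longest sample in `data`.

    Raises a descriptive ValueError instead of the bare
    "max() arg is an empty sequence" when `data` is empty.
    """
    if not data:
        raise ValueError(
            "training data is empty; check that the dataset was prepared "
            "and that the data directory points to it"
        )
    lengths = [len(d["input_ids"]) for d in data]
    max_len = max(lengths)
    return max_len, lengths.index(max_len)
```

For example, `longest_sample([{"input_ids": [1, 2]}, {"input_ids": [1, 2, 3]}])` returns `(3, 1)`.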
