System Info
Python 3.10, optimum @ main, transformers @ main
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Reproduction:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer
import torch

model_id = "tiiuae/falcon-rw-1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model = BetterTransformer.transform(model)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
```

Falcon attention was refactored in huggingface/transformers@05ea7b7#diff-81c616a9db6f569c579ccf03c30c2f69aa7b65fa40959ac7e882fb8d541891d7. The refactor removed the `maybe_rotary` property and adopted the Llama conventions for rotary embeddings.
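For reference, the refactored Falcon attention follows the Llama-style rotary helpers rather than a single `maybe_rotary` callable. A minimal sketch of that convention, assuming the helper names and signatures used in the Llama modeling code (the refactored Falcon file may differ in details):

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the last dimension into two halves and rotate them,
    # as done by the Llama-style rotary helpers.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # cos/sin come from a rotary embedding module; position_ids select the
    # rotation angles per token position before they are applied to q and k.
    cos = cos[position_ids].unsqueeze(1)
    sin = sin[position_ids].unsqueeze(1)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```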
We could modify the use of `maybe_rotary` here with something like:

```python
        submodules = ["query_key_value", "dense", "attention_dropout"]
        if not config.alibi:
            submodules.append("rotary_emb")
```

We would then need to adapt the code here as well, applying rotary embeddings when alibi is not in use; a sketch of that is given below.
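A minimal sketch of that adaptation, assuming the layer keeps a `rotary_emb` submodule only when alibi is disabled. The helper name `maybe_apply_rotary` and the rotary module's call signature are illustrative assumptions, not the actual optimum code; it reuses the `apply_rotary_pos_emb` helper sketched above.

```python
from typing import Optional, Tuple

import torch


def maybe_apply_rotary(
    query_layer: torch.Tensor,
    key_layer: torch.Tensor,
    rotary_emb: Optional[torch.nn.Module],
    position_ids: torch.Tensor,
) -> Tuple[torch.Tensor, torch.Tensor]:
    # Hypothetical helper: when alibi is in use, no rotary_emb submodule is
    # kept (see the submodules change above) and q/k pass through unchanged.
    if rotary_emb is None:
        return query_layer, key_layer
    # When alibi is off, apply Llama-style rotary embeddings to queries and
    # keys before the scaled dot-product attention call.
    cos, sin = rotary_emb(key_layer, seq_len=key_layer.shape[-2])
    return apply_rotary_pos_emb(query_layer, key_layer, cos, sin, position_ids)
```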
Expected behavior
The transformation would succeed.