
Conversation

@mattdangerw mattdangerw commented Feb 2, 2024

This proposes a new style for backbones and tasks:

```python
# === Layers ===
self.token_embedding = ...
self.transformer_layers = []
for i in range(num_layers):
    self.transformer_layers.append(...)

# === Functional Model ===
inputs = keras.Input(...)
x = self.token_embedding(inputs)
for layer in self.transformer_layers:
    x = layer(x)
outputs = x
super().__init__(inputs=inputs, outputs=outputs)

# === Config ===
self.num_layers = num_layers
```
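
For concreteness, here is a minimal, self-contained sketch of a backbone written in this style. Everything in it is illustrative rather than taken from this PR: the class name `ToyBackbone`, its arguments, and the use of `keras_nlp.layers.TransformerEncoder` as a stand-in transformer block are assumptions, and the pattern assumes Keras 3, which tolerates assigning sublayers before the functional `super().__init__()` call.

```python
import keras
import keras_nlp


class ToyBackbone(keras.Model):
    """Hypothetical backbone following the proposed three-section style."""

    def __init__(self, vocabulary_size, hidden_dim, num_layers, num_heads, **kwargs):
        # === Layers ===
        # Create all sublayers up front and keep them as attributes.
        self.token_embedding = keras.layers.Embedding(vocabulary_size, hidden_dim)
        self.transformer_layers = []
        for i in range(num_layers):
            self.transformer_layers.append(
                keras_nlp.layers.TransformerEncoder(
                    intermediate_dim=hidden_dim * 4,
                    num_heads=num_heads,
                    name=f"transformer_layer_{i}",
                )
            )

        # === Functional Model ===
        # Wire the sublayers into a functional graph, then hand the
        # inputs/outputs to keras.Model via super().__init__().
        token_ids = keras.Input(shape=(None,), dtype="int32", name="token_ids")
        x = self.token_embedding(token_ids)
        for layer in self.transformer_layers:
            x = layer(x)
        super().__init__(inputs=token_ids, outputs=x, **kwargs)

        # === Config ===
        # Store constructor arguments so get_config() can serialize them.
        self.vocabulary_size = vocabulary_size
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.num_heads = num_heads

    def get_config(self):
        return {
            "vocabulary_size": self.vocabulary_size,
            "hidden_dim": self.hidden_dim,
            "num_layers": self.num_layers,
            "num_heads": self.num_heads,
        }
```

Because the sublayers are plain attributes, something like `backbone.transformer_layers[0]` is directly addressable, which is what the per-layer manipulations below rely on.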

The main goal is to allow more readable manipulations of the model, e.g. a custom LoRA routine:

```python
backbone = GPT2Backbone(...)
for layer in backbone.transformer_layers:
    # Use a different LoRA rank for different matrices.
    layer.self_attention.query_projection.enable_lora(16)
    layer.self_attention.key_projection.enable_lora(4)
```
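
As a further illustration (not from the PR itself), the same attribute access makes other per-layer manipulations just as readable, e.g. freezing the embedding and the lower half of the stack before fine-tuning:

```python
# Sketch: freeze the token embedding and the first half of the
# transformer stack; attribute names follow the proposed style above.
backbone.token_embedding.trainable = False
for layer in backbone.transformer_layers[: len(backbone.transformer_layers) // 2]:
    layer.trainable = False
```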

@mattdangerw
Member Author

Draft PR, just for feedback for now. I've touched less than half of the models so far, so tests will fail.

@mattdangerw mattdangerw force-pushed the new-backbone-style branch 4 times, most recently from 4c844a6 to acf72d8 on February 9, 2024 at 22:37
@mattdangerw
Member Author

I also removed unused features from our pipeline model class, just to keep our tasks as clean as possible.

@mattdangerw mattdangerw force-pushed the new-backbone-style branch 3 times, most recently from 4e5c555 to 58ab131 on February 9, 2024 at 23:25
@mattdangerw mattdangerw added the kokoro:force-run Runs Tests on GPU label Feb 9, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Feb 9, 2024
@mattdangerw mattdangerw force-pushed the new-backbone-style branch 2 times, most recently from 39f2205 to 5f1f2b3 on February 10, 2024 at 00:31
@mattdangerw mattdangerw marked this pull request as ready for review February 10, 2024 01:36
@mattdangerw mattdangerw requested a review from fchollet February 10, 2024 02:30

@fchollet fchollet left a comment

LGTM, thank you!

@mattdangerw mattdangerw merged commit c3c268a into keras-team:master Feb 12, 2024