
Conversation

@SamanehSaadat (Member) commented Feb 27, 2024

This is part of addressing #1372 to add the Falcon model to KerasNLP. This PR adds the FalconBackbone.

Checkpoint conversion colab

SamanehSaadat (Member Author):
@mattdangerw Right now, I'm calculating alibi in every layer even though it doesn't change from layer to layer! It would be more efficient to calculate it just once in the backbone, but the backbone's init doesn't know the input shapes yet. If you have any suggestions for calculating alibi only once, let me know.
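For context on why the bias can't be precomputed at init time, here is a minimal sketch of an ALiBi bias computation (function names here are illustrative, not the actual KerasNLP code): the bias tensor depends on the sequence length, which is only known when the layer is called.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Per-head geometric slopes from the ALiBi paper,
    # for a power-of-two number of heads.
    start = 2.0 ** (-8.0 / num_heads)
    return np.array([start ** (i + 1) for i in range(num_heads)])

def build_alibi_bias(num_heads, seq_length):
    # The bias depends on seq_length, which is only available at
    # call time -- hence the per-layer recomputation discussed above.
    slopes = alibi_slopes(num_heads)              # shape: (num_heads,)
    positions = np.arange(seq_length)             # shape: (seq_length,)
    return slopes[:, None] * positions[None, :]   # shape: (num_heads, seq_length)
```

Since the slopes themselves depend only on `num_heads`, one compromise would be computing them once at init and only rebuilding the position-dependent part per call.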

SamanehSaadat (Member Author):

Discussed this offline. This is fine for now because it doesn't seem to be a compute-intensive part. Ideally, we would profile the impact of this repetition and avoid it if it significantly affects the model's performance.

@mattdangerw (Member) left a comment:
Looks great! Left a few comments.


SamanehSaadat (Member Author):

Thanks for the reference.
Yeah! I was thinking of adding this in a follow-up PR.

SamanehSaadat (Member Author):

Removed the file.

mattdangerw (Member):

You could consider modeling this after #1402 for a more fleshed-out conversion script (though the PR I'm linking is missing the numerics validation). It will save a full preset directory and print emojis, both very important.

Also, we could wait and switch to something like this after we add the tokenizer. Either way works.

SamanehSaadat (Member Author):

Thanks for the info! It would be great to add that later.

@mattdangerw (Member) left a comment:

LGTM! Feel free to merge if GPU testing looks good!

@SamanehSaadat SamanehSaadat merged commit 87eec69 into keras-team:master Mar 1, 2024
@SamanehSaadat SamanehSaadat deleted the falcon-backbone branch March 1, 2024 01:18
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request Apr 2, 2024
* Add Falcon backbone.

* Add docstring.

* Add dtype.

* Add checkpoint conversion script.

* Fix tests.

* Random fixes.

* Add cache.

* Cast cumsum to int32.

* Make sublayers public.

* Address backbone comments.

* Update attention computation to use einsum.

* Falcon only works with Keras3.

* Fix tests.

* Remove falcon_causal_lm file.

* Remove commented/unused codes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
