
Guidelines on finetuning LLMs as policy models #170

Open
Saltychtao opened this issue Mar 6, 2024 · 1 comment
@Saltychtao

Hello,

Thank you for your effort in releasing such a great implementation of GFN! I am working on using GFN to fine-tune an LLM as a policy model (which I believe would be a popular use case) and would like to ask for some suggestions.

The main challenge in this scenario is sampling from the language model efficiently and in parallel, which requires storing the Transformer's key-value cache to avoid recomputation. Do you have any suggestions on how to implement the State class so that we can store both the partial token sequence and the key-value cache?
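For concreteness, the sampling loop I have in mind looks roughly like the sketch below, using Hugging Face transformers (the "gpt2" checkpoint, the batch size, and the generation length are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a placeholder checkpoint; substitute the LLM being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = tokenizer("The sampled sequence is", return_tensors="pt")
input_ids = prompt["input_ids"].repeat(4, 1)  # 4 parallel rollouts

past_key_values = None
with torch.no_grad():
    for _ in range(8):  # placeholder generation length
        out = model(input_ids=input_ids,
                    past_key_values=past_key_values,
                    use_cache=True)
        # Reusing the cache means each step only encodes the new token,
        # instead of re-encoding the whole prefix.
        past_key_values = out.past_key_values
        probs = out.logits[:, -1].softmax(dim=-1)
        input_ids = torch.multinomial(probs, num_samples=1)
```

The question is how to carry `past_key_values` along inside the State class, next to the decoded token sequence.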

@saleml
Collaborator

saleml commented Mar 6, 2024

Hello,

Thanks for raising the issue. This is an important question we're trying to address these days: how to allow for more flexible state spaces, including graphs for instance.

As of now, states need to be represented as tensors, so the natural approach would be to use long, flat tensors that contain all the information you need to transition from one state to another. In this case, maybe you can use some dimensions of the state to store the key-value cache, and other dimensions to store the decoded token indices.
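For concreteness, here is a rough sketch of such a packed representation. The sizes and the `pack_state`/`unpack_state` helpers are hypothetical, not part of the library; also note that round-tripping token ids through float32 is exact only while the vocabulary size stays below 2**24:

```python
import torch

# Hypothetical sizes -- adjust to your model.
MAX_LEN = 32                       # max number of decoded tokens per trajectory
N_LAYERS, N_HEADS, HEAD_DIM = 2, 4, 16
KV_NUMEL = 2 * N_LAYERS * N_HEADS * MAX_LEN * HEAD_DIM  # keys + values

def pack_state(token_ids: torch.LongTensor, kv_cache: torch.Tensor) -> torch.Tensor:
    """Pack (batch, MAX_LEN) token ids and a KV cache into one
    float tensor of shape (batch, MAX_LEN + KV_NUMEL)."""
    batch = token_ids.shape[0]
    return torch.cat(
        [token_ids.float(),                     # exact for vocab < 2**24
         kv_cache.reshape(batch, KV_NUMEL)],
        dim=-1,
    )

def unpack_state(state: torch.Tensor):
    """Recover the token ids and the structured KV cache from a packed state."""
    token_ids = state[..., :MAX_LEN].long()
    kv_cache = state[..., MAX_LEN:].reshape(
        state.shape[0], 2, N_LAYERS, N_HEADS, MAX_LEN, HEAD_DIM
    )
    return token_ids, kv_cache

# Round-trip check with dummy data.
ids = torch.randint(0, 50257, (4, MAX_LEN))
kv = torch.randn(4, 2, N_LAYERS, N_HEADS, MAX_LEN, HEAD_DIM)
recovered_ids, recovered_kv = unpack_state(pack_state(ids, kv))
assert torch.equal(recovered_ids, ids)
```

This keeps everything in a single tensor per state, at the cost of one reshape and dtype cast per transition.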

@josephdviviano josephdviviano added this to the V2 milestone Apr 16, 2024
@josephdviviano josephdviviano self-assigned this Sep 25, 2024