
Guidelines on finetuning LLMs as policy models #170

Open
Saltychtao opened this issue Mar 6, 2024 · 1 comment
@Saltychtao

Hello,

Thank you for your effort in releasing such a great implementation of GFN! I am working on using GFN to fine-tune an LLM as a policy model (which I believe would be a popular use case) and would like to ask for some suggestions.

The main challenge in this scenario is sampling from the language model efficiently and in parallel, which requires storing the Transformer's key-value cache to avoid recomputation. Do you have any suggestions on how to implement the State class so that we can store both the partial token sequence and the key-value cache?
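For concreteness, the sampling loop I have in mind looks roughly like the sketch below, using Hugging Face transformers (the "gpt2" checkpoint, the batch size, and the generation length are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a placeholder checkpoint; substitute the LLM being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = tokenizer("The sampled sequence is", return_tensors="pt")
input_ids = prompt["input_ids"].repeat(4, 1)  # 4 parallel rollouts

past_key_values = None
with torch.no_grad():
    for _ in range(8):  # placeholder generation length
        out = model(input_ids=input_ids,
                    past_key_values=past_key_values,
                    use_cache=True)
        # Reusing the cache means each step only encodes the new token,
        # instead of re-encoding the whole prefix.
        past_key_values = out.past_key_values
        probs = out.logits[:, -1].softmax(dim=-1)
        input_ids = torch.multinomial(probs, num_samples=1)
```

The question is how to carry `past_key_values` along inside the State class, next to the decoded token sequence.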

@saleml
Collaborator

saleml commented Mar 6, 2024

Hello,

Thanks for raising the issue. This is an important question we're trying to address these days: how to allow for more flexible state spaces, including graphs for instance.

As of now, states need to be represented as tensors, so the natural approach would be to use long, flat tensors that contain all the information you need to transition from one state to another. In this case, maybe you can use some dimensions of the state to store the key-value cache, and other dimensions to store the decoded token indices.
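For concreteness, here is a rough sketch of such a packed representation. The sizes and the `pack_state`/`unpack_state` helpers are hypothetical, not part of the library; also note that round-tripping token ids through float32 is exact only while the vocabulary size stays below 2**24:

```python
import torch

# Hypothetical sizes -- adjust to your model.
MAX_LEN = 32                       # max number of decoded tokens per trajectory
N_LAYERS, N_HEADS, HEAD_DIM = 2, 4, 16
KV_NUMEL = 2 * N_LAYERS * N_HEADS * MAX_LEN * HEAD_DIM  # keys + values

def pack_state(token_ids: torch.LongTensor, kv_cache: torch.Tensor) -> torch.Tensor:
    """Pack (batch, MAX_LEN) token ids and a KV cache into one
    float tensor of shape (batch, MAX_LEN + KV_NUMEL)."""
    batch = token_ids.shape[0]
    return torch.cat(
        [token_ids.float(),                     # exact for vocab < 2**24
         kv_cache.reshape(batch, KV_NUMEL)],
        dim=-1,
    )

def unpack_state(state: torch.Tensor):
    """Recover the token ids and the structured KV cache from a packed state."""
    token_ids = state[..., :MAX_LEN].long()
    kv_cache = state[..., MAX_LEN:].reshape(
        state.shape[0], 2, N_LAYERS, N_HEADS, MAX_LEN, HEAD_DIM
    )
    return token_ids, kv_cache

# Round-trip check with dummy data.
ids = torch.randint(0, 50257, (4, MAX_LEN))
kv = torch.randn(4, 2, N_LAYERS, N_HEADS, MAX_LEN, HEAD_DIM)
recovered_ids, recovered_kv = unpack_state(pack_state(ids, kv))
assert torch.equal(recovered_ids, ids)
```

This keeps everything in a single tensor per state, at the cost of one reshape and dtype cast per transition.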

@josephdviviano josephdviviano added this to the V2 milestone Apr 16, 2024
@josephdviviano josephdviviano self-assigned this Sep 25, 2024