
n_keep parameter #19


Closed
GameOverFlowChart opened this issue May 3, 2024 · 6 comments

Comments


GameOverFlowChart commented May 3, 2024

I don't know if llama.rn exposes that parameter; if so, could we add an option to set n_keep (and possibly also n_discard)? The main reason I'm asking is this:
ggml-org/llama.cpp#3440 (comment)
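For context, n_keep in llama.cpp controls how many leading tokens survive a context shift when the window fills. A minimal array-based sketch of that behavior (a simplification; the real implementation shifts the KV cache in place, and the names here are illustrative):

```typescript
// Minimal sketch of llama.cpp-style context shifting, assuming a plain
// token array rather than the actual KV cache.
function contextShift(
  tokens: number[],
  nCtx: number,     // context window size
  nKeep: number,    // leading tokens that are never discarded
  nDiscard: number, // tokens dropped per shift
): number[] {
  let t = tokens;
  while (t.length > nCtx) {
    // Preserve the first nKeep tokens (e.g. a system prompt),
    // drop the nDiscard tokens after them, keep the rest.
    t = [...t.slice(0, nKeep), ...t.slice(nKeep + nDiscard)];
  }
  return t;
}
```

With nKeep fixed at 4, only the first 4 tokens are pinned across shifts; a configurable value could pin a longer system prompt.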

Vali-98 (Owner) commented May 4, 2024

This value is not exposed by default, so no.

I will update this issue if that changes.

Vali-98 (Owner) commented Aug 6, 2024

Technically speaking, n_keep is always set to 4 now, even in llama.rn.

Is this feature still necessary? @GameOverFlowChart

GameOverFlowChart (Author) commented:

> Technically speaking, n_keep is always set to 4 now, even in llama.rn.
>
> Is this feature still necessary? @GameOverFlowChart

I think it would still be useful, for example to ensure that the system message is always kept. If I understand correctly, 4 is the minimum needed to get the speed gain, but setting it higher should also work.
In any case, keeping 4 tokens is better than keeping nothing (I guess that was the case before?).

Vali-98 (Owner) commented Aug 7, 2024

> I think it would be still useful, for example to ensure that the system message is always kept.

This actually isn't necessary, as CUI controls prompt building on the JS side, not via lcpp.

All prompts are always:

[system][examples if context has space][context until context limit]

Examples and older context are phased out as the limit is reached, and context shift ensures that this doesn't incur a full reprocess.

Is there any other reason to implement n_keep?
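The scheme described above, system first, then examples if they fit, then as much recent context as the limit allows, could be sketched JS-side roughly like this (a hypothetical simplification; the function and parameter names are illustrative, not CUI's actual code):

```typescript
// Hypothetical sketch of JS-side prompt building where the system
// message is always kept and older content is phased out first.
function buildPrompt(
  system: string[],     // tokenized system prompt, always included
  examples: string[][], // tokenized example messages, oldest first
  history: string[][],  // tokenized chat messages, oldest first
  maxTokens: number,    // context limit
): string[] {
  let budget = maxTokens - system.length;

  // Newest history has priority: walk backwards, keep what fits.
  const keptHistory: string[][] = [];
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].length > budget) break;
    budget -= history[i].length;
    keptHistory.unshift(history[i]);
  }

  // Examples are included only if context still has space.
  const keptExamples: string[][] = [];
  for (const ex of examples) {
    if (ex.length > budget) break;
    budget -= ex.length;
    keptExamples.push(ex);
  }

  // [system][examples if context has space][context until context limit]
  return [...system, ...keptExamples.flat(), ...keptHistory.flat()];
}
```

Because truncation always happens before the tokens reach lcpp, the prompt never overflows the window, and llama.cpp's n_keep never has to decide what to preserve.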

GameOverFlowChart (Author) commented Aug 7, 2024

> > I think it would be still useful, for example to ensure that the system message is always kept.
>
> This actually isn't necessary, as CUI controls the prompt building JS side, not via lcpp.
>
> All prompts are always:
>
> [system][examples if context has space][context until context limit]
>
> Examples and older context are phased out as the limit is reached, and context shift ensures that this doesn't incur a full reprocess.
>
> Is there any other reason to implement n_keep?

Wait, so the system message is already always kept? Then does n_keep = 4 mean the first 4 tokens come from the examples or the rest of the context? If the system message is already kept, we don't even need n_keep to be 4; it could be 0, or am I missing something? Except for the case where there is no system message (I don't think a system message of between 1 and 4 tokens would ever make sense).

And yes, if the system message is always kept, I wouldn't need that parameter.

Edit: I hope the system suffix is kept together with the message, and that instruct tokens are also counted toward the context limit.

Vali-98 (Owner) commented Aug 8, 2024

> Edit: I hope the system suffix is kept together with the message and instruct tokens are also considered in the calculations for the context.

Yes, this is the case. I think we can close this issue, as n_keep relies on llama.cpp handling prompt building, which is not used here.

Vali-98 closed this as completed Aug 8, 2024