
Quadratic sampling #5403

Merged

oobabooga merged 35 commits into oobabooga:dev from kalomaze:quadratic-sampling on Feb 4, 2024

Conversation

@kalomaze (Contributor) commented Jan 30, 2024

Quadratic Sampling

The idea behind this is to simplify sampling as much as possible for the purposes of creative writing.

The design I've been testing (on a Mistral 7b so far) is "quadratic sampling". The way that it works is:

  • We transform each logit based on a quadratic function with a scaling factor & a reference value (h). A higher scaling factor will generally be more deterministic.
  • Logits closer to the reference value (which is the maximum logit) will be boosted in score, so that the top tokens become more evenly distributed, in order to avoid repetition and improve vocabulary usage
  • Because we are using the top logit as the reference value, the modifications should theoretically scale somewhat well across different models which have different "scales" (e.g. Yi 34b with its 64k vocab)
  • We inherently penalize small logits in the process of making the top ones more even, leading to a more coherent distribution overall without having to resort to cutting out tokens completely.
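The bullet points above can be sketched in a few lines of plain Python (hypothetical helper name; the merged implementation operates on tensors and may differ in detail):

```python
def quadratic_transform(logits, smoothing_factor):
    """Penalize each logit by its squared distance from the top logit h.

    Tokens near h keep almost all of their score (the top spots flatten out),
    while tail tokens are pushed down quadratically hard.
    """
    h = max(logits)  # reference value: the maximum logit
    return [h - smoothing_factor * (x - h) ** 2 for x in logits]

logits = [5.0, 4.5, 3.0, -1.0]
print(quadratic_transform(logits, 0.25))
# [5.0, 4.9375, 4.0, -4.0] -- the top-two gap shrinks from 0.5 to 0.0625,
# while the tail token drops from -1.0 to -4.0
```

Note how a single smoothing_factor both evens out the top tokens and punishes the tail, which is the point of the design.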


TL;DR for the end user: a sampler that can make the model less deterministic while also punishing extremely low-probability options. A reasonable range from my testing is 0-1.0; 0.2-0.3 seemed like the optimal range to tinker with for creative outputs.

It can also be used to make generation pseudo-deterministic: extremely close options keep more of their relative weighting than they would under a low temperature value.

Higher smoothing_factor = more deterministic; lower = more even top probabilities.

Values under 0.1 are not recommended unless you're setting it to 0 to disable it.

oobabooga and others added 29 commits December 14, 2023 22:39
@BadisG (Contributor) commented Feb 2, 2024

@Ph0rk0z Maybe by putting smoothing after MinP, it got way later in the order of samplers.

Let's see this situation:

Sampler1 -> Smoothing -> Sampler 2 -> Sampler 3 -> MinP -> Sampler 4

@kalomaze decided to move only Smoothing, so this is what happened:

Sampler1 -> Sampler 2 -> Sampler 3 -> MinP -> Smoothing -> Sampler 4

The difference is quite large: now Smoothing is considered after 3 samplers instead of 1 in this example.

I feel like Smoothing didn't need to be moved (it had a fine position in the previous commit) and that only MinP had to be moved. MinP should be first (or second, if we count temperature) in the sampler order no matter what; that's my 2 cents.
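As a toy illustration of why position in the chain matters, here is a self-contained sketch (hypothetical logits and cutoff, not the webui code) where applying a min-p style cutoff before versus after a quadratic smoothing step keeps a different set of tokens:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def min_p_keep(logits, p):
    """Indices of tokens whose probability is at least p * the top probability."""
    probs = softmax(logits)
    cutoff = p * max(probs)
    return [i for i, pr in enumerate(probs) if pr >= cutoff]

def smooth(logits, k):
    """Quadratic smoothing: penalize by squared distance from the top logit."""
    h = max(logits)
    return [h - k * (x - h) ** 2 for x in logits]

logits = [4.0, 1.5, 1.4]

# MinP first: the third token falls just below the cutoff and is dropped.
print(min_p_keep(logits, 0.08))               # -> [0, 1]

# Smoothing first flattens the top, so the same cutoff now keeps all three.
print(min_p_keep(smooth(logits, 0.1), 0.08))  # -> [0, 1, 2]
```

The surviving vocabulary differs depending on which transform runs first, which is exactly why the placement being discussed here is not cosmetic.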

@Ph0rk0z (Contributor) commented Feb 3, 2024

Looking through there, it doesn't seem like there's another place it could go. Those other samplers aren't hijacked.

@oobabooga (Owner) commented

The way I see it, this parameter is an alternative to temperature. So maybe TemperatureLogitsWarperWithDynatemp should be renamed to something more general like ModifiedTemperatureLogitsWarper, and the transformation should go there and be applied when smoothing_factor > 0. Then temperature_last will automatically make smoothing_factor be applied last.

@oobabooga oobabooga changed the base branch from main to dev February 4, 2024 03:19
@oobabooga oobabooga merged commit b6077b0 into oobabooga:dev Feb 4, 2024
@BadisG (Contributor) commented Feb 4, 2024

@oobabooga I'm not sure it's a good idea to include the smoothing sampler in "temp_last"; I was using this order:

MinP -> Smoothing -> Temp

And now I feel it's not possible anymore with this new configuration, right?

Will you consider adding a custom sampler order feature one day? That would fix this issue quite easily; I think that having the right order can have a big impact on the output.

@kalomaze (Contributor, Author) commented Feb 4, 2024

The way I see it, this parameter is an alternative to temperature. So maybe TemperatureLogitsWarperWithDynatemp should be renamed to something more general like ModifiedTemperatureLogitsWarper, and the transformation should go there and be applied when smoothing_factor > 0. Then temperature_last will automatically make smoothing_factor be applied last.

I strongly oppose this. Temperature changes the base relative distance between the probabilities, and the quadratic transformation will naturally change as a direct consequence, unless my tests with the log-prob viewer were wrong.

@kalomaze (Contributor, Author) commented Feb 4, 2024

Alright, so I think I was wrong to some extent. While the relationship between the two values isn't perfectly linear, it's predictable.

The Temperature value squared x 0.25 gives a Smoothing value that will always result in the same output, so this means:

  • 4.0 Temperature & 4.0 Smoothing
  • 1.0 Temperature & 0.25 Smoothing
  • 5.0 Temperature & 6.25 Smoothing

Should all be equivalent transformations on the log probs (?)

EDIT: I'm not confident that you can estimate a smoothing value that, by itself, will be the same as high temp + high smoothing combinations in all cases, so I think that the change to replace Temperature completely probably takes away some degree of control, as I'd initially thought.

In any case, it might be best to keep both options instead of arbitrarily grouping them together if just for the fact that it'd be easier to scale and control it that way (see what happened with DynaTemp as a range rather than it being two values.)

Here is a different branch that adds smoothing_last as a proper option in the meantime, for those who prefer the old behavior where it was considered separate to Temperature:
https://github.com/kalomaze/text-generation-webui/tree/quad-smooth-last
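The claimed relationship can be checked numerically outside the webui. A plain-Python sketch (hypothetical helper name; assumes temperature division is applied before the quadratic transform, and relies on softmax ignoring the constant h term):

```python
import math

def sample_probs(logits, temperature, smoothing):
    """Temperature scaling, then the quadratic transform, then softmax."""
    scaled = [x / temperature for x in logits]
    h = max(scaled)
    transformed = [h - smoothing * (x - h) ** 2 for x in scaled]
    m = max(transformed)
    exps = [math.exp(x - m) for x in transformed]
    z = sum(exps)
    return [e / z for e in exps]

logits = [5.0, 3.2, 2.9, 1.0, -2.0]
a = sample_probs(logits, 1.0, 0.25)
b = sample_probs(logits, 4.0, 4.0)
c = sample_probs(logits, 5.0, 6.25)
# all three settings share smoothing / temperature**2 == 0.25, so the
# resulting distributions agree to floating-point precision
```

Algebraically, after dividing by T the transform becomes a constant minus (S / T^2) * (logit - max)^2, and softmax discards the constant, so only the ratio S / T^2 matters; the three pairs in the bullet list all have ratio 0.25.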

@biship commented Feb 4, 2024

Why not remove the binary "x_last" sequencing, let users order their samplers, and just recommend an order (and values) in each chat template? It always seemed weird to me that the only sampler whose position we could change was temp, and even then only first or last. When min_p was "discovered", it wasn't obvious that it worked best with temp not last, so users set up their chat templates wrong. Properly curated text-generation-webui chat templates go a long way toward helping users.

@kalomaze (Contributor, Author) commented Feb 4, 2024

Having a customizable order is the ideal solution, yeah

@oobabooga (Owner) commented Feb 4, 2024

About custom order for sampling parameters, I don't see a compelling reason for it. Why not also have the same parameter appearing 2 or more times in the stack, having temperature mixed with dynamic temperature, etc. It becomes a black box.

As I see it, there are 3 main types of parameters:

  • Those that remove tail tokens: top_p, min_p, top_k, typical_p, tfs, top_a, epsilon_cutoff, eta_cutoff
  • Temperature-like parameters that flatten the distribution or make it more peaked: temperature, dynamic temperature, quadratic sampling
  • Parameters that control repetition: repetition_penalty, presence_penalty, frequency_penalty

For the sake of interpretability and simplicity, I believe in using only 1 parameter of each type. I have never seen a reason to not apply the repetition penalty first, so temperature_last is sufficient for changing the order of the tail cutoff parameter and the temperature parameter (whatever each one may be). This is also why I don't see any reason to mix quadratic sampling with temperature, just like dynamic temperature is not currently mixed with temperature.

@Ph0rk0z (Contributor) commented Feb 5, 2024

Tested current on the same preset I was using and nothing broke. I tried dynamic temp and regular temp with qSampling, I kept going back to low temperatures anyway. Don't know if that applies to all models.

@BadisG (Contributor) commented Feb 5, 2024

Tested current on the same preset I was using and nothing broke. I tried dynamic temp and regular temp with qSampling, I kept going back to low temperatures anyway. Don't know if that applies to all models.

@Ph0rk0z The merged version deactivates temp when smoothing is applied, so no matter what temp you're using it won't change anything. Tbh, I preferred when I had control over the two samplers at the same time; they don't have exactly the same effect, so you could find a nice combo out of it.

@BadisG (Contributor) commented Feb 5, 2024

About custom order for sampling parameters, I don't see a compelling reason for it. Why not also have the same parameter appearing 2 or more times in the stack, having temperature mixed with dynamic temperature, etc. It becomes a black box.

As I see it, there are 3 main types of parameters:

  • Those that remove tail tokens: top_p, min_p, top_k, typical_p, tfs, top_a, epsilon_cutoff, eta_cutoff
  • Temperature-like parameters that flatten the distribution or make it more peaked: temperature, dynamic temperature, quadratic sampling
  • Parameters that control repetition: repetition_penalty, presence_penalty, frequency_penalty

For the sake of interpretability and simplicity, I believe in using only 1 parameter of each type. I have never seen a reason to not apply the repetition penalty first, so temperature_last is sufficient for changing the order of the tail cutoff parameter and the temperature parameter (whatever each one may be). This is also why I don't see any reason to mix quadratic sampling with temperature, just like dynamic temperature is not currently mixed with temperature.

@oobabooga Even if we accept that samplers can be summed up in 3 groups, I still don't think it's that simple.

  1. Let's say top_a comes before temp and it works fine; you can't just assume that all the tail-token removers should come before temp. Maybe min_p works better if it's applied after temperature.
  2. Mixing multiple same-group samplers is fine: NovelAI uses a preset that mixes top_a and tfs, and it gives great results in practice. That alone adds to the complexity and suggests that having control over the sampler order might be a good addition.

So yeah, I'd also like to have this feature to do some experiments, even if it's only an extension; I wouldn't mind. If we can squeeze more performance out of our current models with a better sampler order, there's no reason not to go for it, in my opinion.

@Ph0rk0z (Contributor) commented Feb 5, 2024

The merged version deactivates temp when smoothing is applied, so no matter what temp you're using it won't change anything,

I know it does now. I meant in the previous versions.

I preferred when I had control over the two samplers at the same time

Yeah, it's not ideal, but at least it still works. I too mix min_p with typical_p, for instance; I would be devastated not to be able to have both. The latter means I barely have to use any repetition penalty.

@biship commented Feb 5, 2024

About custom order for sampling parameters, I don't see a compelling reason for it. Why not also have the same parameter appearing 2 or more times in the stack, having temperature mixed with dynamic temperature, etc. It becomes a black box.

As I see it, there are 3 main types of parameters:

  • Those that remove tail tokens: top_p, min_p, top_k, typical_p, tfs, top_a, epsilon_cutoff, eta_cutoff
  • Temperature-like parameters that flatten the distribution or make it more peaked: temperature, dynamic temperature, quadratic sampling
  • Parameters that control repetition: repetition_penalty, presence_penalty, frequency_penalty

For the sake of interpretability and simplicity, I believe in using only 1 parameter of each type. I have never seen a reason to not apply the repetition penalty first, so temperature_last is sufficient for changing the order of the tail cutoff parameter and the temperature parameter (whatever each one may be). This is also why I don't see any reason to mix quadratic sampling with temperature, just like dynamic temperature is not currently mixed with temperature.

I respect your viewpoint, but that is how it is today.
As a coder, isn't it easier to implement something that isn't built around a bunch of assumptions and existing patterns?
Remove all the logic behind why things would and wouldn't be combined or sequenced in certain orders.
Let users sequence and enable/disable samplers as they see fit, even if it's to their detriment.
It has to be a simpler solution to implement and maintain when new samplers show up.
Just guide users down the correct paths with templates that work.

Repository owner deleted a comment from Myobu1 Feb 5, 2024
@oobabooga (Owner) commented

@BadisG @biship I have added this option here: #5443

Tests are welcome.

@akujinnoninjin commented Feb 21, 2024

Alright, so I think I was wrong to some extent. While the relationship between the two values isn't perfectly linear, it's predictable.

The Temperature value squared x 0.25 gives a Smoothing value that will always result in the same output, so this means:

  • 4.0 Temperature & 4.0 Smoothing
  • 1.0 Temperature & 0.25 Smoothing
  • 5.0 Temperature & 6.25 Smoothing

Should all be equivalent transformations on the log probs (?)

EDIT: I'm not confident that you can estimate a smoothing value that, by itself, will be the same as high temp + high smoothing combinations in all cases, so I think that the change to replace Temperature completely probably takes away some degree of control, as I'd initially thought.

In any case, it might be best to keep both options instead of arbitrarily grouping them together if just for the fact that it'd be easier to scale and control it that way (see what happened with DynaTemp as a range rather than it being two values.)

Here is a different branch that adds smoothing_last as a proper option in the meantime, for those who prefer the old behavior where it was considered separate to Temperature: https://github.com/kalomaze/text-generation-webui/tree/quad-smooth-last

I've been playing around in koboldcpp and noticed similar behavior: it's pretty consistent that increasing the temperature by a factor of N can be 'cancelled out' by increasing the smoothing by a factor of N^2. I have confirmed in testing that the generations from [T=1, S=0.25], [T=2, S=1], [T=3, S=2.25], [T=4, S=4], and [T=0.5, S=0.0625] are all identical at fixed seeds (i.e. when N is 1/2/3/4/0.5 for that initial pair of values). I've also repeated this for a few other initial values, and the pattern has held up.

Looking at the code, I think this makes some sense. At first glance I wasn't sure, since the smoothing factor is only applied to the quadratic difference while the temperature is applied to the whole logit; however, because of the normalisation that happens with softmax after the smoothing function, I believe the effects of the "h" value are essentially cancelled out, since h is constant across all tokens. I think.
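The cancellation being described is softmax's shift invariance: adding the same constant to every logit leaves the probabilities untouched. A minimal check with toy numbers:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, -0.5]
shifted = [x + 3.7 for x in logits]  # h enters every token identically
# softmax(logits) and softmax(shifted) agree term by term: the constant
# cancels in the exp ratio, which is why h drops out of the distribution
```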

Either way, there is one suggestion I would like to make based on this behavior, regarding the implementation with Dynamic Temperature:

As mentioned, the effects of the quadratic sampling are dependent on the combination of temperature and smoothing factor. However in its current implementation in Dynamic Temp, the smoothing factor remains constant no matter what the temperature is adjusted to. This means that when the temperature is adjusted below 1, the chosen smoothing factor is effectively increased, and when the temperature is adjusted above 1 it is decreased.

As an example, consider a dynamic temp range of 0.5 to 2, with a smoothing factor of 0.25:

  • At the low end, that's equivalent to [1, 1]
  • At the high end, that's equivalent to [1, 0.0625]
  • And if you increased the max to 5, it's the equivalent of [1, 0.01]!

I think it could make more sense to adjust the value based on temperature, for consistency. If you take the chosen smoothing factor as being defined at Temp=1, you can multiply it by the square of the actual temperature to get a consistent effective value across all ranges... but I'm not sure whether that would just obviate temperature entirely.
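Using the S / T^2 relationship discussed earlier in the thread, both the drift and the proposed compensation can be written down directly (hypothetical helper names; a sketch, not koboldcpp or webui code):

```python
def effective_smoothing(smoothing_factor, temperature):
    """What the chosen smoothing factor 'feels like' at temperature 1."""
    return smoothing_factor / temperature ** 2

def compensated_smoothing(smoothing_factor, temperature):
    """Scale the factor by T**2 so its effect stays constant as temp moves."""
    return smoothing_factor * temperature ** 2

# dynamic temperature range 0.5-2.0 with smoothing_factor = 0.25:
print(effective_smoothing(0.25, 0.5))  # 1.0    (low end)
print(effective_smoothing(0.25, 2.0))  # 0.0625 (high end)
print(effective_smoothing(0.25, 5.0))  # 0.01   (max raised to 5)
```

These reproduce the three bullet-point equivalences above; compensated_smoothing is the "multiply by the square of the actual temperature" idea, which by construction makes effective_smoothing(compensated_smoothing(s, t), t) return s.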

Edit: I suppose you could potentially go a stage further and invert the relationship: make the smoothing *more* deterministic at higher temperatures and less so at lower ones. You could even add a second control factor to allow that relationship to be adjusted separately. My first thought is a (constant + 1) that you multiply by the square of the temperature: at 0 the smoothing would be flat, at -1 you'd get the current behavior, and at +1 you'd get the inverse. But that's just off the top of my head; I'd have to see the actual logprobs to know whether it's worth doing. Do you have a public copy of your matplotlib graph setups anywhere?

Edit2: I originally said divide when it should be multiply

PoetOnTheRun pushed a commit to PoetOnTheRun/text-generation-webui that referenced this pull request Feb 22, 2024