Conversation
|
Pretty cool. Does it need changes on the ST side to try? |
|
Yes |
|
A most interesting PR. Antislop is a feature I sorely miss on IKL. |
|
I think in ST it's just a matter of enabling banned strings inside index.html for llama.cpp in addition to kobold, as long as you followed the same request format as the anti-slop used there. In theory it is a matter of one line. SillyTavern/SillyTavern@412d638 I can't find the kobold one, but here is the tabby one: SillyTavern/SillyTavern@62fadda |
|
Is this PR ready for review? |
|
I did some stress testing on it; now it should handle overlapping strings better and remove the earliest token. I've also added 3 new arguments:
I think it is ready, it is functional, but it would not surprise me if someone finds a way to break it. |
|
Heh, that stuff is going to be impossible to get into ST, client-wise, because there's no separate ik_llama carve-out. The classic banned-strings way is still the same, right? |
I don't know how the classic way works. Does it just send a JSON of strings as |
|
It would be simpler to write ik_tavern from scratch than to navigate the mess that is ST 🙂 |
@Ph0rk0z My impression is that about 90% of your issues with |
Yes, I think so.
There really isn't anything else. |
|
Done! |
|
Can you put the majority of the code into functions in another file and call those functions inside handle_completions_impl? handle_completions_impl is very long, and there is a PR that will rewrite it. Set a flag to enable this feature. If the ban-string feature is enabled, you have a function that checks whether a string needs to be banned; if it does, call another function to adjust the prompt and regenerate. This way you make minimal changes to handle_completions_impl and make the review easier. |
|
Can you point out to me which functions exactly? |
|
You are adding 500 lines of code to a function with 200 lines, which makes it hard to maintain afterwards. This is a very important function for the llama-server, so we want to keep it as simple as possible. Since you are vibe coding, it makes more sense to write small, self-contained functions. |
|
How? I have no coding education or background. |
There is nothing even close to ST for roleplay; we can't switch at this point, it's the most feature-rich UI |
|
@firecoperana Do you want to help improve the PR? I agree with your assessment that the banning logic must be factored out of |
|
Sure, I think it's better to do it in the sampling stage to avoid the speed penalty in this PR, but I don't have anything concrete yet. |
In the sampling phase one can simply use logit bias. But if I understand correctly, the goal here is to ban whole phrases (so, sequences of multiple tokens), and I don't see how that can be accomplished in the sampling phase. |
|
If the idea is to ban the whole phrase, and not its sub-phrases, what we can do is this: after normal sampling is done, if the current token together with the previously generated text contains the string, we set the logit bias of that token to -inf and resample. |
Yes, sure, but I don't think this is what one wants. Say Joe really likes watching movies and the LLM knows about that, so it keeps repeating it. I have heard that Joe likes movies so many times that I don't want to hear it ever again, so I ban "Joe likes movies". Now assume Joe does not like anything else. So, we get "Joe likes movies", we set the bias of "movies" to |
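The resample-with-logit-bias approach just described, and the failure mode it runs into, can be seen in a toy sketch (tiny made-up vocabulary, greedy argmax standing in for real sampling; nothing here is actual llama.cpp code):

```python
import math

VOCAB = ["movies", "nothing", "."]

def sample(logits):
    # greedy argmax stands in for real sampling
    return max(range(len(logits)), key=lambda i: logits[i])

def generate_next(context: str, logits, banned: list[str]) -> str:
    logits = list(logits)
    while True:
        tok = sample(logits)
        candidate = context + VOCAB[tok]
        if not any(b in candidate for b in banned):
            return VOCAB[tok]
        # ban only this token and resample; the prefix "Joe likes " stays
        logits[tok] = -math.inf

print(generate_next("Joe likes ", [3.0, 1.0, 0.5], ["Joe likes movies"]))
# prints "nothing"
```

Because only the final token is resampled, the model is stuck with the already-committed "Joe likes " prefix and is forced to complete it with something, whether or not a sensible alternative exists.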
|
Thanks for the explanation. |
|
@firecoperana Are you done shuffling the code around? |
|
I haven't got the time to do it yet. |
|
@firecoperana I've put the banned string/regex code outside |
|
@firecoperana I can't reproduce your issue. output.mp4 |
|
I'm using ERNIE-4.5-21B-A3B-PT-UD-Q2_K_XL. If it does not trigger the ban, it's fine. |
|
Clearly a model issue. I can't reproduce this bug on GLM and Kimi either. |
|
That could be true, but we still need a way to prevent it from going into an infinite loop. My PR works fine with this model. Another issue I see is that the token generation speed is wrong in the webui, but that's less important. |
|
I just started testing this PR, but noticed right away that even without anything configured (no banned strings), in Roo Code the model can no longer make any tool calls at all; they just become plain strings like this: Without this patch, tool calling works without issue.
In SillyTavern, while using Chat Completion, the model keeps failing to produce the "think" block, which is very strange. Without this patch, it works fine. I also tried https://github.com/SneedwareInc/ik_SillyTavern but it only provides settings for Text Completion, while the model requires Chat Completion to work properly (unless there is a way to use the jinja chat template in Text Completion?).
I still tried with Text Completion and SillyTavern to see if it would have any effect, with a very simple ban list: Then it worked, but without the thinking block, and without tool calls I cannot really use it beyond just testing.
I am still not sure why, even in Chat Completion, the think block stops being produced while testing in SillyTavern, or why tool calls get broken, even without any ban strings set. If I did something wrong, please let me know and I will retest. I used this command to run: |
|
Thanks for taking the time to test, this is very helpful. It looks like no string banning for now. |
|
@Lissanro Thanks for reporting. I've never used chat completion or tool calls, as my use cases rely on pure text completion. I'm afraid fixing it is outside my ability as a vibecoder. Perhaps @firecoperana's PR would work better for you? The thinking blocks with text completion work fine for me with Kimi, however; I haven't noticed any issues with them. Have you configured them properly in ST? Did you enable instruct mode and provide proper chat templates? If I need a pure assistant, I leave it empty. |
|
@ikawrakow I'm afraid it will stay in unmerged PR limbo for a while, as neither mine nor @firecoperana's works for 100% of the use cases. But it is still better to have a feature that's buggy but works most of the time than not to have it at all. |
That depends. If the new functionality does not affect existing functionality in any way (so when not used, the code behaves exactly the same as before), then sure, at least some will consider an implementation that does not work 100% of the time to be beneficial. The moment it starts breaking existing functionality, then no, it is not better. Apart from maintainability, this is another reason why @firecoperana has been asking you to fully factor out the string ban implementation from the |
|
@SneedwareInc: the ST code works for me in tandem with this PR, at least with some banned strings of my choosing. It infers as expected, and that's the core of the feature I was eager to get for my use case. Thank you for that! Remark: the timings displayed in IKL are messed up. Edit 2: your ST code also works, for the banned-strings feature, with @firecoperana's commit. I didn't test anything else yet. |
|
A few updates, since we are going with firecoperana's PR. Should fix my issue #1233, but please test it out, as I may have missed some bugs. Edit: it's broken; firecoperana's implementation was inferior. It does not do position-aware banning like mine did, which is already backfiring after 30 minutes of stress testing. |
Needs testing
|
Did some fixing, should be working properly, but needs more stress-testing. |
|
I would be happy to help test, but I tried downloading the patch https://github.com/ikawrakow/ik_llama.cpp/pull/1131.patch and applying it after git pull, and it fails: This is usually how I test patches, but maybe I am not getting the latest version, since I see main was merged just now? If I did something wrong, please let me know. |
|
Deleted and recreated to squash commits, should be better now |
|
I tested it by doing a git clone and compiling on a different machine, and it worked. |
|
Continued here: |



I am going to be completely honest: I do not know how to use GitHub, or advanced C++, and I vibecoded it all in Notepad.
It needs some cleanup (debug info in console, comments), but it is functional.
This modification adds string banning to the server. It creates a buffer where tokens are temporarily stored and checked against the blacklist; if a banned string is generated, its first token is temporarily banned. Banned strings are taken from the "banned_strings" argument. It keeps retrying until it generates good text, which is then streamed out of the buffer. This functionality is similar to the antislop sampler found in koboldcpp.
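A rough sketch of that buffering scheme, under the assumption that a hit rolls generation back to the first token of the banned match and bans it at that position before regenerating. All names here are made up, and this is not the PR's C++ code:

```python
def antislop_generate(next_token, banned, max_steps=50):
    """next_token(prefix_tokens, banned_at) -> str or None; returns final text."""
    out = []        # buffered tokens (streamed out once known to be safe)
    banned_at = {}  # position -> set of tokens banned at that position
    for _ in range(max_steps):
        tok = next_token(out, banned_at)
        if tok is None:
            break
        out.append(tok)
        text = "".join(out)
        hit = next((b for b in banned if b in text), None)
        if hit is not None:
            # find the token index where the banned string starts
            start = text.index(hit)
            pos, seen = 0, 0
            for i, t in enumerate(out):
                if seen + len(t) > start:
                    pos = i
                    break
                seen += len(t)
            # ban that token at that position and regenerate from there
            banned_at.setdefault(pos, set()).add(out[pos])
            out = out[:pos]
    return "".join(out)

# toy "model": at each position, emit the first preferred token not banned there
def toy_model(prefix, banned_at):
    prefs = [["her ", "the "], ["eyes ", "smile "], ["sparkled"]]
    i = len(prefix)
    if i >= len(prefs):
        return None
    for t in prefs[i]:
        if t not in banned_at.get(i, set()):
            return t
    return None

print(antislop_generate(toy_model, ["her eyes"]))  # -> "the eyes sparkled"
```

Banning at a position, rather than globally, is what lets "eyes" still appear later in a context that no longer forms the banned phrase.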
Example usage:
Mistral Nemo Q6_K
Input:
Without stringban:
With
"banned_strings": ["eyes", "tapestry", "shiver", "whisper", "symphony"]:
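For reference, a guess at what a request using this argument might look like, built with Python's json module; the endpoint path and the other fields are the usual llama-server completion ones and are assumptions here, not taken from this PR:

```python
import json

payload = {
    "prompt": "Write a short scene.",
    "n_predict": 256,
    # the new argument added by this PR: strings the server must not emit
    "banned_strings": ["eyes", "tapestry", "shiver", "whisper", "symphony"],
}
body = json.dumps(payload)
# would be POSTed to something like http://localhost:8080/completion
```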