
Add string and regex ban#1131

Closed
SneedwareInc wants to merge 15 commits into ikawrakow:main from SneedwareInc:main

Conversation

@SneedwareInc
Contributor

I am going to be completely honest, I do not know how to use github, or advanced C++, and I vibecoded it all in notepad.

It needs some cleanup (debug info in the console, comments), but it is functional.

This modification adds string banning to the server. It creates a buffer where tokens are temporarily stored and checked against a blacklist; if a banned string is generated, the first token of the match is temporarily banned. Banned strings are supplied via the "banned_strings" argument. The sampler keeps retrying until it generates acceptable text, which is then streamed out of the buffer. This functionality is similar to the antislop sampler found in koboldcpp.
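The buffering idea described here can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not the PR's actual code: generated text is held back until no banned string could still complete inside it, and only the safe prefix is streamed out.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the number of characters of `buffered` that are safe to stream
// now, or -1 if a banned string was matched (caller must roll back and
// re-generate with the first offending token banned).
long safe_prefix_len(const std::string & buffered, const std::vector<std::string> & banned) {
    size_t max_len = 0;
    for (const auto & b : banned) {
        if (buffered.find(b) != std::string::npos) return -1;  // match: ban and retry
        max_len = std::max(max_len, b.size());
    }
    if (max_len == 0) return (long)buffered.size();
    // hold back max_len-1 chars: a banned string could still finish there
    return buffered.size() > max_len - 1 ? (long)(buffered.size() - (max_len - 1)) : 0;
}
```

The key property is that anything streamed out can never be part of a later match, which is why the buffer only needs to be one character longer than the longest banned string.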

Example usage:

Mistral Nemo Q6_K
Input:

[INST] Write a strory about a cat with a lot of purple prose that goes nowhere, and serves more as a filler. Overuse words "eyes", "tapestry", "shiver", "whisper", "symphony".[/INST]

Without stringban:

In the resplendent, velvet tapestry of twilight, where the sun's final, languid whispers kissed the horizon, there resided a feline of such exquisite elegance, his very presence seemed to be a symphony composed by the gods themselves. His name, a mere mortal whisper, was Orion, a name as grand and celestial as the hunter himself, immortalized in the night sky.

Orion's eyes, they were not merely windows to his soul, but portals to a universe unknown, a cosmos of swirling amethyst and sapphire, as deep and mysterious as the twilight itself. They held within them the secrets of the ages, the echoes of ancient battles, the whispers of love lost and found. They were eyes that could pierce the veil of reality, eyes that could make even the most stoic of hearts shiver with their intensity.

His coat, a luxurious cascade of ebony and lavender, flowed like a waterfall of shadows and

With "banned_strings": ["eyes", "tapestry", "shiver", "whisper", "symphony"]:

In the resplendent, velvet embrace of twilight, there resided a feline of such exquisite elegance, his name was Whiskerfield, a name as melodious as the soft, symphonic purr that emanated from his chest. His coat, a veritable masterpiece of nature's artistry, shimmered like the first light of dawn, a thousand hues of purple dancing upon his fur, as if the very essence of twilight had been captured and bestowed upon him. His emerald-green, almond-shaped **eye**s, windows to the mysteries of his feline soul, held the wisdom of ages and the innocence of a newborn kitten, all at once. They were **eye**s that could pierce the veil of reality, **eye**s that held the secrets of the universe, **eye**s that could make even the most stoic of hearts melt like butter left too long in the sun.

Whiskerfield dw

@Ph0rk0z

Ph0rk0z commented Jan 10, 2026

Pretty cool.. does it need changes on the ST side to try?

@SneedwareInc
Contributor Author

Yes

@Nexesenex
Contributor

A most interesting PR. Antislop is a feature I sorely miss on IKL.

@SneedwareInc
Contributor Author

I've looked through the ST codebase; it is a nightmare to navigate. Why did they make it so needlessly convoluted? Implementing anything there is far beyond my skill and patience, and it would be much easier to make something from scratch that works and looks good in a single HTML file than to bother with it. How do you manage to make a webui >300 MB?
To prove the point:
image
This functioning simple webui that I use for testing is 18 kB.

@Ph0rk0z

Ph0rk0z commented Jan 11, 2026

I think in ST it's just a matter of enabling banned strings inside index.html for llama.cpp in addition to kobold, as long as you followed the same request format as the anti-slop used there. In theory it is a matter of one line.

SillyTavern/SillyTavern@412d638

I can't find the kobold one, but here is the tabby one: SillyTavern/SillyTavern@62fadda

@ikawrakow
Owner

Is this PR ready for review?

@SneedwareInc
Contributor Author

I did some stress testing on it, now it should handle overlapping strings better and remove the earliest token. I've also added 3 new arguments:

  • banbuffer_size: sets the size of the banned-token buffer manually (if not set, it defaults to the length of the longest string/regex + 1).
  • banned_regex: allows banning by regex match.
  • banned_regex_case_insensitive: same as banned_regex, but case-insensitive, because the default C++ regex does not support (?i).
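Since std::regex has no inline (?i) modifier, the case-insensitive list presumably has to be compiled with a flag instead. A minimal sketch of that idea (the function name is hypothetical):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compile the pattern with std::regex_constants::icase when a
// case-insensitive match is requested, since ECMAScript regex in
// std::regex does not support the inline (?i) modifier.
bool matches_banned_regex(const std::string & text,
                          const std::string & pattern,
                          bool case_insensitive) {
    auto flags = std::regex::ECMAScript;
    if (case_insensitive) flags |= std::regex_constants::icase;
    return std::regex_search(text, std::regex(pattern, flags));
}
```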

I think it is ready; it is functional, but it would not surprise me if someone finds a way to break it.

@SneedwareInc changed the title from "Add string ban" to "Add string and regex ban" on Jan 12, 2026
@Ph0rk0z

Ph0rk0z commented Jan 12, 2026

Heh, that stuff is going to be impossible to get into ST on the client side, because there's no separate ik_llama carve-out. The classic banned-strings way is still the same, right?

@SneedwareInc
Contributor Author

The classic banned-strings way is still the same, right?

I don't know how the classic way works. Does it just send a JSON array of strings as banned_strings?

@SneedwareInc
Contributor Author

It would be simpler to write ik_tavern from scratch than to navigate the mess that is ST 🙂

@ikawrakow
Owner

Heh, that stuff is going to be impossible to get into ST on the client side, because there's no separate ik_llama carve-out. The classic banned-strings way is still the same, right?

@Ph0rk0z My impression is that about 90% of your issues with ik_llama.cpp are actually better described as ST issues. Perhaps time to consider using something else?

@Ph0rk0z

Ph0rk0z commented Jan 12, 2026

Does it just send json of strings as banned_strings?

Yes. I think so.

Perhaps time to consider using something else?

There really isn't anything else.

@ikawrakow
Owner

@SneedwareInc

Can you remove the extra whitespace? Here is how it looks in my favorite editor:

image

The scary (and distracting) red bars are where we have unnecessary whitespace.

To remove it, in vim it is simply

:%s/\s\+$

Thanks!

@SneedwareInc
Contributor Author

Done!

@firecoperana
Collaborator

firecoperana commented Jan 12, 2026

Can you put the majority of the code into functions in another file and call those functions inside handle_completions_impl? handle_completions_impl is very long, and there is a PR that will rewrite it.

Set a flag if you need to enable this feature. If the ban-string feature is enabled, have a function that checks whether the string needs to be banned. If it does, call another function to adjust your prompt and regenerate.

This way you make minimal changes to handle_completions_impl and make the review easier.

@SneedwareInc
Contributor Author

Can you point out to me which functions exactly?

@firecoperana
Collaborator

firecoperana commented Jan 13, 2026

You are adding 500 lines of code to a function with 200 lines, which makes it hard to maintain afterwards. This is a very important function for llama-server, so we want to keep it as simple as possible. Since you are vibe coding, it makes more sense to write small, self-contained functions.

@SneedwareInc
Contributor Author

How? I have no coding education or background.

@sca255

sca255 commented Jan 13, 2026

Perhaps time to consider using something else?

There is nothing even close to ST for roleplay; we can't switch at this point, it's the most feature-rich UI.

@ikawrakow
Owner

@firecoperana Do you want to help improve the PR? I agree with your assessment that the banning logic must be factored out of handle_completions_impl, which may not be easy to do via vibe coding. There are also merge conflicts now.

@firecoperana
Collaborator

Sure, I think it's better to do it in the sampling stage to avoid the speed penalty in this PR, but I don't have anything concrete yet.

@ikawrakow
Owner

Sure, I think it's better to do it in the sampling stage to avoid the speed penalty in this PR, but I don't have anything concrete yet.

In the sampling phase one can simply use logit bias. But if I understand correctly, the goal here is to ban whole phrases (so, sequences of multiple tokens), and I don't see how that can be accomplished in the sampling phase.

@firecoperana
Collaborator

If the idea is to ban the whole phrase, and not its sub-phrases, then after the normal sampling is done, if the current token together with the previously generated text contains the string, we can set the logit bias of that token to -inf and resample.
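The resample-with-bias idea can be sketched like this (greedy sampling for simplicity; all names are hypothetical, not the server's actual sampling API):

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Greedily pick the highest-logit token; if appending its text to the
// generated text would complete the banned string, set its logit to
// -inf and pick again. Returns the chosen token id, or -1 if every
// candidate has been banned.
int sample_avoiding_ban(std::vector<float> logits,
                        const std::vector<std::string> & piece_of, // token id -> text
                        const std::string & prev_text,
                        const std::string & banned) {
    for (;;) {
        int best = (int)(std::max_element(logits.begin(), logits.end()) - logits.begin());
        if (logits[best] == -std::numeric_limits<float>::infinity()) return -1; // all banned
        if ((prev_text + piece_of[best]).find(banned) == std::string::npos) return best;
        logits[best] = -std::numeric_limits<float>::infinity();                // ban and resample
    }
}
```

Note this only swaps out the final token of the match, which is exactly the limitation discussed in the following comments.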

@ikawrakow
Owner

If the idea is to ban the whole phrase, and not its sub-phrases, then after the normal sampling is done, if the current token together with the previously generated text contains the string, we can set the logit bias of that token to -inf and resample.

Yes, sure, but I don't think this is what one wants.

Say Joe really likes watching movies and the LLM knows about that, so it keeps repeating it. I have heard that Joe likes movies so many times that I don't want to hear it ever again, so I ban "Joe likes movies". Now assume Joe does not like anything else. So, we got "Joe likes movies", we set the bias of "movies" to -INFINITY and resample. Now we have "Joe likes music" (or whatever), but that's a lie. What we really want is: when we find "Joe likes movies", we discard the whole thing and then resample after setting the bias of "Joe" to -INFINITY, hoping that the LLM might say something else that is not a lie.
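Under that policy, the rollback might look roughly like this. It is a sketch with hypothetical names, not a real implementation: truncate the token buffer back to the first token of the match and report that token's id so the caller can bias it to -INFINITY before regenerating the whole span.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// `tokens` holds (token id, detokenized piece) pairs. If the buffered
// text contains `banned`, discard everything from the first token of
// the match onward and return that token's id (to be biased to -inf
// on retry). Returns -1 if nothing matched.
int rollback_and_get_banned(std::vector<std::pair<int, std::string>> & tokens,
                            const std::string & banned) {
    std::string text;
    std::vector<size_t> starts;                     // start offset of each piece
    for (const auto & t : tokens) { starts.push_back(text.size()); text += t.second; }
    size_t pos = text.find(banned);
    if (pos == std::string::npos) return -1;        // nothing to do
    size_t i = 0;
    while (i + 1 < starts.size() && starts[i + 1] <= pos) ++i;  // first token covering pos
    int banned_id = tokens[i].first;
    tokens.resize(i);                               // discard the whole match onward
    return banned_id;
}
```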

@firecoperana
Collaborator

Thanks for the explanation.

@SneedwareInc
Contributor Author

@firecoperana Are you done shuffling the code around?

@firecoperana
Collaborator

I haven't had the time to do it yet.

@SneedwareInc
Contributor Author

@firecoperana I've put the banned string/regex code outside handle_completions_impl. v1/completions and v1/chat/completions work now (tested with curl, with and without streaming). Is this what you wanted?

@firecoperana
Collaborator

firecoperana commented Jan 30, 2026

Yes, it looks so much better, but unfortunately it still does not output anything; it gets stuck on the banned strings. Just use the built-in webui to test: in the webui's settings dialogue, "Advanced" tab, you can input {"banned_strings":["I can"]} as the custom JSON config.
image

@SneedwareInc
Contributor Author

@firecoperana I can't reproduce your issue.

output.mp4

@firecoperana
Collaborator

I'm using ERNIE-4.5-21B-A3B-PT-UD-Q2_K_XL. If it does not trigger the ban, it's fine.
The debug info that prints repeatedly:

 {
    "content": " can",
    "stop": false,
    "id_slot": 0,
    "multimodal": false,
    "oaicompat_token_ctr": 1,
    "model": "gpt-3.5-turbo-0613"
}
Debug TokenBuffer (Size 6): ["I", " can", "", "", "", ""]
Debug: Stop phrase 'I can' detected. Initiating ban logic.
Debug: Banning token ID 354 at slot 0. Total bans: 1
Debug: Fix Data Logit Bias: [[354,-10000.0]]

@SneedwareInc
Contributor Author

Clearly a model issue. I can't reproduce this bug on GLM or Kimi either.

@firecoperana
Collaborator

That could be true, but we still need a way to prevent it from going into an infinite loop. My PR works fine with this model. Another issue I see is that the token generation speed reported in the webui is wrong, but that's less important.
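One simple safeguard for the infinite-loop case might look like this (a hypothetical sketch, not part of either PR): cap the number of ban-and-retry attempts per position, and fall back to accepting the output once the cap is reached.

```cpp
#include <cassert>

// Cap how many times a single position may be re-generated before the
// server gives up banning and lets generation proceed, so a model that
// insists on the banned phrase cannot stall the request forever.
struct BanRetryGuard {
    int attempts  = 0;
    int max_tries = 0;
    explicit BanRetryGuard(int max_tries_) : max_tries(max_tries_) {}
    // returns true while another ban-and-retry attempt is allowed
    bool may_retry() {
        return attempts++ < max_tries;
    }
};
```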

@Lissanro

Lissanro commented Feb 1, 2026

I just started testing this PR, but noticed right away that even without anything configured (no banned strings), the model can no longer make any tool calls at all in Roo Code; they just become plain strings like this:

I'll read the Hero.jsx file and summarize its functions for you.<|tool_calls_section_begin|><|tool_call_begin|>functions.read_file:0<|tool_call_argument_begin|>{"files": [{"path": "src/components/sections/Hero.jsx"}]}<|tool_call_end|><|tool_calls_section_end|>

Without this patch, tool calling works without issue.

In SillyTavern, while using Chat Completion, the model keeps failing to produce the "think" block, which is very strange. Without this patch, it works fine.

I also tried https://github.com/SneedwareInc/ik_SillyTavern but it only provides settings for Text Completion, while the model requires Chat Completion to work properly (unless there is a way to use the jinja chat template in the Text Completion?).

I still tried with Text Completion and SillyTavern to see if it will have any effect, with very simple ban list:

["The user", "the user"]

Then it worked, but without the thinking block and without tool calls I cannot really use it beyond just testing, unfortunately. I am still not sure why the think block stops being produced even in Chat Completion while testing in SillyTavern, or why tool calls get broken even without any ban strings set. If I did something wrong, please let me know and I will retest. I used this command to run:

numactl --cpunodebind=0 --interleave=all /home/lissanro/pkgs/ik_llama.cpp/build/bin/llama-server \
--model /mnt/neuro/models/Kimi-K2-Thinking/Kimi-K2-Thinking-Q8_0-Q4_0.gguf \
--ctx-size 262144 --n-gpu-layers 62 --tensor-split 12,26,32,30 -mla 3 -amb 256 -b 4096 -ub 4096 \
-ot "blk\.(3)\.ffn_.*=CUDA0" \
-ot "blk\.(4)\.ffn_.*=CUDA1" \
-ot "blk\.(5)\.ffn_.*=CUDA2" \
-ot "blk\.(6)\.ffn_.*=CUDA3" \
-ot exps=CPU \
--split-mode graph \
--threads 64 --host 0.0.0.0 --port 5000 \
--jinja --chat-template-file /home/lissanro/pkgs/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja --special \
--slot-save-path /var/cache/ik_llama.cpp/k2-thinking

@ikawrakow
Owner

@Lissanro

Thanks for taking the time to test, this is very helpful. It looks like no string banning for now.

@SneedwareInc
Contributor Author

@Lissanro Thanks for reporting. I've never used chat completion or tool calls, as my use cases rely on pure text completion, and I'm afraid it's outside my ability as a vibecoder to fix this. Perhaps @firecoperana's PR would work better for you? However, the thinking blocks with text completion work fine for me with Kimi; I haven't noticed any issues with them. Have you configured them properly in ST? Did you enable instruct mode and provide the proper chat templates?
I use the following settings:
User Message Prefix: <|im_user|>user<|im_middle|>
User Message Suffix: <|im_end|>
Assistant Message Prefix: <|im_assistant|>assistant<|im_middle|><think></think>
Assistant Message Suffix: <|im_end|>
System Message Prefix: <|im_system|>system<|im_middle|>
System Message Suffix: <|im_end|>
Last Assistant Prefix: <|im_assistant|>assistant<|im_middle|>
Stop Sequence: <|im_end|>
Reasoning: Auto-Parse enabled, Show Hidden enabled, Prefix <think>, Suffix </think>
Start Reply With: <think>, Show reply prefix in chat enabled.
Story String:

<|im_system|>system<|im_middle|>From now on you act as {{char}} and I act as {{user}}.
{{#if persona}}{{persona}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}
{{/if}}{{#if scenario}}Scenario: {{scenario}}
{{/if}}{{#if system}}{{system}}
{{/if}}<|im_end|>

If I need pure assistant, I leave it empty.
Trim spaces enabled.
Skip Example Dialogues Formatting enabled.
Replace Macro in Sequences enabled.
Everything else I didn't mention is either empty or disabled.

@SneedwareInc
Contributor Author

@ikawrakow I'm afraid it will stay in unmerged PR limbo for a while, as neither mine nor @firecoperana's works for 100% of the use cases. But it is still better to have a function that's buggy but works most of the time than not to have it at all.

@ikawrakow
Owner

@SneedwareInc

But it is still better to have a function that's buggy but works most of the time than not to have it at all.

That depends. If the new functionality does not affect existing functionality in any way (so when not used, the code behaves exactly the same as before), then sure, at least some will consider an implementation that does not work 100% of the time to be beneficial.

The moment it starts breaking existing functionality, then no, it is not better.

Apart from maintainability, this is another reason why @firecoperana has been asking you to fully factor the string-ban implementation out of the handle_completions_impl function: then it is easy to simply not call the string-ban code when there are no bans in effect (or banning has been turned off by some other means).

@Nexesenex
Contributor

Nexesenex commented Feb 6, 2026

@SneedwareInc: the ST code works for me when used in tandem with this PR, at least with some banned strings of my choosing.

It infers as expected, and that's the core point of the feature I was eager to get for my use case. Thank you for that!

Remark: The timings displayed in IKL are messed up.

Edit 2: Your ST code also works, for the banned-strings feature, with @firecoperana's commit. I didn't test anything else yet.

@SneedwareInc
Contributor Author

SneedwareInc commented Feb 6, 2026

A few updates, since we are going with firecoperana's PR. This should fix my issues in #1233, but please test it out, as I may have missed some bugs.

Edit: It's broken; firecoperana's implementation was inferior. It does not do position-aware banning like mine did, which is already backfiring after 30 minutes of stress testing.

@SneedwareInc
Contributor Author

Did some fixing; it should be working properly now, but it needs more stress-testing.

@Lissanro

Lissanro commented Feb 6, 2026

I would be happy to help with testing, but when I tried downloading the patch https://github.com/ikawrakow/ik_llama.cpp/pull/1131.patch and applying it after git pull, it fails:

> patch -p1 < 1131.patch
patching file examples/server/server.cpp
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file examples/server/server.cpp.rej
patching file examples/server/server.cpp
Hunk #1 FAILED at 1 (different line endings).
1 out of 1 hunk FAILED -- saving rejects to file examples/server/server.cpp.rej
patching file examples/server/server.cpp
Hunk #1 FAILED at 1041.
Hunk #2 FAILED at 1053.
Hunk #3 FAILED at 1082.
Hunk #4 FAILED at 1120.
Hunk #5 FAILED at 1129.
Hunk #6 FAILED at 1289.
Hunk #7 succeeded at 1208 with fuzz 2 (offset -742 lines).
6 out of 7 hunks FAILED -- saving rejects to file examples/server/server.cpp.rej
patching file examples/server/server.cpp
Hunk #1 FAILED at 1039.
1 out of 1 hunk FAILED -- saving rejects to file examples/server/server.cpp.rej
patching file examples/server/server.cpp
Reversed (or previously applied) patch detected!  Assume -R? [n]

This is usually how I test patches, but maybe I am not getting the latest version, since I see that main was merged just now? If I did something wrong, please let me know.

@SneedwareInc closed this by deleting the head repository on Feb 6, 2026
@SneedwareInc
Contributor Author

Deleted and recreated the repo to squash the commits; it should be better now.

@SneedwareInc
Contributor Author

I tested it by doing a git clone and compiling on a different machine; it worked.

@SneedwareInc
Contributor Author

Continued here:
#1243

8 participants