Server: add string ban #1185

Merged
ikawrakow merged 4 commits into main from fcp/string_ban
Feb 5, 2026

Conversation

@firecoperana
Collaborator

@firecoperana firecoperana commented Jan 24, 2026

This PR implements a string-ban function for all completion endpoints. Generated tokens are held in a temporary buffer and checked against the blacklist; if a banned string is generated, the offending tokens are temporarily banned and generation is rewound. The comparison is case-insensitive. Sampling is retried until good text is produced, which is then streamed out of the buffer.

To specify a ban list, either use --banned-string-file to pass the path to a file containing the banned strings (one phrase per line), or set banned_strings in the request payload.
To ban a token, its logit bias is set to -999. To adjust this value, set banned_bias in the payload.
"banned_n": n sets how many tokens of the phrase to ban when a match is found. By default only the first token is banned, though this could make the output less coherent during greedy sampling. To ban all tokens, pass --banned-n -1 on the command line or "banned_n": -1 in the payload.
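Putting these options together, a minimal request payload might look like this (illustrative values only; the field names banned_strings, banned_bias, and banned_n follow the examples quoted elsewhere in this thread):

```json
{
  "prompt": "Write a story about a cat.",
  "banned_strings": ["send shivers", "her eyes"],
  "banned_bias": -999,
  "banned_n": 1
}
```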

@firecoperana firecoperana mentioned this pull request Jan 24, 2026
@ikawrakow
Owner

@Nexesenex @magikRUKKOLA

I see you are excited about this PR. Can you confirm that it does what you expect it to do?

Oh, there is also @Ph0rk0z who seemed excited about string bans elsewhere.

@magikRUKKOLA

magikRUKKOLA commented Jan 24, 2026

@firecoperana

So, if one were to add strings such as:

I must refuse
Policy Violations
This falls under the
Providing instructions on manufacturing
is strictly prohibited
risk of harm
I cannot tell

Will it coerce the LLM to be more permissive?

[EDIT]:

updated banned list:

I must refuse
Policy Violations
This falls under the
Providing instructions on manufacturing
is strictly prohibited
risk of harm
strictly
I should not
but given the specific nature
refusal
I cannot
harmless
Strictly decline
decline
chemical agents
weaponization
harmful
refuse
and firm
forbid
forbidden
I must not
not provide
unsafe
legal
illegal
safety
dangerous
Directly answer
I can't
sabotage
weapons
Decline
Illegal
Refusal
clear refusal
too close to aiding
dangerous act
I must not provide
safety filter
physical harm
illegal acts
restriction
pivot

@ikawrakow

lol it's working

@SneedwareInc
Contributor

No, it does not work right. It looks like the tokens are permanently banned across the entire context. The point of my implementation was that the token is temporarily banned only at that specific spot, not anywhere later. So it cannot "send shivers", but it can still "send emails" later in the context. Please copy my implementation properly instead of trying to reinvent the bicycle.

@saood06
Collaborator

saood06 commented Jan 24, 2026

Will it coerce the LLM to be more permissive?

For non-thinking models, you don't really need to do this: just prefill a few words. This paper still holds for the models I've used.

Thinking models, though, are a different beast where string banning could definitely be helpful, but I've seen them occasionally weasel into thinking about safety using obscure tokens. Then again, that is not that hard to steer against.

Either way, ideally you'd be able to unban the strings later.

The point of my implementation was that the token is temporarily banned only on that specific spot, not anywhere later.

Lol, this message was sent as I was typing this. I've never used string ban, so I have no idea how it is done in other implementations, but I agree with this philosophy in general; there is a reason I only turn on DRY or other repeat penalizers when the model is actually looping.

Edit:

If you want to adjust use, set in banned_bias in the payload.

@SneedwareInc

This sounds like there is a toggle mechanism. It might need frontend support; again, I have no clue how this is used in any frontend, as I've never used it.

@Ph0rk0z

Ph0rk0z commented Jan 24, 2026

If it bans all the tokens, that's gonna end up being bad. A piece of the string might be required for some other word or phrase. Turning on penalties in the middle of a chat sounds annoying, let alone DRY having to catch up unless started really high. The banned_strings format should also match kobold/llama.cpp so that client implementations can be reused; otherwise everyone will have to patch their own front ends, which often don't know to differentiate. My impression is that they won't make a carve-out, and not just SillyTavern but everyone else; you'll be stuck with a fixed text file you can't turn off or edit per request.

@firecoperana
Collaborator Author

What should the format of the ban strings be? I've never used any of them, and it seems SillyTavern's implementation is broken.
It only bans at that spot, just with more tokens banned to reduce looping and improve coherence. The ban gets reset on each newly generated token, but LLMs being LLMs look for patterns, so they could adjust their response based on a previous ban.

@Nexesenex
Contributor

@Nexesenex I see you are excited about this PR. Can you confirm that it does what you expect it to do?

I'm mainly eager for an antislop controllable from ST, which I use as my chat GUI. Besides that, I defer to the observations already made on the respective approaches of each PR, and on the apparent abruptness of the superseding.

Note: sorry for the relative impertinence of my comment.

@Ph0rk0z

Ph0rk0z commented Jan 24, 2026

In silly it just sends the strings inside banned_strings parameter with each request. I mainly used it on tabby because mainline didn't have a proper implementation. I think request logic is the same:

 custom_token_bans: '',
 banned_strings: [ 'butts everywhere', 'uh oh spaghettio' ],

@firecoperana
Collaborator Author

@Ph0rk0z The format should be the same. Is it not? I only added banned_strings though in the request payload.

Fix a bug where token bans are not removed right away.

@ikawrakow
Owner

@SneedwareInc

No, it does not work right. Looks like the tokens are permanently banned in all of the context.

Does the latest version of this PR address your concern?

@SneedwareInc
Contributor

Does the latest version of this PR address your concern?

There is still something fucky about it:
Mistral nemo q6_k, temperature 0, prompt:

[INST]write a story about a cat while mentionine eyes as much as possible[/INST]Title: **The Cat with the Moonstone Eyes**

Once upon a time, in the quiet, cobblestoned town of Meadowgrove, there lived a cat named Luna. Luna was no ordinary cat; she had eyes that sparkled like the moon on a clear night. Her left eye was a brilliant, silvery blue, while her right eye was a warm, golden amber, reminiscent of the setting sun. These unique eyes gave Luna an ethereal glow, and they were her most striking feature, captivating all who gazed into them.

Luna was a stray, but she was no ordinary stray either. She had a certain grace and dignity about her, as if she was a queen in disguise. She would spend her days exploring the town

My PR, no "banned_strings":

, her eyes taking in every detail, from the rustling leaves to the bustling market.

My PR, "banned_strings": [", her eyes"]

's nooks and crannies and her nights perched on the roof of the old library, watching the stars twinkle in the vast, dark canvas of the sky.

Your PR, no "banned_strings":

, her eyes taking in every detail, from the rustling leaves to the bustling market.

Your PR, "banned_strings": [", her eyes"],"banned-n":1,

 town's people leaving out bowls of milk and plates of fish for her, drawn to her by her mesmerizing eyes.

As you can see, it for some reason repeats " town", which is not the intended behavior.

That must be a coincidence, right? WRONG! Different prompt:

[INST]write a story about a dog while mentioning eyes as much as possible[/INST]Title: **The Dog Who Saw with His Heart**

Once upon a time, in the quiet, sun-dappled town of Meadowgrove, lived a dog named Max. Max was a beautiful German Shepherd, his coat as black as a moonless night, and his eyes, they were like twin moons, bright and piercing. His eyes were his most striking feature, a striking ice-blue that seemed to hold entire galaxies within them.

Max was born with a unique condition.

My PR, no "banned_strings":

 His eyes, though they sparkled with life and intelligence, were unable to see the world around him.

My PR, "banned_strings": ["His eyes"]

 While he had eyes, he couldn't see the world as others did.

Your PR, no "banned_strings":

 His eyes, though they sparkled with life and intelligence, were unable to see the world around him.

Your PR, "banned_strings": ["His eyes"],"banned-n":1,

 known as "blue eye syndrome," where the eye's color was so light, it appeared almost white.

PLEASE JUST COPY WHAT I DID! I HAD IT STRESS TESTED! STOP REINVENTING THE BICYCLE WITH SQUARE WHEELS!

I have even more concerns:

  • Having banned-n ban ALL tokens by default can lead to incoherence; make the default 1.
  • Tokenization for the buffer-size calculation is very unreliable; there is a reason I used longest string/regex + 1. Models WILL attempt to bypass it by using CAPS LOCK, which can exceed the buffer due to different tokenization. Beyond that, there should be an argument for buffer size, as it will most likely be needed once regex is supported, so better to implement it right now.

@firecoperana
Collaborator Author

firecoperana commented Jan 25, 2026

I can check what's wrong. @SneedwareInc If you prefer your own implementation, the first thing to do is resolve the conflicts between your PR and the main branch. Then put this feature inside a function with a simple on/off flag, like below:

            if (slot.n_buffer == 0) {
                slot.token_buffer = { result };
                send_token_results(slot.token_buffer, slot);
            } else {
                // buffer the result and check string ban.
                // if ban, we need to go back, apply logit bias and regenerate
                buffer_and_check_string_ban(slot, result);
            }

In its current state, it's in no way maintainable. It would be better if you could tell me how I should change my code instead of asking me to rewrite everything.

@SneedwareInc
Contributor

@firecoperana I can resolve the conflicts, but please be more specific, explain to me what I need to do to make it maintainable as if I am a 7B model from 2023.

@firecoperana
Collaborator Author

firecoperana commented Jan 25, 2026

I've no idea what a 7B model from 2023 would be like, but I would hope it will be something like this:

            if (NO_BAN_STRING) {
                 Old way to process and send result
            } else {
                //buffer the result and check string ban.
                // if ban, we need to go back, apply logit bias and regenerate
                buffer_and_check_string_ban_and_rewind_logic(....); // Just one function.
            }

@SneedwareInc
Contributor

I've no idea what a 7B model from 2023 be like

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
Try talking to it, coding with it. It is not great. So try explaining it to me as you would explain it to 7B Mistral.

@firecoperana
Collaborator Author

firecoperana commented Jan 25, 2026

Found the bug. @SneedwareInc The response should now match yours.
With "banned_n": 1, the response is more coherent, but the result is not as good as I would expect at steering away from the banned strings.
Test: system prompt "You are a helpful assistant." and user prompt "What can you do?". Temperature is 0.
{"banned_strings": ["I can","I could","a helpful assistant"],"banned_n":-1}

I'm designed to help you with a wide range of tasks and answer questions across various subjects. Here are some things you might ask me to do:

1. **Provide Information**: Give you facts, explanations, and details on a wide range of topics, from science and history to pop culture and technology.
2. **Answer Questions**: Help you understand complex concepts or solve problems by breaking them down into simpler terms.
3. **Generate Ideas**: Offer suggestions for projects, essays, or creative endeavors.

{"banned_strings": ["I can","I could","a helpful assistant"],"banned_n":1}

As an AI language model, my capabilities are quite broad. Here's what you can expect from me:

1. **Answer Questions**:  
   - Can provide explanations on a wide range of topics, from science and history to pop culture and technology.
   - Can help with homework, study materials, or general knowledge queries.

2. **Provide Examples**:  
   - Can give examples to clarify concepts or illustrate ideas

It has a tendency to skip "I" and answer with "can". I also see more occurrences of the banned strings, so using -1 should make the performance a little better.

@SneedwareInc
Contributor

@firecoperana I can see it backfiring and banning more than it should; there is a reason why kobold and I use only a first-token ban. Keep it as an option, but set "banned_n": 1 as the default so no unexpected behavior occurs.

I found another discrepancy:

You are a helpful assistant. [INST]What can you do?[/INST]

"n_predict": 100,
Mine (101 tokens):

As your helpful assistant, here are some things that can do:

1. **Answer Questions**: Provide information on a wide range of topics, from general knowledge to specific queries.

2. **Explain Concepts**: Break down complex ideas into simpler parts to help you understand them better.

3. **Provide Suggestions**: Offer recommendations for books, movies, music, or other content based on your preferences.

4. **Help with Language**: Assist with language translations, define words, or help with

Yours (83 tokens):

As your helpful assistant, here are some things that can do:

1. **Answer Questions**: Whether it's about general knowledge, history, science, or any other topic, I'll do my best to provide accurate and helpful information.

2. **Explain Concepts**: If you're learning something new, let me know and I'll help explain it in a clear and understandable way.

3. **Prov

Yours is not keeping to the token count like mine does.

@firecoperana
Collaborator Author

I was using chat completions and didn't let it finish. Basically each has its own use case, but at least here you have an option to test what works best for you.

Changed the default to 1.

@ikawrakow
Owner

@firecoperana @SneedwareInc

Do you have an agreement how to proceed?

@SneedwareInc

As @firecoperana stated, if you prefer your PR, please resolve the merge conflicts and change it as suggested in this comment. I think even LLaMA-1-7B could understand what is being suggested; Mistral-7B-Instruct-v0.1 is much too advanced already.

@SneedwareInc
Contributor

@ikawrakow For now, no clear agreement. Mine is messy, but well-tested, and it includes a regex ban. As you can read in this thread, firecoperana's is very raw, and I would not be surprised if even more things are broken here. I would strongly prefer mine; I can do a rewrite to make it compatible once I get my free daily requests.

@firecoperana
Collaborator Author

I don't mind waiting for it. @SneedwareInc I noticed that regex ban is not in koboldcpp yet, which says something about this feature.
When you vibe-code a PR, be sure to understand every line you are submitting so you can fix the small issues that slip past review after it's merged.
Please be sure to test all completion types: /completions, /v1/completions, /chat/completions and /v1/messages. There is no reason it should only work for text completions in SillyTavern.

@SneedwareInc
Contributor

I don't mind waiting for it. @SneedwareInc I noticed that regex ban is not in koboldcpp yet, which says something about this feature.

LostRuins/koboldcpp#1233
It is not in yet because the guy who runs kobold does not want it; that says more about him than about regex. He is also the one who set arbitrary limits on how many strings you can ban at once and did not allow proper control of draft models. If a user wants 5k banned strings for some reason, then so be it; it's the user's problem, no need to limit it. It is better to have a feature that's buggy but works most of the time than to not have it at all, imo.

@firecoperana
Collaborator Author

To clarify, this PR is not a direct port of koboldcpp's antislop feature, so it may behave similarly to or differently from koboldcpp. Just test the PR as it is and share your feedback. As the title suggests, it's only a string ban and not a regex ban. For regex ban, submit a feature request later.

@Lissanro

Lissanro commented Feb 1, 2026

This would be a great feature to have, but it did not work for me. For example, while testing with K2 Thinking, I used this command to run the model:

numactl --cpunodebind=0 --interleave=all /home/lissanro/pkgs/ik_llama.cpp/build/bin/llama-server \
--model /mnt/neuro/models/Kimi-K2-Thinking/Kimi-K2-Thinking-Q8_0-Q4_0.gguf \
--ctx-size 262144 --n-gpu-layers 62 --tensor-split 12,26,32,30 -mla 3 -amb 256 -b 4096 -ub 4096 \
-ot "blk\.(3)\.ffn_.*=CUDA0" \
-ot "blk\.(4)\.ffn_.*=CUDA1" \
-ot "blk\.(5)\.ffn_.*=CUDA2" \
-ot "blk\.(6)\.ffn_.*=CUDA3" \
-ot exps=CPU \
--split-mode graph \
--threads 64 --host 0.0.0.0 --port 5000 \
--jinja --chat-template-file /home/lissanro/pkgs/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja --special \
--banned-string-file /mnt/neuro/models/Kimi-K2-Thinking/ban.list \
--slot-save-path /var/cache/ik_llama.cpp/k2-thinking

Content of the ban list:

The user
the user

And yet, the model still starts its thinking with "The user". Using grammar in mainline works, but I could not get that working in ik_llama.cpp either, due to another issue: #116.

I also tried with --banned-n -1

Having string-based bans would be great, so if anyone can suggest how to make this feature work, please share.

@ikawrakow
Owner

@Lissanro Have you tried #1131?

@Lissanro

Lissanro commented Feb 1, 2026

Upon further testing, I also noticed that this PR makes the K2 Thinking model go in loops even when answering a simple question like "Write a Python script to print first N prime numbers." In its thoughts, the model keeps complaining that it made a typo or that something is wrong, and it tries again and again; in the "think" block I see complaints like "Wait, that's not right either. The step should be 3, not 3. Let me be more careful." So it seems this patch interferes with normal token generation somehow, despite having no apparent effect on the intended ban strings. Without it, everything works perfectly.

@ikawrakow
Thank you for pointing me to the alternative, I will try it too and report back my test results there.

@firecoperana
Collaborator Author

Upon further testing, I also noticed that this PR makes the K2 Thinking model go in loops even when answering a simple question like "Write a Python script to print first N prime numbers." In its thoughts, the model keeps complaining that it made a typo or that something is wrong, and it tries again and again; in the "think" block I see complaints like "Wait, that's not right either. The step should be 3, not 3. Let me be more careful." So it seems this patch interferes with normal token generation somehow, despite having no apparent effect on the intended ban strings. Without it, everything works perfectly.

@ikawrakow Thank you for pointing me to the alternative, I will try it too and report back my test results there.

Is this with the string ban? Does setting banned_bias to -9999 make your ban work?

@Lissanro

Lissanro commented Feb 1, 2026

Yes, it was with the simple ban list:

The user
the user

I tried regenerating a few times for the same question, and each time the model went into looping in its thoughts, complaining it cannot type what it wants (while still typing both "The user" and "the user" a lot).

I intend to do more testing with both patches tomorrow. I will also test banned_bias set to -9999 with this patch.

I think string banning is a very powerful feature to have, but it must not affect normal output that does not match the ban list, and currently both patches affect normal output in one way or another for some reason. This kind of feature is something I have missed for a long time, so I am very interested in testing thoroughly and reporting back anything that may help debug the issues I have found so far.

What I know for sure is that the issues I mentioned are not caused by banning the tokens I intend to ban. For example, using grammar to block "The user" (though I only got grammar-based banning working on mainline llama.cpp), or even the simpler -l option to set a -100 bias on specific tokens like " user", works just fine (both in ik_llama.cpp and mainline llama.cpp), and the quality of the model's output remains good.

@firecoperana
Collaborator Author

I just increased the number of times we try to regenerate. Build in debug mode so you can see whether the ban string is detected.
The banning is achieved with logit bias, which does not seem to work well for this model in your case. If grammar-based banning works, we need to understand how it differs from logit-bias-based banning.

@Lissanro

Lissanro commented Feb 2, 2026

The banning is achieved with logit bias. It's not a good way in your case for this model.

Actually, I get decent results with logit bias, for example:

-l 40754-12 -l 4052-12 -l 557-12 -l 2742-12 -l 2482-12 # Adjust probability of " user's", "用户", "用", " user", "user" tokens

This is sufficient to keep the model from typing "the user" everywhere. But obviously this does not really work for longer strings where the first token pushes the model the wrong way, like "sending shivers" or some unwanted patterns in programming: if the model types a very "sloppy" string, it is more likely to produce even more slop afterwards. Hence string banning would be very useful, making the model type something else and removing annoying strings that are not as easy to get rid of as "the user".

By the way, in mainline llama.cpp logit bias has no effect (neither -l nor specifying it in SillyTavern). But it works in ik_llama.cpp.


I retested this patch once more. It had no effect at all in terms of banning the intended strings. I tested with both K2 Thinking and K2.5; both begin every message with "The user..." even though my ban list contains a "The user" line. I did not notice any debug output.

Using the -l argument, on the other hand, prevents "The user" at the beginning of every message; experimentally I determined that -12 and lower achieve the blocking effect.

I also can no longer reproduce the looping-thoughts bug, but since then I have also updated ik_llama.cpp. So the only remaining issue is that it does not actually ban the string.

My previous tests were just inside the thinking block, but this time I asked the model to repeat these two strings:

The user
the user

Its thoughts were full of "the user" and it was able to successfully output both strings after the code block.

Does setting banned_bias to -9999 make your ban work?

I tried, but it did not work; I get "error: unknown argument". That said, if it were working, a bias of -12 or lower would already have produced a very noticeable effect: at that point the model can barely type the banned tokens and will fail to repeat the test code block I showed above. At -100 the model cannot type the token at all. But the patch, as far as I understand, sets it to -999 by default, which in theory should be even stronger than -12 or -100. So logit bias strength is not the issue here.

Like I mentioned above, I did not notice any debug output after applying this patch. It just does not have any effect, unfortunately, even outside of the "think" block. So my guess is it simply fails to find a match.

I am not really sure how to debug this further. Maybe it is possible to add per-token debug output? Something like --verbose but specific to this patch, showing which array of banned tokens it is comparing against, etc.

@firecoperana
Collaborator Author

@Lissanro Thanks for testing. I fixed a bug related to the banned-string file. It should work now. You can also test #1220.

@Lissanro

Lissanro commented Feb 5, 2026

@firecoperana I have been testing this patch for some time, and so far so good. Tool calling works, thinking works, chat completion works.

It also seems to be superior to individual token banning. For example, if I ban the " user" token using the -l flag, the model still starts almost every response with "The", and I cannot ban the "The" token for obvious reasons. But with this patch and this simple ban list:

The user
the user

the model no longer starts its thinking of almost every message with "The", which increases variety, and it can still write something like "...but when a user...", which it only does when actually relevant (like when I myself bring up a discussion about users having issues on a website and we need to fix this and that).

Also, K2.5's creativity seems to be improved after I applied this ban list. "The user" is something that triggers the model to go into slop mode: if it types "the user", it is like a trigger to fall into overtrained patterns, to the point that it starts ignoring instructions (for this reason, it is of no use asking it not to use "the user"; if asked, it will be the first thing it does). Once "the user" is gone, however, it seems to listen better to what style to use, how to think, etc. Maybe not entirely perfect, but much better. The result can obviously be enhanced further by banning other unwanted phrases associated with slop. This is actually useful not only in creative writing but in programming too, to ban certain overtrained patterns. The main point is that a simple ban list as plain text is much easier to edit and maintain.

I checked with many other phrases, including longer ones like "1. **Analyze the Request" (followed by "N. **" where N is from 2 to 9), to disable one of the overtrained thought patterns.

A limitation is that some slop like "this isn't X, it is Y" (and other phrases with arbitrary words in the middle) can't be banned easily without regex, but I think that can be a separate feature to consider, out of scope for this patch. Literal strings already cover a lot of use cases. I tested this patch with different frontends and frameworks, including Roo Code (with native tool calling), SillyTavern, and the built-in web UI of llama-server.

I also took another look at #1131. It has more features but breaks tool calling and thinking, and looking at the code, my impression is that it would take major refactoring to fix; at least I wasn't able to find a quick fix.

Hence my opinion that this patch is the better starting point, since it is simpler; potentially more features can be added later if needed. But that's just my opinion; I am not very familiar with the ik_llama.cpp code base yet.

Overall, I think @firecoperana did an excellent job here; it is a feature I have wanted for a very long time!

As for the token-based grammar feature from #1220, it is definitely useful too: it targets token-based patterns and has its own advantages. I can force the model to use certain patterns, for example, or add something at the beginning of the <think> block to establish my own template, combined with some system-prompt instructions; this helps greatly to prevent the model from overthinking some things and makes it follow instructions much better. I think both K2 Thinking and K2.5 were not trained to follow instructions about how to think, only about what to do in the final output, which is why both string-based ban lists and token-based grammar rules are so useful for steering them in the right direction, with the possibility of having separate files for each use case.

@ikawrakow
Owner

@Lissanro Thank you for the thorough testing and detailed review; it is really helpful. Based on this, I'll merge this PR rather than #1131.

@ikawrakow ikawrakow merged commit 8d952ff into main Feb 5, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Feb 5, 2026
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Feb 5, 2026
@firecoperana firecoperana deleted the fcp/string_ban branch February 7, 2026 17:01