Conversation
|
Pretty cool. Does it need changes on the ST side to try? |
|
Yes |
|
A most interesting PR. Antislop is a feature I sorely miss on IKL. |
|
I think in ST it's just a matter of enabling banned strings inside index.html for llama.cpp in addition to kobold, as long as you followed the same request format as the anti-slop used there. In theory it is a matter of one line. SillyTavern/SillyTavern@412d638 I can't find the kobold one, but here is the tabby one: SillyTavern/SillyTavern@62fadda |
|
Is this PR ready for review? |
|
I did some stress testing on it; now it should handle overlapping strings better and remove the earliest token. I've also added 3 new arguments:
I think it is ready, it is functional, but it would not surprise me if someone finds a way to break it. |
|
Heh, that stuff is going to be impossible to get into ST, client-wise, because there's no separate ik_llama carve-out. The classic banned-strings way is still the same, right? |
I don't know how the classic way works. Does it just send a JSON of strings as |
|
It would be simpler to write ik_tavern from scratch than to navigate the mess that is ST 🙂 |
@Ph0rk0z My impression is that about 90% of your issues with |
Yes, I think so.
There really isn't anything else. |
|
Done! |
|
Can you put the majority of the code into functions in another file and call those functions inside handle_completions_impl? handle_completions_impl is very long, and there is a PR that will rewrite it. Set a flag to enable this feature. If the ban-string feature is enabled, you have a function that checks whether a string needs to be banned; if it does, call another function to adjust the prompt and regenerate. This way you make minimal changes to handle_completions_impl and make the review easier. |
|
Can you point out to me which functions exactly? |
|
You are adding 500 lines of code to a function with 200 lines, which makes it hard to maintain afterwards. This is a very important function for the llama-server, so we want to keep it as simple as possible. Since you are vibe coding, it makes more sense to write small, self-contained functions. |
|
How? I have no coding education or background. |
There is nothing even close to ST for roleplay; we can't switch at this point, it's the most feature-rich UI |
|
@firecoperana Do you want to help improve the PR? I agree with your assessment that the banning logic must be factored out of |
|
Sure, I think it's better to do it in the sampling stage to avoid the speed penalty in this PR, but I don't have anything concrete yet. |
In the sampling phase one can simply use logit bias. But if I understand correctly, the goal here is to ban whole phrases (so, sequences of multiple tokens), and I don't see how that can be accomplished in the sampling phase. |
|
If the idea is to ban the whole phrase, and not its sub-phrases, what we can do is this: after normal sampling is done, if the current token together with the previously generated text contains the string, we set the logit bias of that token to -inf and resample. |
Yes, sure, but I don't think this is what one wants. Say Joe really likes watching movies and the LLM knows about that, so it keeps repeating it. I have heard that Joe likes movies so many times that I don't want to hear it ever again, so I ban "Joe likes movies". Now assume Joe does not like anything else. So, we get "Joe likes movies", we set the bias of "movies" to |
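The resample-with-logit-bias approach just described, and the failure mode it runs into, can be seen in a toy sketch (tiny made-up vocabulary, greedy argmax standing in for real sampling; nothing here is actual llama.cpp code):

```python
import math

VOCAB = ["movies", "nothing", "."]

def sample(logits):
    # greedy argmax stands in for real sampling
    return max(range(len(logits)), key=lambda i: logits[i])

def generate_next(context: str, logits, banned: list[str]) -> str:
    logits = list(logits)
    while True:
        tok = sample(logits)
        candidate = context + VOCAB[tok]
        if not any(b in candidate for b in banned):
            return VOCAB[tok]
        # ban only this token and resample; the prefix "Joe likes " stays
        logits[tok] = -math.inf

print(generate_next("Joe likes ", [3.0, 1.0, 0.5], ["Joe likes movies"]))
# prints "nothing"
```

Because only the final token is resampled, the model is stuck with the already-committed "Joe likes " prefix and is forced to complete it with something, whether or not a sensible alternative exists.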
|
Thanks for the explanation. |
|
@firecoperana Are you done shuffling the code around? |
|
I haven't got the time to do it yet. |
|
@firecoperana I've put the banned string/regex code outside |
|
@firecoperana I can't reproduce your issue. output.mp4 |
|
I'm using ERNIE-4.5-21B-A3B-PT-UD-Q2_K_XL. If it does not trigger the ban, it's fine. |
|
Clearly a model issue. I can't reproduce this bug on GLM and Kimi either. |
|
That could be true, but we still need a way to prevent it from going into an infinite loop. My PR works fine with this model. Another issue I see is that the token generation speed is wrong in the webui, but that's less important. |
|
I just started testing this PR, but noticed right away that even without anything configured (no banned strings), in Roo Code the model can no longer make any tool calls at all; they just become plain strings like this: Without this patch, tool calling works without issue.
In SillyTavern, while using Chat Completion, the model keeps failing to produce the "think" block, which is very strange. Without this patch, it works fine. I also tried https://github.com/SneedwareInc/ik_SillyTavern but it only provides settings for Text Completion, while the model requires Chat Completion to work properly (unless there is a way to use the jinja chat template in Text Completion?).
I still tried with Text Completion and SillyTavern to see if it would have any effect, with a very simple ban list: Then it worked, but without the thinking block, and without tool calls I cannot really use it beyond just testing.
I am still not sure why, even in Chat Completion, the think block stops being produced while testing in SillyTavern, or why tool calls get broken, even without any ban strings set. If I did something wrong, please let me know and I will retest. I used this command to run: |
|
Thanks for taking the time to test, this is very helpful. It looks like no string banning for now. |
|
@Lissanro Thanks for reporting. I've never used chat completion or tool calls, as my use cases rely on pure text completion. I'm afraid fixing it is outside my ability as a vibecoder. Perhaps @firecoperana's PR would work better for you? The thinking blocks with text completion work fine for me with Kimi, however; I haven't noticed any issues with them. Have you configured them properly in ST? Did you enable instruct mode and provide proper chat templates? If I need a pure assistant, I leave it empty. |
|
@ikawrakow I'm afraid it will stay in unmerged PR limbo for a while, as neither mine nor @firecoperana's works for 100% of the use cases. But it is still better to have a feature that's buggy but works most of the time than not to have it at all. |
That depends. If the new functionality does not affect existing functionality in any way (so when not used, the code behaves exactly the same as before), then sure, at least some will consider an implementation that does not work 100% of the time to be beneficial. The moment it starts breaking existing functionality, then no, it is not better. Apart from maintainability, this is another reason why @firecoperana has been asking you to fully factor out the string ban implementation from the |
|
@SneedwareInc: the ST code works for me in tandem with this PR, at least with some banned strings of my choosing. It infers as expected, and that's the core of the feature I was eager to get for my use case. Thank you for that! Remark: the timings displayed in IKL are messed up. Edit 2: your ST code also works, for the banned-strings feature, with @firecoperana's commit. I didn't test anything else yet. |
|
A few updates, since we are going with firecoperana's PR. Should fix my issue #1233, but please test it out, as I may have missed some bugs. Edit: it's broken; firecoperana's implementation was inferior. It does not do position-aware banning like mine did, which is already backfiring after 30 minutes of stress testing. |
Needs testing
|
Did some fixing, should be working properly, but needs more stress-testing. |
|
I would be happy to help test, but I tried downloading the patch https://github.com/ikawrakow/ik_llama.cpp/pull/1131.patch and applying it after git pull, and it fails: This is usually how I test patches, but maybe I am not getting the latest version, since I see main was merged just now? If I did something wrong, please let me know. |
|
Deleted and recreated to squash commits, should be better now |
|
I tested it by doing a git clone and compiling on a different machine, and it worked. |
|
Continued here: |



I am going to be completely honest: I do not know how to use GitHub, or advanced C++, and I vibecoded it all in Notepad.
It needs some cleanup (debug info in console, comments), but it is functional.
This modification adds string banning to the server. It creates a buffer where tokens are temporarily stored and checked against the blacklist; if a banned string is generated, its first token is temporarily banned. Banned strings are taken from the "banned_strings" argument. It keeps retrying until it generates good text, which is then streamed out of the buffer. This functionality is similar to the antislop sampler found in koboldcpp.
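A rough sketch of that buffering scheme, under the assumption that a hit rolls generation back to the first token of the banned match and bans it at that position before regenerating. All names here are made up, and this is not the PR's C++ code:

```python
def antislop_generate(next_token, banned, max_steps=50):
    """next_token(prefix_tokens, banned_at) -> str or None; returns final text."""
    out = []        # buffered tokens (streamed out once known to be safe)
    banned_at = {}  # position -> set of tokens banned at that position
    for _ in range(max_steps):
        tok = next_token(out, banned_at)
        if tok is None:
            break
        out.append(tok)
        text = "".join(out)
        hit = next((b for b in banned if b in text), None)
        if hit is not None:
            # find the token index where the banned string starts
            start = text.index(hit)
            pos, seen = 0, 0
            for i, t in enumerate(out):
                if seen + len(t) > start:
                    pos = i
                    break
                seen += len(t)
            # ban that token at that position and regenerate from there
            banned_at.setdefault(pos, set()).add(out[pos])
            out = out[:pos]
    return "".join(out)

# toy "model": at each position, emit the first preferred token not banned there
def toy_model(prefix, banned_at):
    prefs = [["her ", "the "], ["eyes ", "smile "], ["sparkled"]]
    i = len(prefix)
    if i >= len(prefs):
        return None
    for t in prefs[i]:
        if t not in banned_at.get(i, set()):
            return t
    return None

print(antislop_generate(toy_model, ["her eyes"]))  # -> "the eyes sparkled"
```

Banning at a position, rather than globally, is what lets "eyes" still appear later in a context that no longer forms the banned phrase.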
Example usage:
Mistral Nemo Q6_K
Input:
Without stringban:
With
"banned_strings": ["eyes", "tapestry", "shiver", "whisper", "symphony"]:
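For reference, a guess at what a request using this argument might look like, built with Python's json module; the endpoint path and the other fields are the usual llama-server completion ones and are assumptions here, not taken from this PR:

```python
import json

payload = {
    "prompt": "Write a short scene.",
    "n_predict": 256,
    # the new argument added by this PR: strings the server must not emit
    "banned_strings": ["eyes", "tapestry", "shiver", "whisper", "symphony"],
}
body = json.dumps(payload)
# would be POSTed to something like http://localhost:8080/completion
```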