Webui improvement #481
Conversation
I need people to confirm that this works.
I see options for DRY and XTC, neither of which is currently supported here.
Yes, I would have thought that one needs to pick up the changes in
Are you sure bringing over samplers is that difficult? There was a time when I wanted to bring over DRY (I no longer care; min_p and temperature are all I use, and n-sigma is the only one that, if brought over, I might end up using, since it might be better at eliminating "bad" tokens at the tail than min_p is, but min_p works well enough that I doubt it would be that big of an improvement), and I looked into it, and the only major issue was that you would have to manually port it over because of the refactors that mainline has done, but it still seemed manageable, and much easier than starting from scratch. Edit: I want to clarify that I saw DRY and XTC in the code; I haven't tested the new webui.
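For readers unfamiliar with min_p: the core of the filter is tiny, which is part of why it ports so easily. A minimal sketch of the general idea, with illustrative names rather than this repository's actual code:

```cpp
#include <algorithm>
#include <vector>

struct token_prob {
    int   id;
    float prob; // softmax probability
};

// Keep only tokens whose probability is at least min_p times the probability
// of the most likely token, then renormalize. This is the general min_p idea,
// not ik_llama.cpp's actual implementation.
static void min_p_filter(std::vector<token_prob> & cands, float min_p) {
    if (cands.empty() || min_p <= 0.0f || min_p > 1.0f) {
        return;
    }
    float max_prob = 0.0f;
    for (const auto & c : cands) {
        max_prob = std::max(max_prob, c.prob);
    }
    const float threshold = min_p * max_prob;
    cands.erase(std::remove_if(cands.begin(), cands.end(),
                    [threshold](const token_prob & c) { return c.prob < threshold; }),
                cands.end());
    float sum = 0.0f;
    for (const auto & c : cands) sum += c.prob;
    for (auto & c : cands) c.prob /= sum;
}
```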
Adding a sampler or two shouldn't be too hard. But
I followed the sampling changes (and new sampler additions) as they were happening, and I do agree that it changed for the better, but it does seem like considerably more work to adapt the changes in sampling than to just port over individual samplers. My own desires made me consider the easier change of bringing over any sampler I cared enough about (which currently is none) over changing the way sampling is done, but I know that will differ for everyone.
I haven't checked either. I only looked through the code so far for this PR (and the RPC one).
XTC is about the only way to remove top tokens, which could be slop or refusals. DRY has its issues, but it is better than the other repeat penalties. min_p and temperature are fine for non-creative stuff, but otherwise they come up short. And no, "just raise the temperature" isn't a solution.
XTC is one of the only samplers that is not monotonic (and for a reason: it doesn't really make sense to alter the rankings of the predicted tokens, since so much effort was spent training the LLM to rank them in the order it did). I do think that Top-nσ with higher temperatures is better for diverse branching than using XTC, but that is mostly just based on the math behind them. I don't get using a sampler to remove refusals: either use a model that doesn't refuse, or prefill some of the response so that it doesn't refuse.
I agree, but like you said, and from what I've heard, it still has its issues, so manually intervening to fix repeats is still better, as that doesn't have issues.
I disagree; min_p does fine at removing the "bad" tail end, and temperature works for regulating how "chaotic" a model is, and that is all you need (maybe Top-nσ over min_p, as it may be better at removing the "bad" tail end at higher temperatures). I do often look at the top-10 tokens and manually sample or even inject tokens to steer the output, thus "manually" sampling, but even without that, from what I can see in all the token distributions I've looked at, temperature and min_p leave little room for improvement.
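For context on Top-nσ: as described in the paper, it masks logits that fall more than n standard deviations below the maximum logit before softmax, which is why it behaves like min_p while tracking temperature better. A rough sketch of that idea, with illustrative names, not taken from any particular implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Mask out every token whose logit falls more than n standard deviations
// below the maximum logit, leaving the rest for softmax/temperature sampling.
// Rough sketch of the Top-n-sigma idea; illustrative only.
static void top_n_sigma_mask(std::vector<float> & logits, float n) {
    if (logits.empty() || n <= 0.0f) {
        return;
    }
    float  max_logit = -std::numeric_limits<float>::infinity();
    double sum = 0.0, sum_sq = 0.0;
    for (float l : logits) {
        max_logit = std::max(max_logit, l);
        sum    += l;
        sum_sq += (double) l * l;
    }
    const double mean   = sum / logits.size();
    const double sigma  = std::sqrt(std::max(0.0, sum_sq / logits.size() - mean * mean));
    const float  cutoff = max_logit - n * (float) sigma;
    for (float & l : logits) {
        if (l < cutoff) {
            l = -std::numeric_limits<float>::infinity(); // excluded from sampling
        }
    }
}
```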
Right, and I want to undo it. The trainers' goals and mine aren't necessarily aligned.
The first part doesn't exist. The second part is already done. Models like to maliciously comply or default to clichés. Dumping top tokens goes a long way.
Yes it does, as does setting a high top_K like 100. I use a min_P of around 0.03 on everything. But cranking the temperature doesn't really improve coherent creativity; it just makes the model chaotic.
Absolutely kills the fun for me. We're coming at it from two different places. I want a realistic "personality" with no defined end goal: a chat video game. You probably want a story that goes somewhere you have planned it to go. In either case, taking the sampling refactor from mainline probably does it all at once. It didn't look super easy from the PRs, unfortunately; they did a lot of changes. Even when I was trying to add tensor size printing, everything was renamed or moved. IK is not kidding about how they do that constantly.
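For context on XTC ("exclude top choices"): as commonly described, it occasionally removes every token above a probability threshold except the least likely of them, forcing a lower-ranked continuation. A simplified sketch of that idea, with illustrative names and simplified parameter handling (a real implementation would also renormalize afterwards):

```cpp
#include <random>
#include <vector>

struct token_prob {
    int   id;
    float prob; // softmax probability; candidates assumed sorted by prob, descending
};

// With probability p, drop every token at or above the probability threshold
// except the least likely of them, so a lower-ranked continuation is chosen.
// Simplified sketch of the XTC idea; parameter names are illustrative.
static void xtc_filter(std::vector<token_prob> & cands, float threshold, float p, std::mt19937 & rng) {
    std::uniform_real_distribution<float> coin(0.0f, 1.0f);
    if (cands.size() < 2 || coin(rng) >= p) {
        return;
    }
    size_t above = 0;
    while (above < cands.size() && cands[above].prob >= threshold) {
        ++above;
    }
    if (above >= 2) {
        // keep only the last (least likely) of the above-threshold tokens
        cands.erase(cands.begin(), cands.begin() + (above - 1));
    }
}
```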
Fair enough.
I'll take your word for it, since the models I prefer to use now have basically never given me a refusal, and for the ones I used to use that would sometimes refuse, prefilling did work. I think I do remember what you are referring to happening, and I would usually just not use those models for those tasks.
Have you tried Top-nσ, since it is designed to maintain coherence while acting similarly to min_p at high temperatures? I've read mixed feedback from people, but personally I prefer lower temperatures (if the model works well with them, which is why I liked that the new V3 recommended 0.3, which I use with a min_p of 0.01; other models don't work as well with such low temperatures, and I would often use 0.6-1.2 depending on the model).
I just realized repeat loops haven't happened for me in a long time, but fixing them was definitely not fun. Even if I don't steer, seeing the top-10 tokens is interesting to me. A story-writing assistant is one of the ways I use LLMs, but it definitely isn't the only way I use them. You are correct that I haven't used them for what you call "a chat video game", but I definitely wouldn't be opposed to it; I just haven't written a prompt that sets that up (or used one written by someone else), and I can understand why in that situation intervening or injecting tokens could be very annoying. We probably do use different front-ends as well. I mainly use (and have contributed to) mikupad, but if I were to try what you describe, I know there are other front-ends that would work better.
Yeah, it doesn't look easy. I didn't look into it with the purpose of bringing it over, but I have looked at basically all of those PRs and the code, and I do agree that bringing it over would be a good amount of work.
Have not tried Top-nσ since it's only in mainline, and generally I use EXL2 for normal-sized models. I've been meaning to load up Command-A or Gemma and give it a whirl. All the "meme" sampling missing here is a bit of a drawback. I initially didn't even realize that this was forked pre-DRY/XTC and was confused why DeepSeek 2.5 was looping so badly. It's like you have to choose between usable speed (close to a fully offloaded dense model) or functionality.
Interesting take. Isn't usable speed one of the most important functionalities of an LLM inference toolkit?
No user feedback here, so new strategy: I'll merge this tomorrow. If we don't get bug reports, all is good. If we do get bug reports, all is good too, because then we know that it needs further work.
The DRY/XTC options in the UI this adds can't function. I don't think there is a need to test that; those samplers do not exist here, so the UI exposing them should be removed before this is added (or the samplers could be added, I guess). The other thing I found when looking at the source code is that the bug report button goes to mainline and not here.
Right. But then god said:
So it's making me badly want to port the QoL stuff. It mirrors LLMs, where a model will be great and then has that one thing you want to change.
I would love that, and I'm sure many users will too.
Ok... well, it seemed easy enough until I hit the portion where they refactored everything into args.h/args.cpp, so all those new things you added aren't in ctx params anymore. Sometime around September. Looks fun, doesn't it? ggml-org/llama.cpp@bfe76d4
Ha! Last night I cherry-picked and got the refactor working. Got as far as DRY and XTC. I didn't post it yet because I somehow bugged the seed to where it might not be randomizing on re-rolls. I was gonna keep going after a night of sleep. Adding sigma was good because it's way up there, past yet another refactor.
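A common way to handle re-roll seeds is to resolve a "random seed" sentinel into a fresh value on every generation; the sketch below shows that pattern with a hypothetical helper and sentinel, not this repository's actual constants:

```cpp
#include <cstdint>
#include <random>

// Resolve a "random seed" sentinel into a fresh seed on every generation so
// re-rolls actually differ, while keeping explicit seeds reproducible.
// The sentinel value and helper name are hypothetical, not this repo's API.
static uint32_t resolve_seed(uint32_t requested_seed) {
    constexpr uint32_t SEED_RANDOM = 0xFFFFFFFFu; // illustrative sentinel
    if (requested_seed == SEED_RANDOM) {
        std::random_device rd;
        return rd();
    }
    return requested_seed;
}
```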
https://github.com/Ph0rk0z/ik_llama.cpp/branches Btw, there is a branch where it's only refactored to separate out the sampling. The furthest-ahead one is the DRY one. Still didn't delete args.cpp nor fix the Makefile changes mainline did, but you get the gist. Is any of that worth doing?
Too much change for my taste. The DRY one is 8631+ LOC, 4089- LOC. The XTC one is 7687+, 4020-. This would require a lot of testing. My PRs are in the 70-90 LOC range each. The DRY one would be a bit bigger, but I'm not sure if it is worth it.
Yep, it is a ton of changes. They add a lot of code in a year. I'm surprised it worked at all. Much of it is related to all the examples too. Even here, 60 files changed for the webui.
Clicking the save button in settings doesn't close the dialog like it does in llama.cpp.
Thanks for testing. Apart from this, does it work for you?
It works... At least I found it can respond properly and show TPS. Might need more testing.
I think the issue is that you used the newest version of the webui from mainline in the same browser. If you click "reset to default", save works again.
I'll try, thanks.
If you are interested, I added a new endpoint to the server that could be utilized by this front end (#502). I already added support to my preferred front end, and it has been nice being able to see all my stored sessions and restore them with ease (saving and restoring support already existed, but there was no good way to add it to a UI without being able to list what is saved, which is what I added).
Works fine (multiple conversations, display of token rate). Huge improvement over the old UI, which made you choose between prompt formats that didn't fit current models.
I will try when I have time. That looks very helpful!
What is your opinion on having an additional (alternative, like the legacy one) frontend besides the one implemented here? The one I use seems to have been abandoned by its maintainer, so I have nowhere to upstream my changes.
So you want to bring your favorite frontend into this repository and maintain it here?
Yes.
Can I take a look?
For now, what is public is lmg-anon/mikupad#113. But I have more that isn't public, as it works but is not polished (like adding #502 and #504), and other things on the roadmap.
It doesn't look like a very big project, so from that point of view, sure. But what about the license and such? Why do you prefer to have it here instead of just a separate fork?
It has a very permissive license, which allows for it to be here, from how I read it. ( https://github.com/lmg-anon/mikupad/blob/main/LICENSE )
I plan to maintain it following the feature support here, and I am planning changes that would make it integrate better here.
I know CC0 is very permissive. What I don't know is how one mixes it with MIT, i.e., whether we need to update the license file and such.
I think we can just add a CC0 section to the license file that specifies which parts it covers. I could add and maintain an authors file.
OK, go ahead.
Thanks, I will submit the PR when it is ready.
Finally, some decent UI. Now I can ditch openwebui again. I can't just use the old UI; I don't even know where to start. This made my day.
Updating webui to a newer version, but not latest version
Some minor bug fix for webui