@saood06
Collaborator

@saood06 saood06 commented Jun 26, 2025

This PR adds mikupad, along with new endpoints in server.cpp that mikupad uses to manage its SQL database.

It must be built with -DLLAMA_SERVER_SQLITE3=ON.
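A minimal configure-and-build sketch with the option enabled. Only the `-DLLAMA_SERVER_SQLITE3=ON` flag comes from this PR; the `build` directory name, `--config Release`, and the `CMakeLists.txt` guard are assumptions for illustration:

```shell
# Run from the repository root. The guard only avoids configuring in the wrong directory.
SQLITE_FLAG="-DLLAMA_SERVER_SQLITE3=ON"
if [ -f CMakeLists.txt ]; then
  cmake -B build "$SQLITE_FLAG"
  cmake --build build --config Release
else
  echo "CMakeLists.txt not found: run this from the ik_llama.cpp checkout" >&2
fi
```

Any other options you normally pass (for example GPU backend flags) would go on the same configure line.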

Tip

If you see `Could NOT find SQLite3 (missing: SQLite3_INCLUDE_DIR SQLite3_LIBRARY)` when building:
Fix for Windows

In an install directory:

git clone https://github.com/microsoft/vcpkg
.\vcpkg\bootstrap-vcpkg.bat -disableMetrics
.\vcpkg\vcpkg.exe install sqlite3:x64-windows

Then pass -DCMAKE_TOOLCHAIN_FILE="[...]\vcpkg\scripts\buildsystems\vcpkg.cmake" to CMake, where [...] is the directory in which vcpkg was cloned above.

Fix for Linux

Look into installing some form of libsqlite3-dev or sqlite-autoconf-dev or similar for your distro.
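As a rough sketch of the Linux fix, the helper below maps a few common package managers to the package that, to the best of my knowledge, provides the SQLite3 headers. The `sqlite_dev_pkg` function is hypothetical, the list is not exhaustive, and the Arch entry matches the `core/sqlite` package shown later in this thread:

```shell
# Map a package manager name to the package that ships the SQLite3 development files.
sqlite_dev_pkg() {
  case "$1" in
    apt-get) echo "libsqlite3-dev" ;;  # Debian/Ubuntu
    dnf)     echo "sqlite-devel" ;;    # Fedora
    pacman)  echo "sqlite" ;;          # Arch (headers ship in the main package)
    *)       return 1 ;;               # unknown manager: no suggestion
  esac
}

# Print a hint for the first package manager found; nothing is installed here.
for pm in apt-get dnf pacman; do
  if command -v "$pm" >/dev/null 2>&1; then
    echo "detected $pm: install the '$(sqlite_dev_pkg "$pm")' package"
    break
  fi
done
```

The script only prints a suggestion, so the actual install command stays under your control.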

It can be launched with --path ../../examples/server/public_mikupad --sql-save-file [...] with an optional --sqlite-zstd-ext-file [...].

The path serves index.html, but the endpoints it relies on are only enabled when --sql-save-file is passed.

The provided mikupad file is built on top of lmg-anon/mikupad#113. It streamlines the code (and UI sections) by removing support for other LLM endpoints and data storage models, and additionally has the following changes:

  • Add a sidebar section and pop out modal for managing save slots and KV cache slots (new standard feature).
  • Add a sidebar section to manage Database Compression (new optional feature; only turned on if --sqlite-zstd-ext-file is passed)
  • Add a second list of auto-grouped sessions and make the sidebar and sessions sections resizable
  • Add an Export All button (to complement the import many functionality I contributed to mikupad)
  • Add a top-n σ sampler
  • Fix a longstanding bug with highlight misalignment (using the fix in this comment: lmg-anon/mikupad#78 (comment))
  • Move current page storage from the database to the browser URL

The compression feature works by dynamically loading phiresky/sqlite-zstd, which allows one to use compressed SQL databases. Results may vary, but for me it is very useful:

| Size before | Size after | Row count |
|---|---|---|
| 31.04 GB | 3.40 GB | 14752 |
| 8.62 GB | 581.33 MB | 8042 |
| 12.54 GB | 2.04 GB | 1202 |
| 30.54 GB | 5.02 GB | 6180 |
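To put the sizes above in perspective, the compression ratio for each row can be derived with a short script. The ratios are computed from the reported sizes, not measured separately; the `ratio` helper is hypothetical, and 1 GB = 1024 MB is an assumption that only affects the mixed-unit row:

```shell
# ratio BEFORE UNIT AFTER UNIT: print BEFORE/AFTER as a one-decimal factor.
# Assumption: 1 GB = 1024 MB when the two sizes use different units.
ratio() {
  awk -v b="$1" -v bu="$2" -v a="$3" -v au="$4" 'BEGIN {
    if (bu == "GB") b *= 1024   # normalize both sizes to MB
    if (au == "GB") a *= 1024
    printf "%.1fx\n", b / a
  }'
}

ratio 31.04 GB 3.40 GB
ratio 8.62 GB 581.33 MB
ratio 12.54 GB 2.04 GB
ratio 30.54 GB 5.02 GB
```

For these four databases that works out to roughly 6x to 15x smaller, which fits the "results may vary" caveat.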

sqlite_modern_cpp will be built if -DLLAMA_SERVER_SQLITE3=ON is set.

Potential future roadmap items:

  • Add a mode that creates new sessions on branching or prediction
  • SQLite Wasm option (this would allow for you to choose to save to the server or the browser)
  • Add themes to the database, and add a user friendly way to create and manage them
  • Add a better way to organize sessions together, as opposed to the current solution of just grouping those with the same name
  • Add the ability to mask tokens from being processed (for use with think tokens as they are supposed to be removed once the response is finished).
  • Add generic template tokens (for system prompt, and instruct), making stored prompts more flexible with different templates.
  • The max content length should be obtained from the server (based on n_ctx) rather than from user input; the usage of that variable could also be changed or even removed (or just removed from the UI). It is used to set maximums for Penalty Range for some samplers (useful, but potentially frustrating if set wrong, since that dependency is not very clear), and potentially to truncate the context in some situations.
  • Allow for slot saves to be in the database. This would allow for it to be compressed (similar to prompts there can often be a lot of redundancy between saves). Edit: This may not be as useful as expected.
  • Move the selected template into sampling, and make sampling have its own saves like sessions (and available templates) do. This would make it easy to have preset profiles of templates/samplers. The downside is that it would break import/export functionality with older versions, and I am not sure it would even be a better experience, let alone worth that tradeoff.

@saood06
Collaborator Author

saood06 commented Jun 28, 2025

Now that I have removed the hardcoded extension loading, I do think this is in a state where it can be used by others (and potentially provide feedback), but I will still be working on completing things from the "To-do" list above until it is ready for review (and will update the post above).

@ubergarm
Contributor

ubergarm commented Jun 30, 2025

Heya @saood06 I had some time this morning to kick the tires on this PR.

My high-level understanding is that this PR adds a new web endpoint for Mikupad as an alternative to the default built-in web interface.

I don't typically use the built-in web interface, but I did my best to try it out. Here is my experience:

Logs and screenshots:
# get setup
$ cd ik_llama.cpp
$ git fetch upstream
$ git checkout s6/mikupad
$ git rev-parse --short HEAD
3a634c7a

# I already had the sqlite OS-level lib installed, apparently:
$ pacman -Ss libsql
core/sqlite 3.50.2-1 [installed]
    A C library that implements an SQL database engine

# compile
$ cmake -B build -DGGML_CUDA=ON -DGGML_VULKAN=OFF -DGGML_RPC=OFF -DGGML_BLAS=OFF -DGGML_CUDA_F16=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1
$ cmake --build build --config Release -j $(nproc)

Then I tested my usual command like so:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
  ./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/Qwen3-14B-IQ4_KS \
    -fa \
    -ctk f16 -ctv f16 \
    -c 32768 \
    -ngl 99 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080

When I open a browser to 127.0.0.1:8080 I get a nice looking Web UI that is simple and sleek, with just a few options for quick and easy configuration:

ik_llama-saood06-mikupad-pr558

Then I added the extra arguments you mention above and run again:

# run llama-server
model=/mnt/astrodata/llm/models/ubergarm/Qwen3-14B-GGUF/Qwen3-14B-IQ4_KS.gguf
CUDA_VISIBLE_DEVICES="0" \
  ./build/bin/llama-server \
    --model "$model" \
    --alias ubergarm/Qwen3-14B-IQ4_KS \
    -fa \
    -ctk f16 -ctv f16 \
    -c 32768 \
    -ngl 99 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080 \
    --path ./examples/server/public_mikupad \
    --sql-save-file sqlite-save.sql

This time a different color background appears, but it seems to throw an async error in the web debug console, as shown in this screenshot:

ik_llama-saood06-mikupad-pr558-test-2

The server seems to be throwing 500's so maybe I didn't go to the correct endpoint or do I need to do something else to properly access it?

INFO [                    init] initializing slots | tid="140147414781952" timestamp=1751293931 n_slots=1
INFO [                    init] new slot | tid="140147414781952" timestamp=1751293931 id_slot=0 n_ctx_slot=32768
INFO [                    main] model loaded | tid="140147414781952" timestamp=1751293931
INFO [                    main] chat template | tid="140147414781952" timestamp=1751293931 chat_example="<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n" built_in=true
INFO [                    main] HTTP server listening | tid="140147414781952" timestamp=1751293931 n_threads_http="31" port="8080" hostname="127.0.0.1"
INFO [            update_slots] all slots are idle | tid="140147414781952" timestamp=1751293931
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/" params={}
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=200 method="GET" path="/version" params={}
INFO [      log_server_request] request | tid="140145881767936" timestamp=1751293939 remote_addr="127.0.0.1" remote_port=54320 status=500 method="POST" path="/load" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=200 method="GET" path="/version" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=500 method="POST" path="/load" params={}
INFO [      log_server_request] request | tid="140145873375232" timestamp=1751293944 remote_addr="127.0.0.1" remote_port=54336 status=404 method="GET" path="/favicon.ico" params={}
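The failing requests in a log like the one above can be pulled out mechanically. A small sketch, assuming the server output was saved to a file (the `server.log` name and the `failed_requests` helper are hypothetical) and relying only on the `status=… method=… path=…` fields visible in these lines:

```shell
# Print "STATUS METHOD PATH" for every logged request that did not return 2xx.
failed_requests() {
  grep 'log_server_request' "$1" \
    | grep -Ev 'status=2[0-9][0-9]' \
    | sed -E 's/.*status=([0-9]+) method="([A-Z]+)" path="([^"]*)".*/\1 \2 \3/'
}

# Usage (assuming the log was captured to server.log):
#   failed_requests server.log
```

On the log above this would surface the two `POST /load` requests that returned 500, plus the 404 for `/favicon.ico`.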

@Downtown-Case
Contributor

Downtown-Case commented Jun 30, 2025

I am interested in this.

Mikupad is excellent for testing prompt formatting and sampling, with how it shows logprobs over generated tokens. It's also quite fast with big blocks of text.

@saood06
Collaborator Author

saood06 commented Jun 30, 2025

> I am interested in this.
>
> Mikupad is excellent for testing prompt formatting and sampling, with how it shows logprobs over generated tokens. It's also quite fast with big blocks of text.

Glad to hear it. I agree. I love being able to see probs for each token (and even be able to pick a replacement from the specified tokens).

If you are an existing mikupad user, you may need to use the DB migration script I put in lmg-anon/mikupad#113 if you want to migrate a whole database. Migrating individual sessions via import and export should work just fine, I think.

> This time a different color background appears, but it seems to throw an async error in the web debug console, as shown in this screenshot:
> ...
> The server seems to be throwing 500's so maybe I didn't go to the correct endpoint or do I need to do something else to properly access it?

You are doing the correct steps, I was able to reproduce the issue of not working with a fresh sql file (so far my testing was done with backup databases with existing data). Thanks for testing, I'll let you know when it works so that you can test it again if you so choose.

@ubergarm
Contributor

> You are doing the correct steps, I was able to reproduce the issue of not working with a fresh sql file (so far my testing was done with backup databases with existing data). Thanks for testing, I'll let you know when it works so that you can test it again if you so choose.

Thanks for confirming, correct I didn't have a .sql file already in place but just made up that name. Happy to try again whenever u are ready!

@saood06
Collaborator Author

saood06 commented Jun 30, 2025

> Thanks for confirming, correct I didn't have a .sql file already in place but just made up that name. Happy to try again whenever u are ready!

Just pushed a fix. (The issue was with something that is on my to-do list to refactor and potentially remove, but for now this is a quick fix for the code as is.)

Edit: The fix is in the HTML only, so no recompile or even relaunch is needed; just a reload should fix it.

@ubergarm
Contributor

@saood06

Aye! It fired right up this time and I was able to play with it a little and have a successful generation. It is cool how I can mouse over the tokens to see the probabilities!

mikupad-testing-works

@saood06
Collaborator Author

saood06 commented Jun 30, 2025

> Aye! It fired right up this time and I was able to play with it a little and have a successful generation.

Nice.

> It is cool how I can mouse over the tokens to see the probabilities!

Yes, I like to turn on the "Color by probability" to be able to see low probability tokens at a glance.

It might also be useful to you for benchmarking quants or models (saving and cloning prompts).

@ikawrakow
Owner

This is getting surprisingly little testing. Nevertheless we can merge whenever @saood06 feels it is ready and removes the "draft" label.

@saood06
Collaborator Author

saood06 commented Jul 25, 2025

This post is from before I finished implementing and polishing the UI.

The new resizable sessions section (the All group is always on top and contains all prompts; the number is how many prompts are in that group):
image

Managing prompts from disk cache:
image

The left panel is adjustable in width, and the entire thing is adjustable in height. (Note: total width is fixed, but total height is adjustable).

Edit: The image is outdated, renaming support and the save tab now exist.

The icons for sorting are in order: name, token count, file size, and modified date. I hope the icons make that clear but they also say what they do on hover:

image

The sidebar which includes the button to open what is shown above alongside Database compression management:

image

The enable button will swap to an update button once you enable compression

Custom button when clicked shows this:

image

@saood06
Collaborator Author

saood06 commented Aug 11, 2025

> I have yet to push a commit with it because although most things are functional, there are still bugs and some missing functionality.

I have now pushed my changes (the UI for compression, KV cache manipulation, and the export all button are now implemented and fully functional). This PR is now feature complete, but leaving it in draft as it won't compile for people who do not have sqlite3 installed (fixing this is the last on my To-do list).

…found build option, and update error message to include the build option is not passed situation
@saood06 saood06 marked this pull request as ready for review August 24, 2025 09:05
@saood06
Collaborator Author

saood06 commented Aug 24, 2025

@ikawrakow

It is finally ready for review.

@firecoperana

This adds endpoints that you may find useful to utilize in the UI you maintain.

@saood06 saood06 merged commit af13c9a into main Aug 24, 2025
@saood06 saood06 mentioned this pull request Sep 23, 2025