
common/parser: add proper reasoning tag prefill reading#20424

Open
pwilkin wants to merge 9 commits into ggml-org:master from pwilkin:reasoning-prefill

Conversation


@pwilkin pwilkin commented Mar 11, 2026

This changes the erroneous behavior of the autoparser, which ascribed thinking behavior to templates. As people rightly pointed out, some models have dynamic or hybrid reasoning: they can reason or not depending on switches, and even the template's behavior can change as a result (e.g. inserting `<think>` in the assistant prefill after a "no_think" appears in a user message).

Therefore, the FORCED_OPEN and FORCED_CLOSED formats are gone. The parser now just detects models with tagged reasoning, i.e. an opening and a closing reasoning marker (DELIMITER is also deleted, since it's a special case with an empty opening marker). However, the parser checks the assistant prefill for those markers and appends them to the input for the grammar and the parser so that they are taken into account. This simplifies the parsing mechanism, since it no longer has to differentiate whether the `<think>` was added by the template or generated by the model.
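The prefill check described above can be sketched roughly like this (Python pseudocode; the function name and logic are illustrative assumptions, not the actual C++ implementation):

```python
# Hypothetical sketch of the prefill check; names and logic are illustrative,
# not the actual C++ implementation in common/chat.cpp.
def extract_reasoning_prefill(prompt: str, start_tag: str, end_tag: str) -> str:
    """Return the reasoning-tag prefill that the template appended to the
    prompt, so it can be replayed through the parser and the grammar."""
    tail = prompt.rstrip()
    if tail.endswith(end_tag):
        # The template emitted a closed (empty) reasoning block,
        # e.g. "<think></think>" - reasoning is disabled for this turn.
        return start_tag + end_tag
    if tail.endswith(start_tag):
        # The template opened a reasoning block the model will continue.
        return prompt[prompt.rfind(start_tag):]
    return ""
```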


pwilkin commented Mar 11, 2026

Fixes #20356
Fixes #20325
Fixes #20265

This also clears the ground for disabling grammar triggers inside reasoning loops in a subsequent PR, which would resolve #20260

@github-actions github-actions bot added documentation Improvements or additions to documentation testing Everything test related examples server labels Mar 11, 2026

aldehir commented Mar 11, 2026

Dumb question: why not find the start of the assistant message and prepend that?

I agree it would be easier to parse if we had a "prefill" of some sort that normalizes the input, such that we can handle the logic in the grammar and not through flags. However, if we're going this route I would look into prepending the start of the entire assistant message. This will also open the door for parsing output from requests with an assistant prefill.


pwilkin commented Mar 11, 2026

Yeah, that would be the logical conclusion, but for now it's easier for me just to extract the reasoning markers since finding the actual start of the assistant message is nontrivial.


aldehir commented Mar 11, 2026

Qwen3.5 uses:

```jinja
{%- if enable_thinking is defined and enable_thinking is false %}
    {{- '<think>\n\n</think>\n\n' }}
{%- else %}
    {{- '<think>\n' }}
{%- endif %}
```

however,

      "reasoning_prefill": "<think></think>\n\n",

It probably doesn't matter for this model, but it is technically not adhering to the template.


aldehir commented Mar 11, 2026

```json
[
  { "id": 248045, "piece": "<|im_start|>" },
  { "id": 74455,  "piece": "assistant" },
  { "id": 198,    "piece": "\n" },
  { "id": 248068, "piece": "<think>" },
  { "id": 271,    "piece": "\n\n" },
  { "id": 248069, "piece": "</think>" },
  { "id": 271,    "piece": "\n\n" }
]
```

Maybe set reasoning_prefill from the start of the opening tag to the end of the prompt?


aldehir commented Mar 11, 2026

finding the actual start of the assistant message is nontrivial.

Run the template once with add_generation_prompt = false, capture the size, run again with true, extract the string content that spans the delta? I think that would work in most cases.
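The differential trick could be sketched as follows (`apply_template` is a stand-in for real chat-template rendering, not llama.cpp's API):

```python
# Illustrative sketch of the differential approach; apply_template stands in
# for real chat-template rendering and is an assumption, not llama.cpp's API.
def generation_prompt_delta(apply_template, messages) -> str:
    """Render the template with and without the generation prompt and
    return the suffix the template appends for the assistant turn."""
    without_gp = apply_template(messages, add_generation_prompt=False)
    with_gp = apply_template(messages, add_generation_prompt=True)
    # The generation prompt is whatever text the second render appended.
    return with_gp[len(without_gp):]
```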


pwilkin commented Mar 12, 2026

That usually works, yeah 😀 I can try that and see what the results are (this is what calculate_diff_split from the analyzer does BTW). I'm just worried about some weird edge cases.


bsdice commented Mar 14, 2026

Nice patch! With the model https://huggingface.co/mradermacher/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking-GGUF, the patches fix the webui getting confused on /think and failing to split the reasoning and generation parts correctly. Build llama.cpp-cuda-git-b8334.r9.710878a7dd-1.

@pwilkin pwilkin force-pushed the reasoning-prefill branch from 3bfb08f to 4083259 Compare March 14, 2026 14:49

pwilkin commented Mar 14, 2026

@aldehir changed the prefill extraction behavior to the differential one you mentioned.

common/chat.h Outdated

```cpp
std::string grammar;
bool grammar_lazy = false;
bool thinking_forced_open = false;
std::string prefill;
```
Contributor

Should we name this generation_prompt? It lines up with the add_generation_prompt flag.

Comment on lines +71 to +95

```cpp
bool clear_reasoning_start = false;
if (inputs.reasoning_format != COMMON_REASONING_FORMAT_NONE &&
    autoparser.reasoning.mode != reasoning_mode::NONE &&
    !autoparser.reasoning.end.empty()) {
    const auto & r_start = autoparser.reasoning.start;
    const auto & r_end = autoparser.reasoning.end;
    auto r_end_t = trim_trailing_whitespace(r_end);
    auto r_start_t = trim_trailing_whitespace(r_start);

    if (!r_start_t.empty()) {
        auto start_pos = prompt_to_search.rfind(r_start_t);
        if (start_pos != std::string::npos) {
            std::string from_start = prompt_to_search.substr(start_pos);
            auto fs_trimmed = trim_trailing_whitespace(from_start);

            if (string_ends_with(fs_trimmed, r_end_t)) {
                data.prefill = r_start + r_end;
            } else if (string_ends_with(fs_trimmed, r_start_t)) {
                data.prefill = from_start;
            } else {
                clear_reasoning_start = true;
            }
        }
    }
}
```
@aldehir aldehir Mar 14, 2026
So my understanding is: we have a generation prompt G, and we can create a parser that accepts G[0:min(G.size(), G.index_of(reasoning_start))] + (reasoning_start + reasoning + reasoning_end)? + .... Then we can do away with all the trim logic.

The benefit is that now the parser can properly parse assistant prefill from the user, since the parser starts from the beginning of the assistant message.

I see that Mistral's templates have no generation prompt, so G = "". But this is fine, because the model emits the [THINK] tag. So the above still works.
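The accepted shape described above can be approximated with a regular expression (a simplification for illustration only; the real mechanism is grammar-based, and the tag names are assumptions):

```python
import re

# A simplified regex approximation of the parser shape being discussed:
# an optional generation-prompt prefix, an optional tagged reasoning block,
# then content. The real mechanism is grammar-based; tags are assumptions.
def build_output_pattern(generation_prompt: str) -> re.Pattern:
    g = re.escape(generation_prompt)
    return re.compile(
        rf"(?:{g})?"                               # G may or may not be present
        r"(?:<think>(?P<reasoning>.*?)</think>)?"  # optional reasoning block
        r"(?P<content>.*)",                        # the rest is content
        re.DOTALL,
    )
```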

Contributor Author

This workaround is mostly for Apriel that has a delimited thinking format and inserts a header "Thinking chain starts here: " or something like that as the generation prompt which acts as a quasi-reasoning marker that we want to strip.


pwilkin commented Mar 15, 2026

@aldehir okay, that rewrite ended up being a bit bigger than I expected... but it's exactly the algorithm you mentioned now.


aldehir commented Mar 15, 2026

Oh jeez, well it's <100 net LOC. I'll give it a whirl.

@pwilkin pwilkin requested review from a team as code owners March 15, 2026 15:02

pwilkin commented Mar 15, 2026

@aldehir happy to report I added another nice piece of code to make it work correctly with grammars / schemas :)

@pwilkin pwilkin force-pushed the reasoning-prefill branch from d0cf846 to a21d219 Compare March 15, 2026 16:18
@github-actions github-actions bot added the python python script changes label Mar 15, 2026
@ggerganov ggerganov left a comment

I'm not sure how this new logic interacts with the existing logic for feeding the prompt to the sampler:

```cpp
void init_sampler() const {
    common_sampler_reset(smpl.get());

    if (!task->need_sampling()) {
        return;
    }

    const int64_t t_start = ggml_time_us();

    int n_text = 0;

    for (int i = 0; i < (int) prompt.tokens.size(); i++) {
        const llama_token id = prompt.tokens[i];

        if (id != LLAMA_TOKEN_NULL) {
            common_sampler_accept(smpl.get(), id, false);
            n_text++;
        }
    }

    SLT_INF(*this, "init sampler, took %0.2f ms, tokens: text = %d, total = %d\n",
            (ggml_time_us() - t_start) / 1000.0, n_text, (int) prompt.tokens.size());
}
```

In general, the new common_params_sampling.grammar_external flag feels off. I would look to avoid it.


pwilkin commented Mar 15, 2026

In general the new common_params_sampling.grammar_external external flag feels off. I would look to avoid it.

I'd really love to avoid it, but there's a key problem here.

Together with @aldehir we figured out an algorithm to cleanly handle cases of template generation-prompt prefill: we simply add the prefill to the parser, this is reflected in the grammar, and therefore we handle it regardless of what was added by the template and what was generated by the model.

If an external json_schema is added to constrain output, it's also not a problem since the json schema is not directly converted into the grammar - instead, it's fed into the parser and only reflected in the grammar via the parser, so we handle this.

However, there's one case we can't handle: the user explicitly specifying a grammar. Because the grammar passed by the user is non-lazy and not aware of the generation-prompt prefill, it will fail exactly on the prefill part. That's why it's critical to notify the mechanism that the grammar was passed externally (and not generated within the parser engine), so that the prefill part is not added.


aldehir commented Mar 15, 2026

Hmm, this one's on me. I didn't think through the grammar impact. The grammar sampler should be limited to just the generation, otherwise it's not technically "sampling." What if we modified how the root rule is defined? This should require no changes to the grammar sampler and its inputs. Alternatively, we can leverage the grammar_root parameter that defaults to root, which might be an easier solution.
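One way the root-rule rewrite could work, sketched in Python over a GBNF-style grammar string (the rule names, escaping, and wrapping strategy are illustrative assumptions, not the actual llama.cpp grammar machinery):

```python
# Sketch of the root-rule rewrite idea over a GBNF-style grammar string.
# Rule names, escaping, and the wrapping strategy are illustrative
# assumptions, not the actual llama.cpp grammar machinery.
def wrap_user_grammar(user_grammar: str, prefill: str) -> str:
    """Make a user-supplied grammar tolerate the generation-prompt prefill
    by prepending a wrapper root that consumes it first."""
    escaped = (prefill.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))
    wrapper = f'root ::= "{escaped}" user-root\n'
    # Rename the user's root so the wrapper's root takes over.
    return wrapper + user_grammar.replace("root ::=", "user-root ::=", 1)
```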

```diff
 inputs.add_generation_prompt = true;
 inputs.reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
-inputs.enable_thinking = common_chat_templates_support_enable_thinking(chat_params.tmpls.get());
+inputs.enable_thinking = chat_params.enable_thinking ? common_chat_templates_support_enable_thinking(chat_params.tmpls.get()) : false;
```
@aldehir aldehir Mar 15, 2026

Can we extract this fix to another PR? Just so we can merge it in sooner. I think this PR needs a bit more brainstorming.

Contributor Author

Ye, sure.

Contributor Author

Extracted to #20606

{%- for tool in message['tool_calls']-%}
{%- if not ns.is_first -%}
{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}
{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] | tojson + '\n' + '```' + '<|tool▁call▁end|>'}}
Contributor

I'm thinking we use the JSON representation when dumping a JSON object in the Jinja engine itself.

@pwilkin pwilkin Mar 15, 2026

Well, no, we do not, as I found out when I removed the "Python dict" support.

We use the Python dict representation.

Or, more precisely, we use double quotes if the internal string contains a single quote, and single quotes when it does not:

```cpp
// TODO: avoid circular references
std::string value_to_string_repr(const value & val) {
    if (is_val<value_string>(val)) {
        const std::string val_str = val->as_string().str();

        if (val_str.find('\'') != std::string::npos) {
            return value_to_json(val);
        } else {
            return "'" + val_str + "'";
        }
    } else {
        return val->as_repr();
    }
}
```

I would really love it to use the JSON representation as standard, but I'm worried there might be some "original Python implementation faithfulness" shenanigans. @ngxson @CISC do you guys know if we could make | tojson the official string representation for objects?

@aldehir aldehir Mar 15, 2026

An argument can be made that Jinja will respect __str__() if wrapped in a JSON type that serializes to JSON.

```python
import json
from jinja2 import Template

tmpl = Template("{{ obj }}")

python_dict = dict(a=1, b=2, c=dict(d=3, e=4))
print("Python Dict:", tmpl.render(obj=python_dict))

class JSON:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return json.dumps(self.value)

json_obj = JSON(dict(a=1, b=2, c=dict(d=3, e=4)))
print("JSON Object:", tmpl.render(obj=json_obj))
```

```
$ ./example.py
Python Dict: {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}
JSON Object: {"a": 1, "b": 2, "c": {"d": 3, "e": 4}}
```

Contributor Author

I'd be really glad to change it because it would mean we don't have to change the templates and we handle all the templates that don't use | tojson on arguments - but I won't make the call myself :)

Member

That's not what's happening here though, this template does not support non-string arguments, that's all there is to it.

Contributor

Fair, I did not test with string concat, which throws an exception.

There is no winning here. Either we don't parse arguments and break templates that iterate over their values, or we do and break templates that expect a string.

@CISC CISC Mar 15, 2026

I'd be really glad to change it because it would mean we don't have to change the templates and we handle all the templates that don't use | tojson on arguments - but I won't make the call myself :)

It's not acceptable; it's not how this behaves with jinja2. Either arguments is sent as a str, or it's been parsed with json.loads, in which case it's a dict. In other words, this particular template will give you the following error:

TypeError: can only concatenate str (not "dict") to str
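The failure mode above is easy to reproduce outside Jinja; plain Python string concatenation fails the same way (render_call is a made-up stand-in for the template's concatenation, not real code from the repo):

```python
# Reproducing the failure mode outside Jinja: templates that concatenate
# `arguments` with a string only work while arguments is still a str.
def render_call(arguments):
    return "<tool_call>" + arguments + "</tool_call>"

ok = render_call('{"a": 1}')   # str arguments: concatenation works

try:
    render_call({"a": 1})      # parsed dict arguments: TypeError
    raised = False
except TypeError:
    raised = True
```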

Member

Anyway, I don't understand the issue - don't we have a capability check for this?

Contributor

capability check for this

No, the arguments are indiscriminately parsed. I don't see a capability check.

Member

capability check for this

No, the arguments are indiscriminately parsed. I don't see a capability check.

Hmmm, maybe that was minja; either way, we should probably add one.


pwilkin commented Mar 15, 2026

Hmm, this one's on me. I didn't think through the grammar impact. The grammar sampler should be limited to just the generation, otherwise it's not technically "sampling." What if we modified how the root rule is defined? This should require no changes to the grammar sampler and its inputs. Alternatively, we can leverage the grammar_root parameter that defaults to root, which might be an easier solution.

Yes, I thought of this path as well. Thing is, I'm not really sure this would be easier. We still have to differentiate the user-supplied grammars from the internally-generated grammars to know which ones we need to patch the root rule for.

Regarding the sampling, the thing is we are technically simulating the sampling of the generation prompt, that's the whole idea of this approach - to make it basically invisible to the parser / grammar whether the given fragment was generated by the template or by the model itself.


pwilkin commented Mar 15, 2026

BTW @ggerganov is the weird thingy where you have to add a space in front of certain tokens still necessary? It looks really bizarre and it seems to break idempotence (as in detokenize(tokenize(x)) != x).


aldehir commented Mar 15, 2026

I'm starting to think there isn't a clean solution here. To address the original issue, perhaps we should simply adopt the previous checks for the end thinking tag and create a custom parser for templates that deviate.

I believe even a "reasoning prefill" will face challenges when integrated with the grammar and reasoning samplers.


pwilkin commented Mar 15, 2026

There isn't a clean solution, but I quite like this one because it actually realizes the idea of treating template prefill as forced model output. If you think about it, what we're doing is not really that different from the reasoning sampler forcing the end-thinking tag - but instead of forcing the model to generate it, we just simulate the generation for the prefill.

This makes the parsing for reasoning elegant and simple and avoids all the problems with deviant / weird / atypical templates. I think one extra flag, even if a bit hackish, is an appropriate price to pay for it.


pwilkin commented Mar 15, 2026

Also, I tested this with the reasoning sampler and it works fine.

@ggerganov

BTW @ggerganov is the weird thingy where you have to add a space in front of certain tokens still necessary? It looks really bizarre and it seems to break idempotence (as in detokenize(tokenize(x)) != x).

Not sure where the spaces come from. There was something about having to apply normalize() to restore the identity of tokenize + detokenize, but I don't remember the details.

Comment on lines 256 to 286

```diff
@@ -259,7 +282,7 @@ struct common_sampler * common_sampler_init(const struct llama_model * model, st
         params.reasoning_budget_end,
         params.reasoning_budget_forced,
         params.reasoning_budget_tokens,
-        params.reasoning_budget_activate_immediately ? REASONING_BUDGET_COUNTING : REASONING_BUDGET_IDLE));
+        prefill_tokens));
 }
```
Member

The main issue that I see is that this logic is technically the same as the logic here:

```cpp
void init_sampler() const {
    common_sampler_reset(smpl.get());

    if (!task->need_sampling()) {
        return;
    }

    const int64_t t_start = ggml_time_us();

    int n_text = 0;

    for (int i = 0; i < (int) prompt.tokens.size(); i++) {
        const llama_token id = prompt.tokens[i];

        if (id != LLAMA_TOKEN_NULL) {
            common_sampler_accept(smpl.get(), id, false);
            n_text++;
        }
    }

    SLT_INF(*this, "init sampler, took %0.2f ms, tokens: text = %d, total = %d\n",
            (ggml_time_us() - t_start) / 1000.0, n_text, (int) prompt.tokens.size());
}
```

Unless I am missing something, the common_sampler should work like this:

  • We can initialize an object:
    • without grammar, without reasoning budget
    • without grammar, with reasoning budget
    • with grammar, without reasoning budget
    • with grammar, with reasoning budget
  • In all cases, we need to feed any "initial" (a.k.a. "prefill") tokens to both the grammar and the reasoning budget samplers
  • So this feeding logic should become part of the common_sampler initialization
  • I don't see why we need the params.grammar_external flag - we always have to feed the tokens to the grammar and reasoning samplers, if they exist

Maybe try to remove the old prefill logic from the server and consolidate it in common_sampler_init().
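The consolidation suggested above might look roughly like this (the class and method names are hypothetical, not llama.cpp's actual common_sampler API):

```python
# Hypothetical sketch of consolidating the prefill feeding into sampler
# initialization; the class and method names are illustrative, not
# llama.cpp's actual common_sampler API.
class SamplerChain:
    def __init__(self, samplers, prefill_tokens):
        self.samplers = samplers
        # Feed the prefill tokens to every sampler at construction time, so
        # grammar and reasoning-budget samplers see the template's prefill
        # as if the model had generated it.
        for tok in prefill_tokens:
            self.accept(tok)

    def accept(self, token):
        for s in self.samplers:
            s.accept(token)
```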

Contributor Author

We need the grammar_external flag because otherwise we'll break backwards compatibility.

Some templates add things like the assistant role marker (<|im_begin|>assistant) in the prefill. User grammars generally only expect to parse what's present in the content section. The idea is to allow the new behavior of simulating the sampling of the prefill tokens while also guaranteeing backwards compatibility. This is why the server tests failed before I added the fix:

https://github.com/ggml-org/llama.cpp/actions/runs/23100821002

As for the common_sampler initialization, I'll take a look into it.

Member

User grammars generally only expect to parse what's present in the content section.

Ah yes, I forgot about that.

Contributor Author

@ggerganov okay, so I checked the flow and we can't really move this to common_sampler_init().

The way common_sampler_init() works is that it resets and reinitializes the sampler state for all samplers other than the grammar sampler. The grammar sampler is a different beast from the other samplers, since its state dependence is different: it doesn't depend on the entire prompt, but on the current message (same with the reasoning sampler). Therefore, if we wanted to do this uniformly, we'd have to do something like this:

  • modify the sampler struct to add a scope parameter (PROMPT, MESSAGE)
  • keep track of where the current message starts (non-trivial, since the template does this and templates routinely modify the message history, e.g. by removing reasoning content, so we can't just rely on differential analysis here)
  • run accept for PROMPT-scope samplers over the entire prompt, and for MESSAGE-scope samplers over the current message only

This would make the behavior cleaner, but it is a big change, probably outside the scope of this PR.
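The scope idea could be sketched like this (Scope and feed_prompt are hypothetical names, not existing llama.cpp types):

```python
from enum import Enum

# Illustrative sketch of the scope idea; Scope and feed_prompt are
# hypothetical names, not existing llama.cpp types.
class Scope(Enum):
    PROMPT = 1   # sampler conditions on the whole conversation
    MESSAGE = 2  # sampler conditions only on the current assistant message

def feed_prompt(scoped_samplers, tokens, asst_start):
    """Feed every token to PROMPT-scope samplers, but feed MESSAGE-scope
    samplers only from the start of the current assistant message."""
    for i, tok in enumerate(tokens):
        for scope, sampler in scoped_samplers:
            if scope is Scope.PROMPT or i >= asst_start:
                sampler.accept(tok)
```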

Member

I think it's much simpler than that.

Each sampler has an accept() call that consumes a token. Generally we want to feed all tokens from the current context into the samplers. Hence the loop that I referenced in the server. The grammar sampler is the only exception - the false argument to common_sampler_accept() makes it so that we don't feed the grammar sampler with the "prefill" tokens.

So I am thinking that the newly added logic in common_sampler_init() for collecting "prefill" tokens and passing them to the reasoning sampler is not necessary. The reasoning sampler should accept all of its tokens through common_sampler_accept() - same as all the other samplers (except the grammar).

Most likely the common_reasoning_budget_init() needs to be updated to not accept prefill tokens at all.

@pwilkin pwilkin Mar 16, 2026

@ggerganov We definitely do want to pass prefill tokens to the reasoning sampler, the same way we pass them to the grammar sampler, simply because the two samplers operate on the same principle: their context is the single assistant message being generated, not the entire conversation (prompt).

However, I now finally understand why the entire flow is so confusing. When a completion task gets assigned to a slot, this happens:
-> the task gets assigned to a slot
-> the chat template gets applied, processing the message history into a tokenizable prompt and adding assistant prefill tokens
-> the prompt (with prefill tokens) gets tokenized
-> the tokenized prompt is passed to the budget and reasoning samplers together with the prefill part; those samplers are initialized with just the tokens of the assistant prefill
-> the model processes the prompt
-> now that generation is about to start, all the other samplers get reset and fed the entire tokenized prompt together with the assistant prefill
-> all samplers - the reasoning+grammar ones initialized earlier and the generation ones initialized now - start working on the outputs generated by the model

It's really counterintuitive that the sampler reset + initialization is done on generation start, when it could just as well be done during init (right now it doesn't matter that much, but if we had backend samplers, they could theoretically run the prefill / accept phase in parallel with the model doing prompt processing).

However, my point still stands - we can't really equate the two prefills, because they differ in what they understand as "context". For the normal samplers, the entire prompt is the context - the entire message history together with the assistant prefill, before the generation phase. For the reasoning and grammar samplers, only the assistant prefill (the current assistant message) is the context. We could init them in the init_sampler() stage and have a unified prefill stage, but only if we mark where the assistant message starts in the prompt and the samplers themselves are aware of whether they require full context or just assistant-prefill context.

Contributor Author

In other words, the loop inside init_sampler() would become:

```cpp
for (int i = 0; i < (int) prompt.tokens.size(); i++) {
    const llama_token id = prompt.tokens[i];

    if (id != LLAMA_TOKEN_NULL) {
        common_sampler_accept(smpl.get(), id, i >= asst_prefill_start_pos);
        n_text++;
    }
}
```

where asst_prefill_start_pos would be calculated during task init as the position within the prompt where the assistant prefill tokens start, the third parameter to common_sampler_accept would change its meaning from accept_grammar to is_assistant_prefill_token, and the samplers themselves would know whether they require full context or just assistant-prefill context. Then we could get rid of the loop in common_sampler_init :)

@pwilkin pwilkin force-pushed the reasoning-prefill branch from 8df64b2 to eab5b46 Compare March 16, 2026 13:41