
server: fix data race in to_json_anthropic #18283

Merged

ngxson merged 1 commit into ggml-org:master from ngxson:xsn/server_anthropic_fix
Dec 22, 2025

server: fix data race in to_json_anthropic#18283
ngxson merged 1 commit intoggml-org:masterfrom
ngxson:xsn/server_anthropic_fix

Conversation

@ngxson
Contributor

@ngxson ngxson commented Dec 22, 2025

@ngxson ngxson merged commit 3997c78 into ggml-org:master Dec 22, 2025
68 of 71 checks passed
Comment on lines +1156 to 1159
bool text_block_started = false;

if (first) {
    text_block_started = false;
Member

Hm, this logic seems broken - @noname22 PTAL as the author of this code.

Contributor

I'll have a look tomorrow

@noname22
Contributor

The original code used a static bool to track state across multiple calls to to_json_anthropic() during streaming. On the first streaming chunk, it should emit content_block_start (and set text_block_started = true). On subsequent chunks, it should skip content_block_start since the block has already been started.

With this changed to a plain bool, the variable is reset to false on every call, which means that every single streaming chunk will emit a content_block_start event.

I agree that the static bool was a bad solution and problematic with concurrent users/calls, but we can't solve it like this.

Could we maybe store text_block_started as a member variable of the server_task_result_cmpl_partial class (or in whatever struct tracks per-request state), so each request has its own isolated state without sharing it across concurrent requests?
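A minimal sketch of that idea (names are hypothetical; the real llama.cpp structs differ): the flag lives in a per-request struct instead of a function-local static, so each concurrent stream opens its own block exactly once.

```cpp
#include <string>
#include <vector>

// Hypothetical per-request state, standing in for a member on
// server_task_result_cmpl_partial (or whatever struct tracks one request).
struct stream_state {
    bool text_block_started = false;
};

// Sketch of the streaming serializer: content_block_start must be emitted
// exactly once per request, on its first chunk, even when many requests
// stream concurrently. Because the flag lives in per-request state rather
// than a function-local static, requests cannot clobber each other's flag.
std::vector<std::string> to_events(stream_state & st) {
    std::vector<std::string> events;
    if (!st.text_block_started) {
        events.push_back("content_block_start");
        st.text_block_started = true;
    }
    events.push_back("content_block_delta");
    return events;
}
```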

@ngxson
Contributor Author

ngxson commented Dec 23, 2025

then what about this line of code?

bool first = (n_decoded == 1);

@noname22
Contributor

I'm not sure what you mean. Yes, we could add text_block_started to server_task_result_cmpl_partial just like n_decoded.

If you mean to do something like:

if (first && !diff.content_delta.empty()) {
    // emit content_block_start
}

That could work too, but only if the first token's content delta is guaranteed to be non-empty. Otherwise we would never emit the block start.

Is there any situation when models produce empty deltas?
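The edge case being asked about can be sketched like this (a hypothetical chunk model, not the real server types): if the guard depends on first alone, an empty first delta means no chunk ever opens the block.

```cpp
#include <string>

// Hypothetical model of one streaming chunk: n_decoded counts tokens
// decoded so far, content_delta is the text carried by this chunk.
struct chunk {
    int         n_decoded;
    std::string content_delta;
};

// Guard from the snippet above: only the first chunk may open the block,
// and only when it actually carries text. If the first delta is empty,
// no later chunk can open the block, which is the concern raised here.
bool emits_block_start(const chunk & c) {
    const bool first = (c.n_decoded == 1);
    return first && !c.content_delta.empty();
}
```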

@ggerganov
Member

Could we maybe store text_block_started as a member variable of the server_task_result_cmpl_partial class (or in whatever struct tracks per-request state), so each request has its own isolated state without sharing it across concurrent requests?

@noname22 I think the closest logic to this is the server_slot::has_new_line state:

bool has_next_token = true;
bool has_new_line = false;
bool truncated = false;

You can keep track of this in a similar way and populate the server_task_result_cmpl_partial as needed, in a similar way as we populate the server_task_result_cmpl_final here:

res->n_tokens_cached = slot.prompt.n_tokens();
res->has_new_line = slot.has_new_line;
res->stopping_word = slot.stopping_word;

@ngxson
Contributor Author

ngxson commented Jan 5, 2026

@noname22 I don't get why this logic is needed if we already have the same logic as in the OAI path, using the bool first that I pointed out earlier.

As a reminder, the OAI path also sends some "starting" chunks before sending the stream of tokens. I believe you are trying to reinvent the same thing here.

If you really need extra logic to track the response state, use response->update(), which was documented here

@ggerganov
Member

If you really need extra logic to track the response state, use response->update(), which was documented here

@ngxson Should we move all task-related state from server_slot (such as has_new_line) to the task_result_state?

@ngxson
Contributor Author

ngxson commented Jan 5, 2026

Yes, I think it's also useful to move anything related to response formatting (e.g. res_type, oaicompat_cmpl_id, etc.) to task_result_state, as they are currently simply copied from the task to the result each time we create a new result.

The long-term direction would be to also move string-related logic like validate_utf8, slot.generated_text, and maybe even detokenizing to task_result_state, so they will be handled by the HTTP thread. Though I'm not sure if it's worth doing; it could be quite complicated while not improving performance much.

Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026


4 participants