UPSTREAM PR #17595: server: move server-context to its own cpp|h#364
Conversation
**Performance Analysis Summary: PR #364**

Overview: This PR performs a pure code refactoring that extracts server context management into dedicated files. Analysis across all 16 binaries shows zero measurable performance impact.

Performance Impact: The refactoring does not touch inference or tokenization functions.
- Tokens per Second Impact: None. The inference pipeline remains unchanged, as no tokenization or decoding functions were modified.
- Power Consumption: All binaries maintain identical power consumption profiles, confirming the refactoring produces functionally equivalent machine code.

This is a maintenance-focused change that improves code organization without affecting runtime characteristics.
Force-pushed the branch from 22039aa to 239c7a2
**Performance Analysis Summary - PR #364**

Overview: This PR performs a code refactoring that extracts the server context implementation into dedicated files. Zero measurable performance impact was detected across all metrics.

Inference Performance: No impact on tokens per second; the refactoring does not modify any tokenization or inference functions.

Code Changes: The PR implements architectural improvements through the pimpl idiom, moving implementation details out of the public header and into the source file.
**Performance Analysis Summary - PR #364**

Analysis Type: Code Refactoring

Summary: This PR implements a pure code refactoring that extracts 3,619 lines into dedicated server-context files. Performance measurements show zero impact across all metrics. No functions in the performance-critical areas (Model Processing, Token Processing, Memory Management, Batch Processing) were modified. The refactoring affects only server infrastructure code responsible for HTTP request handling and task queue management, which operates outside the inference pipeline.

Power consumption analysis confirms negligible variance (< 0.001%) across all binaries, with a maximum observed change of 1.09 nJ.

Inference Impact: None; token processing functions are unchanged.
Force-pushed the branch from 82b1c0b to 8c7587c

Force-pushed the branch from df48f9e to cb46586
Mirrored from ggml-org/llama.cpp#17595
Extracted part of the changes in ggml-org/llama.cpp#17554 into this dedicated PR, so that if something goes wrong it's easier to trace back.
Compared to the approach proposed in the mentioned PR, which simply moves everything into a `.h` file, this PR does some extra things:
- uses `git mv`, so that auto-merge can be happier (I hope so, will need to test)
- hides implementation details behind `server-context.h`; for example, `server_slot` is now a private implementation detail
- for `server_context`, consolidates everything into 4 main functions: `init()`, `load_model()`, `start_loop()`, `terminate()`

This should allow easier integration of the server inside the CLI, while allowing downstream projects to incorporate the server as a library (cc @bandoti, probably a pre-cursor to llamax)