Merged
2 changes: 1 addition & 1 deletion common/common.h
@@ -587,7 +587,7 @@ struct common_params {
// server params
int32_t port = 8080; // server listens on this network port
bool reuse_port = false; // allow multiple sockets to bind to the same port
- int32_t timeout_read = 600; // http read timeout in seconds
+ int32_t timeout_read = 3600; // http read timeout in seconds
int32_t timeout_write = timeout_read; // http write timeout in seconds
int32_t n_threads_http = -1; // number of threads to process HTTP requests (TODO: support threadpool)
int32_t n_cache_reuse = 0; // min chunk size to reuse from the cache via KV shifting
6 changes: 4 additions & 2 deletions tools/server/server-queue.cpp
@@ -381,8 +381,10 @@ server_task_result_ptr server_response_reader::next(const std::function<bool()>
if (result == nullptr) {
// timeout, check stop condition
if (should_stop()) {
- SRV_WRN("%s", "stopping wait for next result due to should_stop condition (adjust the --timeout argument if needed)\n");
- SRV_WRN("%s", "ref: https://github.com/ggml-org/llama.cpp/pull/22907\n");
+ const int64_t time_elapsed_ms = ggml_time_ms() - time_start_ms;
+ if (time_elapsed_ms > 30000) {
+     SRV_WRN("%s", "request cancelled after 30s, potentially a client-side timeout; please check your client's code\n");
+ }
Comment on lines +385 to +387
Contributor Author

Note: it would be better to detect whether time_elapsed_ms exceeds the server's --timeout here and log a different message, but given how the code is structured, this proved quite complicated.
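The check described in this note could look roughly like the sketch below, assuming the configured timeout were available at the point where the wait is abandoned (which, per the note, is exactly the hard part). The names `stop_cause` and `classify_stop` are hypothetical and not part of llama.cpp; the 30000 ms threshold is taken from the diff above.

```cpp
#include <cstdint>

// Hypothetical classification of why the wait for the next result stopped.
enum class stop_cause {
    server_timeout,        // elapsed time exceeded the server's own --timeout
    likely_client_timeout, // long wait, but still below the server's limit
    unknown,               // too short to say anything useful
};

// Hypothetical helper: timeout_s is assumed to be the server's --timeout
// value in seconds (e.g. timeout_read from common_params).
static stop_cause classify_stop(int64_t time_elapsed_ms, int32_t timeout_s) {
    const int64_t timeout_ms = static_cast<int64_t>(timeout_s) * 1000;
    if (time_elapsed_ms > timeout_ms) {
        return stop_cause::server_timeout;
    }
    if (time_elapsed_ms > 30000) {
        return stop_cause::likely_client_timeout;
    }
    return stop_cause::unknown;
}
```

With a result like this, the caller could pick a different warning for each case instead of always suggesting a client-side timeout.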

@sasa7812, May 29, 2026

Would the client-set timeout for the request be easier to log? That would have avoided the confusion.

Contributor Author @ngxson, May 29, 2026

but how? AFAIK client never communicate such info to server

@bonswouar, May 29, 2026

I might be misunderstanding the PR, but isn't the log message "request cancelled after 30s" a bit confusing?
It could appear with time_elapsed_ms anywhere between 30s and 3600s, but to a user reading the log it looks like exactly 30s.
Why not use the actual time_elapsed_ms value?
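One way to report the measured duration, sketched here as a standalone helper: `format_cancel_warning` is a hypothetical name, not an existing llama.cpp function, and the message wording simply mirrors the one in the diff.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of llama.cpp): builds the warning text from
// the measured elapsed time instead of a hard-coded "30s".
static std::string format_cancel_warning(int64_t time_elapsed_ms) {
    char buf[160];
    std::snprintf(buf, sizeof(buf),
                  "request cancelled after %.1fs, potentially a client-side timeout; "
                  "please check your client's code",
                  time_elapsed_ms / 1000.0);
    return std::string(buf);
}
```

In the server, the resulting string could presumably be passed to the existing `SRV_WRN("%s", ...)` call, or the elapsed value could be formatted directly in that call's format string.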


> but how? AFAIK client never communicate such info to server

You are right, I should have checked before asking. The log message just implied that the request was cancelled on the server's initiative.

return nullptr;
}
} else {
2 changes: 2 additions & 0 deletions tools/server/server-queue.h
@@ -169,6 +169,8 @@ struct server_response_reader {
bool cancelled = false;
int polling_interval_seconds;

+ const int64_t time_start_ms = ggml_time_ms();
+
// tracking generation state and partial tool calls
// only used by streaming completions
std::vector<task_result_state> states;