
UPSTREAM PR #17357: chat: fix int overflow, prevent size calculation in float/double#257

Open
DajanaV wants to merge 2 commits into main from upstream-PR17357-branch_ngxson-xsn/chat_int_overflow

Conversation


@DajanaV DajanaV commented Nov 18, 2025

Mirrored from ggml-org/llama.cpp#17357

Fix ggml-org/llama.cpp#17355

Hmm, on second thought, yes, it is better to use size_t: even if the value overflows, it won't become negative.

But please note that overflow can still potentially happen.

Also, I hadn't noticed that `* 1.25` converts the value to double, so we should use integer division instead.

ngxson and others added 2 commits November 18, 2025 11:47

loci-review bot commented Nov 18, 2025

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Pull Request #257 implements integer overflow fixes and arithmetic optimizations in the chat template system (common_chat_templates_apply_legacy function). The changes include converting int to size_t for buffer allocation, replacing floating-point multiplication with integer arithmetic, and adding safety validation checks.

Key Findings

Performance Regression Identified:

  • Highest Response Time Change: std::vector<std::sub_match>::begin() shows +210.8% increase (84 ns → 260 ns, +176 ns absolute)
  • Highest Throughput Change: std::unordered_set<std::string>::begin() shows +303.0% increase (60 ns → 242 ns, +182 ns absolute)

Core Function Impact Assessment:
The performance regressions occur in STL container iterator functions, not in LLaMA.cpp core inference functions (llama_decode, llama_encode, llama_tokenize). These functions are not part of the critical tokenization/inference pipeline; therefore, tokens-per-second performance remains unaffected.

Power Consumption Analysis:

  • build.bin.llama-tts: +3.97% increase (2833 nJ → 2945 nJ)
  • build.bin.llama-run: -5.56% decrease (2659 nJ → 2511 nJ)
  • build.bin.llama-cvector-generator: -1.23% decrease (3593 nJ → 3549 nJ)

Root Cause Analysis:
CFG comparison reveals the performance regression stems from compiler optimization changes, not the PR code modifications. The new version introduces an unnecessary unconditional branch in the function prologue, fragmenting instruction cache locality and disrupting pipeline efficiency.

Code Quality Assessment:
The PR implements essential correctness improvements:

  • Prevents integer overflow in buffer allocation
  • Eliminates floating-point precision issues
  • Adds defensive validation checks

Actionable Recommendations:

  1. Investigate compiler configuration - Review optimization flags and compiler versions between builds
  2. Consider profile-guided optimization - Address the unconditional branch overhead in container operations
  3. Verify build consistency - Ensure identical compilation environments across versions

The code changes represent sound engineering practices and should not cause performance regressions. The observed issues appear related to build system or compiler optimization differences rather than the implemented fixes.

@loci-dev loci-dev force-pushed the main branch 26 times, most recently from d5eb05c to 819d372 Compare November 24, 2025 19:07
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from cbd9848 to 77b6fa0 Compare December 1, 2025 12:15