@DajanaV commented on Nov 4, 2025

Mirrored from ggml-org/llama.cpp#16941

Add a new model openPangu-Embedded-1/7B-V1.1.
You can get the model from the model path.
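
For reviewers who want a quick smoke test once the model has been converted to GGUF, here is a minimal sketch against the llama.h C API (not part of this PR; the file name is a placeholder, and the function names assume a recent llama.cpp — older releases use llama_load_model_from_file/llama_free_model instead):

```cpp
// Minimal smoke test: load the converted GGUF and print its description and
// parameter count. "openPangu-Embedded-7B-V1.1.gguf" is a placeholder path.
#include "llama.h"

#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params params = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("openPangu-Embedded-7B-V1.1.gguf", params);
    if (model == nullptr) {
        std::fprintf(stderr, "failed to load model\n");
        return 1;
    }

    char desc[256];
    llama_model_desc(model, desc, sizeof(desc));
    std::printf("model: %s, %llu parameters\n", desc,
                (unsigned long long) llama_model_n_params(model));

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```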

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Based on my analysis of the performance data and code changes, here's the comprehensive performance impact assessment:

Performance Analysis Summary

Critical Function Performance Changes

Primary Performance Degradation

  • Function: std::vector<std::pair<...>>::begin() (regex_automaton.h:873-874)
  • Response Time: 208% increase (85ns → 261ns)
  • Throughput: 282% degradation (62ns → 239ns self-time)
  • Root Cause: Assembly analysis reveals a missing stack canary load instruction and inefficient block splitting

Secondary Performance Impact

  • Function: std::vector<unsigned int>::back() (regex_automaton.h:1233-1237)
  • Bottleneck: 538% increase (32ns → 204ns)
  • Throughput: 266% degradation (70ns → 255ns); see the micro-benchmark sketch below for an independent cross-check of both accessor regressions
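
To sanity-check these two accessor regressions outside the profiler, a coarse std::chrono micro-benchmark sketch follows (my own harness, not from this PR; absolute numbers will not match the per-call figures above, only the relative difference between two builds is meaningful, and the asm barrier is GCC/Clang-specific):

```cpp
// Coarse micro-benchmark for vector<pair<...>>::begin() and vector<unsigned int>::back().
// Build it twice with the same flags as the two libllama.so builds and compare.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
    std::vector<std::pair<uint32_t, uint32_t>> pairs(1024, {1, 2});
    std::vector<unsigned int> ids(1024, 7);

    constexpr int iters = 10'000'000;
    uint64_t sink = 0;  // accumulate results so the accessor calls are not elided

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        sink += pairs.begin()->first;   // exercises vector<pair<...>>::begin()
        sink += ids.back();             // exercises vector<unsigned int>::back()
        asm volatile("" : "+r"(sink));  // GCC/Clang barrier: keep the loop from being folded away
    }
    const auto t1 = std::chrono::steady_clock::now();

    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("avg %.2f ns per iteration (sink=%llu)\n", ns / iters, (unsigned long long) sink);
    return 0;
}
```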

KPI Impact Analysis

1. Tokens Per Second Impact

Status: No Direct Impact Expected

Analysis: The degraded functions are located in regex processing components, not in core inference functions:

  • llama_decode() - No performance changes detected
  • llama_encode() - No performance changes detected
  • llama_tokenize() - No performance changes detected

Conclusion: Based on the reference figure that a 2ms slower llama_decode() results in a 7% tokens/second reduction, the current regex-related degradation should not impact inference throughput, as these functions are not on the critical inference path.
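
As a back-of-the-envelope check on that reference figure (my own arithmetic, assuming the 2ms applies per decode call and throughput scales inversely with per-call latency), a 7% drop from a 2ms slowdown implies a baseline per-token decode time of roughly

$$\frac{1/(t + 2\,\mathrm{ms})}{1/t} = 0.93 \;\Rightarrow\; t = \frac{0.93 \times 2\,\mathrm{ms}}{0.07} \approx 26.6\,\mathrm{ms},$$

i.e. a baseline of about 37 tokens/s.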

2. Power Consumption Impact

Status: Minimal Impact

Affected Binary: build.bin.libllama.so

  • Power Consumption Change: +0.169% (280,662nJ → 281,136nJ)
  • Absolute Delta: +474nJ increase
  • Other Binaries: No measurable changes across the remaining 15 binaries

Root Cause: Increased CPU cycles from inefficient STL container operations in regex processing.

3. Quantization Efficiency

Status: No Impact

Analysis: No changes detected in quantization-related functions:

  • llama_model_quantize() - No performance changes
  • Quantization format handling - Unchanged
  • GGUF loading mechanisms - Stable performance

4. Memory Usage

Status: Potential Indirect Impact

Affected Areas:

  • Regex Container Operations: Degraded std::vector::begin() and back() operations suggest potential memory access inefficiencies
  • Stack Management: The missing stack canary load instruction indicates changed memory-protection code generation
  • Container Growth: Performance degradation may correlate with increased container sizes in tokenization patterns

No Direct Impact on core memory management functions:

  • llama_memory_clear() - No changes
  • llama_memory_seq_rm() - No changes
  • KV cache operations - Stable performance

5. Batch Processing

Status: No Impact

Analysis: Core batch processing functions show no performance degradation:

  • llama_batch_init() - No changes
  • llama_batch_get_one() - No changes
  • llama_decode() with batches - No changes

Root Cause Analysis

Assembly-Level Issues

  1. Missing Stack Canary Load: The stack canary load instruction present in the baseline build is absent in the current version (a minimal reproducer for comparing stack-protector codegen follows this list)
  2. Inefficient Block Splitting: Entry block unnecessarily divided, increasing execution overhead
  3. Compiler Optimization Regression: Suboptimal code generation affecting STL container performance
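
To verify item 1 independently, one option is a small reproducer compiled under both build configurations so the disassembly of the two binaries can be diffed. This is a sketch under assumed x86-64 Linux with GCC/Clang and -fstack-protector-strong; with the protector active, the prologue loads the canary from fs:0x28 and the epilogue branches to __stack_chk_fail on mismatch:

```cpp
// Minimal reproducer for comparing stack-protector code generation between builds.
// Disassemble touch_buffer() from both binaries and look for the canary load
// (fs:0x28 on x86-64 Linux) and the __stack_chk_fail check in the epilogue.
#include <cstring>

__attribute__((noinline)) void touch_buffer(const char * src) {
    char buf[64];                        // local array triggers -fstack-protector-strong
    std::strncpy(buf, src, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    volatile char sink = buf[0];         // keep buf live so it is not optimized out
    (void) sink;
}

int main() {
    touch_buffer("canary check");
    return 0;
}
```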

Code Changes Context

The performance degradation appears unrelated to PR #69 (the openPangu-Embedded model addition), suggesting:

  • Compiler optimization changes between builds
  • Build system modifications affecting STL container code generation
  • Separate changes not captured in the analyzed pull request

Action Items

Immediate Code-Level Actions

  1. Investigate Compiler Settings: Review optimization flags and STL container code generation between versions
  2. Assembly Code Analysis: Restore the missing stack canary load instruction in the std::vector::begin() implementation
  3. Block Structure Optimization: Eliminate unnecessary block splitting in entry sequences

Build System Actions

  1. Compiler Version Verification: Ensure consistent compiler versions and optimization settings
  2. STL Implementation Check: Verify STL library versions and container implementation consistency
  3. Debug Symbol Analysis: Compare debug information between versions to identify optimization differences

Performance Monitoring Focus

  1. Regex Processing Paths: Monitor tokenization performance for patterns using the regex automaton (a timing-probe sketch follows this list)
  2. Container Operation Profiling: Track STL container performance in text processing pipelines
  3. Memory Access Patterns: Analyze cache efficiency in container operations
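
For the first monitoring item, a hedged sketch of a tokenization timing probe is below. The GGUF path and the repeated mixed-script text are placeholders, and the calls assume the current llama.h where tokenization takes a llama_vocab obtained via llama_model_get_vocab; older releases pass the llama_model directly:

```cpp
// Timing probe for the tokenizer paths that exercise the regex automaton.
#include "llama.h"

#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model * model = llama_model_load_from_file("model.gguf", llama_model_default_params());
    if (model == nullptr) {
        return 1;
    }
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // Punctuation-heavy, mixed-script text tends to stress the pre-tokenizer regexes.
    std::string text;
    for (int i = 0; i < 1000; ++i) {
        text += "héllo, wörld! 123#4.5@ 你好 ... ";
    }

    std::vector<llama_token> tokens(text.size() + 16);

    const auto t0 = std::chrono::steady_clock::now();
    const int32_t n = llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                                     tokens.data(), (int32_t) tokens.size(),
                                     /*add_special=*/true, /*parse_special=*/false);
    const auto t1 = std::chrono::steady_clock::now();

    std::printf("tokens: %d, tokenize time: %.3f ms\n", n,
                std::chrono::duration<double, std::milli>(t1 - t0).count());

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```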

Conclusion

The identified performance regression is isolated to regex processing components and does not directly impact core inference performance metrics. The 0.169% power consumption increase in libllama.so represents minimal system impact. The primary concern is the missing stack canary instruction, which requires immediate attention for both performance and security reasons.

@DajanaV force-pushed the main branch 30 times, most recently from 6196a56 to 39290d7 on November 16, 2025 at 02:41