
Conversation

@zyw-bot
Collaborator

@zyw-bot zyw-bot commented Jun 19, 2025

Link: llvm/llvm-project#143208
Requested by: @artagnon

@github-actions github-actions bot mentioned this pull request Jun 19, 2025
@zyw-bot
Collaborator Author

zyw-bot commented Jun 19, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@01d648a
patch: llvm/llvm-project#143208
sha256: 787218a49f4457bcdf159165c1301adfa260f6997e0296e7ddd65d8f7b5c4026
commit: 16c678d

7 files changed, 2521 insertions(+), 2752 deletions(-)

Improvements:
  indvars.NumElimIdentity 1885 -> 1888 +0.16%
  indvars.NumFoldedUser 2202 -> 2204 +0.09%
  constmerge.NumIdenticalMerged 15663 -> 15671 +0.05%
  loop-delete.NumBackedgesBroken 46391 -> 46405 +0.03%
  indvars.NumElimCmp 57987 -> 58001 +0.02%
  loop-idiom.NumMemCpy 9706 -> 9708 +0.02%
  gvn.NumGVNBlocks 200845 -> 200875 +0.01%
  globalsmodref-aa.NumNonAddrTakenGlobalVars 428951 -> 428965 +0.00%
  loop-delete.NumDeleted 120051 -> 120053 +0.00%
  indvars.NumElimExt 310098 -> 310101 +0.00%
Regressions:
  indvars.NumLFTR 337255 -> 337251 -0.00%
  instsimplify.NumSimplified 2622166 -> 2622148 -0.00%
  gvn.IsValueFullyAvailableInBlockNumSpeculationsMax 610372 -> 610368 -0.00%
  lcssa.NumLCSSA 16136379 -> 16136275 -0.00%
  gvn.NumGVNSimpl 4741634 -> 4741611 -0.00%
  scalar-evolution.NumExitCountsComputed 4284860 -> 4284843 -0.00%
  loop-vectorize.LoopsAnalyzed 2062807 -> 2062805 -0.00%
  memdep.NumUncacheNonLocalPtr 269995956 -> 269995823 -0.00%
  sccp.NumInstRemoved 2106735 -> 2106734 -0.00%
  memdep.NumCacheNonLocalPtr 284061609 -> 284061513 -0.00%

34 56 bench/clamav/optimized/crc.ll
21 32 bench/hdf5/optimized/H5checksum.ll
383 467 bench/linux/optimized/i2c-core-smbus.ll
46 106 bench/openmpi/optimized/crc.ll
53 77 bench/wireshark/optimized/packet-mstp.ll

@github-actions
Contributor

The provided LLVM IR diff changes how CRC (Cyclic Redundancy Check) tables are initialized and how CRC values are computed across multiple benchmark files. Below is a high-level summary of the five major changes, ignoring formatting, comments, and reordering:


1. Introduction of Precomputed CRC Tables as Private Constants

Across all modified files (crc.ll, H5checksum.ll, i2c-core-smbus.ll, packet-mstp.ll), new private global constants such as @.crctable are introduced to hold precomputed CRC lookup tables.

  • These tables replace dynamic loop-based CRC computation previously used during initialization.
  • This change likely improves performance by eliminating runtime computation of CRC values in favor of direct memory loads from precomputed data.

Example:

@.crctable = private unnamed_addr constant [256 x i32] [...]
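
To make the contents of such a table concrete, here is a minimal Python sketch of how a 256-entry table can be derived from the bitwise loop. It assumes the common reflected CRC-32 polynomial 0xEDB88320; the diff summary does not say which polynomial each benchmark uses, and `make_crc32_table` is a hypothetical name, not something taken from the patch.

```python
# Assumption: reflected CRC-32 polynomial; the benchmarks may use other polynomials.
POLY = 0xEDB88320


def make_crc32_table(poly=POLY):
    """Build a 256-entry lookup table by running the 8-step bitwise loop per byte."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):  # one step per bit of the input byte
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


# Analogous to a constant [256 x i32] array baked into the module.
CRC_TABLE = make_crc32_table()
```

Each entry is the result of feeding one byte value through the 8-iteration loop, which is why the loop can be evaluated once ahead of time and stored as a constant.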

2. Simplification of CRC Initialization Loops

Functions such as _Z9InitCRC32Pj and prte_initialize_crc_table previously computed CRC table entries with bit-manipulation loops, applying shifts and XORs over 8 iterations per byte.

  • These loops have been replaced with simple memcpy operations from the precomputed table.
  • The use of memcpy to copy the static CRC table into the target array significantly simplifies code and reduces runtime overhead.

Before:
Manual CRC polynomial calculation via bitwise operations in a loop.

After:

tail call void @llvm.memcpy.p0.p0.i64(...)

3. Inlining of CRC Table Access into Computation Functions

In several cases (e.g., @i2c_smbus_pec, @dissect_mstp), the original code computed CRC values using iterative bit shifting and conditional XORs in a loop.

  • These computations are now replaced with:
    • A byte-to-index conversion,
    • A load from the precomputed @.crctable,
    • A shift and xor to update the running CRC value.

This replaces an 8-iteration per-byte loop with a single table lookup plus a shift and an xor, which removes the inner loop's data-dependent branches.

Key benefit: Eliminates small inner loops in CRC logic, reducing branch mispredictions and improving throughput.
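
The equivalence between the two forms can be sketched in Python. This is an illustration under assumptions, not the benchmarks' code: it uses the reflected CRC-32 polynomial 0xEDB88320 with the usual 0xFFFFFFFF initial value and final xor, and the function names are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial


def crc32_bitwise(data):
    """Per-byte update with the 8-iteration shift/conditional-xor loop."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def _entry(i):
    """Table entry i: the bitwise loop applied to the single byte value i."""
    for _ in range(8):
        i = (i >> 1) ^ POLY if i & 1 else i >> 1
    return i


TABLE = [_entry(i) for i in range(256)]


def crc32_table(data):
    """Per-byte update with one table load, one shift, and one xor."""
    crc = 0xFFFFFFFF
    for b in data:
        index = (crc ^ b) & 0xFF         # byte-to-index conversion
        crc = TABLE[index] ^ (crc >> 8)  # table load, then shift and xor
    return crc ^ 0xFFFFFFFF


msg = b"123456789"
# Both forms agree with each other and with zlib's CRC-32.
assert crc32_bitwise(msg) == crc32_table(msg) == zlib.crc32(msg)
```

The table version does the same arithmetic as the loop, just with the eight inner steps folded into a precomputed entry.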


4. Elimination of Polynomial-Based Bitwise Loops

Several functions previously used a loop structure that iterated 8 times per byte to compute CRC manually using the generator polynomial.

  • These loops are removed and replaced with indexed loads from the precomputed CRC table.
  • For example, the function i2c_smbus_pec no longer uses a loop to manipulate bits; instead, it computes the index using zext and getelementptr.

Result: More compact and faster code due to reduced control flow complexity.
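
For an 8-bit, MSB-first CRC the table form is simpler still, because the whole running value is the index. The sketch below is modeled on the SMBus PEC polynomial x^8 + x^2 + x + 1 (0x07) as an assumption; it is not the kernel's i2c_smbus_pec code, and all names are hypothetical.

```python
POLY8 = 0x07  # assumption: SMBus PEC polynomial x^8 + x^2 + x + 1


def crc8_bitwise(crc, data):
    """MSB-first CRC-8 using the 8-iteration bit loop per byte."""
    for b in data:
        crc ^= b
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ POLY8) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


# Entry i is the bit loop applied to running value i with a zero input byte.
TABLE8 = [crc8_bitwise(i, b"\x00") for i in range(256)]


def crc8_table(crc, data):
    """Same CRC-8 with the bit loop replaced by an indexed load."""
    for b in data:
        # (crc ^ b) is the zero-extended byte used as the table index.
        crc = TABLE8[crc ^ b]
    return crc


msg = b"123456789"
assert crc8_table(0, msg) == crc8_bitwise(0, msg)
```

Here no shift is needed after the lookup, since an 8-bit CRC has no bits left over once the index byte has been consumed.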


5. Refactoring of Loop Structures and PHI Nodes

Loop structures in many functions are rewritten with simplified induction variables and fewer PHIs.

  • Phi nodes are updated to reflect simpler iteration patterns.
  • Some labels and branches associated with inner loops are eliminated.
  • Loop metadata is updated accordingly.

This is consistent with the optimizer having recognized the CRC loops and replaced their inner control flow with table lookups; the summary alone does not show whether any of these loops were also merged or vectorized.

Impact: Cleaner control flow graphs, potentially enabling further optimizations like unrolling or vectorization.


Summary

These changes collectively represent a shift from runtime CRC computation to lookup-based CRC using precomputed tables. Key benefits include:

  • Reduced CPU usage due to elimination of per-byte bit-twiddling loops.
  • Faster initialization via memcpy.
  • More predictable control flow, since a table load replaces the data-dependent inner loop, at the cost of keeping the table resident in memory.

Because every changed file is optimized benchmark IR, these differences show what the compiler patch does to existing CRC code in various subsystems (filesystems, I2C bus, network protocols, etc.), rather than changes to the benchmark sources themselves.

model: qwen-plus-latest
CompletionUsage(completion_tokens=807, prompt_tokens=72547, total_tokens=73354, completion_tokens_details=None, prompt_tokens_details=None)

@artagnon

artagnon commented Jul 6, 2025

/close

@github-actions github-actions bot closed this Jul 6, 2025
@dtcxzyw dtcxzyw deleted the test-run15762887441 branch July 10, 2025 10:39