Skip to content

Latest commit

 

History

History
59 lines (46 loc) · 2.12 KB

Loop-interchange.md

File metadata and controls

59 lines (46 loc) · 2.12 KB

Loop interchange

Loop interchange is a performance optimization technique that is used to improve the loop's memory access pattern and potentially enable vectorization. Loop interchange if applied correctly can yield a huge performance improvement.

Loop interchange as an optimization technique is applicable to loop nests. A loop nest consists of two or more nested loops. After loop interchange, a loop that is once outer has now become inner, and the loop that was once inner has become outer.

To illustrate loop interchange, consider the following example:

for (int i = 0; i < n; i++) {
  for (int j = 0; j < n; j++) {
    b[i] += c[j][i];
  }
}

If we analyze the memory access pattern for the arrays b and c in the context of the innermost loop, we can determine that the access to array b is constant, whereas access to the matrix c is strided. By doing loop interchange, we get the following loop nest:

for (int j = 0; j < n; j++) {
  for (int i = 0; i < n; i++) {
    b[i] += c[j][i];
  }
}

Notice that the loop over j, that was originally the inner loop, is now the outer loop. Similarly, the loop over i that was originally an outer loop is now the inner loop.

The memory access pattern for b[i] is now sequential (originally it was constant) and the memory access pattern for c[j][i] is now sequential too (originally it was strided). The memory access pattern allows this loop to be vectorized.

Requirements for loop interchange

Loop interchange is possible if the following conditions are fulfilled:

  • The loops are perfectly nested: all the statements are inside the innermost loop. If this is not the case, often it is possible to make them perfectly nested, see Perfect-loop nesting on how to do it.

  • The value of the inner loop's iterator variable should be independent of the value of the outer's loop iterator variable.

  • Sometimes loop interchange is not possible in the presence of loop-carried dependencies.