Memory Optimization with Jit #22666

mj023 · 2024-07-25T18:38:35Z

I am working on a project that solves user-specified economic models on the GPU. To do that, I need a for loop that evaluates a function on a grid and then takes the maximum along some of the grid's axes. The maxima found are then the input for the next iteration of the loop. I can't really provide a working example of the code, but it generally looks like this:

for period in reversed(range(periods)):
    compute_ccv = vmap(vmap(vmap(vmap(vmap(u(V,W,X,Y,Z, vf_array)...)  # nested vmap over the vectors that span the grid
    ccv = compute_ccv(V,W,X,Y,Z, vf_array)
    vf_array = ccv.max(axis=(0, 1, 2))

When I don't use jit, the nested vmap creates a big array in memory of shape [V x W x X x Y x Z], as I would expect.
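For readers who want something runnable, the pattern above can be sketched on a tiny grid. This is only an illustration under stated assumptions: the payoff function `u`, the grid sizes, and the use of three grid vectors (`S`, `A`, `B`) instead of five are all hypothetical stand-ins, not the author's model.

```python
import jax
import jax.numpy as jnp

# Hypothetical scalar payoff; a stand-in for the user-specified model.
def u(s, a, b, vf_array):
    return -(s - a) ** 2 - b ** 2 + 0.9 * vf_array.mean()

# Nest vmap so that each grid vector spans its own output axis.
compute_ccv = jax.vmap(                                # axis 0: s
    jax.vmap(                                          # axis 1: a
        jax.vmap(u, in_axes=(None, None, 0, None)),    # axis 2: b
        in_axes=(None, 0, None, None),
    ),
    in_axes=(0, None, None, None),
)

S = jnp.linspace(0.0, 1.0, 4)
A = jnp.linspace(0.0, 1.0, 5)
B = jnp.linspace(0.0, 1.0, 6)
vf_array = jnp.zeros(S.shape[0])
periods = 3

for period in reversed(range(periods)):
    ccv = compute_ccv(S, A, B, vf_array)   # full grid, shape (4, 5, 6)
    vf_array = ccv.max(axis=(1, 2))        # max over the a and b axes, shape (4,)
```

Without jit, `ccv` is materialized in full on every iteration, which matches the memory behaviour described above.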

for period in reversed(range(periods)):
    compute_ccv = vmap(vmap(vmap(vmap(vmap(u(V,W,X,Y,Z, vf_array)...)  # nested vmap over the vectors that span the grid
    ccv = jax.jit(compute_ccv)(V,W,X,Y,Z, vf_array)
    vf_array = ccv.max(axis=(0, 1, 2))

When I use jit inside the for loop, like this, the memory consumption is suddenly very low; some kind of optimization seems to be happening.

@jax.jit
def myfunc():
  for period in reversed(range(periods)):
    compute_ccv = vmap(vmap(vmap(vmap(vmap(u(V,W,X,Y,Z, vf_array)...)  # nested vmap over the vectors that span the grid
    ccv = compute_ccv(V,W,X,Y,Z, vf_array)
    vf_array = ccv.max(axis=(0, 1, 2))

When I put the jit outside of the for loop, the memory consumption is again the same as when I don't jit anything.
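The three variants described so far (no jit, jit inside the loop, jit around the whole loop) can be put side by side in a small sketch. Everything here is an assumption made for illustration: the payoff `u`, the two grids `S` and `A`, and the helper `solve` are hypothetical. The sketch only checks that the variants agree numerically; it says nothing about their memory use.

```python
import jax
import jax.numpy as jnp

# Hypothetical payoff, not the author's model.
def u(s, a, vf_array):
    return -(s - a) ** 2 + 0.9 * vf_array.mean()

# Two nested vmaps: axis 0 spans s, axis 1 spans a.
compute_ccv = jax.vmap(jax.vmap(u, in_axes=(None, 0, None)),
                       in_axes=(0, None, None))

S = jnp.linspace(0.0, 1.0, 8)
A = jnp.linspace(0.0, 1.0, 16)

def solve(periods, step):
    # Backward loop: each period's maxima feed the next iteration.
    vf_array = jnp.zeros(S.shape[0])
    for period in reversed(range(periods)):
        ccv = step(S, A, vf_array)
        vf_array = ccv.max(axis=1)
    return vf_array

# 1. No jit: operations are dispatched one by one.
vf_eager = solve(3, compute_ccv)

# 2. jit inside the loop: only compute_ccv is compiled; the max runs separately.
vf_inner = solve(3, jax.jit(compute_ccv))

# 3. jit around the loop: the Python loop is unrolled while tracing,
#    so all periods and reductions end up in one XLA program.
solve_all = jax.jit(lambda: solve(3, compute_ccv))
vf_outer = solve_all()
```

Because a Python `for` loop is unrolled during tracing, variant 3 hands the compiler one program containing every period, whereas variant 2 compiles a single grid evaluation.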

Because I was interested in which optimizations the XLA compiler performs, I started looking at the output of compute_ccv.lower().compile().as_text(). In both cases, jit inside the for loop and jit outside the for loop, the compiled code seems to work with the huge array I would expect. Here are some snippets of the output.

Jit outside of For Loop:

%fused_computation.16 (param_0.100: f32[500], param_1.146: f32[500], param_2.169: f32[], param_3.84: f32[500]) -> f32[500,500] {
  %param_0.100 = f32[500]{0} parameter(0)
  %broadcast.685 = f32[500,2,500]{2,1,0} broadcast(f32[500]{0} %param_0.100), dimensions={2}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/le" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=59}
  %broadcast.684 = f32[500,2]{1,0} broadcast(f32[500]{0} %param_0.100), dimensions={0}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/add" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=59}
  %iota.11 = s32[2]{0} iota(), iota_dimension=0
  %convert.43 = f32[2]{0} convert(s32[2]{0} %iota.11), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/convert_element_type[new_dtype=float32 weak_type=True]" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=33}
  %constant_113 = f32[] constant(3)
  %broadcast.682 = f32[2]{0} broadcast(f32[] %constant_113), dimensions={}
  %multiply.101 = f32[2]{0} multiply(f32[2]{0} %convert.43, f32[2]{0} %broadcast.682), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/mul" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=33}
  %broadcast.681 = f32[500,2]{1,0} broadcast(f32[2]{0} %multiply.101), dimensions={1}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/add" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=59}
  %add.133 = f32[500,2]{1,0} add(f32[500,2]{1,0} %broadcast.684, f32[500,2]{1,0} %broadcast.681), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/add" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=59}
  %broadcast.680 = f32[500,2,500]{2,1,0} broadcast(f32[500,2]{1,0} %add.133), dimensions={0,1}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/le" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=59}
  %compare.64 = pred[500,2,500]{2,1,0} compare(f32[500,2,500]{2,1,0} %broadcast.685, f32[500,2,500]{2,1,0} %broadcast.680), direction=LE, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/le" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=59}
  %broadcast.679 = pred[500,500,2,500,500]{4,3,2,1,0} broadcast(pred[500,2,500]{2,1,0} %compare.64), dimensions={0,2,3}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/vmap(vmap(vmap(jit(_where))))/broadcast_in_dim[shape=(2, 500, 500, 500, 500) broadcast_dimensions=(0, 1, 3, 4)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
  %param_3.84 = f32[500]{0} parameter(3)
  %broadcast.729 = f32[500,2,500]{2,1,0} broadcast(f32[500]{0} %param_3.84), dimensions={2}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/sub" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %param_2.169 = f32[] parameter(2)
  %broadcast.728 = f32[500]{0} broadcast(f32[] %param_2.169), dimensions={}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/sub" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %param_1.146 = f32[500]{0} parameter(1)
  %add.148 = f32[500]{0} add(f32[500]{0} %broadcast.728, f32[500]{0} %param_1.146), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/sub" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %broadcast.727 = f32[500,2]{1,0} broadcast(f32[500]{0} %add.148), dimensions={0}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/mul" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %broadcast.726 = f32[500,2]{1,0} broadcast(f32[2]{0} %convert.43), dimensions={1}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/mul" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %multiply.113 = f32[500,2]{1,0} multiply(f32[500,2]{1,0} %broadcast.727, f32[500,2]{1,0} %broadcast.726), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/mul" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %broadcast.725 = f32[500,2,500]{2,1,0} broadcast(f32[500,2]{1,0} %multiply.113), dimensions={0,1}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/sub" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %subtract.59 = f32[500,2,500]{2,1,0} subtract(f32[500,2,500]{2,1,0} %broadcast.729, f32[500,2,500]{2,1,0} %broadcast.725), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/sub" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %broadcast.724 = f32[500,2,500,500]{3,2,1,0} broadcast(f32[500,2,500]{2,1,0} %subtract.59), dimensions={0,1,2}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/sub" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %broadcast.723 = f32[500,2,500,500]{3,2,1,0} broadcast(f32[500]{0} %param_1.146), dimensions={3}
  %add.147 = f32[500,2,500,500]{3,2,1,0} add(f32[500,2,500,500]{3,2,1,0} %broadcast.724, f32[500,2,500,500]{3,2,1,0} %broadcast.723), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/sub" source_file="/home/mj/Git_Projects/lcm/examples/long_running.py" source_line=26}
  %broadcast.683 = f32[500,500,2,500,500]{4,3,2,1,0} broadcast(f32[500,2,500,500]{3,2,1,0} %add.147), dimensions={1,2,3,4}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/model_functions.py" source_line=131}
  %constant_156 = f32[] constant(-inf)
  %broadcast.678 = f32[500,500,2,500,500]{4,3,2,1,0} broadcast(f32[] %constant_156), dimensions={}, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/vmap(vmap(vmap(jit(_where))))/broadcast_in_dim[shape=(2, 500, 500, 500, 500) broadcast_dimensions=(1, 2, 3, 4)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
  %select.56 = f32[500,500,2,500,500]{4,3,2,1,0} select(pred[500,500,2,500,500]{4,3,2,1,0} %broadcast.679, f32[500,500,2,500,500]{4,3,2,1,0} %broadcast.683, f32[500,500,2,500,500]{4,3,2,1,0} %broadcast.678), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/vmap(vmap(vmap(jit(_where))))/select_n" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
  %bitcast.613 = f32[500,500,500000]{2,1,0} bitcast(f32[500,500,2,500,500]{4,3,2,1,0} %select.56), metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/jit(u_and_f)/vmap(vmap(vmap(jit(_where))))/select_n" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
  ROOT %reduce.14 = f32[500,500]{1,0} reduce(f32[500,500,500000]{2,1,0} %bitcast.613, f32[] %constant_156), dimensions={2}, to_apply=%region_1.88, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/reduce_max[axes=(0,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/discrete_problem.py" source_line=111}
}

Jit inside of For Loop:

%broadcast.213 = s32[2,500,500]{2,1,0} broadcast(s32[] %constant_13), dimensions={}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %add.44 = s32[2,500,500]{2,1,0} add(s32[2,500,500]{2,1,0} %convert.7, s32[2,500,500]{2,1,0} %broadcast.213), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %clamp.11 = s32[2,500,500]{2,1,0} clamp(s32[2,500,500]{2,1,0} %broadcast.223, s32[2,500,500]{2,1,0} %add.44, s32[2,500,500]{2,1,0} %broadcast.222), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/jit(clip)/min" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %compare.20 = pred[2,500,500]{2,1,0} compare(s32[2,500,500]{2,1,0} %clamp.11, s32[2,500,500]{2,1,0} %broadcast.223), direction=LT, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/lt" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %broadcast.212 = pred[2,500,500,500,500]{4,3,2,1,0} broadcast(pred[2,500,500]{2,1,0} %compare.20), dimensions={0,2,4}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/squeeze[dimensions=(5,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %add.43 = s32[2,500,500]{2,1,0} add(s32[2,500,500]{2,1,0} %clamp.11, s32[2,500,500]{2,1,0} %broadcast.220), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %broadcast.211 = s32[2,500,500,500,500]{4,3,2,1,0} broadcast(s32[2,500,500]{2,1,0} %add.43), dimensions={0,2,4}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/squeeze[dimensions=(5,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %broadcast.210 = s32[2,500,500,500,500]{4,3,2,1,0} broadcast(s32[2,500,500]{2,1,0} %clamp.11), dimensions={0,2,4}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/squeeze[dimensions=(5,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %select.16 = s32[2,500,500,500,500]{4,3,2,1,0} select(pred[2,500,500,500,500]{4,3,2,1,0} %broadcast.212, s32[2,500,500,500,500]{4,3,2,1,0} %broadcast.211, s32[2,500,500,500,500]{4,3,2,1,0} %broadcast.210), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/select_n" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.323 = s32[1,2,500,500,500,500]{5,4,3,2,1,0} bitcast(s32[2,500,500,500,500]{4,3,2,1,0} %select.16)
  %concatenate.10 = s32[2,2,500,500,500,500]{5,4,3,2,1,0} concatenate(s32[1,2,500,500,500,500]{5,4,3,2,1,0} %bitcast.327, s32[1,2,500,500,500,500]{5,4,3,2,1,0} %bitcast.323), dimensions={0}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/concatenate[dimension=5]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.322 = s32[125000000000,2]{0,1} bitcast(s32[2,2,500,500,500,500]{5,4,3,2,1,0} %concatenate.10)
  %gather.10 = f32[125000000000,1,1]{0,2,1} gather(f32[500,500]{1,0} %param_0.3, s32[125000000000,2]{0,1} %bitcast.322), offset_dims={1,2}, collapsed_slice_dims={}, start_index_map={0,1}, index_vector_dim=1, slice_sizes={1,1}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/gather[dimension_numbers=GatherDimensionNumbers(offset_dims=(5, 6), collapsed_slice_dims=(), start_index_map=(0, 1)) slice_sizes=(1, 1) unique_indices=False indices_are_sorted=False mode=GatherScatterMode.PROMISE_IN_BOUNDS fill_value=None]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.321 = f32[2,500,500,500,500]{4,3,2,1,0} bitcast(f32[125000000000,1,1]{0,2,1} %gather.10)
  %multiply.33 = f32[2,500,500,500,500]{4,3,2,1,0} multiply(f32[2,500,500,500,500]{4,3,2,1,0} %multiply.34, f32[2,500,500,500,500]{4,3,2,1,0} %bitcast.321), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/mul" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %add.42 = f32[2,500,500,500,500]{4,3,2,1,0} add(f32[2,500,500,500,500]{4,3,2,1,0} %multiply.35, f32[2,500,500,500,500]{4,3,2,1,0} %multiply.33), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %broadcast.209 = f32[2,500,500,500,500]{4,3,2,1,0} broadcast(f32[2,500,500]{2,1,0} %subtract.21), dimensions={0,1,3}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/squeeze[dimensions=(5,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %multiply.32 = f32[2,500,500,500,500]{4,3,2,1,0} multiply(f32[2,500,500,500,500]{4,3,2,1,0} %broadcast.209, f32[2,500,500,500,500]{4,3,2,1,0} %broadcast.224), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/mul" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %add.41 = s32[2,500,500]{2,1,0} add(s32[2,500,500]{2,1,0} %convert.8, s32[2,500,500]{2,1,0} %broadcast.213), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %clamp.10 = s32[2,500,500]{2,1,0} clamp(s32[2,500,500]{2,1,0} %broadcast.223, s32[2,500,500]{2,1,0} %add.41, s32[2,500,500]{2,1,0} %broadcast.222), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/jit(clip)/min" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %compare.19 = pred[2,500,500]{2,1,0} compare(s32[2,500,500]{2,1,0} %clamp.10, s32[2,500,500]{2,1,0} %broadcast.223), direction=LT, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/lt" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %broadcast.208 = pred[2,500,500,500,500]{4,3,2,1,0} broadcast(pred[2,500,500]{2,1,0} %compare.19), dimensions={0,1,3}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/squeeze[dimensions=(5,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %add.40 = s32[2,500,500]{2,1,0} add(s32[2,500,500]{2,1,0} %clamp.10, s32[2,500,500]{2,1,0} %broadcast.220), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %broadcast.207 = s32[2,500,500,500,500]{4,3,2,1,0} broadcast(s32[2,500,500]{2,1,0} %add.40), dimensions={0,1,3}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/squeeze[dimensions=(5,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %broadcast.206 = s32[2,500,500,500,500]{4,3,2,1,0} broadcast(s32[2,500,500]{2,1,0} %clamp.10), dimensions={0,1,3}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/squeeze[dimensions=(5,)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %select.15 = s32[2,500,500,500,500]{4,3,2,1,0} select(pred[2,500,500,500,500]{4,3,2,1,0} %broadcast.208, s32[2,500,500,500,500]{4,3,2,1,0} %broadcast.207, s32[2,500,500,500,500]{4,3,2,1,0} %broadcast.206), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/select_n" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.320 = s32[1,2,500,500,500,500]{5,4,3,2,1,0} bitcast(s32[2,500,500,500,500]{4,3,2,1,0} %select.15)
  %concatenate.9 = s32[2,2,500,500,500,500]{5,4,3,2,1,0} concatenate(s32[1,2,500,500,500,500]{5,4,3,2,1,0} %bitcast.320, s32[1,2,500,500,500,500]{5,4,3,2,1,0} %bitcast.326), dimensions={0}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/concatenate[dimension=5]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.319 = s32[125000000000,2]{0,1} bitcast(s32[2,2,500,500,500,500]{5,4,3,2,1,0} %concatenate.9)
  %gather.9 = f32[125000000000,1,1]{0,2,1} gather(f32[500,500]{1,0} %param_0.3, s32[125000000000,2]{0,1} %bitcast.319), offset_dims={1,2}, collapsed_slice_dims={}, start_index_map={0,1}, index_vector_dim=1, slice_sizes={1,1}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/gather[dimension_numbers=GatherDimensionNumbers(offset_dims=(5, 6), collapsed_slice_dims=(), start_index_map=(0, 1)) slice_sizes=(1, 1) unique_indices=False indices_are_sorted=False mode=GatherScatterMode.PROMISE_IN_BOUNDS fill_value=None]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.318 = f32[2,500,500,500,500]{4,3,2,1,0} bitcast(f32[125000000000,1,1]{0,2,1} %gather.9)
  %multiply.31 = f32[2,500,500,500,500]{4,3,2,1,0} multiply(f32[2,500,500,500,500]{4,3,2,1,0} %multiply.32, f32[2,500,500,500,500]{4,3,2,1,0} %bitcast.318), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/mul" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %add.39 = f32[2,500,500,500,500]{4,3,2,1,0} add(f32[2,500,500,500,500]{4,3,2,1,0} %add.42, f32[2,500,500,500,500]{4,3,2,1,0} %multiply.31), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %multiply.30 = f32[2,500,500,500,500]{4,3,2,1,0} multiply(f32[2,500,500,500,500]{4,3,2,1,0} %broadcast.209, f32[2,500,500,500,500]{4,3,2,1,0} %broadcast.214), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/mul" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %concatenate.8 = s32[2,2,500,500,500,500]{5,4,3,2,1,0} concatenate(s32[1,2,500,500,500,500]{5,4,3,2,1,0} %bitcast.320, s32[1,2,500,500,500,500]{5,4,3,2,1,0} %bitcast.323), dimensions={0}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/concatenate[dimension=5]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.317 = s32[125000000000,2]{0,1} bitcast(s32[2,2,500,500,500,500]{5,4,3,2,1,0} %concatenate.8)
  %gather.8 = f32[125000000000,1,1]{0,2,1} gather(f32[500,500]{1,0} %param_0.3, s32[125000000000,2]{0,1} %bitcast.317), offset_dims={1,2}, collapsed_slice_dims={}, start_index_map={0,1}, index_vector_dim=1, slice_sizes={1,1}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/gather[dimension_numbers=GatherDimensionNumbers(offset_dims=(5, 6), collapsed_slice_dims=(), start_index_map=(0, 1)) slice_sizes=(1, 1) unique_indices=False indices_are_sorted=False mode=GatherScatterMode.PROMISE_IN_BOUNDS fill_value=None]" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %bitcast.316 = f32[2,500,500,500,500]{4,3,2,1,0} bitcast(f32[125000000000,1,1]{0,2,1} %gather.8)
  %multiply.28 = f32[2,500,500,500,500]{4,3,2,1,0} multiply(f32[2,500,500,500,500]{4,3,2,1,0} %multiply.30, f32[2,500,500,500,500]{4,3,2,1,0} %bitcast.316), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/mul" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %add.37 = f32[2,500,500,500,500]{4,3,2,1,0} add(f32[2,500,500,500,500]{4,3,2,1,0} %add.39, f32[2,500,500,500,500]{4,3,2,1,0} %multiply.28), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(vmap(vmap(jit(_map_coordinates))))))/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/function_evaluator.py" source_line=282}
  %multiply.27 = f32[2,500,500,500,500]{4,3,2,1,0} multiply(f32[2,500,500,500,500]{4,3,2,1,0} %broadcast.237, f32[2,500,500,500,500]{4,3,2,1,0} %add.37), metadata={op_name="jit(u_and_f)/jit(main)/mul" source_file="/home/mj/Git_Projects/lcm/src/lcm/model_functions.py" source_line=131}
  %add.36 = f32[2,500,500,500,500]{4,3,2,1,0} add(f32[2,500,500,500,500]{4,3,2,1,0} %broadcast.238, f32[2,500,500,500,500]{4,3,2,1,0} %multiply.27), metadata={op_name="jit(u_and_f)/jit(main)/add" source_file="/home/mj/Git_Projects/lcm/src/lcm/model_functions.py" source_line=131}
  %constant_11 = f32[] constant(-inf)
  %broadcast.205 = f32[2,500,500,500,500]{4,3,2,1,0} broadcast(f32[] %constant_11), dimensions={}, metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(jit(_where))))/broadcast_in_dim[shape=(2, 500, 500, 500, 500) broadcast_dimensions=(1, 2, 3, 4)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
  %select.14 = f32[2,500,500,500,500]{4,3,2,1,0} select(pred[2,500,500,500,500]{4,3,2,1,0} %broadcast.246, f32[2,500,500,500,500]{4,3,2,1,0} %add.36, f32[2,500,500,500,500]{4,3,2,1,0} %broadcast.205), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(jit(_where))))/select_n" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
  %bitcast.315 = f32[2,500,500,250000]{3,2,1,0} bitcast(f32[2,500,500,500,500]{4,3,2,1,0} %select.14), metadata={op_name="jit(u_and_f)/jit(main)/vmap(vmap(vmap(jit(_where))))/select_n" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
  ROOT %reduce.1 = f32[2,500,500]{2,1,0} reduce(f32[2,500,500,250000]{3,2,1,0} %bitcast.315, f32[] %constant_11), dimensions={3}, to_apply=%region_0.267, metadata={op_name="jit(u_and_f)/jit(main)/reduce_max[axes=(3, 4)]" source_file="/home/mj/Git_Projects/lcm/src/lcm/entry_point.py" source_line=232}
}

I know it's probably not possible to help me with my specific problem, as I can't provide a short working example. But maybe someone could tell me whether the output from compute_ccv.lower().compile().as_text() is actually the fully optimized code, or why the huge array is only allocated on the GPU in one case even though both compiled functions seem to work with it?
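The inspection described above can be reproduced on a small stand-in function. The sketch below is not the author's code: `u`, `step`, and the grid sizes are hypothetical. It uses `lower(...).compile().as_text()` to get the compiled HLO and, as an extra that the post does not mention, `memory_analysis()` on the compiled object, which reports the buffer sizes the compiler assigned. Whether `memory_analysis()` returns useful numbers depends on the backend and JAX version, so the call is guarded.

```python
import jax
import jax.numpy as jnp

# Hypothetical payoff, not the author's model.
def u(s, a, vf_array):
    return -(s - a) ** 2 + 0.9 * vf_array.mean()

def step(S, A, vf_array):
    ccv = jax.vmap(jax.vmap(u, in_axes=(None, 0, None)),
                   in_axes=(0, None, None))(S, A, vf_array)
    return ccv.max(axis=1)

S = jnp.linspace(0.0, 1.0, 64)
A = jnp.linspace(0.0, 1.0, 128)
vf_array = jnp.zeros(64)

# lower() needs example arguments to trace with.
compiled = jax.jit(step).lower(S, A, vf_array).compile()
hlo_text = compiled.as_text()   # HLO text of the compiled executable
print(hlo_text[:500])

# Buffer statistics; guarded because availability varies by backend/version.
try:
    stats = compiled.memory_analysis()
except Exception:
    stats = None
if stats is not None:
    print("temp bytes:  ", getattr(stats, "temp_size_in_bytes", None))
    print("output bytes:", getattr(stats, "output_size_in_bytes", None))
```

Comparing the reported temporary buffer size between the two jit placements may be a more direct signal than reading shapes inside a fused computation, since shapes that appear inside a fusion are not necessarily backed by their own buffers.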

jakevdp · 2024-07-25T18:55:05Z

Maintainer

It's not clear to me what you mean when you say "jit outside the for loop". Do you mean something like this?

compute_ccv = jax.jit(vmap(vmap(vmap(vmap(u(V,W,X,Y,Z, vf_array)...)
for period in reversed(range(periods)):
    ccv = compute_ccv(V,W,X,Y,Z, vf_array)
    vf_array = ccv.max(axis=(0, 1, 2))

Or do you mean something like this?

@jax.jit
def myfunc():
  for period in reversed(range(periods)):
    compute_ccv = vmap(vmap(vmap(vmap(vmap(u(V,W,X,Y,Z, vf_array)...)  # nested vmap over the vectors that span the grid
    ccv = compute_ccv(V,W,X,Y,Z, vf_array)
    vf_array = ccv.max(axis=(0, 1, 2))

And for "jit inside the for loop", do you mean something like this?

for period in reversed(range(periods)):
    compute_ccv = vmap(vmap(vmap(vmap(vmap(u(V,W,X,Y,Z, vf_array)...)  # nested vmap over the vectors that span the grid
    ccv = jax.jit(compute_ccv)(V,W,X,Y,Z, vf_array)
    vf_array = ccv.max(axis=(0, 1, 2))

4 replies

mj023 Jul 25, 2024
Author

Sorry for being unclear. "Outside the for loop" is the second code block you provided, and "inside the for loop" is the third. I also tried using both, but then the behaviour is the same as using only the jit outside of the for loop.

jakevdp Jul 25, 2024
Maintainer

OK - that makes sense.

My guess here is that when you jit-compile compute_ccv together with ccv.max(), it allows the compiler to avoid allocating the very large arrays, because the final result is just an aggregation of those large arrays. On the other hand, when you jit-compile compute_ccv alone, or when you don't jit-compile it at all, the runtime will allocate these very large arrays, because the function outputs them explicitly.
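The difference in what each function must return can be sketched with `jax.eval_shape`, which traces a function without running it. The payoff `u`, the three grids, and the helpers `compute_and_max` and `nbytes` are hypothetical and chosen only to illustrate the point; the sketch shows the size of the declared output, not what XLA actually allocates for temporaries.

```python
import math
import jax
import jax.numpy as jnp

# Hypothetical payoff, not the model from the question.
def u(s, a, b, vf_array):
    return -(s - a) ** 2 - b ** 2 + 0.9 * vf_array.mean()

compute_ccv = jax.vmap(
    jax.vmap(
        jax.vmap(u, in_axes=(None, None, 0, None)),
        in_axes=(None, 0, None, None),
    ),
    in_axes=(0, None, None, None),
)

def compute_and_max(S, A, B, vf_array):
    # Grid evaluation and reduction traced together.
    return compute_ccv(S, A, B, vf_array).max(axis=(1, 2))

def nbytes(x):
    return math.prod(x.shape) * x.dtype.itemsize

# Abstract inputs: no memory is allocated for the grids themselves.
vec = jax.ShapeDtypeStruct((500,), jnp.float32)

full = jax.eval_shape(compute_ccv, vec, vec, vec, vec)
reduced = jax.eval_shape(compute_and_max, vec, vec, vec, vec)

print(full.shape, nbytes(full))        # output of compute_ccv alone
print(reduced.shape, nbytes(reduced))  # output when the max is included
```

When `compute_ccv` is compiled on its own, the full grid is its output and has to exist as a buffer. When the reduction is part of the same compiled function, only the reduced array is an output, which leaves the compiler free to avoid materializing the grid if it can fuse the computation with the reduction.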

Does that make sense?

mj023 Jul 25, 2024
Author

The problem is that it's the other way round: when I jit compute_ccv alone, the compiler seems to find some optimizations that reduce the memory used. But when I jit the whole loop, the compiler does not seem to find these optimizations. That's why I was trying to look at the Jaxprs and later at the HLO code of the compiled functions. To me, both versions seem to operate on the large arrays, but I have no experience working with compilers, so maybe an array appearing in the HLO code does not mean that it has to be allocated?

jakevdp Jul 25, 2024
Maintainer

In that case I'm not sure what's going on. It's hard to guess given incomplete pseudocode, and the HLO dumps are difficult to interpret because it's not clear what part of the program they represent (i.e. how do you dump the HLO for compute_ccv if the jit is outside the for loop?)

Is there any way you could put together a more complete minimal example? Without that, I don't think you're going to get any answers to your questions.
