`map` in `+`/`-` for `Array`s by jishnub · Pull Request #59961 · JuliaLang/julia

jishnub · 2025-10-27T12:49:13Z

`map` is a simpler operation than broadcasting and uses linear indexing for `Array`s. This often improves performance (occasionally enabling vectorization) and improves time to first execution (TTFX) in common cases. It also automatically returns the correct result for 0-D arrays, unlike broadcasting, which returns a scalar.
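As a small illustration of the 0-D behavior mentioned above, the following sketch compares `map` and `broadcast` on zero-dimensional arrays. The variable names are only for illustration and are not taken from the PR.

```julia
# A zero-dimensional array holding a single element
A = fill(1.0)

# map keeps the container: the result is again a 0-dimensional array
m = map(+, A, A)

# broadcast over only 0-dimensional arguments unwraps to a scalar
b = broadcast(+, A, A)

println(ndims(m), " ", m[])   # dimensionality and the single element
println(typeof(b))            # a plain Float64, not an array
```

This is why the existing broadcast-based methods need a zero-d-preserving wrapper, whereas `map` gives the array result directly.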

Performance:

julia> A = ones(3,3);

julia> @btime $A + $A;
  44.622 ns (2 allocations: 144 bytes) # v"1.13.0-DEV.1387"
  29.047 ns (2 allocations: 144 bytes) # this PR

julia> A = ones(3,3000);

julia> @btime $A + $A;
  10.095 μs (3 allocations: 70.40 KiB) # v"1.13.0-DEV.1387"
  4.787 μs (3 allocations: 70.40 KiB) # this PR

julia> @btime A + B + C + D + E + F setup=(A = rand(200,200); B = rand(200,200); C = rand(200,200); D = rand(200,200); E = rand(200,200); F = rand(200,200));
  93.910 μs (3 allocations: 312.59 KiB) # v"1.13.0-DEV.1387"
  64.813 μs (9 allocations: 312.77 KiB) # this PR

Similarly for -.

TTFX:

julia> A = ones(3,3);

julia> @time A + A;
  0.174090 seconds (303.47 k allocations: 14.575 MiB, 99.98% compilation time) # v"1.13.0-DEV.1387"
  0.072748 seconds (220.27 k allocations: 11.139 MiB, 99.95% compilation time) # this PR

These are measured on

julia> versioninfo()
Julia Version 1.13.0-DEV.1388
Commit c5f492781e (2025-10-27 11:44 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-20.1.8 (ORCJIT, skylake)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = /usr/local/lib:
  JULIA_EDITOR = subl

Seelengrab · 2025-10-27T14:07:38Z

Performance: [...]
Similarly for -.

Are these representative? The arrays being passed in are exactly the same array after all, so it's not unlikely that there is some special casing going on with map that doesn't happen with the more complicated broadcast machinery.

jishnub · 2025-10-27T15:04:07Z

That's a good point! I've re-run the benchmarks, and some of these do hold up in more general cases:

julia> @btime A + B setup=(A = rand(3,3); B = rand(3,3));
  39.452 ns (2 allocations: 144 bytes) # v"1.13.0-DEV.1387"
  27.789 ns (2 allocations: 144 bytes) # this PR

julia> @btime A + B setup=(A = rand(3,3000); B = rand(3,3000));
  10.130 μs (3 allocations: 70.40 KiB) # v"1.13.0-DEV.1387"
  5.026 μs (3 allocations: 70.40 KiB)  # this PR

The difference in the 300x300 case seems spurious, so I'll remove it from the OP. If anything, it seems to worsen somewhat in the benchmarks, and I'm not sure why. The performance is nearly identical for 200x200 and 500x500 matrices, so this is probably insignificant.

The main benefit comes in the wide matrix case, where the first dimension is too small for vectorization to kick in. Using linear indexing offers a significant speed-up. This was suggested in #47873 (comment).
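To make the linear-indexing point concrete, here is a minimal sketch of an elementwise sum that walks both arrays with a single linear index. The function name `add_linear` is hypothetical and this is not the implementation used in the PR; it only shows why the shape of the array stops mattering once the loop runs over `1:length(A)`.

```julia
# Hypothetical helper: elementwise sum of two same-sized Float64 arrays
# using one linear index instead of one loop per dimension.
function add_linear(A::Array{Float64}, B::Array{Float64})
    size(A) == size(B) || throw(DimensionMismatch("sizes of A and B must match"))
    C = similar(A)
    # For Arrays, eachindex returns a linear range covering every element,
    # so a 3×3000 matrix is traversed as one loop of 9000 iterations.
    @inbounds for i in eachindex(C, A, B)
        C[i] = A[i] + B[i]
    end
    return C
end

A = rand(3, 3000)
B = rand(3, 3000)
C = add_linear(A, B)
```

With Cartesian indexing the inner loop over a first dimension of length 3 is too short to vectorize well, whereas the single flat loop has no such limit.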

dkarrasch · 2025-11-17T17:37:04Z

This seems to have broken `muladd` with trailing dimensions; see JuliaLang/LinearAlgebra.jl#1485. `map` seems to handle those dimensions differently from broadcasting.

adienes · 2025-11-17T18:05:38Z

maybe

S = promote_shape(A, B)
_broadcast_preserving_zero_d($f, reshape(A, S), reshape(B, S))

This was broken in #59961, as `map` deals with trailing singleton axes differently from broadcasting:

```julia
julia> map(+, ones(1), ones(1,1)) |> size
(1,)

julia> broadcast(+, ones(1), ones(1,1)) |> size
(1, 1)
```

This PR limits the new method to the case where the ndims match, in which case there are no trailing axes and the two are equivalent. The alternate approach suggested in #59961 (comment) is to reshape the arrays, but this adds overhead that nullifies the performance improvement for small arrays.
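The idea of restricting the `map` path to arrays with matching `ndims` could be sketched roughly as follows. The name `add_sketch` is hypothetical, and the fallback here uses plain `broadcast` rather than Base's internal zero-d-preserving helper, so this illustrates the dispatch pattern only, not the actual code in Base.

```julia
# Fast path (assumed shape of the fix): both arrays have the same number of
# dimensions N, so there are no trailing singleton axes and map agrees with
# broadcasting once the shapes have been checked.
function add_sketch(A::Array{<:Any,N}, B::Array{<:Any,N}) where {N}
    Base.promote_shape(A, B)  # throws DimensionMismatch if the shapes differ
    return map(+, A, B)
end

# Fallback: the numbers of dimensions differ, so defer to broadcasting,
# which keeps the trailing singleton axes in the result.
function add_sketch(A::Array, B::Array)
    Base.promote_shape(A, B)
    return broadcast(+, A, B)
end

same_nd = add_sketch(ones(2, 2), ones(2, 2))   # takes the map path
mixed_nd = add_sketch(ones(1), ones(1, 1))     # takes the broadcast path
```

Dispatching on the shared type parameter `N` keeps the check at compile time, so the fast path pays no runtime cost for the distinction.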

jishnub added the performance, arrays, and latency labels Oct 27, 2025

jishnub requested a review from oscardssmith November 3, 2025 20:18

oscardssmith approved these changes Nov 4, 2025

jishnub added 4 commits November 10, 2025 12:31

map in +/- for Arrays

52eed60

map in scalar multiplication/division

27af6ab

Update comment

08c640a

Use a helper function to avoid ambiguities

6cd702b

jishnub force-pushed the jishnub/add_map branch from 0ecb681 to 6cd702b Compare November 10, 2025 08:31

jishnub merged commit b05afe0 into master Nov 11, 2025
7 checks passed

jishnub deleted the jishnub/add_map branch November 11, 2025 06:12

dkarrasch mentioned this pull request Nov 17, 2025

Fix 3-arg dot for empty arrays JuliaLang/LinearAlgebra.jl#1485

Merged

jishnub mentioned this pull request Nov 18, 2025

Fix linearly indexed array math by reshaping arrays #60164

Merged
