
Atomic read-modify-write gets much slower with the new API #41843

Open
bicycle1885 opened this issue Aug 9, 2021 · 1 comment
Labels
multithreading Base.Threads and related functionality performance Must go faster

Comments

@bicycle1885
Member

I suspect the core devs already know about this, but I couldn't find a dedicated issue, so let me file one.

I discovered that the new per-field atomics API doesn't match the previous API in performance.

mutable struct Atomic{T}
    @atomic data::T
end
increment!(x::Atomic) = @atomic x.data += 1
increment!(x::Threads.Atomic) = Threads.atomic_add!(x, 1)
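As a sanity check (my own sketch, not from the issue), the two definitions can be exercised side by side to confirm they agree semantically; this assumes Julia ≥ 1.7 for the per-field @atomic macro:

```julia
# Sketch: confirm both APIs compute the same result (assumes Julia >= 1.7).
mutable struct Atomic{T}
    @atomic data::T
end
increment!(x::Atomic) = @atomic x.data += 1          # returns the new value
increment!(x::Threads.Atomic) = Threads.atomic_add!(x, 1)  # returns the old value

a = Atomic{Int}(0)
b = Threads.Atomic{Int}(0)
for _ in 1:1000
    increment!(a)
    increment!(b)
end
@assert (@atomic a.data) == 1000
@assert b[] == 1000
```

Note the differing return conventions: @atomic x.data += 1 evaluates to the new value, while Threads.atomic_add! returns the old one.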

As can be seen, the new API is almost 30 times slower:

julia> x = Atomic{Int}(0);

julia> @btime for _ in 1:1_000_000
           increment!($x)
       end
  109.877 ms (3000000 allocations: 61.04 MiB)

julia> x = Threads.Atomic{Int}(0);

julia> @btime for _ in 1:1_000_000
           increment!($x)
       end
  3.895 ms (0 allocations: 0 bytes)

Of course, this is because the @atomic x.data += 1 call fails to be optimized down to a lock xadd instruction on AMD64. If we deprecate the old API, I think the new API should provide an alternative with comparable performance.
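One way to check this (a sketch of mine, not from the issue) is to dump the generated assembly for both implementations; an optimal atomic increment on x86-64 appears as a single lock xadd (or lock add), while the unoptimized form shows a cmpxchg retry loop:

```julia
# Sketch: inspect the x86-64 assembly generated for each implementation.
using InteractiveUtils  # provides code_native in scripts

mutable struct Atomic{T}
    @atomic data::T
end
increment!(x::Atomic) = @atomic x.data += 1
increment!(x::Threads.Atomic) = Threads.atomic_add!(x, 1)

# Ideally both print a single `lock xadd`; the slow case instead shows
# a cmpxchg loop around an out-of-line call to `+`.
code_native(increment!, (Atomic{Int},); syntax = :intel)
code_native(increment!, (Threads.Atomic{Int},); syntax = :intel)
```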

@vtjnash vtjnash added performance Must go faster multithreading Base.Threads and related functionality labels Aug 9, 2021
@tkf
Member

tkf commented Sep 14, 2021

FWIW, here's yet another benchmark.

The performance difference in the OP seems to have already been resolved (I'm checking this after #41859 and #42017), but a difference is still observable in a parallel setting:

mutable struct Atomic{T}
    @atomic data::T
end
increment!(x::Atomic) = @atomic x.data += 1
increment!(x::Threads.Atomic) = Threads.atomic_add!(x, 1)

function serial_increments!(ref, n = 2^20)
    for _ in 1:n
        increment!(ref)
    end
end

function parallel_increments!(ref, n = 2^20)
    Threads.@threads for _ in 1:Threads.nthreads()
        serial_increments!(ref, n)
    end
end

using BenchmarkTools

suite = BenchmarkGroup()
suite["serial Threads"] = @benchmarkable serial_increments!(Threads.Atomic{Int}(0))
suite["serial @atomic"] = @benchmarkable serial_increments!(Atomic{Int}(0))
suite["parallel Threads"] = @benchmarkable parallel_increments!(Threads.Atomic{Int}(0))
suite["parallel @atomic"] = @benchmarkable parallel_increments!(Atomic{Int}(0))

result = run(suite; verbose = true)

I get

julia> sort!(collect(Dict(result)); by = first)
4-element Vector{Pair{String, BenchmarkTools.Trial}}:
 "parallel @atomic" => 457.105 ms
 "parallel Threads" => 227.970 ms
   "serial @atomic" => 5.950 ms
   "serial Threads" => 5.636 ms

julia> Threads.nthreads()
16

(Aside: I find it puzzling that the difference can be observed in the parallel benchmark but not in the serial one. I thought it'd be the other way around because the cache traffic would hide the function call overhead.)

This may be because modifyfield! is not completely inlined:

julia> @code_llvm increment!(Atomic(0))
;  @ ....
define i64 @"julia_increment!_1222"({}* nonnull align 8 dereferenceable(8) %0) #0 {
top:
; ┌ @ Base.jl:56 within `modifyproperty!`
   %1 = bitcast {}* %0 to i64*
   %2 = load atomic i64, i64* %1 monotonic, align 8
   br label %xchg

done_xchg:                                        ; preds = %xchg
; └
  ret i64 %4

xchg:                                             ; preds = %xchg, %top
; ┌ @ Base.jl:56 within `modifyproperty!`
   %3 = phi i64 [ %2, %top ], [ %6, %xchg ]
   %4 = call i64 @"j_+_1224"(i64 signext %3, i64 signext 1) #0
   %5 = cmpxchg i64* %1, i64 %3, i64 %4 seq_cst monotonic
   %6 = extractvalue { i64, i1 } %5, 0
   %7 = extractvalue { i64, i1 } %5, 1
   br i1 %7, label %done_xchg, label %xchg
; └
}
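For contrast (my own sketch, not part of the comment above), the IR for the old API can be dumped the same way; Threads.atomic_add! lowers to a single atomicrmw add, which is what the cmpxchg loop above would collapse to if the call to + were inlined and pattern-matched:

```julia
# Sketch: dump the old API's LLVM IR for comparison with the cmpxchg loop.
using InteractiveUtils

# Threads.atomic_add! compiles down to LLVM's `atomicrmw add` instruction,
# avoiding the compare-and-swap retry loop entirely.
code_llvm(Threads.atomic_add!, (Threads.Atomic{Int}, Int))
```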


3 participants