is_threading is racy #232

dnadlinger · 2022-03-21T16:55:56Z

In is_threading, p.threads_used, a Vector of thread ids, is accessed and updated:

Lines 463 to 470 in 5826a75

    
           function is_threading(p::AbstractProgress) 
        
               Threads.nthreads() == 1 && return false 
        
               length(p.threads_used) > 1 && return true 
        
               if !in(Threads.threadid(), p.threads_used) 
        
                   push!(p.threads_used, Threads.threadid()) 
        
               end 
        
               return length(p.threads_used) > 1 
        
           end

I don't think Vector is thread-safe, though – wouldn't accesses threads_used also need some form of locking?

The text was updated successfully, but these errors were encountered:

IanButterworth · 2022-03-24T17:32:32Z

ah, true. It would be good to avoid a lock though, as the reason for this check is to avoid the need for a lock, given that locking got slower in recent julia versions

dnadlinger · 2022-03-26T21:49:03Z

Which use case are you most concerned about, performance-wise (that isn't covered by the nthreads() fast path)? I don't see an obvious way to avoid locking altogether, other than avoiding the lock by using a lock-free counter to track work altogether.

Right now, I am personally only really interested in @threads. For this, something like

function is_threading()
  Threads.nthreads() == 1 && return false 

  if Threads.threadid() > 1
    # Called from another thread; so certainly in a @threads region.
    return true
  end

  # On the main thread, so just check if we are in a @threads region.
  return ccall(:jl_in_threaded_region, Cint, ()) != 0
end

might do the trick. This wouldn't work for @spawn/…, though.

In my threaded programs, the progress meter instance is either going to be across threads anyway, or the situation will be such that the overhead from an uncontested lock is insignificant compared to the actual computation, though, so I'm not sure this problem needs to be solved. If it does, perhaps there could be an explicit thread safety flag/choice of variants (say, Progress(…, thread_safe=false)) for when the overhead matters to users?

IanButterworth · 2022-03-26T21:57:00Z

It's quite expensive to lock each iteration on tight loops. If a user has nthreads() > 1 and uses ProgressMeter on a non-threaded loop, I just felt the lock was worth avoiding.

It may be over-optimization though. Redesign proposals would definitely be welcomed by me!

dnadlinger · 2022-03-26T22:11:11Z

I've honestly not used ProgressMeter.jl enough yet to be able to judge what the correct design tradeoffs are. One option to lower the overhead would be to use lock-free atomics instead, but then printing the updates could presumably only be done from the main thread to make sure the updates are ordered correctly. Using Threads.SpinLock might also be cheaper.

danielwe · 2023-05-12T18:51:18Z

Workaround for anyone else who might be concerned about this:

p = Progress(n)
append!(p.threads_used, 1:Threads.nthreads())
# code using p

~~As for a proper solution, I think you can get away with only locking in the exceptional case, as follows:~~

function is_threading(p::AbstractProgress)
    Threads.nthreads() == 1 && return false
    length(p.threads_used) > 1 && return true
    return lock(lk) do
        if !in(Threads.threadid(), p.threads_used)
            push!(p.threads_used, Threads.threadid())
        end
        return length(p.threads_used) > 1
    end 
 end

~~That should only occur at most twice during execution.~~

EDIT: scratch, that, it's not exceptional if Threads.nthreads() > 1 but you're only accessing p from a single thread.

IanButterworth · 2023-08-20T02:27:38Z

Yeah on a nthreads() > 1 session a simple hot loop becomes 25x slower with that lock, whereas it's currently at most 9x slower.

I don't yet have a decent design to fix this. Further suggestions appreciated!

IanButterworth · 2023-08-20T13:17:10Z

Suggestion by @vchuravy

Update the counter atomically and then try to acquire the lock and then you carry on if you didn't get it
Sink the log into update progress so that you only acquire it when really needed
And use a trylock. Worst case you miss an update

Also from @vtjnash

The fastest is usually a two-split design that stores the initial threadid or Task in one counter, and everything else that isn’t those uses a lock and a different counter

MarcMush · 2024-01-31T01:01:25Z

It happened in CI

Stacktrace

ProgressThreads tests: Error During Test at /home/runner/work/ProgressMeter.jl/ProgressMeter.jl/test/test_threads.jl:2
  Got exception outside of a @test
  TaskFailedException
  
      nested task error: BoundsError: attempt to access MemoryRef{Int64} at index [1]
      Stacktrace:
        [1] getindex
          @ ./essentials.jl:882 [inlined]
        [2] iterate (repeats 2 times)
          @ ./array.jl:887 [inlined]
        [3] in
          @ ./operators.jl:1304 [inlined]
        [4] is_threading(p::ProgressThresh{Float64})
          @ ProgressMeter ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:477
        [5] lock_if_threading
          @ ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:484 [inlined]
        [6] #update!#18
          @ ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:532 [inlined]
        [7] update!(p::ProgressThresh{Float64}, val::Float64)
          @ ProgressMeter ~/work/ProgressMeter.jl/ProgressMeter.jl/src/ProgressMeter.jl:531
        [8] macro expansion
          @ ~/work/ProgressMeter.jl/ProgressMeter.jl/test/test_threads.jl:58 [inlined]
        [9] (::var"#618#threadsfor_fun#155"{var"#618#threadsfor_fun#147#156"{Float64, ReentrantLock, UnitRange{Int64}}})(tid::Int64; onethread::Bool)
          @ Main ./threadingconstructs.jl:214
       [10] #618#threadsfor_fun
          @ ./threadingconstructs.jl:181 [inlined]
       [11] (::Base.Threads.var"#1#2"{var"#618#threadsfor_fun#155"{var"#618#threadsfor_fun#147#156"{Float64, ReentrantLock, UnitRange{Int64}}}, Int64})()
          @ Base.Threads ./threadingconstructs.jl:153

https://github.com/MarcMush/ProgressMeter.jl/actions/runs/7719366745/job/21042475069

vtjnash · 2024-01-31T02:21:57Z

The push! there looks very dangerous when used by multiple threads

lmiq · 2024-07-03T20:35:34Z

Just to mention that this issue is now breaking packages (one mine at least) in Julia 1.11.

Thus, a safe fix for this is necessary asap.

lmiq · 2024-07-03T20:37:43Z

One alternative could be add a "no_lock" keyword argument, as an opt in, with the appropriate explanation.

IanButterworth · 2024-07-03T20:43:08Z

Sounds like a good quickfix

lmiq · 2024-07-04T12:38:05Z

I think this update is a reasonable one, which solves the issue with very minor modifications: #321

Relative to what is proposed in #232 (comment), what I do there is just define a new lock and just use the lock if the threads_used needs updating, which will not occur most than twice. I don't see any performance difference in the tight loop of the prog_perf example of the tests.

MarcMush · 2024-07-12T09:00:16Z

fixed by #322

dnadlinger changed the title ~~is_threading is racy (?)~~ is_threading is racy Mar 26, 2022

MarcMush mentioned this issue Jan 31, 2024

Errors in CI with threads #281

Closed

MarcMush mentioned this issue Feb 8, 2024

fix test_threads (hopefully) #298

Merged

MarcMush mentioned this issue Jul 3, 2024

fix possible concurrency in is_threading #318

Closed

This was referenced Jul 4, 2024

use Bool vector for threads_used #319

Closed

Locked threadid update #320

Closed

simple lock for threads_used update #321

Closed

lmiq mentioned this issue Jul 4, 2024

lock by default - add safe_lock option for control #322

Merged

MarcMush closed this as completed Jul 12, 2024

MarcMush mentioned this issue Nov 18, 2024

feat: add thread-safe progress meter #334

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

is_threading is racy #232

is_threading is racy #232

dnadlinger commented Mar 21, 2022

IanButterworth commented Mar 24, 2022

dnadlinger commented Mar 26, 2022 •

edited

Loading

IanButterworth commented Mar 26, 2022

dnadlinger commented Mar 26, 2022

danielwe commented May 12, 2023 •

edited

Loading

IanButterworth commented Aug 20, 2023

IanButterworth commented Aug 20, 2023

MarcMush commented Jan 31, 2024

vtjnash commented Jan 31, 2024

lmiq commented Jul 3, 2024

lmiq commented Jul 3, 2024

IanButterworth commented Jul 3, 2024

lmiq commented Jul 4, 2024

MarcMush commented Jul 12, 2024

is_threading is racy #232

is_threading is racy #232

Comments

dnadlinger commented Mar 21, 2022

IanButterworth commented Mar 24, 2022

dnadlinger commented Mar 26, 2022 • edited Loading

IanButterworth commented Mar 26, 2022

dnadlinger commented Mar 26, 2022

danielwe commented May 12, 2023 • edited Loading

IanButterworth commented Aug 20, 2023

IanButterworth commented Aug 20, 2023

MarcMush commented Jan 31, 2024

vtjnash commented Jan 31, 2024

lmiq commented Jul 3, 2024

lmiq commented Jul 3, 2024

IanButterworth commented Jul 3, 2024

lmiq commented Jul 4, 2024

MarcMush commented Jul 12, 2024

dnadlinger commented Mar 26, 2022 •

edited

Loading

danielwe commented May 12, 2023 •

edited

Loading