Errors in CI with threads #281

Closed
MarcMush opened this issue Oct 24, 2023 · 6 comments · Fixed by #292

Comments

@MarcMush
Collaborator

During the CI run for #157, this error happened only once (it did not recur when the job was re-run):
https://github.com/timholy/ProgressMeter.jl/actions/runs/6605620538/job/17940975807?pr=157#step:7:15122

this code errored:

println("Testing ProgressThresh() with Threads.@threads across $threads threads")
thresh = 1.0
prog = ProgressMeter.ProgressThresh(thresh; desc="Minimizing:")
vals = fill(300.0, 1)
threadsUsed = Int[]
Threads.@threads for _ in 1:100000
    !in(Threads.threadid(), threadsUsed) && push!(threadsUsed, Threads.threadid())
    push!(vals, -rand())
    valssum = sum(vals)
    if valssum > thresh
        ProgressMeter.update!(prog, valssum)
    else
        ProgressMeter.finish!(prog)
        break
    end
    sleep(0.1*rand())
end
@test sum(vals) <= thresh
@test length(threadsUsed) == threads #Ensure that all threads are used

basically sum([300.0; -rand()...]) == NaN ??

the seed is set in test.jl so it should be repeatable

is it because of threads? It doesn't seem to be a problem with ProgressMeter.jl really

Or is it not a big deal since it happened only once in a CI?

@MarcMush
Collaborator Author

I can reproduce it easily on my computer:

PS C:\Users\Marc> julia -t 12 --startup-file no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-rc1 (2023-11-03)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> vals = [1.0]
       @Threads.threads for i in 1:100000
           push!(vals, rand())
           if sum(vals) |> isnan
               error("isnan at i=$i in thread ", Threads.threadid())
           end
       end

ERROR: TaskFailedException

    nested task error: isnan at i=25079 in thread 5
    Stacktrace:
     [1] error(::String, ::Int64)
       @ Base .\error.jl:44
     [2] macro expansion
       @ .\REPL[2]:5 [inlined]
     [3] (::var"#37#threadsfor_fun#8"{var"#37#threadsfor_fun#7#9"{UnitRange{Int64}}})(tid::Int64; onethread::Bool)
       @ Main .\threadingconstructs.jl:214
     [4] #37#threadsfor_fun
       @ Main .\threadingconstructs.jl:181 [inlined]
     [5] (::Base.Threads.var"#1#2"{var"#37#threadsfor_fun#8"{var"#37#threadsfor_fun#7#9"{UnitRange{Int64}}}, Int64})()
       @ Base.Threads .\threadingconstructs.jl:153

...and 1 more exception.

Stacktrace:
 [1] threading_run(fun::var"#37#threadsfor_fun#8"{var"#37#threadsfor_fun#7#9"{UnitRange{Int64}}}, static::Bool)
   @ Base.Threads .\threadingconstructs.jl:171
 [2] macro expansion
   @ .\threadingconstructs.jl:219 [inlined]
 [3] top-level scope
   @ .\REPL[2]:2

julia> sum(vals)
28644.464412973262

with more iterations I can also get some nice errors:

julia> vals = [1.0]
       @Threads.threads for i in 1:1000000
           push!(vals, rand())
           if sum(vals) |> isnan
               # error("isnan at i=$i in thread ", Threads.threadid())
           end
       end

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x2aa1e9e5b33 -- + at .\float.jl:409 [inlined]
add_sum at .\reduce.jl:27 [inlined]
macro expansion at .\reduce.jl:265 [inlined]
macro expansion at .\simdloop.jl:77 [inlined]
mapreduce_impl at .\reduce.jl:263
in expression starting at REPL[6]:2
+ at .\float.jl:409 [inlined]
add_sum at .\reduce.jl:27 [inlined]
macro expansion at .\reduce.jl:265 [inlined]
macro expansion at .\simdloop.jl:77 [inlined]
mapreduce_impl at .\reduce.jl:263
mapreduce_impl at .\reduce.jl:272
mapreduce_impl at .\reduce.jl:271
mapreduce_impl at .\reduce.jl:271
mapreduce_impl at .\reduce.jl:272
mapreduce_impl at .\reduce.jl:272
mapreduce_impl at .\reduce.jl:277 [inlined]
_mapreduce at .\reduce.jl:447 [inlined]
_mapreduce_dim at .\reducedim.jl:365 [inlined]
#mapreduce#821 at .\reducedim.jl:357 [inlined]
mapreduce at .\reducedim.jl:357 [inlined]
#_sum#831 at .\reducedim.jl:1001 [inlined]
_sum at .\reducedim.jl:1001 [inlined]
#_sum#830 at .\reducedim.jl:1000 [inlined]
_sum at .\reducedim.jl:1000 [inlined]
#sum#828 at .\reducedim.jl:996 [inlined]
sum at .\reducedim.jl:996
unknown function (ip: 000002aa1e9e5df0)
macro expansion at .\REPL[6]:4 [inlined]
#82#threadsfor_fun#16 at .\threadingconstructs.jl:214
#82#threadsfor_fun at .\threadingconstructs.jl:181 [inlined]
#1 at .\threadingconstructs.jl:153
unknown function (ip: 000002aa1e9fbe2b)

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffc19d6974e --  at 0x7ffc19d6974e -- ION with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATIONjulia.h:1976 [inlined]
start_task at C:/workdir/src\task.c:1238
Allocations: 841296 (Pool: 840419; Big: 877); GC: 1
memcpy at C:\WINDOWS\System32\msvcrt.dll (unknown line)
fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffc1bd4be8d --  at 0x7ffc1bd4be8d -- ION32\msvcrt.dll (unknown line)
fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION

I don't know much about thread-safety, should this code be thread-safe? Otherwise it looks like a bug in Julia
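
On the thread-safety question: `push!` on a shared `Vector` is not safe to call from several tasks at once, and `sum` can read the array while another task is resizing it, which would be consistent with both the NaN and the access violations above. A minimal sketch of the same loop with a lock around every access follows. `vals_lock` is a name made up for this example, and the iteration count is reduced so that it finishes quickly.

```julia
using Base.Threads

# Shared vector plus a lock that guards every read and write of it.
vals = [1.0]
vals_lock = ReentrantLock()   # hypothetical name, not part of the original test

@threads for i in 1:10_000
    # Hold the lock for both the push! and the sum, so that no task can
    # read the vector while another task is growing it.
    s = lock(vals_lock) do
        push!(vals, rand())
        sum(vals)
    end
    isnan(s) && error("isnan at i=$i in thread ", threadid())
end

println(length(vals))
```

With the lock in place the loop body is effectively serialized, so this only shows that the errors come from unsynchronized access. It is not meant as a fast way to accumulate values.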

@MarcMush
Collaborator Author

Bonus points for this error, which seems to be somewhat repeatable on Julia nightly (1.11.0-DEV.879):
https://github.com/timholy/ProgressMeter.jl/actions/runs/6605620538/job/18563304749?pr=157#step:7:22798
https://github.com/timholy/ProgressMeter.jl/actions/runs/6802902122/job/18556738540?pr=282
https://github.com/timholy/ProgressMeter.jl/actions/runs/6802902122/job/18556738932?pr=282

ProgressThreads tests: Error During Test at D:\a\ProgressMeter.jl\ProgressMeter.jl\test\test_threads.jl:2
  Got exception outside of a @test
  TaskFailedException
  
      nested task error: BoundsError: attempt to access MemoryRef{Float64} at index [42]
      Stacktrace:
       [1] GenericMemoryRef
         @ .\boot.jl:516 [inlined]
       [2] unsafe_copyto!
         @ .\genericmemory.jl:75 [inlined]
       [3] (::Base.var"#121#122"{Vector{Float64}, Int64, Int64, Int64, Int64, Int64, Memory{Float64}, MemoryRef{Float64}})()
         @ Base .\array.jl:1110
       [4] _growend!
         @ Base .\array.jl:1093 [inlined]
       [5] push!(a::Vector{Float64}, item::Float64)
         @ Base .\array.jl:1237
       [6] macro expansion
         @ D:\a\ProgressMeter.jl\ProgressMeter.jl\test\test_threads.jl:27 [inlined]
       [7] (::var"#541#threadsfor_fun#132"{var"#541#threadsfor_fun#127#133"{Float64, UnitRange{Int64}}})(tid::Int64; onethread::Bool)
         @ Main .\threadingconstructs.jl:214
       [8] #541#threadsfor_fun
         @ Main .\threadingconstructs.jl:181 [inlined]
       [9] (::Base.Threads.var"#1#2"{var"#541#threadsfor_fun#132"{var"#541#threadsfor_fun#127#133"{Float64, UnitRange{Int64}}}, Int64})()
         @ Base.Threads .\threadingconstructs.jl:153

@MarcMush changed the title from "Strange one-off error in CI" to "Errors in CI with threads" on Nov 10, 2023
@MarcMush
Collaborator Author

seems similar to JuliaLang/julia#52032

@MarcMush
Collaborator Author

sometimes, it also crashes juliaup

Well, this is embarrassing.

Juliaup launcher had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "C:\Users\Marc\AppData\Local\Temp\report-c4b953c2-72bb-4ee7-b060-8ced1ea54bca.toml". Submit an issue or email with the subject of "Juliaup launcher Crash Report" and include the report as an attachment.

- Homepage: https://github.com/JuliaLang/juliaup

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!

the report file in question:

name = "Juliaup launcher"
operating_system = "Windows 10.0.22631 (Windows 11 Professional) [64-bit]"
crate_version = "1.11.22"
explanation = """
Panic occurred in file 'src/bin/julialauncher.rs' at line 369
"""
cause = "called `Result::unwrap()` on an `Err` value: TryFromIntError(())"
method = "Panic"
backtrace = """

   0: 0x7ff7bf4a4b21 - <unresolved>
   1: 0x7ff7bf4923ef - <unresolved>
   2: 0x7ff7bf4924c6 - <unresolved>
   3: 0x7ff7bf4aa8b7 - <unresolved>
   4: 0x7ff7bf5e1c3c - <unresolved>
   5: 0x7ffc1a93257d - BaseThreadInitThunk
   6: 0x7ffc1bd6aa58 - RtlUserThreadStart"""

@MarcMush
Collaborator Author

Ironically, it looks like is_threading isn't thread-safe

function is_threading(p::AbstractProgress)
    Threads.nthreads() == 1 && return false
    length(p.threads_used) > 1 && return true
    if !in(Threads.threadid(), p.threads_used)
        push!(p.threads_used, Threads.threadid())
    end
    return length(p.threads_used) > 1
end

error from CI:

ProgressThreads tests: Error During Test at D:\a\ProgressMeter.jl\ProgressMeter.jl\test\test_threads.jl:2
  Got exception outside of a @test
  TaskFailedException
  
      nested task error: BoundsError: attempt to access MemoryRef{Int64} at index [1]
      Stacktrace:
        [1] GenericMemoryRef
          @ .\boot.jl:524 [inlined]
        [2] unsafe_copyto!
          @ .\genericmemory.jl:75 [inlined]
        [3] (::Base.var"#124#125"{Vector{Int64}, Int64, Int64, Int64, Int64, Int64, Memory{Int64}, MemoryRef{Int64}})()
          @ Base .\array.jl:1113
        [4] _growend!
          @ .\array.jl:1096 [inlined]
        [5] push!
          @ .\array.jl:1241 [inlined]
        [6] is_threading(p::ProgressThresh{Float64})
          @ ProgressMeter D:\a\ProgressMeter.jl\ProgressMeter.jl\src\ProgressMeter.jl:478
        [7] lock_if_threading
          @ D:\a\ProgressMeter.jl\ProgressMeter.jl\src\ProgressMeter.jl:484 [inlined]
        [8] #update!#18
          @ D:\a\ProgressMeter.jl\ProgressMeter.jl\src\ProgressMeter.jl:532 [inlined]
        [9] update!(p::ProgressThresh{Float64}, val::Float64)
          @ ProgressMeter D:\a\ProgressMeter.jl\ProgressMeter.jl\src\ProgressMeter.jl:531
       [10] macro expansion
          @ D:\a\ProgressMeter.jl\ProgressMeter.jl\test\test_threads.jl:58 [inlined]
       [11] (::var"#558#threadsfor_fun#137"{var"#558#threadsfor_fun#129#138"{Float64, ReentrantLock, UnitRange{Int64}}})(tid::Int64; onethread::Bool)
          @ Main .\threadingconstructs.jl:214
       [12] #558#threadsfor_fun
          @ .\threadingconstructs.jl:181 [inlined]
       [13] (::Base.Threads.var"#1#2"{var"#558#threadsfor_fun#137"{var"#558#threadsfor_fun#129#138"{Float64, ReentrantLock, UnitRange{Int64}}}, Int64})()
          @ Base.Threads .\threadingconstructs.jl:153
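
The trace points at the unguarded `push!(p.threads_used, ...)`: two tasks can both pass the `in` check and then grow the same vector at the same time. One possible way to avoid that, as a sketch only, is to take a lock before touching `threads_used`. `DemoProgress`, `threads_lock` and `is_threading_locked` are hypothetical names for this illustration. This is not the actual ProgressMeter.jl type, and not necessarily how the eventual fix was done.

```julia
using Base.Threads

# Hypothetical stand-in for a progress object; the real AbstractProgress
# subtypes in ProgressMeter.jl have more fields than this.
struct DemoProgress
    threads_used::Vector{Int}
    threads_lock::ReentrantLock   # assumed extra field guarding threads_used
end
DemoProgress() = DemoProgress(Int[], ReentrantLock())

function is_threading_locked(p::DemoProgress)
    Threads.nthreads() == 1 && return false
    # Every read and write of threads_used happens while holding the lock,
    # so the membership check and the push! cannot interleave across tasks.
    lock(p.threads_lock) do
        tid = Threads.threadid()
        tid in p.threads_used || push!(p.threads_used, tid)
        length(p.threads_used) > 1
    end
end

p = DemoProgress()
@threads for _ in 1:1000
    is_threading_locked(p)
end
println(p.threads_used)
```

This drops the lock-free `length(p.threads_used) > 1` fast path from the original, since that read is itself unsynchronized. Whether the extra locking cost matters would need measuring.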

@MarcMush
Collaborator Author

Duplicate of #232
