
Significant increase in maximum heap size in GCBenchmark #51601

Closed
d-netto opened this issue Oct 5, 2023 · 9 comments
Labels
GC Garbage collector regression Regression in behavior compared to a previous version

Comments

@d-netto
Member

d-netto commented Oct 5, 2023

  • 1.9:
bench = "pidigits.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      49572 │     724 │        86 │        595 │           35 │                 0 │      113 │          1 │
│  median │      49572 │     724 │        86 │        595 │           35 │                 0 │      113 │          1 │
│ maximum │      49572 │     724 │        86 │        595 │           35 │                 0 │      113 │          1 │
│   stdev │        NaN │     NaN │       NaN │        NaN │          NaN │               NaN │      NaN │        NaN │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
  • master:
bench = "pidigits.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │     119559 │      44 │        37 │          7 │           40 │                 0 │   401283 │          0 │
│  median │     119559 │      44 │        37 │          7 │           40 │                 0 │   401283 │          0 │
│ maximum │     119559 │      44 │        37 │          7 │           40 │                 0 │   401283 │          0 │
│   stdev │        NaN │     NaN │       NaN │        NaN │          NaN │               NaN │      NaN │        NaN │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

CC: @gbaraldi

@d-netto d-netto added this to the 1.10 milestone Oct 5, 2023
@d-netto d-netto added regression Regression in behavior compared to a previous version GC Garbage collector labels Oct 5, 2023
@d-netto
Member Author

d-netto commented Oct 5, 2023

I didn't check RSS, so this could be an accounting bug somewhere.

@gbaraldi
Member

gbaraldi commented Oct 5, 2023

I think it might be the accounting change, where we now need to keep a global counter in sync, though I need to check.

@d-netto
Member Author

d-netto commented Oct 5, 2023

Measured again on a Linux machine.

On 1.9, peak RSS is around 400-500 MB.

On master, RSS goes beyond 70 GB and the run eventually fails in Serialization:

EOFError: read end of file
Stacktrace:
 [1] read(this::Base.PipeEndpoint, ::Type{UInt8})
   @ Base ./stream.jl:980
 [2] deserialize
   @ ~/julia-stock/usr/share/julia/stdlib/v1.11/Serialization/src/Serialization.jl:814 [inlined]
 [3] deserialize(s::Base.PipeEndpoint)
   @ Serialization ~/julia-stock/usr/share/julia/stdlib/v1.11/Serialization/src/Serialization.jl:801
 [4] run_bench(runs::Int64, threads::Int64, gcthreads::Int64, file::String, show_json::Bool)
   @ Main ~/GCBenchmarks/run_benchmarks.jl:68
 [5] run_category_files(benches::Vector{String}, args::Dict{String, Any}, show_json::Bool)
   @ Main ~/GCBenchmarks/run_benchmarks.jl:143
 [6] main(args::Dict{String, Any})
   @ Main ~/GCBenchmarks/run_benchmarks.jl:196
 [7] top-level scope
   @ ~/GCBenchmarks/run_benchmarks.jl:200

@vchuravy
Member

vchuravy commented Oct 6, 2023

A few observations.

  1. Our internal heap_size tracking matches 1:1 with the OS size tracking, so this is not a metrics issue.
  2. Running ./usr/bin/julia --heap-size-hint=5G -e "GC.enable_logging(true)" -L pidigits.jl allows the benchmark to finish without using excessive memory.
  3. Without --heap-size-hint, we never run a full collection:
./usr/bin/julia  -e "GC.enable_logging(true)" -L pidigits.jl 

GC: pause 27.69ms. collected 4.255280MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 40.83 MB, heap_target 43.75 MB, Fragmentation 0.019

GC: pause 2.63ms. collected 0.811043MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 42.94 MB, heap_target 43.75 MB, Fragmentation 0.909

GC: pause 0.30ms. collected 0.250763MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 43.54 MB, heap_target 87.29 MB, Fragmentation 1.799

GC: pause 0.51ms. collected 0.261078MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 87.08 MB, heap_target 174.58 MB, Fragmentation 1.363

GC: pause 0.66ms. collected 0.407867MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 174.22 MB, heap_target 305.47 MB, Fragmentation 1.162

GC: pause 0.84ms. collected 0.610123MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 304.92 MB, heap_target 523.67 MB, Fragmentation 1.224

GC: pause 1.23ms. collected 0.783508MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 522.95 MB, heap_target 785.45 MB, Fragmentation 1.290

GC: pause 1.17ms. collected 0.982773MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 784.54 MB, heap_target 1134.54 MB, Fragmentation 1.522

GC: pause 1.31ms. collected 1.117645MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 1133.51 MB, heap_target 1571.01 MB, Fragmentation 1.742

GC: pause 1.44ms. collected 1.260193MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 1569.84 MB, heap_target 2051.09 MB, Fragmentation 2.006

GC: pause 1.53ms. collected 1.396393MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 2049.79 MB, heap_target 2618.54 MB, Fragmentation 2.355

GC: pause 1.69ms. collected 1.493652MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 2617.20 MB, heap_target 4935.95 MB, Fragmentation 2.682

GC: pause 2.86ms. collected 1.593231MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 4934.47 MB, heap_target 8609.47 MB, Fragmentation 1.990

GC: pause 3.37ms. collected 2.464508MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 8607.16 MB, heap_target 14382.16 MB, Fragmentation 1.755

GC: pause 4.86ms. collected 3.399841MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 14378.99 MB, heap_target 21422.74 MB, Fragmentation 1.691

GC: pause 5.00ms. collected 4.419952MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 21418.56 MB, heap_target 29424.81 MB, Fragmentation 1.855

GC: pause 5.68ms. collected 5.157379MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 29419.79 MB, heap_target 37951.04 MB, Fragmentation 2.130

GC: pause 5.75ms. collected 5.669830MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 37945.53 MB, heap_target 47001.78 MB, Fragmentation 2.482

GC: pause 7.55ms. collected 5.999664MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 46996.11 MB, heap_target 57014.86 MB, Fragmentation 2.870

GC: pause 6.24ms. collected 6.241821MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 57008.88 MB, heap_target 67640.13 MB, Fragmentation 3.249

GC: pause 6.45ms. collected 6.479736MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 67633.97 MB, heap_target 78177.72 MB, Fragmentation 3.642

GC: pause 6.50ms. collected 6.677643MB. incr 
Heap stats: bytes_mapped 64.02 MB, bytes_resident 64.02 MB,
heap_size 78171.33 MB, heap_target 89152.58 MB, Fragmentation 4.078

Note that the collected stat is wrong, since the GC-aware free happens from a finalizer. Because GMP releases its memory through a finalizer, and enough objects get promoted to the old generation each GC cycle, this looks like a memory leak until a full GC runs.
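The mismatch between the logged "collected" number and the real heap drop can be illustrated with a toy model. This is a sketch in Python, not Julia's actual accounting code; the Heap class and its fields are hypothetical, and the assumption it encodes is that finalizers run after the per-cycle stat has already been recorded:

```python
# Toy model of GC accounting when memory is released by finalizers.
# Assumption: the "collected" stat is taken right after the sweep,
# before finalizers get a chance to free their memory.

class Heap:
    def __init__(self):
        self.size = 0.0                   # MB tracked by the GC
        self.pending_finalizer_bytes = 0.0

    def alloc(self, mb, finalizer_managed=False):
        self.size += mb
        if finalizer_managed:
            self.pending_finalizer_bytes += mb

    def gc_cycle(self, sweep_freed):
        before = self.size
        self.size -= sweep_freed             # what the sweep reclaims directly
        collected_stat = before - self.size  # this is what gets logged
        # Finalizers run *after* the stat is taken:
        self.size -= self.pending_finalizer_bytes
        self.pending_finalizer_bytes = 0.0
        return collected_stat

h = Heap()
h.alloc(100.0, finalizer_managed=True)   # e.g. GMP-backed bignum storage
h.alloc(5.0)                             # ordinary GC-managed objects
stat = h.gc_cycle(sweep_freed=4.0)
print(stat, h.size)                      # prints: 4.0 1.0
```

In this model the log reports only the 4 MB reclaimed by the sweep, even though the cycle really freed 104 MB, which matches the tiny "collected" numbers next to the multi-GB heap sizes in the log above.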

@gbaraldi
Member

gbaraldi commented Oct 6, 2023

I found the issue, and it's quite fun. The GC thinks nothing is getting freed during collection, because the heap barely changes size during the sweep; the big drop only happens afterwards, when finalizers run. For example, if the heap size before the GC was 4.5 GB and after the sweep it's 4 GB, the heuristics use those two numbers to derive the new target. But running finalizers then brings the heap down to 0.2 GB, and the heuristics have no idea.
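A minimal numeric sketch of that blind spot, using the 4.5/4/0.2 GB numbers above. This is a hypothetical Python model, not the actual heuristics code; next_target and the 1.5x growth factor are assumptions chosen only to make the arithmetic concrete:

```python
# Hypothetical model of a grow-by-survival heap-target heuristic that
# only sees what the sweep reclaimed, not what finalizers later free.

def next_target(heap_before_gc, heap_after_sweep, growth=1.5):
    """Derive the next heap target from the sweep result alone.

    freed_seen is the only reclamation visible to the heuristic;
    finalizer frees happen later and are invisible here.
    """
    freed_seen = heap_before_gc - heap_after_sweep
    survivors = heap_after_sweep          # overstated: finalizers not yet run
    return survivors * growth, freed_seen

# Sweep reclaims 0.5 GB; finalizers later reclaim 3.8 GB more,
# but the target was already computed from the 4.0 GB "survivors".
target, freed_seen = next_target(4.5, 4.0)
actual_live = 0.2  # heap after finalizers run
print(f"target={target:.1f} GB, sweep freed={freed_seen:.1f} GB, live={actual_live} GB")
# prints: target=6.0 GB, sweep freed=0.5 GB, live=0.2 GB
```

Since each cycle's target is computed from an inflated survivor count, targets ratchet upward, which is the heap_target doubling pattern visible in the log above.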

@KristofferC
Sponsor Member

Two questions:

  • Is this also relevant for 1.10? All the measurements seem to be against master.
  • @gbaraldi, since you understand the cause, do you have a fix for this?

@gbaraldi
Member

This does affect 1.10, and I have a tentative solution.

d-netto added a commit that referenced this issue Oct 20, 2023
The 1.10 GC heuristics introduced in
#50144 have been a source of
concerning issues such as
#50705 and
#51601. The PR also doesn't
correctly implement the paper on which it's based, as discussed in
#51498.

Test whether the 1.8 GC heuristics are a viable option.
@d-netto
Member Author

d-netto commented Oct 20, 2023

Removing this issue from the 1.10 milestone since it's fixed by #51661.

The issue persists on master.

@vtjnash
Sponsor Member

vtjnash commented Dec 7, 2023

Fixed by #51611

@vtjnash vtjnash closed this as completed Dec 7, 2023