Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/object allocation #3

Open
wants to merge 2 commits into
base: v1.8.2-RAI
Choose a base branch
from

Conversation

udesou
Copy link

@udesou udesou commented Jan 25, 2023

This PR changes the threshold for allocation Julia's large objects to follow immix's threshold. While this probably brings a benefit in terms of allocation (more objects can be allocated via fastpath), until get_so_size is optimised, changing the threshold can actually degrade performance, since getting the objects size of large objects is way less expensive (loading a field from the header) than getting the size of small objects (check the object type, and calculate the size based on that). It should be merged with mmtk/mmtk-julia#20.

qinsoon pushed a commit to qinsoon/julia that referenced this pull request May 2, 2024
Fixes: JuliaLang#33147
Replaces/Closes: JuliaLang#40445

The difference here, compared to past implementations, is that we use
the zero-cost `isiterable` check on every intermediate step, instead of
wrapping the call in a try/catch and then trying to re-approximate the
`isiterable` afterwards. Some samples:

```julia
julia> Dict(i for i in 1:3)                                                        
ERROR: ArgumentError: AbstractDict(kv): kv needs to be an iterator of 2-tuples or pairs                                                                               
Stacktrace:                                                                                                                                                           
 [1] _throw_dict_kv_error()                                                        
   @ Base ./dict.jl:118                                                                                                                                               
 [2] grow_to!                                                                      
   @ ./dict.jl:132 [inlined]                                                       
 [3] dict_with_eltype                                                                                                                                                 
   @ ./abstractdict.jl:592 [inlined]                                                                                                                                  
 [4] Dict(kv::Base.Generator{UnitRange{Int64}, typeof(identity)})                                                                                                     
   @ Base ./dict.jl:120                                                            
 [5] top-level scope                                                                                                                                                  
   @ REPL[1]:1                                                                     
                                                                                                                                                                      
julia> Dict(i => error("$i") for i in 1:3)                                         
ERROR: 1                                                                                                                                                              
Stacktrace:                                                                                                                                                           
 [1] error(s::String)                                                                                                                                                 
   @ Base ./error.jl:35                                                                                                                                               
 [2] (::var"mmtk#3#4")(i::Int64)                                                       
   @ Main ./none:0                                                                 
 [3] iterate                                                                                                                                                          
   @ ./generator.jl:48 [inlined]                                                   
 [4] grow_to!                                                                      
   @ ./dict.jl:124 [inlined]                                                       
 [5] dict_with_eltype                                                              
   @ ./abstractdict.jl:592 [inlined]                                               
 [6] Dict(kv::Base.Generator{UnitRange{Int64}, var"mmtk#3#4"})                                                                                                            
   @ Base ./dict.jl:120                                                                                                                                               
 [7] top-level scope                                                               
   @ REPL[2]:1                                                                                                                                                        
```

The other unrelated change here is that `dest = empty(dest, typeof(k),
typeof(v))` is made conditional, so we do not unconditionally construct
an empty Dict in order to discard it and allocate an exact duplicate of
it, but only do so if inference wasn't precise originally.

Co-authored-by: Curtis Vogt <[email protected]>
qinsoon pushed a commit to qinsoon/julia that referenced this pull request May 2, 2024
For example, we seek to eliminate the gc frame from this function, as
observed here:

```julia
julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end
; Function Signature: var"mmtk#3"(Base.BitSet)
;  @ REPL[1]:1 within `mmtk#3`
define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 {
top:
  call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23
  %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %0, align 8, !tbaa !27
  fence syncscope("singlethread") seq_cst
  %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22
  fence syncscope("singlethread") seq_cst
; ┌ @ Base.jl:49 within `getproperty`
   %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42
; └
; ┌ @ gcutils.jl:253 within `safepoint`
   %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23
   %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43
   %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27
   fence syncscope("singlethread") seq_cst, !dbg !43
   %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43
   fence syncscope("singlethread") seq_cst, !dbg !43
; └
; ┌ @ essentials.jl:892 within `getindex`
   %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53
   %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58
   ret i64 %5, !dbg !46
; └
}
```
udesou pushed a commit to udesou/julia that referenced this pull request Aug 19, 2024
The functions `toms`, `tons`, and `days` uses `sum` over a vector of
`Period`s to obtain the conversion of a `CompoundPeriod`. However, the
compiler cannot infer the return type because those functions can return
either `Int` or `Float` depending on the type of the `Period`. This PR
forces the result of those functions to be `Float64`, fixing the type
stability.

Before this PR we had:

```julia
julia> using Dates

julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1)
1 year, 1 minute, 1 second

julia> @code_warntype Dates.tons(p)
MethodInstance for Dates.tons(::Dates.CompoundPeriod)
  from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458
Arguments
  #self#::Core.Const(Dates.tons)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto mmtk#3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.tons::Core.Const(Dates.tons)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11


julia> @code_warntype Dates.toms(p)
MethodInstance for Dates.toms(::Dates.CompoundPeriod)
  from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454
Arguments
  #self#::Core.Const(Dates.toms)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto mmtk#3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.toms::Core.Const(Dates.toms)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11


julia> @code_warntype Dates.days(p)
MethodInstance for Dates.days(::Dates.CompoundPeriod)
  from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468
Arguments
  #self#::Core.Const(Dates.days)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto mmtk#3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.days::Core.Const(Dates.days)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11
```

After this PR we have:

```julia
julia> using Dates

julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1)
1 year, 1 minute, 1 second

julia> @code_warntype Dates.tons(p)
MethodInstance for Dates.tons(::Dates.CompoundPeriod)
  from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458
Arguments
  #self#::Core.Const(Dates.tons)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto mmtk#3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.tons::Core.Const(Dates.tons)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13


julia> @code_warntype Dates.toms(p)
MethodInstance for Dates.toms(::Dates.CompoundPeriod)
  from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454
Arguments
  #self#::Core.Const(Dates.toms)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto mmtk#3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.toms::Core.Const(Dates.toms)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13


julia> @code_warntype Dates.days(p)
MethodInstance for Dates.days(::Dates.CompoundPeriod)
  from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468
Arguments
  #self#::Core.Const(Dates.days)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto mmtk#3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.days::Core.Const(Dates.days)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13
```
udesou pushed a commit to udesou/julia that referenced this pull request Oct 16, 2024
Rebase and extension of @alexfanqi's initial work on porting Julia to
RISC-V. Requires LLVM 19.

Tested on a VisionFive2, built with:

```make
MARCH := rv64gc_zba_zbb
MCPU := sifive-u74

USE_BINARYBUILDER:=0

DEPS_GIT = llvm
override LLVM_VER=19.1.1
override LLVM_BRANCH=julia-release/19.x
override LLVM_SHA1=julia-release/19.x
```

```julia-repl
❯ ./julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-DEV.1374 (2024-10-14)
 _/ |\__'_|_|_|\__'_|  |  riscv/25092a3982* (fork: 1 commits, 0 days)
|__/                   |

julia> versioninfo(; verbose=true)
Julia Version 1.12.0-DEV.1374
Commit 25092a3* (2024-10-14 09:57 UTC)
Platform Info:
  OS: Linux (riscv64-unknown-linux-gnu)
  uname: Linux 6.11.3-1-riscv64 #1 SMP Debian 6.11.3-1 (2024-10-10) riscv64 unknown
  CPU: unknown:
              speed         user         nice          sys         idle          irq
       #1  1500 MHz        922 s          0 s        265 s     160953 s          0 s
       mmtk#2  1500 MHz        457 s          0 s        280 s     161521 s          0 s
       mmtk#3  1500 MHz        452 s          0 s        270 s     160911 s          0 s
       mmtk#4  1500 MHz        638 s         15 s        301 s     161340 s          0 s
  Memory: 7.760246276855469 GB (7474.08203125 MB free)
  Uptime: 16260.13 sec
  Load Avg:  0.25  0.23  0.1
  WORD_SIZE: 64
  LLVM: libLLVM-19.1.1 (ORCJIT, sifive-u74)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  HOME = /home/tim
  PATH = /home/tim/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/games
  TERM = xterm-256color


julia> ccall(:jl_dump_host_cpu, Nothing, ())
CPU: sifive-u74
Features: +zbb,+d,+i,+f,+c,+a,+zba,+m,-zvbc,-zksed,-zvfhmin,-zbkc,-zkne,-zksh,-zfh,-zfhmin,-zknh,-v,-zihintpause,-zicboz,-zbs,-zvknha,-zvksed,-zfa,-ztso,-zbc,-zvknhb,-zihintntl,-zknd,-zvbb,-zbkx,-zkt,-zvkt,-zicond,-zvksh,-zvfh,-zvkg,-zvkb,-zbkb,-zvkned


julia> @code_native debuginfo=:none 1+2.
	.text
	.attribute	4, 16
	.attribute	5, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0_zba1p0_zbb1p0"
	.file	"+"
	.globl	"julia_+_3003"
	.p2align	1
	.type	"julia_+_3003",@function
"julia_+_3003":
	addi	sp, sp, -16
	sd	ra, 8(sp)
	sd	s0, 0(sp)
	addi	s0, sp, 16
	fcvt.d.l	fa5, a0
	ld	ra, 8(sp)
	ld	s0, 0(sp)
	fadd.d	fa0, fa5, fa0
	addi	sp, sp, 16
	ret
.Lfunc_end0:
	.size	"julia_+_3003", .Lfunc_end0-"julia_+_3003"

	.type	".L+Core.Float64#3005",@object
	.section	.data.rel.ro,"aw",@progbits
	.p2align	3, 0x0
".L+Core.Float64#3005":
	.quad	".L+Core.Float64#3005.jit"
	.size	".L+Core.Float64#3005", 8

.set ".L+Core.Float64#3005.jit", 272467692544
	.size	".L+Core.Float64#3005.jit", 8
	.section	".note.GNU-stack","",@progbits
```

Lots of bugs guaranteed, but with this we at least have a functional
build and REPL for further development by whoever is interested.

Also requires Linux 6.4+, since the fallback processor detection
used here relies on LLVM's `sys::getHostCPUFeatures`, which for
RISC-V is implemented using hwprobe introduced in 6.4. We could
probably add a fallback that parses `/proc/cpuinfo`, either by building
a CPU database much like how we've done for AArch64, or by parsing the
actual ISA string contained there. That would probably also be a good
place to add support for profiles, which are supposedly the way forward
to package RISC-V binaries. That can happen in follow-up PRs though.
For now, on older kernels, use the `-C` arg to Julia to specify an ISA.

Co-authored-by: Alex Fan <[email protected]>
udesou pushed a commit to udesou/julia that referenced this pull request Oct 16, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto mmtk#3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto mmtk#4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto mmtk#5
5 ──       goto mmtk#6
6 ──       goto mmtk#7
7 ──       goto mmtk#8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto mmtk#9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto mmtk#10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked
as `broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical. In
particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
for limited frames because of latency reasons, which significantly
reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
algorithm requires `:nothrow`-ness on all paths from the allocation of
the mutable struct to its last use, which is not practical for
real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
optimizations such as inserting a `finalize` call after the last use
might still be possible (JuliaLang#55990).
udesou pushed a commit that referenced this pull request Nov 25, 2024
This PR introduces a new, toplevel-only, syntax form `:worldinc` that
semantically represents the effect of raising the current task's world
age to the latest world for the remainder of the current toplevel
evaluation (that context being an entry to `eval` or a module
expression). For detailed motivation on why this is desirable, see
JuliaLang#55145, which I won't repeat here, but the gist is that we never really
defined when world-age increments and worse are inconsistent about it.
This is something we need to figure out now, because the bindings
partition work will make world age even more observable via bindings.

Having created a mechanism for world age increments, the big question is
one of policy, i.e. when should these world age increments be inserted.

Several reasonable options exist:
1. After world-age affecting syntax constructs (as proprosed in JuliaLang#55145)
2. Option 1 + some reasonable additional cases that people rely on
3. Before any top level `call` expression
4. Before any expression at toplevel whatsover

As an example, case, consider `a == a` at toplevel. Depending on the
semantics that could either be the same as in local scope, or each of
the four world age dependent lookups (three binding lookups, one method
lookup) could (potentially) occur in a different world age.

The general tradeoff here is between the risk of exposing the user to
confusing world age errors and our ability to optimize top-level code
(in general, any `:worldinc` statement will require us to fully
pessimize or recompile all following code).

This PR basically implements option 2 with the following semantics:

1. The interpreter explicit raises the world age only at `:worldinc`
exprs or after `:module` exprs.
2. The frontend inserts `:worldinc` after all struct definitions, method
definitions, `using` and `import.
3. The `@eval` macro inserts a worldinc following the call to `eval` if
at toplevel
4. A literal (syntactic) call to `include` gains an implicit `worldinc`.

Of these the fourth is probably the most questionable, but is necessary
to make this non-breaking for most code patterns. Perhaps it would have
been better to make `include` a macro from the beginning (esp because it
already has semantics that look a little like reaching into the calling
module), but that ship has sailed.

Unfortunately, I don't see any good intermediate options between this PR
and option #3 above. I think option #3 is closest to what we have right
now, but if we were to choose it and actually fix the soundness issues,
I expect that we would be destroying all performance of global-scope
code. For this reason, I would like to try to make the version in this
PR work, even if the semantics are a little ugly.

The biggest pattern that this PR does not catch is:
```
eval(:(f() = 1))
f()
```

We could apply the same `include` special case to eval, but given the
existence of `@eval` which allows addressing this at the macro level, I
decided not to. We can decide which way we want to go on this based on
what the package ecosystem looks like.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant