Skip to content

Support different subtypes of AbstractDict in merge()/mergewith()#61329

Closed
JamesWrigley wants to merge 1 commit intoJuliaLang:masterfrom
JamesWrigley:dict-merge
Closed

Support different subtypes of AbstractDict in merge()/mergewith()#61329
JamesWrigley wants to merge 1 commit intoJuliaLang:masterfrom
JamesWrigley:dict-merge

Conversation

@JamesWrigley
Copy link
Copy Markdown
Member

The merge functions internally use the _typeddict() helper function, which returns a new Dict with the correct promoted key/value types. But the fact that it always returns a Dict means that the default implementations of merge()/mergewith() cannot be used for new AbstractDict subtypes that wish those functions to return their own custom type.

For example, OrderedDict: https://github.com/JuliaCollections/OrderedCollections.jl/blob/1eaa98084bd5c6e1f6519527c8f9e856a2fd2047/src/ordered_dict.jl#L488
And on 1.13 nightly that causes a ton of invalidations when loaded through CurveFit.jl:

julia> using SnoopCompileCore

julia> invs, trees = begin
           invs = @snoop_invalidations using CurveFit
           using SnoopCompile, AbstractTrees
           trees = invalidation_trees(invs)
           invs, trees
       end;

julia> trees[end]
inserting merge(d::OrderedCollections.OrderedDict, others::AbstractDict...) @ OrderedCollections ~/.julia/dev/OrderedCollections/src/ordered_dict.jl:488 invalidated:
   mt_backedges: <38 invalidations I deleted because they have enormous types>
   backedges: 1: superseding merge(d::AbstractDict, others::AbstractDict...) @ Base abstractdict.jl:359 with MethodInstance for merge(::AbstractDict, ::Base.Pairs{Symbol, Union{}, Nothing, @NamedTuple{}}) (21 children)
              2: superseding merge(d::AbstractDict, others::AbstractDict...) @ Base abstractdict.jl:359 with MethodInstance for merge(::AbstractDict, ::Base.Pairs{Symbol, _A, Nothing, A} where {_A, A<:NamedTuple}) (2489 children)

By allowing the Base functions to work with other dict types we could delete the methods from OrderedCollections and fix the invalidations. But, this is definitely breaking. A non-breaking alternative would be to just set max_methods for merge()/mergewith() to 1. I think the arguments for doing it anyway are:

  1. Intuitively, merge(::MyDict, ::MyDict) -> MyDict is the behaviour I would expect with a custom dict. It's odd that this is supported for any AbstractDict but always returns a Dict, which is more like the behaviour of collect().
  2. The docstrings for merge()/mergewith() do not specify the type of the return value so it's kinda-sorta legal to change it. Admittedly this is a weak argument, we would probably need to do a PkgEval to see how breaking it actually is.
  3. Supporting other dict types in Base allows more code reuse. Other packages currently need to implement their own type-promotion for keys/values to implement merge()/mergewith() properly.

Hardcoding Dict was done years ago, I suspect unintentionally: ebbf1dd
Calling T.name.wrapper to get the original type is to preserve this optimization: #22737

(CC @cstjean)

The merge functions internally use the _typeddict() helper function, which
returns a new Dict with the correct promoted key/value types. But the fact that
it always returns a Dict means that the default implementations of
merge()/mergewith() cannot be used for new AbstractDict subtypes that wish
those functions to return their own custom type.
K = promoteK(keytype(d), others...)
V = promoteV(valtype(d), others...)
Dict{K,V}(d)
T.name.wrapper{K, V}(d)
Copy link
Copy Markdown
Member

@vtjnash vtjnash Mar 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite dangerous, since T.name is not a stable result and name.wrapper is not a public API and the result of apply_type {K, V} is user-defined, and does not make a dictionary of K => V

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find another way of doing it, is there a better way?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...the result of apply_type {K, V} is user-defined, and does not make a dictionary of K => V

Do you mean if a constructor returns something other than its struct type? I would consider that a bug in user code. We can assume that AbstractDict subtypes have at least two type parameters since that's part of the definition:

abstract type AbstractDict{K,V} end

FWIW, an example of having more than two type parameters is SortedDict: https://github.com/JuliaCollections/DataStructures.jl/blob/06eb26e60e9d8dd01dd5f416722994a783ba5ae8/src/sorted_dict.jl#L50

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm though I see SortedDict also does some optimizations for constructors and would likely need to keep its custom merge functions anyway, which is a point in favour of just setting max_methods to 1 🤔

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that is a faulty assumption. Many subtypes do not have parameters, or have more or have them in other orders

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so I guess the requirement is that they have a constructor that does take in key/value type parameters. That is more invasive than I thought :( I still think it would be an improvement but I'm not sure if it's too breaking.

Copy link
Copy Markdown
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this is spelled empty(d, K, V)?

@JamesWrigley
Copy link
Copy Markdown
Member Author

IIRC, this is spelled empty(d, K, V)?

That is indeed what I initially did, but that would require an extra merge!() call, which would hurt performance: #22737

oscardssmith pushed a commit that referenced this pull request Mar 25, 2026
Should prevent these invalidations seen when loading CurveFit.jl on
1.13:
```julia
inserting merge!(m::DataStructures.SortedDict{K, D, Ord}, others::AbstractDict{K, D}...) where {K, D, Ord<:Ordering} @ DataStructures ~/.julia/packages/DataStructures/qUuAY/src/sorted_dict.jl:473 invalidated:
   mt_backedges: 1: signature Tuple{typeof(merge!), Any, AbstractDict} triggered MethodInstance for Base.grow_to!(::AbstractDict{K, V}, ::Base.Generator{_A, Base.Compiler.IRShow.var"#30#31"} where _A, ::Any) where {K, V} (0 children)
                 2: signature Tuple{typeof(merge!), Any, AbstractDict} triggered MethodInstance for Base.grow_to!(::AbstractDict{K, V}, ::Base.Generator, ::Any) where {K, V} (0 children)
                 3: signature Tuple{typeof(merge!), Any, AbstractDict} triggered MethodInstance for Base.grow_to!(::AbstractDict{K, V}, ::Base.Generator, ::Any) where {K, V} (0 children)
                 4: signature Tuple{typeof(merge!), Any, AbstractDict} triggered MethodInstance for Base.grow_to!(::AbstractDict{K, V}, ::Base.Generator{I, Base.var"#Generator##0#Generator##1"{NonlinearSolveBase.Utils.var"#norm_op##2#norm_op##3"{typeof(+)}}} where I<:(Base.Iterators.Zip{Is} where Is<:Tuple{Base.AbstractBroadcasted, Base.AbstractBroadcasted}), ::Tuple{Tuple{Any, Any}, Tuple{Any, Any}}) where {K, V} (0 children)
                 5: signature Tuple{typeof(merge!), Any, AbstractDict} triggered MethodInstance for Base.grow_to!(::AbstractDict{K, V}, ::Base.Generator{_A, F} where {_A, F<:DifferentiationInterface.var"#_prepare_jacobian_aux##0#_prepare_jacobian_aux##1"}, ::Any) where {K, V} (0 children)
                 6: signature Tuple{typeof(merge!), Any, AbstractDict} triggered MethodInstance for Base.grow_to!(::AbstractDict{K, V}, ::Base.Generator{_A, F} where {_A, F<:DifferentiationInterface.var"#_prepare_jacobian_aux##12#_prepare_jacobian_aux##13"}, ::Any) where {K, V} (0 children)
                 7: signature Tuple{typeof(merge!), Any, AbstractDict} triggered MethodInstance for Base.grow_to!(::AbstractDict{K, V}, ::Base.Generator{I, Base.var"#Generator##0#Generator##1"{NonlinearSolveBase.Utils.var"#norm_op##2#norm_op##3"{typeof(+)}}} where I<:(Base.Iterators.Zip{Is} where Is<:Tuple{Base.AbstractBroadcasted, Vector{Float64}}), ::Tuple{Tuple{Any, Any}, Int64}) where {K, V} (0 children)
   backedges: 1: superseding merge!(d::AbstractDict, others::AbstractDict...) @ Base abstractdict.jl:224 with MethodInstance for merge!(::AbstractDict, ::AbstractDict) (1126 children)

inserting merge(d::OrderedCollections.OrderedDict, others::AbstractDict...) @ OrderedCollections ~/.julia/dev/OrderedCollections/src/ordered_dict.jl:488 invalidated:
   mt_backedges: <38 invalidations I deleted because they have enormous types>
   backedges: 1: superseding merge(d::AbstractDict, others::AbstractDict...) @ Base abstractdict.jl:359 with MethodInstance for merge(::AbstractDict, ::Base.Pairs{Symbol, Union{}, Nothing, @NamedTuple{}}) (21 children)
              2: superseding merge(d::AbstractDict, others::AbstractDict...) @ Base abstractdict.jl:359 with MethodInstance for merge(::AbstractDict, ::Base.Pairs{Symbol, _A, Nothing, A} where {_A, A<:NamedTuple}) (2489 children)
```

Partial alternative to #61329. I added `mergewith()` and `mergewith!()`
to be on the safe side.
@JamesWrigley
Copy link
Copy Markdown
Member Author

I'll close this since the invalidations issue is fixed by #61330.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants