Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: IdDict replaces ObjectIdDict #25210

Merged

Conversation

mauro3
Copy link
Contributor

@mauro3 mauro3 commented Dec 20, 2017

Third time lucky? This is a the third attempt, after #25196 and #24932. After letting Jeff's comment in #25196 sink in, I decided to not wrap ObjectIdDict but to directly furbish its getters and setters with type assertions. This is cleaner than either of my previous attempts. However, this doesn't quite work yet:
the tests run locally but then the julia process does not exit (CPU is back to 0%). I suspect it has something to do with the C-part.

Also, I removed a lot of the nonspecialize stuff. Not sure what the impact of that is.

for (k,v) in pairs; d[k] = v; end
d
end
end
const _IdDict = IdDict{Any,Any} # this is needed to make src/dump.c and src/staticdata.c work
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is iffy. I did this to make it compile, see comment below.

src/dump.c Outdated
@@ -2297,7 +2297,7 @@ JL_DLLEXPORT int jl_save_incremental(const char *fname, jl_array_t *worklist)
htable_new(&backref_table, 5000);
ptrhash_put(&backref_table, jl_main_module, (char*)HT_NOTFOUND + 1);
backref_table_numel = 1;
jl_idtable_type = jl_base_module ? jl_get_global(jl_base_module, jl_symbol("ObjectIdDict")) : NULL;
jl_idtable_type = jl_base_module ? jl_get_global(jl_base_module, jl_symbol("_IdDict")) : NULL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/staticdata.c Outdated
@@ -1205,7 +1205,7 @@ static void jl_save_system_image_to_stream(ios_t *f)
}
}

jl_idtable_type = jl_base_module ? jl_get_global(jl_base_module, jl_symbol("ObjectIdDict")) : NULL;
jl_idtable_type = jl_base_module ? jl_get_global(jl_base_module, jl_symbol("_IdDict")) : NULL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rehashing will now need to be done for all instantiations of IdDict, not just IdDict{Any,Any}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But just replacing it with jl_symbol("IdDict") does not work. Julia compiles fine but the parallel stuff does not work (and thus the test don't run either). On eac2ff7:

julia> using Distributed; addprocs(2)
      From worker 2:    Master process (id 1) could not connect within 60.0 seconds.Worker 2 terminated.
ERROR: 
      From worker 2:    exiting.StackOverflowError:
haskey at ./abstractdict.jl:553 [inlined]
is_root_module at ./loading.jl:371 [inlined]
serialize_mod_names(::Distributed.ClusterSerializer{TCPSocket}, ::Module) at ./serialize.jl:355
serialize_mod_names(::Distributed.ClusterSerializer{TCPSocket}, ::Module) at ./serialize.jl:358 (repeats 97 times)

...and 1 more exception(s).


Stacktrace:ERROR (unhandled task failure): Version read failed. Connection closed by peer.

 [1] sync_end() at ./task.jl:300
 [2] macro expansion at ./task.jl:316 [inlined]
 [3] #addprocs_locked#40(::NamedTuple{(),Tuple{}}, ::Function, ::Distributed.LocalManager) at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Distributed/src/cluster.jl:401
 [4] addprocs_locked at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Distributed/src/cluster.jl:372 [inlined]
 [5] #addprocs#39(::NamedTuple{(),Tuple{}}, ::Function, ::Distributed.LocalManager) at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Distributed/src/cluster.jl:365
 [6] addprocs at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Distributed/src/cluster.jl:361 [inlined]
 [7] #addprocs#268 at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Distributed/src/managers.jl:316 [inlined]
 [8] addprocs(::Int64) at /home/mauro/julia/julia-master/usr/share/julia/site/v0.7/Distributed/src/managers.jl:315
 [9] top-level scope

I think the problem is the re-hashing:

julia> haskey(Base.module_keys, Main) # true about half the time
false

julia> Base.rehash!(Base.module_keys);

julia> haskey(Base.module_keys, Main)
true

IdDict() = IdDict{Any,Any}()
IdDict(kv::Tuple{}) = IdDict()

IdDict(ps::Pair{K,V}...) where {K,V} = IdDict{K,V}(ps)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need all this; we can just default to Any,Any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will make IdDict construction different from Dict construction. Is that desired?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already is; previously all ObjectIdDicts were AbstractDict{Any,Any}s. ObjectIdDicts are also often heterogeneous, and I don't see a need to break those use cases (i.e. where different-typed keys are added to a dict after construction).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make Julia more consistent (c.f. #20402) to have IdDict constructors behave like Dict constructors (and also like Arrays, and presumably many others which try to guess a tight type parameter according to the arguments). Also note that of the uses in Base, only one would need updating toIdDict{Any,Any}(...).

That would also make a possible future transition to a more unified dict-backend along the lines of #24932 easier.

end

get(t::ObjectIdDict, @nospecialize(key), @nospecialize(default)) =
ccall(:jl_eqtable_get, Any, (Any, Any, Any), t.ht, key, default)
function get(d::IdDict{K,V}, key::K, default) where {K, V}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key doesn't need to be of type K, since it's ok for it not to be found here. The @nospecialize annotations can also probably be retained.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so you'd prefer a KeyError to a MethodError. Would it then also make sense to throw a KeyError for setindex!?

Also, should I retain all @nospecialize or just the one you mentioned here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is similar enough to the ArgumentError("$key0 is not a valid key for type $K") that Dict throws that we should use the same error here?

And yes, we should keep all the @nospecialize. The goal of this should just be to allow type parameters in ObjectIdDict with the minimum changes.

v = ccall(:jl_eqtable_get, Any, (Any, Any, Any), d.ht, key, default)
v===default ? default : v::V
end
function getindex(d::IdDict{K,V}, key::K) where {K, V}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, there should be a KeyError here if the key is not of type K.

# TODO: this can underestimate `ndel`
val === default || (t.ndel += 1)
return val
if val==default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=== ?

@JeffBezanson
Copy link
Member

👍 Yes I think this is the right approach.

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 20, 2017

Thanks for the comments. However, I'm stuck on #25210 (comment).

IdDict{K,V}(d::IdDict{K,V}) where {K, V} = new{K,V}(copy(d.ht))
function IdDict{K,V}(d::IdDict{K}) where {K, V}
d = IdDict{K,V}()
sizehint!(d, length(pairs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems wrong. Is it a mistaken copy-paste of the second method above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks, the pairs is wrong. But the method itself is needed:

My thinking here was to provide a method to allow something like:

di = IdDict{Symbol,Int}(:A=>1)
IdDict{Symbol,Float64}(di)

the loop doing the setindex! will call convert on the integer values of di.

I'll fix it and add a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it is not needed after all as it will call into the general function IdDict{K,V}(itr) ... method. Thanks!

@mauro3 mauro3 force-pushed the m3/IdDict-replaces-ObjectIdDict branch from eac2ff7 to a509428 Compare December 21, 2017 22:23
@mauro3
Copy link
Contributor Author

mauro3 commented Dec 21, 2017

I can spend some more time on this tomorrow to try and hunt down the right invocation in dump.c and staticdata.c to make automatic re-hashing work again. See #25210 (comment) for details. I suspect that the problem is that IdDict{K,V} is not a symbol anymore, so jl_symbol("IdDict") doesn't catch it (nor jl_symbol("IdDict{K,V}")).

However, I have no clue on how to make this work so it will be blind trial and error. If someone finds the time to point me in the right direction, that would be much appreciated.

@mauro3 mauro3 force-pushed the m3/IdDict-replaces-ObjectIdDict branch from a509428 to c50378f Compare December 22, 2017 15:39
@mauro3
Copy link
Contributor Author

mauro3 commented Dec 22, 2017

After an educational morning digging through the C-part of Julia, I found a way to make it work 🎆. The equality test needed here (and also in dump.c) had to be converted into a subtype test. I did not find a function for this so I took inspiration from, e.g., array subtyping tests here. The test suite now passes without a hickup (barring any rebase hickups before the push), but definitely needs checking by someone who groks this.

Other things I did:

  • I added all @nospecialize annotations back in (and added them get! also, as it seems inconsistent that it should be lacking them).
  • I added the normal dict-like constructors back in as they make Julia more consistent (IMO), here my argument. But I can just remove the last two commits, to revert to good old ObjectIdDict-days.

@mauro3
Copy link
Contributor Author

mauro3 commented Dec 25, 2017

I'll be away until 3 Jan. In the unlikely event that this acquires urgency, feel free modify this PR.

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 16, 2018

This still needs a review, in particular the C-part.

@JeffBezanson JeffBezanson self-assigned this Jan 16, 2018
pop!(t, key, secret_table_token)
t
function delete!(d::IdDict{K}, @nospecialize(key)) where K
!isa(key, K) && throw(KeyError(key))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any other methods of delete! throw KeyErrors.

@JeffBezanson
Copy link
Member

This looks great, let's do it!

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 17, 2018

Thanks Jeff. I removed the KeyError from delete!, rebased and squashed this into one commit. I also added two NEWS.md entries.

So this is ready to be merged when CI passes, unless it is deemed necessary to run nanosoldier on this, although I think there are no direct ObjectIdDict tests in there. I have re-run some primitive performance tests in https://gist.github.com/mauro3/ada59b1fd9a09202631edccf255c4e1b#gistcomment-2323560, and find no regression.

- also updated constuctors to be in-line with Dict
- added some extra tests
@mauro3 mauro3 force-pushed the m3/IdDict-replaces-ObjectIdDict branch from bca7dd7 to 56a5133 Compare January 17, 2018 11:46
@mauro3
Copy link
Contributor Author

mauro3 commented Jan 17, 2018

Looks like the one failure on AppVeyor was a time-out (finished at 2h exactly).

@JeffBezanson
Copy link
Member

Yes that's happening on basically every build lately.

@JeffBezanson JeffBezanson merged commit 825ad78 into JuliaLang:master Jan 17, 2018
@StefanKarpinski
Copy link
Member

The doc builds take so long these days that AppVeyor times out all the time. It's a pretty urgent problem at the moment. cc @mortenpi: it would be great to get a fix – any fix for this merged asap.

@JeffBezanson
Copy link
Member

Thinking about it some more, and while updating some code to use this, I wonder if all IdDict(...) constructors should make an IdDict{Any,Any} (instead of deriving type parameters from their argument types like Dict). The advantage is that people can use IdDict as a drop-in replacement for ObjectIdDict in more cases, and we avoid extra specialization that's likely not too valuable (since there's no specialization on storage).

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 18, 2018

I'd like to argue against this once more (x-ref). Most other datatypes, e.g. Array, Set, Ref, etc. infer their type parameters based on input to the constructors (are there exceptions?). Having IdDict behave differently is a subtle gotcha, in particular with its cousin Dict(and other AbstractArrays) behaving normally. Also note that in this PR, there was only one instance where I had to add a {Any,Any}, so it may not happen that often. So, in my opinion, having a consistent API wins hands down over the slight convenience of not having to type {Any,Any} once in a while.

and we avoid extra specialization that's likely not too valuable (since there's no specialization on storage).

One reason to have the type parameters is to make the functions using an IdDict type stable, which happens even if the underlying storage is not.

Edit: would it make sense to discuss this over in #20402?

Either way, the decision should be taken now. Let me know if you want to change it and I can prepare a PR.

@StefanKarpinski
Copy link
Member

The consistency argument is pretty compelling, especially given that initializing an IdDict with values to specialize on at all is a bit strange – and if you do, perhaps you want the tighter type.

@JeffBezanson
Copy link
Member

Fair enough; it's ok to keep it as-is.

Keno pushed a commit that referenced this pull request Jun 5, 2024
- also updated constuctors to be in-line with Dict
- added some extra tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants