Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ones(), zeros(), cat() change key type of name dictionary #60

Open
scheidan opened this issue Jan 30, 2018 · 10 comments
Open

ones(), zeros(), cat() change key type of name dictionary #60

scheidan opened this issue Jan 30, 2018 · 10 comments

Comments

@scheidan
Copy link
Contributor

A very useful package, thanks a lot for your work!

Some functions seem to change the type of the keys of the name dictionaries to Any. For example:

n = NamedArray(rand(2,4))

oo = ones(n)
zz = zeros(n)

nh = hcat(n, n)
nn = cat(2, n, n)

keytype.(n.dicts)               # (String, String)

keytype.(oo.dicts)              # (Any, Any)
keytype.(zz.dicts)              # (Any, Any)

keytype.(nh.dicts)              # (String, String), good!
keytype.(nn.dicts)              # (Any, Any)

Maybe it is related to #58 but I hope it is still useful to report.

As workaround for zeros and ones I use currently this:

0.0*n       # zeros()
(0.0*n + 1) # ones()
@davidavdav
Copy link
Owner

I'm not sure if it is related to #58, but it definitely is something I wasn't aware of an that needs some attention. The type stability issue still hasn't been resolved, it is a pretty tough one.

@davidavdav
Copy link
Owner

It looks like the culprit is similar(n::NamedArray), which has to build the index dicts. It doesn't know the key types in advance, since the dimensions of the similar array may be different from n, in which the default type (String) is to be generated.

In that sense it is related to #58, because the current declaration

dicts = Array{OrderedDict{Any,Int}}(nd)

helps to fix #58, but causes this #60.

So far the conclusion is that I am not a very good type stability programmer.

@nalimilan
Copy link
Contributor

It should be much easier to fix than #58, though. You can just define a Base.similar{T,N}(n::NamedArray{T,N}, t::Type) method which always uses the same dimension types.

@davidavdav
Copy link
Owner

Thanks, that idea crossed my mind, too. That is probably easiest.

@davidavdav
Copy link
Owner

davidavdav commented Jan 31, 2018

The zeros and ones, are fixed now, but cat c.s. is non-trivial. Consider

julia> n = NamedArrays.NamedArray(rand(3), [DataStructures.OrderedDict(:a => 1, :b =>2, :c =>3)])
3-element Named Array{Float64,1}
A  │ 
───┼──────────
:a0.0794262
:b0.28769
:c0.123761

Both hcat(n, n) and vcat(n, n) have to re-invent names. This make the keytype go from Symbol to String in the case of vcat. Both situations break #58.

@nalimilan
Copy link
Contributor

What do you mean by "reinvent names"? Why couldn't Symbol names be created?

@davidavdav
Copy link
Owner

So currently, a vcat(n, n) generates names "1" ..."6" for dim A in the case above. Of course in this particular case they could have been named Symbol("1") etc., to keep the keytype the same, but I don't really see the point there. The keys along the dimension that is concatenated can be of any type, and only in the case the names of the constituent arrays don't intersect the types and names could carry on to the concatenated array. Currently there is no check for that case---but I don't even want to start imagining the trouble needed to do that checking and joining in typestable code.

@nalimilan
Copy link
Contributor

I would have expected the names generated by vcat(n, n) to be e.g. [:a, :b, :c, :a1, :b1, :c1] or something similar. Anyway the type should be preserved, and if types differ promote should be used to choose a common type.

BTW, when writing inferrable code is too hard, generated function can help, since they allow doing whatever complex operations you need to compute manually the return type based on the input types, and then the compiler doesn't have to guess that type.

@davidavdav
Copy link
Owner

The original names can be anything, not just symbol or string, so I don't see how a general sensible naming scheme can be devised. promote_type(String, Symbol) is Any, I can't imagine this is going to help a lot.

@nalimilan
Copy link
Contributor

Yes, Any would be used in the worst case, but then you can't do better anyway and users should take care of using compatible names (at least if they care about performance).

Then indeed for custom name types it might not be possible to create unique names as for strings, but that doesn't sound like a sufficient reason to avoid reusing names when possible: better optimize for the most common case where either names are unique, or they are strings and can be made unique. Maybe just throw an error when duplicated names are found and they are neither strings nor symbols?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants