Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add eachindex() for AbstractString and Associative #11708

Merged
merged 1 commit into from
Jun 21, 2015
Merged

Conversation

nalimilan
Copy link
Member

Fixes #11649 which was only about strings, but in the end I didn't see the point in restricting this to strings: it could be useful for any other type which would use unevenly spaced or non-integer indices. Actually, it's very similar to Enumerate, but returning the actual indices which can be passed to getindex (while Enumerate returns the element number), and of course not returning the value.

The potential issue with this very general approach is that I'm using next instead of nextind, which is wasteful since we don't use the value. We could restore nextind, which is currently deprecated for non-strings, if we want to keep this general approach. But in my tests using nex or nextind doesn't make any difference for strings (see below).

Either way, it's not very efficient. Calling eachindex and indexing the string is about twice slower as iterating over it directly for a UTF8String. Indeed, it appears that next is not inlined in either case. Is that expected?

Another funny result is that with an ASCIIString, using nextind in eachindex is actually faster than iterating over the string. This is because next is inlined in that case, but not when iterating. I'm not sure why, but this seems to indicate something is wrong in the standard behavior.

julia> function f(s)
           [c for c in s]
       end
f (generic function with 1 method)

julia> function g(s)
           [s[i] for i in eachindex(s)]
       end
g (generic function with 1 method)

julia> s = "some string but quite long to make the test interesting €"
"some string but quite long to make the test interesting €"

julia> f(s); g(s);

julia> @time for i in 1:10000000; f(s); end
   7.724 seconds      (10000 k allocations: 3204 MB, 1.48% gc time)

julia> @time for i in 1:10000000; g(s); end
  15.677 seconds      (20000 k allocations: 3357 MB, 1.21% gc time)

And with nextind:

julia> Base.next(e::EachIndex, state) = (state, nextind(e.itr, state))
next (generic function with 60 methods)

julia> function g(s)
           [s[i] for i in eachindex(s)]
       end
g (generic function with 1 method)

julia> g(s);

julia> @time for i in 1:10000000; g(s); end
  16.803 seconds      (20000 k allocations: 3357 MB, 1.13% gc time)

@JeffBezanson
Copy link
Member

I'm not sure we want a definition as generic as this. The second value returned by next is not necessarily an index.

@nalimilan
Copy link
Member Author

@JeffBezanson Right. So would you prefer a definition just for strings, or extending nextind for all iterables?

@nalimilan
Copy link
Member Author

Bump!

@mbauman
Copy link
Member

mbauman commented Jun 18, 2015

I'm not convinced that this should be defined for all iterables. This is currently wrong for lots of base iterables — dict, set, task, zip, …

I think that this is something that non-array types should opt into themselves on a case-by-case basis.

@nalimilan
Copy link
Member Author

OK, I'll change it to be specific to strings.

@nalimilan
Copy link
Member Author

Don't you think there should also be a (different) definition for Dicts too? After all, they can be indexed too: you just need to iterate over keys.

@mbauman
Copy link
Member

mbauman commented Jun 18, 2015

Don't you think there should also be a (different) definition for Dicts too

Yes, possibly. I'm trying to think of use-cases where it'd be useful to code in the general instead of just using keys. Really, it'd be more comparable to a hypothetical each_stored_index for sparse matrices.

@nalimilan nalimilan changed the title Add eachindex() fallback for arbitrary iterables Add eachindex() for AbstractString and Associative Jun 19, 2015
@nalimilan
Copy link
Member Author

I've updated the PR. Actually, eachindex for Associative can be exactly the same as keys, which is cool since it avoids duplication.

@mbauman
Copy link
Member

mbauman commented Jun 20, 2015

👍 LGTM.

@nalimilan
Copy link
Member Author

Thanks! I'll merge this today if nobody objects.

@kmsquire
Copy link
Member

Are there thoughts about the fact that eachindex(d::Dict) and keys(d::Dict) are the same function? On one hand, I like the fact that eachindex is consistent across (some) indexible containers. On the other, I don't know if two functions with different names should do exactly the same thing.

@ScottPJones
Copy link
Contributor

It doesn't bother me, really... length and sizeof do the same thing for ASCIIString, for example, but for other types, they don't.

@nalimilan
Copy link
Member Author

Yeah. I wondered whether deprecating keys in favor of eachindex would make sense, but keys sounds a much nicer name for working with dicts. As dicts are only a corner case for eachindex, I don't think the duplication is really an issue.

@nalimilan
Copy link
Member Author

Thanks to @mbauman's new docs about interfaces, I realized it would be useful to add eltype(::EachStringIndex) too. I've updated the PR, and will merge it if tests pass.

There's no reason not to allow this. Fixes #11649.
nalimilan added a commit that referenced this pull request Jun 21, 2015
Add eachindex() for AbstractString and Associative
@nalimilan nalimilan merged commit a94da23 into master Jun 21, 2015
@nalimilan nalimilan deleted the nl/eachindex branch June 21, 2015 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants