Unifying search & find functions #10593
Ah, just found #5664. Keeping this issue open because of the long summary above. There's also #7327, which is about finding maximum and minimum. It could be more logical to change … See the spreadsheet at https://docs.google.com/spreadsheets/d/1ZLnlYQyRIWa50-mxOmKHNCaJEzGShLVSEkfid2KA-Ps/edit#gid=0
Any comments?
+1 to the general unification of these APIs. If there is general consensus, we should also do it immediately.
I'm a little disappointed this hasn't garnered any attention. I'd love to see these functions cleaned up, too… I often have difficulty remembering which function to use if it's been a little while. I'm afraid that nobody has dared comment simply because the scope here is potentially huge. Some thoughts:
In general, though, I think we should start from scratch and define what we want the …
As a practical matter, we simply cannot fold all these behaviors into one name, so perhaps that is the significant difference between `search` and `find`?

```julia
find(A)            # Indices of nonzero elements
find(predicate, A) # would prefer to duck-type predicate and just call it
find(A, values)    # like findin; would prefer to duck-type values and just iterate over them
find(A, value)     # somewhat like search (but all at once); just want to check equality
```
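For comparison, here is a minimal sketch of how these four all-at-once behaviors can be written in Julia 1.x, assuming the `findall` name that later took over the all-at-once role of `find`, plus the one-argument curried forms `in(values)` and `==(value)`. It is an illustration of the idea, not the API that was being proposed in this comment.

```julia
A = [0, 3, 0, 5, 3]
B = A .!= 0                 # Bool array marking the nonzero entries

# find(A): indices of nonzero elements (Julia 1.x wants Bools or a predicate)
findall(B)                  # [2, 4, 5]
findall(!iszero, A)         # same result, via a negated predicate

# find(predicate, A)
findall(x -> x > 3, A)      # [4]

# find(A, values), i.e. the old findin
findall(in([3, 5]), A)      # [2, 4, 5]

# find(A, value): plain equality test
findall(==(3), A)           # [2, 5]
```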
In terms of orthogonal design elements, we currently have a jumbled mishmash of combinations of:
Thanks, that's a useful way of summarizing the requirements.
It took me a while to get there, but I think that's the most useful way to think about this. Then the question is simply what names we want to give to those capabilities. Here's one terribly disruptive possibility:
The … (I don't really like this because it gives a pretty limited meaning to the very nice name …)
Nice table. We may be able to merge … But I'm not a fan of the …
or …
I went with this because I find combinations of more than two words pretty unreadable, and I went with suffixes for the kind of searching operation. This would result in things like … I think we could only unify the …
We could call them all `find`:

```julia
find(A, start::Int, rev::Bool=false) # or dir::Order.Ordering=Order.Forward
```

Edit: this needs more explanation: without the start index, it's an all-at-once operation: …
Way clearer than a Boolean. A Boolean would be something I'd need to check the docs for every time I encountered it.
That's what I had initially, but `Base.Order` and `Order.Forward` are both unexported… and so I changed it to match the sorting API. That's a minor issue, though. I'm not sure I like having both iterative and all-at-once behaviors under the same name.
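To make the `Ordering` option concrete, here is a minimal sketch assuming Julia 1.x. `findfrom` is a hypothetical name used only for illustration (it is not part of Base); it dispatches on `Base.Order.ForwardOrdering` and `Base.Order.ReverseOrdering` and delegates to the existing `findnext` and `findprev`. Because `Base.Order` is unexported, the ordering values have to be imported or qualified explicitly, which is the inconvenience mentioned above.

```julia
using Base.Order: ForwardOrdering, ReverseOrdering, Forward, Reverse

# Hypothetical helper (not in Base): one name for iterative search, with the
# direction chosen by an Ordering value rather than a rev::Bool flag.
findfrom(pred, A, start::Integer, dir::ForwardOrdering = Forward) = findnext(pred, A, start)
findfrom(pred, A, start::Integer, dir::ReverseOrdering) = findprev(pred, A, start)

A = [1, 4, 3, 8, 5]
findfrom(iseven, A, 3)           # 4: first even element at or after index 3
findfrom(iseven, A, 3, Reverse)  # 2: first even element at or before index 3
```

At the call site, `Reverse` states the direction directly, whereas a bare `true` would need a trip to the docs.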
@mbauman, I really like your distinction between the operation and how to operate ("iteratively" vs. "all-at-once"). I would suggest that the "all-at-once" operations are actually closer to filtering, though, and these seem like different enough concepts from the "iterative" operations to warrant a distinct name. I actually kind of liked your first … (Also: referring to the table for …)
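The "closer to filtering" point can be seen in a small sketch, assuming Julia 1.x: the same predicate drives both `findall` and `filter`, and the only difference is whether you get back the positions of the matches or the matching values themselves.

```julia
A = [3, 8, 1, 6]

idx  = findall(iseven, A)   # [2, 4]: where the matches are
vals = filter(iseven, A)    # [8, 6]: what the matches are

# Indexing with the found positions recovers the filtered values
A[idx] == vals              # true
```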
@kmsquire As I understand it, the outcome of the threads you link to is that there's no clear distinction between "find" and "search", except that the latter insists on the process and may not return anything (but in our case, …). Otherwise, I find the idea of merging forward/reverse search appealing, but it can get quite confusing, as the start index is not in the same position in …
I think the distinction at least makes some sense if you describe the two cases as e.g.
"Find all matches"
and
"Search for the first match after the given index"
where the first one will indeed always find all matches (though there may be zero of them), but the other one might not find the next match if there is none.
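A minimal sketch of the two cases, assuming Julia 1.x, where they ended up as `findall` and `findnext`. `findall` always returns a vector (possibly empty), while `findnext`, which searches at or after the given index, returns `nothing` when there is no further match.

```julia
A = [0, 7, 0, 7, 0]

# "Find all matches": always succeeds, though the result may be empty
findall(==(7), A)       # [2, 4]
findall(==(9), A)       # empty vector

# "Search for the first match at or after the given index": may find nothing
findnext(==(7), A, 3)   # 4
findnext(==(7), A, 5)   # nothing
```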
How about this plan, in which everything would be called …
This is essentially @mbauman's last table from #10593 (comment), except that the boolean is replaced with … I can have a look at a PR if you agree.
These seem unrelated, but they're actually linked:
* If you reverse generic strings by wrapping them in `RevString`, then this generic `reverseind` is incorrect.
* In order to have a correct generic `reverseind`, one needs to assume that `reverse(s)` returns a string of the same type and encoding as `s` with code points in reverse order; one also needs to assume that the code units encoding each character remain the same when reversed. This is a valid assumption for UTF-8, UTF-16 and (trivially) UTF-32.

Reverse string search functions are pretty messed up by this, and I've fixed them well enough to work, but they may be quite inefficient for long strings now. I'm not going to spend too much time on this since there's other work going on to generalize and unify searching APIs.

Close #22611
Close #24613

See also: #10593 #23612 #24103
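The invariant described above can be checked with a short sketch, assuming Julia 1.x, where `reverse` on a `String` returns a new UTF-8 `String` and `reverseind(s, i)` maps a valid index `i` of `reverse(s)` to the index in `s` of the same character. The test string deliberately mixes 1-byte and 4-byte characters, since that is where a naive byte-offset mapping would go wrong.

```julia
s = "Julia🚀"        # ASCII letters plus one 4-byte UTF-8 character
r = reverse(s)       # "🚀ailuJ"

# For every valid index i of r, reverseind(s, i) points at the same character in s
for i in eachindex(r)
    @assert s[reverseind(s, i)] == r[i]
end
```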
Status update which covers everything in Proposal 3 of the Search and Find Julep and in related discussion points: …
Even if the API implemented in the above PRs is much more consistent than the previous one, I wonder whether a few more changes would make it even better. I'm now tempted to say that an ideal design would involve renaming … The two potential issues with this proposal are: …
Opinions?
I like the idea of renaming …
I think the set …
#24673 closes this as far as I'm concerned.
@JeffBezanson I don't think so; see the checklist in my comment above. In particular there's still #24774 and JuliaLang/Juleps#47. We can also rename …
Just wanted to say that despite the fact that I barely interacted with @nalimilan's work here, the new names seem so compelling that when I recently had a 0.6-based project, I could barely remember the old way of doing this stuff and had to look at the docs several times ("right, …").
Hi everyone! I'm not sure if anyone will see this, but I am having some trouble finding the indices from a user-inputted string. I have a string of IDs I am interested in finding: … and I want to search the second column of a matrix which has all IDs (of type `Array{AbstractString,2}`) for the indices that match the strings in `ppl_id`. Any ideas? It seems I can't get this to work with `find()` or `findeach()` or what you guys have mentioned above.
Questions are better suited for the forum: https://discourse.julialang.org/
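For readers landing here with the same question, a minimal sketch assuming Julia 1.x. The IDs from the original question were not preserved, so `ppl_id` and `M` below are made-up placeholder data. `findall(in(ppl_id), col)` gives the rows whose ID is wanted, while `indexin` answers the reverse question of where each requested ID sits.

```julia
# Placeholder data: the original IDs were lost from this thread
ppl_id = ["b7", "a1"]
M = ["x" "a1"; "y" "c3"; "z" "b7"]   # 3×2 matrix, IDs in the second column

col  = M[:, 2]
rows = findall(in(ppl_id), col)      # [1, 3]: rows whose ID appears in ppl_id
pos  = indexin(ppl_id, col)          # [3, 1]: row of each requested ID (nothing if absent)
```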
Currently there are three families of search & find functions:

* In the `find` family, `find` and `findn` return the indices of non-zero or `true` values. `findfirst`, `findlast`, `findprev` and `findnext` are very similar to `find`, but kind of iterative, and they additionally allow looking for an element in a collection (the latter behavior being similar to `findin`). The `findmin` and `findmax` functions are different, as they return the value and index of the min/max. Finally, `findnz` is even more different, as it only works on matrices and returns a tuple of vectors `(I, J, V)` holding the row indices, column indices and values.
* In the `search` family, `[r]search` and `[r]searchindex` look for strings/chars/regexes in a string (though they also support bytes), the former returning a range, the latter the first index. `searchsorted`, `searchsortedlast` and `searchsortedfirst` look for values equal to or lower than an argument, and return a range for the first and an index for the other two.
* `indexin` is the same as `findin` (i.e. returns the indices of elements in a collection), but it returns `0` for elements that were not found, instead of a shorter vector.

I hope that summary is exact. Please correct me if not.
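For readers comparing with a current release, a short sketch of the `findmin`/`findmax`, `searchsorted*` and `indexin` behaviors summarized above, assuming Julia 1.x. One difference from the summary, which describes an older release: `indexin` now returns `nothing` rather than `0` for elements that are not found.

```julia
v = [2, 9, 4, 9, 1]
findmax(v)                   # (9, 2): value and index of the first maximum
findmin(v)                   # (1, 5)

s = [1, 3, 3, 3, 7]          # searchsorted* require sorted input
searchsorted(s, 3)           # 2:4, the range of elements equal to 3
searchsortedfirst(s, 3)      # 2
searchsortedlast(s, 3)       # 4

indexin([3, 5], [1, 3, 7])   # [2, nothing]
```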
Questions/ideas:

* Could `findin` be renamed to `find`, as the signatures do not conflict? That would mean `findfirst`, `findlast`, `findprev` and `findnext` would just be iterating versions of `find`. Currently `find` offers fewer methods than the others. That way, `indexin` could be renamed to `findin` to reunite the family (or an argument could be added to switch behaviors?).
* … `find` and `search`? I suggest we rename all `search` functions to `find*`: `searchsorted*` would become `findsorted*`, `searchindex` would be merged with `findfirst`, and `rsearchindex` with `findlast`. `search` could be renamed to `findfirstrange`, and `rsearch` to `findlastrange`, making them find any sequence of values in any collection, and not only in strings; if not, nicer names could be `findstr` and `findrstr`. That way, you can easily get a list of interesting functions by typing `find[tab][tab]`.
* Could `findfirst`, `findlast`, `findprev` and `findnext` be replaced/supplemented with an iterator `eachfind`/`findeach`?
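The iterator idea can be sketched in a few lines on top of `findnext`, assuming Julia 1.x. `FindEach` and `findeach` are hypothetical names taken from the proposal above, not part of Base. The sketch targets vectors and strings, whose indices are plain `Int`s; a matrix would need `CartesianIndex` handling.

```julia
# Hypothetical lazy iterator (not in Base) over the indices of all elements
# satisfying `pred`, built on findnext.
struct FindEach{F,T}
    pred::F
    itr::T
end

findeach(pred, itr) = FindEach(pred, itr)

function Base.iterate(fe::FindEach, start = firstindex(fe.itr))
    # Stop once we have walked past the last valid index
    start > lastindex(fe.itr) && return nothing
    i = findnext(fe.pred, fe.itr, start)
    i === nothing && return nothing
    # Yield this match and resume the search just after it
    return (i, nextind(fe.itr, i))
end

Base.IteratorSize(::Type{<:FindEach}) = Base.SizeUnknown()
Base.eltype(::Type{<:FindEach}) = Int   # valid for vectors and strings only

collect(findeach(iseven, [1, 2, 4, 5, 6]))  # [2, 3, 5]
collect(findeach(isuppercase, "aBcD"))      # [2, 4]
```

Because the iterator only ever calls `findnext`, it inherits whatever predicate and collection support `findnext` has, which is roughly the "iterating version of `find`" relationship described in the first idea.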