proposal: strings: add Cut #40135
Comments
It would help if you could identify some cases in the standard library, and/or in other popular Go packages, where the new functions would be used. That is, give some supporting information for "in many projects I've seen...." Thanks. |
@ianlancetaylor A simple
if i := strings.LastIndex(prefix, "/"); i >= 0 {
	prefix = prefix[i+1:]
}
Here's one from
if i := strings.LastIndex(path, "/"); i >= 0 {
	path = path[i+1:]
}
i := strings.LastIndex(authority, "@")
if i < 0 {
	host, err = parseHost(authority)
} else {
	host, err = parseHost(authority[i+1:])
}
if i := strings.IndexByte(resp.Status, ' '); i != -1 {
	statusCode = resp.Status[:i]
}
if i := strings.LastIndex(name, "/"); i >= 0 {
	name = name[i+1:]
}
if i := strings.LastIndexByte(string(n), '.'); i >= 0 {
	return Name(n[i+1:])
}
return Name(n)
Lots of that in
I discovered these using:
$ grep -A 5 -e 'strings.\(Last\)\?Index(' -r
And then sifting the results manually. |
This looks like a variation on |
@networkimprov Those functions don't really split (turn one
into many) strings. And the proposed logic is different from that of
the |
@ainar-g I believe these would be equivalent
name = strings.PrefixUntil(username, "@")
name, _ = strings.SplitFirst(username, "@") |
If I should rather split the below off into its own proposal I can, but the frequency of these patterns (prefix, suffix and partition) together seems to make a better case for the addition being relevant enough.
Adding more general strings.SplitFirst and strings.SplitLast:
Much code I have optimised in parsing functions involves splitting around an identifier and uses Index and LastIndex.
Why
I can give Googlers references to internal CLs. Here is one for std lib:
Why not
Alternative: strings.Partition like in Python:
Another take on this could be to align this with str.partition from Python:
So we could discuss |
If something like this is accepted (and on balance I'm in favour), I'd prefer at least one variant that has a single return value so that it can be used directly in expressions. |
There may be an opportunity here, but it's a pretty subtle thing to get right as a general function.
parsing "rsc" will end up with name="rsc", host="rsc", which is not what you want. The Python partition function is more like what you want, in that it distinguishes "separator is present" from "separator is not". But it's a bit awkward to use too since there are no tuples to index like in Python, and allocating a slice defeats the purpose. There are many helper functions in strings already. Why is this one important, and - even more importantly - can we handle all the needs with a single function and still be convenient? |
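The ambiguity described above can be shown with a minimal sketch. The helpers `prefixUntil` and `suffixAfter` are hypothetical stand-ins for the proposed functions, and their fallback of returning the whole string when the separator is absent is an assumption inferred from the "rsc" example, not a confirmed specification.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixUntil is a hypothetical helper: it returns the part of s before the
// first sep, or all of s if sep is absent (an assumed fallback).
func prefixUntil(s, sep string) string {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i]
	}
	return s
}

// suffixAfter is a hypothetical helper: it returns the part of s after the
// last sep, or all of s if sep is absent (an assumed fallback).
func suffixAfter(s, sep string) string {
	if i := strings.LastIndex(s, sep); i >= 0 {
		return s[i+len(sep):]
	}
	return s
}

func main() {
	for _, addr := range []string{"rsc@example.com", "rsc"} {
		name := prefixUntil(addr, "@")
		host := suffixAfter(addr, "@")
		// With no "@" present, both helpers fall back to the whole input,
		// so name and host come out identical.
		fmt.Printf("%q: name=%q host=%q\n", addr, name, host)
	}
}
```

Because neither helper reports whether the separator was found, the caller cannot tell a missing host from a host that happens to equal the name.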
The reason I like a function that does a pair split is that I have optimized many parsing functions where you want to split one element off a delimited list in a string without allocating, while at the same time getting the rest of the string without the delimiter and the split-off element. Having a std library function that does this without allocating a return slice would help readability: returning elem, separator, tail or elem, tail, ok as 3 return values does not need a slice or allocation. |
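A rough sketch of the pattern described here, peeling one element at a time off a delimited list without building a slice. The names `splitFirst` and `forEachElem` are hypothetical and only follow the `elem, tail, ok` shape mentioned in the comment.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFirst is a hypothetical helper with the (elem, tail, ok) shape:
// it returns the text before the first sep, the text after it, and whether
// sep was found. If sep is absent it returns s, "", false.
func splitFirst(s, sep string) (elem, tail string, ok bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

// forEachElem calls visit for every element of a sep-delimited list.
// It only re-slices the input string and never builds a []string.
func forEachElem(list, sep string, visit func(elem string)) {
	for {
		elem, tail, ok := splitFirst(list, sep)
		visit(elem)
		if !ok {
			return // no more separators: elem was the last element
		}
		list = tail
	}
}

func main() {
	forEachElem("alpha,beta,,gamma", ",", func(elem string) {
		fmt.Printf("%q\n", elem)
	})
}
```

Each returned string shares the backing bytes of the input, so the loop itself does not need to copy or collect anything.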
We certainly write code to do this operation all the time. Even just grepping for 'Index.* >=' in the Go tree I find plenty of code like:
and
If we had a function like:
then snippets like the above would become:
It does seem like it comes up a lot. What do people think? Should we add this (single) Cut function? Update: Edited to return (before, after, ok) instead of (before, sep, after), as suggested by @nightlyone. |
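The function body referred to above did not survive in this copy of the thread, so what follows is only a sketch built from the `(before, after, ok)` signature named in the update note. The lowercase `cut` is a local stand-in, and the example input is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cut slices s around the first instance of sep, returning the text before
// and after sep. If sep is absent it returns s, "", false.
func cut(s, sep string) (before, after string, ok bool) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

func main() {
	status := "200 OK"

	// Index-and-slice style, as in the snippets found with grep.
	code := status
	if i := strings.IndexByte(status, ' '); i != -1 {
		code = status[:i]
	}

	// The same extraction written with cut.
	code2, _, _ := cut(status, " ")
	fmt.Println(code, code2)

	// The ok result separates "separator absent" from "empty remainder".
	_, host, ok := cut("rsc", "@")
	fmt.Printf("host=%q ok=%v\n", host, ok)
}
```

Returning a boolean rather than the separator keeps the call allocation-free while still letting callers detect the missing-separator case.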
Since we pass in the separator, one could also return an OK bool instead of the separator. For this I would suggest
|
@nightlyone indeed, that signature is much better! Thanks. |
Since when does Go favour functions to save a line or two? 😞 Who wants to remember all those names, including the argument order? There are quite a few non-trivial things missing in the strings package. If there's room for growth then this cannot be the priority. 🙏 |
@pascaldekloe can you list the issue numbers of some open string package proposals that should be prioritized? |
I can think of some right now @martisch.
// NRunes returns the first n runes in s. If s has fewer than n runes the return is s.
// Illegal byte sequences count as one rune per byte.
func NRunes(s string, n int) string
// IndexNth returns the index of the n-th occurrence of substr in s.
func IndexNth(s, substr string, n int) int
// SplitSeq returns the bytes in between the seps if and only if they occur in s in that very order.
func SplitSeq(s string, seps ...string) []string
Don't take these examples too literally. I'm making these up just now. The last one might even be quite useful like |
Also |
@pascaldekloe, I've never written code that would have used any of those three. On the other hand, I can point to tons of code I've written using Index and manual slicing that would be helped by Cut. Just one data point, but those seem like pretty rare needs that could easily be handled outside the standard library. |
If the usage metric is important then |
@pascaldekloe I'd encourage you to file separate proposals if you have a strong case for any new API in particular. I don't think it's helpful to continue in this thread. This proposal should be considered on its own merits and drawbacks alone, not competing for priority against a handful of other potential proposals. |
I think the latest design still reasonably supports this:
I agree with Russ's point in #40135 (comment), though I agree in some cases it would be nice to not have to declare an extra variable. |
My point was not to add any of these functions @mvdan 🙃 |
Then, if your point is "we shouldn't add this func because we shouldn't add any of those others either", I disagree. You're assuming that they're all equally important. If you argue that it's impossible to gauge how important they are relative to each other, then we would never add any new APIs because we would be stuck in "what about" and "what if" discussions. |
This idea
looks like a special case of
|
@networkimprov please see the response that @martisch already provided:
|
He asked for SplitFirst & SplitLast, but Cut is only SplitFirst. Is the overhead of returning a slice vs two strings a problem much of the time? EDIT: also your tone may imply that I hadn't read the thread. |
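For the SplitLast half of that request, a last-separator counterpart could look like the sketch below. The name `cutLast` and its behaviour when the separator is absent (returning `s, "", false`) are assumptions made for illustration; nothing in the thread fixes them.

```go
package main

import (
	"fmt"
	"strings"
)

// cutLast is a hypothetical counterpart to Cut that slices s around the
// last instance of sep. If sep is absent it returns s, "", false
// (an assumed convention, mirroring the first-separator variant).
func cutLast(s, sep string) (before, after string, ok bool) {
	if i := strings.LastIndex(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, "", false
}

func main() {
	// Equivalent to the LastIndex snippets quoted earlier in the thread:
	// keep only what follows the final "/", if there is one.
	path := "src/strings/strings.go"
	if _, base, ok := cutLast(path, "/"); ok {
		path = base
	}
	fmt.Println(path)
}
```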
An allocation is generally significantly more expensive than returning two extra values; allocations create GC work too.
I am simply pointing to previous replies to prevent the thread from going in circles. I don't imply anything else by it. |
Any possibility of optimizing creation of very short non-escaping slices, e.g. represent it as an array internally? |
@networkimprov, Yes, avoiding the allocation keeps people from using strings.Split or SplitN here.
than:
|
This thread is starting to escalate, and it feels like the kind of trivial change that leads to classic bikeshed arguments. Before you post, please think about: Thanks! |
I'd disagree that "SplitN" isn't nice enough. I name a short, constant slice to indicate both items: I'd instead suggest adding "SplitLastN" and optimizing slice creation. Both have wider applicability, and the latter improves existing programs. |
Cut has quite a nice API, and a nice name. But I'm sympathetic to the concern about adding too many new functions to the package. I buy the argument that Cut is a worthwhile addition, but I would like to understand better where we draw the line, and would be curious how others in this thread think about that. After all, adding new APIs does add complexity. |
@hherman1 I think Cut can be evaluated on its own without knowing exactly where the general threshold for inclusion is. There are many dimensions to consider, including the existing Go ecosystem and how developers write Go code. If a general discussion of inclusion guidelines is needed, I would suggest opening a separate GitHub issue for it, to keep the discussion here focused on the specific proposal and use cases. |
Performance:
Returning a variable-sized value (a slice backing array) on the stack requires upstack allocation and variable stack allocation (alloca). Both add considerable runtime complexity and can also affect the performance of code that does not use those features, since all Go code would then have to deal with the possibility of the callee allocating a variable-sized slice on the stack.
While doing performance optimizations in dozens of different projects, in my experience strings.Split/SplitN has often been the one where the return slice allocation was the easiest to replace, unfortunately with reduced readability of the new code (unless Cut is written outside the std lib in its own library). Having a replacement in the std library would be nice to improve performance and readability, and would likely also lead to Go programmers using the more performant function from the start.
As an anecdote to illustrate the potential magnitude of the performance difference from avoiding strings.Split and its allocation: I have made at least one replacement CL for strings.Split in a large production code base that saved more resources than any other function in the strings package uses in absolute terms in that production environment. One reason may be that strings.Split is often used in parsing strings (e.g. key-value pairs), and parsing strings happens a lot without writing full custom hand-optimized parsers. strings.Split is one of the common string functions that allocates, and allocation is more compute- and lock-heavy than the more lightweight computations that are usually needed for strings (Index, Trim, HasPrefix, ...).
Readability:
To me also the utility of
Side note for function type signature
I had proposed |
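One way to check the allocation claim on a given toolchain is to measure it directly. The sketch below compares a key-value parse written with strings.SplitN against one written with strings.Index, using testing.AllocsPerRun. The function names, the sample input, and the package-level sink variables are all made up for illustration, and the reported counts depend on the Go version and compiler optimizations in use.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Package-level sinks keep the results observable so the calls are not
// optimized away.
var sinkKey, sinkVal string

// parseSplit parses "key=value" using SplitN, which returns a slice.
func parseSplit(kv string) (key, val string) {
	parts := strings.SplitN(kv, "=", 2)
	if len(parts) != 2 {
		return kv, ""
	}
	return parts[0], parts[1]
}

// parseIndex parses "key=value" using Index and slicing only.
func parseIndex(kv string) (key, val string) {
	if i := strings.Index(kv, "="); i >= 0 {
		return kv[:i], kv[i+1:]
	}
	return kv, ""
}

func main() {
	kv := "user=gopher"

	// AllocsPerRun reports the average number of heap allocations per call.
	splitAllocs := testing.AllocsPerRun(1000, func() {
		sinkKey, sinkVal = parseSplit(kv)
	})
	indexAllocs := testing.AllocsPerRun(1000, func() {
		sinkKey, sinkVal = parseIndex(kv)
	})

	fmt.Printf("SplitN: %.0f allocs/op, Index: %.0f allocs/op\n", splitAllocs, indexAllocs)
}
```

Both functions return the same key and value for the same input; only the way the result is carried back to the caller differs.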
I'm not very familiar with the runtime, but you'd only create a slice on the stack when it's small, can't escape, has a cap known at compile time, and can't be assigned anything of len > cap. I'm not opposed to SplitFirst & SplitLast; I'd use them. I don't think they should be added instead of optimization. |
If those are the constraints of the optimization that is supposed to remove the allocation in Split and SplitN, it won't apply, since the returned slice is variable length (Update: also capacity; inside SplitN, N is not const) and it escapes those functions. That leaves aside for a moment what happens after inlining, if it is inlined at all, which is not always the case. |
Creating it on the stack to return it means it can't escape the caller. cap(s) must be known at compile time (i.e. SplitN with const N), but len(s) can vary. |
@networkimprov I think you've made your points clear - you don't think Cut helps readability, and you think some SplitN calls could avoid allocations. For that last point, you should file an issue with the details on how exactly that would work with the compiler. I don't think it's a good idea to continue that discussion here, like Russ mentioned in his last comment. |
Sorry to belabor the point. @martisch claimed it wasn't feasible, so I tried to explain why it might be. Someone more familiar with compiler and runtime should file that issue. And maybe this could be placed on hold until that's decided. |
There's far too much discussion about this for right now. Putting on hold. Sorry for such an attractive bikeshed. |
cc @josharian @mdempsky re filing an issue to consider optimizing small, constant-cap slices; discussion began at #40135 (comment) |
Is there a reason to even consider the second example as a justification for this API? Can't this snippet
Simply be rewritten as:
|
Should this issue be closed since #46336 is accepted and implemented? |
In many projects I've seen code like this:
I think this operation, getting a prefix or suffix based on a substring, is common enough to think about adding helpers for it. So I propose to add functions PrefixUntil and SuffixAfter (names up for debate) to package strings. Something like:
So that the code could be rewritten as: