New API: String.Truncate(int maxLength, string truncationSuffix) #31655

Open
GSPP opened this issue Feb 3, 2020 · 8 comments
Labels: api-needs-work (API needs work before it is approved, it is NOT ready for implementation), area-System.Runtime
Milestone: Future

Comments

@GSPP

GSPP commented Feb 3, 2020

I propose a new method on String to perform a fairly common task: truncating a string to a maximum length. Scenarios:

  1. Inserting into a limited-length database field when data loss is preferable to failing. For example, when logging the user agent string of an HTTP request.
  2. Displaying a potentially long string to the user.
  3. Making sure that untrusted input is not unexpectedly long when failure is not desirable.

In my experience, this is quite a common need. I often need to define a helper function for this.

Semantics:

  1. A string whose length does not exceed the maximum length is returned unaltered.
  2. The API never returns a string longer than the maximum length.
  3. An "ellipsis" string can be specified. When truncation occurs, this string is appended. A typical such string is "...".
  4. Appending the ellipsis must never produce a string longer than the maximum length.
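A minimal sketch of these semantics as a C# local function follows, just to make them concrete. Truncate here is a hypothetical helper, not an existing BCL method, and it assumes the answers the author leans toward in open questions 1 and 3 below (always require the suffix to fit, reject a null suffix). It uses APIs available in .NET 6 or later.

```csharp
using System;

// Hypothetical helper illustrating the proposed semantics; not an existing BCL API.
static string Truncate(string value, int maxLength, string truncationSuffix = "")
{
    ArgumentNullException.ThrowIfNull(value);
    // Assumption (open question 3): a null suffix is rejected rather than treated as "".
    ArgumentNullException.ThrowIfNull(truncationSuffix);
    // Assumption (open question 1): the suffix must always fit, even when no truncation happens.
    // This also rejects negative maxLength values.
    if (maxLength < truncationSuffix.Length)
        throw new ArgumentException("maxLength must be at least truncationSuffix.Length.", nameof(maxLength));

    // Semantics 1: a string that already fits is returned unaltered.
    if (value.Length <= maxLength)
        return value;

    // Semantics 2-4: keep as many leading chars as fit next to the suffix,
    // so the result is exactly maxLength chars long.
    return string.Concat(value.AsSpan(0, maxLength - truncationSuffix.Length), truncationSuffix);
}

Console.WriteLine(Truncate("The quick brown fox jumps", 12, "...")); // The quick...
Console.WriteLine(Truncate("short", 12, "..."));                     // short
```

Because the validation runs before the length check, a call like Truncate("abc", 2, "...") throws even though "abc" would not need truncating, which is the "always enforced" behavior described in open question 1.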

API:

```csharp
class String
{
    public string Truncate(int maxLength, string truncationSuffix = "");
}
```

Open question 1: What happens when maxLength < truncationSuffix.Length and truncation must occur? The API should never exceed the maximum length, so this could either throw an ArgumentException or truncate the suffix itself. In my opinion, input validation should enforce maxLength >= truncationSuffix.Length. For API simplicity, this should always be enforced, not just in the cases where it matters (when truncation occurs).

An alternative view: what if the maximum length is dynamic? For example, when the user resizes a GUI window, the maximum length might be reduced programmatically. In that case, it could drop below the ellipsis length, and we would want to truncate the ellipsis itself to meet the maximum length.
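Under this alternative view, the validation step is replaced by cutting the suffix down. A sketch, assuming the same hypothetical local-function style as above (TruncateLenient is an illustrative name, not a proposed one):

```csharp
using System;

// Hypothetical lenient variant: when maxLength is smaller than the suffix
// (e.g. a shrinking window), the suffix itself is cut instead of throwing.
static string TruncateLenient(string value, int maxLength, string truncationSuffix = "")
{
    ArgumentNullException.ThrowIfNull(value);
    ArgumentNullException.ThrowIfNull(truncationSuffix);
    if (maxLength < 0)
        throw new ArgumentOutOfRangeException(nameof(maxLength));

    if (value.Length <= maxLength)
        return value;

    // Only the suffix (or part of it) fits: return its first maxLength chars.
    if (truncationSuffix.Length >= maxLength)
        return truncationSuffix.Substring(0, maxLength);

    return string.Concat(value.AsSpan(0, maxLength - truncationSuffix.Length), truncationSuffix);
}

// Simulate a window that keeps shrinking.
foreach (int width in new[] { 8, 5, 3, 2, 0 })
    Console.WriteLine($"{width}: \"{TruncateLenient("Hello, world", width, "...")}\"");
```

The trade-off is that the suffix no longer reliably signals truncation: at width 2 the caller gets "..", which is arguably still better than an exception thrown from a resize handler.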

Open question 2: Should the empty string be normalized to string.Empty? I usually insert this optimization into string helpers that I write. Not sure if this is appropriate in the BCL.

Open question 3: Should truncationSuffix == null be allowed? If it is allowed, it must be equivalent to "". But why would we allow two different inputs with the same semantics? Can callers not just use the default value "" or specify "" explicitly? How is this handled elsewhere in the BCL? My opinion: disallow null. Let's be strict about input validation.

Open question 4: Is there a better name for truncationSuffix? This name can never be changed for compatibility reasons, so it is better to get it right. I did not like ellipsis because a lot of people would not know what that word means. Instead of Suffix, we could say Postfix.

@0xd4d

0xd4d commented Feb 3, 2020

Sometimes you don't want to remove the last part of the string, e.g., when it's a file path. The last part is the file name, so it shouldn't be removed. Most tools I've seen truncate somewhere in the middle of the string if it's a file path.
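Middle truncation can be sketched the same way. This is a hypothetical helper (TruncateMiddle is an illustrative name): it keeps the start and the end, biases the kept characters toward the end where a file name usually sits, and does not parse path separators, so a very long file name can still be cut.

```csharp
using System;

// Hypothetical helper: keeps the beginning and the end of the string and
// replaces the middle with the ellipsis (e.g. for file paths).
static string TruncateMiddle(string value, int maxLength, string ellipsis = "...")
{
    ArgumentNullException.ThrowIfNull(value);
    ArgumentNullException.ThrowIfNull(ellipsis);
    if (maxLength < ellipsis.Length)
        throw new ArgumentOutOfRangeException(nameof(maxLength));
    if (value.Length <= maxLength)
        return value;

    int keep = maxLength - ellipsis.Length; // chars of value that survive
    int tail = (keep + 1) / 2;              // odd counts favor the end (file name)
    int head = keep - tail;
    return string.Concat(value.AsSpan(0, head), ellipsis, value.AsSpan(value.Length - tail));
}

Console.WriteLine(TruncateMiddle(@"C:\src\runtime\src\libraries\System.Private.CoreLib\src\System\String.cs", 40));
// C:\src\runtime\src...rc\System\String.cs
```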

@GrabYourPitchforks
Member

Are there any localization concerns with such a feature? Examples would include special-casing right-to-left languages or using a different default suffix per culture.

@GrabYourPitchforks
Member

GrabYourPitchforks commented Feb 3, 2020

Other issues that might need to be addressed:

  • If we're talking storage (such as in a database), does the Truncate API necessarily assume that the backend system is storing the data as UTF-16? The data could still change length when stored as UTF-8 or UTF-32.

  • If we're talking UI, is this more of a concern for the renderer than for the framework? In a web browser, you could use CSS (text-overflow: ellipsis) to add an ellipsis automatically where appropriate. A rich GUI framework should provide a similar mechanism. In both cases, this involves, under the covers, querying the operating system to ask "how wide would this text be if you were to draw it to the screen?"

  • Does the truncation operation itself need to be culture-aware? For example, in Hungarian, is it appropriate to truncate "xxxxxdzyyyyy" to "xxxxxd..."? (In Hungarian, {d} and {dz} are different characters.) Or, in Spanish, is it appropriate to truncate "xxxxxñyyyyy" to "xxxxxn..."? (The former string doesn't use a single 'ñ' character; instead it uses a standard Latin 'n' followed by U+0303 COMBINING TILDE.)
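Two of these concerns can be partially addressed with existing BCL types, sketched below as hypothetical helpers (the names are illustrative). TruncateToUtf8Bytes enforces a UTF-8 byte budget without splitting a code point, using System.Text.Rune. TruncateTextElements never cuts inside a text element, using StringInfo.ParseCombiningCharacters (on .NET 5 and later, text elements follow Unicode extended grapheme clusters). Neither handles the Hungarian "dz" case, which is a language-level letter rather than a grapheme cluster, and a production helper would likely need to combine both constraints.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for the storage case: keeps whole Unicode scalar values
// whose UTF-8 encoding fits in maxBytes (e.g. a byte-limited database column).
static string TruncateToUtf8Bytes(string value, int maxBytes)
{
    int bytes = 0, chars = 0;
    // Lone surrogates are enumerated as U+FFFD, which Encoding.UTF8 also emits as 3 bytes.
    foreach (Rune rune in value.EnumerateRunes())
    {
        if (bytes + rune.Utf8SequenceLength > maxBytes)
            break;
        bytes += rune.Utf8SequenceLength;
        chars += rune.Utf16SequenceLength;
    }
    return value.Substring(0, chars);
}

// Hypothetical helper for the culture case: never cuts inside a text element,
// so 'n' followed by U+0303 COMBINING TILDE stays together.
static string TruncateTextElements(string value, int maxLength)
{
    if (value.Length <= maxLength)
        return value;
    int cut = 0;
    // ParseCombiningCharacters returns the start index of every text element.
    foreach (int start in StringInfo.ParseCombiningCharacters(value))
    {
        if (start > maxLength)
            break;
        cut = start;
    }
    return value.Substring(0, cut);
}

string spanish = "xxxxxn\u0303yyyyy"; // decomposed ñ
Console.WriteLine(TruncateTextElements(spanish, 6).Length); // 5: the n is not separated from its tilde
Console.WriteLine(TruncateToUtf8Bytes("h\u00E9llo", 2));     // h
```

Note that neither helper appends a suffix; that part is orthogonal and could be layered on top as in the earlier sketches.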

Per your other open question:

Should the empty string be normalized to string.Empty?

This is already the case throughout the BCL. Even the string constructors special-case empty input and normalize to string.Empty. In fact it's quite difficult to get .NET to give you a zero-length string that's not the same object instance as string.Empty. So you don't normally need to worry about these sorts of optimizations inside your own code. :)
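A quick way to observe this on current .NET; treat it as an implementation detail rather than a documented guarantee:

```csharp
using System;

// Both operations produce a zero-length string; on current .NET they return
// the string.Empty instance itself instead of allocating a new object.
string fromSubstring = "abc".Substring(1, 0);
string fromCharArray = new string(Array.Empty<char>());

Console.WriteLine(ReferenceEquals(fromSubstring, string.Empty)); // True
Console.WriteLine(ReferenceEquals(fromCharArray, string.Empty)); // True
```

So a Truncate implementation built on Substring or string.Concat would get this normalization for free.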

@huoyaoyuan
Copy link
Member

@GrabYourPitchforks

  • If we're talking UI

I wonder how it applies to console UI. It's not that simple (considering wide characters), but the demand does exist.
Is there any relatively rich CUI framework in .NET?

@joperezr joperezr added this to the Future milestone Jul 7, 2020
@IanMercer

Should there be an option to truncate on a word boundary? It looks neater in many cases, provided of course it isn't too far back.
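A naive version of this can be sketched as follows. It is a hypothetical helper that treats whitespace as the only word break (so it is not UAX #29 word segmentation and is really only suitable for space-separated languages); the maxBacktrack parameter is an assumed way of expressing "provided it isn't too far back".

```csharp
using System;

// Hypothetical helper: naive whitespace-based word breaking (not UAX #29).
// It backs up at most maxBacktrack chars looking for whitespace; if none is
// found that close to the limit, it cuts mid-word instead.
static string TruncateAtWord(string value, int maxLength, string suffix = "...", int maxBacktrack = 10)
{
    ArgumentNullException.ThrowIfNull(value);
    ArgumentNullException.ThrowIfNull(suffix);
    if (maxLength < suffix.Length)
        throw new ArgumentOutOfRangeException(nameof(maxLength));
    if (value.Length <= maxLength)
        return value;

    int budget = maxLength - suffix.Length; // chars of value that may be kept
    int cut = budget;                       // default: cut mid-word at the budget

    // value[budget] exists because value.Length > maxLength >= budget.
    int i = budget;
    while (i > 0 && i > budget - maxBacktrack && !char.IsWhiteSpace(value[i]))
        i--;
    if (char.IsWhiteSpace(value[i]))
        cut = i; // cut before the whitespace so the last kept word stays whole

    // Drop whitespace that would otherwise sit directly before the suffix.
    return string.Concat(value.AsSpan(0, cut).TrimEnd(), suffix);
}

Console.WriteLine(TruncateAtWord("The quick brown fox jumps", 15));           // The quick...
Console.WriteLine(TruncateAtWord("Supercalifragilistic word", 10, "...", 3)); // Superca...
```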

@GrabYourPitchforks
Copy link
Member

@IanMercer Truncation at a word boundary would still suffer from two of the three problems I mentioned in my previous comment. But if we were determined to add it, we could call into ICU and use the logic at http://www.unicode.org/reports/tr29/ as a fallback.

This makes me wonder whether an out-of-band Unicode package might be appropriate. It could contain case folding (already in corefxlab), opinionated truncation, and a bunch of other things that we're hesitant to add to the BCL proper. @tarekgh?

@tarekgh
Copy link
Member

tarekgh commented Oct 16, 2020

I think the whole feature belongs in the UI frameworks (e.g., WPF, WinForms, and Xamarin) more than in the core framework. It will mainly be used for UI, not really for string manipulation. This feature has to work with word breaking and line breaking, which can also depend on the language of the strings. Also, string truncation is usually needed to make text fit in UI controls (as with line wrapping in editors or browsers), which means it depends on the font size and the width of the UI control rather than on the string length.

@IanMercer
Copy link

@GrabYourPitchforks Oh, I agree, done 'properly' this is not a simple feature, I was just adding fuel to that argument. I don't think this belongs in core. UI Frameworks can handle it for most cases where truncation is needed and for the occasional non-UI case, like logging, it's a trivial piece of code to write for your specific situation (e.g. English, UTF-8, spaces as breaks, ...).

@GrabYourPitchforks GrabYourPitchforks added the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Nov 9, 2020
@terrajobst terrajobst removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 25, 2021

8 participants