Introducing System.Rune #23578

@migueldeicaza

Inspired by the discussion here:

dotnet/corefxlab#1751

One of the challenges that .NET faces with its Unicode support is that it is rooted in a design that is now obsolete. Characters in .NET are represented by System.Char, a 16-bit value, which is too small to represent every Unicode code point: anything above U+FFFF takes two chars.

.NET developers need to learn about the arcane Surrogate Pairs:

https://msdn.microsoft.com/en-us/library/xcwwfbb8(v=vs.110).aspx

Developers rarely use this support, mostly because they are not familiar enough with Unicode, let alone with what .NET offers for handling it.
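To make the problem concrete, here is a minimal sketch using only existing System.Char helpers. The class and method names (SurrogateDemo, CountCodePoints) are made up for illustration and are not part of any proposal; the sketch only shows that string.Length counts 16-bit units, so a character outside the Basic Multilingual Plane counts twice unless the caller handles surrogate pairs by hand.

```csharp
using System;

public static class SurrogateDemo
{
    // Hypothetical helper: counts code points in a UTF-16 string by
    // treating each valid surrogate pair as one unit. A lone surrogate
    // is counted as one unit as well.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // IsSurrogatePair returns false at the last index, so this is safe.
            if (char.IsSurrogatePair(s, i))
                i++; // skip the low surrogate of the pair
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string s = "a\U0001F600"; // 'a' followed by U+1F600

        Console.WriteLine(s.Length);             // number of 16-bit chars
        Console.WriteLine(CountCodePoints(s));   // number of code points

        // Reading the code point at index 1 requires the pair-aware API.
        int codePoint = char.ConvertToUtf32(s, 1);
        Console.WriteLine(codePoint.ToString("X"));
    }
}
```

The string above holds two code points but has a Length of 3, which is the kind of mismatch a 32-bit code point type is meant to remove.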

I propose that we introduce a System.Rune that is backed by a 32-bit integer and corresponds to a code point, and that C# surface an equivalent rune type as an alias for it.

rune would become the preferred replacement for char and serve as the foundation for proper Unicode and string handling in .NET.
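As a rough illustration of the idea, the following is a sketch of a value type that wraps a 32-bit code point. It is not the proposed System.Rune: the name RuneSketch and every member on it are assumptions made up for this example, and only existing System.Char methods are called.

```csharp
using System;

// Hypothetical sketch only; names and members are illustrative assumptions.
public readonly struct RuneSketch : IEquatable<RuneSketch>
{
    // Highest Unicode code point.
    public const uint MaxValue = 0x10FFFF;

    public uint Value { get; }

    public RuneSketch(uint value) { Value = value; }

    // A code point is a valid scalar value if it is in range
    // and is not in the surrogate block U+D800..U+DFFF.
    public bool IsValid => Value <= MaxValue && (Value < 0xD800 || Value > 0xDFFF);

    // Reads the code point starting at s[index]. char.ConvertToUtf32
    // throws ArgumentException if it finds an unpaired surrogate there.
    public static RuneSketch FromUtf16(string s, int index)
        => new RuneSketch((uint)char.ConvertToUtf32(s, index));

    public bool Equals(RuneSketch other) => Value == other.Value;
    public override bool Equals(object obj) => obj is RuneSketch r && Equals(r);
    public override int GetHashCode() => unchecked((int)Value);

    // Invalid values are rendered as U+FFFD, the replacement character.
    public override string ToString()
        => IsValid ? char.ConvertFromUtf32((int)Value) : "\uFFFD";
}

public static class RuneSketchDemo
{
    public static void Main()
    {
        RuneSketch r = RuneSketch.FromUtf16("\U0001F600", 0);
        Console.WriteLine(r.Value.ToString("X"));
        Console.WriteLine(r.IsValid);
        Console.WriteLine(new RuneSketch(0xD800).IsValid);
    }
}
```

With a type like this, one value always means one code point, so callers no longer have to reason about high and low surrogates.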

As for why the name rune, the inspiration comes from Go:

https://blog.golang.org/strings

The section "Code points, characters, and runes" provides the explanation; the short version is:

"Code point" is a bit of a mouthful, so Go introduces a shorter term for the concept: rune. The term appears in the libraries and source code, and means exactly the same as "code point", with one interesting addition.

Update: I now have an implementation of System.Rune here:

https://github.com/migueldeicaza/NStack/blob/master/NStack/unicode/Rune.cs

With the following API:

public struct Rune {
	
	public Rune (uint rune);
	public Rune (char ch);
	
	public static ValueTuple<Rune,int> DecodeLastRune (byte [] buffer, int end);
	public static ValueTuple<Rune,int> DecodeLastRune (NStack.ustring str, int end);
	public static ValueTuple<Rune,int> DecodeRune (byte [] buffer, int start, int n);
	public static ValueTuple<Rune,int> DecodeRune (NStack.ustring str, int start, int n);
	public static int EncodeRune (Rune rune, byte [] dest, int offset);
	public static bool FullRune (byte [] p);
	public static bool FullRune (NStack.ustring str);
	public static int InvalidIndex (byte [] buffer);
	public static int InvalidIndex (NStack.ustring str);
	public static bool IsControl (Rune rune);
	public static bool IsDigit (Rune rune);
	public static bool IsGraphic (Rune rune);
	public static bool IsLetter (Rune rune);
	public static bool IsLower (Rune rune);
	public static bool IsMark (Rune rune);
	public static bool IsNumber (Rune rune);
	public static bool IsPrint (Rune rune);
	public static bool IsPunctuation (Rune rune);
	public static bool IsSpace (Rune rune);
	public static bool IsSymbol (Rune rune);
	public static bool IsTitle (Rune rune);
	public static bool IsUpper (Rune rune);
	public static int RuneCount (byte [] buffer, int offset, int count);
	public static int RuneCount (NStack.ustring str);
	public static int RuneLen (Rune rune);
	public static Rune SimpleFold (Rune rune);
	public static Rune To (Case toCase, Rune rune);
	public static Rune ToLower (Rune rune);
	public static Rune ToTitle (Rune rune);
	public static Rune ToUpper (Rune rune);
	public static bool Valid (byte [] buffer);
	public static bool Valid (NStack.ustring str);
	public static bool ValidRune (Rune rune);
	public override bool Equals (object obj);
	
	[System.Runtime.ConstrainedExecution.ReliabilityContractAttribute((System.Runtime.ConstrainedExecution.Consistency)3, (System.Runtime.ConstrainedExecution.Cer)2)]
	protected virtual void Finalize ();
	public override int GetHashCode ();
	public Type GetType ();
	protected object MemberwiseClone ();
	public override string ToString ();
	
	public static implicit operator uint (Rune rune);
	public static implicit operator Rune (char ch);
	public static implicit operator Rune (uint value);
	
	public bool IsValid {
		get;
	}
	
	public static Rune Error;
	public static Rune MaxRune;
	public const byte RuneSelf = 128;
	public static Rune ReplacementChar;
	public const int Utf8Max = 4;
	
	public enum Case {
		Upper,
		Lower,
		Title
	}
}

Update: Known Issues

  • Some APIs above take a uint; they need to take a Rune.
  • Need to implement the IComparable family.
  • RuneCount/RuneLen need better names, see docs (perhaps they should be Utf8BytesNeeded?).
  • The "ustring" APIs above reference my UTF8 API. That type is not really part of this proposal, but we should consider whether some of those overloads should offer a gateway to System.String or to Utf8String.
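For the naming question around RuneLen, the sketch below shows what a method called Utf8BytesNeeded could compute: the number of bytes a code point occupies in UTF-8. The class name Utf8Sizing is an assumption, and returning -1 for values that cannot be encoded follows the convention of Go's utf8.RuneLen; it is not taken from the NStack source. Main cross-checks the result against Encoding.UTF8.GetByteCount.

```csharp
using System;
using System.Text;

public static class Utf8Sizing
{
    // Hypothetical helper: bytes needed to encode codePoint in UTF-8,
    // or -1 if the value is a surrogate or lies beyond U+10FFFF.
    public static int Utf8BytesNeeded(uint codePoint)
    {
        if (codePoint < 0x80) return 1;       // ASCII
        if (codePoint < 0x800) return 2;
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return -1; // surrogates
        if (codePoint < 0x10000) return 3;    // rest of the BMP
        if (codePoint <= 0x10FFFF) return 4;  // supplementary planes
        return -1;
    }

    public static void Main()
    {
        foreach (uint cp in new uint[] { 0x41, 0xE9, 0x20AC, 0x1F600 })
        {
            // Compare against the framework's own UTF-8 encoder.
            int framework = Encoding.UTF8.GetByteCount(char.ConvertFromUtf32((int)cp));
            Console.WriteLine($"U+{cp:X4}: {Utf8BytesNeeded(cp)} bytes (Encoding.UTF8 reports {framework})");
        }
    }
}
```

A name like Utf8BytesNeeded states the encoding in the method name, which RuneLen does not.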

Labels: api-needs-work (API needs work before it is approved, it is NOT ready for implementation), area-System.Runtime, help wanted ([up-for-grabs] Good issue for external contributors)
