[API Proposal]: Platform-aware Marshal.StringToHGlobalUni equivalent #62475

Closed
Dennis-Petrov opened this issue Dec 7, 2021 · 1 comment · Fixed by dotnet/dotnet-api-docs#7566
Labels: api-suggestion, area-Interop-coreclr, untriaged

Comments


Dennis-Petrov commented Dec 7, 2021

Background and motivation

Currently, Marshal.StringToHGlobalUni internally calls this:

[System.Security.SecurityCritical]  // auto-generated
internal static unsafe void wstrcpy(char *dmem, char *smem, int charCount)
{
    Buffer.Memcpy((byte*)dmem, (byte*)smem, charCount * 2); // 2 used everywhere instead of sizeof(char)
}

Thus UTF-16 (two bytes per char) is hardcoded here. That works as expected on Windows, where wchar_t is two bytes, but fails on Linux, where wchar_t is four bytes. When P/Invoking with LPCWSTR fields inside structures like this, the native code on Linux expects UTF-32:

typedef struct _SOME_PARA {
    DWORD dwSize;
    LPCWSTR wszSomeStr;
} SOME_PARA, *PSOME_PARA;

This is not obvious, and the docs don't mention it either: the method converts the string to "Unicode", but which Unicode encoding exactly?

API Proposal

public static IntPtr StringToHGlobalPUni(string source)
{
    // use PAL to obtain proper Unicode encoding
}

API Usage

var s = "String to p/invoke";
var sPtr = Marshal.StringToHGlobalPUni(s);
try
{
    var somePara = new SOME_PARA
    {
        dwSize = (uint)Marshal.SizeOf<SOME_PARA>(),
        wszSomeStr = sPtr
    };

    // pass somePara to native code
}
finally
{
    Marshal.FreeHGlobal(sPtr);
}

Alternative Designs

This will probably also require changes to the default string marshalling. The proposed API can be used with this struct:

    [StructLayout(LayoutKind.Sequential)]
    internal struct SOME_PARA
    {
        public uint dwSize;

        public IntPtr wszSomeStr;
    }

but with this declaration:

    [StructLayout(LayoutKind.Sequential)]
    internal struct SOME_PARA
    {
        public uint dwSize;

        public string wszSomeStr;
    }

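the built-in marshalling still won't produce UTF-32: with the default CharSet.Ansi the field is marshalled as a narrow (LPSTR) string, and with CharSet = CharSet.Unicode or [MarshalAs(UnmanagedType.LPWStr)] it is marshalled as UTF-16.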

Risks

No response

Dennis-Petrov added the api-suggestion label Dec 7, 2021
dotnet-issue-labeler bot added the area-Interop-coreclr and untriaged labels Dec 7, 2021
AaronRobinsonMSFT (Member) commented Jan 3, 2022

@Dennis-Petrov Thanks for filing this issue. .NET uses the term "Unicode" in ambiguous ways, especially in interop scenarios. Two examples illustrate this: UnicodeEncoding and CharSet.Unicode. Both document "Unicode" as two bytes in size. The documentation for the APIs in question should likely be updated to indicate two bytes as well. To be honest, I'm not convinced we should spend the effort to change the APIs themselves.

For the case above, the .NET team has been working on source generation for struct marshalling – see StructMarshalling.md. We have an issue that is tracking some encoding concerns too – #61326.
