Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port/strings: Use platform agnostic types for strings #557

Closed
wants to merge 46 commits into from

Conversation

Yangff
Copy link
Contributor

@Yangff Yangff commented Jun 11, 2024

Description

There are some ordering issues with this PR, some commits that should have been at the end, ran to the front after I split the commits ...... But I just don't have the energy to fix them. My suggestion is to squash them all when merging after review to prevent weird issues in the future.

This PR is part of Linux porting, and it makes conservative changes to the UE4SS string types.
These string types have been replaced with purpose-driven aliases, such as UEStringType, SystemStringType, etc.
Theoretically, all of these types should be unchanged on Windows platforms from before the change, meaning that what was a wstring is now essentially still a wstring.

Note the change to find_lua_mod_by_name, where I've continued to use template<class S> instead of the form in main.

All newly added types are aliases, so be sure not to overload with them when using them, e.g. func(UEStringType a) and func(SystemStringType b) will not be defined successfully on Windows because they are both wstring.

Also some changes have been made to RC::File, which now forces the use of UTF-8 encoding and should handle the BOM correctly.This part of the change may be breaking, as the original code seemed to treat it as UTF-16.But from what I've seen so far, UE4SS is actually using encodings that are UTF-8, and wrapping them incorrectly Therefore, these changes should have no real impact. These changes may be related to #556
The related commit is 2cee23a

The old APIs, aka write_string_to_file() and read_all() will keep using UEString, that is UTF16 and use/return converted UTF16 strings. And functions like read_file_all() and write_file_string_to_file() will instead using UTF8. This is to preserve backward compability as much as possible.

Some changes are to separate platform-dependent code, they were moved to the Platform directory because they were affected by both linux porting and string types, and it didn't make much sense to create a dedicated copy (since it would eventually have to be moved to its own directory) and create additional headaches for rebase.

And, as part of this PR, https://github.com/Re-UE4SS/UEPseudo/pull/94 is reuqired to use it.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Is/requires documentation update

How Has This Been Tested?

I compiled locally and tested on palworld.
Some rebase error just fixed.
I had to disable cppmods because I couldn't link it correctly, and I'm not sure what caused this.

Checklist

Please delete options that are not relevant. Update the list as the PR progresses.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • I have added the necessary description of this PR to the changelog, and I have followed the same format as other entries.
  • Any dependent changes have been merged and published in downstream modules.

Screenshots (if appropriate)

Additional context

Add any other context about the pull request here.

Yangff added 30 commits June 10, 2024 20:21
Define prupose driven string types in Macros.hpp

Use IOSTR/File::StringType for IO purpose, it's pinned to utf8/string type
Use SYSSTR/SystemStringType for interacting with STL, it's the prefer type
based on platform; On windows this is wchar_t/wstring and on Linux
this is char/string
Use STR/UEStringType for interacting with Unreal Engine, it's utf16.
On Windows we uses wchar_t/wstring to avoid extra copy/casting from u16string
On Linux we uses u16string which cannot be used in many STL functions.

With this patch, the only changes on string type on Windows should just be the
type aliases, and almost all actual types should stay unchanged.

For the IO, write_string_to_file will keep receiving utf16, aka wchar_t on windows.
A new write_file_string_to_file is added to allow receiving of utf8 string

A new WinFile using WinAPI is provided and to avoid the buggy BOM detection on the old code.
It also avoids the use of wstring to handle UTF-8 chars.
A new sets of to_xxxx/[_string] functions is provided

* to_ue/to_system/to_file can easily turn a string-like type to its new type
For a string view or XXchar* that doesn't need to be encoded for conversion,
these functions return it as is, so its return type may be the corresponding
string or string view (see can_be_string_view_t)
For a different underlying type of string, they create a new string
to hold the converted result

* to_ue_string/to_system_string/to_file_string will force
the corresponding string type to be returned,
which is appropriate if your target can't accept a string view.
@Yangff
Copy link
Contributor Author

Yangff commented Jun 11, 2024

not sure what Any dependent changes have been merged and published in downstream modules. actually means and please let me know where should I document these changes?

@UE4SS
Copy link
Collaborator

UE4SS commented Jun 11, 2024

not sure what Any dependent changes have been merged and published in downstream modules. actually means

It means any changes in other repos that this PR requires.
In this case, that would be the equivalent PR in the UEPseudo repo.

@UE4SS
Copy link
Collaborator

UE4SS commented Jun 11, 2024

To maintainers/reviewers: Do not forget to run UVTD and use the newly generated files when building UE4SS before you do any actual game testing.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 11, 2024

For UVTD a known problem from my side is this
https://github.com/Re-UE4SS/UEPseudo/pull/94/files#diff-1f94e8bef92eeb702ce4ee4fe8b87dff171b1f07191967444601eaf828f9c38a

I'm not sure why but I have to do this for clang to work (and since there is also a type change around it somehow get into my PR..)

This change it not included in the UVTD as I'm not sure how to make it to generate code like this.. given this is the only enum that needs the workaround. all other enums compile without any problem.

But anyway this is not a problem for windows.

@UE4SS
Copy link
Collaborator

UE4SS commented Jun 11, 2024

For UVTD a known problem from my side is this https://github.com/Re-UE4SS/UEPseudo/pull/94/files#diff-1f94e8bef92eeb702ce4ee4fe8b87dff171b1f07191967444601eaf828f9c38a

I'm not sure why but I have to do this for clang to work (and since there is also a type change around it somehow get into my PR..)

This change it not included in the UVTD as I'm not sure how to make it to generate code like this.. given this is the only enum that needs the workaround. all other enums compile without any problem.

But anyway this is not a problem for windows.

Yeah it's not actually allowed to forward declare enums without supplying the type.
I'm not sure what the best solution to this is that doesn't involve making UVTD generate definitions for these enums so that they don't need to be forward declared, which sounds like it would be annoying to implement.

It does appear that the generated files in this PR contains this workaround which you said you haven't updated UVTD to do in this PR.
If this is the case, that needs to change.
We shouldn't be making changes to the generated files unless UVTD can reproduce those exact changes from the repp.
This is because we regenerate these files every now and then which means that any changes that's been made to them that UVTD can't make will get overwritten.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 11, 2024

hmm.. for this PR I will rollback the commit in the uepseudo so it matches the UVTD output. Not sure if this is the best solutuon but I think use a FORWARD_DECL(enum EAspectRatioAxisConstraint) macro and generate enum EAspectRatioAxisConstraint: uint8_t on clang should be okay..? And find a place to define that macro..

@Archengius
Copy link

I'm sorry to be ignorant and probably rude, but what is the point of building so many pointless abstractions that even UE itself does not have?
TCHAR is always an alias to WIDECHAR, which is an alias to wchar_t, which is a platform-specific type with variable length and encoding.
Since UE4SS works with UE strings 99% of the time, the relevant string type to use is std::wstring in all supported platforms, including Linux and Windows.

Therefore, I do not see the need to have so many different abstractions and conversions across different string representations when the most common case is working with UE strings, and to avoid huge amount of overhead and complexity, UE4SS should use the same string type as UE itself does for the platform it is running on, which is always wchar_t.

The only problem that UE4SS could potentially have right now is making assumptions about the internal encoding of wide strings. As long as we can clean up the code that does this, and instead convert wide strings to UTF8 in case they need to be handled by the external tools or written to external storage, we should be completely fine with sticking to std::wstring on all platforms as long as UE itself keeps doing the same.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 12, 2024

many pointless abstractions that even UE itself does not have?

This is part of the linux porting and the reason for the abstraction is unlike UE which is a pure UTF16 envieonment with own standard library, this project is using STL which has different string type on Windows and Linux.

For example, wchar_t on windows is UTF-16 but it becomes UTF32 on Linux.

Now, STL does provide a u16string which is UTF16 across all platforms. However, that type cannot be used for many STL functions like std::format which UE4SS heavily relies on.

That is why we need UEStringType and SystemStringType. On Windows there is no different since both are just std::wstring, on Linux, the UEStringType would become std::u16string to hold the UTF16 string from UE while SystemStringType will become std::string to hold string to interact with STL.

Another solution can be move to use fmt for u16 string (or maybe in the future stl finally get support of utf16), but because u16string uses chat16_t on Windows which cannot be used to interact with Windows APIs and etc, will be more annoying to fix. Plus, the completely lacking of utf16 handling C functions in Linux (sscanf/ssprintf for example).

@Archengius
Copy link

It feels to me that in that case, rewriting these interaction points would be easier and take less effort than introducing completely new abstractions that cause ripples across the entire codebase and affect much more code than just one using std::format or C standard library functions.
We could create our own functions that convert to UTF8 in-place, perform the necessary operation, and convert back, or bring a third party library that handles wide strings.

Either way, there is much more code interacting with UE than interacting with a few functions that do not support wide strings, so I think from both the scope and the performance perspective it would be better to keep using wide strings and do conversions where it is unavoidable.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 12, 2024

Either way, there is much more code interacting with UE than interacting with a few functions that do not support wide strings

No, there are more SystemString than UEString
image
image

And when you take account that at least half of the UEString are UVTD generated code, this changes make more sense.

image

This makes sense, the only reason UE4SS wants UEString is on a very few points from lua and cppmod to register hook point and compare the name. And FText/FString handling which is just wrapping of tchar* and all string handling is done by lua which need a convert anyway. Anohter point is the SDK generator but you need string code convert there anyway because it will invlove std::format and etc..

I tried both ways and I think this is better and less invasive change to existing code. Because all you need to do is change type, and no change to the logic.

And it causes 0 overhead for windows as all types remain the same, to_ue and to_system for string view is doing nothing for windows unless it's meant to create a new string..

@narknon
Copy link
Collaborator

narknon commented Jun 12, 2024

I added back fmt lib in a branch. Would that simplify anything if it's merged?

@Yangff
Copy link
Contributor Author

Yangff commented Jun 12, 2024

I added back fmt lib in a branch. Would that simplify anything if it's merged?

It might simplify a bit but I'm not sure if it can get rid of the use of SystemString completely.. (if I recall correctly last time I give up the fmt + u16string only is because it won't simplify much while introducing unnecessary charset convert.. so I decide just to make all types clear based on its purpose) But that is almost redo the entire porting.. and it's anyway going to change a lof of code in both side..

@Yangff
Copy link
Contributor Author

Yangff commented Jun 12, 2024

So.. It might reduce the use of many SYSSTR though.. but the result is all such output will need an extra charset convert after formatting..

@Yangff
Copy link
Contributor Author

Yangff commented Jun 13, 2024

I can't be sure for now but an easy way to estimate that is to change all wstring to u16string on Windows.. after adopt to fmtlib..

@narknon
Copy link
Collaborator

narknon commented Jun 13, 2024

Changing StringType typedef to be std::basic_string should be enough to resolve all UEStrings I would think?

@UE4SS
Copy link
Collaborator

UE4SS commented Jun 13, 2024

Changing StringType typedef to be std::basic_string should be enough to resolve all UEStrings I would think?

Not sure I follow you, std::basic_string is a template, and wstring is basic_string<wchar_t>, while string is basic_string<char>, and u16string is basic_string<char16_t>.
So StringType is currently basic_string<wchar_t>.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 13, 2024

Changing StringType typedef to be std::basic_string should be enough to resolve all UEStrings I would think?

There are also some places where it should be StringType while wstring is used in the code. (on the other hand, some code are meant to be wstring for using wchar_t* with c_str() say windows api and explode_by_occurrence etc.. )

For other use of UEString, it should be covered by changing it to u16string.

@narknon
Copy link
Collaborator

narknon commented Jun 13, 2024

I'd also prefer to use UE's macros for any conversions that we can.

I.e.,

#define TCHAR_TO_ANSI(str)
#define ANSI_TO_TCHAR(str)
#define TCHAR_TO_UTF8(str)
#define UTF8_TO_TCHAR(str)

@Yangff
Copy link
Contributor Author

Yangff commented Jun 13, 2024

I'd also prefer to use UE's macros for any conversions that we can.

I.e.,

#define TCHAR_TO_ANSI(str) #define ANSI_TO_TCHAR(str) #define TCHAR_TO_UTF8(str) #define UTF8_TO_TCHAR(str)

it's not even the way used in current code. say on the lua/ue interface..

And I don't know where it's implemented in UE but it can't do TCHAR_TO_SYSTEM, so it's useless.

Unless you want to have many #ifdef WIN32 for not doing convert and #ifdef LINUX for calling TCHAR_TO_UTF8 for example.. now all you need is to_system_string(x).c_str() I can't see why this isn't better.. in the end you shouldn't care what endoing is within the type.. for most code you can just auto s = to_ue(something you don't know) and you can use it with ue..

@Yangff
Copy link
Contributor Author

Yangff commented Jun 13, 2024

The underlaying converting maybe change to UE functions in StringConv.h though but it seems to make Helper depends on UEPseudo..

@narknon
Copy link
Collaborator

narknon commented Jun 13, 2024

The underlaying code convert maybe change to UE functions in StringConv.h though but it seems to make Helper depends on UEPseudo..

Yeah, I was playing around with getting UE to not rely on File earlier. It wouldn't be that difficult to move all the string macros out of File with just a couple edits.

@Yangff
Copy link
Contributor Author

Yangff commented Jun 13, 2024

oh and beyond fmt is those printfs scanfs will need rewrite to other functions as they don't have utf16 version on linux..

@Yangff Yangff closed this Jun 18, 2024
@UE4SS UE4SS mentioned this pull request Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants