Skip to content

Conversation

@ELDment
Copy link
Contributor

@ELDment ELDment commented Sep 9, 2025

Through Issue #1014, I discovered this critical problem

Initially, I suspected that the root cause of the above issue was related to string encoding problems

However, as our debugging deepened, I pinpointed that this problem actually occurs in

template <typename T> inline void SetResult(const T& value)
{
auto functionData = (uint64_t*)m_result;
if (sizeof(T) < ArgumentSize)
{
*reinterpret_cast<uint64_t*>(&functionData[0]) = 0;
}
*reinterpret_cast<T*>(&functionData[0]) = value;
m_numResults = 1;
m_numArguments = 0;
}

As an example:

static void PbReadString(ScriptContext& scriptContext)
{
GET_MESSAGE_OR_ERR();
GET_FIELD_NAME_OR_ERR();
std::string returnValue;
auto index = scriptContext.GetArgument<int>(2);
if (index < 0)
{
if (!message->GetString(fieldName, returnValue))
{
scriptContext.ThrowNativeError("Invalid field \"%s\" for message \"%s\"", fieldName,
message->GetProtobufMessage()->GetTypeName().c_str());
return;
}
}
else
{
if (!message->GetRepeatedString(fieldName, index, returnValue))
{
scriptContext.ThrowNativeError("Invalid field \"%s\"[%d] for message \"%s\"", fieldName, index,
message->GetProtobufMessage()->GetTypeName().c_str());
return;
}
}
scriptContext.SetResult(returnValue.c_str());
}

  1. We declare std::string returnValue inside the function
  2. Finally call scriptContext.SetResult(returnValue.c_str()) to pass the string to the managed layer
  3. !!!Fatal Issue!!! When the PbReadString function ends, returnValue is automatically destroyed
  4. At this point, returnValue.c_str() becomes an invalid pointer
  5. When managed layer attempts to parse this invalid pointer, errors occur (manifested as parsing garbled text)

So, I implemented a thread-local string pool in the SetResult template to ensure string lifetime extends beyond function scope, preventing dangling pointer issues in native-managed interop

(...)
            // Create string copies to ensure persistent memory lifetime
            static thread_local std::vector<std::string> stringPool;
            static thread_local size_t poolIndex = 0;

            if (stringPool.size() < 16)
            {
                stringPool.emplace_back();
            }
(...)

Regarding the observation mentioned in #1014: "When a player sends a short message containing Chinese characters (e.g., up to 5 characters, which is approx. 15 bytes in UTF-8), the message is read correctly by um.ReadString("param2")"

GUESS:

  • Short Strings (≤15 bytes): Utilize Small String Optimization (SSO), where string data is stored within the std::string object's internal buffer on the stack. Upon object destruction, the stack frame is marked as available for reuse but the actual memory content remains unchanged until overwritten by subsequent function calls. This creates a probabilistic window where the string data may still be accessible, depending on call stack reuse patterns and compiler optimizations
// SSO Case
std::string str = "hello";
char* ptr = str.c_str();    // Points to internal array on stack
  • Long Strings (>15 bytes): Allocate dynamic memory on the heap via malloc/new. The std::string object on the stack contains only a pointer to this heap-allocated buffer. Upon destruction, the string's destructor immediately calls free/delete on the heap memory, making it available to the memory allocator and potentially overwriting the content immediately. This eliminates any timing window for accidental data recovery
// Heap Case
std::string str = "long_string...";
char* ptr = str.c_str();    // Points to heap-allocated memory

Testing confirms the expected behavior
TEST

#1014

Expected Behavior

When a player sends the message 这是一个超过五个汉字的测试消息, the chat box should display the modified message correctly:
[Diagnosis] 这是一个超过五个汉字的测试消息

Actual Behavior

The chat box displays a corrupted string where the latter part of the message is replaced with replacement characters (�) or other garbled text:
[Diagnosis] ������

@ELDment ELDment requested a review from roflmuffin as a code owner September 9, 2025 05:16
@roflmuffin
Copy link
Owner

roflmuffin commented Sep 9, 2025

This seems sane to me. Most issues of this nature are somewhat mitigated because the c# side will take a copy of the string at the return address immediately after invoking the method, so for some semi-transient things in the game engine this is generally enough, but not for immediately freed strings like this one.

For that reason however, the string pool may actually be overkill, because we know that the consumer of the result will take a copy on the managed side, we can probably just keep a single copy of the latest result, and free it on each subsequent GetResult()

@ELDment
Copy link
Contributor Author

ELDment commented Sep 9, 2025

This seems sane to me. Most issues of this nature are somewhat mitigated because the c# side will take a copy of the string at the return address immediately after invoking the method, so for some semi-transient things in the game engine this is generally enough, but not for immediately freed strings like this one.

For that reason however, the string pool may actually be overkill, because we know that the consumer of the result will take a copy on the managed side, we can probably just keep a single copy of the latest result, and free it on each subsequent GetResult()

I understand your suggestion about holding a single string copy

However, regarding "free it on each subsequent GetResult()", I believe this may not be necessary

When the string template SetResult is called again, lastResult holds a new string copy, and the old string content is immediately freed by C++ during the assignment operation


@roflmuffin
Do you want to minimize memory footprint in the C++ layer? If you think manual memory management is required, I'm willing to follow your recommendation and commit the changes

@roflmuffin
Copy link
Owner

This seems sane to me. Most issues of this nature are somewhat mitigated because the c# side will take a copy of the string at the return address immediately after invoking the method, so for some semi-transient things in the game engine this is generally enough, but not for immediately freed strings like this one.

For that reason however, the string pool may actually be overkill, because we know that the consumer of the result will take a copy on the managed side, we can probably just keep a single copy of the latest result, and free it on each subsequent GetResult()

I understand your suggestion about holding a single string copy

However, regarding "free it on each subsequent GetResult()", I believe this may not be necessary

When the string template SetResult is called again, lastResult holds a new string copy, and the old string content is immediately freed by C++ during the assignment operation


@roflmuffin
Do you want to minimize memory footprint in the C++ layer? If you think manual memory management is required, I'm willing to follow your recommendation and commit the changes

Yes my apologies I didn't mean manually free, but freed by reassigning the last result. I think it would be good if you could implement that and retest if you could.

@ELDment
Copy link
Contributor Author

ELDment commented Sep 9, 2025

This seems sane to me. Most issues of this nature are somewhat mitigated because the c# side will take a copy of the string at the return address immediately after invoking the method, so for some semi-transient things in the game engine this is generally enough, but not for immediately freed strings like this one.
For that reason however, the string pool may actually be overkill, because we know that the consumer of the result will take a copy on the managed side, we can probably just keep a single copy of the latest result, and free it on each subsequent GetResult()

I understand your suggestion about holding a single string copy
However, regarding "free it on each subsequent GetResult()", I believe this may not be necessary
When the string template SetResult is called again, lastResult holds a new string copy, and the old string content is immediately freed by C++ during the assignment operation

@roflmuffin
Do you want to minimize memory footprint in the C++ layer? If you think manual memory management is required, I'm willing to follow your recommendation and commit the changes

Yes my apologies I didn't mean manually free, but freed by reassigning the last result. I think it would be good if you could implement that and retest if you could.

https://stackoverflow.com/questions/13873544/deallocation-of-memory-allocated-in-stdstring-in-c

I believe that when using STL-allocated strings, we do not need to concern ourselves with memory deallocation. From my perspective, C++ automatically manages their memory through established RAII principles. I have written test code to demonstrate this behavior

static thread_local std::string lastResult;

const char* oldPtr = lastResult.empty() ? nullptr : lastResult.c_str();
size_t oldCapacity = lastResult.capacity();
std::string oldContent = lastResult; // Copy for logging

printf("[BEFORE ASSIGNMENT] String: '%s' | Ptr: %p | Capacity: %zu\n", oldContent.c_str(), oldPtr, oldCapacity);

// Assignment operator automatically frees previous string content
// and allocates new memory for the incoming string
lastResult = value; // RAII: old memory freed, new memory allocated

const char* newPtr = lastResult.c_str();
size_t newCapacity = lastResult.capacity();

printf("[AFTER ASSIGNMENT] String: '%s' | Ptr: %p | Capacity: %zu\n", lastResult.c_str(), newPtr, newCapacity);
[BEFORE ASSIGNMENT] String: 'world' | Ptr: 000005C811FDB4C0 | Capacity: 63
[AFTER ASSIGNMENT] String: 'hello' | Ptr: 000005C811FDB4C0 | Capacity: 63

The test results demonstrate that the new string directly overwrites the old string data. This behavior indicates sophisticated memory optimization within the STL implementation

@ELDment
Copy link
Contributor Author

ELDment commented Sep 9, 2025

Actually, I did try what you suggested: releasing the previous lastResult copy after assigning a new one. It was through this attempt that I discovered lastResult doesn't seem to need manual management

the STL handles all of these operations transparently

lol, Is there anything else you'd like me to confirm before merging? @roflmuffin

@roflmuffin roflmuffin merged commit 44922da into roflmuffin:main Sep 9, 2025
6 checks passed
@roflmuffin
Copy link
Owner

LGTM, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants