Make JSStringToSTLString 23x faster #26955
Conversation
Post scriptum: I can't even divide right
Nice! I wonder what was slower - the zero-initialisation or the linear scan to get the string length. 😄 Left a small comment but LGTM otherwise. I'll also loop in some folks with more context about this code.
ReactCommon/jsi/JSCRuntime.cpp
- std::vector<char> buffer(maxBytes);
- JSStringGetUTF8CString(str, buffer.data(), maxBytes);
- return std::string(buffer.data());
+ char* buffer = new char[maxBytes];
Let's use std::make_unique<char[]> instead of raw new[] and delete[] - it frees the buffer automatically, which makes the code exception-safe. (Strictly, make_unique<char[]> value-initialises the array; C++20's std::make_unique_for_overwrite is the variant that skips the zeroing.)
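A minimal sketch of that suggestion (length handling is simplified here and is not the PR's exact code; this variant still scans for the terminator):

```
#include <memory>
#include <string>

#include <JavaScriptCore/JSStringRef.h>

// Sketch of the make_unique idea only: the unique_ptr releases the buffer
// automatically, so later code can throw without leaking it.
std::string JSStringToSTLString(JSStringRef str) {
  size_t maxBytes = JSStringGetMaximumUTF8CStringSize(str);
  auto buffer = std::make_unique<char[]>(maxBytes);
  JSStringGetUTF8CString(str, buffer.get(), maxBytes);
  return std::string(buffer.get());
}
```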
That's very strange. I don't really understand why the previous implementation is slower.
When I read the changed code without looking at the previous version, I thought "oh, using std::vector is more idiomatic here and should be as fast as a plain buffer".
I think the most efficient way to implement this function is to use something like folly::small_vector (or simply use the same approach manually), trying to avoid heap allocation for small strings.
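A rough sketch of that idea (illustrative only; the 20-byte inline capacity is an arbitrary choice, and a sized small_vector likely still value-initialises its elements, which is the cost discussed next):

```
#include <string>

#include <JavaScriptCore/JSStringRef.h>
#include <folly/small_vector.h>

// Sketch: 20 bytes of inline storage, so only strings whose maximum UTF-8
// size exceeds that pay for a heap allocation.
std::string JSStringToSTLString(JSStringRef str) {
  size_t maxBytes = JSStringGetMaximumUTF8CStringSize(str);
  folly::small_vector<char, 20> buffer(maxBytes);
  JSStringGetUTF8CString(str, buffer.data(), maxBytes);
  return std::string(buffer.data());
}
```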
Oh my, look at this:
https://en.cppreference.com/w/cpp/container/vector/vector
and then look at the complexity of the sized constructors (overloads 3-4): "Linear in count". That's probably because we need to set every single element to 0.
So, I would suggest rewriting this to:
std::vector<char> buffer;
buffer.reserve(maxBytes);
Awesome work, @radex! 😍
@shergin AFAICT it's not legal to use vector::data after reserve, and actually resizing the vector runs into the same needless zero-initialisation issue. So I think there's no way to do this with vector without paying this cost.
@motiz88 is right, I am wrong. The unique_ptr looks like the best option.
(As a huge fan of SSO, I would suggest implementing it here as well: e.g. if the size is smaller than 21 or whatever, use a stack-allocated buffer. But that's out of the scope of this PR.)
Love this. I'd structure the code a little differently so there's less repetition and therefore fewer places for future bugs to creep in:
std::array<char, 20> stackBuffer;
std::unique_ptr<char[]> heapBuffer;
char* buffer;
if (maxBytes <= stackBuffer.size()) {
buffer = stackBuffer.data();
} else {
heapBuffer = std::make_unique<char[]>(maxBytes);
buffer = heapBuffer.get();
}
size_t actualBytes = JSStringGetUTF8CString(str, buffer, maxBytes);
return std::string(buffer, actualBytes - 1);
I like @motiz88's last suggestion. My only suggestion on top of that would be to extract the magic "20" number into an explaining static const variable (or whatever the FB C++ standards prefer that's similar in intent).
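A minimal sketch of that extraction, applied to the snippet above (the constant name is illustrative, not from the PR):

```
// Strings whose maximum UTF-8 size fits here are converted without a heap allocation.
constexpr size_t kShortStringBufferSize = 20;
std::array<char, kShortStringBufferSize> stackBuffer;
```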
@motiz88 Let's land this (your improved version)?
@shergin Sure, I'll push my changes in a bit.
@motiz88 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This pull request was successfully merged by @radex in 733532e.
woohoo! Thanks @motiz88
Summary: [A recent change to JSStringToSTLString](#26955) causes a crash when the function is invoked with invalid UTF-16 data. The old behaviour, restored here, was to truncate the string before the first invalid character.

Here's how [the original code](https://github.com/facebook/react-native/blob/aee88b6843cea63d6aa0b5879ad6ef9da4701846/ReactCommon/jsi/JSCRuntime.cpp#L287) handled this case:

```
std::string JSStringToSTLString(JSStringRef str) {
  size_t maxBytes = JSStringGetMaximumUTF8CStringSize(str);
  // ^ maxBytes >= 1 regardless of str's contents
  std::vector<char> buffer(maxBytes);
  // ^ vector is zero initialised
  JSStringGetUTF8CString(str, buffer.data(), maxBytes);
  // ^ writes '\0' at the first invalid character and returns early (see JSC source code)
  return std::string(buffer.data());
  // ^ copies the string up to the first '\0'
}
```

See the JSC implementations of [`JSStringGetUTF8CString`](https://opensource.apple.com/source/JavaScriptCore/JavaScriptCore-7600.8.7/API/JSStringRef.cpp.auto.html) and [`convertUTF16ToUTF8`](https://opensource.apple.com/source/WTF/WTF-7600.7.2/wtf/unicode/UTF8.cpp.auto.html).

Based on the fact that `JSStringGetUTF8CString` *always* null-terminates the buffer - even when it bails out of converting an invalid string - here we're able to both:

1. keep the fast path (not zero-initialising, not scanning for the null terminator) for the common case when the data is valid and JSStringGetUTF8CString returns a nonzero length; and
2. return the truncated string when JSStringGetUTF8CString returns an error code of 0, by scanning for the null terminator.

Changelog: [General] [Fixed] - Fix crash when passing invalid UTF-16 data from JSC into native code

Differential Revision: D19902751

fbshipit-source-id: 06bace2719800e921ec115ad6a29251eafd473f6
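A sketch consistent with the approach described above, combining the small stack buffer from the review thread with the invalid-UTF-16 fallback (the exact constants and structure of the merged code may differ):

```
#include <array>
#include <memory>
#include <string>

#include <JavaScriptCore/JSStringRef.h>

std::string JSStringToSTLString(JSStringRef str) {
  size_t maxBytes = JSStringGetMaximumUTF8CStringSize(str);
  // Short strings are converted on the stack; longer ones use an
  // uninitialised-enough heap buffer owned by a unique_ptr.
  std::array<char, 20> stackBuffer;
  std::unique_ptr<char[]> heapBuffer;
  char* buffer;
  if (maxBytes <= stackBuffer.size()) {
    buffer = stackBuffer.data();
  } else {
    heapBuffer = std::make_unique<char[]>(maxBytes);
    buffer = heapBuffer.get();
  }
  size_t actualBytes = JSStringGetUTF8CString(str, buffer, maxBytes);
  if (!actualBytes) {
    // Invalid UTF-16 input: JSStringGetUTF8CString bailed out but still
    // null-terminated what it wrote, so fall back to scanning for '\0'.
    return std::string(buffer);
  }
  // Fast path: actualBytes includes the trailing '\0', so drop it.
  return std::string(buffer, actualBytes - 1);
}
```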
Summary: In my app I have a case where I need to pass a very large string (45MB) between JS and native. This is obviously suboptimal, but… this is where I'm at.

The main bottleneck to doing this turned out to be `jsi`'s `JSStringToSTLString()`, which was extremely slow. In my case, 4.7s to execute. After this change, 204ms.

I don't really know C++, so I'm not sure this code is 100% correct and safe, and I bet it could be done even better by avoiding the extra memory allocation (would shave off another 70ms).

## Changelog

[General] [Changed] - Make JSStringToSTLString 23x faster

Pull Request resolved: facebook#26955

Reviewed By: shergin

Differential Revision: D19578728

Pulled By: motiz88

fbshipit-source-id: 2fbce83166953ce928f0a6aa36eed710bfe05383
Summary
In my app I have a case where I need to pass a very large string (45MB) between JS and native. This is obviously suboptimal, but… this is where I'm at.
The main bottleneck to doing this turned out to be jsi's JSStringToSTLString(), which was extremely slow. In my case, 4.7s to execute. After this change, 204ms.
I don't really know C++, so I'm not sure this code is 100% correct and safe, and I bet it could be done even better by avoiding the extra memory allocation (would shave off another 70ms).
Changelog
[General] [Changed] - Make JSStringToSTLString 23x faster
Test Plan