Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make JSStringToSTLString 23x faster #26955

Closed
wants to merge 3 commits into from
Closed

Conversation

radex
Copy link
Contributor

@radex radex commented Oct 22, 2019

Summary

In my app I have a case where I need to pass a very large string (45MB) between JS and native. This is obviously suboptimal, but… this is where I'm at.

The main bottleneck to doing this turned out to be jsi's JSStringToSTLString(), which was extremely slow. In my case, 4.7s to execute. After this change, 204ms.

I don't really know C++, so I'm not sure this code is 100% correct and safe, and I bet it could be done even better by avoiding the extra memory allocation (would shave off another 70ms).

Changelog

[General] [Changed] - Make JSStringToSTLString 23x faster

Test Plan

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 22, 2019
@kelset kelset requested a review from ericlewis October 22, 2019 12:19
@radex radex changed the title Make JSStringToSTLString 44x faster Make JSStringToSTLString 23x faster Oct 22, 2019
@radex
Copy link
Contributor Author

radex commented Oct 22, 2019

Post scriptum: I can't even divide right

Copy link
Contributor

@motiz88 motiz88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I wonder what was slower - the zero-allocation or the linear scan to get the string length. 😄 Left a small comment but LGTM otherwise. I'll also loop in some folks with more context about this code.

std::vector<char> buffer(maxBytes);
JSStringGetUTF8CString(str, buffer.data(), maxBytes);
return std::string(buffer.data());
char* buffer = new char[maxBytes];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use std::make_unique<char[]> instead of raw new[] and delete[] - it has the same effect (allocating without zero-initialising) but is also exception-safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very strange. I don't really understand why the previous implementation is slower.
When I read the changed code without looking at the previous version of it, I thought "oh, using std::vector is more ideomatic here and should be as fast as a plain buffer".

I think the most efficient way to implement this function is to use something like folly::small_vector (or simply use the same approch manually) trying to avoid heap allocation for small strings.

Copy link
Contributor

@shergin shergin Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my, look at this:
https://en.cppreference.com/w/cpp/container/vector/vector
and then look at this 3-4) Linear in count. That's probably because we need to set to 0 every single element.

So, I would suggest rewriting this to:

std::vector<char> buffer;
buffer.reserve(maxBytes);

Awesome work, @radex! 😍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shergin AFAICT it's not legal to use vector::data after reserve, and actually resizing the vector runs into the same needless zero-initialisation issue. So I think there's no way to do this with vector without paying this cost.

Copy link
Contributor

@shergin shergin Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@motiz88 is right, I am wrong. The unique_ptr looks like the best option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As a huge a huge fan of SSO, I would suggest to implement it here as well (e.g. having if the size is smaller than 21 or whatever, use stack-allocated buffer), but that's out of the scope of this PR.)

@radex
Copy link
Contributor Author

radex commented Oct 24, 2019

Thanks @shergin & @motiz88 for pushing me to try to learn a little bit of C++ ;)

I hope I didn't mess anything up here -- I changed raw pointers to unique_ptr, and also added stack allocated buffer for small strings

Copy link
Contributor

@motiz88 motiz88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this. I'd structure the code a little differently so there's less repetition and therefore fewer places for future bugs to creep in:

std::array<char, 20> stackBuffer;
std::unique_ptr<char[]> heapBuffer;
char* buffer;
if (maxBytes <= stackBuffer.size()) {
  buffer = stackBuffer.data();
} else {
  heapBuffer = std::make_unique<char[]>(maxBytes);
  buffer = heapBuffer.get();
}
size_t actualBytes = JSStringGetUTF8CString(str, buffer, maxBytes);
return std::string(buffer, actualBytes - 1);

@matthargett
Copy link
Contributor

I like @motiz88 's last suggestion. My only suggestion on top of that would be to extract the magic "20" number to an explaining static const variable (or whatever the FB C++ standards prefer that's similar in intent).

@shergin
Copy link
Contributor

shergin commented Jan 27, 2020

@motiz88 Let's land this (your improved version)?

@motiz88
Copy link
Contributor

motiz88 commented Jan 27, 2020

@shergin Sure, I'll push my changes in a bit.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@motiz88 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@react-native-bot
Copy link
Collaborator

This pull request was successfully merged by @radex in 733532e.

When will my fix make it into a release? | Upcoming Releases

@react-native-bot react-native-bot added the Merged This PR has been merged. label Jan 28, 2020
@radex radex deleted the patch-1 branch January 28, 2020 19:01
@radex
Copy link
Contributor Author

radex commented Jan 28, 2020

woohoo! Thanks @motiz88

facebook-github-bot pushed a commit that referenced this pull request Feb 14, 2020
Summary:
[A recent change to JSStringToSTLString](#26955) causes a crash when the function is invoked with invalid UTF-16 data. The old behaviour, restored here, was to truncate the string before the first invalid character.

Here's how [the original code](https://github.com/facebook/react-native/blob/aee88b6843cea63d6aa0b5879ad6ef9da4701846/ReactCommon/jsi/JSCRuntime.cpp#L287) handled this case:
```
std::string JSStringToSTLString(JSStringRef str) {
  size_t maxBytes = JSStringGetMaximumUTF8CStringSize(str);
  // ^ maxBytes >= 1 regardless of str's contents
  std::vector<char> buffer(maxBytes);
  // ^ vector is zero initialised
  JSStringGetUTF8CString(str, buffer.data(), maxBytes);
  // ^ writes '\0' at the first invalid character and returns early (see JSC source code)
  return std::string(buffer.data());
  // ^ copies the string up to the first '\0'
}
```

See the JSC implementations of [`JSStringGetUTF8CString`](https://opensource.apple.com/source/JavaScriptCore/JavaScriptCore-7600.8.7/API/JSStringRef.cpp.auto.html) and [`convertUTF16ToUTF8`](https://opensource.apple.com/source/WTF/WTF-7600.7.2/wtf/unicode/UTF8.cpp.auto.html).

Based on the fact that `JSStringGetUTF8CString` *always* null-terminates the buffer - even when it bails out of converting an invalid string - here we're able to both

1. keep the fast path (not zero-initialising, not scanning for the null terminator) for the common case when the data is valid and JSStringGetUTF8CString returns a nonzero length; and
2. return the truncated string when JSStringGetUTF8CString returns an error code of 0, by scanning for the null terminator.

Changelog: [General] [Fixed] - Fix crash when passing invalid UTF-16 data from JSC into native code

Differential Revision: D19902751

fbshipit-source-id: 06bace2719800e921ec115ad6a29251eafd473f6
osdnk pushed a commit to osdnk/react-native that referenced this pull request Mar 9, 2020
Summary:
In my app I have a case where I need to pass a very large string (45MB) between JS and native. This is obviously suboptimal, but… this is where I'm at.

The main bottleneck to doing this turned out to be `jsi`'s `JSStringToSTLString()`, which was extremely slow. In my case, 4.7s to execute. After this change, 204ms.

I don't really know C++, so I'm not sure this code is 100% correct and safe, and I bet it could be done even better by avoiding the extra memory allocation (would shave off another 70ms).

## Changelog

[General] [Changed] - Make JSStringToSTLString 23x faster
Pull Request resolved: facebook#26955

Reviewed By: shergin

Differential Revision: D19578728

Pulled By: motiz88

fbshipit-source-id: 2fbce83166953ce928f0a6aa36eed710bfe05383
osdnk pushed a commit to osdnk/react-native that referenced this pull request Mar 9, 2020
Summary:
[A recent change to JSStringToSTLString](facebook#26955) causes a crash when the function is invoked with invalid UTF-16 data. The old behaviour, restored here, was to truncate the string before the first invalid character.

Here's how [the original code](https://github.com/facebook/react-native/blob/aee88b6843cea63d6aa0b5879ad6ef9da4701846/ReactCommon/jsi/JSCRuntime.cpp#L287) handled this case:
```
std::string JSStringToSTLString(JSStringRef str) {
  size_t maxBytes = JSStringGetMaximumUTF8CStringSize(str);
  // ^ maxBytes >= 1 regardless of str's contents
  std::vector<char> buffer(maxBytes);
  // ^ vector is zero initialised
  JSStringGetUTF8CString(str, buffer.data(), maxBytes);
  // ^ writes '\0' at the first invalid character and returns early (see JSC source code)
  return std::string(buffer.data());
  // ^ copies the string up to the first '\0'
}
```

See the JSC implementations of [`JSStringGetUTF8CString`](https://opensource.apple.com/source/JavaScriptCore/JavaScriptCore-7600.8.7/API/JSStringRef.cpp.auto.html) and [`convertUTF16ToUTF8`](https://opensource.apple.com/source/WTF/WTF-7600.7.2/wtf/unicode/UTF8.cpp.auto.html).

Based on the fact that `JSStringGetUTF8CString` *always* null-terminates the buffer - even when it bails out of converting an invalid string - here we're able to both

1. keep the fast path (not zero-initialising, not scanning for the null terminator) for the common case when the data is valid and JSStringGetUTF8CString returns a nonzero length; and
2. return the truncated string when JSStringGetUTF8CString returns an error code of 0, by scanning for the null terminator.

Changelog: [General] [Fixed] - Fix crash when passing invalid UTF-16 data from JSC into native code

Differential Revision: D19902751

fbshipit-source-id: 06bace2719800e921ec115ad6a29251eafd473f6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged This PR has been merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants