-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update codebase from boost::string_view into std::string_view #4509
Conversation
If merged, this commit will modify the existing usage of boost::string_view into std::string_view. Due to differences in API between boost and std (existence of a clear() member function), I have resorted to abusing the swap() function. Type requirements of jss::<> responses have mandated expensive std::string conversions.
@@ -39,13 +39,14 @@ doPing(RPC::JsonContext& context) | |||
break; | |||
case Role::IDENTIFIED: | |||
ret[jss::role] = "identified"; | |||
ret[jss::username] = context.headers.user.to_string(); | |||
// Can I avoid making expensive copies for std::string? | |||
ret[jss::username] = std::string{context.headers.user}; | |||
if (context.headers.forwardedFor.size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scottschurr I couldn't think of a cheaper solution for this piece of code.
std::string{context.headers.user.data()}
seems to include garbage trailing data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this level of the source code I'm not seeing boost::string_view
used here. Naively, it seems like you can just leave this alone? Am I missing something?
Remember, there are lots of places that std::string_view
is not safe to use. We should not try to replace uses of std::string
with std::string_view
unless we're really sure it is safe. I suggest you not try any replacements of std::string
with std::string_view
in this initial pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. context.headers.user
and context.headers.forwardedFor
are of type boost::string_view
in the current version. Source: src/ripple/rpc/Context.h:60
I'm trying to port from this dependence on boost to std.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. I don't think there's currently any way to make this cheaper than what you have done here. The Json::Value::operator[]
interface does not (yet) allow assignment from a std::string_view
.But it could be made string_view
aware in a future pull request if we chose to do that.
Alternatively, we could update the rippled
codebase to use boost::json
instead of Json::Value
. I think that would also get us std::string_view
support. But that would be a much bigger lift (with potentially more benefit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for the suggestions Scott!
Yeah, Json::Value
was written a long time, I'm guessing before boost came up with their json library. It appears to have so much of overlap with implementations from boost!
@@ -847,8 +848,14 @@ ServerHandlerImp::processRequest( | |||
*/ | |||
if (role != Role::IDENTIFIED && role != Role::PROXY) | |||
{ | |||
forwardedFor.clear(); | |||
user.clear(); | |||
// Simulate the clearing of string_view through a swap. Is this a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not comfortable with this method. Since these strings (user
and forwardedFor
) are being modified, they are not suitable candidates for std::string_view
. boost::string_view
has a clear()
member function, until std. Should I retain the original version or change this occurence to std:string
instead? @scottschurr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string_view
doesn't have a clear()
method, but it does have a remove_suffix()
method: https://en.cppreference.com/w/cpp/string/basic_string_view/remove_suffix
I think you might be able to use use...
forwardedFor.remove_suffix(forwardedFor.size())
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks, this looks much cleaner 👍 I'll incorporate this change.
…ue of user, forwardedFor vars
@@ -997,9 +997,13 @@ OverlayImpl::processValidatorList( | |||
return true; | |||
}; | |||
|
|||
auto key = req.target().substr(prefix.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally leave this alone. You're adding complexity for no real reason here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I leave type-deduction to auto, I'm not 100% sure if it is deduced as std::string_view
. On line 1006, there is a comparison of key
with std::string_view::npos
.
My observation is that construction of std::string_view
without the second length parameter results in garbage values at the end of key
. I'm not really sure why that's happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sympathize with your position, @nbougalis. But key
is passed to ValidatorList::getAvailable()
, which accepts a std::string_view
. The conversion must happen somewhere.
I think perhaps the conversion can be done a bit more cleanly like this:
// Convert the boost::string_view, returned by
// boost::http::request::target(), into a std::string_view.
std::string_view key = [&req, &prefix]() {
boost::string_view const key = req.target().substr(prefix.size());
return std::string_view(key.data(), key.length());
}();
Since the auto
is gone we can always see what we're handling. And it limits the scope of the boost::string_view
as much as possible. But that doesn't remove the fundamental objection.
FWIW, I have a theory regarding the garbage values at the end of key
if the length parameter is omitted. Remember that a string_view
is a pointer and a length. There is no guarantee of a null termination to the string of characters contained by a string_view
. This is one of the things that can make string_view
dangerous to use.
If you construct a string_view
with only a char*
, then the string_view
constructor must determine the length of the string it represents. It does so by scanning for a null termination. Since the old string_view
you are using to construct the new string_view
is not required to contain a null terminated string, the constructor of the new string_view
may walk past the end of valid characters, potentially resulting in undefined behavior.
For more details see the description of constructor 4 on this page: https://en.cppreference.com/w/cpp/string/basic_string_view/basic_string_view
The lesson from this is that the only situation where a string_view
should be constructed with a char*
(and no length) is if you know the char*
is a c-style string with a null termination.
string_view
is often a nice optimization. But it's easy to mess up with it. We must handle the knife carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i missed this comment, thanks for the explanation Scott! yes, that makes sense
It seems like you're running into some problems interfacing with I noticed in the Boost documentation that there's a configuration preprocessor macro that may be useful: What I don't know is how many wrenches that change would throw into the Conan build process. Perhaps @thejohnfreeman can advise us on how much trouble using that configuration preprocessor macro would cause, or if it's even possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look great to me! I left a few comments for your consideration, but none of them are required.
That said, I'm not approving this pull request yet. I'd like to hear from @thejohnfreeman about the feasibility of using the BOOST_BEAST_USE_STD_STRING_VIEW
boost configuration preprocessor macro. If that's feasible, it may make sense to integrate that change with this pull request. Or to stack this pull request on top of that change.
@@ -997,9 +997,13 @@ OverlayImpl::processValidatorList( | |||
return true; | |||
}; | |||
|
|||
auto key = req.target().substr(prefix.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sympathize with your position, @nbougalis. But key
is passed to ValidatorList::getAvailable()
, which accepts a std::string_view
. The conversion must happen somewhere.
I think perhaps the conversion can be done a bit more cleanly like this:
// Convert the boost::string_view, returned by
// boost::http::request::target(), into a std::string_view.
std::string_view key = [&req, &prefix]() {
boost::string_view const key = req.target().substr(prefix.size());
return std::string_view(key.data(), key.length());
}();
Since the auto
is gone we can always see what we're handling. And it limits the scope of the boost::string_view
as much as possible. But that doesn't remove the fundamental objection.
FWIW, I have a theory regarding the garbage values at the end of key
if the length parameter is omitted. Remember that a string_view
is a pointer and a length. There is no guarantee of a null termination to the string of characters contained by a string_view
. This is one of the things that can make string_view
dangerous to use.
If you construct a string_view
with only a char*
, then the string_view
constructor must determine the length of the string it represents. It does so by scanning for a null termination. Since the old string_view
you are using to construct the new string_view
is not required to contain a null terminated string, the constructor of the new string_view
may walk past the end of valid characters, potentially resulting in undefined behavior.
For more details see the description of constructor 4 on this page: https://en.cppreference.com/w/cpp/string/basic_string_view/basic_string_view
The lesson from this is that the only situation where a string_view
should be constructed with a char*
(and no length) is if you know the char*
is a c-style string with a null termination.
string_view
is often a nice optimization. But it's easy to mess up with it. We must handle the knife carefully.
src/ripple/rpc/impl/Role.cpp
Outdated
return extractIpAddrFromField( | ||
std::string_view{it->value().substr(0, found).data(), found}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're doing a bit more work than necessary here. Since the substr
you are building has a starting position of 0
, we know that value
and substr
will return the same value for data()
. So I think you can remove the call to substr()
like this:
// http_request_type has a dependency on boost. Explicitly convert such
// boost::string_view into std::string_view
return extractIpAddrFromField(
std::string_view{it->value().data(), found});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this makes sense 👍
[&] { | ||
auto const iter = session->request().find("X-User"); | ||
if (iter != session->request().end()) | ||
return iter->value(); | ||
return boost::beast::string_view{}; | ||
return std::string_view{ | ||
iter->value().data(), iter->value().length()}; | ||
return std::string_view{}; | ||
}()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want you can remove the necessity of typing std::string_view
twice by specifying the return type of the lambda. The resulting lambda might look like this:
[&session]() -> std::string_view {
auto const iter = session->request().find("X-User");
if (iter != session->request().end())
return {iter->value().data(), iter->value().length()};
return {};
}());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you very much Scott! this looks nicer 👍
addressed PR comments. tagging @thejohnfreeman to get his feedback on Boost's macro usage and it's impact on the conan build process. |
Sorry, I missed the earlier pings for my attention. It is not a problem to use the Boost macro. We already have to do something similar for If you want to enable / require the macro
|
Thanks for the tip John. I executed the above commands, but I ran into an issue here:
Are we using |
You executed what commands? You've not included enough context to debug your situation. Please copy a full log that reproduces your error into a Gist (not a GitHub comment). |
Ok, here is the complete gist: https://gist.github.com/ckeshava/a47fcf02624c95c7d8a9e3824c961cc9 I'm asking about To summarize, I updated the conan profile and generated new
The above error is generated in the last command |
"Red herring" means distraction, irrelevant. I don't know why you linked that comment, or what it has to do with this issue. I can see it mentions From your log: [options]
boost:extra_b2_flags="define=BOOST_BEAST_USE_STD_STRING_VIEW"
[build_requires]
[env]
CFLAGS=-DBOOST_ASIO_HAS_STD_INVOKE_RESULT
CXXFLAGS=-DBOOST_BEAST_USE_STD_STRING_VIEW
[conf]
tools.build:cflags=['-DBOOST_ASIO_HAS_STD_INVOKE_RESULT']
tools.build:cxxflags=['-DBOOST_ASIO_HAS_STD_INVOKE_RESULT', '-DBOOST_BEAST_USE_STD_STRING_VIEW', '-DBOOST_BEAST_USE_STD_STRING_VIEW'] Look at what has happened here. You have added (twice) the string [options]
boost:extra_b2_flags="define=BOOST_ASIO_HAS_STD_INVOKE_RESULT define=BOOST_BEAST_USE_STD_STRING_VIEW"
[build_requires]
[env]
CFLAGS=-DBOOST_ASIO_HAS_STD_INVOKE_RESULT -DBOOST_BEAST_USE_STD_STRING_VIEW
CXXFLAGS=-DBOOST_ASIO_HAS_STD_INVOKE_RESULT -DBOOST_BEAST_USE_STD_STRING_VIEW
[conf]
tools.build:cflags=['-DBOOST_ASIO_HAS_STD_INVOKE_RESULT', '-DBOOST_BEAST_USE_STD_STRING_VIEW']
tools.build:cxxflags=['-DBOOST_ASIO_HAS_STD_INVOKE_RESULT', '-DBOOST_BEAST_USE_STD_STRING_VIEW'] That was not the cause of the error you shared, but it would have likely caused an error later in the pipeline. Your error suggests a corruption in the generated CMake modules that makes CMake look for Boost binaries in a cache directory that you deleted. You should delete your bulid directory ( |
Okay thanks for the help, I have followed your suggestions. I fixed my conan profile, deleted the I deleted the conan cache with This command fails: Snapshot of the logs:
|
@ckeshava - have you made any progress on this? |
@intelliot Sorry, I forgot about this PR, I'll work on it. @thejohnfreeman Based on your suggestions above, should we expect all rippled validators to perform the conan profile updates manually? Can we make it easier for validators to set up this conan profile? |
I don't think we expect validators to build rippled. I think they use our packages. @legleux @manojsdoshi can you confirm? Regardless, there are other profile changes we already require and document in the build instructions. This would just be one more instruction. |
I think it's a fairly safe assumption that most validtators do not build from source and are installing via binary packages. |
…this necessitates conversions to std::string wherever non-temporary uses of string_view is required compile the code with additional flags set in the conan profile. Need to update the build instructions appropriately
I have brought this branch up to speed, sorry about the delay. I need to update the build instructions, I'll do that just before the merge. I'm concerned that it will unnecessarily add complexity to the build procedure otherwise. |
@scottschurr Thanks for pointing it out, I have modified the relevant occurrences to use pass-by-value semantics in cfa960f Do I need to |
@ckeshava, no. I'm assuming you used Cupcake to install locally into the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you and @thejohnfreeman have done good work here. I left just a few comments. Let me know what you think. Thanks.
@@ -46,8 +42,8 @@ getFeatureValue( | |||
return {}; | |||
boost::smatch match; | |||
boost::regex rx(feature + "=([^;\\s]+)"); | |||
std::string const value = header->value(); | |||
if (boost::regex_search(value, match, rx)) | |||
std::string const allFeatures = std::string{header->value()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to construct then assign to it. You can simply construct it:
std::string const allFeatures(header->value());
The generated code is probably identical, but I think simply constructing the string is easier to read.
src/ripple/overlay/impl/PeerImp.cpp
Outdated
@@ -368,7 +366,7 @@ PeerImp::cluster() const | |||
return static_cast<bool>(app_.cluster().member(publicKey_)); | |||
} | |||
|
|||
std::string | |||
std::string_view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may have received encouragement to change getVerson()
to return a string_view
. I'd like to respectfully disagree with that encouragement.
A string_view
, since it does not own the values that it points at, leaves many more opportunities for dangling references than a std::string
does. So returning a std::string
here (as was the original signature) is much more conservative than returning a string_view
.
My personal heuristic for safety with string_view
is:
string_view
is great for local use. You can see the lifetime of all the storage.string_view
is a great parameter to pass in. Any local that might have been constructed to support thestring_view
will be held on the stack until we return from the function. So thestring_view
will not dangle for as long as we remain in the function. Once we return then thestring_view
is automatically out of scope and can't be used.
I'm extremely cautious about returning a string_view
. If I have some way of knowing that the characters referred to by the string_view
will never go out of scope then it's safe. An example would be a string_view
built from a static const char[]
. But otherwise I think the risk/benefit ratio is wrong.
Just my opinion. But I'd be happier if this went back to returning a std::string
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Scott, your explanation of the std::string_view
use cases is very clear, I have a better understanding now.
I'm returning a std::string_view
of headers_[<key-name>]
. headers_
is a private non-static data member of the PeerImp
class, it remains in-scope for as long as the PeerImp
object is alive. But I haven't looked into the internal type of boost::beast::http::fields const& headers_;
I don't know if it uses any heap-allocated memory.
I'm weakly in favor of retaining the std::string_view
, but this is a tenuous argument.
I'm open to switching it back to a std::string
@thejohnfreeman did I miss any point here? what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to return std::string_view
. It's like a container returning a const_reference
. It's generally guaranteed to be alive for the duration of the calling scope, and it's up to the caller to handle any copying necessary. But my preference is weak too. I think @scottschurr has stronger conviction than I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @thejohnfreeman! Yup, I actually do have some conviction on this one. Let me explain why.
Returning a string_view
from those places is an optimization that has some future risk associated with it and no measurement showing that the optimization buys us anything right now. If someone can show me a measurable performance gain from returning string_view
s in these locations then I'll reconsider. Lacking that, I think returning std::string
is the safest approach with no measurable downside.
src/ripple/overlay/impl/PeerImp.cpp
Outdated
@@ -836,7 +834,7 @@ PeerImp::name() const | |||
return name_; | |||
} | |||
|
|||
std::string | |||
std::string_view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment regarding PeerImp::getVersion()
. I prefer the old return type of std::string
.
src/ripple/overlay/impl/PeerImp.h
Outdated
@@ -344,7 +344,7 @@ class PeerImp : public Peer, | |||
} | |||
|
|||
/** Return the version of rippled that the peer is running, if reported. */ | |||
std::string | |||
std::string_view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer returning a std::string
here.
src/ripple/overlay/impl/PeerImp.h
Outdated
@@ -464,7 +464,7 @@ class PeerImp : public Peer, | |||
std::string | |||
name() const; | |||
|
|||
std::string | |||
std::string_view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer returning a std::string
here.
|
@scottschurr There are two more instances where a Do you want me to change their return types to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM. Thanks for battling through this.
Regarding...
There are two more instances where a
boost::string_view
is returned from two functions:extractIpAddrFromField
andforwardedFor
in Role.cpp. This has been in the code for the past 3 years.Do you want me to change their return types to
std::string
instead? That would be a change in the behavior of the existing codebase (not only a refactor fromboost::string_view
tostd::string_view
)
Personally, I think this pull request should leave those two functions alone. extractIpAddrFromField
is static
, so it is local to only that file. It's probably safe, because is used in a very limited scope.
forwardedFor
is not static
. However, since it has been in use for 3 years, changing it to return a std::string
without a specific motivation would be a pessimization. I don't see a good reason to introduce that pessimization at this time.
I hope that makes some sense.
@scottschurr I think it would be better to change the return type of those functions into a But I agree, another time, another PR |
@ckeshava could you confirm whether this is ready to merge, from your perspective? Also, please suggest a commit message for the squashed commit. |
It looks good from my end. Suggested commit message: |
- update the overloads of parseBase58 overloads in PublicKey, SecretKey classes to be compatible with the new fast base58 algorithm
* upstream/develop: (32 commits) fixInnerObjTemplate2 amendment (XRPLF#5047) Set version to 2.3.0-b1 Ignore restructuring commits (XRPLF#4997) Recompute loops (XRPLF#4997) Rewrite includes (XRPLF#4997) Rearrange sources (XRPLF#4997) Move CMake directory (XRPLF#4997) Add bin/physical.sh (XRPLF#4997) Prepare to rearrange sources: (XRPLF#4997) Change order of checks in amm_info: (XRPLF#4924) Add the fixEnforceNFTokenTrustline amendment: (XRPLF#4946) Replaces the usage of boost::string_view with std::string_view (XRPLF#4509) docs: explain how to find a clang-format patch generated by CI (XRPLF#4521) XLS-52d: NFTokenMintOffer (XRPLF#4845) chore: remove repeat words (XRPLF#5041) Expose all amendments known by libxrpl (XRPLF#5026) fixReducedOffersV2: prevent offers from blocking order books: (XRPLF#5032) Additional unit tests for testing deletion of trust lines (XRPLF#4886) Fix conan typo: (XRPLF#5044) Add new command line option to make replaying transactions easier: (XRPLF#5027) ...
This PR seeks to solve issue #3435
High Level Overview of Change
Context of Change
Type of Change
Before:
The codebase makes use of
boost:string_view
for cheap manipulation of strings.After:
This PR uses
std::string_view
instead ofboost::string_view
to perform similar operations.Can I avoid making expensive conversions into
std::string
insidePing.cpp
forjss::ip
andjss::username
response fields?The
http_request_type
has a dependency onboost
. Is it okay to occasionally convert from boost's string_view intostd::string_view
? I'm not sure about the performance implications.