fix: Replace boost::regex with RE2 in parse_duration function#15124
fix: Replace boost::regex with RE2 in parse_duration function#15124abhinavmuk04 wants to merge 1 commit intomainfrom
Conversation
|
@abhinavmuk04 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84391289. |
✅ Deploy Preview for meta-velox canceled.
|
|
@abhinavmuk04 Hi, nice changes! I have some suggestions:
template <typename T>
struct ParseDurationFunction {
VELOX_DEFINE_FUNCTION_TYPES(T);
std::unique_ptr<RE2> durationRegex_;
FOLLY_ALWAYS_INLINE void initialize(
const std::vector<TypePtr>& /*inputTypes*/,
const core::QueryConfig& /*config*/,
const arg_type<Varchar>* /*amountUnit*/) {
durationRegex_ =
std::make_unique<RE2>(R"(^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$)");
}
FOLLY_ALWAYS_INLINE void call(
out_type<IntervalDayTime>& result,
const arg_type<Varchar>& amountUnit) {
std::string_view valueStr;
std::string_view unit;
if (!RE2::FullMatch(
std::string_view{amountUnit}, *durationRegex_, &valueStr, &unit)) {
VELOX_USER_FAIL(
"Input duration is not a valid data duration string: {}", amountUnit);
}
double value{};
auto [_, error] = std::from_chars(
valueStr.data(), valueStr.data() + valueStr.size(), value);
if (error != std::errc{}) {
VELOX_USER_FAIL(
"Input duration value is not a valid number: {}", valueStr);
}
result = valueOfTimeUnitToMillis(value, unit);
}
}; |
|
@abhinavmuk04 will you check #15134 ? |
@MBkkt I see in this PR you seem to be handling the parse_duration function. Is this something you are working on? If so, I can let go of this PR |
@MBkkt wanted to follow up on this |
|
@abhinavmuk04 Not really. If you will merge your PR, I will just rebase. If you don't merge this PR, I will wait for review for mine, to merge it and get rid of Some explanation: I just noticed your PR and thought about it will be nice to do 3 things:
|
32f05b1 to
f77036a
Compare
f77036a to
6f2fec1
Compare
6f2fec1 to
bd64809
Compare
bd64809 to
7091e3d
Compare
7091e3d to
cbb1f3e
Compare
kevinwilfong
left a comment
There was a problem hiding this comment.
LGTM Thanks @MBkkt for all the suggestions!
kevinwilfong
left a comment
There was a problem hiding this comment.
Sorry, I noticed you didn't apply the changes from @MBkkt
cbb1f3e to
3007285
Compare
|
It looks like you had them in a previous commit (the one I was initially looking at) and accidentally reverted them. |
7f8b6af to
2862249
Compare
2862249 to
fb684ec
Compare
Summary: Some implementations of functions in Velox rely on boost::regex as regular expression engine. That is not recommended. RE2 is the preferred engine. boost relies on backtracking and it can lead to significantly slower performance compared to RE2. RE2 has O(n) complexity for a given string of length n, as it uses finite-state machines. This change replaces boost::regex with RE2 in the ParseDurationFunction in DateTimeFunctions.h. The implementation now uses RE2::FullMatch instead of boost::regex_search, and directly captures the value and unit strings using RE2's API, simplifying the code by eliminating the need for boost::smatch. Reviewed By: kevinwilfong Differential Revision: D84391289
fb684ec to
99cdebd
Compare
|
This pull request has been merged in ec15078. |
`RE2::FullMatch()` uses `absl::string_view` not `std::string_view`: https://github.com/google/re2/blob/61c4644171ee6b480540bf9e569cba06d9090b4b/re2/re2.h#L411 `absl::string_view` may not be an alias of `std::string_view`. In the case, the following error is reported: ```text In file included from velox/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp:18: velox/functions/prestosql/DateTimeFunctions.h:1939:10: error: no matching function for call to 'FullMatch' 1939 | if (!RE2::FullMatch( | ^~~~~~~~~~~~~~ /include/re2/re2.h:411:15: note: candidate function template not viable: no known conversion from 'std::string_view' (aka 'basic_string_view<char>') to 'absl::string_view' for 1st argument 411 | static bool FullMatch(absl::string_view text, const RE2& re, A&&... a) { | ^ ~~~~~~~~~~~~~~~~~~~~~~ ``` Related: facebookincubatorGH-15124
Summary: `RE2::FullMatch()` uses `absl::string_view` not `std::string_view`: https://github.com/google/re2/blob/61c4644171ee6b480540bf9e569cba06d9090b4b/re2/re2.h#L411 `absl::string_view` may not be an alias of `std::string_view`. In the case, the following error is reported: ```text In file included from velox/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp:18: velox/functions/prestosql/DateTimeFunctions.h:1939:10: error: no matching function for call to 'FullMatch' 1939 | if (!RE2::FullMatch( | ^~~~~~~~~~~~~~ /include/re2/re2.h:411:15: note: candidate function template not viable: no known conversion from 'std::string_view' (aka 'basic_string_view<char>') to 'absl::string_view' for 1st argument 411 | static bool FullMatch(absl::string_view text, const RE2& re, A&&... a) { | ^ ~~~~~~~~~~~~~~~~~~~~~~ ``` Old RE2 that is provided by CentOS Stream 9 doesn't accept `absl::string_view`. Old RE2 uses `re2::StringPiece` for `RE2::FullMatch()` and new RE2 provides `re2::StringPiece` as an alias of `absl::string_view`. So we can use `re2::StringPiece` for both of old and new RE2. We can drop support for old RE2 to always use `absl::string_view` but we use `re2::StringPiece` for now. It seems that RE2 will use `std::string_view` instead of `absl::string_view` eventually. For example, google/re2@2a029e2 is a commit to migrate to `std::optional` from `absl::optional`. We can revisit this after RE2 migrates to `std::string_view`. Related: GH-15124 Pull Request resolved: #15259 Reviewed By: kevinwilfong Differential Revision: D85539525 Pulled By: mbasmanova fbshipit-source-id: 1dde1c47d7a337d220488aa64b5efa3408876d1e
…ubator#15124) Summary: Pull Request resolved: facebookincubator#15124 Some implementations of functions in Velox rely on boost::regex as regular expression engine. That is not recommended. RE2 is the preferred engine. boost relies on backtracking and it can lead to significantly slower performance compared to RE2. RE2 has O(n) complexity for a given string of length n, as it uses finite-state machines. This change replaces boost::regex with RE2 in the ParseDurationFunction in DateTimeFunctions.h. The implementation now uses RE2::FullMatch instead of boost::regex_search, and directly captures the value and unit strings using RE2's API, simplifying the code by eliminating the need for boost::smatch. Reviewed By: kevinwilfong Differential Revision: D84391289 fbshipit-source-id: 4ac0f56e53392fc373d0976e029bd84859ab9874
…bator#15259) Summary: `RE2::FullMatch()` uses `absl::string_view` not `std::string_view`: https://github.com/google/re2/blob/61c4644171ee6b480540bf9e569cba06d9090b4b/re2/re2.h#L411 `absl::string_view` may not be an alias of `std::string_view`. In the case, the following error is reported: ```text In file included from velox/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp:18: velox/functions/prestosql/DateTimeFunctions.h:1939:10: error: no matching function for call to 'FullMatch' 1939 | if (!RE2::FullMatch( | ^~~~~~~~~~~~~~ /include/re2/re2.h:411:15: note: candidate function template not viable: no known conversion from 'std::string_view' (aka 'basic_string_view<char>') to 'absl::string_view' for 1st argument 411 | static bool FullMatch(absl::string_view text, const RE2& re, A&&... a) { | ^ ~~~~~~~~~~~~~~~~~~~~~~ ``` Old RE2 that is provided by CentOS Stream 9 doesn't accept `absl::string_view`. Old RE2 uses `re2::StringPiece` for `RE2::FullMatch()` and new RE2 provides `re2::StringPiece` as an alias of `absl::string_view`. So we can use `re2::StringPiece` for both of old and new RE2. We can drop support for old RE2 to always use `absl::string_view` but we use `re2::StringPiece` for now. It seems that RE2 will use `std::string_view` instead of `absl::string_view` eventually. For example, google/re2@2a029e2 is a commit to migrate to `std::optional` from `absl::optional`. We can revisit this after RE2 migrates to `std::string_view`. Related: facebookincubatorGH-15124 Pull Request resolved: facebookincubator#15259 Reviewed By: kevinwilfong Differential Revision: D85539525 Pulled By: mbasmanova fbshipit-source-id: 1dde1c47d7a337d220488aa64b5efa3408876d1e
Summary:
Some implementations of functions in Velox rely on boost::regex as regular expression engine. That is not recommended. RE2 is the preferred engine.
boost relies on backtracking and it can lead to significantly slower performance compared to RE2. RE2 has O(n) complexity for a given string of length n, as it uses finite-state machines.
This change replaces boost::regex with RE2 in the ParseDurationFunction in DateTimeFunctions.h. The implementation now uses RE2::FullMatch instead of boost::regex_search, and directly captures the value and unit strings using RE2's API, simplifying the code by eliminating the need for boost::smatch.
Differential Revision: D84391289