refactor: Replace boost regex with re2#15134
refactor: Replace boost regex with re2#15134MBkkt wants to merge 8 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
3279d93 to
3d1cb00
Compare
| } | ||
| // For nested case, e.g., for "{"my": {"hello": 10}}", | ||
| // "$.my" will return an object type. | ||
| case simdjson::ondemand::json_type::object: |
There was a problem hiding this comment.
cc: @philo-he Would you like to take a look at the change to get_json_object? Thanks.
There was a problem hiding this comment.
Just in case, change here, it was same code in two cases
There was a problem hiding this comment.
The changes make sense. Thanks.
3d1cb00 to
1d6f8e8
Compare
|
@kevinwilfong Can you review this changes, please? |
1d6f8e8 to
a365fdc
Compare
|
I will let Meta folks chime in as well. |
kevinwilfong
left a comment
There was a problem hiding this comment.
Thanks looks good to me
| static const LazyRE2 kDurationRegex{ | ||
| R"(^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$)"}; | ||
| // TODO: Remove re2::StringPiece != std::string_view hacks. | ||
| // It's needed because system is old, update please. |
There was a problem hiding this comment.
Could you be more specific either here in the the PR comments about what "the system" is, are you referring to CI?
There was a problem hiding this comment.
I updated comment.
Yes, in some system in CI, system abseil was build with absl::string_view != std::string_view and re2 was build with re2::StringPiece != absl::string_view
It's old unsupported version of abseil, because it's true only for C++14, but abseil drops C++14 support already.
|
@MBkkt it looks like there are conflicts, could you rebase? |
a365fdc to
48fc86b
Compare
|
@kevinwilfong rebased |
|
@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this in D85479675. |
czentgr
left a comment
There was a problem hiding this comment.
Looks good.
I don't see boost API being replaced though. So we optimized some re2 usage with fast_float and removed boost::regex from linking which was (apparently) not used anyway.
|
@czentgr Yes, but It's because this PR exists too long. |
| static const LazyRE2 kDurationRegex{ | ||
| R"(^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$)"}; |
There was a problem hiding this comment.
Hi @MBkkt, it looks like our internal build is a little stricter than what runs here.
-Wmissing-field-initializers is showing that the options_ field isn't being initialized here, combined with -Werror this is causing the build to fail and I'm unable to merge
could you either change this to invoke the constructor rather than the list initializer {} -> () or add an initializer for options_ .options_={}
There was a problem hiding this comment.
- Can you change in future (in a separate PR) to make github CI same as internal in terms of warnings?
- I'm not sure it will help, but lets try this: (same as
.options_ =, but without explicit field mention) f09e68f
// Silence warnings about missing initializers for members of LazyRE2.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
#endif
// Helper for writing global or static RE2s safely.
// Write
// static LazyRE2 re = {".*"};
// and then use *re instead of writing
// static RE2 re(".*");
// The former is more careful about multithreaded
// situations than the latter.
//
// N.B. This class never deletes the RE2 object that
// it constructs: that's a feature, so that it can be used
// for global and function static variables.
class LazyRE2 {
private:
struct NoArg {};
public:
typedef RE2 element_type; // support std::pointer_traits
// Constructor omitted to preserve braced initialization in C++98.
// Pretend to be a pointer to Type (never NULL due to on-demand creation):
RE2& operator*() const { return *get(); }
RE2* operator->() const { return get(); }
// Named accessor/initializer:
RE2* get() const {
absl::call_once(once_, &LazyRE2::Init, this);
return ptr_;
}
// All data fields must be public to support {"foo"} initialization.
const char* pattern_;
RE2::CannedOptions options_;
NoArg barrier_against_excess_initializers_;
mutable RE2* ptr_;
mutable absl::once_flag once_;
private:
static void Init(const LazyRE2* lazy_re2) {
lazy_re2->ptr_ = new RE2(lazy_re2->pattern_, lazy_re2->options_);
}
void operator=(const LazyRE2&); // disallowed
};There was a problem hiding this comment.
It looks like that doesn't work (it complains about barrier_against_excess_initializers_ in that case)
Explicitly listing the fields does
{.pattern_ = R"(^\s*(\d+(?:.\d+)?)\s*([a-zA-Z]+)\s*$)", .options_ = {}}
There was a problem hiding this comment.
155b0be something like this maybe can help with internal CI warning?
There was a problem hiding this comment.
While that does work internally, unfortunately it looks like it breaks the Mac build here in OS
There was a problem hiding this comment.
Thanks! Merging, sorry for the slog
51d004e to
f09e68f
Compare
f09e68f to
155b0be
Compare
f516595 to
736cf47
Compare
|
@kevinwilfong merged this pull request in 909429d. |
| #pragma once | ||
|
|
||
| #define XXH_INLINE_ALL | ||
| #include <fast_float/fast_float.h> |
There was a problem hiding this comment.
Do you need to use velox_include_directories to include the fast-float header? If the environment does not have fast-float, the velox build will fail, I'm not sure why Velox does not build it from source. @MBkkt CC @czentgr
https://github.com/IBM/velox/actions/runs/18915929559/job/53999183287
FAILED: velox/buffer/CMakeFiles/velox.dir/__/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp.o
/usr/bin/ccache /opt/rh/devtoolset-11/root/usr/bin/c++ -DAWS_MQTT_WITH_WEBSOCKETS -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=11 -DAWS_SDK_VERSION_PATCH=285 -DAWS_USE_EPOLL -DAZ_RTTI -DCURL_STATICLIB -DGEOS_INLINE -DGFLAGS_IS_A_DLL=0 -DNDEBUG -DSIMDJSON_EXPERIMENTAL_ALLOW_INCOMPLETE_JSON -DSIMDJSON_THREADS_ENABLED=1 -DUSE_UNSTABLE_GEOS_CPP_API -DVELOX_DISABLE_GOOGLETEST -DVELOX_ENABLE_ABFS -DVELOX_ENABLE_COMPRESSION_LZ4 -DVELOX_ENABLE_GCS -DVELOX_ENABLE_GEO -DVELOX_ENABLE_HDFS -DVELOX_ENABLE_PARQUET -DVELOX_ENABLE_S3 -I/work/. -I/work/velox/external/xxhash -I/work/_build/release -I/work/velox/tpch/gen/dbgen/include -I/work/velox/tpcds/gen/dsdgen/include -I/work/_build/release/_deps/simdjson-src/include -isystem /work/gluten/dev/vcpkg/vcpkg_installed/x64-linux-avx/include -isystem /work/velox -isystem /work/velox/external -isystem /work/gluten/dev/vcpkg/vcpkg_installed/x64-linux-avx/lib/pkgconfig/../../include -isystem /work/gluten/dev/vcpkg/vcpkg_installed/x64-linux-avx/include/geos -Wno-error=stringop-overflow -Wno-error=cpp -Wno-missing-field-initializers -Wno-unknown-warning-option -mavx2 -mfma -mavx -mf16c -mlzcnt -mbmi2 -D USE_VELOX_COMMON_BASE -D HAS_UNCAUGHT_EXCEPTIONS -DFOLLY_CFG_NO_COROUTINES -Wall -Wextra -Wno-unused -Wno-unused-parameter -Wno-sign-compare -Wno-ignored-qualifiers -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-comment -Wno-int-in-bool-context -Wno-redundant-move -Wno-array-bounds -Wno-maybe-uninitialized -Wno-unused-result -Wno-format-overflow -Wno-strict-aliasing -Werror -O3 -DNDEBUG -std=gnu++20 -fPIC -fdiagnostics-color=always -ffp-contract=off -DS2N_KYBER512R3_AVX2_BMI2 -DS2N_STACKTRACE -DS2N_CPUID_AVAILABLE -DS2N_FEATURES_AVAILABLE -fPIC -DS2N_FALL_THROUGH_SUPPORTED -DS2N___RESTRICT__SUPPORTED -DS2N_MADVISE_SUPPORTED -DS2N_CLONE_SUPPORTED -MD -MT velox/buffer/CMakeFiles/velox.dir/__/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp.o -MF velox/buffer/CMakeFiles/velox.dir/__/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp.o.d -o velox/buffer/CMakeFiles/velox.dir/__/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp.o -c /work/velox/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp
In file included from /work/velox/functions/prestosql/registration/DateTimeFunctionsRegistration.cpp:18:
/work/./velox/functions/prestosql/DateTimeFunctions.h:19:10: fatal error: fast_float/fast_float.h: No such file or directory
19 | #include <fast_float/fast_float.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
There was a problem hiding this comment.
No I shouldn't, because library target already linked in cmake
There was a problem hiding this comment.
There was a problem hiding this comment.
I think you build folly in some specific way, because fast_float link provided from Folly::folly now I think.
Lets do this explicit
There was a problem hiding this comment.
IBM installs the folly by vcpkg, so the environment already has folly, but vcpkg does not support install fast-float version 8.0.2 apache/gluten#10977, so fast-float does not exist in the environment. It does not call setup-script.sh
There was a problem hiding this comment.
so fast-float does not exist in the environment. It does not call setup-script.sh
So you don't have fast_float in vcpkg folly or how it works?
but vcpkg does not support install fast-float version 8.0.2
Hmm, strange, on their site they're claim support 8.0.2

Maybe I'm missing something because I'm not working with C++ package managers and system libraries.
As an alternative solution we probably can update fast_float version? I don't think it's bad, but it's also affects folly
There was a problem hiding this comment.
Maybe because Gluten uses an old vcpkg version? I'm not sure for that.
But I think current #15322 can help, so we don't need to update fast_float version and care about Gluten vcpkg fast-float version.
There was a problem hiding this comment.
Ok, lets try current approach, if it doesn't help, please ping me. And we will try something else
There was a problem hiding this comment.
Thanks for your in time fix!
re2 works better, we will also decrease binary size and count of dependencies
First commit inspired by this PR #15124