
feat(cudf): Add cuDF based OrderBy operator#12735

Closed
devavret wants to merge 40 commits into facebookincubator:main from devavret:cmake-upstreaming

Conversation

@devavret
Collaborator

This PR adds a cuDF based OrderBy operator and tooling to replace existing
Velox based operators. This includes:

  • CudfVector class that holds a cudf::table and is a replacement for Velox's
    RowVector when dealing with cuDF.
  • Interop code to convert between Velox and cuDF RowVectors.
  • CudfToVelox and CudfFromVelox operators that sit between the cuDF and Velox
    operators and handle the conversion of RowVectors to cudf::table and back.
  • A cuDF driver adapter that converts Velox operators to cuDF operators.
  • NVTX tooling to help with profiling.

@netlify

netlify bot commented Mar 20, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 13a4a87
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67f8c9cd683dc40008e0c62c

Collaborator

@bdice bdice left a comment


Leaving a few explanatory comments for reviewers. Thanks in advance for reviewing!

Lots of credit to @karthikeyann, @devavret, @mhaseeb123, @GregoryKimball on the cuDF side, and lots of appreciation to @oerling @pedroerp @Yuhta @kgpai @assignUser (and more!) on the Meta / Voltron side for all your assistance. We are looking forward to upstreaming more features after this initial PR lands.

Collaborator


We are pinning this to specific commits of cuDF and its dependencies to avoid breakage from any final changes in the 25.04 release. Once the RAPIDS 25.04 release is out (currently targeting April 9-10), we can remove a lot of this logic for rapids-cmake, rmm, and kvikio -- and just pin cuDF to the stable release.

Comment thread CMakeLists.txt
endif()
find_package(CUDAToolkit REQUIRED)
if(VELOX_ENABLE_CUDF)
set(VELOX_ENABLE_ARROW ON)
Collaborator


cuDF itself does not need Arrow (cuDF uses nanoarrow), but the Velox-cuDF interop requires Arrow functionality in Velox.

Comment thread scripts/setup-centos9.sh
dnf_install autoconf automake python3-devel pip libtool

-pip install cmake==3.28.3
+pip install cmake==3.30.4
Collaborator


cuDF and its dependencies require CMake 3.30.4. That CMake version shipped with a fix for finding some CUDA Toolkit components that cuDF and its dependencies use.

Collaborator


Could you add a cmake_minimum_required to the top of cudf.cmake with a comment?

@facebook-github-bot
Contributor

Hi @devavret!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

Comment thread velox/experimental/cudf/exec/CudfConversion.cpp Outdated
Comment thread velox/experimental/cudf/exec/CudfConversion.cpp Outdated
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 20, 2025
Comment thread velox/experimental/cudf/exec/CudfConversion.cpp Outdated
Comment thread velox/experimental/cudf/exec/CudfConversion.cpp Outdated
Comment thread velox/experimental/cudf/exec/CudfOrderBy.cpp Outdated

DECLARE_bool(velox_cudf_enabled);
DECLARE_string(velox_cudf_memory_resource);

Collaborator


Can you also declare `velox_cudf_debug` here? I also need to set the flag.

Collaborator Author


added in 5575205

return nullptr;
}
finished_ = noMoreInput_;
return outputTable_;
Collaborator


The output table might be a very big table, containing all the input data, which is not allowed.

@jinchengchenghh
Collaborator

Fails in the PlanNode destructor:

(gdb) bt 
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007effa62bbe73 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007effa626eb46 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007effa6258833 in __GI_abort () at abort.c:79
#4  0x00007eff20678b21 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#5  0x00007efedde3c24d in folly::exception_tracer::(anonymous namespace)::terminateHandler ()
    at /code/gluten/ep/build-velox/build/velox_ep/deps-download/folly/folly/debugging/exception_tracer/ExceptionTracer.cpp:226
#6  0x00007eff2068453c in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#7  0x00007eff20683509 in __cxa_call_terminate (ue_header=0x7effa02e3120) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#8  0x00007eff20683c8a in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=5138137972254386944, ue_header=0x7effa02e3120, context=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:685
#9  0x00007eff2de142d4 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7effa02e3120, context=context@entry=0x7fffaab195c0, frames_p=frames_p@entry=0x7fffaab196b0) at ../../../libgcc/unwind.inc:64
#10 0x00007eff2de14971 in _Unwind_RaiseException (exc=0x7effa02e3120) at ../../../libgcc/unwind.inc:136
#11 0x00007eff206847fc in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x7efedff712d8, dest=0x7efedf1f4b5e <gluten::GlutenException::~GlutenException()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:90
#12 0x00007efedf29c18f in attachCurrentThreadAsDaemonOrThrow (vm=0x7effa61cc1c0 <main_vm>, out=0x7fffaab19948) at /code/gluten/cpp/core/jni/JniCommon.h:115
#13 0x00007efedf29c969 in gluten::JniColumnarBatchIterator::~JniColumnarBatchIterator (this=0x5633d6fc5f60, __in_chrg=<optimized out>) at /code/gluten/cpp/core/jni/JniCommon.cc:98
#14 0x00007efedf29c9de in gluten::JniColumnarBatchIterator::~JniColumnarBatchIterator (this=0x5633d6fc5f60, __in_chrg=<optimized out>) at /code/gluten/cpp/core/jni/JniCommon.cc:102
#15 0x00007efedf241028 in std::default_delete<gluten::ColumnarBatchIterator>::operator() (this=0x5633d6e985f0, __ptr=0x5633d6fc5f60) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_ptr.h:95
#16 0x00007efedf23db70 in std::unique_ptr<gluten::ColumnarBatchIterator, std::default_delete<gluten::ColumnarBatchIterator> >::~unique_ptr (this=0x5633d6e985f0, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_ptr.h:396
#17 0x00007efedf24d998 in gluten::ResultIterator::~ResultIterator (this=0x5633d6e985f0, __in_chrg=<optimized out>) at /code/gluten/cpp/core/compute/ResultIterator.h:30
#18 0x00007efedf24d9b3 in std::_Destroy<gluten::ResultIterator> (__pointer=0x5633d6e985f0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:151
#19 0x00007efedf24d7f6 in std::allocator_traits<std::allocator<void> >::destroy<gluten::ResultIterator> (__p=0x5633d6e985f0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:648
#20 0x00007efedf24d45b in std::_Sp_counted_ptr_inplace<gluten::ResultIterator, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5633d6e985e0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:613
#21 0x00007efec7443381 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5633d6e985e0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:346
#22 0x00007efec7449765 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5633d6cfe300, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1071
#23 0x00007efec7470438 in std::__shared_ptr<gluten::ResultIterator, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5633d6cfe2f8, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1524
#24 0x00007efec7470454 in std::shared_ptr<gluten::ResultIterator>::~shared_ptr (this=0x5633d6cfe2f8, __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:175
#25 0x00007efec788fcc0 in gluten::ValueStreamNode::~ValueStreamNode (this=0x5633d6cfe2c0, __in_chrg=<optimized out>) at /code/gluten/cpp/velox/operators/plannodes/RowVectorStream.h:111
#26 0x00007efec7892beb in std::_Destroy<gluten::ValueStreamNode> (__pointer=0x5633d6cfe2c0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:151
#27 0x00007efec7892554 in std::allocator_traits<std::allocator<void> >::destroy<gluten::ValueStreamNode> (__p=0x5633d6cfe2c0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:648
#28 0x00007efec7890229 in std::_Sp_counted_ptr_inplace<gluten::ValueStreamNode, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5633d6cfe2b0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:613
#29 0x00007efec7443381 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5633d6cfe2b0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:346
#30 0x00007efec7449765 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5633d6cfdcb8, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1071
#31 0x00007efec746f33c in std::__shared_ptr<facebook::velox::core::PlanNode const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5633d6cfdcb0, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1524
#32 0x00007efec746f358 in std::shared_ptr<facebook::velox::core::PlanNode const>::~shared_ptr (this=0x5633d6cfdcb0, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:175
#33 0x00007efec74aff4f in std::_Destroy<std::shared_ptr<facebook::velox::core::PlanNode const> > (__pointer=0x5633d6cfdcb0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:151
#34 0x00007efec74ab512 in std::_Destroy_aux<false>::__destroy<std::shared_ptr<facebook::velox::core::PlanNode const>*> (__first=0x5633d6cfdcb0, __last=0x5633d6cfdcc0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:163
#35 0x00007efec74a6dbe in std::_Destroy<std::shared_ptr<facebook::velox::core::PlanNode const>*> (__first=0x5633d6cfdcb0, __last=0x5633d6cfdcc0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:196
#36 0x00007efec74a1bb9 in std::_Destroy<std::shared_ptr<facebook::velox::core::PlanNode const>*, std::shared_ptr<facebook::velox::core::PlanNode const> > (__first=0x5633d6cfdcb0, __last=0x5633d6cfdcc0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:850
#37 0x00007efec749d8fb in std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > >::~vector (this=0x5633d6fc5ef0, 
    __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_vector.h:730
#38 0x00007efec7a34f6a in facebook::velox::core::OrderByNode::~OrderByNode (this=0x5633d6fc5e90, __in_chrg=<optimized out>) at /code/gluten/ep/build-velox/build/velox_ep/./velox/core/PlanNode.h:2002
#39 0x00007efec7892c83 in std::_Destroy<facebook::velox::core::OrderByNode> (__pointer=0x5633d6fc5e90) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:151
#40 0x00007efec789263c in std::allocator_traits<std::allocator<void> >::destroy<facebook::velox::core::OrderByNode> (__p=0x5633d6fc5e90) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:648
#41 0x00007efec78907c1 in std::_Sp_counted_ptr_inplace<facebook::velox::core::OrderByNode, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5633d6fc5e80)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:613
#42 0x00007efec7443381 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5633d6fc5e80) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:346
#43 0x00007efec7449765 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5633d6cf60a8, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1071
#44 0x00007efec746f33c in std::__shared_ptr<facebook::velox::core::PlanNode const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5633d6cf60a0, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1524
#45 0x00007efec746f358 in std::shared_ptr<facebook::velox::core::PlanNode const>::~shared_ptr (this=0x5633d6cf60a0, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:175
#46 0x00007efec74aff4f in std::_Destroy<std::shared_ptr<facebook::velox::core::PlanNode const> > (__pointer=0x5633d6cf60a0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:151
--Type <RET> for more, q to quit, c to continue without paging--
#47 0x00007efec74ab512 in std::_Destroy_aux<false>::__destroy<std::shared_ptr<facebook::velox::core::PlanNode const>*> (__first=0x5633d6cf60a0, __last=0x5633d6cf60b0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:163
#48 0x00007efec74a6dbe in std::_Destroy<std::shared_ptr<facebook::velox::core::PlanNode const>*> (__first=0x5633d6cf6090, __last=0x5633d6cf60b0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:196
#49 0x00007efec74a1bb9 in std::_Destroy<std::shared_ptr<facebook::velox::core::PlanNode const>*, std::shared_ptr<facebook::velox::core::PlanNode const> > (__first=0x5633d6cf6090, __last=0x5633d6cf60b0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:850
#50 0x00007efec749d8fb in std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > >::~vector (this=0x7effa390c250, 
    __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_vector.h:730
#51 0x00007efecf2ae92d in std::_Destroy<std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > > > (__pointer=0x7effa390c250)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:151
#52 0x00007efecf2ae912 in std::allocator_traits<std::allocator<void> >::destroy<std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > > > (__p=0x7effa390c250) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:648
#53 0x00007efecf2ae807 in std::_Sp_counted_ptr_inplace<std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > >, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7effa390c240) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:613
#54 0x00007efec7443381 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7effa390c240) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:346
#55 0x00007efec7449765 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7effa390d998, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1071
#56 0x00007efecf2a750a in std::__shared_ptr<std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > >, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7effa390d990, __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1524
#57 0x00007efecf2a7526 in std::shared_ptr<std::vector<std::shared_ptr<facebook::velox::core::PlanNode const>, std::allocator<std::shared_ptr<facebook::velox::core::PlanNode const> > > >::~shared_ptr (
    this=0x7effa390d990, __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:175
#58 0x00007efecf2a7682 in facebook::velox::cudf_velox::cudfDriverAdapter::~cudfDriverAdapter (this=0x7effa390d980, __in_chrg=<optimized out>)
    at /code/gluten/ep/build-velox/build/velox_ep/velox/experimental/cudf/exec/ToCudf.cpp:264
#59 0x00007efecf2acf3c in std::_Function_base::_Base_manager<facebook::velox::cudf_velox::cudfDriverAdapter>::_M_destroy (__victim=...) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/std_function.h:175
#60 0x00007efecf2ac098 in std::_Function_base::_Base_manager<facebook::velox::cudf_velox::cudfDriverAdapter>::_M_manager (__dest=..., __source=..., __op=std::__destroy_functor)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/std_function.h:203
#61 0x00007efecf2aa73a in std::_Function_handler<void (facebook::velox::core::PlanFragment const&), facebook::velox::cudf_velox::cudfDriverAdapter>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) (__dest=..., __source=..., __op=std::__destroy_functor) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/std_function.h:282
#62 0x00007efec7443b7b in std::_Function_base::~_Function_base (this=0x7effa38ec590, __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/std_function.h:244
#63 0x00007efecea8039a in std::function<void (facebook::velox::core::PlanFragment const&)>::~function() (this=0x7effa38ec590, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/std_function.h:334
#64 0x00007efecea82728 in facebook::velox::exec::DriverAdapter::~DriverAdapter (this=0x7effa38ec570, __in_chrg=<optimized out>) at /code/gluten/ep/build-velox/build/velox_ep/./velox/exec/Driver.h:611
#65 0x00007efecea852a7 in std::_Destroy<facebook::velox::exec::DriverAdapter> (__pointer=0x7effa38ec570) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:151
#66 0x00007efecea8457b in std::_Destroy_aux<false>::__destroy<facebook::velox::exec::DriverAdapter*> (__first=0x7effa38ec570, __last=0x7effa38ec5d0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:163
#67 0x00007efecea8277a in std::_Destroy<facebook::velox::exec::DriverAdapter*> (__first=0x7effa38ec570, __last=0x7effa38ec5d0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:196
#68 0x00007efecea80635 in std::_Destroy<facebook::velox::exec::DriverAdapter*, facebook::velox::exec::DriverAdapter> (__first=0x7effa38ec570, __last=0x7effa38ec5d0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:850
#69 0x00007efecea856a7 in std::vector<facebook::velox::exec::DriverAdapter, std::allocator<facebook::velox::exec::DriverAdapter> >::~vector (
    this=0x7efed7ffba10 <facebook::velox::exec::DriverFactory::adapters>, __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_vector.h:730
#70 0x00007effa62712dd in __run_exit_handlers (status=0, listp=0x7effa6429838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:126
#71 0x00007effa6271430 in __GI_exit (status=<optimized out>) at exit.c:156
#72 0x00007effa62595d7 in __libc_start_call_main (main=main@entry=0x5633b3200720, argc=argc@entry=68, argv=argv@entry=0x7fffaab1a408) at ../sysdeps/nptl/libc_start_call_main.h:74
#73 0x00007effa6259680 in __libc_start_main_impl (main=0x5633b3200720, argc=68, argv=0x7fffaab1a408, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffaab1a3f8)
    at ../csu/libc-start.c:389

bool operator()(const exec::DriverFactory& factory, exec::Driver& driver) {
auto state = CompileState(factory, driver, *planNodes_);
// Stored planNodes_ from inspect.
auto res = state.compile();
Collaborator


Calling `planNodes_->clear();` here can solve this issue: #12735 (comment)

Collaborator Author


I don't believe this is quite right. In plans involving multiple pipelines, compile is called multiple times, once for each pipeline, so we need these stored plan nodes in all subsequent calls. Would adding a separate clear() method to the cuDF driver adapter help? If you know when the task is finished, you can manually clear out the adapter.

Collaborator


Do you mean the Task parallel execution mode? I read the Wave DriverAdapter (https://github.com/facebookincubator/velox/blob/main/velox/experimental/wave/exec/ToWave.cpp#L251), and it does not need to store the plan nodes. Is there any difference?

Collaborator


Gluten uses the single-thread task mode, so it's OK to clear it here. And if we have multiple drivers, can we clear the planNodes after createAndStartDrivers in Task::start?

Is there any difference between the planNodes stored in DriverFactory and CompileState?

struct DriverFactory {
  std::vector<std::shared_ptr<const core::PlanNode>> planNodes;

Collaborator


The adapters are saved in a static area; if we call clear after the task is finished, we don't know which driver adapter is attached to the task. Gluten has parallel tasks on a single machine.

Collaborator Author


Do you mean the Task parallel execution mode?

I meant plans that, say, have a join in them. Those would need to be run through compile at least twice, once for each branch.

Collaborator Author

@devavret devavret Apr 7, 2025


The adapters are saved in a static area; if we call clear after the task is finished, we don't know which driver adapter is attached to the task. Gluten has parallel tasks on a single machine.

You wouldn't need to wait until a task is finished. You only need to wait until operator replacements have been made in it.

To be clear, I am not standing my ground here. I just think we need a different solution to the one suggested.

Collaborator Author


Is there any difference between the planNodes stored in DriverFactory and CompileState?

Yes, the CompileState stores planNodes from the whole task while planNodes stored in DriverFactory are only from the current pipeline. The latter sometimes excludes planNodes which contribute operators to multiple pipelines like partition and hash join.

But I observed now that if the required planNode cannot be found in DriverFactory::planNodes then it's usually found in DriverFactory::consumerNode. I've made this change in 5575205 and it seems to work fine both for this PR and for our internal fork with all tpch operators replaced.

@karthikeyann since you originally wrote this piece, can you verify this commit?

Collaborator


It looks good. I will verify this with all TPC-H queries (with partition and hash join).

Collaborator


I verified with the TPC-H benchmarks. It works.

tableViews, stream, cudf::get_current_device_resource_ref());
}

std::unique_ptr<cudf::table> getConcatenatedTable(
Collaborator


What if the number of concatenated table rows is beyond the range? We have a case where the accumulated rows in OrderBy exceed the vector_size_t range: #10848

Collaborator

@karthikeyann karthikeyann Apr 7, 2025


@jinchengchenghh how does Velox handle inputs beyond the vector_size_t range (all inputs together) for OrderBy?

Collaborator


Velox stores the input in a RowContainer and stores the sorted row pointers in a std::vector, so I assume the number of input rows it can sort is bounded by size_t. It extracts the output rows with the vector_size_t data type, producing several batches whose sizes satisfy the output-rows config.

@devavret
Collaborator Author

devavret commented Apr 8, 2025

@jinchengchenghh Unfortunately, my attempts at simple fixes for the decimal issue have been unsuccessful. I'm going to have to postpone decimal support to a follow-up PR. The output size limit problem is also on the list of work for the follow-up rather than here.

The planNode lifetime issue should be fixed but I have no way of verifying that.

Can you take another look to see if there are any showstopping bugs in this PR?

devavret and others added 3 commits April 8, 2025 20:01
@GregoryKimball
Collaborator

@Yuhta would you please share your review for this work? Are there CI or other blockers at this stage? (+ @pedroerp)

@Yuhta Yuhta added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Apr 14, 2025
@facebook-github-bot
Contributor

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pedroerp merged this pull request in 0cba113.

@jinchengchenghh
Collaborator

Thanks, I will verify it. @devavret

zhanglistar pushed a commit to bigo-sg/velox that referenced this pull request Apr 22, 2025
Summary:
This PR adds a cuDF based OrderBy operator and tooling to replace existing
Velox based operators. This includes:
- CudfVector class that holds a cudf::table and is a replacement for Velox's
  RowVector when dealing with cuDF.
- Interop code to convert between Velox and cuDF RowVectors.
- CudfToVelox and CudfFromVelox operators that sit between the cuDF and Velox
  operators and handle the conversion of RowVectors to cudf::table and back.
- A cuDF driver adapter that converts Velox operators to cuDF operators.
- Nvtx tooling to help with profiling

Pull Request resolved: facebookincubator#12735

Reviewed By: Yuhta

Differential Revision: D73003714

Pulled By: pedroerp

fbshipit-source-id: 5ac1e3db2d3754528802f51ded42b43e7250f191
@jinchengchenghh
Collaborator

Thanks for your active updates. I have verified in Gluten; all the issues are resolved. @devavret

Operators before adapting for cuDF: count [2]
  Operator: ID 0: ValueStream[0] 0
  Operator: ID 1: OrderBy[1] 1
Operators after adapting for cuDF: count [4]
  Operator: ID 0: ValueStream[0] 0
  Operator: ID 1: CudfFromVelox[1-from-velox] 1
  Operator: ID 2: CudfOrderBy[1] 2
  Operator: ID 3: CudfToVelox[1-to-velox] 3

