Add quantile aggregate. #636

ienkovich · 2023-08-16T21:32:48Z

This patch introduces QUANTILE aggregate to compute precise quantiles.

Quantile supports multiple interpolations. This is to match pandas' quantile options for Modin.

Quantile implementation is similar to approx quantile implementation except it collects all values and computes the precise quantile value. Quantile objects simply collect all values into vectors of raw buffers, memory is allocated via RowSetMemoryOwner in 1-64KB chunks on demand. On reduction, all buffers are collected together. At the finalization, a special chunked iterator is used to call std::nth_element to get values affecting the quantile result in proper places. It might seem from the code that we don't finalize quantiles until data is fetched from the ResutSet but in fact, it is implicitly finalized on rowCount call when we put the ResultSet into the registry (that's why I added a new timer there). In the future, I want to add an explicit finalization stage to be used for quantiles and topK and also avoid TargetValue materialization on rowCount.

For the quantile result type, I decided to keep the original argument type unless midpoint or linear interpolation is used for integer values. In the latter case, the result is fp64.

In the builder, I don't expand the string parser to support interpolations, and the quantile method is supposed to be used for non-default interpolations.

ienkovich · 2023-08-22T17:47:27Z

@kurapov-peter @alexbaden This finally passes Windows tests and is ready for review. I left most of the quantile runtime in the static code to avoid all those missing std:: symbols errors on for LLVM JIT on Windows. It shouldn't affect performance much and for GPU it is not available anyway.

kurapov-peter · 2023-08-23T10:34:07Z

omniscidb/QueryEngine/CgenState.cpp

-      maybeCloneFunctionRecursive(call.getCalledFunction(), is_l0);
+      // Ignore indirect calls (e.g. virtual function calls).
+      if (call.getCalledFunction()) {
+        maybeCloneFunctionRecursive(call.getCalledFunction(), is_l0);


Having this always called was a nice way to catch incorrect signatures or missing functions. I wonder if we could use something like isIndirectCall instead?

From LLVM docs: getCalledFunction Returns the function called, or null if this is an indirect function invocation or the function signature does not match the call signature.
So it's mostly the same as isIndirectCall but will also avoid SEGFAULT on wrong LLVM IR causing runtime exception instead later on LLVM check. Wrong LLVM appears from time to time in working branches when new features are added and having a proper error with a description is much more preferable.

We have a check in the maybeCloneFunctionRecursive that catches exactly that the function is not present in the module, but expected to be. Adding the if masks the problem, so we will detect it only on IR validation (worst case, after linking). I don't see how it is more preferable.

This is not the case. maybeCloneFunctionRecursive is not an IR integrity checker, it gets a declaration and checks if we have a definition of the function in addition to the declaration and copies this definition. At least for CPU, function definitions are optional in the runtime module (and we don't have them for quantile *_impl methods).

Introduced if-statement checks that we have a declaration for a call to be passed to maybeCloneFunctionRecursive. Otherwise, we would simply get a SEGFAULT. It would skip both indirect and incorrect calls, but it is OK because LLVM IR checkers would issue nice verbose errors about wrong calls, we don't need to construct our own checkers.

It is. Function definitions are optional, yes. What is not optional is the presence of the LLVM function in the module that we clone from. It can be a declaration as in your *_impl example. A missing declaration or implementation is an internal error, a bug, so the function should fail on assertion (SEGFAULT). This is precisely what happens since the first thing maybeCloneFunctionRecursive does is check if the function is null. It does not have a nice description, but that's easily fixable.

Integrity checking is not enough. Later stages of code generation do not distinguish between a declaration of a *_impl function (without definition) we should have provided and an intrinsic declaration provided by a GPU driver. Even though the IR is "correct", PTX/SPIRV generation will fail. This leads to CPU fallback, masking a real bug. Cloning a function from the runtime module should not silently ignore missing functions.

I don't see why you want to duplicate such checks. If a function is not present then you won't be able to emit call instruction. And existence of declarations for runtime module code is checked on runtime module compilation. And the problem with a function declaration not provided by GPU driver cannot be solved in this code anyway.

I'm OK with replacing the check with isIndirectCall though.

The reason is simply that I had such cases with GPU (both L0 and CUDA). The functions missing are not those that we emit a call to, but those that are copied from the runtime module recursively. It doesn't solve the problem, but not having an assertion failure for those functions that are not present makes it even worse.

omniscidb/QueryEngine/GroupByRuntime.cpp

omniscidb/QueryEngine/RowFuncBuilder.cpp

omniscidb/ResultSet/RowSetMemoryOwner.cpp

omniscidb/Shared/quantile.h

kurapov-peter · 2023-08-23T11:25:03Z

omniscidb/Shared/quantile.h

+
+    size_t res = cur_idx_;
+    for (size_t i = 0; i < chunks_.size() - 1; ++i) {
+      res += chunks_[i].size;


The size does not match the number of pushed elements here, right? Looks like vector's capacity.

Yes, but all chunks except the last one have to be full.

Signed-off-by: ienkovich <[email protected]>

ienkovich requested review from alexbaden and kurapov-peter August 16, 2023 21:32

ienkovich force-pushed the ienkovich/quntile-3 branch from e794309 to d70b2e5 Compare August 17, 2023 17:52

kurapov-peter reviewed Aug 23, 2023

View reviewed changes

ienkovich added 3 commits August 23, 2023 17:27

Add quantile aggregate.

52ee281

Signed-off-by: ienkovich <[email protected]>

Fix Windows build.

6b12a48

Signed-off-by: ienkovich <[email protected]>

Move quantile runtime to static code.

1e39f96

Signed-off-by: ienkovich <[email protected]>

ienkovich force-pushed the ienkovich/quntile-3 branch from b820de4 to d9dcf5b Compare August 23, 2023 22:49

ienkovich requested a review from kurapov-peter August 23, 2023 22:50

Fix review comments.

e0abac4

Signed-off-by: ienkovich <[email protected]>

ienkovich force-pushed the ienkovich/quntile-3 branch from d9dcf5b to e0abac4 Compare August 24, 2023 16:11

kurapov-peter approved these changes Aug 24, 2023

View reviewed changes

ienkovich merged commit c603402 into main Aug 24, 2023

ienkovich deleted the ienkovich/quntile-3 branch August 24, 2023 18:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add quantile aggregate. #636

Add quantile aggregate. #636

ienkovich commented Aug 16, 2023

ienkovich commented Aug 22, 2023

kurapov-peter Aug 23, 2023

ienkovich Aug 23, 2023

kurapov-peter Aug 23, 2023

ienkovich Aug 23, 2023

kurapov-peter Aug 24, 2023

ienkovich Aug 24, 2023

kurapov-peter Aug 24, 2023

kurapov-peter Aug 23, 2023

ienkovich Aug 23, 2023

Add quantile aggregate. #636

Add quantile aggregate. #636

Conversation

ienkovich commented Aug 16, 2023

ienkovich commented Aug 22, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment