Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an error in return buffer values on UDF #505

Merged
merged 5 commits into from
Oct 2, 2022

Conversation

guilhermeleobas
Copy link
Contributor

@guilhermeleobas guilhermeleobas commented Jul 15, 2022

Currently, RBC uses the Numba calling convention and bet that LLVM can optmize the code to the point the result IR conforms to the HeavyDB calling convention. This pull request is the first patch to fix this, so we don't end up with unwanted unreachable() instructions.

Calling convention

Given the function fn declared below which returns a copy of the input array. Numba will generate an LLVM IR which stores the input array pointer into the return pointer.

def fn(a: Array) -> Array:
    ret = a
    return ret
define i32 @fn__iner_cpu_0({ i64*, i64, i8 }** noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture %excinfo, { i64*, i64, i8 }* %arg.a) {
entry:
  br label %B0

B0:                                               ; preds = %entry
  store { i64*, i64, i8 }* %arg.a, { i64*, i64, i8 }** %retptr, align 8
  ret i32 0
}

By the Numba calling convention

  1. Return type is passed as the first argument to the function (%retptr)
  2. Second argument is a pointer to an exception info struct. Basically, it holds if the function raised an exception or not. Since we don't support exceptions, this struct is not used and deleted by DCE.
  3. Third argument is a pointer to the input array.

RBC also generates a wrapper function that conforms with the HeavyDB calling convention:

  • If the UDF returns a scalar, its signature is udf(arg1, ..., argN) -> scalar
  • If its a struct, then the signature is udf(retptr*, arg1, ..., argN) -> void

The problem with the wrapper we generate is that we only copy the return pointer from the inner function, whereas it should be deep copied.

define void @"fn__wrapper_cpu_0"({i64*, i64, i8}* %".1", {i64*, i64, i8}* %".2"){
entry:
  %.4 = alloca { i64*, i64, i8 }*, align 8
  store { i64*, i64, i8 }* null, { i64*, i64, i8 }** %.4, align 8
  %excinfo = alloca { i8*, i32, i8* }*, align 8
  store { i8*, i32, i8* }* null, { i8*, i32, i8* }** %excinfo, align 8
  %.8 = call i32 @fn__inner_cpu_0({ i64*, i64, i8 }** %.4, { i8*, i32, i8* }** %excinfo, i64 %.2)
  %.9 = load { i8*, i32, i8* }*, { i8*, i32, i8* }** %excinfo, align 8
  %.10 = icmp eq i32 %.8, 0
  %.11 = icmp eq i32 %.8, -2
  %.14 = or i1 %.10, %.11
  %.15 = xor i1 %.14, true
  %.16 = icmp sge i32 %.8, 1
  %.18 = load { i64*, i64, i8 }*, { i64*, i64, i8 }** %.4, align 8
  br i1 %.15, label %entry.if, label %entry.endif, !prof !0

entry.if:                                         ; preds = %entry
  ret void

entry.endif:                                      ; preds = %entry
  %.21 = load { i64*, i64, i8 }, { i64*, i64, i8 }* %.18, align 8
  ; copies the return pointer from inner into the retptr (%.1)
  store { i64*, i64, i8 } %.21, { i64*, i64, i8 }* %.1, align 8
  ret void
}

This PR introduces a new method that Buffer subclasses can overload. This method is responsible for generating LLVM code that copies each buffer element into the return pointer.

@guilhermeleobas guilhermeleobas added the enhancement New feature or request label Jul 15, 2022
@guilhermeleobas guilhermeleobas self-assigned this Jul 15, 2022
@guilhermeleobas guilhermeleobas marked this pull request as ready for review July 15, 2022 18:25
Copy link
Contributor

@brenocfg brenocfg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested a few readability/QoL changes, otherwise LGTM!

Comment on lines 48 to 50
src = builder.extract_value(builder.load(val), 0)
element_count = builder.extract_value(builder.load(val), 1)
is_null = builder.extract_value(builder.load(val), 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of unrelated to this PR, but should we start naming our IR instructions? I think it makes the IR a lot more readable during debugging.

For instance, these could be:

src = builder.extract_value(builder.load(val, "array_struct"), 0, "array_buff_ptr")
element_count = builder.extract_value(builder.load(val, "array_struct"), 1, "array_size")
is_null = builder.extract_value(builder.load(val, "array_struct"), 2, "array_is_null")

It would also help separate what is rbc-generated IR from numba-generated IR. I think as long as the instruction supports it, we should name it to reflect its counterpart in the code, so that the IR is close to as readable as the code that generates it.

Not only is it a bonus for readability, but I think it helps a lot in creating an "index" for generated IR instructions. E.g., if you're trying to look for which code generated some IR, you can CTRL+F for its name and usually find the corresponding builder call that emitted it.

with otherwise:
# we can't just copy the pointer here because return buffers need
# to have their own memory, as input buffers are freed upon returning
ptr = memalloc(context, builder, ptr_type, element_count, element_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we discussed this before but I can't quite remember the answer. Who takes care of this memory once the UDF returns? Is it deallocated by heavydb's Array<T> destructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is deallocated by heavydb

@guilhermeleobas
Copy link
Contributor Author

I was wondering if the changes introduced in this PR isn't the responsibility of heavydb. Because I think the same errors will happen in C++ UDFs as well.

@pearu pearu force-pushed the guilhermeleobas/fix-ret-val branch from 40ce56c to 1338baf Compare July 30, 2022 19:42
Copy link
Contributor

@pearu pearu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @guilhermeleobas! Overall, this looks good.

I have a couple of nits and a concern with element_count value being negative for non-fixed-length TextEncodingNone cases (assuming that the doc string is correct). Could you add a test for this case as see what happens and if there is need to use strlen to recompute the element_count?


zero, one, two = int32_t(0), int32_t(1), int32_t(2)
ptr = memalloc(context, builder, ptr_type, element_count, element_size)
cgutils.raw_memcpy(builder, ptr, src, element_count, element_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to
https://github.com/xnd-project/rbc/blob/542d0f61d372fb333a614553f28c09b00129a568/rbc/heavydb/text_encoding_none.py#L59
element_count contains a sensible value only for fixed-length TextEncodingNone value, otherwise memcpy with negative size (read: huge positive size) will likely lead to invalid memory access.

Is it possible to add a test that uses variable length TextEncodingNone inputs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can have a text encoding none with negative size. Size should reflect the number of characters on ptr_. If such case happen, it is a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the docstring is wrong. At least on HeavyDB, the size member always represents the length of the string.

with otherwise:
# we can't just copy the pointer here because return buffers need
# to have their own memory, as input buffers are freed upon returning
ptr = memalloc(context, builder, ptr_type, element_count, element_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some protection for the case where element_count is negative (as is in the case of var-length TextEncodingNone)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above. I think the docstring is wrong. element_count should never be negative.


struct_load = builder.load(val)
src = builder.extract_value(struct_load, 0, name='text_buff_ptr')
element_count = builder.extract_value(struct_load, 1, name='text_size')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If element_count is negative, we could use strlen to compute the actual element_count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above. element_count should never be negative.

@guilhermeleobas guilhermeleobas force-pushed the guilhermeleobas/fix-ret-val branch from 8c9e388 to 58bfe42 Compare August 3, 2022 03:18
@guilhermeleobas guilhermeleobas requested a review from pearu August 25, 2022 12:42
Copy link
Contributor

@pearu pearu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks, @guilhermeleobas !

@pearu pearu merged commit dae7760 into heavyai:main Oct 2, 2022
@guilhermeleobas guilhermeleobas deleted the guilhermeleobas/fix-ret-val branch January 24, 2023 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants