Skip to content

Conversation

@paleolimbot
Copy link
Member

This PR makes it so that you can do the following without a warning:

library(arrow, warn.conflicts = FALSE)

register_scalar_function(
  "times_32",
  function(context, x) x * 32L,
  in_type = list(int32(), int64(), float64()),
  out_type = function(in_types) in_types[[1]],
  auto_convert = TRUE
)

register_scalar_function(
  "times_32",
  function(context, x) x * 32L,
  in_type = list(int32(), int64(), float64()),
  out_type = function(in_types) in_types[[1]],
  auto_convert = TRUE
)

In fixing that, I also ran across an important discovery, which is that cpp11::function does not protect the underlying SEXP from garbage collection!!!! It the two functions we used this for were being protected by proxy because the execution environment of register_scalar_function() was being saved when the binding was registered.

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@pitrou
Copy link
Member

pitrou commented Oct 17, 2022

In fixing that, I also ran across an important discovery, which is that cpp11::function does not protect the underlying SEXP from garbage collection

Do you think that should be reported as a bug?

r/R/compute.R Outdated
# extra step to avoid saving this execution environment in the binding,
# which eliminates a warning when the same binding is registered twice
binding_fun <- function(...) build_expr(name, ...)
body(binding_fun)[[2]] <- name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what does this do? Why 2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a less cryptic version of this...it's inlining the value of name into the function body:

binding_fun <- function(...) build_expr(name, ...)
str(as.list(body(binding_fun)))
#> List of 3
#>  $ : symbol build_expr
#>  $ : symbol name
#>  $ : symbol ...

@paleolimbot
Copy link
Member Author

Do you think that should be reported as a bug?

I opened an issue about it...it might have been intentional since protection isn't necessarily cheap compared to the function call itself.

@nealrichardson
Copy link
Member

@github-actions crossbow submit homebrew-r-autobrew

@github-actions
Copy link

Revision: 0facb9b

Submitted crossbow builds: ursacomputing/crossbow @ actions-6b8a20d344

Task Status
homebrew-r-autobrew Github Actions

@paleolimbot
Copy link
Member Author

It looks like it doesn't fix that specific test, although I still think we should merge this sooner than later just in case it was contributing to other failures!

arrow::Status ReadNext(std::shared_ptr<arrow::RecordBatch>* batch_out) {
auto batch = SafeCallIntoR<std::shared_ptr<arrow::RecordBatch>>([&]() {
cpp11::sexp result_sexp = fun_();
cpp11::sexp result_sexp = cpp11::function(fun_)();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it potentially defer any type check that would have occurred in the constructor before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point...I looked that up, and its current form, cpp11::function doesn't do any type checking ( https://github.com/r-lib/cpp11/blob/main/inst/include/cpp11/function.hpp#L17 )

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. This looks reasonable on the face of it.

@pitrou pitrou requested a review from nealrichardson October 18, 2022 15:02
@pitrou
Copy link
Member

pitrou commented Oct 18, 2022

@nealrichardson @thisisnic Could you comment on the R parts?

@paleolimbot paleolimbot merged commit ebda85f into apache:master Oct 18, 2022
kou pushed a commit that referenced this pull request Oct 20, 2022
…e as the existing one (#14436)

This PR makes it so that you can do the following without a warning:

``` r
library(arrow, warn.conflicts = FALSE)

register_scalar_function(
  "times_32",
  function(context, x) x * 32L,
  in_type = list(int32(), int64(), float64()),
  out_type = function(in_types) in_types[[1]],
  auto_convert = TRUE
)

register_scalar_function(
  "times_32",
  function(context, x) x * 32L,
  in_type = list(int32(), int64(), float64()),
  out_type = function(in_types) in_types[[1]],
  auto_convert = TRUE
)
```

In fixing that, I also ran across an important discovery, which is that `cpp11::function` does *not* protect the underlying `SEXP` from garbage collection!!!! It the two functions we used this for were being protected by proxy because the execution environment of `register_scalar_function()` was being saved when the binding was registered.

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
@paleolimbot paleolimbot deleted the r-identical-udfs branch October 26, 2022 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants