@paleolimbot
Member

This is a very WIP draft that currently just sketches a few things related to calling into R from other threads. Some code to get started:

arrow:::TestSafeCallIntoR(
  list(
    function() "string one",
    function() "string two"
  )
)
#> [1] "string one" "string two"

arrow:::TestSafeCallIntoR(
  list(
    function() stop("This is an error!")
  )
)
#> Error in (function () : This is an error!

@github-actions

github-actions bot commented Mar 3, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@paleolimbot
Member Author

(See also westonpace#10)

@paleolimbot
Member Author

paleolimbot commented Mar 25, 2022

Redoing this with an eye towards where I would actually like to use it! I think that it does need a synchronous Status<cpp_type> SafeCallIntoR<cpp_type>([]() { return r_api_call(); }), even if all the synchronous version does is error when it's not safe to execute R code. I think I have this working from other threads too but I'm too new to this to know exactly what I should be testing.
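As a rough illustration of the synchronous behaviour described above (run the callable when it is safe, otherwise return an error), here is a minimal standalone sketch. It does not use the Arrow or cpp11 APIs: `CallResult`, `InitializeMainThread()`, `SafeCallSketch()` and `CallFromOtherThread()` are hypothetical names invented for this example, and "safe" is assumed to mean "running on the thread that was recorded at start-up".

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for a Result type: either a value or an error message.
template <typename T>
struct CallResult {
  bool ok;
  T value;
  std::string error;
};

// Thread on which it is assumed to be safe to run the callable.
// (Illustrative global; recorded once at start-up.)
static std::thread::id g_main_thread_id;

void InitializeMainThread() { g_main_thread_id = std::this_thread::get_id(); }

bool OnMainThread() { return std::this_thread::get_id() == g_main_thread_id; }

// Synchronous version: run the callable only on the recorded main thread,
// otherwise return an error instead of executing it.
template <typename T>
CallResult<T> SafeCallSketch(std::function<T()> fun) {
  // InitializeMainThread() must have been called first.
  assert(g_main_thread_id != std::thread::id());
  if (!OnMainThread()) {
    return {false, T{}, "NotImplemented: call from a non-main thread"};
  }
  return {true, fun(), ""};
}

// Demonstration helper: make the same call from a freshly spawned thread.
template <typename T>
CallResult<T> CallFromOtherThread(std::function<T()> fun) {
  CallResult<T> out{false, T{}, ""};
  std::thread worker([&] { out = SafeCallSketch<T>(fun); });
  worker.join();
  return out;
}
```

Called on the recorded thread, `SafeCallSketch<std::string>(...)` returns the callable's value; routed through `CallFromOtherThread`, the same call comes back with `ok == false` and the callable is never run.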

The places where I would prefer to use this in some other PRs:

Some sketch examples:

arrow:::TestSafeCallIntoR(
  function() "string one!",
  opt = "on_main_thread"
)
#> [1] "string one!"

arrow:::TestSafeCallIntoR(
  function() stop("This is an error"),
  opt = "on_main_thread"
)
#> Error in (function () : This is an error

arrow:::TestSafeCallIntoR(
  function() "string one!",
  opt = "async_with_executor"
)
#> [1] "string one!"

# This runs with the expected error, but causes subsequent segfaults, probably related
# to the error_token_ (maybe having to do with the copy-constructor?)

# arrow:::TestSafeCallIntoR(
#   function() stop("This is an error"),
#   opt = "async_with_executor"
# )

arrow:::TestSafeCallIntoR(
  function() "string one!",
  opt = "async_without_executor"
)
#> Error: NotImplemented: Call to R from a non-R thread without an event loop
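For readers unfamiliar with the pattern behind the "async_with_executor" case, the following standalone sketch shows one way a worker thread can hand a callable to the main thread and wait for the result. It is not the code in this PR: `MainThreadLoop` and `RunViaLoop()` are hypothetical names, and the queue-plus-condition-variable loop is only an assumed stand-in for whatever executor drives the main thread. If nothing on the main thread is draining such a queue, nobody can run the task, which is roughly the situation the "async_without_executor" error reports.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical single-threaded loop: any thread may Submit(), but tasks are
// executed only by the thread that calls Run().
class MainThreadLoop {
 public:
  // Queue a callable and return a future for its result (or its exception).
  std::future<std::string> Submit(std::function<std::string()> fun) {
    auto task =
        std::make_shared<std::packaged_task<std::string()>>(std::move(fun));
    std::future<std::string> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

  // Ask Run() to return once the queue is empty.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopped_ = true;
    }
    cv_.notify_all();
  }

  // Execute queued tasks on the calling thread until stopped and drained.
  void Run() {
    for (;;) {
      std::function<void()> next;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopped and nothing left to run
        next = std::move(tasks_.front());
        tasks_.pop();
      }
      next();  // run outside the lock
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopped_ = false;
};

// A worker thread asks the calling ("main") thread to run `fun`, then reports
// either the returned string or the error message.
std::string RunViaLoop(std::function<std::string()> fun) {
  MainThreadLoop loop;
  const std::thread::id loop_thread = std::this_thread::get_id();
  std::string out;
  std::thread worker([&] {
    try {
      out = loop.Submit([&] {
        // The callable runs on the thread driving the loop, not the worker.
        assert(std::this_thread::get_id() == loop_thread);
        return fun();
      }).get();
    } catch (const std::exception& e) {
      out = std::string("Error: ") + e.what();
    }
    loop.Stop();
  });
  loop.Run();
  worker.join();
  return out;
}
```

Because `std::packaged_task` stores any exception thrown by the callable, the worker sees the error when it calls `get()` on the future, so the failure is propagated rather than lost on the main thread.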

@paleolimbot paleolimbot marked this pull request as ready for review March 25, 2022 19:29
Member

@westonpace westonpace left a comment

This will be an awesome capability. A few nits and thoughts but overall I think this is the right direction.

Comment on lines +33 to +34
// [[arrow::export]]
std::string TestSafeCallIntoR(cpp11::function r_fun_that_returns_a_string,
Member Author

We don't have a precedent for this in the Arrow R package (a place to test C++ code from C++ that is hard to test from R). We probably don't want something like this running on CRAN, but I'm not sure what the best way is to fence this off / keep it from compiling anywhere except CI?

Member

I haven't dug into the code too much yet, but is this resolved with the new commits, or do we still need to find a way to gate this?

Member Author

Neal took a quick look and said it's fine as long as there's a note as to where TestSafeCallIntoR is defined (there are some ALTREP tests that do this, too).

@westonpace westonpace self-requested a review April 4, 2022 16:58
Member

@westonpace westonpace left a comment

Very clean and easier to understand now. Thanks for figuring this out.

.onLoad <- function(...) {
  if (arrow_available()) {
    # Make sure C++ knows on which thread it is safe to call the R API
    InitializeMainRThread()
Member

Do we know for a fact that the R thread never changes? For example, in JS, there is always "one thread" but the actual thread id can change from iteration to iteration of the event loop.

Member Author

I asked in the r-lib Slack channel and nobody seems to feel that this will be a problem. They did advise checking parallel::mclapply(), since it forks the process, but a quick check suggests that the value of std::this_thread::get_id() stays the same even if somebody does that:

cpp11::cpp_source(code = '
#include "cpp11.hpp"
#include <thread>
#include <sstream>

[[cpp11::register]]
std::string thread_id() {
  std::thread::id id = std::this_thread::get_id();
  std::stringstream ss;
  ss << id;
  return ss.str();
}
')

thread_id()
#> [1] "0x100e33d40"
unique(lapply(1:1e3, function(x) thread_id()))
#> [[1]]
#> [1] "0x100e33d40"
unique(parallel::mclapply(1:1e3, function(x) thread_id(), mc.cores = 8))
#> [[1]]
#> [1] "0x100e33d40"
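The same check can be sketched without R, isolating just the fork() step that parallel::mclapply() relies on. This is a POSIX-only illustration; `ThreadIdStableAcrossFork()` is a hypothetical helper name. The C++ standard does not promise anything about thread ids across fork(), so a `true` result only describes the platform the sketch is run on.

```cpp
#include <cassert>
#include <sys/wait.h>
#include <unistd.h>
#include <thread>

// Returns true if a forked child process observes the same
// std::this_thread::get_id() value that the parent saw before fork().
bool ThreadIdStableAcrossFork() {
  const std::thread::id parent_id = std::this_thread::get_id();
  pid_t pid = fork();
  if (pid < 0) return false;  // fork failed
  if (pid == 0) {
    // Child: compare its own id against the value copied from the parent and
    // report the outcome through the exit status.
    _exit(std::this_thread::get_id() == parent_id ? 0 : 1);
  }
  // Parent: wait for the child and decode its exit status.
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The child uses `_exit()` rather than returning so that it does not run the parent's remaining code or static destructors.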

});

thread_ptr->join();
delete thread_ptr;
Member

So this is probably fine but you could wrap thread_ptr in a unique_ptr. For example:

thread_ptr = std::unique_ptr<std::thread>(new std::thread(...));

It gets rid of the delete call and guards you against very unlikely things like ->join() throwing an exception and the memory never getting cleaned up (not that such a thing would really matter in test code).
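A minimal standalone sketch of that suggestion, with a placeholder lambda standing in for the elided thread body (`RunOnOwnedThread()` is a hypothetical name for this example). One thing to keep in mind with this pattern: destroying a std::thread that is still joinable calls std::terminate(), so the join() has to happen before the unique_ptr goes out of scope or is reset.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <thread>

// Run a small piece of work on a thread owned by a std::unique_ptr.
std::string RunOnOwnedThread(const std::string& input) {
  std::string result;
  std::unique_ptr<std::thread> thread_ptr(new std::thread(
      [&result, &input] { result = input + " (from worker)"; }));

  // join() must come before thread_ptr is destroyed: a std::thread that is
  // still joinable when its destructor runs terminates the program.
  thread_ptr->join();
  assert(!thread_ptr->joinable());

  // No explicit delete: thread_ptr frees the std::thread when it goes out of
  // scope, including on early returns or exceptions after this point.
  return result;
}
```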

Member Author

I can't get this to work without a crash!

Member

Odd. If you want to create a commit (doesn't have to be part of any PR) then I'd be happy to take a look and see what was going on. Otherwise, like I said, it isn't very important, so let's not worry too much about it.

# under the License.

# Note that TestSafeCallIntoR is defined in safe-call-into-r-impl.cpp

Member

Suggested change
skip_on_cran()

Is this sufficient to make sure we don't test this on CRAN?

Member Author

(I added them inside the test_that() blocks as mostly a stylistic choice!)

Member

@jonkeane jonkeane left a comment

I'll merge when CI is green. Thank you!

@jonkeane jonkeane closed this in e110eac Apr 7, 2022
@ursabot

ursabot commented Apr 8, 2022

Benchmark runs are scheduled for baseline = 76d064c and contender = e110eac. e110eac is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.13% ⬆️0.04%] test-mac-arm
[Failed ⬇️0.71% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.09% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/468| e110eac7 ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/453| e110eac7 test-mac-arm>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/454| e110eac7 ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/463| e110eac7 ursa-thinkcentre-m75q>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/467| 76d064c7 ec2-t3-xlarge-us-east-2>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/452| 76d064c7 test-mac-arm>
[Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/453| 76d064c7 ursa-i9-9960x>
[Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/462| 76d064c7 ursa-thinkcentre-m75q>
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@paleolimbot paleolimbot deleted the r-safe-call-into branch December 9, 2022 16:38