ARROW-9235: [R] Support for connection class when reading and writing files
#12323
Conversation
Without having looked closely at the code, I suspect you're right about threading. IIUC R's memory allocation is not thread safe, so we can't call R functions that allocate from multiple threads. In the conversion code, @romainfrancois did something clever to distinguish the things that could run in parallel from the things that could not. Another option to explore here, perhaps to confirm the issue, would be to do all of the things that disable multithreading (thread pools to 1, arrow.use_threads = FALSE) and see if that makes the crash go away; see the sketch below.
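For concreteness, that experiment might look something like this (a sketch, assuming arrow's thread-control helpers; set_io_thread_count() may warn about small values on some versions):
options(arrow.use_threads = FALSE)  # the option mentioned above
arrow::set_cpu_count(1)             # shrink the CPU thread pool to a single thread
arrow::set_io_thread_count(1)       # shrink the I/O thread pool as well
# ...then re-run the failing read to see whether the crash goes away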
The Parquet error was, fortunately, not a concurrency issue, but an assumption that the input would be a RandomAccessFile:
# remotes::install_github("paleolimbot/arrow/r@r-connections")
library(arrow, warn.conflicts = FALSE)
addr <- "https://github.com/apache/arrow/raw/master/r/inst/v0.7.1.parquet"
read_parquet(addr)
#> Error: file must be a "RandomAccessFile"
I've also removed references to the R_ext/Connections.h header that was causing the CMD check issue...no need to poke that bear yet. Tomorrow I'll implement ...
Force-pushed from a36cb73 to 5fbe59e
OK, I think this is ready for review. There is a threading issue that causes some readers to fail: in general, all of the writers work, and the readers work as long as they don't call the stream's methods from another thread. Reprex:
# remotes::install_github("paleolimbot/arrow/r@r-connections")
library(arrow, warn.conflicts = FALSE)
tbl <- tibble::tibble(x = 1:5)
# all the writers I know about just work
tf_parquet <- tempfile()
write_parquet(tbl, file(tf_parquet))
tf_ipc <- tempfile()
write_ipc_stream(tbl, file(tf_ipc))
tf_feather <- tempfile()
write_feather(tbl, file(tf_feather))
tf_csv <- tempfile()
write_csv_arrow(tbl, file(tf_csv))
# some readers work...
read_parquet(file(tf_parquet))
#> # A tibble: 5 × 1
#> x
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
read_ipc_stream(file(tf_ipc))
#> # A tibble: 5 × 1
#> x
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
# ...except the ones that read from other threads
read_feather(file(tf_feather))
#> Error: IOError: Attempt to call into R from a non-R thread
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/io/interfaces.cc:157 Seek(position)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/reader.cc:1233 ReadFooter()
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/reader.cc:1720 result->Open(file, footer_offset, options)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/feather.cc:713 RecordBatchFileReader::Open(source_, options_)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/feather.cc:793 result->Open(source, options)
read_csv_arrow(file(tf_parquet))
#> Error in `handle_csv_read_error()` at r/R/csv.R:198:6:
#> ! IOError: Attempt to call into R from a non-R thread
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/io/interfaces.cc:86 stream_->Read(block_size_)
# ...even with use_threads = FALSE
options(arrow.use_threads = FALSE)
read_feather(file(tf_feather))
#> Error: IOError: Attempt to call into R from a non-R thread
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/io/interfaces.cc:157 Seek(position)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/reader.cc:1233 ReadFooter()
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/reader.cc:1720 result->Open(file, footer_offset, options)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/feather.cc:713 RecordBatchFileReader::Open(source_, options_)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/ipc/feather.cc:793 result->Open(source, options)
read_csv_arrow(file(tf_parquet))
#> Error in `handle_csv_read_error()` at r/R/csv.R:198:6:
#> ! IOError: Attempt to call into R from a non-R thread
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/io/interfaces.cc:86 stream_->Read(block_size_)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/util/iterator.h:270 it_.Next()
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/csv/reader.cc:996 buffer_iterator_.Next()
Created on 2022-02-15 by the reprex package (v2.0.1)
@westonpace can you take a look at this threading issue? I believe the constraint is that we can't call any R function that will allocate memory from other threads, so it would not surprise me if the Read methods had to be called single-threaded. But surely that's something we should have the ability to control. cc @romainfrancois since I know you've fought with this in the past.
Just leaving a ping here since I'm back from vacation and am ready to pick this up, given some feedback on whether what I've done here is a reasonable approach to the concurrency limitations in Arrow (or whether anything can be done about those limitations from the Arrow end of things).
TL;DR: We can solve this, we probably want to solve this, but it will involve some C++ effort. Sorry for the delay in looking at this. We certainly have some options here. This is a great chance to start putting some scaffolding we've laid down to good use. The fact that the Parquet reader works here is actually a fluke that we will someday fix (:laughing:) with ARROW-14974.

There are two thread pools in Arrow: the CPU thread pool and the I/O thread pool. The CPU thread pool has one thread per core, and these threads are expected to do lots of heavy CPU work. The I/O thread pool may have few threads (e.g. if using a hard disk) or it may have many threads (e.g. if using S3), and these threads are expected to spend most of their time in the waiting state. CPU threads should generally not block for long periods of time, so when they have to do something slow (like read from disk) they put the task on the I/O thread pool and add a callback on the CPU thread pool to deal with the result.

When use_threads is false we typically interpret that as "don't use up a bunch of CPU for this task" and we limit the CPU thread pool. Ideally we limit it to the calling thread. In some cases (e.g. the execution engine) we limit it to one CPU thread plus the calling thread (though I'm working on that as we speak). What we don't usually do is limit the I/O thread pool in any way. We have the tooling to do this (basically the queue that you mentioned) but will need to do some work to wire everything up.

We can probably come up with a "limit all CPU and I/O tasks to the R thread" solution more easily than a "use the CPU thread pool for CPU tasks but limit all I/O tasks to the R thread" one, but the latter should probably be possible. It will also be easier to support the whole-table readers & writers initially and then later add support for the streaming APIs.

Also, this will have some performance impact when reading multiple files. For example, if you were to read a multi-file dataset from curl you would generally want to issue parallel HTTP reads, but if we're only allowed to use a single thread for the read then that will not work. Although, we could probably address that particular performance impact if the underlying technology has support for an asynchronous API (as it seems that R's curl package does), so we can have three thread pools! (:dizzy:)

What's the timing on this? I'm a little busy at the moment but I should be able to find some time this week to sketch a solution for the read_feather call (which could be adapted for read_csv, or I could sketch the solution for read_csv first).
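At the R level, the two pools are visible and controllable separately, which also shows the asymmetry described above (a sketch, assuming the arrow helpers cpu_count()/io_thread_count() and their setters):
arrow::cpu_count()        # size of the CPU thread pool (one thread per core)
arrow::io_thread_count()  # size of the I/O thread pool (often larger)
arrow::set_cpu_count(1)   # roughly what use_threads = FALSE does...
arrow::io_thread_count()  # ...while the I/O pool is left unlimited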
Thanks! There is no particular rush on this.
I'm still wrapping my head around the specifics here, but because they might be related, I'll list the "calling into R" possibilities I've run into recently, in case any of them makes one of those options more obvious to pursue.
From the R end, I know there is a way to request the evaluation of something on the main thread from elsewhere; however, for that to work there needs to be an event loop on the main thread checking for tasks. I don't know much about it, but I do know it has been used elsewhere for packages like Shiny and plumber that accept HTTP requests and funnel them to R functions (see the sketch below).
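A minimal sketch of that pattern, assuming the later package (tasks are queued for the main R thread and run only when something pumps the loop):
library(later)
later(function() message("ran on the main R thread"))  # queue a task for the loop
# nothing happens until the loop gets a chance to run:
run_now()  # pump the loop once; the queued task executes here
#> ran on the main R thread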
In my mind, supporting R connections is more about providing a (possibly slow) workaround for things that Arrow C++ or the R bindings can't do yet (e.g., URLs). I do know that the async API for curl from the R end is along the lines of the sketch below.
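A sketch of that interface, assuming curl's multi API (the URL and callbacks are illustrative):
library(curl)
pool <- new_pool()
h <- new_handle(url = "https://example.com")
# register the request along with completion callbacks...
multi_add(h,
  done = function(res) message("status: ", res$status_code),
  fail = function(msg) message("failed: ", msg),
  pool = pool
)
# ...then drive the pending transfers (a real consumer could poll instead)
multi_run(pool = pool)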
Force-pushed from 342ec9b to a2a95eb
Force-pushed from a2a95eb to c8fb8e5
This is ready for a re-review! All the writers and readers that I know about work (see the reprex in 'Details'), although the Parquet reader working is actually a bug (ARROW-14974). The details of wrapping the read calls in RunWithCapturedR() need to be finalized...I've made it work here, but we need a better pattern for this before it can be merged. We could also "mark" connection InputStream/Reader objects as requiring RunWithCapturedR() to avoid any issues this might cause (since the read functions are some of the most heavily used functions).
Details
# remotes::install_github("paleolimbot/arrow/r@r-connections")
library(arrow, warn.conflicts = FALSE)
tbl <- tibble::tibble(x = 1:5)
# all the writers I know about just work
tf_parquet <- tempfile()
write_parquet(tbl, file(tf_parquet))
tf_ipc <- tempfile()
write_ipc_stream(tbl, file(tf_ipc))
tf_feather <- tempfile()
write_feather(tbl, file(tf_feather))
tf_csv <- tempfile()
write_csv_arrow(tbl, file(tf_csv))
# the readers now work too
read_parquet(file(tf_parquet))
#> # A tibble: 5 × 1
#> x
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
read_ipc_stream(file(tf_ipc))
#> # A tibble: 5 × 1
#> x
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
read_feather(file(tf_feather))
#> # A tibble: 5 × 1
#> x
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
read_csv_arrow(file(tf_csv))
#> # A tibble: 5 × 1
#> x
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
r/src/csv.cpp (outdated)
In order for the connection thing to work for read_csv_arrow(), we need to wrap table_reader->Read() with RunWithCapturedR(), but we need a cleaner way to do it than what I have here!
Luckily we have table_reader->ReadAsync() for just this purpose (I think)
This collapses things down nicely!
Actually, table_reader->ReadAsync() seems to crash (or at least the test output abruptly ends) on Windows (the method you suggested for the Feather reader works fine though).
- https://github.com/apache/arrow/runs/5996202038?check_suite_focus=true#step:23:4464
- https://github.com/apache/arrow/runs/5996202147?check_suite_focus=true#step:23:3993
- https://github.com/apache/arrow/runs/5996202233?check_suite_focus=true#step:23:3991
r/src/feather.cpp (outdated)
In order for the connection thing to work for read_feather(), we need to wrap reader->Read() with RunWithCapturedR(), but we need a cleaner way to do it than what I have here!
Unfortunately we do not have a reader->ReadAsync here. There is arrow::ipc::RecordBatchFileReader::ReadRecordBatchAsync, but it is private at the moment and also not exposed to arrow::feather at all.
So we could create a JIRA to expose that in C++. In the meantime, the "standard" way to solve "the underlying file reader doesn't have a true asynchronous implementation" is to do:
const auto& io_context = arrow::io::default_io_context();
auto fut = io_context.executor()->Submit(...);
That should work ok here.
This is much better!
...I had to skip RunWithCapturedR() here (and hence connection reading will not work) on 32-bit Windows with RTools35...I'm slightly worried that I don't know why the Feather reading doesn't work on old Windows but the CSV reading does (which uses RunWithCapturedR() and the I/O thread pool but isn't skipped).
westonpace left a comment
I think we can clean up those threads pretty easily. If the reader supports async, we use its async methods. If it doesn't support async, then we spawn each call on the I/O thread pool.
r/src/io.cpp (outdated)
Might be nice if there was a SafeCallIntoR<void>. It can be a bit of a pain because you can't create Result<void> but with some template specialization we could probably make a version that returns Status.
I couldn't figure out how to get a template specialization to return a different type, but I implemented a Status SafeCallIntoRVoid() that reads much better.
r/src/io.cpp (outdated)
In the category of "probably not worth the effort & complexity but a pedantic note for the purists": it might be slightly nicer if you overloaded the async versions of the arrow::io::FileInterface methods and changed the sync versions to call the async versions (instead of vice versa).
The only real gain is that you can avoid spinning up an I/O thread for no reason. I/O threads are supposed to block on long operations so it isn't really a problem but it is a slight bit of extra overhead that isn't necessary.
In fact, now that I draw this picture, I wonder if there might be some way to handle this by having an "R filesystem" whose I/O context's executor was the R main thread 🤔 . Not something we need to tackle right now.
I spent a bit of time trying this, but I don't feel that I know how to do it safely...for CloseAsync() it feels like we run the risk of the event loop ending before the file is actually closed; ReadAsync() feels like it's better suited to something like "open a new connection, read from it, and close it", and I end up with nested futures that feel wrong to me. I'm happy to take another try at this if you think it's important for this usage.
No need for another try. As I said, this is venturing into somewhat pedantic waters.
westonpace left a comment
I think this is probably OK how it is, but I noticed you moved to ReadAsync for the CSV reader, then it looks like you had trouble with Windows, so you moved to submitting a call to the I/O executor; then it looks like you still had trouble with Windows and so disabled this on old Windows. Can you move back to ReadAsync? Or is there still trouble with Windows in that situation?
Co-authored-by: Weston Pace <[email protected]>
Force-pushed from 25ce363 to 4acdbf4
@github-actions crossbow submit test-r-versions
For the CSV reader, table_reader->ReadAsync() was the call that crashed on Windows (see the links above), so I kept the approach of submitting the read to the I/O executor.

The next Windows trouble was with the Feather reader, which crashed on 32-bit Windows using R 3.6. We also don't build with dataset on that platform, perhaps for a similar reason.

There was a crash on R 3.4 (ARROW-16201)...I added a small fix here to make sure that's an error and not a crash (we don't have to support R 3.4 as of tomorrow's release of R 4.2, but it was a fast fix).
Revision: 76ae1ab Submitted crossbow builds: ursacomputing/crossbow @ actions-1899
Got it, makes sense. Thanks for the info. Let's stick with what you have then.
segfault on R 3.4:
@github-actions crossbow submit test-r-versions
Revision: 13ddd20 Submitted crossbow builds: ursacomputing/crossbow @ actions-1907
@github-actions crossbow submit test-r-versions
Force-pushed from e8dec37 to 82e6b6d
Revision: 82e6b6d Submitted crossbow builds: ursacomputing/crossbow @ actions-1909
@jonkeane build failures seem unrelated, right?
Benchmark runs are scheduled for baseline = c16bbe1 and contender = 6cf344b. 6cf344b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

This is a PR to support arbitrary R "connection" objects as input and output streams. In particular, this adds support for sockets (ARROW-4512), URLs, and some other IO operations that are implemented as R connections (e.g., in the archive package). The gist of it is that you should be able to do something like the sketch below.
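Something along these lines (an illustrative sketch; the URL is a placeholder):
library(arrow, warn.conflicts = FALSE)
# any readable R connection should become usable as an input stream...
read_csv_arrow(url("https://example.com/some-file.csv"))
# ...and any writable connection as an output stream
write_csv_arrow(data.frame(x = 1:5), file(tempfile()))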
There are two serious issues that prevent this PR from being useful yet. First, it uses functions from R's C API that R considers "non-API". We can get around this by calling back into R (in the same way this PR implements Tell() and Close()). We could also go all out and implement the other half (exposing InputStream/OutputStream objects as R connections) and ask for an exemption (at least one R package, curl, does this). The archive package seems to expose connections without a NOTE on its CRAN check page, so perhaps there is also a workaround.

Second, we get a crash when passing the input stream to most functions. I think this is because the Read() method is getting called from another thread, but it could also be an error in my implementation. If the issue is threading, we would have to arrange a way to queue jobs for the R main thread (e.g., how the later package does it) and a way to ping it occasionally to fetch the results. This is complicated but might be useful for other reasons (supporting evaluation of R functions in more places). It also might be more work than it's worth.