Skip to content

Conversation

@zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Apr 24, 2020

Add following Datasets APIs to Java:

  • DatasetFactory
  • Dataset
  • Scanner
  • ScanTask
  • ScanTask.BatchIterator

Add a native dataset path to bridge c++ Datasets components to Java:

  • NativeDatasetFactory (c++ class : DatasetFactory)
  • NativeDataset (c++ class: Dataset)
  • NativeScanner (c++ class: DisposableScannerAdaptor)

Following c++ components are not JNI-mapped to keep the initial implementation simple:

  • Fragment
  • ScanTask
  • (arrow::)RecordBatchIterator

Add following API to FileSystemDatasetFactory to avoid passing file system objects via JNI bridge:

  • FileSystemDatasetFactory::Make( std::string uri, std::shared_ptr<FileFormat> format, FileSystemFactoryOptions options)

Unit tests are based on FileSystemDatasetFactory.

@github-actions
Copy link

@nealrichardson nealrichardson requested a review from bkietz April 24, 2020 14:43
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not copy and paste this file, factor it out to a common place that can be used by all JNI implementations.

Copy link
Member Author

@zhztheplayer zhztheplayer Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move concurrent_map.h out into src/jni/? By the way, should we somehow migrate src/gandiva/jni to src/jni/gandiva, or migrate src/jni to src/arrow/jni? It seems that currently they are organized inconsistently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments below, I don't think we should be using concurrent_map in general, especially for new use cases so we can drop it for now.

By the way, should we somehow migrate src/gandiva/jni to src/jni/gandiva, or migrate src/jni to src/arrow/jni? It seems that currently they are organized inconsistently.
I don't have a strong opinion here probably best to discuss this on the mailing list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the file has been removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these macros are necessary. you should be able to use a normal (templated) function since you throw an exception instead of return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like there are a lot of FIXMEs can you address those?

Copy link
Member Author

@zhztheplayer zhztheplayer May 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. fixmes like this one are by my mistake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@zhztheplayer zhztheplayer Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. But I kind of like the current pattern - it benefits more or less when debugging/tracing (self-growing id, instance cache, etc.), right? Is there any other consideration making it bad?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra memory and resolution overhead. The instance cache should be handled by the underlying allocator which should be highly optimized for returning pointer sized values.

Could you explain how a self-growing ID is useful for debugging purposes? If it is required would a DEBUG log with only an atomic int associated with each new ID be sufficient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain how a self-growing ID is useful for debugging purposes?

It may not be a considerable bit deal but growing ID numbers can be much smaller and more contextual. E.g. when we experience a JVM breakpoint debugging against some sort of abnormal duplicated buffer creation, if five different buffers with ID 10-15 are created one would easily remember it. Then if ID 12 is returned from JNI again then one instantly realizes that something must went wrong there.

If it is required would a DEBUG log with only an atomic int associated with each new ID be sufficient?

Yes I think if we are debugging with logs rather than breakpoints, what type of ID is used is indeed not that important because we can use control + F to highlight a specific instance anyway.

And as the performance impact is considered I don't have strong inclination. So let's just use pointers?

Another thing is we already use maps to cache JNI objects in other modules (gandiva-jni, orc-jni), we may need to unify the approaches somehow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this file being introduced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is for passing datasets filter via JNI. I see Wes has recommended not to include filter bindings (in the bottom comment) in this version. I may consider removing it in a later commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine if you have a filter API in Java, but I think you should try to limit the exposure of the particular details of the current C++ API in Java. If the Java API has a 1-1 correspondence with the C++ API that's what I'm advising you against

Copy link
Member Author

@zhztheplayer zhztheplayer Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think I would understand any of the concerns against a too detailed mapping. (actually currently Fragment is not mapped to Java-side so it's not yet a 1-1 mapping) I'll try remove more JNI stuffs.

As for filter, If I understand correctly it's OK to keep Java API but the JNI mapping for filters is considered fragile, right? I can remove the mapping anyway but when users read parquet files from Java they'll not be able to filter row groups to reduce I/O. Which is extremely important when low-selectivity filter is specified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for filter, If I understand correctly it's OK to keep Java API but the JNI mapping for filters is considered fragile, right? I can remove the mapping anyway but when users read parquet files from Java they'll not be able to filter row groups to reduce I/O. Which is extremely important when low-selectivity filter is specified.

I'm advising you to define some kind of abstraction layer so that the details of the C++ API are not exposed in the Java API. So if there are changes in the C++ API, then refactoring can take place only at the JNI C++ code and no changes will be required in Java.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. At this time I'll try keeping them detached as far as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you start a thread on the mailing list about filter support. I'm concerned with a proto specific to the java implementation of datasets. Especially, because I think this essentially replicates what is in the Gandiva proto. We should ideally have one representation for all the languages that are going to bind to the same functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion, but they don't seem like the same functionality to me for dataset expression and Gandiva expression... In C++ code they are designed and placed differently and we'll face problems merging the JNI abstraction together. e.g. arrow::dataset::FieldExpression doesn't retain data type information (it is only represented by field name) where gandiva::FieldNode is type sensitive... I don't think it's a good idea to use any workaround to make them suitable for each other. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a solution off the top of my head. But creating a new java->C++ protobuf abstraction seems like the the wrong way to go.

As far as i can tell gandiva protobuf declare column type optional. This needs more design which is why a discussion/proposal on the ML is warranted.

It might make sense to split out the filtering components of this PR so it doesn't get held up by the conversation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I prefer removing them from this PR as I think of the implementation immature too. I can open ticket/discussion later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this out if it isn't implemented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed FilterImpl and Filter is now made a class rather than interface. I think it's not prepared to be a user-friendly interface so far.

I've marked Filter as experimental in class comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these methods need javadoc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call this "Iterator"?

Copy link
Member Author

@zhztheplayer zhztheplayer Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Iterator is OK but it may collide with java.util.Iterator in scenarios. Should it be "BatchIterator" or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect ScanTask.Iterator would be used consistently to avoid collision. BatchIterator is also code with me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Now it is changed to iterate ArrowRecordBatch rather than VectorSchemaRoot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be in a different common package if we want to wrap the C++ allocator. It isn't clear to me that we should do this because we lose java native heap caps IIUC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ummm if I am not wrong by "java native heap" you meant jvm direct memory right? I think it's possible to new a new type of c++ allocator to allocate for direct mem but I suggest to do so in a separate ticket. We have orc adaptor doing things worse in this part and we can optimize them together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds ok to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably hold off on merging until we resolve this. Just because another part of the codebase has debt or weaknesses doesn't mean we should introduce more of that.

Copy link
Member Author

@zhztheplayer zhztheplayer Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jacques-n,

Thanks for your suggestion but I think the "weaknesses" of current implementation might be somehow overrated. Specifically for orc adaptor it's of course urgent to improve its memory allocation process because all of the allocated memory by orc is escaping from Java allocator. However we don't have the same problem here in Datasets as allocator is going to manage all the native memory. The only issue we may have is that OOM error might be thrown after a buffer is allocated rather than before we create it. But because OOM will be finally thrown and over-allocated memory will be finally released, the impact will be very very small.

And I knew we can possibly change to allocate Java memory directly from C++ via JNI, but codes will become messy and performance will turn to worse. That is why I didn't propose such a solution in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reality, this is not at all the case. There are two ways that you run out of memory in Java: you run out of available allocation space OR you hit the limit at the JVM level. The limit at the JVM level can be influenced by things like fragmentation so in nearly all cases you hit that limit before the allocator managed limit. In your patch, you only are respecting the allocator set limit, not the jvm limit. This means a user could configure java to allow X of memory and the process actually will use 2X of memory, leading to issues in many systems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reality, this is not at all the case. There are two ways that you run out of memory in Java: you run out of available allocation space OR you hit the limit at the JVM level. The limit at the JVM level can be influenced by things like fragmentation so in nearly all cases you hit that limit before the allocator managed limit. In your patch, you only are respecting the allocator set limit, not the jvm limit. This means a user could configure java to allow X of memory and the process actually will use 2X of memory, leading to issues in many systems.

Making sense overall. So we may have to use a C++-to-Java/Netty allocation path in the first version right? I'll try it out.

Copy link
Contributor

@jacques-n jacques-n Jul 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. When you allocate and deallocate memory in c++ you can call (via reflection) java.nio.Bits.reserveMemory() and unreserveMemory(). If you wanted to be extra fancy you could actually just do the reads/writes to the underlying atomics. Alternatively, you could build a pooling impl for C++ that uses the JNIEnv.NewDirectByteBuffer().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably belongs in the vector package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make these static methods? instead of using a singleton pattern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just the way I got used to. Singleton makes it easier to extend things. As I noticed we use static pattern in other places I have changed to static now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally, I think we static import assertEquals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

java/pom.xml Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my mistake, reverted. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't check the binary file into this repo. It would be better to generate a file on the fly, so readers of the test can understand what data exists there. If a standalone parquet file is really needed arrow-testing is the place to put it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did not mark the PR reviewable mainly because test codes are messy. I am working on refactoring

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like a a bad idea to collect an iterator if the intention is for the iterator to provide laziness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is now moved into DisposableScannerAdaptor, which doesn't collect these tasks anymore. Also removed the collect function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: I think Pure C++ methods should be TranslateFilter

Copy link
Member Author

@zhztheplayer zhztheplayer Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done but I'll remove the entire function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

@wesm
Copy link
Member

wesm commented Apr 27, 2020

As a high level advisory: the Datasets C++ API should still be regarded as alpha-stage, so I recommend making the JNI bindings as minimal as possible (to satisfy required functionality) so that refactoring is not too painful.

Additionally, I recommend that you do not build JNI bindings for the Expression classes in arrow/dataset/filter.h -- this code is likely to change quite a bit as the associated query engine project gets off the ground this year.

@wesm wesm changed the title ARROW-7808: [Java][Dataset] Implement Datasets Java API ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++ Apr 27, 2020
@zhztheplayer
Copy link
Member Author

Hi @wesm @emkornfield , based on the discussion in the latest commit I removed some of the JNI mappings of c++ components. Now only Dataset(Factory) and Scanner are there to be bridged. Would you think of this a feasible direction?

@emkornfield
Copy link
Contributor

I'll let @wesm comment on the specifics of should be JNI bound.

What I would prefer not to see is a custom proto living on the java side solely for describing filters.

@zhztheplayer
Copy link
Member Author

What I would prefer not to see is a custom proto living on the java side solely for describing filters.

Yea the current approach is just working but proto-generated classes are totally unfriendly to use and evolve. I may intend to mark them all experimental, or just develop a new way in this PR, I don't know. Let me know if you have more suggestions on this.

@pitrou
Copy link
Member

pitrou commented Jan 7, 2021

@zhztheplayer Yes, all Arrow C++ code should be inside our own namespaces to avoid potential name clashes.
Also, for private functions and classes it is recommended to put them inside the anonymous namespace (in addition to the arrow::... namespace) to hide them from other modules being linked.

@zhztheplayer
Copy link
Member Author

zhztheplayer commented Jan 25, 2021

Hi @emkornfield @pitrou

I think I have addressed existing comments so would you like to take another look? Thanks a lot.

Based on some recent comments I think we may don't have to make 100% consensus on the current design of ReservationListenableMemoryPool or relevant things instantly, if this part is in a way controversial I could split out the part from this PR then create a new one for it if we can possibly get this merged.

And if you have other suggestions on the whole design or architecture of the basic functionality please be free to add comments and I am happy to make further refactors accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: format_id

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this causes an extra copy just construct directly with chars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Didn't know the constructor implies a copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since copied is unused just pass a nullptr (according to JNI docs it seems like this is allowed. /copied=/nullptr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Done

Copy link
Contributor

@emkornfield emkornfield Jan 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked_cast

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to reinterpret_cast. Correct me if I am wrong we cannot use arrow::internal::checked_cast (if you meant to use this one) here since class _jobject doesn't include a virtual members for dynamic_cast: https://github.com/AdoptOpenJDK/openjdk-jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/javavm/export/jni.h#L67-L80

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no reason for the pimpl pattern if both interface and implementation are bot in the .cpp file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if there aren't any tests that test this functionality, then I would recommend splitting this out to another .h/.cc file under this directory so tests can be written.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made some refactors on jni_wrapper.cpp. Now there is a new source file jni_util.cpp/jni_util.h for storing util members in namespace arrow::dataset::jni.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some unit tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Ithink this can be an anonymous namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. Do we still have to do this after the refactors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for protected an not private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is misleading, it should be less then int32 max value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we separate out allocation tests to their own file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@zhztheplayer
Copy link
Member Author

zhztheplayer commented Feb 3, 2021

Hi @emkornfield. I've addressed the new comments within the latest commit, please review. Previous commits have been merged into a single one.

namespace dataset {
namespace jni {

class JniPendingException : public std::runtime_error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be in the header? Please add docs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these error handling tools back to jni_wrapper.cc.

I was thinking of preparing some common utilities for both jni/dataset jni/orc (or maybe gandiva/jni). But we can actually do this later in some other tickets. The utilities also can not work well without the macros in jni_wrapper.cc.

Added some docs to other interfaces in this header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think this is a good thought and it would be great to see some refactoring here to share common code. Thank you for taking the time to address all the concerns around this.

* Schema utility class including serialization and deserialization.
*/
public class SchemaUtility {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private constructor to prevent initialization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments and then I think we can merge this.

@zhztheplayer
Copy link
Member Author

Thanks @emkornfield. I've made updates accordingly.

@ZhaoNeil
Copy link

Hi, I have a question about why Java implementation still uses C++ memory allocation?

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Add following Datasets APIs to Java:

- DatasetFactory
- Dataset
- Scanner
- ScanTask
- ScanTask.BatchIterator

Add a native dataset path to bridge c++ Datasets components to Java:

- NativeDatasetFactory (c++ class : DatasetFactory)
- NativeDataset (c++ class: Dataset)
- NativeScanner (c++ class: DisposableScannerAdaptor)

Following c++ components are not JNI-mapped to keep the initial implementation simple:

- Fragment
- ScanTask
- (arrow::)RecordBatchIterator

Add following API to `FileSystemDatasetFactory` to avoid passing file system objects via JNI bridge:

- `FileSystemDatasetFactory::Make( std::string uri, std::shared_ptr<FileFormat> format, FileSystemFactoryOptions options)`

Unit tests are based on `FileSystemDatasetFactory`.

Closes apache#7030 from zhztheplayer/ARROW-7808

Authored-by: Hongze Zhang <[email protected]>
Signed-off-by: Micah Kornfield <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants