-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-37770: [MATLAB] Add CSV TableReader and TableWriter MATLAB classes
#37773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
928f9c9 to
bedaf98
Compare
sgilmore10
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Thanks for adding the base test class and test utilities. We can re-use these in the future for other file types!
|
It looks like the I just pushed a commit that enables this. |
|
Enabling building of the We could try to fix these build failures by explicitly installing I have already confirmed that not building the Arrow C++ library tests when https://github.com/mathworks/arrow/actions/runs/6226385022 I'll start working on a separate PR to remove |
|
For reference, I am working on the changes required to remove |
|
For reference - I've opened #37784 for removing |
…ke build system for the MATLAB interface (#37784) ### Rationale for this change This pull request removes `GoogleTest` support from the CMake build system for the MATLAB interface. 1. `GoogleTest` support adds a lot of additional complexity to the CMake build system for the MATLAB interface, and we currently don't have any standalone C++ tests for the MATLAB interface code. 2. In order to use `GoogleTest` in the MATLAB CI workflows, we are currently relying on building the tests for the Arrow C++ libraries in order to "re-use" the `GoogleTest binaries. This adds additional overhead to the MATLAB CI workflows. 3. If we want to test some internal C++ code for the MATLAB interface in the future, we can instead use a MEX function to call the code from a MATLAB test as suggested by @ kou in #37532 (comment). 4. There is [precedent for testing internal C++ code without GoogleTest for the Python bindings](#14117). 5. On a somewhat related note - removing `GoogleTest` support will help unblock #37773 as discussed in #37773 (comment). ### What changes are included in this PR? 1. Removed the `MATLAB_BUILD_TESTS` flag from the CMake build system for the MATLAB interface since there are no longer any C++ tests for the MATLAB interface to build. 2. Updated the `matlab_build.sh` CI workflow script to avoid building the tests for the Arrow C++ libraries and to no longer call `ctest`. 3. Updated the `README.md` for the MATLAB interface to no longer mention building or running C++ tests. 4. Updated the design document for the MATLAB Interface to no longer mention `GoogleTest` since we may end up testing internal C++ code using MEX function calls from MATLAB instead. 5. Removed placeholder C++ test (i.e. `placeholder_test.cc`). ### Are these changes tested? Yes. The MATLAB CI workflow is passing on all platforms. ### Are there any user-facing changes? Yes. There are no longer any C++ tests for the MATLAB interface. The `MATLAB_BUILD_TESTS` flag has been removed from the CMake build system to reflect this change. If a user supplies a value for `MATLAB_BUILD_TESTS` when building the MATLAB interface, the flag will be ignored by CMake. ### Future Directions 1. Add more developer-focused documentation on how to test C++ code via MEX function calls from MATLAB. ### Notes 1. In the future, we can consider testing internal C++ code using MEX function calls from MATLAB tests as suggested by @ kou in #37532 (comment). Currently, we don't have any C++ tests that need to be adapted to use this approach. 2. Thank you @ sgilmore10 for your help with this pull request! * Closes: #37532 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
a6055fb to
5149465
Compare
|
Update: the code to remove Now that the Arrow C++ tests are no longer being built, enabling At this point, these changes should be ready to merge. |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
The failures for the |
|
CI failures are related to #37803. |
|
+1 |
2. Change `write` method to take in an `arrow.tabular.Table`, rather than an `arrow.tabular.RecordBatch`. 3. Add basic implementation of `arrow.io.csv.TableReader` class. Co-authored-by: Sarah Gilmore <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
2. Create CSV test superclass. 3. Add tError test class. Co-authored-by: Sarah Gilmore <[email protected]>
…iter filename argument. 2. Add basic error tests for TableReader and TableWriter. Co-authored-by: Sarah Gilmore <[email protected]>
2. Add more error tests. Co-authored-by: Sarah Gilmore <[email protected]>
…rkflow script. Co-authored-by: Sarah Gilmore <[email protected]>
cdaa51a to
aea2f39
Compare
|
#37808 has been merged. I just rebased these changes on top of |
|
The MATLAB CI workflows are passing now. I will merge this PR. |
|
+1 |
|
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2b34e37. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…he CMake build system for the MATLAB interface (apache#37784) ### Rationale for this change This pull request removes `GoogleTest` support from the CMake build system for the MATLAB interface. 1. `GoogleTest` support adds a lot of additional complexity to the CMake build system for the MATLAB interface, and we currently don't have any standalone C++ tests for the MATLAB interface code. 2. In order to use `GoogleTest` in the MATLAB CI workflows, we are currently relying on building the tests for the Arrow C++ libraries in order to "re-use" the `GoogleTest binaries. This adds additional overhead to the MATLAB CI workflows. 3. If we want to test some internal C++ code for the MATLAB interface in the future, we can instead use a MEX function to call the code from a MATLAB test as suggested by @ kou in apache#37532 (comment). 4. There is [precedent for testing internal C++ code without GoogleTest for the Python bindings](apache#14117). 5. On a somewhat related note - removing `GoogleTest` support will help unblock apache#37773 as discussed in apache#37773 (comment). ### What changes are included in this PR? 1. Removed the `MATLAB_BUILD_TESTS` flag from the CMake build system for the MATLAB interface since there are no longer any C++ tests for the MATLAB interface to build. 2. Updated the `matlab_build.sh` CI workflow script to avoid building the tests for the Arrow C++ libraries and to no longer call `ctest`. 3. Updated the `README.md` for the MATLAB interface to no longer mention building or running C++ tests. 4. Updated the design document for the MATLAB Interface to no longer mention `GoogleTest` since we may end up testing internal C++ code using MEX function calls from MATLAB instead. 5. Removed placeholder C++ test (i.e. `placeholder_test.cc`). ### Are these changes tested? Yes. The MATLAB CI workflow is passing on all platforms. ### Are there any user-facing changes? Yes. There are no longer any C++ tests for the MATLAB interface. The `MATLAB_BUILD_TESTS` flag has been removed from the CMake build system to reflect this change. If a user supplies a value for `MATLAB_BUILD_TESTS` when building the MATLAB interface, the flag will be ignored by CMake. ### Future Directions 1. Add more developer-focused documentation on how to test C++ code via MEX function calls from MATLAB. ### Notes 1. In the future, we can consider testing internal C++ code using MEX function calls from MATLAB tests as suggested by @ kou in apache#37532 (comment). Currently, we don't have any C++ tests that need to be adapted to use this approach. 2. Thank you @ sgilmore10 for your help with this pull request! * Closes: apache#37532 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
…AB classes (apache#37773) ### Rationale for this change To enable initial CSV I/O support, this PR adds `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter` MATLAB classes to the MATLAB interface. ### What changes are included in this PR? 1. Added a new `arrow.io.csv.TableReader` class 2. Added a new `arrow.io.csv.TableWriter` class **Example** ```matlab >> matlabTableWrite = array2table(rand(3)) matlabTableWrite = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> arrowTableWrite = arrow.table(matlabTableWrite) arrowTableWrite = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> writer = arrow.io.csv.TableWriter("example.csv") writer = TableWriter with properties: Filename: "example.csv" >> writer.write(arrowTableWrite) >> reader = arrow.io.csv.TableReader("example.csv") reader = TableReader with properties: Filename: "example.csv" >> arrowTableRead = reader.read() arrowTableRead = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> matlabTableRead = table(arrowTableRead) matlabTableRead = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> isequal(arrowTableRead, arrowTableWrite) ans = logical 1 >> isequal(matlabTableRead, matlabTableWrite) ans = logical 1 ``` ### Are these changes tested? Yes. 1. Added new CSV I/O tests including `test/arrow/io/csv/tRoundTrip.m` and `test/arrow/io/csv/tError.m`. 2. Both of these test classes inherit from a `CSVTest` superclass. ### Are there any user-facing changes? Yes. 1. Users can now read and write CSV files using `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter`. ### Future Directions 1. Expose [options](https://github.com/apache/arrow/blob/main/cpp/src/arrow/csv/options.h) for controlling CSV reading and writing in MATLAB. 2. Add more read/write tests for null value handling and other datatypes beyond numeric and string values. 4. Add a `RecordBatchReader` and `RecordBatchWriter` for CSV. 5. Add support for more I/O formats like Parquet, JSON, ORC, Arrow IPC, etc. ### Notes 1. Thank you @ sgilmore10 for your help with this pull request! 2. I chose to add both the `TableReader` and `TableWriter` in one pull request because it simplified testing. My apologies for the slightly lengthy pull request. * Closes: apache#37770 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
…he CMake build system for the MATLAB interface (apache#37784) ### Rationale for this change This pull request removes `GoogleTest` support from the CMake build system for the MATLAB interface. 1. `GoogleTest` support adds a lot of additional complexity to the CMake build system for the MATLAB interface, and we currently don't have any standalone C++ tests for the MATLAB interface code. 2. In order to use `GoogleTest` in the MATLAB CI workflows, we are currently relying on building the tests for the Arrow C++ libraries in order to "re-use" the `GoogleTest binaries. This adds additional overhead to the MATLAB CI workflows. 3. If we want to test some internal C++ code for the MATLAB interface in the future, we can instead use a MEX function to call the code from a MATLAB test as suggested by @ kou in apache#37532 (comment). 4. There is [precedent for testing internal C++ code without GoogleTest for the Python bindings](apache#14117). 5. On a somewhat related note - removing `GoogleTest` support will help unblock apache#37773 as discussed in apache#37773 (comment). ### What changes are included in this PR? 1. Removed the `MATLAB_BUILD_TESTS` flag from the CMake build system for the MATLAB interface since there are no longer any C++ tests for the MATLAB interface to build. 2. Updated the `matlab_build.sh` CI workflow script to avoid building the tests for the Arrow C++ libraries and to no longer call `ctest`. 3. Updated the `README.md` for the MATLAB interface to no longer mention building or running C++ tests. 4. Updated the design document for the MATLAB Interface to no longer mention `GoogleTest` since we may end up testing internal C++ code using MEX function calls from MATLAB instead. 5. Removed placeholder C++ test (i.e. `placeholder_test.cc`). ### Are these changes tested? Yes. The MATLAB CI workflow is passing on all platforms. ### Are there any user-facing changes? Yes. There are no longer any C++ tests for the MATLAB interface. The `MATLAB_BUILD_TESTS` flag has been removed from the CMake build system to reflect this change. If a user supplies a value for `MATLAB_BUILD_TESTS` when building the MATLAB interface, the flag will be ignored by CMake. ### Future Directions 1. Add more developer-focused documentation on how to test C++ code via MEX function calls from MATLAB. ### Notes 1. In the future, we can consider testing internal C++ code using MEX function calls from MATLAB tests as suggested by @ kou in apache#37532 (comment). Currently, we don't have any C++ tests that need to be adapted to use this approach. 2. Thank you @ sgilmore10 for your help with this pull request! * Closes: apache#37532 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
…AB classes (apache#37773) ### Rationale for this change To enable initial CSV I/O support, this PR adds `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter` MATLAB classes to the MATLAB interface. ### What changes are included in this PR? 1. Added a new `arrow.io.csv.TableReader` class 2. Added a new `arrow.io.csv.TableWriter` class **Example** ```matlab >> matlabTableWrite = array2table(rand(3)) matlabTableWrite = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> arrowTableWrite = arrow.table(matlabTableWrite) arrowTableWrite = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> writer = arrow.io.csv.TableWriter("example.csv") writer = TableWriter with properties: Filename: "example.csv" >> writer.write(arrowTableWrite) >> reader = arrow.io.csv.TableReader("example.csv") reader = TableReader with properties: Filename: "example.csv" >> arrowTableRead = reader.read() arrowTableRead = Var1: double Var2: double Var3: double ---- Var1: [ [ 0.9113083542736461, 0.5131490075412158, 0.42942202968065213 ] ] Var2: [ [ 0.09159480217154525, 0.27367730380496647, 0.8866478145458545 ] ] Var3: [ [ 0.2459443412735529, 0.6211893868708748, 0.49500739584280073 ] ] >> matlabTableRead = table(arrowTableRead) matlabTableRead = 3×3 table Var1 Var2 Var3 _______ ________ _______ 0.91131 0.091595 0.24594 0.51315 0.27368 0.62119 0.42942 0.88665 0.49501 >> isequal(arrowTableRead, arrowTableWrite) ans = logical 1 >> isequal(matlabTableRead, matlabTableWrite) ans = logical 1 ``` ### Are these changes tested? Yes. 1. Added new CSV I/O tests including `test/arrow/io/csv/tRoundTrip.m` and `test/arrow/io/csv/tError.m`. 2. Both of these test classes inherit from a `CSVTest` superclass. ### Are there any user-facing changes? Yes. 1. Users can now read and write CSV files using `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter`. ### Future Directions 1. Expose [options](https://github.com/apache/arrow/blob/main/cpp/src/arrow/csv/options.h) for controlling CSV reading and writing in MATLAB. 2. Add more read/write tests for null value handling and other datatypes beyond numeric and string values. 4. Add a `RecordBatchReader` and `RecordBatchWriter` for CSV. 5. Add support for more I/O formats like Parquet, JSON, ORC, Arrow IPC, etc. ### Notes 1. Thank you @ sgilmore10 for your help with this pull request! 2. I chose to add both the `TableReader` and `TableWriter` in one pull request because it simplified testing. My apologies for the slightly lengthy pull request. * Closes: apache#37770 Lead-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Signed-off-by: Kevin Gurney <[email protected]>
Rationale for this change
To enable initial CSV I/O support, this PR adds
arrow.io.csv.TableReaderandarrow.io.csv.TableWriterMATLAB classes to the MATLAB interface.What changes are included in this PR?
arrow.io.csv.TableReaderclassarrow.io.csv.TableWriterclassExample
Are these changes tested?
Yes.
test/arrow/io/csv/tRoundTrip.mandtest/arrow/io/csv/tError.m.CSVTestsuperclass.Are there any user-facing changes?
Yes.
arrow.io.csv.TableReaderandarrow.io.csv.TableWriter.Future Directions
RecordBatchReaderandRecordBatchWriterfor CSV.Notes
TableReaderandTableWriterin one pull request because it simplified testing. My apologies for the slightly lengthy pull request.TableReaderandTableWriterMATLAB classes #37770