@zhenlineo
Contributor

What changes were proposed in this pull request?

Fixes DataFrameWriter.save so that it works without a path or table parameter.
Adds support for the jdbc method in the writer, since it is one of the implementations that takes neither a path nor a table.

Why are the changes needed?

DataFrameWriter.save should work without a path parameter because some data sources, such as jdbc and noop, work without those parameters.
This is the follow-up fix for the Scala client of #40356.
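
For illustration, here is a minimal sketch of the two write styles described above. It assumes `spark-sql` 3.x is on the classpath and uses a classic local `SparkSession` rather than the Spark Connect Scala client that this PR changes. The object name, the `url` and `table` parameters, and the local master setting are hypothetical, and `writeJdbc` is defined but not invoked because it needs a reachable database and a JDBC driver.

```scala
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of the PR.
object WriteWithoutPathSketch {
  // The built-in "noop" source takes neither a path nor a table,
  // so save() is called with no arguments.
  def writeNoop(df: DataFrame): Unit =
    df.write.format("noop").mode("append").save()

  // JDBC writes identify their target by URL and table name, not by a path.
  // url and table are caller-supplied placeholders.
  def writeJdbc(df: DataFrame, url: String, table: String): Unit =
    df.write.mode("append").jdbc(url, table, new Properties())

  def main(args: Array[String]): Unit = {
    // Assumption: a local, in-process session is acceptable for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("write-without-path")
      .getOrCreate()
    try writeNoop(spark.range(10).toDF())
    finally spark.stop()
  }
}
```

The noop call mirrors the one in the test added by this PR; the JDBC helper only shows the shape of the call.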

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit and E2E tests

@HyukjinKwon
Member

Merged to master and branch-3.4.

HyukjinKwon pushed a commit that referenced this pull request Mar 14, 2023
Fixes `DataFrameWriter.save` so that it works without a path or table parameter.
Adds support for the jdbc method in the writer, since it is one of the implementations that takes neither a path nor a table.

DataFrameWriter.save should work without a path parameter because some data sources, such as jdbc and noop, work without those parameters.
This is the follow-up fix for the Scala client of #40356.

No

Unit and E2E test

Closes #40358 from zhenlineo/write-without-path-table.

Authored-by: Zhen Li <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 93334e2)
Signed-off-by: Hyukjin Kwon <[email protected]>
spark.range(10).write.format("noop").mode("append").save()
}

test("write jdbc") {
Member

This seems to break branch-3.4 somehow. I'm checking it now.

[info] - write jdbc *** FAILED *** (527 milliseconds)
[info]   io.grpc.StatusRuntimeException: INTERNAL: No suitable driver
[info]   at io.grpc.Status.asRuntimeException(Status.java:535)
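
The `No suitable driver` text in the log above matches the message that `java.sql.DriverManager.getDriver` raises when no registered JDBC driver accepts a URL. The sketch below reproduces that condition with the JDK alone. It is only an illustration of where such a message can come from, not a statement about the exact code path on the Spark Connect server; the object name and the URL scheme are hypothetical.

```scala
import java.sql.{DriverManager, SQLException}

// Hypothetical helper object for illustration only.
object NoSuitableDriverSketch {
  // Right(driver class name) if some registered driver accepts the URL,
  // Left(error message) if DriverManager cannot find one.
  def probe(url: String): Either[String, String] =
    try Right(DriverManager.getDriver(url).getClass.getName)
    catch { case e: SQLException => Left(e.getMessage) }

  def main(args: Array[String]): Unit = {
    // Made-up URL scheme that no driver on the classpath is expected to accept.
    println(probe("jdbc:hypothetical-db://localhost/test"))
  }
}
```

A `Left` result means the JVM making the call has no driver registered for that URL, which usually points at the classpath of that process rather than at the calling code.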

Contributor Author

@dongjoon-hyun I checked branch-3.4 locally and I can run the following without error:

build/sbt -Phive -Pconnect package
build/sbt "connect-client-jvm/test"

Member

@zhenlineo, I reproduced the error locally on branch-3.4 as follows, while the same command works on master.

$ build/sbt -Phive -Phadoop-3 assembly/package "protobuf/test" "connect-common/test" "connect/test" "connect-client-jvm/test"
...
[info] ClientE2ETestSuite:
[info] - spark result schema (319 milliseconds)
[info] - spark result array (350 milliseconds)
[info] - eager execution of sql (18 seconds, 3 milliseconds)
[info] - simple dataset (1 second, 194 milliseconds)
[info] - SPARK-42665: Ignore simple udf test until the udf is fully implemented. !!! IGNORED !!!
[info] - read and write (1 second, 32 milliseconds)
[info] - read path collision (32 milliseconds)
[info] - write table (5 seconds, 349 milliseconds)
[info] - write without table or path (170 milliseconds)
[info] - write jdbc *** FAILED *** (325 milliseconds)
[info]   io.grpc.StatusRuntimeException: INTERNAL: No suitable driver

Member

In addition, please see the CI. It's broken consistently.

[Screenshot 2023-03-14 at 10:26:22 AM]

Contributor Author

@dongjoon-hyun Feel free to revert it in 3.4, and I will take a closer look later today or tomorrow. Thanks. I can also send a PR that targets 3.4 directly, or do whatever is easiest.

Member

Thank you, @zhenlineo . Let me check more.

Member

@dongjoon-hyun Mar 14, 2023

On my side, the commands from #40358 (comment) also failed.

$ build/sbt -Phive -Pconnect package
$ build/sbt "connect-client-jvm/test"
...
[info] ClientE2ETestSuite:
[info] - spark result schema (290 milliseconds)
[info] - spark result array (290 milliseconds)
[info] - eager execution of sql (15 seconds, 819 milliseconds)
[info] - simple dataset (1 second, 28 milliseconds)
[info] - SPARK-42665: Ignore simple udf test until the udf is fully implemented. !!! IGNORED !!!
[info] - read and write (929 milliseconds)
[info] - read path collision (31 milliseconds)
[info] - write table (4 seconds, 540 milliseconds)
[info] - write without table or path (348 milliseconds)
[info] - write jdbc *** FAILED *** (365 milliseconds)
[info]   io.grpc.StatusRuntimeException: INTERNAL: No suitable driver
...

In this case, the usual suspect is Java. Both GitHub Actions CI and I are using Java 8.
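
When the Java version is a suspect, one way to compare two environments is to print the running JVM's version next to the JDBC drivers it has registered. The following is a minimal sketch using only the JDK; the object and method names are hypothetical, and it does not establish the root cause of the failure discussed here.

```scala
import java.sql.DriverManager

// Hypothetical diagnostic helper, not part of the PR or the Spark build.
object JdbcEnvironmentSketch {
  // Class names of the JDBC drivers currently registered with DriverManager.
  def registeredDrivers(): List[String] = {
    val drivers = DriverManager.getDrivers
    val names = List.newBuilder[String]
    while (drivers.hasMoreElements) {
      names += drivers.nextElement().getClass.getName
    }
    names.result()
  }

  def main(args: Array[String]): Unit = {
    println(s"java.version = ${System.getProperty("java.version")}")
    println(s"registered JDBC drivers = ${registeredDrivers()}")
  }
}
```

Running it in both the passing and the failing environment shows whether they differ in Java version or in the set of visible drivers.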

Contributor Author

@dongjoon-hyun I saw that the 3.4 build went back to green. Is bf9c4b9 the fix? Is there still a problem that I should fix?

Contributor Author

@dongjoon-hyun Aha, I saw the fix ab7c4f8. Thanks a lot!

Member

@dongjoon-hyun Mar 15, 2023

Yes, it's resolved for now. Thank you for checking, @zhenlineo !

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023