@kevincmchen
Contributor

@kevincmchen kevincmchen commented Feb 14, 2021

What changes were proposed in this pull request?

This is a follow-up to #19269.

#19269 added only a Scala implementation of the simple writable data source in DataSourceV2Suite.

This PR adds a Java implementation of it.

Why are the changes needed?

To improve test coverage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test suites.

@github-actions github-actions bot added the SQL label Feb 14, 2021
@HyukjinKwon HyukjinKwon changed the title [SPARK-34432][SQL][TEST][FOLLOWUP] Add a java implementation of simpl… [SPARK-34432][SQL][TESTS] Add a java implementation of simple writable data source in DataSourceV2Suite Feb 16, 2021
@HyukjinKwon
Member

cc @rdblue @cloud-fan

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for making your first contribution. I left a few general comments.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-34432][SQL][TESTS] Add a java implementation of simple writable data source in DataSourceV2Suite [SPARK-34432][SQL][TESTS] Add JavaSimpleWritableDataSource Feb 17, 2021
@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Feb 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39780/

@SparkQA

SparkQA commented Feb 17, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39780/

@SparkQA

SparkQA commented Feb 17, 2021

Test build #135199 has finished for PR 31560 at commit d5dc307.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public class JavaSimpleWritableDataSource implements TestingV2Source, SessionConfigSupport
      • static class JavaCSVInputPartitionReader implements InputPartition
      • static class JavaCSVReaderFactory implements PartitionReaderFactory
      • static class JavaSimpleCounter
      • static class JavaCSVDataWriterFactory implements DataWriterFactory
      • static class JavaCSVDataWriter implements DataWriter<InternalRow>
      • class MyScanBuilder extends JavaSimpleScanBuilder
      • static class MyWriteBuilder implements WriteBuilder, SupportsTruncate
      • static class MyWrite implements Write
      • static class MyBatchWrite implements BatchWrite
      • class MyTable extends JavaSimpleBatchTable implements SupportsWrite

kevincmchen and others added 2 commits February 18, 2021 21:33
    1. add a class description for JavaSimpleWritableDataSource

OPTIMIZE:
    1. re-order the imports
    2. match the class layout with the existing SimpleWritableDataSource

UPDATE:
    1. catch the specific exception (IOException) instead of Exception
    2. use SimpleCounter
delete duplicated blank lines in `JavaSimpleWritableDataSource`
@SparkQA

SparkQA commented Feb 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39811/

@SparkQA

SparkQA commented Feb 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39811/

@SparkQA

SparkQA commented Feb 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39812/

@SparkQA

SparkQA commented Feb 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39812/

@SparkQA

SparkQA commented Feb 18, 2021

Test build #135230 has finished for PR 31560 at commit 88c6edf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • static class JavaCSVInputPartitionReader implements InputPartition
      • static class JavaCSVReaderFactory implements PartitionReaderFactory
      • static class JavaCSVDataWriterFactory implements DataWriterFactory
      • static class JavaCSVDataWriter implements DataWriter<InternalRow>

@SparkQA

SparkQA commented Feb 18, 2021

Test build #135231 has finished for PR 31560 at commit 8c080ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

return new InputPartition[0];
}
} catch (IOException e) {
throw new RuntimeException(e);
Contributor

This doesn't seem Java-friendly. I think we should update Batch#planInputPartitions to declare throws IOException, the same as PartitionReader#next.

The throws clause is a compile-time check and won't break binary compatibility. Also cc @rdblue

Contributor Author

I feel the same.

Contributor Author

@kevincmchen kevincmchen Feb 26, 2021

I have come to a somewhat different opinion recently.

  1. For Batch#planInputPartitions: if this method fails (by throwing an exception), the underlying data source scan may require manual cleanup.
  2. For BatchWrite#commit: if this method fails (by throwing an exception), the write job is considered failed and BatchWrite#abort is called, but BatchWrite#abort may not be able to handle it. For more details, please see the comments on BatchWrite#abort.

So it's better to handle these exceptions inside Batch#planInputPartitions and BatchWrite#commit.

@cloud-fan
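
The two options discussed in this thread can be sketched side by side. This is a minimal illustration only: `UncheckedPlanner`, `CheckedPlanner`, `ThrowsClauseSketch`, and `listFiles` are hypothetical names and are not Spark's `Batch` or `BatchWrite` interfaces. The first variant mirrors the code under review, where a method without a throws clause has to wrap the `IOException`; the second shows what an implementation could look like if the interface method declared `throws IOException`.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration; these are NOT Spark's interfaces.
interface UncheckedPlanner {
  String[] plan();                     // no throws clause
}

interface CheckedPlanner {
  String[] plan() throws IOException;  // checked exception declared
}

class ThrowsClauseSketch {
  // Simulates a file listing that can fail with a checked exception.
  static String[] listFiles(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("cannot list files");
    }
    return new String[] {"part-0", "part-1"};
  }

  // Without a throws clause, the implementation has to wrap the IOException.
  static UncheckedPlanner wrapping(boolean fail) {
    return () -> {
      try {
        return listFiles(fail);
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    };
  }

  // With a throws clause, the implementation can let the exception propagate.
  static CheckedPlanner propagating(boolean fail) {
    return () -> listFiles(fail);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(wrapping(false).plan().length);
    System.out.println(propagating(false).plan().length);
  }
}
```

In the second variant the caller, not the implementation, decides how to handle the failure, which is where the cleanup concern raised above would have to be addressed.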

@SparkQA

SparkQA commented Feb 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39890/

…rom JavaSimpleBatchTable and override the function capabilities in MyTable
@SparkQA

SparkQA commented Feb 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39890/

@SparkQA

SparkQA commented Feb 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39893/

@SparkQA

SparkQA commented Feb 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39893/

@SparkQA

SparkQA commented Feb 20, 2021

Test build #135307 has finished for PR 31560 at commit 9037d5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 20, 2021

Test build #135310 has finished for PR 31560 at commit 9178240.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 20, 2021

Test build #135313 has finished for PR 31560 at commit 3fc7288.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39902/

@SparkQA

SparkQA commented Feb 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39902/

@SparkQA

SparkQA commented Feb 21, 2021

Test build #135320 has finished for PR 31560 at commit c665f68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 21, 2021

Test build #135322 has finished for PR 31560 at commit 0d85f08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@Override
public String keyPrefix() {
return "javaSimpleWritableDataSource";
Contributor

Seems like we don't need to implement SessionConfigSupport. Can you open a follow-up PR to remove it from both the Scala and Java versions?

Contributor Author

all right

Object[] objects =
Arrays.stream(currentLine.split(","))
.map(String::trim)
.map(Integer::parseInt)
Contributor

Can we change the Scala version as well in a follow-up PR?

Contributor Author

OK, I'll change the Scala version.
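
For reference, the line-parsing fragment reviewed in this thread can be completed into a small standalone sketch. The class name `CsvLineSketch`, the method name `parseLine`, and the final `toArray()` call are assumptions, since the fragment shown in the review is cut off after `.map(Integer::parseInt)`; the merged code may differ.

```java
import java.util.Arrays;

// Hypothetical wrapper class; only the stream pipeline comes from the review fragment.
class CsvLineSketch {
  // Splits a line on commas, trims each field, and parses it as an int.
  // The trailing toArray() is assumed; the reviewed fragment is truncated before it.
  static Object[] parseLine(String currentLine) {
    return Arrays.stream(currentLine.split(","))
        .map(String::trim)
        .map(Integer::parseInt)
        .toArray();
  }

  public static void main(String[] args) {
    // prints [1, -10]
    System.out.println(Arrays.toString(parseLine("1, -10")));
  }
}
```

Trimming before parsing matters here because `Integer.parseInt` rejects leading or trailing whitespace with a `NumberFormatException`.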

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9767041 Feb 22, 2021
@cloud-fan
Contributor

@kevincmchen can you open a new PR to add throws clause to planInputPartitions, commit, etc., and trigger the discussion there?

@kevincmchen
Contributor Author

> @kevincmchen can you open a new PR to add throws clause to planInputPartitions, commit, etc., and trigger the discussion there?

ok, all right.

@kevincmchen kevincmchen deleted the SPARK-34432 branch February 23, 2021 03:00
cloud-fan pushed a commit that referenced this pull request Mar 2, 2021
### What changes were proposed in this pull request?

This is a follow-up to #31560.
In #31560, we added `JavaSimpleWritableDataSource` and left a few small problems, such as the unused interface `SessionConfigSupport` and an inconsistent schema between `JavaSimpleWritableDataSource` and `SimpleWritableDataSource`.
This PR fixes those remaining problems.

### Why are the changes needed?

1. `SessionConfigSupport` is never used in `JavaSimpleWritableDataSource` or `SimpleWritableDataSource`, so neither needs to implement it.
2. Change the schema of `SimpleWritableDataSource` to match `TestingV2Source`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing test suites.

Closes #31621 from kevincmchen/SPARK-34498.

Authored-by: kevincmchen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>