Skip to content

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Jan 29, 2018

What changes were proposed in this pull request?

All other classes in the reader/writer package doesn't have V2 in their names, and the streaming reader/writer don't have V2 either. It's more consistent to remove V2 from DataSourceV2Reader and DataSourceVWriter.

Also rename DataSourceV2Option to remote the V2, we should only have V2 in the root interface: DataSourceV2.

This PR also fixes some places that the mix-in interface doesn't extend the interface it aimed to mix in.

How was this patch tested?

existing tests.

@cloud-fan
Copy link
Contributor Author

* case-insensitive string-to-string map.
*/
DataSourceV2Reader createReader(DataSourceV2Options options);
DataSourceReader createReader(DataSourceV2Options options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not rename options as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on @rdblue's comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, let me rename it too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only need V2 in the root interface: DataSourceV2

*/
@InterfaceStability.Evolving
public interface SessionConfigSupport {
public interface SessionConfigSupport extends DataSourceV2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to extend DataSourceV2? Why add this in a commit that appears to be nothing more than a rename?

Copy link
Member

@dongjoon-hyun dongjoon-hyun Jan 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for @rdblue 's question.
@cloud-fan . Could you add more explanation about this change at PR description? According to the following, do we need to update Apache JIRA issue types from IMPROVEMENT to BUG?

Also this PR fixes some places that the mix-in interface doesn't extend the interface it aimed to mix in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we left behind. For a mix-in interface, it should extend the interface it aimed to mix in, so that we can guarantee, for example, classes implement SessionConfigSupport must also implement DataSourceV2.

We already did it in some mix-in interfaces, e.g. the streaming read support, SupportsScanUnsafeRow, SupportsScanColumnarBatch, etc. In this PR I just wanna fix the remaining ones, to make them consistent.

If you feel strongly about it, I can create a JIRA ticket for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best to keep commits clean and focused. I'd say create a new JIRA for it and do all of the mix-ins at once.

+1 when this is separated to its own PR. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created https://issues.apache.org/jira/browse/SPARK-23262 .

I'd like to keep the changes in this PR, as it's just several lines. It's not ideal but we are rushing for the 2.3 release, I wanna speed this up a little, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing large migration commits like this one with unrelated changes makes it harder to pick or revert changes without unintended side-effects. What happens if we realize that this rename was a bad idea? Reverting this commit would also revert the constraint that SessionConfigSupport extends DataSourceV2. Similarly, if we realize that these mix-ins don't need to extend DataSourceV2, then we would have to find and remove them all instead of reverting a commit. That might even sound okay, but when you're picking commits deliberately to patch branches, you need to make as few changes as possible and cherry-pick conflicts make that much harder.

The fact that you're rushing to get commits into 2.3 is even more concerning and reason to be careful, not a reason to relax our standards. Please move this to its own PR and fix all of the interfaces at once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping me on the new PR. I'm happy to review it (though it is non-binding).

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (with some minor comments)

@SparkQA
Copy link

SparkQA commented Jan 29, 2018

Test build #86779 has finished for PR 20427 at commit ff67359.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [SPARK-23260][SQL] remove V2 from the class name of data source reader/writer [SPARK-23260][SPARK-23262][SQL] several data source v2 naming cleanup Jan 30, 2018
@SparkQA
Copy link

SparkQA commented Jan 30, 2018

Test build #86789 has finished for PR 20427 at commit b4fdbbe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Jan 30, 2018

Retest this please

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -23,7 +23,7 @@ import org.apache.spark.sql.sources.v2.reader._

case class DataSourceV2Relation(
Copy link
Member

@gengliangwang gengliangwang Jan 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing V2 in DataSourceV2Relation and StreamingDataSourceV2Relation as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is internal, we can clean it up at any time. I wanna focus on public APIs in this PR.

@SparkQA
Copy link

SparkQA commented Jan 30, 2018

Test build #86793 has finished for PR 20427 at commit b4fdbbe.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member

retest this please.

@sameeragarwal
Copy link
Member

LGTM

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Jan 30, 2018

Test build #86805 has finished for PR 20427 at commit b4fdbbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 30, 2018
## What changes were proposed in this pull request?

All other classes in the reader/writer package doesn't have `V2` in their names, and the streaming reader/writer don't have `V2` either. It's more consistent to remove `V2` from `DataSourceV2Reader` and `DataSourceVWriter`.

Also rename `DataSourceV2Option` to remote the `V2`, we should only have `V2` in the root interface: `DataSourceV2`.

This PR also fixes some places that the mix-in interface doesn't extend the interface it aimed to mix in.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <[email protected]>

Closes #20427 from cloud-fan/ds-v2.

(cherry picked from commit 0a9ac02)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 0a9ac02 Jan 30, 2018
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
All other classes in the reader/writer package doesn't have `V2` in their names, and the streaming reader/writer don't have `V2` either. It's more consistent to remove `V2` from `DataSourceV2Reader` and `DataSourceVWriter`.

Also rename `DataSourceV2Option` to remote the `V2`, we should only have `V2` in the root interface: `DataSourceV2`.

This PR also fixes some places that the mix-in interface doesn't extend the interface it aimed to mix in.

existing tests.

Author: Wenchen Fan <[email protected]>

Closes apache#20427 from cloud-fan/ds-v2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants