
[SPARK-23691][PYTHON] Use sql_conf util in PySpark tests where possible#20830

Closed
HyukjinKwon wants to merge 2 commits into apache:master from HyukjinKwon:cleanup-sql-conf

Conversation

@HyukjinKwon
Member

What changes were proposed in this pull request?

d6632d1 added a useful util

@contextmanager
def sql_conf(self, pairs):
    ...

to allow configuration set/unset within a block:

with self.sql_conf({"spark.blah.blah.blah": "blah"}):
    # test codes

This PR proposes to use this util where possible in PySpark tests.

Note that there already seem to be a few places where tests change configurations without restoring the original values in the unittest classes.
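For reference, such a context manager could be sketched roughly as follows. This is a stand-alone sketch, not Spark's actual implementation; the `FakeConf` class is a hypothetical stand-in for `spark.conf` so the example runs without a SparkSession:

```python
from contextlib import contextmanager

class FakeConf:
    """Minimal stand-in for spark.conf so the sketch runs without Spark."""
    def __init__(self):
        self._store = {}
    def get(self, key, default=None):
        return self._store.get(key, default)
    def set(self, key, value):
        self._store[key] = str(value)
    def unset(self, key):
        self._store.pop(key, None)

@contextmanager
def sql_conf(conf, pairs):
    """Set the given configs inside the block, then restore the old state."""
    old_values = {key: conf.get(key) for key in pairs}
    for key, value in pairs.items():
        conf.set(key, value)
    try:
        yield
    finally:
        for key, old in old_values.items():
            if old is None:
                conf.unset(key)     # was unset before: unset it again
            else:
                conf.set(key, old)  # was set before: restore the old value

conf = FakeConf()
with sql_conf(conf, {"spark.sql.execution.arrow.enabled": "true"}):
    assert conf.get("spark.sql.execution.arrow.enabled") == "true"
# Outside the block, the key is back to its previous (unset) state.
assert conf.get("spark.sql.execution.arrow.enabled") is None
```

The point of the util is that the try/finally guarantees the restore even when the test body raises.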

How was this patch tested?

Manually tested via:

./run-tests --modules=pyspark-sql --python-executables=python2
./run-tests --modules=pyspark-sql --python-executables=python3

@HyukjinKwon
Member Author

@ueshin and @BryanCutler, could you take a look when you are available?

@HyukjinKwon
Member Author

BTW, I double-checked that it still produces the stack trace fine, by manually changing some tests locally.

@SparkQA

SparkQA commented Mar 15, 2018

Test build #88254 has finished for PR 20830 at commit 89cf69b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@ueshin ueshin left a comment


LGTM except for one comment, but I'd leave it to @HyukjinKwon whether we omit the default value block or not.

self.spark.conf.set("spark.sql.execution.arrow.enabled", "true")
pdf_arrow = df.toPandas()

with self.sql_conf({"spark.sql.execution.arrow.enabled": True}):
Member


We can omit this when we use the default value or set the value in the setup method, but I'm okay with it if we want to show the value explicitly.

Member Author


Ah, OK, that's fine by me. Will omit this.

Member

@BryanCutler BryanCutler left a comment


Thanks @HyukjinKwon , LGTM!

self.assertRaises(AnalysisException, lambda: df1.join(df2, how="inner").collect())

self.spark.conf.set("spark.sql.crossJoin.enabled", "true")
with self.sql_conf({"spark.sql.crossJoin.enabled": True}):
Member


So the sql_conf context will change this back to be unset, right?

Member Author

@HyukjinKwon HyukjinKwon Mar 16, 2018


Yup, it originally unset spark.sql.crossJoin.enabled, but now it sets it back to the original value.
If spark.sql.crossJoin.enabled was unset before this test, the context will change it back so that it is unset again.
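This restore-to-unset behavior can be illustrated with a self-contained sketch. Here a plain dict stands in for spark.conf (this is not Spark's real API, just an illustration of the two restore cases):

```python
from contextlib import contextmanager

@contextmanager
def sql_conf(store, pairs):
    # `store` is a plain dict standing in for spark.conf in this sketch.
    old_values = {key: store.get(key) for key in pairs}
    store.update(pairs)
    try:
        yield
    finally:
        for key, old in old_values.items():
            if old is None:
                store.pop(key, None)  # was unset before: unset again
            else:
                store[key] = old      # was set before: restore old value

store = {}

# Case 1: spark.sql.crossJoin.enabled was unset -> unset again afterwards.
with sql_conf(store, {"spark.sql.crossJoin.enabled": "true"}):
    assert store["spark.sql.crossJoin.enabled"] == "true"
assert "spark.sql.crossJoin.enabled" not in store

# Case 2: it was already set -> the original value comes back.
store["spark.sql.crossJoin.enabled"] = "false"
with sql_conf(store, {"spark.sql.crossJoin.enabled": "true"}):
    pass
assert store["spark.sql.crossJoin.enabled"] == "false"
```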

@SparkQA

SparkQA commented Mar 16, 2018

Test build #88295 has finished for PR 20830 at commit b7a4a91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung felixcheung left a comment


nice!

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Mar 18, 2018

Test build #88349 has finished for PR 20830 at commit b7a4a91.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Mar 18, 2018

Test build #88350 has finished for PR 20830 at commit b7a4a91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

@BryanCutler, could I have the very first PR merged by you as a fresh new committer :-)?

I personally think it might be good to merge this to branch-2.3 if it doesn't have conflicts. If it does, I think we are fine to get this into master only for now.

@BryanCutler
Member

BryanCutler commented Mar 19, 2018 via email

@asfgit asfgit closed this in 5663218 Mar 20, 2018
@BryanCutler
Member

Merged to master! (I think it went ok..) Thanks @HyukjinKwon !!

@HyukjinKwon
Member Author

Thanks for reviewing and merging this @ueshin, @felixcheung, @BryanCutler and @dongjoon-hyun.

(Just FYI, I usually resolve JIRAs manually when I accidentally fail to do so with the merge script. I think that's fine.)

@BryanCutler
Member

The cherry-pick to branch-2.3 did have some conflicts. Just to check the reason for backporting: even though this isn't a bug fix, it's pretty safe and will help keep the branches in line, so there will be fewer conflicts for future backports?

@HyukjinKwon
Member Author

Yup, that was exactly what I thought. I think it's also fine to not bother backporting it, since it has conflicts.

@HyukjinKwon
Member Author

But I am willing to do it if you think it's better. No objection.

@BryanCutler
Member

Hmm, it looks like the conflict is just one block with the group agg tests, so probably not a big deal. Do you want to take a look?

@HyukjinKwon
Member Author

Sure, will open a PR soon.

HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Mar 20, 2018
apache@d6632d1 added a useful util

```python
@contextmanager
def sql_conf(self, pairs):
    ...
```

to allow configuration set/unset within a block:

```python
with self.sql_conf({"spark.blah.blah.blah": "blah"}):
    # test codes
```

This PR proposes to use this util where possible in PySpark tests.

Note that there already seem to be a few places where tests change configurations without restoring the original values in the unittest classes.

Manually tested via:

```
./run-tests --modules=pyspark-sql --python-executables=python2
./run-tests --modules=pyspark-sql --python-executables=python3
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#20830 from HyukjinKwon/cleanup-sql-conf.
@HyukjinKwon
Member Author

Ahhhh .. d6632d1 added the util into master only ...

@HyukjinKwon
Member Author

Let me open a PR and cc you guys to show the diff.

asfgit pushed a commit that referenced this pull request Mar 20, 2018
…where possible

## What changes were proposed in this pull request?

This PR backports #20830 to reduce the diff against master and restore the default value back in PySpark tests.

d6632d1 added a useful util. This backport extracts and brings this util:

```python
@contextmanager
def sql_conf(self, pairs):
    ...
```

to allow configuration set/unset within a block:

```python
with self.sql_conf({"spark.blah.blah.blah": "blah"}):
    # test codes
```

This PR proposes to use this util where possible in PySpark tests.

Note that there already seem to be a few places where tests change configurations without restoring the original values in the unittest classes.

## How was this patch tested?

Likewise, manually tested via:

```
./run-tests --modules=pyspark-sql --python-executables=python2
./run-tests --modules=pyspark-sql --python-executables=python3
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #20863 from HyukjinKwon/backport-20830.
@HyukjinKwon HyukjinKwon deleted the cleanup-sql-conf branch October 16, 2018 12:45