Skip to content

[SPARK-31767][PYTHON][CORE] Remove ResourceInformation in pyspark module's namespace#28589

Closed
HyukjinKwon wants to merge 1 commit intoapache:masterfrom
HyukjinKwon:SPARK-31767
Closed

[SPARK-31767][PYTHON][CORE] Remove ResourceInformation in pyspark module's namespace#28589
HyukjinKwon wants to merge 1 commit intoapache:masterfrom
HyukjinKwon:SPARK-31767

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented May 20, 2020

What changes were proposed in this pull request?

This PR proposes to only allow the import of ResourceInformation as below:

pyspark.resource.ResourceInformation

instead of

pyspark.ResourceInformation
pyspark.resource.ResourceInformation

because pyspark.resource is a separate module, and it is documented so.
The constructor of ResourceInformation isn't supposed to directly call anyway.

Why are the changes needed?

To keep the code structure coherent.

Does this PR introduce any user-facing change?

No, it will be in the unreleased branches.

How was this patch tested?

Manually tested via importing:

Before:

>>> import pyspark
>>> pyspark.ResourceInformation
<class 'pyspark.resource.information.ResourceInformation'>
>>> pyspark.resource.ResourceInformation
<class 'pyspark.resource.information.ResourceInformation'>

After:

>>> import pyspark
>>> pyspark.ResourceInformation
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pyspark' has no attribute 'ResourceInformation'
>>> pyspark.resource.ResourceInformation
<class 'pyspark.resource.information.ResourceInformation'>

Also tested via

cd python
./run-tests --python-executables=python3 --modules=pyspark-core,pyspark-resource

Jenkins will test and existing tests should cover.

@HyukjinKwon
Copy link
Member Author

This can be backported to branch-3.0 as well.

@SparkQA
Copy link

SparkQA commented May 20, 2020

Test build #122872 has finished for PR 28589 at commit b4d215e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request May 20, 2020
…ule's namespace

This PR proposes to only allow the import of `ResourceInformation` as below:

```
pyspark.resource.ResourceInformation
```

instead of

```
pyspark.ResourceInformation
pyspark.resource.ResourceInformation
```

because `pyspark.resource` is a separate module, and it is documented so.
The constructor of `ResourceInformation` isn't supposed to directly call anyway.

To keep the code structure coherent.

No, it will be in the unreleased branches.

Manually tested via importing:

Before:

```python
>>> import pyspark
>>> pyspark.ResourceInformation
<class 'pyspark.resource.information.ResourceInformation'>
>>> pyspark.resource.ResourceInformation
<class 'pyspark.resource.information.ResourceInformation'>
```

After:

```python
>>> import pyspark
>>> pyspark.ResourceInformation
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pyspark' has no attribute 'ResourceInformation'
>>> pyspark.resource.ResourceInformation
<class 'pyspark.resource.information.ResourceInformation'>
```

Also tested via

```bash
cd python
./run-tests --python-executables=python3 --modules=pyspark-core,pyspark-resource
```

Jenkins will test and existing tests should cover.

Closes #28589 from HyukjinKwon/SPARK-31767.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dc3a606)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@HyukjinKwon
Copy link
Member Author

Thank you @dongjoon-hyun.

@viirya
Copy link
Member

viirya commented May 20, 2020

Yea, LGTM too.

@tgravescs
Copy link
Contributor

@HyukjinKwon ResourceInformation is now in a different module between spark 3.0 and 3.1. Even though users shouldn't be constructing one its still a public api. I left it the way it was to keep it consistent. It seems like we should put them in the same place between 3.0 and 3.1

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented May 20, 2020

@tgravescs, yeah, I understand the initial intention. The problem I would like to address here is that don't expose ResourceInformation under pyspark with a minimised change as it became a separate package in the master - it will be difficult to hide it back once it's exposed.

To end users, there are no differences. We can even move this file into a separate package for consistency after the release. I thought it's not worthwhile to do it, and also given RC period. So I came up with a minimised fix.

If you strongly feel it should be as a package for dev purpose consistently in branch-3.0 too, I wouldn't mind.

@HyukjinKwon HyukjinKwon deleted the SPARK-31767 branch July 27, 2020 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants