
[SPARK-33017][PYTHON] Add getCheckpointDir method to PySpark Context#29918

Closed
reidy-p wants to merge 2 commits into apache:master from reidy-p:SPARK-33017

Conversation

@reidy-p
Contributor

@reidy-p reidy-p commented Sep 30, 2020

What changes were proposed in this pull request?

Add a `getCheckpointDir` method to the PySpark `SparkContext` to match the Scala API.

Why are the changes needed?

To make the Scala and Python APIs consistent and to remove the need to go through the internal Java object.

Does this PR introduce any user-facing change?

Yes, there is a new method that returns the checkpoint directory directly, rather than requiring users to go through the internal Java object.

Previous behaviour:

>>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
>>> sc._jsc.sc().getCheckpointDir().get()
'file:/tmp/spark/checkpoint/63f7b67c-e5dc-4d11-a70c-33554a71717a'

Going through the Java object raises a confusing Scala error if the checkpoint directory has not been set:

>>> sc._jsc.sc().getCheckpointDir().get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/paul/Desktop/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/home/paul/Desktop/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/home/paul/Desktop/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.get.
: java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:529)
        at scala.None$.get(Option.scala:527)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

New method:

>>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
>>> spark.sparkContext.getCheckpointDir()
'file:/tmp/spark/checkpoint/b38aca2e-8ace-44fc-a4c4-f4e36c2da2a7'

`getCheckpointDir()` returns `None` if the checkpoint directory has not been set:

>>> print(spark.sparkContext.getCheckpointDir())
None
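For what it's worth, the difference between the two behaviours boils down to unwrapping the Scala `Option` that the old `_jsc` call exposed directly: calling `.get()` on an empty `Option` throws, while the new method maps an empty `Option` to Python `None`. A minimal sketch of that pattern (the names here are hypothetical stand-ins, not PySpark internals):

```python
# Sketch of the Option-to-None mapping. ScalaOption and option_to_python
# are hypothetical stand-ins used only to illustrate the pattern.

class ScalaOption:
    """Stand-in for a py4j-proxied scala.Option."""

    def __init__(self, value=None):
        self._value = value

    def isDefined(self):
        return self._value is not None

    def get(self):
        if self._value is None:
            # Analogous to java.util.NoSuchElementException: None.get
            raise RuntimeError("None.get")
        return self._value


def option_to_python(opt):
    """Return the Option's value, or None if the Option is empty."""
    return opt.get() if opt.isDefined() else None


print(option_to_python(ScalaOption("file:/tmp/spark/checkpoint/abc")))
print(option_to_python(ScalaOption()))
```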

How was this patch tested?

Added to the existing unit tests. I'm not sure how to add a test for the case where `getCheckpointDir()` should return `None`, though, since as far as I can tell the existing checkpoint tests set the checkpoint directory in the `setUp` method before any tests run.
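One possible shape for that missing test is a standalone case that builds its own context instead of relying on the shared `setUp` fixture. A rough sketch, using a hypothetical `FakeSparkContext` stand-in since a real test would need a live `SparkContext`:

```python
import unittest


class FakeSparkContext:
    """Hypothetical stand-in for SparkContext, just enough to show the test shape."""

    def __init__(self):
        self._checkpoint_dir = None

    def setCheckpointDir(self, path):
        self._checkpoint_dir = path

    def getCheckpointDir(self):
        return self._checkpoint_dir


class CheckpointDirTests(unittest.TestCase):
    # The real tests would create a fresh SparkContext per case so that
    # no earlier setUp has already set a checkpoint directory.
    def test_unset_returns_none(self):
        sc = FakeSparkContext()
        self.assertIsNone(sc.getCheckpointDir())

    def test_set_then_get(self):
        sc = FakeSparkContext()
        sc.setCheckpointDir("/tmp/spark/checkpoint/")
        self.assertEqual(sc.getCheckpointDir(), "/tmp/spark/checkpoint/")


if __name__ == "__main__":
    unittest.main(argv=["checkpoint_tests"], exit=False)
```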

Comment thread on python/pyspark/context.py
@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Oct 2, 2020

Test build #129335 has finished for PR 29918 at commit 3f05f9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33948/

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33948/

@SparkQA

SparkQA commented Oct 4, 2020

Test build #129392 has finished for PR 29918 at commit 7ff4e88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 4, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33999/

@SparkQA

SparkQA commented Oct 4, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33999/

@HyukjinKwon
Member

Merged to master.

HyukjinKwon added a commit that referenced this pull request Oct 7, 2020
…documentation

### What changes were proposed in this pull request?

This is a followup of #29918. We should add it into the documentation as well.

### Why are the changes needed?

To show users new APIs.

### Does this PR introduce _any_ user-facing change?

Yes, `SparkContext.getCheckpointDir` will be documented.

### How was this patch tested?

Manually built the PySpark documentation:

```bash
cd python/docs
make clean html
cd build/html
open index.html
```

Closes #29960 from HyukjinKwon/SPARK-33017.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
