[SPARK-20196][PYTHON][SQL] update doc for catalog functions for all languages, add pyspark refreshByPath API#17512
felixcheung wants to merge 4 commits into apache:master
Conversation
Test build #75466 has finished for PR 17512 at commit
Will update after #17518, plus changes to the R doc too.
updated @gatorsmile
Test build #75515 has finished for PR 17512 at commit
- #' Create a SparkDataFrame from a SparkSQL Table
+ #' Create a SparkDataFrame from a SparkSQL table or temporary view
table or view

Actually, this includes both temporary and persistent views.
  #'
- #' Returns the specified Table as a SparkDataFrame. The Table must have already been registered
- #' in the SparkSession.
+ #' Returns the specified table or temporary view as a SparkDataFrame. The temporary view must have
- """Recover all the partitions of the given table and update the catalog."""
+ """Recovers all the partitions of the given table and update the catalog.
  Only works with a partitioned table, and not a temporary view.
a temporary view -> a view

Neither temporary nor persistent views are supported; we will detect this and raise an exception.
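The detection the reviewer describes can be sketched in plain Python (this is an illustrative sketch, not Spark's implementation; the catalog structure and exception name are invented stand-ins):

```python
class AnalysisException(Exception):
    """Stand-in for Spark's AnalysisException."""


def recover_partitions(catalog, table_name):
    # Hypothetical catalog: maps table name -> {"type": "TABLE" | "VIEW", ...}
    entry = catalog[table_name]
    # Neither temporary nor persistent views are supported: detect and raise.
    if entry["type"] == "VIEW":
        raise AnalysisException(
            "recoverPartitions is not allowed on a view: %s" % table_name)
    # For a real partitioned table, Spark would scan the table's directory
    # and register any partitions found there with the catalog.
    entry["partitions_recovered"] = True
    return entry
```

The point is simply that the view check happens up front, before any directory scanning, so both kinds of views fail fast with a clear error.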
  /**
   * Recovers all the partitions in the directory of a table and update the catalog.
   * Only works with a partitioned table, and not a temporary view.
  /**
   * Refreshes the cache entry and the associated metadata for all Dataset (if any), that contain
-  * the given data source path.
+  * the given data source path. Path matching is by prefix, i.e. "/" would invalidate
invalidate -> invalidate and refresh

We also do the re-cache, but the new version is cached lazily.
For some reason, CatalogImpl.scala is very different from Catalog.scala here. Let me know if you want me to change them; for now I've updated the first sentence.
Yes, I found this sentence was copied from Catalog.scala. Maybe we can update both to:

Path matching is by prefix, i.e. "/" would invalidate all the cached entries and make the new versions cached lazily.
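The prefix-matching semantics being discussed can be illustrated with a small pure-Python sketch (this is not Spark's code; the cache is modeled as a plain list of paths for illustration):

```python
def invalidate_by_path(cached_paths, path):
    """Return the cached entries whose data source path starts with the
    given prefix; these are the entries refreshByPath would invalidate
    (and lazily re-cache) in Spark."""
    # Matching is by prefix, so the path "/" matches every cached entry.
    return [p for p in cached_paths if p.startswith(path)]


cached = ["/data/events", "/data/users", "/tmp/scratch"]
invalidate_by_path(cached, "/data")  # -> ['/data/events', '/data/users']
invalidate_by_path(cached, "/")      # -> all three entries
```

This is why "/" in the proposed doc wording invalidates all cached entries: every path has "/" as a prefix.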
  /**
   * Recovers all the partitions in the directory of a table and update the catalog.
   * Only works with a partitioned table, and not a temporary view.
not a temporary view. -> not a view.
Test build #75534 has started for PR 17512 at commit
Jenkins, retest this please
Test build #75536 has finished for PR 17512 at commit
  #'
- #' Returns the specified Table as a SparkDataFrame. The Table must have already been registered
- #' in the SparkSession.
+ #' Returns the specified table or view as a SparkDataFrame. The table or view must already exists or
fixed, thanks for catching this!
LGTM except minor comments.
Test build #75555 has finished for PR 17512 at commit
merged to master, thanks!
What changes were proposed in this pull request?
Update the docs to remove "external" for createTable, and add a refreshByPath API to PySpark.
How was this patch tested?
Manual.