forked from apache/spark
sync #7
Merged
### What changes were proposed in this pull request?
Pagination code across pages needs to be cleaned up. This change addresses:
* Unused methods
* Unused method arguments
* Redundant `if` expressions
* Indentation fixes
### Why are the changes needed?
This fix makes the code more readable and removes unnecessary methods and variables.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually
Closes #28448 from iRakson/refactorPagination.
Authored-by: iRakson <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
### What changes were proposed in this pull request?
In the algorithms that support instance weight, add checks to make sure instance weight is not negative.
### Why are the changes needed?
Instance weight has to be >= 0.0.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually tested
Closes #28621 from huaxingao/weight_check.
Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
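The contract the checks enforce can be sketched in plain Python (a hypothetical `validate_weights` helper for illustration only; Spark's actual checks live inside the individual ML algorithms):

```python
def validate_weights(weights):
    """Reject negative instance weights; zero weight is allowed.

    Standalone sketch of the "weight >= 0.0" contract described above,
    not Spark's real implementation.
    """
    for i, w in enumerate(weights):
        if w < 0.0:
            raise ValueError(
                f"instance weight must be >= 0.0, got {w} at index {i}")
    return weights

# Zero weights are accepted; any negative weight is rejected up front.
print(validate_weights([1.0, 0.0, 2.5]))
```

A zero weight simply removes the instance's influence, which is well defined; a negative weight has no meaningful interpretation for these estimators, hence the hard failure.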
### What changes were proposed in this pull request?
This PR extracts the logic for selecting the planned join type out of the `JoinSelection` rule and moves it to `JoinSelectionHelper` in Catalyst.
### Why are the changes needed?
This change both cleans up the code in `JoinSelection` and keeps the logic in one place, so it can be used by other rules that need to make decisions based on the join type before planning time.
### Does this PR introduce _any_ user-facing change?
`BuildSide`, `BuildLeft`, and `BuildRight` are moved from `org.apache.spark.sql.execution` to Catalyst in `org.apache.spark.sql.catalyst.optimizer`.
### How was this patch tested?
This is a refactoring; it passes existing tests.
Closes #28540 from dbaliafroozeh/RefactorJoinSelection.
Authored-by: Ali Afroozeh <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…ExpressionsSuite
### What changes were proposed in this pull request?
This PR modifies some codegen-related tests to test escape characters for datetime functions that are time-zone aware. If the time zone is absent, the formatter could produce `null` caused by `java.util.NoSuchElementException: None.get`, bypassing the real intention of those test cases.
### Why are the changes needed?
Fix tests.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Passing the modified test cases.
Closes #28653 from yaooqinn/SPARK-31835.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…onal formatter
### What changes were proposed in this pull request?
Currently, the legacy fractional formatter is based on the implementation from Spark 2.4 which formats the input timestamp twice:
```
val timestampString = ts.toString
val formatted = legacyFormatter.format(ts)
```
to strip trailing zeros. This PR proposes to avoid the first formatting by forming the second fraction directly.
### Why are the changes needed?
It makes legacy fractional formatter faster.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By existing test "format fraction of second" in `TimestampFormatterSuite` + added test for timestamps before 1970-01-01 00:00:00Z
Closes #28643 from MaxGekk/optimize-legacy-fract-format.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
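The optimization can be illustrated with a small Python sketch (hypothetical helper name; Spark's real code is in `TimestampFormatter`): instead of rendering the whole timestamp to a string and then trimming it, the fraction string is built directly from the microsecond value.

```python
def fraction_direct(micros):
    """Build the '.SSSSSS' suffix with trailing zeros stripped, without
    formatting the full timestamp first. Sketch only, not Spark's code."""
    if micros == 0:
        return ""  # no fractional part at all, matching the legacy output
    # Zero-pad to six digits, then drop trailing zeros: 123000 -> '.123'
    return "." + f"{micros:06d}".rstrip("0")

print(fraction_direct(123000))  # .123
```

The two-pass version pays for a full `toString` render only to throw most of it away; forming the fraction directly does one small string operation per value.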
…a 8 bug of stand-alone form
### What changes were proposed in this pull request?
If `LLL`/`qqq` is used in the datetime pattern string, and the current JDK in use has a bug for the stand-alone form (see https://bugs.openjdk.java.net/browse/JDK-8114833), throw an exception with a clear error message.
### Why are the changes needed?
To keep backward compatibility with Spark 2.4.
### Does this PR introduce _any_ user-facing change?
Yes.

Spark 2.4
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+
```

Spark 3.0 with Java 11
```
scala> sql("select date_format('1990-1-1', 'LLL')").show
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                          Jan|
+---------------------------------------------+
```

Spark 3.0 with Java 8
```
// before this PR
+---------------------------------------------+
|date_format(CAST(1990-1-1 AS TIMESTAMP), LLL)|
+---------------------------------------------+
|                                            1|
+---------------------------------------------+

// after this PR
scala> sql("select date_format('1990-1-1', 'LLL')").show
org.apache.spark.SparkUpgradeException
```
### How was this patch tested?
Manual test with Java 8 and 11.
Closes #28646 from cloud-fan/format.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
This PR changes JsonProtocol to write RDDInfos#isBarrier.
### Why are the changes needed?
JsonProtocol reads RDDInfos#isBarrier but does not write it, so it's a bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I added a test case.
Closes #28583 from sarutak/SPARK-31764.
Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Xingbo Jiang <[email protected]>
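The asymmetry being fixed is the classic "reader consumes a field the writer never emits" bug: the value silently degrades to its default on every round trip. A minimal Python sketch (hypothetical JSON key names, not JsonProtocol's actual schema):

```python
import json

def write_broken(info):
    # Before the fix: the isBarrier flag is never serialized...
    return json.dumps({"RDD ID": info["id"]})

def write_fixed(info):
    # After the fix: the writer emits the field the reader expects.
    return json.dumps({"RDD ID": info["id"], "Is Barrier": info["isBarrier"]})

def read(s):
    d = json.loads(s)
    # ...so on read-back the flag silently falls back to its default.
    return {"id": d["RDD ID"], "isBarrier": d.get("Is Barrier", False)}

info = {"id": 7, "isBarrier": True}
assert read(write_broken(info))["isBarrier"] is False  # flag lost in transit
assert read(write_fixed(info)) == info                 # round-trips intact
```

Because the reader supplies a default, nothing crashes, which is exactly why such bugs survive until a round-trip test like the one added here pins the behavior down.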
### What changes were proposed in this pull request?
Wait until all the executors have started before submitting any job. This avoids flakiness caused by executors still coming up.
### How was this patch tested?
Existing tests.
Closes #28584 from jiangxb1987/barrierTest.
Authored-by: Xingbo Jiang <[email protected]>
Signed-off-by: Xingbo Jiang <[email protected]>
…g from Python with Arrow
### What changes were proposed in this pull request?
Handle the Pandas category type while converting from Python with Arrow enabled. The category column will be converted to whatever type the category elements are, as is the case with Arrow disabled.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
New unit tests were added for `createDataFrame` and scalar `pandas_udf`.
Closes #26585 from jalpan-randeri/feature-pyarrow-dictionary-type.
Authored-by: Jalpan Randeri <[email protected]>
Signed-off-by: Bryan Cutler <[email protected]>
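The "converted to whatever type the category elements are" behavior can be shown with a standalone pandas sketch (requires pandas; this is an illustration of the unwrapping, not Spark's conversion code):

```python
import pandas as pd

# A categorical column stores codes plus a dictionary of category values.
s = pd.Series(["a", "b", "a"], dtype="category")

# Spark's conversion keeps the element type, not the categorical wrapper:
# the string categories here come through as plain strings.
unwrapped = s.astype(s.cat.categories.dtype)
print(s.dtype, "->", unwrapped.dtype)
```

In Arrow terms the column arrives as a dictionary-encoded array; the change decodes it to the value type so the Arrow-enabled path matches the Arrow-disabled one.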
…lass
### What changes were proposed in this pull request?
Adds `inputFiles()` method to PySpark `DataFrame`. Using this, PySpark users can list all files constituting a `DataFrame`.
**Before changes:**
```
>>> spark.read.load("examples/src/main/resources/people.json", format="json").inputFiles()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/***/***/spark/python/pyspark/sql/dataframe.py", line 1388, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'inputFiles'
```
**After changes:**
```
>>> spark.read.load("examples/src/main/resources/people.json", format="json").inputFiles()
[u'file:///***/***/spark/examples/src/main/resources/people.json']
```
### Why are the changes needed?
This method is already supported in the Scala and Java DataFrame APIs.
### Does this PR introduce _any_ user-facing change?
Yes. Users can now list all files of a `DataFrame` using `inputFiles()`.
### How was this patch tested?
UT added.
Closes #28652 from iRakson/SPARK-31763.
Authored-by: iRakson <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
### What changes were proposed in this pull request?
Delete duplicate code in the cast suite.
### Why are the changes needed?
Keep Spark code clean.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No tests needed.
Closes #28655 from GuoPhilipse/delete-duplicate-code-castsuit.
Lead-authored-by: GuoPhilipse <[email protected]>
Co-authored-by: GuoPhilipse <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>