Merged

261 commits
5669b21
[SPARK-32840][SQL] Invalid interval value can happen to be just adhes…
yaooqinn Sep 10, 2020
a22871f
[SPARK-32777][SQL] Aggregation support aggregate function with multip…
beliefer Sep 10, 2020
5f468cc
[SPARK-32822][SQL] Change the number of partitions to zero when a ran…
sarutak Sep 11, 2020
328d81a
[SPARK-32677][SQL][DOCS][MINOR] Improve code comment in CreateFunctio…
cloud-fan Sep 11, 2020
fe2ab25
[MINOR][SQL] Fix a typo at 'spark.sql.sources.fileCompressionFactor' …
Ted-Jiang Sep 11, 2020
9f4f49c
[SPARK-32853][SQL] Consecutive save/load calls in DataFrame/StreamRea…
cloud-fan Sep 11, 2020
94cac59
[SPARK-32730][SQL][FOLLOW-UP] Improve LeftAnti SortMergeJoin right si…
peter-toth Sep 11, 2020
f6322d1
[SPARK-32180][PYTHON][DOCS] Installation page of Getting Started in P…
Sep 11, 2020
b4be6a6
[SPARK-32845][SS][TESTS] Add sinkParameter to check sink options robu…
dongjoon-hyun Sep 11, 2020
4269c2c
[SPARK-32851][SQL][TEST] Tests should fail if errors happen when gene…
maropu Sep 11, 2020
ce566be
[SPARK-32180][FOLLOWUP] Fix .rst error in new Pyspark installation guide
srowen Sep 12, 2020
2009f95
[SPARK-32779][SQL][FOLLOW-UP] Delete Unused code
sandeep-katta Sep 12, 2020
bbbd907
[SPARK-32804][LAUNCHER] Fix run-example command builder bug
KevinSmile Sep 12, 2020
3be552c
[SPARK-30090][SHELL] Adapt Spark REPL to Scala 2.13
karolchmist Sep 12, 2020
3d08084
[SPARK-24994][SQL] Add UnwrapCastInBinaryComparison optimizer to simp…
sunchao Sep 13, 2020
0549c20
[SPARK-32865][DOC] python section in quickstart page doesn't display …
bowenli86 Sep 13, 2020
a6d6ea3
[SPARK-32802][SQL] Avoid using SpecificInternalRow in RunLengthEncodi…
sunchao Sep 13, 2020
fbb0f37
[SPARK-32869][BUILD] Ignore deprecation warnings for build with Scala…
sarutak Sep 14, 2020
e558b8a
[SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framew…
beliefer Sep 14, 2020
742fcff
[SPARK-32839][WINDOWS] Make Spark scripts working with the spaces in …
HyukjinKwon Sep 14, 2020
b121f0d
[SPARK-32873][BUILD] Fix code which causes error when build with sbt …
sarutak Sep 14, 2020
978f531
[SPARK-32854][SS] Minor code and doc improvement for stream-stream join
c21 Sep 14, 2020
5e82548
[SPARK-32844][SQL] Make `DataFrameReader.table` take the specified op…
xuanyuanking Sep 14, 2020
7a17158
[SPARK-32868][SQL] Add more order irrelevant aggregates to EliminateS…
tanelk Sep 14, 2020
0696f04
[SPARK-32876][SQL] Change default fallback versions to 3.0.1 and 2.4.…
HyukjinKwon Sep 14, 2020
72550c3
[SPARK-32872][CORE] Prevent BytesToBytesMap at MAX_CAPACITY from exce…
ankurdave Sep 14, 2020
d58a4a3
[SPARK-32882][K8S] Remove python2 installation in K8s python image
williamhyun Sep 14, 2020
4fac6d5
[SPARK-32871][BUILD] Append toMap to Map#filterKeys if the result of …
sarutak Sep 15, 2020
7a9b066
[SPARK-32715][CORE] Fix memory leak when failed to store pieces of br…
LantaoJin Sep 15, 2020
0811666
[SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pend…
Ngone51 Sep 15, 2020
d8a0d85
[SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
dongjoon-hyun Sep 15, 2020
c8baab1
[SPARK-32879][SQL] Refactor SparkSession initial options
hvanhovell Sep 15, 2020
99384d1
[SPARK-32738][CORE] Should reduce the number of active threads if fat…
wzhfy Sep 15, 2020
316242b
[SPARK-32874][SQL][TEST] Enhance result set meta data check for execu…
yaooqinn Sep 15, 2020
6f36db1
[SPARK-31448][PYTHON] Fix storage level used in persist() in datafram…
abhishekd0907 Sep 15, 2020
888b343
[SPARK-32827][SQL] Add spark.sql.maxMetadataStringLength config
ulysses-you Sep 15, 2020
108c4c8
[SPARK-32481][SQL][TESTS][FOLLOW-UP] Skip the test if trash directory…
HyukjinKwon Sep 15, 2020
b46c730
[SPARK-32704][SQL][TESTS][FOLLOW-UP] Check any physical rule instead …
HyukjinKwon Sep 16, 2020
6051755
[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for f…
tanelk Sep 16, 2020
2e3aa2f
[SPARK-32861][SQL] GenerateExec should require column ordering
allisonwang-db Sep 16, 2020
550c1c9
[SPARK-32888][DOCS] Add user document about header flag and RDD as pa…
viirya Sep 16, 2020
e884290
[SPARK-32835][PYTHON] Add withField method to the pyspark Column class
Sep 16, 2020
c918909
[SPARK-32814][PYTHON] Replace __metaclass__ field with metaclass keyword
zero323 Sep 16, 2020
3bc13e6
[SPARK-32706][SQL] Improve cast string to decimal type
wangyum Sep 16, 2020
355ab6a
[SPARK-32804][LAUNCHER][FOLLOWUP] Fix SparkSubmitCommandBuilderSuite …
KevinSmile Sep 16, 2020
56ae950
[SPARK-32850][CORE] Simplify the RPC message flow of decommission
Ngone51 Sep 16, 2020
40ef5c9
[SPARK-32816][SQL] Fix analyzer bug when aggregating multiple distinc…
linhongliu-db Sep 16, 2020
657e39a
[SPARK-32897][PYTHON] Don't show a deprecation warning at SparkSessio…
HyukjinKwon Sep 16, 2020
7fdb571
[SPARK-32890][SQL] Pass all `sql/hive` module UTs in Scala 2.13
LuciferYang Sep 16, 2020
d936cb3
[SPARK-26425][SS] Add more constraint checks to avoid checkpoint corr…
HeartSaVioR Sep 17, 2020
bd38e0b
[SPARK-32903][SQL] GeneratePredicate should be able to eliminate comm…
viirya Sep 17, 2020
92b75dc
[SPARK-32508][SQL] Disallow empty part col values in partition spec b…
cxzl25 Sep 17, 2020
e5e54a3
[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there ar…
tomvanbussel Sep 17, 2020
a54a6a0
[SPARK-32287][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite on…
Ngone51 Sep 17, 2020
482a79a
[SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup
sunchao Sep 17, 2020
88e87bc
[SPARK-32887][DOC] Correct the typo for SHOW TABLE
Udbhav30 Sep 17, 2020
a8442c2
[SPARK-32926][TESTS] Add Scala 2.13 build test in GitHub Action
dongjoon-hyun Sep 17, 2020
5817c58
[SPARK-32909][SQL] Pass all `sql/hive-thriftserver` module UTs in Sca…
LuciferYang Sep 17, 2020
ea3b979
[SPARK-32889][SQL] orc table column name supports special characters
Sep 17, 2020
4ced588
[SPARK-32635][SQL] Fix foldable propagation
peter-toth Sep 17, 2020
68e0d5f
[SPARK-32902][SQL] Logging plan changes for AQE
maropu Sep 17, 2020
9d6221b
[SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors optimization 2
zhengruifeng Sep 18, 2020
75dd864
[SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`
MaxGekk Sep 18, 2020
b49aaa3
[SPARK-32906][SQL] Struct field names should not change after normali…
maropu Sep 18, 2020
8b09536
[SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer Sep 18, 2020
9e9d4b6
[SPARK-32905][CORE][YARN] ApplicationMaster fails to receive UpdateDe…
yaooqinn Sep 18, 2020
7892887
[SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun Sep 18, 2020
105225d
[SPARK-32911][CORE] Free memory in UnsafeExternalSorter.SpillableIter…
tomvanbussel Sep 18, 2020
e2a7401
[SPARK-32874][SQL][FOLLOWUP][TEST-HIVE1.2][TEST-HADOOP2.7] Fix spark-…
yaooqinn Sep 18, 2020
664a171
[SPARK-32936][SQL] Pass all `external/avro` module UTs in Scala 2.13
LuciferYang Sep 18, 2020
2128c4f
[SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13
LuciferYang Sep 18, 2020
3309a2b
[SPARK-32635][SQL][FOLLOW-UP] Add a new test case in catalyst module
peter-toth Sep 18, 2020
f1dc479
[SPARK-32898][CORE] Fix wrong executorRunTime when task killed before…
Ngone51 Sep 18, 2020
f893a19
[SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more inf…
HyukjinKwon Sep 20, 2020
7fb9f68
[SPARK-32799][R][SQL] Add allowMissingColumns to SparkR unionByName
zero323 Sep 21, 2020
9c653c9
[SPARK-32189][DOCS][PYTHON] Development - Setting up IDEs
itholic Sep 21, 2020
0c66813
Revert "[SPARK-32850][CORE] Simplify the RPC message flow of decommis…
cloud-fan Sep 21, 2020
1ad1f71
[SPARK-32946][R][SQL] Add withColumn to SparkR
zero323 Sep 21, 2020
c336ddf
[SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AngersZhuuuu Sep 21, 2020
d01594e
[SPARK-32886][WEBUI] fix 'undefined' link in event timeline view
zhli1142015 Sep 21, 2020
5440ea8
[SPARK-32312][DOC][FOLLOWUP] Fix the minimum version of PyArrow in th…
ueshin Sep 22, 2020
f03c035
[SPARK-32951][SQL] Foldable propagation from Aggregate
peter-toth Sep 22, 2020
3118c22
[SPARK-32949][R][SQL] Add timestamp_seconds to SparkR
zero323 Sep 22, 2020
790d9ef
[SPARK-32955][DOCS] An item in the navigation bar in the WebUI has a …
sarutak Sep 22, 2020
6145621
[SPARK-32659][SQL][FOLLOWUP] Broadcast Array instead of Set in InSubq…
cloud-fan Sep 22, 2020
dd80845
[SPARK-32964][DSTREAMS] Pass all `streaming` module UTs in Scala 2.13
LuciferYang Sep 22, 2020
fba5736
[SPARK-32757][SQL][FOLLOWUP] Preserve the attribute name as possible …
cloud-fan Sep 22, 2020
7c14f17
[SPARK-32306][SQL][DOCS] Clarify the result of `percentile_approx()`
MaxGekk Sep 22, 2020
779f0a8
[SPARK-32933][PYTHON] Use keyword-only syntax for keyword_only methods
zero323 Sep 23, 2020
942f577
[SPARK-32017][PYTHON][BUILD] Make Pyspark Hadoop 3.2+ Variant availab…
HyukjinKwon Sep 23, 2020
b53da23
[MINOR][SQL] Improve examples for `percentile_approx()`
MaxGekk Sep 23, 2020
acfee3c
[SPARK-32870][DOCS][SQL] Make sure that all expressions have their Ex…
tanelk Sep 23, 2020
21b7479
[SPARK-32959][SQL][TEST] Fix an invalid test in DataSourceV2SQLSuite
imback82 Sep 23, 2020
432afac
[SPARK-32907][ML] adaptively blockify instances - revert blockify gmm
zhengruifeng Sep 23, 2020
383bb4a
[SPARK-32892][CORE][SQL] Fix hash functions on big-endian platforms
mundaym Sep 23, 2020
faeb71b
[SPARK-32950][SQL] Remove unnecessary big-endian code paths
mundaym Sep 23, 2020
3c97665
[SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1…
dongjoon-hyun Sep 23, 2020
27f6b5a
[SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add so…
holdenk Sep 23, 2020
527cd3f
[SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s exec…
dongjoon-hyun Sep 23, 2020
b3f0087
[SPARK-32977][SQL][DOCS] Fix JavaDoc on Default Save Mode
RussellSpitzer Sep 24, 2020
0bc0e91
[SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation
dongjoon-hyun Sep 24, 2020
31a16fb
[SPARK-32714][PYTHON] Initial pyspark-stubs port
zero323 Sep 24, 2020
688d016
[SPARK-32982][BUILD] Remove hive-1.2 profiles in PIP installation option
HyukjinKwon Sep 24, 2020
fe6d38d
[SPARK-32987][MESOS] Pass all `mesos` module UTs in Scala 2.13
LuciferYang Sep 24, 2020
4ae0f70
[SPARK-32954][YARN][TEST] Add jakarta.servlet-api test dependency to …
LuciferYang Sep 24, 2020
8ccfbc1
[SPARK-32381][CORE][SQL] Move and refactor parallel listing & non-loc…
sunchao Sep 24, 2020
d7aa3b5
[SPARK-32889][SQL][TESTS][FOLLOWUP] Skip special column names test in…
dongjoon-hyun Sep 24, 2020
e9c98c9
[SPARK-32990][SQL] Migrate REFRESH TABLE to use UnresolvedTableOrView…
imback82 Sep 25, 2020
f2fc966
[SPARK-32877][SQL][TEST] Add test for Hive UDF complex decimal type
ulysses-you Sep 25, 2020
9e6882f
[SPARK-32885][SS] Add DataStreamReader.table API
xuanyuanking Sep 25, 2020
e887c63
[SPARK-32931][SQL] Unevaluable Expressions are not Foldable
gatorsmile Sep 25, 2020
6c80547
[SPARK-32997][K8S] Support dynamic PVC creation and deletion in K8s d…
dongjoon-hyun Sep 25, 2020
934a91f
[SPARK-21481][ML][FOLLOWUP][TRIVIAL] HashingTF use util.collection.Op…
zhengruifeng Sep 26, 2020
9a155d4
[SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed…
rednaxelafx Sep 26, 2020
0c38765
[SPARK-32974][ML] FeatureHasher transform optimization
zhengruifeng Sep 27, 2020
c65b645
[SPARK-32714][FOLLOW-UP][PYTHON] Address pyspark.install typing errors
zero323 Sep 27, 2020
bc77e5b
[SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols i…
zhengruifeng Sep 27, 2020
bb6d5e7
[SPARK-32972][ML] Pass all UTs of `mllib` module in Scala 2.13
LuciferYang Sep 27, 2020
f41ba2a
[SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canoni…
tanelk Sep 28, 2020
a7f84a0
[SPARK-32187][PYTHON][DOCS] Doc on Python packaging
fhoering Sep 28, 2020
d15f504
[SPARK-33011][ML] Promote the stability annotation to Evolving for ML…
HeartSaVioR Sep 28, 2020
173da5b
[SPARK-32996][WEB-UI] Handle empty ExecutorMetrics in ExecutorMetrics…
shrutig Sep 28, 2020
a53fc9b
[SPARK-27951][SQL][FOLLOWUP] Improve the window function nth_value
beliefer Sep 29, 2020
376ede1
[SPARK-33021][PYTHON][TESTS] Move functions related test cases into t…
HyukjinKwon Sep 29, 2020
68cd567
[SPARK-33015][SQL] Compute the current date only once
MaxGekk Sep 29, 2020
6868b40
[SPARK-33020][PYTHON] Add nth_value as a PySpark function
HyukjinKwon Sep 29, 2020
1b60ff5
[MINOR][DOCS] Document when `current_date` and `current_timestamp` ar…
MaxGekk Sep 29, 2020
202115e
[SPARK-32948][SQL] Optimize to_json and from_json expression chain
viirya Sep 29, 2020
90e86f6
[SPARK-32970][SPARK-32019][SQL][TEST] Reduce the runtime of an UT for
tanelk Sep 29, 2020
f167002
[SPARK-32901][CORE] Do not allocate memory while spilling UnsafeExter…
tomvanbussel Sep 29, 2020
7766fd1
[MINOR][DOCS] Fixing log message for better clarity
akshatb1 Sep 29, 2020
711d8dd
[SPARK-33018][SQL] Fix estimate statistics issue if child has 0 bytes
wangyum Sep 29, 2020
cc06266
[SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.al…
dongjoon-hyun Sep 29, 2020
3a299aa
[SPARK-32741][SQL] Check if the same ExprId refers to the unique attr…
maropu Sep 30, 2020
ece8d8e
[SPARK-33006][K8S][DOCS] Add dynamic PVC usage example into K8s doc
dongjoon-hyun Sep 30, 2020
3bdbb55
[SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs
GuoPhilipse Sep 30, 2020
d75222d
[SPARK-33012][BUILD][K8S] Upgrade fabric8 to 4.10.3
Oct 1, 2020
0b5a379
[SPARK-33023][CORE] Judge path of Windows need add condition `Utils…
AngersZhuuuu Oct 1, 2020
28ed3a5
[SPARK-32723][WEBUI] Upgrade to jQuery 3.5.1
peter-toth Oct 1, 2020
5651284
[SPARK-32992][SQL] Map Oracle's ROWID type to StringType in read via …
MaxGekk Oct 1, 2020
d3dbe1a
[SQL][DOC][MINOR] Corrects input table names in the examples of CREAT…
iRakson Oct 1, 2020
0963fcd
[SPARK-33024][SQL] Fix CodeGen fallback issue of UDFSuite in Scala 2.13
LuciferYang Oct 1, 2020
9c618b3
[SPARK-33047][BUILD] Upgrade hive-storage-api to 2.7.2
dongjoon-hyun Oct 1, 2020
e62d247
[SPARK-32585][SQL] Support scala enumeration in ScalaReflection
ulysses-you Oct 1, 2020
0059997
[SPARK-33046][DOCS] Update how to build doc for Scala 2.13 with sbt
sarutak Oct 1, 2020
8657742
[SPARK-32996][WEB-UI][FOLLOWUP] Move ExecutorSummarySuite to proper path
shrutig Oct 1, 2020
d6f3138
[SPARK-32859][SQL] Introduce physical rule to decide bucketing dynami…
c21 Oct 2, 2020
991f7e8
[SPARK-32001][SQL] Create JDBC authentication provider developer API
gaborgsomogyi Oct 2, 2020
9996e25
[SPARK-33026][SQL] Add numRows to metric of BroadcastExchangeExec
wangyum Oct 2, 2020
b205be5
[SPARK-33051][INFRA][R] Uses setup-r to install R in GitHub Actions b…
HyukjinKwon Oct 2, 2020
f7ba952
[SPARK-33048][BUILD] Fix SparkBuild.scala to recognize build settings…
sarutak Oct 2, 2020
aa66579
[SPARK-33050][BUILD] Upgrade Apache ORC to 1.5.12
dongjoon-hyun Oct 2, 2020
9b88aca
[SPARK-33030][R] Add nth_value to SparkR
zero323 Oct 2, 2020
82721ce
[SPARK-32741][SQL][FOLLOWUP] Run plan integrity check only for effect…
maropu Oct 2, 2020
1299c8a
[SPARK-33037][SHUFFLE] Remove knownManagers to support user's custom …
Oct 3, 2020
5af62a2
[SPARK-33052][SQL][TEST] Make all the database versions up-to-date fo…
maropu Oct 3, 2020
f86171a
[SPARK-33043][ML] Handle spark.driver.maxResultSize=0 in RowMatrix he…
srowen Oct 3, 2020
9b21fdd
[SPARK-32949][FOLLOW-UP][R][SQL] Reindent lines in SparkR timestamp_s…
zero323 Oct 3, 2020
37c806a
[SPARK-32958][SQL] Prune unnecessary columns from JsonToStructs
viirya Oct 3, 2020
db420f7
[SPARK-33049][CORE] Decommission shuffle block test is flaky
holdenk Oct 3, 2020
fab5321
[SPARK-33065][TESTS] Expand the stack size of a thread in a test in L…
sarutak Oct 4, 2020
4ab9aa0
[SPARK-33017][PYTHON] Add getCheckpointDir method to PySpark Context
reidy-p Oct 5, 2020
e83d03c
[SPARK-33040][R][ML] Add SparkR wrapper for vector_to_array
zero323 Oct 5, 2020
24f890e
[SPARK-33040][FOLLOW-UP][R] Reorder argument choices and add examples
zero323 Oct 5, 2020
0fb2574
[SPARK-33042][SQL][TEST] Add a test case to ensure changes to spark.s…
yuningzh-db Oct 5, 2020
023eb48
[SPARK-32914][SQL] Avoid constructing dataType multiple times
wangyum Oct 5, 2020
a09747b
[SPARK-33063][K8S] Improve error message for insufficient K8s volume …
Gschiavon Oct 5, 2020
14aeab3
[SPARK-33038][SQL] Combine AQE initial and current plan string when t…
allisonwang-db Oct 5, 2020
008a2ad
[SPARK-20202][BUILD][SQL] Remove references to org.spark-project.hive…
dongjoon-hyun Oct 5, 2020
a0aa8f3
[SPARK-33069][INFRA] Skip test result report if no JUnit XML files ar…
HyukjinKwon Oct 6, 2020
9870cf9
[SPARK-33067][SQL][TESTS] Add negative checks to JDBC v2 Table Catalo…
MaxGekk Oct 6, 2020
4adc282
[SPARK-33035][SQL] Updates the obsoleted entries of attribute mapping…
maropu Oct 6, 2020
2793347
[SPARK-32511][SQL] Add dropFields method to Column class
fqaiser94 Oct 6, 2020
ddc7012
[SPARK-32243][SQL] HiveSessionCatalog call super.makeFunctionExpressi…
AngersZhuuuu Oct 6, 2020
0812d6c
[SPARK-33073][PYTHON] Improve error handling on Pandas to Arrow conve…
BryanCutler Oct 6, 2020
b5e4b8c
[SPARK-27428][CORE][TEST] Increase receive buffer size used in Statsd…
mundaym Oct 6, 2020
ec6fccb
[SPARK-32243][SQL][FOLLOWUP] Fix compilation in HiveSessionCatalog
cloud-fan Oct 6, 2020
17d309d
[SPARK-32963][SQL] empty string should be consistent for schema name …
yaooqinn Oct 6, 2020
3b2a38d
[SPARK-32511][SQL][FOLLOWUP] Fix the broken build for Scala 2.13 with…
sarutak Oct 6, 2020
0b326d5
[SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw e…
Ngone51 Oct 6, 2020
57ed5a8
[SPARK-33007][SQL] Simplify named_struct + get struct field + from_js…
viirya Oct 6, 2020
584f90c
[SPARK-33067][SQL][TESTS][FOLLOWUP] Check error messages in JDBCTable…
MaxGekk Oct 7, 2020
5ce321d
[SPARK-33017][PYTHON][DOCS][FOLLOW-UP] Add getCheckpointDir into API …
HyukjinKwon Oct 7, 2020
aea78d2
[SPARK-33034][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add,…
MaxGekk Oct 7, 2020
7e99fcd
[SPARK-33004][SQL] Migrate DESCRIBE column to use UnresolvedTableOrVi…
imback82 Oct 7, 2020
4e1ded6
[SPARK-32189][DOCS][PYTHON][FOLLOW-UP] Fixed broken link and typo in …
itholic Oct 7, 2020
72da6f8
[SPARK-33002][PYTHON] Remove non-API annotations
zero323 Oct 7, 2020
94d648d
[SPARK-33036][SQL] Refactor RewriteCorrelatedScalarSubquery code to r…
maropu Oct 7, 2020
3099fd9
[SPARK-32067][K8S] Use unique ConfigMap name for executor pod template
stijndehaes Oct 7, 2020
a127387
[SPARK-33082][SQL] Remove hive-1.2 workaround code
dongjoon-hyun Oct 7, 2020
23afc93
[SPARK-26499][SQL][FOLLOWUP] Print the loading provider exception sta…
MaxGekk Oct 7, 2020
6daa2ae
[SPARK-21708][BUILD] Migrate build to sbt 1.x
gemelen Oct 7, 2020
37e1b0c
[SPARK-33086][PYTHON] Add static annotations for pyspark.resource
zero323 Oct 8, 2020
473b3ba
[SPARK-32511][FOLLOW-UP][SQL][R][PYTHON] Add dropFields to SparkR and…
zero323 Oct 8, 2020
39510b0
[SPARK-32793][SQL] Add raise_error function, adds error message param…
karenfeng Oct 8, 2020
bbc887b
[SPARK-33089][SQL] make avro format propagate Hadoop config from DS o…
yuningzh-db Oct 8, 2020
1c781a4
[SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle…
imback82 Oct 8, 2020
7d6e3fb
[SPARK-33074][SQL] Classify dialect exceptions in JDBC v2 Table Catalog
MaxGekk Oct 8, 2020
5effa8e
[SPARK-33091][SQL] Avoid using map instead of foreach to avoid potent…
HyukjinKwon Oct 8, 2020
4a47b3e
[DOC][MINOR] pySpark usage - removed repeated keyword causing confusion
manubatham20 Oct 8, 2020
4987db8
[SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedE…
dongjoon-hyun Oct 8, 2020
c5f6af9
[SPARK-33094][SQL] Make ORC format propagate Hadoop config from DS op…
MaxGekk Oct 8, 2020
a907729
[SPARK-32743][SQL] Add distinct info at UnresolvedFunction toString
ulysses-you Oct 9, 2020
3beab8d
[SPARK-32793][FOLLOW-UP] Minor corrections for PySpark annotations an…
zero323 Oct 9, 2020
1234c66
[SPARK-33101][ML] Make LibSVM format propagate Hadoop config from DS …
MaxGekk Oct 9, 2020
e1909c9
[SPARK-33099][K8S] Respect executor idle timeout conf in ExecutorPods…
dongjoon-hyun Oct 9, 2020
edb140e
[SPARK-32896][SS] Add DataStreamWriter.table API
HeartSaVioR Oct 9, 2020
2e07ed3
[SPARK-33082][SPARK-20202][BUILD][SQL][FOLLOW-UP] Remove Hive 1.2 wor…
HyukjinKwon Oct 9, 2020
018811f
[SPARK-33105][INFRA] Change default R arch from i386 to x64 and param…
zero323 Oct 10, 2020
1e63dcc
[SPARK-33102][SQL] Use stringToSeq on SQL list typed parameters
gaborgsomogyi Oct 10, 2020
dfb7790
[SPARK-33108][BUILD] Remove sbt-dependency-graph SBT plugin
dongjoon-hyun Oct 10, 2020
7696ca5
[SPARK-32881][CORE] Catch some race condition errors and log them mor…
holdenk Oct 10, 2020
5e17014
[SPARK-33107][SQL] Remove hive-2.3 workaround code
wangyum Oct 10, 2020
83f8e13
[SPARK-33086][FOLLOW-UP] Remove unused Optional import from pyspark.r…
zero323 Oct 12, 2020
c78971b
[SPARK-33106][BUILD] Fix resolvers clash in SBT
gemelen Oct 12, 2020
50b2a49
[SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue
williamhyun Oct 12, 2020
4af1ac9
[SPARK-32047][SQL] Add JDBC connection provider disable possibility
gaborgsomogyi Oct 12, 2020
543d59d
[SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bun…
wangyum Oct 12, 2020
9896288
[SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
dongjoon-hyun Oct 12, 2020
78c0967
[SPARK-33092][SQL] Support subexpression elimination in ProjectExec
viirya Oct 12, 2020
a0e3244
[SPARK-32704][SQL][FOLLOWUP] Corrects version values of plan logging …
maropu Oct 12, 2020
ed2fe8d
[SPARK-33111][ML] aft transform optimization
zhengruifeng Oct 12, 2020
b27a287
[SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB …
leanken-zz Oct 12, 2020
819f12e
[SPARK-33118][SQL] CREATE TEMPORARY TABLE fails with location
pablolanga-stratio Oct 12, 2020
86d26b4
[SPARK-32455][ML][FOLLOW-UP] LogisticRegressionModel prediction optim…
zhengruifeng Oct 13, 2020
e34f2d8
[SPARK-33119][SQL] ScalarSubquery should returns the first two rows t…
wangyum Oct 13, 2020
17eebd7
[SPARK-32295][SQL] Add not null and size > 0 filters before inner exp…
tanelk Oct 13, 2020
1b0875b
[SPARK-33115][BUILD][DOCS] Fix javadoc errors in `kvstore` and `unsaf…
gemelen Oct 13, 2020
feee8da
[SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeri…
sunchao Oct 13, 2020
af3e2f7
[SPARK-33081][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: upda…
huaxingao Oct 13, 2020
2b7239e
[SPARK-33125][SQL] Improve the error when Lead and Lag are not allowe…
beliefer Oct 13, 2020
dc697a8
[SPARK-13860][SQL] Change statistical aggregate function to return nu…
leanken-zz Oct 13, 2020
304ca1e
[SPARK-33129][BUILD][DOCS] Updating the build/sbt references to test-…
ScrapCodes Oct 13, 2020
1bfcb51
[SPARK-33132][WEBUI] Make `formatBytes` return `0.0 B` for negative i…
echohlne Oct 13, 2020
05a62dc
[SPARK-33134][SQL] Return partial results only for root JSON objects
MaxGekk Oct 14, 2020
d8c4a47
[SPARK-33061][SQL] Expose inverse hyperbolic trig functions through s…
rwpenney Oct 14, 2020
8e5cb1d
[SPARK-33136][SQL] Fix mistakenly swapped parameter in V2WriteCommand…
HeartSaVioR Oct 14, 2020
f3ad32f
[SPARK-33026][SQL][FOLLOWUP] metrics name should be numOutputRows
cloud-fan Oct 14, 2020
9ab0ec4
[SPARK-33146][CORE] Check for non-fatal errors when loading new appli…
Oct 15, 2020
ec34a00
[SPARK-33153][SQL][TESTS] Ignore Spark 2.4 in HiveExternalCatalogVers…
dongjoon-hyun Oct 15, 2020
77a8efb
[SPARK-32932][SQL] Do not use local shuffle reader at final stage on …
manuzhang Oct 15, 2020
8e7c390
[SPARK-33155][K8S] spark.kubernetes.pyspark.pythonVersion allows only…
dongjoon-hyun Oct 15, 2020
e85ed8a
[SPARK-33156][INFRA] Upgrade GithubAction image from 18.04 to 20.04
dongjoon-hyun Oct 15, 2020
513b6f5
[SPARK-33079][TESTS] Replace the existing Maven job for Scala 2.13 in…
sarutak Oct 15, 2020
31f7097
[SPARK-32402][SQL][FOLLOW-UP] Use quoted column name for JDBCTableCat…
huaxingao Oct 15, 2020
b089fe5
[SPARK-32247][INFRA] Install and test scipy with PyPy in GitHub Actions
HyukjinKwon Oct 15, 2020
82eea13
[SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to su…
Victsm Oct 15, 2020
1e10577
Merge branch 'master' into pr/28618
attilapiros Oct 15, 2020
cac0e9e
apply Attila's review comments
attilapiros Oct 16, 2020
861f089
remove unused import
attilapiros Oct 16, 2020
48 changes: 33 additions & 15 deletions .github/workflows/build_and_test.yml
@@ -17,7 +17,8 @@ jobs:
   # Build: build Spark and run the tests for specified modules.
   build:
     name: "Build modules: ${{ matrix.modules }} ${{ matrix.comment }} (JDK ${{ matrix.java }}, ${{ matrix.hadoop }}, ${{ matrix.hive }})"
-    runs-on: ubuntu-latest
+    # Ubuntu 20.04 is the latest LTS. The next LTS is 22.04.
+    runs-on: ubuntu-20.04
     strategy:
       fail-fast: false
       matrix:
@@ -154,12 +155,11 @@ jobs:
     - name: Install Python packages (Python 3.6 and PyPy3)
       if: contains(matrix.modules, 'pyspark')
       # PyArrow is not supported in PyPy yet, see ARROW-2651.
-      # TODO(SPARK-32247): scipy installation with PyPy fails for an unknown reason.
       run: |
         python3.6 -m pip install numpy pyarrow pandas scipy xmlrunner
         python3.6 -m pip list
         # PyPy does not have xmlrunner
-        pypy3 -m pip install numpy pandas
+        pypy3 -m pip install numpy pandas scipy
         pypy3 -m pip list
     - name: Install Python packages (Python 3.8)
       if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
@@ -168,12 +168,10 @@
         python3.8 -m pip list
     # SparkR
     - name: Install R 4.0
+      uses: r-lib/actions/setup-r@v1
       if: contains(matrix.modules, 'sparkr')
-      run: |
-        sudo sh -c "echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' >> /etc/apt/sources.list"
-        curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xE298A3A825C0D65DFD57CBB651716619E084DAB9" | sudo apt-key add
-        sudo apt-get update
-        sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev
+      with:
+        r-version: 4.0
     - name: Install R packages
       if: contains(matrix.modules, 'sparkr')
       run: |
@@ -206,7 +204,7 @@ jobs:
   # Static analysis, and documentation build
   lint:
     name: Linters, licenses, dependencies and documentation generation
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04
     steps:
     - name: Checkout Spark repository
       uses: actions/checkout@v2
@@ -232,11 +230,9 @@
         # See also https://github.com/sphinx-doc/sphinx/issues/7551.
         pip3 install flake8 'sphinx<3.1.0' numpy pydata_sphinx_theme ipython nbsphinx
     - name: Install R 4.0
-      run: |
-        sudo sh -c "echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' >> /etc/apt/sources.list"
-        curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xE298A3A825C0D65DFD57CBB651716619E084DAB9" | sudo apt-key add
-        sudo apt-get update
-        sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev
+      uses: r-lib/actions/setup-r@v1
+      with:
+        r-version: 4.0
     - name: Install R linter dependencies and SparkR
       run: |
         sudo apt-get install -y libcurl4-openssl-dev
@@ -275,7 +271,7 @@ jobs:
 
   java11:
     name: Java 11 build
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04
     steps:
     - name: Checkout Spark repository
       uses: actions/checkout@v2
@@ -297,3 +293,25 @@
         mkdir -p ~/.m2
         ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=11 install
         rm -rf ~/.m2/repository/org/apache/spark
+
+  scala-213:
+    name: Scala 2.13 build
+    runs-on: ubuntu-20.04
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Cache Ivy local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.ivy2/cache
+        key: scala-213-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        restore-keys: |
+          scala-213-ivy-
+    - name: Install Java 11
+      uses: actions/setup-java@v1
+      with:
+        java-version: 11
+    - name: Build with SBT
+      run: |
+        ./dev/change-scala-version.sh 2.13
+        ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Djava.version=11 -Pscala-2.13 compile test:compile
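
A note on the cache stanza in the new `scala-213` job: `hashFiles('**/pom.xml', '**/plugins.sbt')` makes the Ivy cache key change whenever any build definition file changes, while the `restore-keys` prefix lets a run that misses the exact key fall back to the most recent `scala-213-ivy-` cache instead of resolving every dependency from scratch.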
9 changes: 9 additions & 0 deletions .github/workflows/test_report.yml
@@ -15,7 +15,16 @@ jobs:
         github_token: ${{ secrets.GITHUB_TOKEN }}
         workflow: ${{ github.event.workflow_run.workflow_id }}
         commit: ${{ github.event.workflow_run.head_commit.id }}
+    - name: Check if JUnit report XML files exist
+      run: |
+        if ls **/target/test-reports/*.xml > /dev/null 2>&1; then
+          echo '::set-output name=FILE_EXISTS::true'
+        else
+          echo '::set-output name=FILE_EXISTS::false'
+        fi
+      id: check-junit-file
     - name: Publish test report
+      if: steps.check-junit-file.outputs.FILE_EXISTS == 'true'
       uses: scacap/action-surefire-report@v1
       with:
         check_name: Report test results
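
A note on the plumbing above: `::set-output` was GitHub Actions' workflow command for publishing a step output when this change was made; the guard on the publish step reads it back as `steps.check-junit-file.outputs.FILE_EXISTS`. GitHub has since deprecated `::set-output` in favor of appending to the `$GITHUB_OUTPUT` file, so a hypothetical modern rewrite of the same step (not part of this PR) would look like:

    - name: Check if JUnit report XML files exist
      id: check-junit-file
      run: |
        if ls **/target/test-reports/*.xml > /dev/null 2>&1; then
          echo "FILE_EXISTS=true" >> "$GITHUB_OUTPUT"
        else
          echo "FILE_EXISTS=false" >> "$GITHUB_OUTPUT"
        fi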
17 changes: 17 additions & 0 deletions .sbtopts
@@ -0,0 +1,17 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+-J-Xmx4G
+-J-Xss4m
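
For context: in `.sbtopts`, options prefixed with `-J` are handed straight to the JVM that runs sbt, so the two flags above give the build a 4 GB maximum heap (`-Xmx4G`) and 4 MB thread stacks (`-Xss4m`); the larger stack is the usual guard against StackOverflowError in the deeply recursive Scala compiler. A rough one-off equivalent without this file (illustrative only; exact flag handling depends on the launcher script):

    SBT_OPTS="-Xmx4G -Xss4m" ./build/sbt compile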
7 changes: 7 additions & 0 deletions R/pkg/NAMESPACE
@@ -230,6 +230,7 @@ exportMethods("%<=>%",
               "asc",
               "ascii",
               "asin",
+              "assert_true",
               "atan",
               "atan2",
               "avg",
@@ -272,6 +273,7 @@ exportMethods("%<=>%",
               "degrees",
               "dense_rank",
               "desc",
+              "dropFields",
               "element_at",
               "encode",
               "endsWith",
@@ -348,6 +350,7 @@ exportMethods("%<=>%",
               "negate",
               "next_day",
               "not",
+              "nth_value",
               "ntile",
               "otherwise",
               "over",
@@ -359,6 +362,7 @@ exportMethods("%<=>%",
               "posexplode_outer",
               "quarter",
               "radians",
+              "raise_error",
               "rand",
               "randn",
               "rank",
@@ -405,6 +409,7 @@ exportMethods("%<=>%",
               "sumDistinct",
               "tan",
               "tanh",
+              "timestamp_seconds",
               "toDegrees",
               "toRadians",
               "to_csv",
@@ -425,9 +430,11 @@ exportMethods("%<=>%",
               "variance",
               "var_pop",
               "var_samp",
+              "vector_to_array",
               "weekofyear",
               "when",
               "window",
+              "withField",
               "xxhash64",
               "year")
14 changes: 12 additions & 2 deletions R/pkg/R/DataFrame.R
@@ -2863,11 +2863,18 @@ setMethod("unionAll",
 #' \code{UNION ALL} and \code{UNION DISTINCT} in SQL as column positions are not taken
 #' into account. Input SparkDataFrames can have different data types in the schema.
 #'
+#' When the parameter allowMissingColumns is `TRUE`, the set of column names
+#' in x and y can differ; missing columns will be filled as null.
+#' Further, the missing columns of x will be added at the end
+#' in the schema of the union result.
+#'
 #' Note: This does not remove duplicate rows across the two SparkDataFrames.
+#' This function resolves columns by name (not by position).
 #'
 #' @param x A SparkDataFrame
 #' @param y A SparkDataFrame
+#' @param allowMissingColumns logical
 #' @param ... further arguments to be passed to or from other methods.
 #' @return A SparkDataFrame containing the result of the union.
 #' @family SparkDataFrame functions
 #' @rdname unionByName
@@ -2880,12 +2887,15 @@ setMethod("unionAll",
 #' df1 <- select(createDataFrame(mtcars), "carb", "am", "gear")
 #' df2 <- select(createDataFrame(mtcars), "am", "gear", "carb")
 #' head(unionByName(df1, df2))
+#'
+#' df3 <- select(createDataFrame(mtcars), "carb")
+#' head(unionByName(df1, df3, allowMissingColumns = TRUE))
 #' }
 #' @note unionByName since 2.3.0
 setMethod("unionByName",
           signature(x = "SparkDataFrame", y = "SparkDataFrame"),
-          function(x, y) {
-            unioned <- callJMethod(x@sdf, "unionByName", y@sdf)
+          function(x, y, allowMissingColumns=FALSE) {
+            unioned <- callJMethod(x@sdf, "unionByName", y@sdf, allowMissingColumns)
             dataFrame(unioned)
           })
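
To make the new `allowMissingColumns` flag concrete, a minimal sketch of the documented behavior (illustrative only, reusing the df1/df3 names from the roxygen example above; not part of the PR's diff):

    df1 <- select(createDataFrame(mtcars), "carb", "am", "gear")
    df3 <- select(createDataFrame(mtcars), "carb")
    u <- unionByName(df1, df3, allowMissingColumns = TRUE)
    # Rows originating from df3 are null-filled in "am" and "gear"; since df1
    # already contains every column, the result schema stays (carb, am, gear).
    printSchema(u)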
100 changes: 100 additions & 0 deletions R/pkg/R/column.R
@@ -356,3 +356,103 @@ setMethod("%<=>%",
 #' }
 #' @note ! since 2.3.0
 setMethod("!", signature(x = "Column"), function(x) not(x))
+
+#' withField
+#'
+#' Adds/replaces field in a struct \code{Column} by name.
+#'
+#' @param x a Column
+#' @param fieldName a character
+#' @param col a Column expression
+#'
+#' @rdname withField
+#' @aliases withField withField,Column-method
+#' @examples
+#' \dontrun{
+#' df <- withColumn(
+#'   createDataFrame(iris),
+#'   "sepal",
+#'   struct(column("Sepal_Width"), column("Sepal_Length"))
+#' )
+#'
+#' head(select(
+#'   df,
+#'   withField(df$sepal, "product", df$Sepal_Length * df$Sepal_Width)
+#' ))
+#' }
+#' @note withField since 3.1.0
+setMethod("withField",
+          signature(x = "Column", fieldName = "character", col = "Column"),
+          function(x, fieldName, col) {
+            jc <- callJMethod(x@jc, "withField", fieldName, col@jc)
+            column(jc)
+          })
+
+#' dropFields
+#'
+#' Drops fields in a struct \code{Column} by name.
+#'
+#' @param x a Column
+#' @param ... names of the fields to be dropped.
+#'
+#' @rdname dropFields
+#' @aliases dropFields dropFields,Column-method
+#' @examples
+#' \dontrun{
+#' df <- select(
+#'   createDataFrame(iris),
+#'   alias(
+#'     struct(
+#'       column("Sepal_Width"), column("Sepal_Length"),
+#'       alias(
+#'         struct(
+#'           column("Petal_Width"), column("Petal_Length"),
+#'           alias(
+#'             column("Petal_Width") * column("Petal_Length"),
+#'             "Petal_Product"
+#'           )
+#'         ),
+#'         "Petal"
+#'       )
+#'     ),
+#'     "dimensions"
+#'   )
+#' )
+#' head(withColumn(df, "dimensions", dropFields(df$dimensions, "Petal")))
+#'
+#' head(
+#'   withColumn(
+#'     df, "dimensions",
+#'     dropFields(df$dimensions, "Sepal_Width", "Sepal_Length")
+#'   )
+#' )
+#'
+#' # This method supports dropping multiple nested fields directly e.g.
+#' head(
+#'   withColumn(
+#'     df, "dimensions",
+#'     dropFields(df$dimensions, "Petal.Petal_Width", "Petal.Petal_Length")
+#'   )
+#' )
+#'
+#' # However, if you are going to add/replace multiple nested fields,
+#' # it is preferred to extract out the nested struct before
+#' # adding/replacing multiple fields e.g.
+#' head(
+#'   withColumn(
+#'     df, "dimensions",
+#'     withField(
+#'       column("dimensions"),
+#'       "Petal",
+#'       dropFields(column("dimensions.Petal"), "Petal_Width", "Petal_Length")
+#'     )
+#'   )
+#' )
+#' }
+#' @note dropFields since 3.1.0
+setMethod("dropFields",
+          signature(x = "Column"),
+          function(x, ...) {
+            jc <- callJMethod(x@jc, "dropFields", list(...))
+            column(jc)
+          })