Synch with apache:master #2
Merged
## What changes were proposed in this pull request?

When constructing a DataFrame from a Java bean, using nested beans throws an error despite [documentation](http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection) stating otherwise. This PR aims to add that support.

This PR does not yet add nested bean support in array or List fields. This can be added later or in another PR.

## How was this patch tested?

A nested bean was added to the appropriate unit test. Also manually tested in the Spark shell on code emulating the referenced JIRA:

```
scala> import scala.beans.BeanProperty
import scala.beans.BeanProperty

scala> class SubCategory(@BeanProperty var id: String, @BeanProperty var name: String) extends Serializable
defined class SubCategory

scala> class Category(@BeanProperty var id: String, @BeanProperty var subCategory: SubCategory) extends Serializable
defined class Category

scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._

scala> spark.createDataFrame(Seq(new Category("s-111", new SubCategory("sc-111", "Sub-1"))).asJava, classOf[Category])
java.lang.IllegalArgumentException: The value (SubCategory65130cf2) of the type (SubCategory) cannot be converted to struct<id:string,name:string>
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:262)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:396)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1108)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1108)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1108)
  at org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1106)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
  at scala.collection.Iterator$class.toStream(Iterator.scala:1320)
  at scala.collection.AbstractIterator.toStream(Iterator.scala:1334)
  at scala.collection.TraversableOnce$class.toSeq(TraversableOnce.scala:298)
  at scala.collection.AbstractIterator.toSeq(Iterator.scala:1334)
  at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:423)
  ... 51 elided
```

New behavior:

```
scala> spark.createDataFrame(Seq(new Category("s-111", new SubCategory("sc-111", "Sub-1"))).asJava, classOf[Category])
res0: org.apache.spark.sql.DataFrame = [id: string, subCategory: struct<id: string, name: string>]

scala> res0.show()
+-----+---------------+
|   id|    subCategory|
+-----+---------------+
|s-111|[sc-111, Sub-1]|
+-----+---------------+
```

Closes #22527 from michalsenkyr/SPARK-17952.
Authored-by: Michal Senkyr <[email protected]> Signed-off-by: Takuya UESHIN <[email protected]>
## What changes were proposed in this pull request?

The Java `foreachBatch` API in `DataStreamWriter` should accept `java.lang.Long` rather than `scala.Long`.

## How was this patch tested?

New Java test.

Closes #22633 from zsxwing/fix-java-foreachbatch.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Shixiong Zhu <[email protected]>
…tSuites

## What changes were proposed in this pull request?

Improve the runtime by reducing the number of partitions created in the test. The number of partitions is reduced from 280 to 60. Here are the test times for the `getPartitionsByFilter returns all partitions` test on my laptop.

```
[info] - 0.13: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (4 seconds, 230 milliseconds)
[info] - 0.14: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (3 seconds, 576 milliseconds)
[info] - 1.0: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (3 seconds, 495 milliseconds)
[info] - 1.1: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (6 seconds, 728 milliseconds)
[info] - 1.2: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (7 seconds, 260 milliseconds)
[info] - 2.0: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (8 seconds, 270 milliseconds)
[info] - 2.1: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (6 seconds, 856 milliseconds)
[info] - 2.2: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (7 seconds, 587 milliseconds)
[info] - 2.3: getPartitionsByFilter returns all partitions when hive.metastore.try.direct.sql=false (7 seconds, 230 milliseconds)
```

## How was this patch tested?

Test only.

Closes #22644 from dilipbiswal/SPARK-25626.

Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
… ORC write

## What changes were proposed in this pull request?

Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold` were applied to all columns. This has been a big hurdle to enabling dictionary encoding. From ORC 1.5.3, `orc.column.encoding.direct` is added to enforce direct encoding selectively in a column-wise manner. This PR aims to add that feature by upgrading ORC from 1.5.2 to 1.5.3.

The following are the patches in ORC 1.5.3, and this feature is the only one related to Spark directly.

```
ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
ORC-405: Remove calcite as a dependency from the benchmarks.
ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
ORC-383: Parallel builds fails with ConcurrentModificationException
ORC-382: Apache rat exclusions + add rat check to travis
ORC-401: Fix incorrect quoting in specification.
ORC-385: Change RecordReader to extend Closeable.
ORC-384: [C++] fix memory leak when loading non-ORC files
ORC-391: [c++] parseType does not accept underscore in the field name
ORC-397: Allow selective disabling of dictionary encoding. Original patch was by Mithun Radhakrishnan.
ORC-389: Add ability to not decode Acid metadata columns
```

## How was this patch tested?

Pass the Jenkins with newly added test cases.

Closes #22622 from dongjoon-hyun/SPARK-25635.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
## What changes were proposed in this pull request? The total run time of `HiveSparkSubmitSuite` is about 10 minutes. While the related code is stable, add tag `ExtendedHiveTest` for it. ## How was this patch tested? Unit test. Closes #22642 from gengliangwang/addTagForHiveSparkSubmitSuite. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: gatorsmile <[email protected]>
… cache UDF result correctly

## What changes were proposed in this pull request?

In this test case, we are verifying that the result of a UDF is cached when the underlying data frame is cached, and that the UDF is not evaluated again when the cached data frame is used.

To reduce the runtime we:
1) Use a single-partition dataframe, so the total execution time of the UDF is more deterministic.
2) Cut down the size of the dataframe from 10 to 2.
3) Reduce the sleep time in the UDF from 5 secs to 2 secs.
4) Reduce the failAfter condition from 3 to 2.

With the above change, it takes about 4 secs to cache the first dataframe, and the subsequent check takes a few hundred milliseconds. The new runtime for 5 consecutive runs of this test is as follows:

```
[info] - cache UDF result correctly (4 seconds, 906 milliseconds)
[info] - cache UDF result correctly (4 seconds, 281 milliseconds)
[info] - cache UDF result correctly (4 seconds, 288 milliseconds)
[info] - cache UDF result correctly (4 seconds, 355 milliseconds)
[info] - cache UDF result correctly (4 seconds, 280 milliseconds)
```

## How was this patch tested?

This is a test fix.

Closes #22638 from dilipbiswal/SPARK-25610.

Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
The docker file was referencing a path that only existed in the distribution tarball; it needs to be parameterized so that the right path can be used in a dev build. Tested on local dev build. Closes #22634 from vanzin/SPARK-25646. Authored-by: Marcelo Vanzin <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
## What changes were proposed in this pull request?
Our lint failed due to the following errors:
```
[INFO] --- scalastyle-maven-plugin:1.0.0:check (default) spark-ganglia-lgpl_2.11 ---
error file=/home/jenkins/workspace/spark-master-maven-snapshots/spark/external/spark-ganglia-lgpl/src/main/scala/org/apache/spark/metrics/sink/GangliaSink.scala message=
Are you sure that you want to use toUpperCase or toLowerCase without the root locale? In most cases, you
should use toUpperCase(Locale.ROOT) or toLowerCase(Locale.ROOT) instead.
If you must use toUpperCase or toLowerCase without the root locale, wrap the code block with
// scalastyle:off caselocale
.toUpperCase
.toLowerCase
// scalastyle:on caselocale
line=67 column=49
error file=/home/jenkins/workspace/spark-master-maven-snapshots/spark/external/spark-ganglia-lgpl/src/main/scala/org/apache/spark/metrics/sink/GangliaSink.scala message=
Are you sure that you want to use toUpperCase or toLowerCase without the root locale? In most cases, you
should use toUpperCase(Locale.ROOT) or toLowerCase(Locale.ROOT) instead.
If you must use toUpperCase or toLowerCase without the root locale, wrap the code block with
// scalastyle:off caselocale
.toUpperCase
.toLowerCase
// scalastyle:on caselocale
line=71 column=32
Saving to outputFile=/home/jenkins/workspace/spark-master-maven-snapshots/spark/external/spark-ganglia-lgpl/target/scalastyle-output.xml
```
See https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/8890/
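The fix the rule asks for is to pass the root locale explicitly (or to wrap the call in `scalastyle:off caselocale`). A minimal sketch of the locale-safe form, not the actual `GangliaSink` change:
```scala
import java.util.Locale

// Case conversion that does not depend on the JVM's default locale.
// `mode` is a made-up value; the real fields in GangliaSink differ.
val mode = "Both"
val normalized = mode.toUpperCase(Locale.ROOT)
```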
## How was this patch tested?
N/A
Closes #22647 from gatorsmile/fixLint.
Authored-by: gatorsmile <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
## What changes were proposed in this pull request?

Adds support for setting the limit in the SQL `split` function.

## How was this patch tested?

1. Updated unit tests
2. Tested using the Scala spark shell

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22227 from phegstrom/master.

Authored-by: Parker Hegstrom <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
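For reference, a quick spark-shell sketch of the new optional `limit` argument (the input string is made up; `spark` is the session provided by the shell):
```scala
// With limit = 2 the pattern is applied at most once, so the result is
// ["one", "two,three,four"]: the remainder stays in the last element.
spark.sql("SELECT split('one,two,three,four', ',', 2)").show(false)
```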
…Type while inferring CSV schema.

## What changes were proposed in this pull request?

Currently the CSV schema inference code inlines `TypeCoercion.findTightestCommonType`. This is a minor refactor to make use of the common type coercion code when applicable. This way we can take advantage of any improvement to the base method.

Thanks to MaxGekk for finding this while reviewing another PR.

## How was this patch tested?

This is a minor refactor. Existing tests are used to verify the change.

Closes #22619 from dilipbiswal/csv_minor.

Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
…HiveSupportSuite

## What changes were proposed in this pull request?

By replacing loops with a random possible value:
- `read partitioning bucketed tables with bucket pruning filters`: reduced from 55s to 7s
- `read partitioning bucketed tables having composite filters`: reduced from 54s to 8s
- total time: reduced from 288s to 192s

## How was this patch tested?

Unit test

Closes #22640 from gengliangwang/fastenBucketedReadSuite.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
## What changes were proposed in this pull request?

Refactor `MiscBenchmark` to use main method.

Generate benchmark result:
```sh
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.MiscBenchmark"
```

## How was this patch tested?

manual tests

Closes #22500 from wangyum/SPARK-25488.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
## What changes were proposed in this pull request? Clean up the joinCriteria parsing in the parser by directly using identifierList ## How was this patch tested? N/A Closes #22648 from gatorsmile/cleanupJoinCriteria. Authored-by: gatorsmile <[email protected]> Signed-off-by: gatorsmile <[email protected]>
…foreachBatch

## What changes were proposed in this pull request?

This PR fixes the Scala-2.12 build error due to ambiguity in `foreachBatch` test cases.

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/428/console

```scala
[error] /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ForeachBatchSinkSuite.scala:102: ambiguous reference to overloaded definition,
[error] both method foreachBatch in class DataStreamWriter of type (function: org.apache.spark.api.java.function.VoidFunction2[org.apache.spark.sql.Dataset[Int],Long])org.apache.spark.sql.streaming.DataStreamWriter[Int]
[error] and method foreachBatch in class DataStreamWriter of type (function: (org.apache.spark.sql.Dataset[Int], Long) => Unit)org.apache.spark.sql.streaming.DataStreamWriter[Int]
[error] match argument types ((org.apache.spark.sql.Dataset[Int], Any) => Unit)
[error]     ds.writeStream.foreachBatch((_, _) => {}).trigger(Trigger.Continuous("1 second")).start()
[error]                    ^
[error] /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ForeachBatchSinkSuite.scala:106: ambiguous reference to overloaded definition,
[error] both method foreachBatch in class DataStreamWriter of type (function: org.apache.spark.api.java.function.VoidFunction2[org.apache.spark.sql.Dataset[Int],Long])org.apache.spark.sql.streaming.DataStreamWriter[Int]
[error] and method foreachBatch in class DataStreamWriter of type (function: (org.apache.spark.sql.Dataset[Int], Long) => Unit)org.apache.spark.sql.streaming.DataStreamWriter[Int]
[error] match argument types ((org.apache.spark.sql.Dataset[Int], Any) => Unit)
[error]     ds.writeStream.foreachBatch((_, _) => {}).partitionBy("value").start()
[error]                    ^
```

## How was this patch tested?

Manual. Since this failure occurs in the Scala-2.12 profile and test cases, Jenkins will not test this. We need to build with Scala-2.12 and run the tests.

Closes #22649 from dongjoon-hyun/SPARK-SCALA212.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
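One way user code can sidestep such an ambiguity (a sketch, not necessarily the fix applied in #22649) is to bind the lambda to an explicitly typed Scala function first; `ds` is assumed to be a streaming `Dataset[Int]` as in the suite:
```scala
import org.apache.spark.sql.Dataset

// A value typed as a scala Function2 is not eligible for SAM conversion to the Java
// VoidFunction2 overload, so the foreachBatch call below resolves unambiguously.
val handler: (Dataset[Int], Long) => Unit = (batch, batchId) => ()
ds.writeStream.foreachBatch(handler).start()
```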
## What changes were proposed in this pull request?

`InMemoryFileIndex` contains a cache of `LocatedFileStatus` objects. Each `LocatedFileStatus` object can contain several `BlockLocation`s or some subclass of it. Filling up this cache by listing files happens recursively either on the driver or on the executors, depending on the parallel discovery threshold (`spark.sql.sources.parallelPartitionDiscovery.threshold`). If the listing happens on the executors, block location objects are converted to simple `BlockLocation` objects to meet serialization requirements. If it happens on the driver, there is no conversion, and depending on the file system a `BlockLocation` object can be a subclass like `HdfsBlockLocation` and consume more memory. This PR adds the conversion to the latter case and decreases memory consumption.

## How was this patch tested?

Added unit test.

Closes #22603 from peter-toth/SPARK-25062.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
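A rough sketch of the kind of conversion described above (the helper name is hypothetical; the actual change lives inside `InMemoryFileIndex`):
```scala
import org.apache.hadoop.fs.{BlockLocation, LocatedFileStatus}

// Replace file-system-specific subclasses such as HdfsBlockLocation with plain
// BlockLocation instances so the cached entries stay as small as possible.
def toPlainBlockLocations(status: LocatedFileStatus): Array[BlockLocation] =
  status.getBlockLocations.map { loc =>
    new BlockLocation(loc.getNames, loc.getHosts, loc.getOffset, loc.getLength)
  }
```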
…ables

## What changes were proposed in this pull request?

After PR #22592, the SQL tab supports collapsing tables. However, after refreshing the page, it doesn't restore its previous state. This was due to a typo in the argument list of collapseTablePageLoadCommand().

## How was this patch tested?

bin/spark-shell
```
sql("create table a (id int)")
for(i <- 1 to 100) sql(s"insert into a values ($i)")
```

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22650 from shahidki31/SPARK-25575-followUp.

Authored-by: Shahid <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request? Currently, we do not build external/spark-ganglia-lgpl in Jenkins tests when the code is changed. ## How was this patch tested? N/A Closes #22658 from gatorsmile/buildGanglia. Authored-by: gatorsmile <[email protected]> Signed-off-by: gatorsmile <[email protected]>
## What changes were proposed in this pull request?

This PR upgrades `lz4-java` to 1.5.0 to get a speed improvement.

**General speed improvements**

LZ4 decompression speed has always been a strong point. In v1.8.2, this gets even better, as it improves decompression speed by about 10%, thanks in a large part to a suggestion from svpv.

For example, on a Mac OS-X laptop with an Intel Core i7-5557U CPU 3.10GHz, running lz4 -bsilesia.tar compiled with default compiler llvm v9.1.0:

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
Decompression speed | 2490 MB/s | 2770 MB/s | +11%

Compression speeds also receive a welcome boost, though the improvement is not evenly distributed, with higher levels benefiting quite a lot more.

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
lz4 -1 | 504 MB/s | 516 MB/s | +2%
lz4 -9 | 23.2 MB/s | 25.6 MB/s | +10%
lz4 -12 | 3.5 MB/s | 9.5 MB/s | +170%

More details: https://github.com/lz4/lz4/releases/tag/v1.8.3

**Below is my benchmark result**

Set `spark.sql.parquet.compression.codec` to `lz4` and disable the orc benchmark, then run `FilterPushdownBenchmark`.

lz4-java 1.5.0:
```
[success] Total time: 5585 s, completed Sep 26, 2018 5:22:16 PM
```
lz4-java 1.4.0:
```
[success] Total time: 5591 s, completed Sep 26, 2018 5:22:24 PM
```
Some benchmark results:
```
lz4-java 1.5.0
Select 1 row with 500 filters:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                         1953 / 1980          0.0  1952502908.0       1.0X
Parquet Vectorized (Pushdown)              2541 / 2585          0.0  2541019869.0       0.8X

lz4-java 1.4.0
Select 1 row with 500 filters:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                         1979 / 2103          0.0  1979328144.0       1.0X
Parquet Vectorized (Pushdown)              2596 / 2909          0.0  2596222118.0       0.8X
```
Complete benchmark results:
https://issues.apache.org/jira/secure/attachment/12941360/FilterPushdownBenchmark-lz4-java-140-results.txt
https://issues.apache.org/jira/secure/attachment/12941361/FilterPushdownBenchmark-lz4-java-150-results.txt

## How was this patch tested?

manual tests

Closes #22551 from wangyum/SPARK-25539.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
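For reference, the codec switch used above can be set in a session like this (a sketch; the benchmark itself configures it through its own setup):
```scala
// Write Parquet files with LZ4 compression so FilterPushdownBenchmark exercises lz4-java.
spark.conf.set("spark.sql.parquet.compression.codec", "lz4")
```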
…type of Pandas.Series and return type of pandas udf

## What changes were proposed in this pull request?

For Pandas UDFs, we derive the Arrow type from the defined Catalyst return data type of the UDF and use this Arrow type to serialize the data. If the defined return data type doesn't match the actual return type of the Pandas.Series returned by the UDF, there is a risk of returning incorrect data from the Python side. Currently we don't have a reliable approach to check whether the data conversion is safe, so for now we add documentation to notify users of this. When a future PyArrow upgrade provides a way to check it, we should add an option to do so.

## How was this patch tested?

Only document change.

Closes #22610 from viirya/SPARK-25461.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
…method

## What changes were proposed in this pull request?

Refactor `HashByteArrayBenchmark` to use main method.

1. use `spark-submit`:
```console
bin/spark-submit --class org.apache.spark.sql.HashByteArrayBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
```

2. Generate benchmark result:
```console
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "catalyst/test:runMain org.apache.spark.sql.HashByteArrayBenchmark"
```

## How was this patch tested?

manual tests

Closes #22652 from wangyum/SPARK-25658.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
## What changes were proposed in this pull request?

Refactor `HashBenchmark` to use main method.

1. use `spark-submit`:
```console
bin/spark-submit --class org.apache.spark.sql.HashBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
```

2. Generate benchmark result:
```console
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "catalyst/test:runMain org.apache.spark.sql.HashBenchmark"
```

## How was this patch tested?

manual tests

Closes #22651 from wangyum/SPARK-25657.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
## What changes were proposed in this pull request? This is the same as #22492 but for master branch. Revert SPARK-14681 to avoid API breaking changes. cc: WeichenXu123 ## How was this patch tested? Existing unit tests. Closes #22618 from mengxr/SPARK-25321.master. Authored-by: WeichenXu <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
## What changes were proposed in this pull request? #12980 added Travis CI file mainly for linter because we disabled Java lint check in Jenkins. It's enabled as of #21399 and now SBT runs it. Looks we can now remove the file added before. ## How was this patch tested? N/A Closes #22665 Closes #22667 from HyukjinKwon/SPARK-25673. Authored-by: hyukjinkwon <[email protected]> Signed-off-by: hyukjinkwon <[email protected]>
## What changes were proposed in this pull request?

If we use accumulators in more than one UDF, it is possible to overwrite deserialized accumulators and their values. We should check whether an accumulator was deserialized before overwriting it in the accumulator registry.

## How was this patch tested?

Added test.

Closes #22635 from viirya/SPARK-25591.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
…stdCompressionCodec throwing IllegalArgumentException

## What changes were proposed in this pull request?

The documentation is updated with the proper class name org.apache.spark.io.ZStdCompressionCodec.

## How was this patch tested?

We set spark.io.compression.codec = org.apache.spark.io.ZStdCompressionCodec and verified the logs.

Closes #22669 from shivusondur/CompressionIssue.

Authored-by: shivusondur <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
…hon data and SQL types in normal UDFs ### What changes were proposed in this pull request? We are facing some problems about type conversions between Python data and SQL types in UDFs (Pandas UDFs as well). It's even difficult to identify the problems (see #20163 and #22610). This PR targets to internally document the type conversion table. Some of them looks buggy and we should fix them. ```python import sys import array import datetime from decimal import Decimal from pyspark.sql import Row from pyspark.sql.types import * from pyspark.sql.functions import udf if sys.version >= '3': long = int data = [ None, True, 1, long(1), "a", u"a", datetime.date(1970, 1, 1), datetime.datetime(1970, 1, 1, 0, 0), 1.0, array.array("i", [1]), [1], (1,), bytearray([65, 66, 67]), Decimal(1), {"a": 1}, Row(kwargs=1), Row("namedtuple")(1), ] types = [ BooleanType(), ByteType(), ShortType(), IntegerType(), LongType(), StringType(), DateType(), TimestampType(), FloatType(), DoubleType(), ArrayType(IntegerType()), BinaryType(), DecimalType(10, 0), MapType(StringType(), IntegerType()), StructType([StructField("_1", IntegerType())]), ] df = spark.range(1) results = [] count = 0 total = len(types) * len(data) spark.sparkContext.setLogLevel("FATAL") for t in types: result = [] for v in data: try: row = df.select(udf(lambda: v, t)()).first() ret_str = repr(row[0]) except Exception: ret_str = "X" result.append(ret_str) progress = "SQL Type: [%s]\n Python Value: [%s(%s)]\n Result Python Value: [%s]" % ( t.simpleString(), str(v), type(v).__name__, ret_str) count += 1 print("%s/%s:\n %s" % (count, total, progress)) results.append([t.simpleString()] + list(map(str, result))) schema = ["SQL Type \\ Python Value(Type)"] + list(map(lambda v: "%s(%s)" % (str(v), type(v).__name__), data)) strings = spark.createDataFrame(results, schema=schema)._jdf.showString(20, 20, False) print("\n".join(map(lambda line: " # %s # noqa" % line, strings.strip().split("\n")))) ``` This table was generated under Python 2 but the code above is Python 3 compatible as well. ## How was this patch tested? Manually tested and lint check. Closes #22655 from HyukjinKwon/SPARK-25666. Authored-by: hyukjinkwon <[email protected]> Signed-off-by: hyukjinkwon <[email protected]>
While working on another PR, I noticed that there is quite some legacy Java in there that can be beautified, for example by using features from Java 8 such as:
- Collection libraries
- Try-with-resources blocks

No logic has been changed. I think it is important to have a solid codebase with examples that will inspire the next PRs to follow the best practices. What are your thoughts on this? This makes code easier to read, and using try-with-resources makes it less likely to forget to close something.

## What changes were proposed in this pull request?

No changes in the logic of Spark, but more in the aesthetics of the code.

## How was this patch tested?

Using the existing unit tests. Since no logic is changed, the existing unit tests should pass.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22637 from Fokko/SPARK-25408.

Authored-by: Fokko Driesprong <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
…sPercent default to 100

## What changes were proposed in this pull request?

We want to change the default percentage to 100 for spark.shuffle.server.chunkFetchHandlerThreadsPercent. The reason is that it is currently set to 0, which means that if server.ioThreads > 0, the default number of threads would be 2 * #cores instead of server.io.Threads. We want the default to be server.io.Threads in case this is not set at all; here, a default of 0 would also mean 2 * #cores.

## How was this patch tested?

Manual

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22628 from redsanket/SPARK-25641.

Lead-authored-by: Sanket Chintapalli <[email protected]>
Co-authored-by: Sanket Chintapalli <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
## What changes were proposed in this pull request?

There were 5 suites extending `HadoopFsRelationTest`, testing the "orc"/"parquet"/"text"/"json" data sources. This PR refactors the base trait `HadoopFsRelationTest`:

1. Remove the unnecessary loop for setting the parquet conf.
2. The test case `SPARK-8406: Avoids name collision while writing files` takes about 14 to 20 seconds. As all the file format data sources now use common code for creating result files, we can test one data source (Parquet) only to reduce test time.

To run the related 5 suites:
```
./build/sbt "hive/testOnly *HadoopFsRelationSuite"
```
The total test run time is reduced from 5 minutes 40 seconds to 3 minutes 50 seconds.

## How was this patch tested?

Unit test

Closes #22643 from gengliangwang/refactorHadoopFsRelationTest.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
…ultiple watermarks

## What changes were proposed in this pull request?

Added:
- Python foreach
- Scala, Java and Python foreachBatch
- Multiple watermark policy
- The semantics of what changes are allowed to the streaming query between restarts

## How was this patch tested?

No tests

Closes #22627 from tdas/SPARK-25639.

Authored-by: Tathagata Das <[email protected]>
Signed-off-by: Tathagata Das <[email protected]>
…ateDataFrame in PySpark ## What changes were proposed in this pull request? This PR proposes to specify type inference and simple e2e tests. Looks we are not cleanly testing those logics. For instance, see https://github.com/apache/spark/blob/08c76b5d39127ae207d9d1fff99c2551e6ce2581/python/pyspark/sql/types.py#L894-L905 Looks we intended to support datetime.time and None for type inference too but it does not work: ``` >>> spark.createDataFrame([[datetime.time()]]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/.../spark/python/pyspark/sql/session.py", line 751, in createDataFrame rdd, schema = self._createFromLocal(map(prepare, data), schema) File "/.../spark/python/pyspark/sql/session.py", line 432, in _createFromLocal data = [schema.toInternal(row) for row in data] File "/.../spark/python/pyspark/sql/types.py", line 604, in toInternal for f, v, c in zip(self.fields, obj, self._needConversion)) File "/.../spark/python/pyspark/sql/types.py", line 604, in <genexpr> for f, v, c in zip(self.fields, obj, self._needConversion)) File "/.../spark/python/pyspark/sql/types.py", line 442, in toInternal return self.dataType.toInternal(obj) File "/.../spark/python/pyspark/sql/types.py", line 193, in toInternal else time.mktime(dt.timetuple())) AttributeError: 'datetime.time' object has no attribute 'timetuple' >>> spark.createDataFrame([[None]]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/.../spark/python/pyspark/sql/session.py", line 751, in createDataFrame rdd, schema = self._createFromLocal(map(prepare, data), schema) File "/.../spark/python/pyspark/sql/session.py", line 419, in _createFromLocal struct = self._inferSchemaFromList(data, names=schema) File "/.../python/pyspark/sql/session.py", line 353, in _inferSchemaFromList raise ValueError("Some of types cannot be determined after inferring") ValueError: Some of types cannot be determined after inferring ``` ## How was this patch tested? Manual tests and unit tests were added. Closes #22653 from HyukjinKwon/SPARK-25659. Authored-by: hyukjinkwon <[email protected]> Signed-off-by: hyukjinkwon <[email protected]>
…isticRegressionSuite ...with intercept with L1 regularization

## What changes were proposed in this pull request?

The test "multinomial logistic regression with intercept with L1 regularization" in `LogisticRegressionSuite` takes more than a minute due to the training of 2 logistic regression models. However, after analysing the training cost over iterations, we can reduce the computation time by 50%.

Training cost vs iteration for model 1: model1 is converging after iteration 150.

Training cost vs iteration for model 2: after around 100 iterations, model2 is converging.

So, if we set the maximum iterations for model1 and model2 to 175 and 125 respectively, we can reduce the computation time by half.

## How was this patch tested?

Computation time in local setup:
Before change: ~53 sec
After change: ~26 sec

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22659 from shahidki31/SPARK-25623.

Authored-by: Shahid <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request?

Currently the first row of a dataset of CSV strings is compared to the field names of the user-specified or inferred schema, independently of the presence of a CSV header. This causes false-positive error messages. For example, parsing `"1,2"` outputs the error:

```java
java.lang.IllegalArgumentException: CSV header does not conform to the schema.
 Header: 1, 2
 Schema: _c0, _c1
Expected: _c0 but found: 1
```

In the PR, I propose:
- Checking the CSV header only when it exists
- Filtering the header from the input dataset only if it exists

## How was this patch tested?

Added a test to `CSVSuite` which reproduces the issue.

Closes #22656 from MaxGekk/inferred-header-check.

Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
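A small spark-shell sketch of the scenario being fixed (schema and values are made up; `spark` is the shell session): with no header line in the data, no header/schema comparison should be attempted.
```scala
import spark.implicits._

// The dataset has no header line; with the fix, the first row "1,2" is parsed as data
// instead of being compared against the schema's field names.
val csvData = Seq("1,2").toDS()
spark.read
  .schema("_c0 INT, _c1 INT")
  .option("enforceSchema", false)
  .csv(csvData)
  .show()
```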
…not consume all the inputs ## What changes were proposed in this pull request? This PR is inspired by #22524, but proposes a safer fix. The current limit whole stage codegen has 2 problems: 1. It's only applied to `InputAdapter`, many leaf nodes can't stop earlier w.r.t. limit. 2. It needs to override a method, which will break if we have more than one limit in the whole-stage. The first problem is easy to fix, just figure out which nodes can stop earlier w.r.t. limit, and update them. This PR updates `RangeExec`, `ColumnarBatchScan`, `SortExec`, `HashAggregateExec`. The second problem is hard to fix. This PR proposes to propagate the limit counter variable name upstream, so that the upstream leaf/blocking nodes can check the limit counter and quit the loop earlier. For better performance, the implementation here follows `CodegenSupport.needStopCheck`, so that we only codegen the check only if there is limit in the query. For columnar node like range, we check the limit counter per-batch instead of per-row, to make the inner loop tight and fast. Why this is safer? 1. the leaf/blocking nodes don't have to check the limit counter and stop earlier. It's only for performance. (this is same as before) 2. The blocking operators can stop propagating the limit counter name, because the counter of limit after blocking operators will never increase, before blocking operators consume all the data from upstream operators. So the upstream operators don't care about limit after blocking operators. This is also for performance only, it's OK if we forget to do it for some new blocking operators. ## How was this patch tested? a new test Closes #22630 from cloud-fan/limit. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Kazuaki Ishizaki <[email protected]>
It would be nice to have a field in the Stage Page UI showing the mapping of the current stage id to the job ids to which that stage belongs.

## What changes were proposed in this pull request?

Added a field in the Stage UI to display the corresponding job id for that particular stage.

## How was this patch tested?

<img width="448" alt="screen shot 2018-07-25 at 1 33 07 pm" src="https://user-images.githubusercontent.com/22228190/43220447-a8e94f80-900f-11e8-8a20-a235bbd5a369.png">

Closes #21809 from pgandhi999/SPARK-24851.

Authored-by: pgandhi <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
The commons-crypto library does some questionable error handling internally, which can lead to JVM crashes if some call into native code fails and cleans up state it should not. While the library is not fixed, this change adds some workarounds in Spark code so that when an error is detected in the commons-crypto side, Spark avoids calling into the library further. Tested with existing and added unit tests. Closes #22557 from vanzin/SPARK-25535. Authored-by: Marcelo Vanzin <[email protected]> Signed-off-by: Imran Rashid <[email protected]>
…ts in Parquet ## What changes were proposed in this pull request? This is a follow up of #22574. Renamed the parameter and added comments. ## How was this patch tested? N/A Closes #22679 from gatorsmile/followupSPARK-25559. Authored-by: gatorsmile <[email protected]> Signed-off-by: DB Tsai <[email protected]>
## What changes were proposed in this pull request? This is a follow-up pr of #18536 and #22545 to update the migration guide. ## How was this patch tested? Build and check the doc locally. Closes #22682 from ueshin/issues/SPARK-20946_25525/migration_guide. Authored-by: Takuya UESHIN <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ts for a subset of timezones ## What changes were proposed in this pull request? Try testing timezones in parallel instead in CastSuite, instead of random sampling. See also #22631 ## How was this patch tested? Existing test. Closes #22672 from srowen/SPARK-25605.2. Authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…essionCodecSuite ## What changes were proposed in this pull request? Reduced the combination of codecs from 9 to 3 to improve the test runtime. ## How was this patch tested? This is a test fix. Closes #22641 from dilipbiswal/SPARK-25611. Authored-by: Dilip Biswal <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…re is an error connecting to master
## What changes were proposed in this pull request?
The cause of the error is wrapped in a SparkException; this change finds the cause inside the wrapped exception and throws the cause instead of the wrapper.
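A minimal sketch of the idea (hypothetical helper; the real change is in the standalone client code):
```scala
import org.apache.spark.SparkException

// Re-throw the root cause (e.g. the connection failure) instead of the generic wrapper.
def rethrowCause[T](body: => T): T =
  try body catch {
    case e: SparkException if e.getCause != null => throw e.getCause
  }
```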
## How was this patch tested?
Verified manually by checking the cause of the error; it gives the error shown below.
### Without the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output
```
### With the PR change
```
[apache-spark]$ ./bin/spark-submit --verbose --master spark://******
....
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
....
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Failed to connect to devaraj-pc1/10.3.66.65:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
....
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: devaraj-pc1/10.3.66.65:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
....
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
```
Closes #22623 from devaraj-kavali/SPARK-25636.
Authored-by: Devaraj K <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
## What changes were proposed in this pull request? Inspired by #22574 . We can partially push down top level conjunctive predicates to Orc. This PR improves Orc predicate push down in both SQL and Hive module. ## How was this patch tested? New unit test. Closes #22684 from gengliangwang/pushOrcFilters. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: DB Tsai <[email protected]>
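As an illustration of "partially push down top level conjunctive predicates" (table path, columns, and the UDF below are made up): for a filter like the one in this sketch, only the convertible half of the `AND` (`a < 10`) can be turned into an ORC search argument, while the UDF side is still evaluated by Spark.
```scala
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Hypothetical ORC table with integer columns `a` and `b`.
val isSmallPrime = udf((x: Int) => x > 1 && (2 until x).forall(x % _ != 0))
val df = spark.read.orc("/tmp/orc_table")
  .filter($"a" < 10 && isSmallPrime($"b"))
```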
## What changes were proposed in this pull request? Remove Hadoop 2.6 references and make 2.7 the default. Obviously, this is for master/3.0.0 only. After this we can also get rid of the separate test jobs for Hadoop 2.6. ## How was this patch tested? Existing tests Closes #22615 from srowen/SPARK-25016. Authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…urce V2 write path

## What changes were proposed in this pull request?

This PR proposes to avoid creating a readsupport and reading the schema when writing in other save modes. 5fef6e3 happened to create a readsupport in the write path, which ended up reading the schema from the readsupport at write time. This breaks the `spark.range(1).format("source").write.save("non-existent-path")` case since there's no way to read the schema from "non-existent-path".

See also #22009 (comment)
See also #22697
See also http://apache-spark-developers-list.1001551.n3.nabble.com/Possible-bug-in-DatasourceV2-td25343.html

## How was this patch tested?

Unit test and manual tests.

Closes #22688 from HyukjinKwon/append-revert-2.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
… method ## What changes were proposed in this pull request? 1. Refactor DataSourceReadBenchmark ## How was this patch tested? Manually tested and regenerated results. ``` SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark" ``` Closes #22664 from peter-toth/SPARK-25662. Lead-authored-by: Peter Toth <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: DB Tsai <[email protected]>
## What changes were proposed in this pull request?

Remove redundant semicolons in SortMergeJoinExec, thanks.

## How was this patch tested?

N/A

Closes #22695 from heary-cao/RedundantSemicolons.

Authored-by: caoxuewen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request? Remove SnappyOutputStreamWrapper and other workaround now that new Snappy fixes these. See also #21176 and comments it links to. ## How was this patch tested? Existing tests Closes #22691 from srowen/SPARK-24109. Authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]>
…streaming write to non-existing topic

## What changes were proposed in this pull request?

Set `kafka.max.block.ms` to 10 seconds while creating the Kafka writer. In the absence of this overridden config, it uses a default timeout of 60 seconds. With this change the test completes in close to 10 seconds as opposed to 1 minute.

## How was this patch tested?

This is a test fix.

Closes #22671 from dilipbiswal/SPARK-25615.

Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
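For reference, a hedged sketch of how such a producer override is passed through the Kafka sink options (the broker address and topic are made up, and `df` is assumed to be a DataFrame with a string `value` column):
```scala
// Options prefixed with "kafka." are handed to the underlying Kafka producer, so
// max.block.ms bounds how long a write to a missing topic blocks before failing.
df.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("kafka.max.block.ms", "10000")
  .option("topic", "nonexistent-topic")
  .save()
```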
… time, the number of bytes might rarely ever get updated

## What changes were proposed in this pull request?

If the records are incremented by more than 1 at a time, the number of bytes might rarely ever get updated, because it might skip over the count that is an exact multiple of UPDATE_INPUT_METRICS_INTERVAL_RECORDS.

This PR just checks whether the increment causes the value to exceed a higher multiple of UPDATE_INPUT_METRICS_INTERVAL_RECORDS.

## How was this patch tested?

existing unit tests

Closes #22594 from 10110346/inputMetrics.

Authored-by: liuxian <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
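A small sketch of the crossing check described above (the interval value here is only illustrative):
```scala
// Illustrative stand-in for UPDATE_INPUT_METRICS_INTERVAL_RECORDS.
val updateInterval = 1000L

// Update the byte metrics whenever the record counter crosses a multiple of the interval,
// even if it was incremented by more than 1 and skipped the exact multiple.
def shouldUpdateBytesRead(recordsBefore: Long, recordsAfter: Long): Boolean =
  recordsAfter / updateInterval > recordsBefore / updateInterval
```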
… Spark 3 ## What changes were proposed in this pull request? Removes all vestiges of Flume in the build, for Spark 3. I don't think this needs Jenkins config changes. ## How was this patch tested? Existing tests. Closes #22692 from srowen/SPARK-25598. Authored-by: Sean Owen <[email protected]> Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request?
1. Move `CSVDataSource.makeSafeHeader` to `CSVUtils.makeSafeHeader` (as is).
- Historically, and at the first refactoring (which I did), I intended to put all CSV-specific handling there (like options, filtering, and header extraction).
- See `JsonDataSource`. Now `CSVDataSource` is quite consistent with `JsonDataSource`. Since CSV's code path is quite complicated, we had better match them as much as we can.
2. Create `CSVHeaderChecker` and put `enforceSchema` logics into that.
- The header checking and column pruning stuff were added (per #20894 and #21296) but some of the code, such as #22123, is duplicated.
- Also, the header-checking code is scattered here and there; we had better put it in a single place, since that was quite error-prone. See (#22656).
3. Move `CSVDataSource.checkHeaderColumnNames` to `CSVHeaderChecker.checkHeaderColumnNames` (as is).
- Similar reasons above with 1.
## How was this patch tested?
Existing tests should cover this.
Closes #22676 from HyukjinKwon/refactoring-csv.
Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
… result
## What changes were proposed in this pull request?
This PR makes the test helper correctly fail with an assertion when a target function under test produces a result whose DataType has an incorrect nullable flag.
Consider the following example. In the future, a developer might write incorrect code that returns an unexpected result. This test has to fail since `valueContainsNull=false` while `expr` includes `null`. However, without this PR, this test passes. With this PR, it correctly fails.
```
test("test TARGETFUNCTON") {
val expr = TARGETMAPFUNCTON()
// expr = UnsafeMap(3 -> 6, 7 -> null)
// expr.dataType = (IntegerType, IntegerType, false)
expected = Map(3 -> 6, 7 -> null)
checkEvaluation(expr, expected)
```
In [`checkEvaluationWithUnsafeProjection`](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L208-L235), the results are compared using `UnsafeRow`, where the given `expected` is [converted](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L226-L227) to `UnsafeRow` using the `DataType` of `expr`.
```
val expectedRow = UnsafeProjection.create(Array(expression.dataType, expression.dataType)).apply(lit)
```
In summary, `expr` is `[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]` with and w/o this PR. `expected` is converted to
* w/o this PR, `[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]`
* with this PR, `[0,1800000038,5000000038,18,2,0,700000003,2,2,6,18,2,0,700000003,2,2,6]`
As a result, w/o this PR, the test unexpectedly passes.
This is because, w/o this PR, based on the given `dataType`, the generated projection code for `expected` avoids setting the null bit.
```
// tmpInput_2 is expected
/* 155 */ for (int index_1 = 0; index_1 < numElements_1; index_1++) {
/* 156 */ mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1));
/* 157 */ }
```
With this PR, the generated projection code for `expected` always checks, via `isNullAt`, whether the null bit should be set:
```
// tmpInput_2 is expected
/* 161 */ for (int index_1 = 0; index_1 < numElements_1; index_1++) {
/* 162 */
/* 163 */ if (tmpInput_2.isNullAt(index_1)) {
/* 164 */ mutableStateArray_1[1].setNull4Bytes(index_1);
/* 165 */ } else {
/* 166 */ mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1));
/* 167 */ }
/* 168 */
/* 169 */ }
```
## How was this patch tested?
Existing UTs
Closes #22375 from kiszk/SPARK-25388.
Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…ilize and can be applied infinitely

## What changes were proposed in this pull request?

The HandleNullInputsForUDF rule can generate new If nodes infinitely, thus causing problems such as missed matches of the SQL cache. This was fixed in SPARK-24891 and was then broken by SPARK-25044. The unit test in `AnalysisSuite` added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the `ScalaUDF` constructor signature change. So this PR also updates the test accordingly based on the new `ScalaUDF` constructor.

## How was this patch tested?

Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044.

Closes #22701 from maryannxue/spark-25690.

Authored-by: maryannxue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
## What changes were proposed in this pull request?

According to the SQL standard, when a query contains `HAVING`, it indicates an aggregate operator. For more details please refer to https://blog.jooq.org/2014/12/04/do-you-really-understand-sqls-group-by-and-having-clauses/

However, in the Spark SQL parser, we treat HAVING as a normal filter when there is no GROUP BY, which breaks SQL semantics and leads to wrong results. This PR fixes the parser.

## How was this patch tested?

new test

Closes #22696 from cloud-fan/having.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
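A spark-shell illustration of the semantics (the query is made up, not taken from the PR): under the standard, `HAVING` with no `GROUP BY` turns the query into a global aggregate.
```scala
// Returns exactly one row (c = 10) rather than acting as a per-row filter over range(10).
spark.sql("SELECT count(*) AS c FROM range(10) HAVING true").show()
```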