Change log

Generated on 2023-06-08

Release 23.06

Features

#8079 [FEA] Release Spark 3.4 Support
#7043 [FEA] Support Empty2Null expression on Spark 3.4.0
#8222 [FEA] String Split Unsupported escaped character '.'
#8211 [FEA] Add tencent blob store uri to spark rapids cloudScheme defaults
#4103 [FEA] jdk17 support
#7094 [FEA] Add a shim layer for Spark 3.2.4
#6202 [SPARK-39528][SQL] Use V2 Filter in SupportsRuntimeFiltering
#6034 [FEA] Support offset parameter in TakeOrderedAndProject
#8196 [FEA] Add retry handling to GpuGenerateExec.fixedLenLazyArrayGenerate path
#7891 [FEA] Support StddevSamp with cast(col as double) for input
#62 [FEA] stddevsamp function
#7867 [FEA] support json to struct function
#7883 [FEA] support order by string in windowing function
#7882 [FEA] support StringTranslate function
#7843 [FEA] build with CUDA 12
#8045 [FEA] Support repetition in choice on regular expressions
#6882 [FEA] Regular expressions - support line anchors in choice
#7901 [FEA] better rlike function supported
#7784 [FEA] Add Spark 3.3.3-SNAPSHOT to shims
#7260 [FEA] Create a new Expression execution framework

Performance

#7870 [FEA] Turn on spark.rapids.sql.castDecimalToString.enabled by default
#7321 [FEA] Improve performance of small file ORC reads from blobstores
#7672 Make all buffers/columnar batches spillable by default

Bugs Fixed

#8483 [BUG] test_read_compressed_hive_text fails on CDH
#8330 [BUG] Handle Decimal128 computation with overflow of Remainder on Spark 3.4
#8448 [BUG] GpuRegExpReplaceWithBackref with empty string input produces incorrect result on GPU in Spark 3.1.1
#8323 [BUG] regexp_replace hangs with specific inputs and patterns
#8473 [BUG] Complete aggregation with non-trivial grouping expression fails
#8440 [BUG] the jar with scaladoc overwrites the jar with javadoc
#8469 [BUG] Multi-threaded reader can't be toggled on/off
#8460 [BUG] Compile failure on Databricks 11.3 with GpuHiveTableScanExec.scala
#8114 [BUG] [AUDIT] [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory
#6786 [BUG] NDS q95 fails with OOM at 10TB
#8419 [BUG] Hive Text reader fails for GZIP compressed input
#8409 [BUG] JVM agent crashed SIGFPE cudf::detail::repeat in integration tests
#8411 [BUG] Close called too many times in Gpu json reader
#8400 [BUG] Cloudera IT test failures - test_timesub_from_subquery
#8240 [BUG] NDS power run hits GPU OOM on Databricks.
#8375 [BUG] test_empty_filter[>] failed in 23.06 nightly
#8363 [BUG] ORC reader NullPointerException
#8281 [BUG] ParquetCachedBatchSerializer is crashing on count
#8331 [BUG] Filter on dates with subquery results in ArrayIndexOutOfBoundsException
#8293 [BUG] GpuTimeAdd throws UnsupportedOperationException: takes column and interval as an argument only
#8161 Add support for Remainder[DecimalType] for Spark 3.4 and DB 11.3
#8321 [BUG] test_read_hive_fixed_length_char integ test fails on Spark 3.4
#8225 [BUG] GpuGetArrayItem only supports ints as the ordinal.
#8294 [BUG] ORC CHAR(N) columns written from Hive unreadable with RAPIDS plugin
#8186 [BUG] integration test test_cast_nested can fail with non-empty nulls
#6190 [SPARK-39731][SQL] Fix issue in CSV data sources when parsing dates in "yyyyMMdd" format with CORRECTED time parser policy
#8185 [BUG] Scala Test md5 can produce non-empty nulls (merge and set validity)
#8235 [BUG] Java agent crashed intermittently running integration tests
#7485 [BUG] stop using mergeAndSetValidity for any nested type
#8263 [BUG] Databricks 11.3 - Task failed while writing rows for Delta table - java.lang.Integer cannot be cast to java.lang.Long
#7898 Override canonicalized method to the Expressions
#8254 [BUG] Unable to determine Databricks version in azure Databricks instances
#6967 [BUG] Parquet List corner cases fail to be parsed
#6991 [BUG] Integration test failures in Spark - 3.4 SNAPSHOT build
#7773 [BUG] udf test failed cudf-py 23.04 ENV setup on databricks 11.3 runtime
#7934 [BUG] User app fails with OOM - GpuOutOfCoreSortIterator
#8214 [BUG] Exception when counting rows in an ORC file that has no column names
#8160 [BUG] Arithmetic_ops_test failure for Spark 3.4
#7495 Update GpuDataSource to match the change in Spark 3.4
#8189 [BUG] test_array_element_at_zero_index_fail test failures in Spark 3.4
#8043 [BUG] Host memory leak in SerializedBatchIterator
#8194 [BUG] JVM agent crash intermittently in CI integration test
#6182 [SPARK-39319][CORE][SQL] Make query contexts as a part of SparkThrowable
#7491 [AUDIT][SPARK-41448][SQL] Make consistent MR job IDs in FileBatchWriter and FileFormatWriter
#8149 [BUG] dataproc init script does not fail clearly with newer versions of CUDA
#7624 [BUG] test_parquet_write_encryption_option_fallback failed
#8019 [BUG] Spark-3.4 - Integration test failures due to GpuCreateDataSourceTableAsSelectCommand
#8017 [BUG] Spark-3.4 Integration tests failure due to InsertIntoHadoopFsRelationCommand not running on GPU
#7492 [AUDIT][SPARK-41468][SQL][FOLLOWUP] Handle NamedLambdaVariables in EquivalentExpressions
#6987 [BUG] Unit Test failures in Spark-3.4 SNAPSHOT build
#8171 [BUG] ORC read failure when reading decimals with different precision/scale from write schema
#7216 [BUG] The PCBS tests fail on Spark 340
#8016 [BUG] Spark-3.4 - Integration tests failure due to missing InsertIntoHiveTable operator in GPU
#8166 Databricks Delta defaults to LEGACY for int96RebaseModeInWrite
#8147 [BUG] test_substring_column failed
#8164 [BUG] failed AnsiCastShim build in databricks 11.3 runtime
#7757 [BUG] Unit tests failure in AnsiCastOpSuite on Spark-3.4
#7756 [BUG] Unit test failure in AdaptiveQueryExecSuite on Spark-3.4
#8153 [BUG] get-shim-versions-from-dist workflow failing in CI
#7961 [BUG] understand why unspill can throw an OutOfMemoryError and not a RetryOOM
#7755 [BUG] Unit tests failures in WindowFunctionSuite and CostBasedOptimizerSuite on Spark-3.4
#7752 [BUG] Test in CastOpSuite fails on Spark-3.4
#7754 [BUG] unit test nz timestamp fails on Spark-3.4
#7018 [BUG] The unit test sorted partitioned write fails on Spark 3.4
#8015 [BUG] Spark 3.4 - Integration tests failure due to unsupported KnownNullable operator in Window
#7751 [BUG] Unit test Write encrypted ORC fallback fails on Spark-3.4
#8117 [BUG] Compile error in RapidsErrorUtils when building against Spark 3.4.0 release
#5659 [BUG] Minimize false positives when falling back to CPU for end of line/string anchors and newlines
#8012 [BUG] Integration tests failing due to CreateDataSourceTableAsSelectCommand in Spark-3.4
#8061 [BUG] join_test failed in integration tests
#8018 [BUG] Spark-3.4 - Integration test failures in window aggregations for decimal types
#7581 [BUG] INC AFTER CLOSE for ColumnVector during shutdown in the join code

PRs

#8441 Memoizing DataGens in integration tests
#8516 Avoid calling Table.merge with BinaryType columns
#8515 Fix warning about deprecated parquet config
#8427 [Doc] address Spark RAPIDS NVAIE VDR issues [skip ci]
#8486 Move task completion listener registration to after variables are initialized
#8481 Removed spark.rapids.sql.castDecimalToString.enabled and enabled GPU decimal to string by default
#8485 Disable test_read_compressed_hive_text on CDH.
#8488 Adds note on multi-threaded shuffle targeting <= 200 partitions and on TCP keep-alive for UCX [skip ci]
#8414 Add support for computing remainder with Decimal128 operands with more precision on Spark 3.4
#8433 Add regression test for regexp_replace hanging with some inputs
#8477 Fix input binding of grouping expressions for complete aggregations
#8464 Remove NOP Maven javadoc plugin definition
#8402 Bring back UCX 1.14
#8470 Ensure the MT shuffle reader enables/disables with spark.rapids.shuff…
#8462 Fix compressed Hive text read on
#8458 Add check for negative id when creating new MR job id
#8437 Implement the bug fix for SPARK-41448 and shim it for Spark 3.2.4 and Spark 3.3.{2,3}
#8420 Fix reads for GZIP compressed Hive Text.
#8445 Document errors/warns in the logs during catalog shutdown [skip ci]
#8438 Revert "skip test_array_repeat_with_count_scalar for now (#8424)"
#8385 Reduce memory usage in GpuFileFormatDataWriter and GpuDynamicPartitionDataConcurrentWriter
#8304 Support combining small files for multi-threaded ORC reads
#8413 Stop double closing in json scan + skip test
#8430 Update docs for spark.rapids.filecache.checkStale default change [skip ci]
#8424 skip test_array_repeat_with_count_scalar to wait for fix #8409
#8405 Change TimeAdd/Sub subquery tests to use min/max
#8408 Document conventional dist jar layout for single-shim deployments [skip ci]
#8394 Removed "peak device memory" metric
#8378 Use spillable batch with retry in GpuCachedDoublePassWindowIterator
#8392 Update IDEA dev instructions [skip ci]
#8387 Rename inconsistent profiles in api_validation
#8374 Avoid processing empty batch in ParquetCachedBatchSerializer
#8386 Fix check to do positional indexing in ORC
#8360 use matrix to combine multiple jdk* jobs in maven-verify CI [skip ci]
#8371 Fix V1 column name match is case-sensitive when dropping partition by columns
#8368 Doc Update: Clarify both line anchors ^ and $ for regular expression compatibility [skip ci]
#8377 Avoid a possible race in test_empty_filter
#8354 [DOCS] Updating tools docs in spark-rapids [skip ci]
#8341 Enable CachedBatchWriterSuite.testCompressColBatch
#8264 Make tables spillable by default
#8364 Fix NullPointerException in ORC multithreaded reader where we access context that could be null
#8322 Avoid out of bounds on GpuInMemoryTableScan when reading no columns
#8342 Eliminate javac warnings
#8334 Add in support for filter on empty batch
#8355 Speed up github verify checks [skip ci]
#8356 Enable auto-merge from branch-23.06 to branch-23.08 [skip ci]
#8339 Fix withResource order in GpuGenerateExec
#8340 Stop calling contiguousSplit without splits from GpuSortExec
#8333 Fix GpuTimeAdd handling both input expressions being GpuScalar
#8302 Add support for DecimalType in Remainder for Spark 3.4 and DB 11.3
#8325 Disable test_read_hive_fixed_length_char on Spark 3.4+.
#8327 Enable spark.sql.legacy.parquet.nanosAsLong for Spark 3.2.4
#8328 Fix Hive text file write to deal with CUDF changes
#8309 Fix GpuTopN with offset for multiple batches
#8306 Update code to deal with new retry semantics
#8307 Full ordinal support in GetArrayItem
#8243 Enable retry for Parquet writes
#8295 Fix ORC reader for CHAR(N) columns written from Hive
#8298 Append new authorized user to blossom-ci whitelist [skip ci]
#8276 Fallback to CPU for enableDateTimeParsingFallback configuration
#8296 Fix Multithreaded Readers working with Unity Catalog on Databricks
#8273 Add support for escaped dot in character class in regexp parser
#8266 Add test to confirm correct behavior for decimal average in Spark 3.4
#8291 Fix delta stats tracker conf
#8287 Fix Delta write stats if data schema is missing columns relative to table schema
#8286 Add Tencent cosn:// to default cloud schemes
#8283 Add split and retry support for filter
#8290 Pre-merge docker build stage to support containerd runtime [skip ci]
#8257 Support cuda12 jar's release [skip CI]
#8274 Add a unit test for reordered canonicalized expressions in BinaryComparison
#8265 Small code cleanup for pattern matching on Decimal type
#8255 Enable locals,patvars,privates unused Scalac checks
#8234 JDK17 build support in CI
#8256 Use env var with version files as fallback for IT DBR version
#8239 Add Spark 3.2.4 shim
#8221 [Doc] update getting started guide based on latest databricks env [skip ci]
#8224 Fix misinterpretation of Parquet's legacy ARRAY schemas.
#8241 Update to filecache API changes
#8244 Remove semicolon at the end of the package statement in Scala files
#8245 Remove redundant open of ORC file
#8252 Fix auto merge conflict 8250 [skip ci]
#8170 Update GpuRunningWindowExec to use OOM retry framework
#8218 Update to add 340 build and unit test in premerge and in JDK 11 build
#8232 Add integration tests for inferred schema
#8223 Use SupportsRuntimeV2Filtering in Spark 3.4.0
#8233 cudf-udf integration test against python3.9 [skip ci]
#8226 Offset support for TakeOrderedAndProject
#8237 Use weak keys in executor broadcast plan cache
#8229 Upgrade to jacoco 0.8.8 for JDK 17 support
#8216 Add oom retry handling for GpuGenerate.fixedLenLazyArrayGenerate
#8191 Add in retry-work to GPU OutOfCore Sort
#8228 Partial JDK 17 support
#8227 Adjust defaults for better performance out of the box
#8212 Add file caching
#8179 Fall back to CPU for try_cast in Spark 3.4.0
#8220 Batch install-file executions in a single JVM
#8215 Fix count from ORC files with no column names
#8192 Handle PySparkException in case of literal expressions
#8190 Fix element_at_index_zero integration test by using newer error message from Spark 3.4.0
#8203 Clean up queued batches on task failures in RapidsShuffleThreadedBlockIterator
#8207 Support std aggregation in reduction
#8174 [FEA] support json to struct function
#8195 Bump mockito to 3.12.4
#8193 Increase databricks cluster autotermination to 6.5 hours [skip ci]
#8182 Support STRING order-by columns for RANGE window functions
#8167 Add oom retry handling to GpuGenerateExec.doGenerate path
#8183 Disable asserts for non-empty nulls
#8177 Fix 340 shim of GpuCreateDataSourceTableAsSelectCommand and shim GpuDataSource for 3.4.0
#8159 Verify CPU fallback class when creating HIVE table [Databricks]
#8180 Follow-up for ORC Decimal read failure (#8172)
#8172 Fix ORC decimal read when precision/scale changes
#7227 Fix PCBS integration tests for Spark-3.4
#8175 Restore test_substring_column
#8162 Support Java 17 for packaging
#8169 Fix AnsiCastShim for 330db
#8168 [DOC] Updating profiling/qualification docs for usability improvements [skip ci]
#8144 Add 340 shim for GpuInsertIntoHiveTable
#8143 Add handling for SplitAndRetryOOM in nextCbFromGatherer
#8102 Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0
#8152 Fix Spark-3.4 test failure in AdaptiveQueryExecSuite
#8154 Use repo1.maven.org/maven2 instead of default apache central url
#8150 xfail test_substring_column
#8128 Fix CastOpSuite failures with Spark 3.4
#8145 Fix nz timestamp unit tests
#8146 Set version of slf4j for Spark 3.4.0
#8058 Add retry to BatchByKeyIterator
#8142 Enable ParquetWriterSuite test 'sorted partitioned write' on Spark 3.4.0
#8035 [FEA] support StringTranslate function
#8136 Add GPU support for KnownNullable expression (Spark 3.4.0)
#8096 Add OOM retry handling for existence joins
#8139 Fix auto merge conflict 8138 [skip ci]
#8135 Fix Orc writer test failure with Spark 3.4
#8129 Fix compile error with Spark 3.4.0 release and bump to use 3.4.0 release JAR
#8093 Add cuda12 build support [skip ci]
#8108 Make Arm methods static
#8060 Support repetitions in regexp choice expressions
#8081 Re-enable empty repetition near end-of-line anchor for rlike, regexp_extract and regexp_replace
#8075 Update some integration tests so that they are compatible with Spark 3.4.0
#8063 Update docker to support integration tests against JDK17 [skip ci]
#8047 Enable line/string anchors in choice
#7996 Sub-partitioning supports repartitioning the input data multiple times
#8009 Add in some more retry blocks
#8051 MINOR: Improve assertion error in assert_py4j_exception
#8020 [FEA] Add Spark 3.3.3-SNAPSHOT to shims
#8034 Fix the check for dedicated per-shim files [skip ci]
#7978 Update JNI and private deps version to 23.06.0-SNAPSHOT
#7965 Remove stale references to the pre-shimplify dirs
#7948 Init plugin version 23.06.0-SNAPSHOT

Older Releases

Changelog of older releases can be found at docs/archives