Beam 2.52.0 release
We are happy to present the new 2.52.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.52.0, check out the detailed release notes.
Highlights
- Previously deprecated Avro-dependent code (Beam Release 2.46.0) has been finally removed from Java SDK "core" package.
Please, usebeam-sdks-java-extensions-avro
instead. This will allow to easily update Avro version in user code without
potential breaking changes in Beam "core" since the Beam Avro extension already supports the latest Avro versions and
should handle this. (#25252). - Publishing Java 21 SDK container images now supported as part of Apache Beam release process. (#28120)
- Direct Runner and Dataflow Runner support running pipelines on Java21 (experimental until tests fully setup). For other runners (Flink, Spark, Samza, etc) support status depend on runner projects.
New Features / Improvements
- Add
UseDataStreamForBatch
pipeline option to the Flink runner. When it is set to true, Flink runner will run batch
jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed
using the DataSet API. upload_graph
as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK (PR#28621.- state amd side input cache has been enabled to a default of 100 MB. Use
--max_cache_memory_usage_mb=X
to provide cache size for the user state API and side inputs. (Python) (#28770). - Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework which includes a preliminary set of IO's and turnkey transforms. More information can be found in the YAML root folder and in the README.
Breaking Changes
org.apache.beam.sdk.io.CountingSource.CounterMark
uses customCounterMarkCoder
as a default coder since all Avro-dependent
classes finally moved toextensions/avro
. In case if it's still required to useAvroCoder
forCounterMark
, then,
as a workaround, a copy of "old"CountingSource
class should be placed into a project code and used directly
(#25252).- Renamed
host
tofirestoreHost
inFirestoreOptions
to avoid potential conflict of command line arguments (Java) (#29201).
Bugfixes
- Fixed "Desired bundle size 0 bytes must be greater than 0" in Java SDK's BigtableIO.BigtableSource when you have more cores than bytes to read (Java) #28793.
watch_file_pattern
arg of the RunInference arg had no effect prior to 2.52.0. To use the behavior of argwatch_file_pattern
prior to 2.52.0, follow the documentation at https://beam.apache.org/documentation/ml/side-input-updates/ and useWatchFilePattern
PTransform as a SideInput. (#28948)MLTransform
doesn't output artifacts such as min, max and quantiles. Instead,MLTransform
will add a feature to output these artifacts as human readable format - #29017. For now, to use the artifacts such as min and max that were produced by the earilerMLTransform
, useread_artifact_location
ofMLTransform
, which reads artifacts that were produced earlier in a differentMLTransform
(#29016)- Fixed a memory leak, which affected some long-running Python pipelines: #28246.
Security Fixes
- Fixed CVE-2023-39325 (Java/Python/Go) (#29118).
- Mitigated CVE-2023-47248 (Python) #29392.
List of Contributors
According to git shortlog, the following people contributed to the 2.52.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Aleksandr Dudko
Alexey Romanenko
Anand Inguva
Andrei Gurau
Andrey Devyatkin
BjornPrime
Bruno Volpato
Bulat
Chamikara Jayalath
Damon
Danny McCormick
Devansh Modi
Dominik Dębowczyk
Ferran Fernández Garrido
Hai Joey Tran
Israel Herraiz
Jack McCluskey
Jan Lukavský
JayajP
Jeff Kinard
Jeffrey Kinard
Jiangjie Qin
Jing
Joar Wandborg
Johanna Öjeling
Julien Tournay
Kanishk Karanawat
Kenneth Knowles
Kerry Donny-Clark
Luís Bianchin
Minbo Bae
Pranav Bhandari
Rebecca Szper
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
RyuSA
Shunping Huang
Steven van Rossum
Svetak Sundhar
Tony Tang
Vitaly Terentyev
Vivek Sumanth
Vlado Djerek
Yi Hu
aku019
brucearctor
caneff
damccorm
ddebowczyk92
dependabot[bot]
dpcollins-google
edman124
gabry.wu
illoise
johnjcasey
jonathan-lemos
kennknowles
liferoad
magicgoody
martin trieu
nancyxu123
pablo rodriguez defino
tvalentyn