Releases: apache/beam
Beam 2.60.0 release
We are happy to present the new 2.60.0 release of Beam.
This release includes both improvements and new functionality.
For more information on changes in 2.60.0, check out the detailed release notes.
Highlights
- Added support for using vLLM in the RunInference transform (Python) (#32528)
- [Managed Iceberg] Added support for streaming writes (#32451)
- [Managed Iceberg] Added auto-sharding for streaming writes (#32612)
- [Managed Iceberg] Added support for writing to dynamic destinations (#32565)
New Features / Improvements
- Dataflow worker can install packages from Google Artifact Registry Python repositories (Python) (#32123).
- Added support for Zstd codec in SerializableAvroCodecFactory (Java) (#32349)
- Added support for using vLLM in the RunInference transform (Python) (#32528)
- Prism release binaries and container bootloaders are now being built with the latest Go 1.23 patch. (#32575)
- Prism
- Prism now supports Bundle Finalization. (#32425)
- Significantly improved performance of Kafka IO reads that enable commitOffsetsInFinalize by removing the data reshuffle from SDF implementation. (#31682).
- Added support for dynamic writing in MqttIO (Java) (#19376)
- Optimized Spark Runner parDo transform evaluator (Java) (#32537)
- [Managed Iceberg] More efficient manifest file writes/commits (#32666)
Breaking Changes
- In Python, assert_that now throws if it is not in a pipeline context instead of silently succeeding (#30771)
- In Python and YAML, ReadFromJson now overrides the dtype from None to an explicit False. Most notably, string values like "123" are preserved as strings rather than silently coerced (and possibly truncated) to numeric values. To retain the old behavior, pass dtype=True (or any other value accepted by pandas.read_json).
- Users of KafkaIO Read transform that enable commitOffsetsInFinalize might encounter pipeline graph compatibility issues when updating the pipeline. To mitigate, set the updateCompatibilityVersion option to the SDK version used for the original pipeline, for example --updateCompatibilityVersion=2.58.1
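To illustrate why the ReadFromJson dtype change above matters, the following is a minimal plain-Python sketch of what an eager numeric coercion does to string values. It is not Beam or pandas code; `coerce_if_numeric` is a hypothetical helper written only for this illustration, and the exact coercion rules pandas applies may differ.

```python
def coerce_if_numeric(value: str):
    """Mimic an eager coercion: return an int or float when the string parses as one.

    Hypothetical helper for illustration only; not part of Beam or pandas.
    """
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        # Not numeric-looking, so the string is kept unchanged.
        return value


# Identifiers such as ZIP codes or account numbers often look numeric.
raw_values = ["123", "00123", "12345678901234567890.5", "abc"]

for raw in raw_values:
    coerced = coerce_if_numeric(raw)
    # A value is "lossy" when converting it back to text no longer gives the input.
    lossy = str(coerced) != raw
    print(f"{raw!r} -> {coerced!r} (information lost: {lossy})")
```

Leading zeros and digits beyond float precision are the typical casualties, which is why keeping such values as strings by default is the safer behavior.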
Deprecations
- Python 3.8 is reaching EOL and support is being removed in Beam 2.61.0. The 2.60.0 release will warn users when running on 3.8. (#31192)
Bugfixes
- (Java) Fixed custom delimiter issues in TextIO (#32249, #32251).
- (Java, Python, Go) Fixed PeriodicSequence backlog bytes reporting, which was preventing Dataflow Runner autoscaling from functioning properly (#32506).
- (Java) Fix improper decoding of rows with schemas containing nullable fields when encoded with a schema with equal encoding positions but modified field order. (#32388).
Known Issues
N/A
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.60.0 release. Thank you to all contributors!
Ahmed Abualsaud, Aiden Grossman, Arun Pandian, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, DKPHUONG, Damon Douglass, Danny McCormick, Dip Patel, Ferran Fernández Garrido, Hai Joey Tran, Hyeonho Kim, Igor Bernstein, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jeff Kinard, Jeffrey Kinard, Joey Tran, Kenneth Knowles, Kirill Berezin, Michel Davit, Minbo Bae, Naireen Hussain, Niel Markwick, Nito Buendia, Reeba Qureshi, Reuven Lax, Robert Bradshaw, Robert Burke, Rohit Sinha, Ryan Fu, Sam Whittle, Shunping Huang, Svetak Sundhar, Udaya Chathuranga, Vitaly Terentyev, Vlado Djerek, Yi Hu, Claude van der Merwe, XQ Hu, Martin Trieu, Valentyn Tymofieiev, twosom
Beam 2.59.0 release
We are happy to present the new 2.59.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.59.0, check out the detailed release notes.
Highlights
- Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
- Initial experimental support for using Prism with the Java and Python SDKs
- Prism is presently targeting local testing usage, or other small scale execution.
- For Java, use 'PrismRunner', or 'TestPrismRunner' as an argument to the --runner flag.
- For Python, use 'PrismRunner' as an argument to the --runner flag.
- Go already uses Prism as the default local runner.
I/Os
- Improvements to the performance of BigqueryIO when using withPropagateSuccessfulStorageApiWrites(true) method (Java) (#31840).
- [Managed Iceberg] Added support for writing to partitioned tables (#32102)
- Update ClickHouseIO to use the latest version of the ClickHouse JDBC driver (#32228).
- Add ClickHouseIO dedicated User-Agent (#32252).
New Features / Improvements
- BigQuery endpoint can be overridden via PipelineOptions, this enables BigQuery emulators (Java) (#28149).
- Go SDK Minimum Go Version updated to 1.21 (#32092).
- [BigQueryIO] Added support for withFormatRecordOnFailureFunction() for STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE methods (Java) (#31354).
- Updated Go protobuf package to new version (Go) (#21515).
- Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
- Adds OrderedListState support for Java SDK via FnApi.
- Initial support for using Prism from the Python and Java SDKs.
Bugfixes
- Fixed incorrect service account impersonation flow for Python pipelines using BigQuery IOs (#32030).
- Auto-disable broken and meaningless upload_graph feature when using Dataflow Runner V2 (#32159).
- (Python) Upgraded google-cloud-storage to version 2.18.2 to fix a data corruption issue (#32135).
- (Go) Fix corruption on State API writes. (#32245).
Known Issues
- Prism is under active development and does not yet support all pipelines. See #29650 for progress.
- In the 2.59.0 release, Prism passes most runner validation tests with the exception of pipelines using the following features: OrderedListState, OnWindowExpiry (e.g. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder related issues such as with Python combiner packing, Java Schema transforms, and heterogeneous flatten coders. Processing Time timers do not yet have real time support.
- If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.59.0 release. Thank you to all contributors!
Ahmed Abualsaud,Ahmet Altay,Andrew Crites,atask-g,Axel Magnuson,Ayush Pandey,Bartosz Zablocki,Chamikara Jayalath,cutiepie-10,Damon,Danny McCormick,dependabot[bot],Eddie Phillips,Francis O'Hara,Hyeonho Kim,Israel Herraiz,Jack McCluskey,Jaehyeon Kim,Jan Lukavský,Jeff Kinard,Jeffrey Kinard,jonathan-lemos,jrmccluskey,Kirill Berezin,Kiruphasankaran Nataraj,lahariguduru,liferoad,lostluck,Maciej Szwaja,Manit Gupta,Mark Zitnik,martin trieu,Naireen Hussain,Prerit Chandok,Radosław Stankiewicz,Rebecca Szper,Robert Bradshaw,Robert Burke,ron-gal,Sam Whittle,Sergei Lilichenko,Shunping Huang,Svetak Sundhar,Thiago Nunes,Timothy Itodo,tvalentyn,twosom,Vatsal,Vitaly Terentyev,Vlado Djerek,Yifan Ye,Yi Hu
Beam 2.58.1 release
We are happy to present the new 2.58.1 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
New Features / Improvements
- Fixed issue where KafkaIO Records read with ReadFromKafkaViaSDF are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka (#32196).
Known Issues
- Large Dataflow graphs using runner v2, or pipelines explicitly enabling the upload_graph experiment, will fail at construction time (#32159).
- Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
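The GCS data corruption issue above depends on two installed versions, so a quick environment check can help decide whether the workaround applies. The sketch below uses only the Python standard library. The helper names are hypothetical, and it assumes every SDK from 2.53.0 up to (but not including) 2.59.0 is in scope, since the fix is stated to land in 2.59.0.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Return the first three numeric components of a version string as ints."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def gcs_corruption_risk(beam_version: str, gcs_version: str) -> bool:
    """True when the Beam SDK is in the affected range and google-cloud-storage is older than 2.18.2."""
    beam_affected = (2, 53, 0) <= version_tuple(beam_version) < (2, 59, 0)
    gcs_unfixed = version_tuple(gcs_version) < (2, 18, 2)
    return beam_affected and gcs_unfixed


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    beam_v = installed_version("apache-beam")
    gcs_v = installed_version("google-cloud-storage")
    if beam_v is None or gcs_v is None:
        print("apache-beam or google-cloud-storage is not installed in this environment")
    elif gcs_corruption_risk(beam_v, gcs_v):
        print(f"At risk: apache-beam {beam_v} with google-cloud-storage {gcs_v}")
    else:
        print("Installed versions are outside the affected combination")
```

The same check should also be run against the environment used by the workers, which may differ from the launcher's environment.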
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.58.1 release. Thank you to all contributors!
Danny McCormick
Sam Whittle
Beam 2.58.0 release
We are happy to present the new 2.58.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information about changes in 2.58.0, check out the detailed release notes.
I/Os
New Features / Improvements
- Multiple RunInference instances can now share the same model instance by setting the model_identifier parameter (Python) (#31665).
- Added options to control the number of Storage API multiplexing connections (#31721)
- [BigQueryIO] Better handling for batch Storage Write API when it hits AppendRows throughput quota (#31837)
- [IcebergIO] All specified catalog properties are passed through to the connector (#31726)
- Removed a third-party LGPL dependency from the Go SDK (#31765).
- Support for MapState and SetState when using Dataflow Runner v1 with Streaming Engine (Java) (#18200)
Breaking Changes
- [IcebergIO] IcebergCatalogConfig was changed to support specifying catalog properties in a key-store fashion (#31726)
- [SpannerIO] Added validation that query and table cannot be specified at the same time for SpannerIO.read(). Previously, withQuery overrode withTable, if set (#24956).
Bug fixes
- [BigQueryIO] Fixed a bug in batch Storage Write API that frequently exhausted concurrent connections quota (#31710)
List of Contributors
According to git shortlog, the following people contributed to the 2.58.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Alexandre Moueddene
Alexey Romanenko
Andrew Crites
Bartosz Zablocki
Celeste Zeng
Chamikara Jayalath
Clay Johnson
Damon Douglass
Danny McCormick
Dilnaz Amanzholova
Florian Bernard
Francis O'Hara
George Ma
Israel Herraiz
Jack McCluskey
Jaehyeon Kim
James Roseman
Kenneth Knowles
Maciej Szwaja
Michel Davit
Minh Son Nguyen
Naireen
Niel Markwick
Oliver Cardoza
Robert Bradshaw
Robert Burke
Rohit Sinha
S. Veyrié
Sam Whittle
Shunping Huang
Svetak Sundhar
TongruiLi
Tony Tang
Valentyn Tymofieiev
Vitaly Terentyev
Yi Hu
Beam 2.57.0 Release
We are happy to present the new 2.57.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.57.0, check out the detailed release notes.
Highlights
I/Os
- Ensure that BigtableIO closes the reader streams (#31477).
New Features / Improvements
- Added Feast feature store handler for enrichment transform (Python) (#30957).
- BigQuery per-worker metrics are reported by default for Streaming Dataflow Jobs (Java) (#31015)
- Adds inMemory() variant of Java List and Map side inputs for more efficient lookups when the entire side input fits into memory.
- Beam YAML now supports the jinja templating syntax. Template variables can be passed with the (json-formatted) --jinja_variables flag.
- DataFrame API now supports pandas 2.1.x and adds 12 more string functions for Series (#31185).
- Added BigQuery handler for enrichment transform (Python) (#31295)
- Disable soft delete policy when creating the default bucket for a project (Java) (#31324).
- Added DoFn.SetupContextParam and DoFn.BundleContextParam which can be used as a python DoFn.process, Map, or FlatMap parameter to invoke a context manager per DoFn setup or bundle (analogous to using setup/teardown or start_bundle/finish_bundle respectively.)
- Go SDK Prism Runner
- Pre-built Prism binaries are now part of the release and are available via the Github release page. (#29697).
- Some pipelines will work on Java and Python, but this is in part to prepare for real runner wrappers in 2.58.0
- ProcessingTime is now handled synthetically with TestStream pipelines and Non-TestStream pipelines, for fast test pipeline execution by default. (#30083).
- Prism does NOT yet support "real time" execution for this release.
- Improve processing for large elements to reduce the chances for exceeding 2GB protobuf limits (Python) (#31607).
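For the Beam YAML item above, the --jinja_variables flag expects a JSON-formatted value, which is easy to get wrong when quoting by hand. The following is a minimal standard-library sketch that builds a shell-safe flag. The helper name and the variable names (`input_path`, `min_count`) are hypothetical, and how the flag is passed on to your pipeline launch command is left to your setup.

```python
import json
import shlex


def jinja_variables_flag(variables: dict) -> str:
    """Return a shell-safe --jinja_variables flag carrying the variables as JSON.

    Hypothetical helper for illustration; only the flag name comes from the release notes.
    """
    payload = json.dumps(variables, sort_keys=True)
    # shlex.quote protects the braces and double quotes from the shell.
    return "--jinja_variables=" + shlex.quote(payload)


# Example template variables; the names are made up for this sketch.
variables = {"input_path": "gs://my-bucket/in.csv", "min_count": 5}
flag = jinja_variables_flag(variables)
print(flag)

# Round trip: what the launched process would receive after shell parsing.
received = shlex.split(flag)[0]
decoded = json.loads(received.split("=", 1)[1])
print(decoded == variables)
```

Generating the flag programmatically keeps numbers as JSON numbers and strings as JSON strings, so the template sees the types you intended.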
Breaking Changes
- Java's View.asList() side inputs are now optimized for iterating rather than indexing when in the global window. This new implementation still supports all (immutable) List methods as before, but some of the random access methods like get() and size() will be slower. To use the old implementation one can use View.asList().withRandomAccess().
- SchemaTransforms implemented with TypedSchemaTransformProvider now produce a configuration Schema with snake_case naming convention (#31374). This will make the following cases problematic:
- Running a pre-2.57.0 remote SDK pipeline containing a 2.57.0+ Java SchemaTransform, and vice versa:
- Running a 2.57.0+ remote SDK pipeline containing a pre-2.57.0 Java SchemaTransform
- All direct uses of Python's SchemaAwareExternalTransform should be updated to use new snake_case parameter names.
- Upgraded Jackson Databind to 2.15.4 (Java) (#26743).
jackson-2.15 has known breaking changes. An important one is it imposed a buffer limit for parser.
If your custom PTransform/DoFn are affected, refer to #31580 for mitigation.
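For the snake_case change to SchemaTransform configuration described above, callers that pass camelCase parameter names need to rename them. The sketch below shows one simple way to convert keyword names in plain Python. It is an illustration only: the conversion rule is a common regex approach, not necessarily the one Beam uses internally, and the parameter names are hypothetical.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case.

    Simple rule: insert "_" before every uppercase letter except the first
    character. Acronyms such as "URL" would be split letter by letter, so
    review the output for names like that.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_case_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with every key converted to snake_case."""
    return {to_snake_case(key): value for key, value in kwargs.items()}


# Hypothetical pre-2.57.0 style configuration for an external transform.
old_style = {"bootstrapServers": "localhost:9092", "autoOffsetReset": "earliest"}
new_style = snake_case_kwargs(old_style)
print(new_style)
```

Names that are already snake_case pass through unchanged, so the helper can be applied to mixed configurations while migrating.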
For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
List of Contributors
According to git shortlog, the following people contributed to the 2.57.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Alexey Romanenko
Andrey Devyatkin
Anody Zhang
Arvind Ram
Ben Konz
Bruno Volpato
Celeste Zeng
Chamikara Jayalath
Claire McGinty
Colm O hEigeartaigh
Damon
Danny McCormick
Evan Galpin
Ferran Fernández Garrido
Florent Biville
Jack Dingilian
Jack McCluskey
Jan Lukavský
JayajP
Jeff Kinard
Jeffrey Kinard
John Casey
Justin Uang
Kenneth Knowles
Kevin Zhou
Liam Miller-Cushon
Maarten Vercruysse
Maciej Szwaja
Maja Kontrec Rönn
Marc hurabielle
Martin Trieu
Mattie Fu
Min Zhu
Naireen Hussain
Nick Anikin
Pablo Rodriguez Defino
Paul King
Priyans Desai
Radosław Stankiewicz
Rebecca Szper
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Rodrigo Bozzolo
RyuSA
Sam Rohde
Sam Whittle
Sergei Lilichenko
Shahar Epstein
Shunping Huang
Svetak Sundhar
Tomo Suzuki
Tony Tang
Valentyn Tymofieiev
Vincent Stollenwerk
Vineet Kumar
Vitaly Terentyev
Vlado Djerek
XQ Hu
Yi Hu
akashorabek
bzablocki
kberezin
Beam 2.56.0 release
We are happy to present the new 2.56.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.56.0, check out the detailed release notes.
Highlights
- Added FlinkRunner for Flink 1.17, removed support for Flink 1.12 and 1.13. Pipelines previously running on Flink 1.16 and below can be upgraded to 1.17 if the pipeline is first updated to Beam 2.56.0 with the same Flink version. After the pipeline runs with Beam 2.56.0, it should be possible to upgrade to FlinkRunner with Flink 1.17. (#29939)
- New Managed I/O Java API (#30830).
- New Ordered Processing PTransform added for processing order-sensitive stateful data (#30735).
I/Os
- Upgraded Avro version to 1.11.3, kafka-avro-serializer and kafka-schema-registry-client versions to 7.6.0 (Java) (#30638). The newer Avro package is known to have breaking changes. If you are affected, you can keep pinned to older Avro versions which are also tested with Beam.
- Iceberg read/write support is available through the new Managed I/O Java API (#30830).
New Features / Improvements
- Profiling of Cythonized code has been disabled by default. This might improve performance for some Python pipelines (#30938).
- Bigtable enrichment handler now accepts a custom function to build a composite row key. (Python) (#30974).
Breaking Changes
- Default consumer polling timeout for KafkaIO.Read was increased from 1 second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration duration) to configure this timeout value when necessary (#30870).
- Python Dataflow users no longer need to manually specify --streaming for pipelines using unbounded sources such as ReadFromPubSub.
Bugfixes
- Fixed locking issue when shutting down inactive bundle processors. Symptoms of this issue include slowness or stuckness in long-running jobs (Python) (#30679).
- Fixed logging issue that silenced the pip output when installing dependencies provided in --requirements_file (Python).
List of Contributors
According to git shortlog, the following people contributed to the 2.56.0 release. Thank you to all contributors!
Abacn
Ahmed Abualsaud
Andrei Gurau
Andrey Devyatkin
Aravind Pedapudi
Arun Pandian
Arvind Ram
Bartosz Zablocki
Brachi Packter
Byron Ellis
Chamikara Jayalath
Clement DAL PALU
Damon
Danny McCormick
Daria Bezkorovaina
Dip Patel
Evan Burrell
Hai Joey Tran
Jack McCluskey
Jan Lukavský
JayajP
Jeff Kinard
Julien Tournay
Kenneth Knowles
Luís Bianchin
Maciej Szwaja
Melody Shen
Oleh Borysevych
Pablo Estrada
Rebecca Szper
Ritesh Ghorse
Robert Bradshaw
Sam Whittle
Sergei Lilichenko
Shahar Epstein
Shunping Huang
Svetak Sundhar
Timothy Itodo
Veronica Wasson
Vitaly Terentyev
Vlado Djerek
Yi Hu
akashorabek
bzablocki
clmccart
damccorm
dependabot[bot]
dmitryor
github-actions[bot]
liferoad
martin trieu
tvalentyn
xianhualiu
Beam 2.55.1 release
Bugfixes
- Fixed issue that broke WriteToJson in languages other than Java (X-lang) (#30776).
Beam 2.55.0 release
We are happy to present the new 2.55.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.55.0, check out the detailed release notes.
Highlights
- The Python SDK will now include automatically generated wrappers for external Java transforms! (#29834)
I/Os
- Added support for handling bad records to BigQueryIO (#30081).
- Full Support for Storage Read and Write APIs
- Partial Support for File Loads (Failures writing to files supported, failures loading files to BQ unsupported)
- No Support for Extract or Streaming Inserts
- Added support for handling bad records to PubSubIO (#30372).
- Support is not available for handling schema mismatches, and enabling error handling for writing to Pub/Sub topics with schemas is not recommended
- --enableBundling pipeline option for BigQueryIO DIRECT_READ is replaced by --enableStorageReadApiV2. Both were considered experimental and subject to change (Java) (#26354).
New Features / Improvements
- Allow writing clustered and not time-partitioned BigQuery tables (Java) (#30094).
- Redis cache support added to RequestResponseIO and Enrichment transform (Python) (#30307)
- Merged sdks/java/fn-execution and runners/core-construction-java into the main SDK. These artifacts were never meant for users, but noting that they no longer exist. These are steps to bring portability into the core SDK alongside all other core functionality.
- Added Vertex AI Feature Store handler for Enrichment transform (Python) (#30388)
Breaking Changes
- Arrow version was bumped to 15.0.0 from 5.0.0 (#30181).
- Go SDK users who build custom worker containers may run into issues with the move to distroless containers as a base (see Security Fixes).
- The issue stems from distroless containers lacking additional tools, which current custom container processes may rely on.
- See https://beam.apache.org/documentation/runtime/environments/#from-scratch-go for instructions on building and using a custom container.
- Python SDK has changed the default value for the --max_cache_memory_usage_mb pipeline option from 100 to 0. This option was first introduced in the 2.52.0 SDK version. This change restores the behavior of the 2.51.0 SDK, which does not use the state cache. If your pipeline uses iterable side input views, consider increasing the cache size by setting the option manually. (#30360).
Deprecations
- N/A
Bug fixes
- Fixed SpannerIO.readChangeStream to support propagating credentials from pipeline options to the getDialect calls for authenticating with Spanner (Java) (#30361).
- Reduced the number of HTTP requests in GCSIO function calls (Python) (#30205)
Security Fixes
- Go SDK base container image moved to distroless/base-nossl-debian12, reducing vulnerable container surface to kernel and glibc (#30011).
Known Issues
- In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 (#30679).
List of Contributors
According to git shortlog, the following people contributed to the 2.55.0 release. Thank you to all contributors!
Ahmed Abualsaud
Anand Inguva
Andrew Crites
Andrey Devyatkin
Arun Pandian
Arvind Ram
Chamikara Jayalath
Chris Gray
Claire McGinty
Damon Douglas
Dan Ellis
Danny McCormick
Daria Bezkorovaina
Dima I
Edward Cui
Ferran Fernández Garrido
GStravinsky
Jan Lukavský
Jason Mitchell
JayajP
Jeff Kinard
Jeffrey Kinard
Kenneth Knowles
Mattie Fu
Michel Davit
Oleh Borysevych
Ritesh Ghorse
Ritesh Tarway
Robert Bradshaw
Robert Burke
Sam Whittle
Scott Strong
Shunping Huang
Steven van Rossum
Svetak Sundhar
Talat UYARER
Ukjae Jeong (Jay)
Vitaly Terentyev
Vlado Djerek
Yi Hu
akashorabek
case-k
clmccart
dengwe1
dhruvdua
hardshah
johnjcasey
liferoad
martin trieu
tvalentyn
Beam 2.54.0 release
We are happy to present the new 2.54.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.54.0, check out the detailed release notes.
Highlights
- Enrichment Transform along with GCP BigTable handler added to Python SDK (#30001).
- Beam Java Batch pipelines run on Google Cloud Dataflow will default to the Portable Runner V2 (https://cloud.google.com/dataflow/docs/runner-v2) starting with this version. (All other languages are already on Runner V2.)
- This change is still rolling out to the Dataflow service, see the Runner V2 documentation (https://cloud.google.com/dataflow/docs/runner-v2) for how to enable or disable it intentionally.
I/Os
- Added support for writing to BigQuery dynamic destinations with Python's Storage Write API (#30045)
- Adding support for Tuples DataType in ClickHouse (Java) (#29715).
- Added support for handling bad records to FileIO, TextIO, AvroIO (#29670).
- Added support for handling bad records to BigtableIO (#29885).
New Features / Improvements
- Enrichment Transform along with GCP BigTable handler added to Python SDK (#30001).
Breaking Changes
- N/A
Deprecations
- N/A
Bugfixes
- Fixed a memory leak affecting some Go SDK since 2.46.0. (#28142)
Security Fixes
- N/A
Known Issues
- N/A
List of Contributors
According to git shortlog, the following people contributed to the 2.54.0 release. Thank you to all contributors!
Ahmed Abualsaud
Alexey Romanenko
Anand Inguva
Andrew Crites
Arun Pandian
Bruno Volpato
caneff
Chamikara Jayalath
Changyu Li
Cheskel Twersky
Claire McGinty
clmccart
Damon
Danny McCormick
dependabot[bot]
Edward Cheng
Ferran Fernández Garrido
Hai Joey Tran
hugo-syn
Issac
Jack McCluskey
Jan Lukavský
JayajP
Jeffrey Kinard
Jerry Wang
Jing
Joey Tran
johnjcasey
Kenneth Knowles
Knut Olav Løite
liferoad
Marc
Mark Zitnik
martin trieu
Mattie Fu
Naireen Hussain
Neeraj Bansal
Niel Markwick
Oleh Borysevych
pablo rodriguez defino
Rebecca Szper
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Sam Whittle
Shunping Huang
Svetak Sundhar
S. Veyrié
Talat UYARER
tvalentyn
Vlado Djerek
Yi Hu
Zechen Jian
Beam 2.53.0 release
We are happy to present the new 2.53.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.53.0, check out the detailed release notes.
Highlights
- Python streaming users that use 2.47.0 and newer versions of Beam should update to version 2.53.0, which fixes a known issue: (#27330).
I/Os
- TextIO now supports skipping multiple header lines (Java) (#17990).
- Python GCSIO is now implemented with GCP GCS Client instead of apitools (#25676)
- Adding support for LowCardinality DataType in ClickHouse (Java) (#29533).
- Added support for handling bad records to KafkaIO (Java) (#29546)
- Add support for generating text embeddings in MLTransform for Vertex AI and Hugging Face Hub models (#29564).
- NATS IO connector added (Go) (#29000).
New Features / Improvements
- The Python SDK now type checks collections.abc.Collections types properly. Some type hints that were erroneously allowed by the SDK may now fail. (#29272)
- Running multi-language pipelines locally no longer requires Docker. Instead, the same (generally auto-started) subprocess used to perform the expansion can also be used as the cross-language worker.
- Framework for adding Error Handlers to composite transforms added in Java (#29164).
- Python 3.11 images now include google-cloud-profiler (#29561).
Breaking Changes
- Upgraded to Go 1.21.5 to build, fixing CVE-2023-45285 and CVE-2023-39326
Deprecations
- Euphoria DSL is deprecated and will be removed in a future release (not before 2.56.0) (#29451)
Bugfixes
- (Python) Fixed sporadic crashes in streaming pipelines that affected some users of 2.47.0 and newer SDKs (#27330).
- (Python) Fixed a bug that caused MLTransform to drop identical elements in the output PCollection (#29600).
List of Contributors
According to git shortlog, the following people contributed to the 2.53.0 release. Thank you to all contributors!
Ahmed Abualsaud
Ahmet Altay
Alexey Romanenko
Anand Inguva
Arun Pandian
Balázs Németh
Bruno Volpato
Byron Ellis
Calvin Swenson Jr
Chamikara Jayalath
Clay Johnson
Damon
Danny McCormick
Ferran Fernández Garrido
Georgii Zemlianyi
Israel Herraiz
Jack McCluskey
Jacob Tomlinson
Jan Lukavský
JayajP
Jeffrey Kinard
Johanna Öjeling
Julian Braha
Julien Tournay
Kenneth Knowles
Lawrence Qiu
Mark Zitnik
Mattie Fu
Michel Davit
Mike Williamson
Naireen
Naireen Hussain
Niel Markwick
Pablo Estrada
Radosław Stankiewicz
Rebecca Szper
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Sam Rohde
Sam Whittle
Shunping Huang
Svetak Sundhar
Talat UYARER
Tom Stepp
Tony Tang
Vlado Djerek
Yi Hu
Zechen Jiang
clmccart
damccorm
darshan-sj
gabry.wu
johnjcasey
liferoad
lrakla
martin trieu
tvalentyn