Added
path_filter parameter in pw.io.s3.read and pw.io.minio.read functions. It enables post-filtering of object paths using a wildcard pattern (*, ?), allowing exclusion of paths that pass the main path filter but do not match path_filter.
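The post-filtering described above can be pictured with a small standard-library sketch. This is not Pathway code: the helper `post_filter` and the sample paths are hypothetical, and `fnmatchcase` is only assumed to approximate the connector's wildcard semantics (for example, `fnmatch` lets `*` match across `/` and also accepts `[seq]` patterns, which this entry does not mention).

```python
from fnmatch import fnmatchcase


def post_filter(paths, path_filter):
    """Keep only the paths that match the wildcard pattern (hypothetical helper)."""
    return [p for p in paths if fnmatchcase(p, path_filter)]


# Hypothetical object paths that already passed the main path filter.
paths = [
    "logs/2024/a.csv",
    "logs/2024/a.csv.tmp",
    "logs/2024/b1.csv",
    "logs/2024/b22.csv",
]

csv_only = post_filter(paths, "*.csv")                # drops the .tmp object
single_digit = post_filter(paths, "logs/2024/b?.csv")  # ? matches exactly one character
```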
Input connectors now support backpressure control via max_backlog_size, which limits the number of read events being processed per connector at any time. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
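One way to picture a backlog limit is a bounded queue between the reader and the processing loop: once the queue is full, the reader waits until earlier events are consumed. The sketch below uses only the standard library. The function `run_with_backlog_limit` is hypothetical, and the blocking behaviour is an assumption made for illustration, not a description of how Pathway implements max_backlog_size.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input


def run_with_backlog_limit(events, max_backlog_size):
    """Feed events through a bounded queue and report the peak backlog seen."""
    backlog = queue.Queue(maxsize=max_backlog_size)
    processed = []
    peak = 0

    def reader():
        for event in events:
            backlog.put(event)  # blocks while the backlog is full
        backlog.put(_DONE)

    thread = threading.Thread(target=reader)
    thread.start()
    while True:
        peak = max(peak, backlog.qsize())
        event = backlog.get()
        if event is _DONE:
            break
        processed.append(event)  # stand-in for real processing
    thread.join()
    return processed, peak


# A burst of 1000 events never occupies more than 10 backlog slots.
processed, peak = run_with_backlog_limit(range(1000), max_backlog_size=10)
```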
pw.reducers.count_distinct and pw.reducers.count_distinct_approximate reducers to count the number of distinct elements in a table. pw.reducers.count_distinct_approximate saves memory at the cost of accuracy; the precision parameter controls this tradeoff.
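To give a feel for the memory/accuracy tradeoff, here is a minimal HyperLogLog-style sketch in plain Python. It is an assumption that an approximate distinct count behaves along these lines; this entry does not say which algorithm Pathway uses, and the function `approx_count_distinct` and its `precision` argument are hypothetical stand-ins rather than the library's API.

```python
import hashlib
import math


def approx_count_distinct(items, precision=12):
    """Estimate the number of distinct items using 2**precision small registers."""
    if precision < 7:
        raise ValueError("the bias constant used below assumes precision >= 7")
    m = 1 << precision
    registers = [0] * m
    tail_bits = 64 - precision
    for item in items:
        digest = hashlib.sha1(repr(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash value
        index = h >> tail_bits                 # first `precision` bits pick a register
        tail = h & ((1 << tail_bits) - 1)      # remaining bits
        rank = tail_bits - tail.bit_length() + 1  # leading zeros in the tail, plus one
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)  # small-range correction (linear counting)
    return raw


# 30000 values, 10000 of them distinct; memory is fixed at 2**12 registers.
values = [i % 10000 for i in range(30000)]
estimate = approx_count_distinct(values, precision=12)
```

In this sketch a lower precision means fewer registers and a noisier estimate; a higher one costs more memory and gives a tighter estimate.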
pw.Table.join (and its variants) now has two additional parameters: left_exactly_once and right_exactly_once. If each element from one side of a join should be joined exactly once, set that side's *_exactly_once parameter to True. An entry is then removed from the join state as soon as it is matched, which reduces memory consumption.
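The memory effect of the *_exactly_once parameters can be sketched with a toy streaming join in plain Python. The class `ToyJoinState` is hypothetical and only models the idea stated above (a matched entry leaves the join state); it is not Pathway's implementation and ignores retractions and updates.

```python
class ToyJoinState:
    """Toy symmetric hash join that can drop entries after their first match."""

    def __init__(self, left_exactly_once=False, right_exactly_once=False):
        self.left, self.right = {}, {}
        self.left_once = left_exactly_once
        self.right_once = right_exactly_once

    def _arrive(self, key, row, own, other, own_once, other_once):
        matched = other.get(key, [])
        if matched and other_once:
            del other[key]  # matched entries on an exactly-once side leave the state
        if not (matched and own_once):
            own.setdefault(key, []).append(row)  # keep the row for future matches
        return matched

    def on_left(self, key, row):
        hits = self._arrive(key, row, self.left, self.right, self.left_once, self.right_once)
        return [(row, r) for r in hits]

    def on_right(self, key, row):
        hits = self._arrive(key, row, self.right, self.left, self.right_once, self.left_once)
        return [(l, row) for l in hits]

    def state_size(self):
        return sum(len(v) for v in self.left.values()) + sum(len(v) for v in self.right.values())


# Each order (left) matches exactly one customer (right); customers are reused.
join = ToyJoinState(left_exactly_once=True)
join.on_right(1, "customer-1")
first = join.on_left(1, "order-a")   # matched immediately, so not stored
second = join.on_left(1, "order-b")
```

Without the flag, both orders would stay in the state indefinitely; with it, only the customer row is retained.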
Changed
Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
Improved initialization speed of pw.io.s3.read and pw.io.minio.read.
pw.io.s3.read and pw.io.minio.read now limit the number and the total size of objects to be predownloaded.
BREAKING optimized the implementation of pw.reducers.min, pw.reducers.max, pw.reducers.argmin, pw.reducers.argmax, pw.reducers.any reducers for append-only tables. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
BREAKING optimized the implementation of pw.reducers.sum reducer on float and np.ndarray columns. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
BREAKING the implementation of data persistence has been optimized for the case of many small objects in filesystem and S3 connectors. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
BREAKING the data snapshot logic in persistence has been optimized for the case of big input snapshots. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
Improved precision of pw.reducers.sum on float columns by introducing Neumaier summation.
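For reference, Neumaier summation is a compensated summation scheme that tracks the rounding error lost at each addition and adds it back at the end. The following is a minimal plain-Python sketch of the textbook algorithm, not Pathway's code; the function names and the sample data are hypothetical.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Compensated summation (Neumaier's variant of Kahan summation)."""
    total = 0.0
    compensation = 0.0  # running sum of the low-order bits lost so far
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            compensation += (total - t) + x  # low-order bits of x were lost
        else:
            compensation += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + compensation


# The two 1.0 terms vanish next to 1e100 in the naive loop.
data = [1.0, 1e100, 1.0, -1e100]
naive_result = naive_sum(data)
compensated_result = neumaier_sum(data)
```

On this input the naive loop returns 0.0, while the compensated version recovers the exact answer 2.0. The explicit loop is used for the naive case because the built-in `sum` in recent Python versions already applies compensation to floats.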