What needs to happen?
Initially, for the Python SDK, we will enable the statecache by increasing its size from 0 MB to 100 MB. After that, there are some improvements that could be made to the statecache. For example:
- Auto-size the cache based on the amount of memory dedicated to the Docker container.
- Add pipeline options that let users control the percentage of memory used for the cache (with a default of 20%) instead of relying on an experiment.

The Java implementation of the cache is in:
https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/Caches.java
Most of the caching complexity is within:
https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java
The views over these caches perform view-specific operations (e.g. merging the old view of the data with in-memory updates). Generally, understanding the code in https://github.com/apache/beam/tree/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state should provide most answers.
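To make the two proposed improvements concrete (auto-sizing from container memory, and a user-controlled percentage with a 20% default), here is a rough sketch of how the size could be derived. It is not taken from the Beam codebase: the function names, the `fallback_mb` parameter, and the choice to read the limit from cgroup files are all assumptions for illustration. The 100 MB fallback simply mirrors the initial fixed size mentioned in this issue.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed default, taken from the proposal in this issue: 20% of container memory.
DEFAULT_CACHE_FRACTION = 0.20

# Usual cgroup locations for a container memory limit (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)

# cgroup v1 reports "no limit" as a very large number; treat such values as unlimited.
UNLIMITED_THRESHOLD_BYTES = 1 << 60


def read_container_memory_bytes(
    paths: Iterable[str] = CGROUP_LIMIT_FILES,
) -> Optional[int]:
    """Return the container memory limit in bytes, or None if it cannot be determined."""
    for path in paths:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next candidate
        # cgroup v2 writes the literal string "max" when there is no limit.
        if raw.isdigit() and int(raw) < UNLIMITED_THRESHOLD_BYTES:
            return int(raw)
    return None


def state_cache_size_mb(
    memory_bytes: Optional[int],
    fraction: float = DEFAULT_CACHE_FRACTION,
    fallback_mb: int = 100,
) -> int:
    """Hypothetical helper: cache size in MB as a fraction of container memory."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if memory_bytes is None:
        # No detectable limit: fall back to the fixed size.
        return fallback_mb
    return int(memory_bytes * fraction) // (1024 * 1024)


if __name__ == "__main__":
    # A 4 GiB container with the default 20% fraction.
    print(state_cache_size_mb(4 * 1024**3))  # 819
    # Whatever this machine reports (100 if no limit is detectable).
    print(state_cache_size_mb(read_container_memory_bytes()))
```

A pipeline option would then only need to supply `fraction`; passing `0` keeps the cache disabled, which matches the current 0 MB behavior.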
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components