[Bug]: PubsubIO used in batch incorrect batch cutoff size #28011

Open
1 of 15 tasks
slilichenko opened this issue Aug 15, 2023 · 3 comments

Comments

@slilichenko
Contributor

What happened?

This is due to incorrect initialization of a transform [1]: the maximum number of records per batch is passed where the maximum batch size in bytes is expected. There is an additional issue with message validation: the error message here [2] is misleading. It should state that a single record's size exceeds the maximum batch size, rather than just citing the number of bytes from the generic Pub/Sub limits.

[1]

[2]
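
To illustrate the two points above, here is a minimal, self-contained sketch. It is not the Beam code: `MessageSizeCheck`, `validateSize`, and both limit values are hypothetical names and numbers chosen for illustration (the 100 only echoes the symptom in the original issue title). It shows how passing a record-count limit where a byte limit is expected rejects small messages, and what a clearer error message could look like.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class MessageSizeCheck {
    /**
     * Hypothetical validator: returns the total size in bytes (payload plus
     * attribute keys and values) or throws if that total exceeds the limit.
     */
    static int validateSize(byte[] data, Map<String, String> attributes, int maxBatchBytes) {
        int total = data.length;
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        if (total > maxBatchBytes) {
            // Names the real cause: one record is larger than the batch limit.
            throw new IllegalArgumentException(
                "Single record size of " + total
                    + " bytes exceeds the maximum batch size of " + maxBatchBytes + " bytes");
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[150];
        int maxBatchRecords = 100;      // assumed record-count limit (illustrative)
        int maxBatchBytes = 1_000_000;  // assumed byte limit (illustrative)

        // Correct argument: the 152-byte message passes.
        System.out.println(validateSize(payload, Map.of("k", "v"), maxBatchBytes));

        // Wrong argument: the record count is treated as a byte limit.
        try {
            validateSize(payload, Map.of("k", "v"), maxBatchRecords);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Because both limits are plain `int` values, the compiler cannot catch the mix-up; only a test with a message larger than the record-count value would expose it.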

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@liferoad
Collaborator

Is this the same as #27000?

@slilichenko
Contributor Author

Yes, I was using 2.48.0 and validated that things work after upgrading to 2.49.0.

Technically, there is still a bug: the total message size (including attribute data) returned by the "validate" method is ignored, and the message data size alone is used to calculate the batch cutoff:

PreparePubsubWriteDoFn.validatePubsubMessageSize(message, maxPublishBatchByteSize);
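
A rough sketch of why this matters, under stated assumptions: `BatchCutoff` and `batch` are hypothetical and do not mirror the actual PubsubIO batching code, and the sizes are made up. The same greedy cutoff logic is run twice, once on payload-only sizes and once on total sizes that include attributes.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCutoff {
    /**
     * Greedy batching: start a new batch whenever adding the next message
     * would push the running byte count above maxBatchBytes.
     * Assumes each individual size already fits within maxBatchBytes.
     */
    static List<List<Integer>> batch(int[] sizes, int maxBatchBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentBytes = 0;
        for (int size : sizes) {
            if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        int maxBatchBytes = 1200;            // illustrative limit
        int[] dataSizes = {400, 400, 400};   // payload bytes only
        int[] totalSizes = {600, 600, 600};  // payload plus 200 bytes of attributes each

        // Payload-only accounting packs everything into one batch,
        // although the real batch would carry 1800 bytes.
        System.out.println(batch(dataSizes, maxBatchBytes).size());

        // Total-size accounting splits the same messages into two batches.
        System.out.println(batch(totalSizes, maxBatchBytes).size());
    }
}
```

With payload-only accounting the single batch exceeds the limit by the combined attribute size, which is the gap described above; feeding the value returned by the validation step into the running total would close it.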

@Abacn changed the title from "[Bug]: PubsubIO used in batch pipelines fails to publish messages larger than 100 bytes" to "[Bug]: PubsubIO used in batch incorrect batch cutoff size" on Jun 5, 2024
@Abacn
Contributor

Abacn commented Jun 5, 2024

Per the latest comment, I changed the issue title and added the proper priority tag.
