Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporterhelper][batcher] User interface for controling batch measurement #12303

Open
sfc-gh-sili opened this issue Feb 5, 2025 · 4 comments

Comments

@sfc-gh-sili
Copy link
Contributor

sfc-gh-sili commented Feb 5, 2025

As part of the effort to support serialized byte based batching (#3262), we need to provide a way to configure the batch size in a couple of options, including item count, request count, and serialized byte size.

To avoid confusion, we should ensure that the Batching Config API uses the same sizer for max size and min size, and should align with the size limit in Queueing Config API (see discussion #9462)

Option 1 (expanded from discussion in #12154)

type BatchConfig struct {
	...
	SizeConfig
}
type BatchSizeConfig struct {
	Sizer SizerType `mapstructure:"sizer"`
	MinSize int `mapstructure:"mix_size"`
	MaxSize int `mapstructure:"max_size"`
}
type QueueConfig struct {
	...
	Sizer SizerType `mapstructure:"sizer"`
        Size `mapstructure:"queue_size"`
}
type SizerType struct {
       sizerName string
}

In addition to the above definition, we will

  • Use marshaler to ensure check that sizer is one of requests, items, or bytes

I like this option because this implicitly enforces the same sizer for min size and max size.

Option 1.a

  sizer: kbytes
  max_size: 100

Option 1.b

  sizer: bytes
  max_size: 100k
  sizer: items
  max_size: 1k

Option 1.c

  sizer: bytes
  max_size: 1e5

TODO: try

  sizer: items
  max_size: 100_000

Option 2 (derived from the most voted in #9462)

type MaxSizeConfig struct {
	MaxSizeItems int `mapstructure:"max_size_items"`
	MaxSizeBytes int `mapstructure:"max_size_bytes"`
	MaxSizeBytes int `mapstructure:"max_size_kib"`
	MaxSizeBytes int `mapstructure:"max_size_mib"`
}
type MinSizeConfig struct {
	...
}

Validation will check that only one of the above fields are set + min size config and max size config uses the same sizer.


Personally I would vote for option 1.b, because each enum type sizer corresponds to its merge-splitting logic, while max_size and min_size is fully in charge of expressing the largeness.

@open-telemetry/collector-approvers looking forward to hear some thoughts.

@ms-jcorley
Copy link
Contributor

Any reason we are limiting users to only one mechanism? Why not have separate fields for both (items, byte size)
Also I think the bytes, KB, MB limits are redundant. I'd suggest just going with Bytes or KB.
Just my $0.02.

@sfc-gh-sili
Copy link
Contributor Author

sfc-gh-sili commented Feb 6, 2025

@ms-jcorley Thanks for the input!

Why not have separate fields for both (items, byte size)

I think people use byte-size based batching to keep the payload size under outgoing protocol's limit; item-count based batching is more of a compromise because it is less accurate but has lower resource usage. What would be the use case to specify both limit?

As for the unit, I agree that MB would be redundant for batching, but will be useful for queueing.

@dmitryax
Copy link
Member

dmitryax commented Feb 7, 2025

Thanks, @sfc-gh-sili, for bringing this up.

I lean towards the Option 1. However, I suggest some modifications. We don't need the whole SizeConfig in the queue. min_size doesn't make sense there. We can keep queue_size, and just add another field sizer. It won't require any breaking changes to the configuration interface of the queue #9462. We will have different default values for the sizer for the queue (requests) and the batcher (items), but that should be ok. Moving the queue away from requests would make it less performant by default, which is rather undesirable.

Yes, it would prevent us from being able to batch with more than one sizer, but we are not solving that problem right now. I believe that would significantly complicate the batching implementation. There is an initiative to bring exporterhelper to connectors. Once we have that, this ask would be addressed with a bit more complicated pipeline while keeping the batcher simple.

@ms-jcorley
Copy link
Contributor

Now that I understand the background a bit more, I agree. Option 1 with just an added sizer field would be my vote.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants