Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify whether BatchSpanProcessor scheduledDelayMillis is between batches or entire processing of queue #849

Closed
anuraaga opened this issue Aug 21, 2020 · 10 comments · Fixed by #3024
Labels
area:sdk Related to the SDK release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory

Comments

@anuraaga
Copy link
Contributor

Currently we define scheduledDelayMillis, maxQueueSize, and maxExportBatchSize.

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#batching-processor

It's not clear to me whether scheduledDelayMillis is supposed to be time in between exports of chunks of size maxExportBatchSize, or if it's the delay between processing the queue, when processing we export as many batches as we can, and then wait for the delay amount until the next set of batches. The wording in the spec seems closer to the former, but go and java at least seem to implement as the latter.

@anuraaga anuraaga added the spec:trace Related to the specification/trace directory label Aug 21, 2020
@arminru arminru added the area:sdk Related to the SDK label Aug 21, 2020
@andrewhsu andrewhsu added priority:p3 Lowest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Aug 21, 2020
@arminru arminru added release:after-ga Not required before GA release, and not going to work on before GA and removed priority:p3 Lowest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Sep 25, 2020
@anuraaga
Copy link
Contributor Author

Hey - this question came up. Does anyone happen to have any suggestions on how to handle schedule delay?

@Oberon00
Copy link
Member

Oberon00 commented Mar 19, 2021

It's not clear to me whether scheduledDelayMillis is supposed to be time in between exports of chunks of size maxExportBatchSize, or if it's the delay between processing the queue

IMHO it has to be the latter, otherwise you would be severely limiting throughput.

EDIT:
The first time you see more than one batch worth of spans in the queue, then it means that in the last scheduledDelayMillis, more than one batch worth of spans was added. Assuming this was not a temporary spike, that list will only keep growing, as waiting between every batch does little to make it shrink.

@anuraaga
Copy link
Contributor Author

anuraaga commented Mar 19, 2021

@Oberon00 Yeah I think that makes sense, it actually was a silly question. I think what I meant to ask, was this other one, should we eagerly export batches? Currently Java exports as soon as the queue is over maxExportBatchSize regardless of interval - I think doing so could cause BSP to be more active than one might want when at low throughput. At high throughput, I think there isn't a big difference since you'd be continually processing the queue anyways. But there does seem like there could be a middle ground that has degraded performance, not sure though.

Edit: I guess main concern is that with eager exporting, user seems to lose control. If they want fast exports, they could just reduce schedule delay, but there's no way to have slower exports if we default to eager behavior.

@iNikem
Copy link
Contributor

iNikem commented Mar 21, 2021

+1 in favor of eager exporting

@seemk
Copy link
Contributor

seemk commented Jul 14, 2022

I think the wording should be something akin to:

scheduledDelayMillis - the maximum delay interval in milliseconds between two consecutive exports. The default value is 5000.

The current version is a bit ambiguous (or perhaps even understood as a loop where exports happen after scheduledDelayMillis). Most SDKs implement eager batch exporting: Java, Ruby, Swift, Python

Stumbled upon this in the JS SDK, where exporting happens exactly with scheduleDelayMillis. This means with the default configuration (512 spans each 5s) the span throughput is limited to ~100 spans per second under constant load, which is a very low number. All other spans are silently dropped.

Agree with eager exporting 😄

@dyladan
Copy link
Member

dyladan commented Jul 14, 2022

I am also in support of eager exporting. I see 2 obvious ways to do it:

  1. When the scheduledDelayMillis timer is triggered, export as many batches as required to empty the queue
  2. When there are maxExportBatchSize spans in the queue, trigger an export even if it means exporting early

@dyladan
Copy link
Member

dyladan commented Nov 7, 2022

@open-telemetry/technical-committee I think this is quite an important issue that likely affects many SDKs. As @morigs pointed out in the related js issue open-telemetry/opentelemetry-js#3094 the current situation effectively works as a rate-limiter which I believe is not be the intended effect. JS would like to implement a fix for this, but we don't want to do something against the spec.

@yurishkuro
Copy link
Member

imo the export should happen once the desired batch size is reached or the delay timer ticks, whichever is earlier. The former protects against unnecessarily large batches, the latter protects against sitting on the data that trickles in slowly.

@tigrannajaryan
Copy link
Member

imo the export should happen once the desired batch size is reached or the delay timer ticks, whichever is earlier. The former protects against unnecessarily large batches, the latter protects against sitting on the data that trickles in slowly.

This. This is also exactly what we do in the Collector if I remember correctly.

If the spec does not describe this behavior I would argue it is a spec bug.

@reyang
Copy link
Member

reyang commented Nov 9, 2022

There is also ForceFlush which we might want to cover.
In metrics, this is what we tried https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#periodic-exporting-metricreader.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants