Clarify whether BatchSpanProcessor scheduledDelayMillis is between batches or entire processing of queue #849

anuraaga · 2020-08-21T04:56:59Z

Currently we define scheduledDelayMillis, maxQueueSize, and maxExportBatchSize.

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#batching-processor

It's not clear to me whether scheduledDelayMillis is supposed to be time in between exports of chunks of size maxExportBatchSize, or if it's the delay between processing the queue, when processing we export as many batches as we can, and then wait for the delay amount until the next set of batches. The wording in the spec seems closer to the former, but go and java at least seem to implement as the latter.

The text was updated successfully, but these errors were encountered:

anuraaga · 2021-03-15T23:27:33Z

Hey - this question came up. Does anyone happen to have any suggestions on how to handle schedule delay?

Oberon00 · 2021-03-19T08:07:22Z

It's not clear to me whether scheduledDelayMillis is supposed to be time in between exports of chunks of size maxExportBatchSize, or if it's the delay between processing the queue

IMHO it has to be the latter, otherwise you would be severely limiting throughput.

EDIT:
The first time you see more than one batch worth of spans in the queue, then it means that in the last scheduledDelayMillis, more than one batch worth of spans was added. Assuming this was not a temporary spike, that list will only keep growing, as waiting between every batch does little to make it shrink.

anuraaga · 2021-03-19T08:30:46Z

@Oberon00 Yeah I think that makes sense, it actually was a silly question. I think what I meant to ask, was this other one, should we eagerly export batches? Currently Java exports as soon as the queue is over maxExportBatchSize regardless of interval - I think doing so could cause BSP to be more active than one might want when at low throughput. At high throughput, I think there isn't a big difference since you'd be continually processing the queue anyways. But there does seem like there could be a middle ground that has degraded performance, not sure though.

Edit: I guess main concern is that with eager exporting, user seems to lose control. If they want fast exports, they could just reduce schedule delay, but there's no way to have slower exports if we default to eager behavior.

iNikem · 2021-03-21T09:55:31Z

+1 in favor of eager exporting

seemk · 2022-07-14T12:47:01Z

I think the wording should be something akin to:

scheduledDelayMillis - the maximum delay interval in milliseconds between two consecutive exports. The default value is 5000.

The current version is a bit ambiguous (or perhaps even understood as a loop where exports happen after scheduledDelayMillis). Most SDKs implement eager batch exporting: Java, Ruby, Swift, Python

Stumbled upon this in the JS SDK, where exporting happens exactly with scheduleDelayMillis. This means with the default configuration (512 spans each 5s) the span throughput is limited to ~100 spans per second under constant load, which is a very low number. All other spans are silently dropped.

Agree with eager exporting 😄

dyladan · 2022-07-14T13:06:57Z

I am also in support of eager exporting. I see 2 obvious ways to do it:

When the scheduledDelayMillis timer is triggered, export as many batches as required to empty the queue
When there are maxExportBatchSize spans in the queue, trigger an export even if it means exporting early

dyladan · 2022-11-07T14:49:23Z

@open-telemetry/technical-committee I think this is quite an important issue that likely affects many SDKs. As @morigs pointed out in the related js issue open-telemetry/opentelemetry-js#3094 the current situation effectively works as a rate-limiter which I believe is not be the intended effect. JS would like to implement a fix for this, but we don't want to do something against the spec.

yurishkuro · 2022-11-07T15:18:14Z

imo the export should happen once the desired batch size is reached or the delay timer ticks, whichever is earlier. The former protects against unnecessarily large batches, the latter protects against sitting on the data that trickles in slowly.

tigrannajaryan · 2022-11-08T23:52:10Z

imo the export should happen once the desired batch size is reached or the delay timer ticks, whichever is earlier. The former protects against unnecessarily large batches, the latter protects against sitting on the data that trickles in slowly.

This. This is also exactly what we do in the Collector if I remember correctly.

If the spec does not describe this behavior I would argue it is a spec bug.

reyang · 2022-11-09T00:10:58Z

There is also ForceFlush which we might want to cover.
In metrics, this is what we tried https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#periodic-exporting-metricreader.

anuraaga added the spec:trace Related to the specification/trace directory label Aug 21, 2020

anuraaga mentioned this issue Aug 21, 2020

Make sure forceFlush / shutdown can have callers wait for them to be done by returning CompletableResultCode. open-telemetry/opentelemetry-java#1571

Merged

arminru added the area:sdk Related to the SDK label Aug 21, 2020

andrewhsu added priority:p3 Lowest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Aug 21, 2020

arminru added release:after-ga Not required before GA release, and not going to work on before GA and removed priority:p3 Lowest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Sep 25, 2020

anuraaga mentioned this issue Mar 15, 2021

Optimize batch span processor open-telemetry/opentelemetry-java#2983

Merged

seemk mentioned this issue Jul 14, 2022

Eager exporting for BatchSpanProcessor open-telemetry/opentelemetry-js#3094

Closed

2 tasks

This was referenced Dec 7, 2022

fix(sdk-trace-base): eager exporting for batch span processor open-telemetry/opentelemetry-js#3458

Closed

Eager exporting batch span processor #3024

Merged

reyang closed this as completed in #3024 Jan 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify whether BatchSpanProcessor scheduledDelayMillis is between batches or entire processing of queue #849

Clarify whether BatchSpanProcessor scheduledDelayMillis is between batches or entire processing of queue #849

anuraaga commented Aug 21, 2020

anuraaga commented Mar 15, 2021

Oberon00 commented Mar 19, 2021 •

edited

Loading

anuraaga commented Mar 19, 2021 •

edited

Loading

iNikem commented Mar 21, 2021

seemk commented Jul 14, 2022

dyladan commented Jul 14, 2022

dyladan commented Nov 7, 2022

yurishkuro commented Nov 7, 2022

tigrannajaryan commented Nov 8, 2022

reyang commented Nov 9, 2022 •

edited

Loading

Clarify whether BatchSpanProcessor scheduledDelayMillis is between batches or entire processing of queue #849

Clarify whether BatchSpanProcessor scheduledDelayMillis is between batches or entire processing of queue #849

Comments

anuraaga commented Aug 21, 2020

anuraaga commented Mar 15, 2021

Oberon00 commented Mar 19, 2021 • edited Loading

anuraaga commented Mar 19, 2021 • edited Loading

iNikem commented Mar 21, 2021

seemk commented Jul 14, 2022

dyladan commented Jul 14, 2022

dyladan commented Nov 7, 2022

yurishkuro commented Nov 7, 2022

tigrannajaryan commented Nov 8, 2022

reyang commented Nov 9, 2022 • edited Loading

Oberon00 commented Mar 19, 2021 •

edited

Loading

anuraaga commented Mar 19, 2021 •

edited

Loading

reyang commented Nov 9, 2022 •

edited

Loading