Optimize batch span processor #2983
Conversation
sdk/trace/build.gradle.kts
@@ -45,6 +45,7 @@ dependencies {
   jmh("io.grpc:grpc-api")
   jmh("io.grpc:grpc-netty-shaded")
   jmh("org.testcontainers:testcontainers") // testContainer for OTLP collector
+  implementation("org.jctools:jctools-core:3.2.0")
We do not want to take additional external dependencies like this for our SDK.
If we want to use this, we'll need to shade it into our project, rather than have it be exposed as a transitive dependency.
Can you include the benchmarks, before and after, in this PR description, rather than just in the middle of a very long discussion on the related issue?
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jctools.queues.MpscArrayQueue;
Is this better than disruptor? Can we run the same benchmark for https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/async-processor?
@bogdandrutu Is there an existing jmh benchmark with DisruptorAsyncSpanProcessor?
Unfortunately not, but I think you can copy-paste the same one that you have and only initialize that SpanProcessor.
Can you also include the …
This is something I couldn't run locally (using docker-desktop on Mac). I might be missing some setup required for JMH. Can you run this through the BatchSpanProcessorBenchmark and post the results?
@@ -92,6 +92,7 @@
   "org.awaitility:awaitility:4.0.3",
   "org.codehaus.mojo:animal-sniffer-annotations:1.20",
   "org.curioswitch.curiostack:protobuf-jackson:1.2.0",
+  "org.jctools:jctools-core:3.2.0",
From looking at the benchmarks, I think making the general change of signaling instead of polling makes sense to me. Let's not add this dependency in this PR though; we'll need to figure out how to vendor in only the MpscQueue to keep the size down if we find it to be significant (it seems like we might, but let's think about it separately).
Yeah, let's get the MpscQueue in a different PR. @anuraaga any help would be great!
sdk/trace/src/main/java/io/opentelemetry/sdk/trace/export/BatchSpanProcessor.java
if (flushRequested.compareAndSet(null, flushResult)) {
  lock.lock();
  try {
    needExport.signal();
  } finally {
    lock.unlock();
  }
}
Now that you grab the lock, I think it would be a big simplification if flushRequested were protected by the lock.
Could you clarify what it would simplify? Are you saying we can get rid of the AtomicReference? Then the exporter thread would have to take the lock inside flush() as well, checking the flushRequested flag inside the lock.
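For concreteness, here is a minimal sketch of what that suggestion might look like, assuming the Worker keeps the lock/Condition pair from the diff above and drops the AtomicReference; the names and structure are illustrative, not the actual PR code.

```java
import io.opentelemetry.sdk.common.CompletableResultCode;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class Worker {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition needExport = lock.newCondition();
  // Guarded by lock instead of being an AtomicReference.
  private CompletableResultCode flushRequested;

  CompletableResultCode forceFlush() {
    CompletableResultCode flushResult = new CompletableResultCode();
    lock.lock();
    try {
      if (flushRequested == null) {
        flushRequested = flushResult;
        needExport.signal(); // wake the exporter thread
        return flushResult;
      }
      return flushRequested; // a flush is already pending; reuse its result
    } finally {
      lock.unlock();
    }
  }
}
```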
I don't understand the benchmark
And of course forceFlush() is a major cost in that benchmark, which we don't expect any improvement in from this change.
It only does a …
I have no idea why we have that duplicated benchmark. It should probably be deleted.
Still, it does do a forceFlush() on every iteration of the export(). The benchmark is not testing real scenarios.
private final Collection<SpanData> batch;
private final ReentrantLock lock;
public AtomicLong droppedSpansCounter = new AtomicLong(0);
public AtomicLong exportedSpansCounter = new AtomicLong(0);
This is a temporary addition to show the benchmark results, as I couldn't get the metric to work correctly.
Added a benchmark with different thread configurations and ran the BatchSpanProcessor under different versions. The benchmark clearly shows the overhead of locks/signaling/polling.
All the benchmarks use the following configuration:
Is a 0ms exporter delay a "real-life" configuration? Sure, if export is free, then we can improve the BSP CPU usage, but if export takes any time at all, is the BSP CPU usage swamped by the export, in which case these tweaks are of limited value?
I think you are missing the difference between a throughput benchmark and a CPU one. As mentioned in my previous comment, 0ms is meant for the throughput benchmark and clearly shows the lock/signal/polling overhead. It also shows the forceFlush() impact on the benchmark. As mentioned in my previous comment, I am following up with a CPU benchmark, which reflects the "real world" scenario. JMH doesn't support measuring the CPU time of the exporter thread, so one has to use a profiler; in my case I used YJP. [Raw data and chart of CPU (user + kernel) time in ms omitted.] Throughput stays the same.
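For readers following along, here is a rough sketch of the kind of throughput benchmark being discussed: a zero-delay exporter that only counts spans, driven by several JMH threads. The class and exporter names are made up for illustration; this is not the benchmark file added in this PR.

```java
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.Collection;
import java.util.concurrent.atomic.AtomicLong;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
public class BatchSpanProcessorThroughputBenchmark {

  // Zero-delay exporter: export() costs nothing, so the measurement isolates the
  // processor's queueing/signaling overhead rather than exporter latency.
  static final class CountingSpanExporter implements SpanExporter {
    final AtomicLong exportedSpans = new AtomicLong();

    @Override
    public CompletableResultCode export(Collection<SpanData> spans) {
      exportedSpans.addAndGet(spans.size());
      return CompletableResultCode.ofSuccess();
    }

    @Override
    public CompletableResultCode flush() {
      return CompletableResultCode.ofSuccess();
    }

    @Override
    public CompletableResultCode shutdown() {
      return CompletableResultCode.ofSuccess();
    }
  }

  private final CountingSpanExporter exporter = new CountingSpanExporter();
  private final Tracer tracer =
      SdkTracerProvider.builder()
          .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
          .build()
          .get("benchmark");

  @Benchmark
  @Threads(5)
  public void endSpan() {
    // Throughput alone is not enough: exportedSpans must be reported too,
    // otherwise a processor that silently drops spans looks artificially fast.
    tracer.spanBuilder("span").startSpan().end();
  }
}
```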
So, if I read your graphs correctly, the ConcurrentLinkedQueue approach is the best option that doesn't involve us depending on a 3rd-party library. Do you have a PR with that implementation? Does it also do time-based exports so it will work correctly in low-throughput situations?
Yes, it is Java's built-in ConcurrentLinkedQueue. I have a local branch that uses the ConcurrentLinkedQueue, and yes, it does time-based export. We can go with this approach for now and pull in MPSCQueue later, since that makes the implementation really clean and most efficient. I can update this PR if there is consensus.
@@ -33,13 +33,13 @@
 @State(Scope.Benchmark)
 public class BatchSpanProcessorBenchmark {

-  private static class DelayingSpanExporter implements SpanExporter {
+  public static class DelayingSpanExporter implements SpanExporter {
if we want to re-use this, please pull it up out of this class to the top level, so it's clear it's a reusable component for benchmarking.
I understand all of what you've said. Why are you opposed to having this as a separate incubator implementation for others to try out before we make it into the main default implementation?
Also please note that throughput alone is a bad metric in the benchmark; one really has to look at exportedSpans. For example, I can write code that simply returns without adding to the queue, and the throughput would be super high. So the existing BatchSpanProcessorBenchmark is even more meaningless, TBH.
If you feel so strongly that our benchmark is meaningless (btw, please watch your language and be aware that there are human beings who wrote this code), then let's do something else. Please put in a PR that makes the benchmarks meaningful, as a separate PR from the one changing the implementations.
I am in no way talking about the people who contributed to the code. We really appreciate all the work of the people who contributed here. I am only pointing out that the existing benchmark is not helpful, for multiple reasons. And we have contributed two benchmarks that address the above issues. I will look into the incubator implementation, thanks for pointing it out.
So, it will be very helpful to have the work separated. Let's have the benchmarks updated to be better, and have general agreement from the maintainers that it is better and more useful. Let's run the existing implementation against them once that is merged. Then, as a separate step, let's have a PR that changes the implementation and shows the change in the output of the benchmarks.
Separate PR for benchmarks: https://github.com/open-telemetry/opentelemetry-java/pull/3017/files
private Worker(
    SpanExporter spanExporter,
    long scheduleDelayNanos,
    int maxExportBatchSize,
    long exporterTimeoutNanos,
-   BlockingQueue<ReadableSpan> queue) {
+   ConcurrentLinkedQueue<ReadableSpan> queue,
I'm trying to get up to speed on this thread: did you compare with ArrayBlockingQueue? I would be surprised if the linked list, with the allocations it needs to do, would perform better, and anyway we want to avoid the complicated counter increment. There is almost no work in the blocking section, so even with contention it should not really put threads to sleep, I guess.
I'm particularly worried about the allocations since more garbage means unrelated app code ends up being affected by increased GC
Could you please elaborate on what you meant by comparing with ArrayBlockingQueue? My benchmarks took ArrayBlockingQueue (the current implementation) as the baseline, and this approach showed a clear improvement in throughput and CPU. Do you mean using ArrayBlockingQueue with batched signaling? If so, yes, I did run that benchmark, but it has much lower throughput than using the concurrent queue, though there are good CPU overhead improvements.
Regarding GC, yes, there would be some increased allocations, so ideally we should be pulling in MPSCQueue.
Do you mean using ArrayBlockingQueue with batched signal?
Yeah, I meant this. I don't see it in the graphs; it would be nice to see the numbers to know what the real effect is. If it means simplifying the code (not keeping track of the span count, for example) with only a small difference, then it's still worth it, especially given we may try to replace it with MPSCQueue.
Here are the benchmark results. Throughput suffers with the BlockingQueue, but exporter CPU overhead is lower with the batched signal. The ConcurrentQueue is better overall, though. I don't see any significant gc.alloc.rate increase; the increase is well within the error bounds.
With blocking queue + batch signal: [benchmark results omitted]
With concurrent queue + batch signal: [benchmark results omitted]
if (batch.size() >= maxExportBatchSize || System.nanoTime() >= nextExportTime) {
  exportCurrentBatch();
  updateNextExportTime();
}
if (queue.isEmpty()) {
  lock.lock();
Can we use a blocking queue for the signalling too? I think a one element queue is mostly equivalent but means we don't have to manage the locks ourselves, simplifying the code drastically.
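A small sketch of that idea, purely illustrative: the writer offers a token into a capacity-1 queue, and the worker does a timed poll on it instead of managing a lock/Condition pair.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class SignalSketch {
  // One-element queue used purely as a wake-up signal.
  private final BlockingQueue<Boolean> signal = new ArrayBlockingQueue<>(1);

  // Writer side: non-blocking; if a signal is already pending, offer() simply
  // returns false, which is fine because one pending signal is enough.
  void signalExporter() {
    signal.offer(Boolean.TRUE);
  }

  // Worker side: sleep until signaled or until the next scheduled export is due.
  void awaitSignal(long waitNanos) throws InterruptedException {
    if (waitNanos > 0) {
      signal.poll(waitNanos, TimeUnit.NANOSECONDS);
    }
  }
}
```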
} else {
  queue.offer(span);
  queueSize.incrementAndGet();
  if (addedSpansCounter.incrementAndGet() % maxExportBatchSize == 0) {
@jkwatson This reminds me of an issue I filed a long, long time ago: open-telemetry/opentelemetry-specification#849. I never got clarification on whether we're supposed to eagerly export when we have maxExportBatchSize items. But it's true that this is what our current code is doing, so we should preserve it for now, which I think is the intent of this line.
  droppedSpans.add(1);
} else {
  queue.offer(span);
I'm not comfortable with the thread-safety issues of this; it seems the queue can easily go over maxQueueSize, which is an invariant we need to preserve. Since ArrayBlockingQueue is a queue with a max capacity, let's just use it; it's designed for our job here. Throughput with a 0ms delay is sort of interesting theoretically, but it is not an actual use case for this class; we can still enjoy the CPU improvements when idle thanks to the signalling, which is what matters.
Agreed, the queue can temporarily go over the maxQueueSize constraint. How about optimistically trying to insert into the queue and backing off if maxQueueSize is reached?
// Reserve a slot in the counter first; back off if the queue is already full.
if (size.incrementAndGet() >= maxQueueSize) {
  size.decrementAndGet();
  return false;
}
queue.offer(span);
return true;
The concurrentQueue does have better behavior under contention, with reduced exporter CPU overhead and writer-thread overhead. If MPSCQueue is the way forward as an extension, then I feel we are good with a blocking queue and batch signaling. Is it realistic to assume that a batch span processor based on MPSCQueue could be supported soon?
@sbandadd Discussed a bit with @jkwatson, and let's go ahead and split this into two parts, in this order:
- Replacing constant polling with signaling
- Improving the concurrency of the queuing

We don't need to bundle them into a single PR. So can we update this one to use signaling but stick with ArrayBlockingQueue, which you've demonstrated does improve CPU significantly? And in a follow-up we can address concurrency, most likely by vendoring in MPSCQueue, as I don't see that being a big problem.
Sure thing. This sounds like a solid plan to me. Thanks for reaching a consensus!
@sbandadd would it be possible for you to attend one of our SIG meetings to discuss this all in more detail? Or jump onto CNCF Slack and discuss in more real time?
Sure, I can join the SIG meetings. I joined the CNCF OpenTelemetry channel as well. Thanks!
while (!queue.isEmpty() && batch.size() < maxExportBatchSize) {
  batch.add(queue.poll().toSpanData());
}
if (batch.size() >= maxExportBatchSize || System.nanoTime() >= nextExportTime) {
  exportCurrentBatch();
  updateNextExportTime();
}
if (queue.isEmpty()) {
  try {
    long pollWaitTime = nextExportTime - System.nanoTime();
    if (pollWaitTime > 0) {
      signal.poll(pollWaitTime, TimeUnit.NANOSECONDS);
    }
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    return;
  }
Does this produce the same result using drain? Then I think it lets us use queue.size() >= maxExportBatchSize in addSpan instead of maintaining a counter.
while (queue.size() >= maxExportBatchSize) {
  batch.clear(); // Can extract a method for these three lines.
  queue.drainTo(batch);
  export(batch);
}
if (!queue.isEmpty() && System.nanoTime() >= nextExportTime) {
  batch.clear();
  queue.drainTo(batch);
  export(batch);
  updateNextExportTime();
} else {
  long pollWaitTime = nextExportTime - System.nanoTime();
  if (pollWaitTime > 0) {
    signal.poll(pollWaitTime, TimeUnit.NANOSECONDS);
  }
}
I guess the race would go away anyway if we removed signal.clear() from that suggestion; the worst case is just that it evaluates the while and the if one more time before sleeping.
The logic looks really complicated, IMHO. Repeatedly copying the queue into the batch is a bit confusing, as is the race. Using a counter, we are essentially amortizing the cost of the signal; it is intuitive to think that a signal every maxExportBatchSize essentially implies that the exporter thread, when receiving the signal, is guaranteed to do the export.
In your suggestion, isn't export(batch) already clearing the batch? Also, in the if condition !queue.isEmpty() && System.nanoTime() >= nextExportTime, it is exporting the whole batch at once instead of in maxExportBatchSize chunks.
it is intuitive to think that a signal every maxExportBatchSize essentially implies the exporter thread when receiving the signal is guaranteed to do the export.
But can't the signal be missed, since it's already exporting or something like that?
In your suggestion, isn't export(batch) already clearing the batch?
Yeah, probably; it's somewhat pseudocode.
Also in the if condition !queue.isEmpty() && System.nanoTime() >= nextExportTime, it is exporting the whole batch at once instead of in maxExportBatchSize chunks.
We've already exported the maxExportBatchSize chunks. However, every time the interval has passed, we are supposed to clear the queue fully, so this is where the remaining spans get exported. It won't export more than maxExportBatchSize even if the queue suddenly had many spans added, because drainTo won't fill more than batch.size(), so it's ok.
I think the loop-then-if pattern here actually captures the complexity more directly. We have two conditions:
- Eagerly export as many spans as possible when over maxExportBatchSize
- If interval has passed, make sure to export entire queue
So these are handled with two conditionals any time the thread wakes up, since there are actually two conditions. Does it make sense?
Let's not worry about the race; if we never clear the signal queue we're good, and I think that will work well.
The current code handles sending chunks with size less than maxExportBatchSize when the interval has passed. I am not sure what becomes unclear here?
If I'm not mistaken, if a chunk is sent that isn't maxExportBatchSize, then the % becomes out of sync. The intention of the % is to send chunks of size maxExportBatchSize, but if any batch is sent with a different size due to the interval check, then when the signal is sent we will have less or more than maxExportBatchSize in the queue, in an unclear way. For example, if the batch size is 5 and 4 spans are sent due to the interval check, then the signal will happen with only 1 span in the queue. I don't think that's our intention with that signal.
I updated the PR to just use an atomic boolean and the queue size. I think it makes things much clearer, though I didn't see any change in the benchmark results. drainTo() holds the queue lock for a long time, which negatively impacts the writer threads.
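A sketch of the shape that change could take, assuming the signal is still a one-element queue and the processor keeps an ArrayBlockingQueue as agreed above; the names are illustrative and this is not the merged code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class AddSpanSketch {
  private final int maxExportBatchSize = 512;
  private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(2048);
  private final BlockingQueue<Boolean> signal = new ArrayBlockingQueue<>(1);
  // Set once a wake-up has been sent for the current full batch; cleared by the worker.
  private final AtomicBoolean exportRequested = new AtomicBoolean();

  void addSpan(Object span) {
    if (!queue.offer(span)) {
      // Queue full: drop the span (the real processor also counts dropped spans).
      return;
    }
    // Signal the worker at most once per full batch instead of once per span.
    if (queue.size() >= maxExportBatchSize && exportRequested.compareAndSet(false, true)) {
      signal.offer(Boolean.TRUE);
    }
  }

  // Called by the worker after draining a batch, so the next full batch signals again.
  void batchDrained() {
    exportRequested.set(false);
  }
}
```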
I like this latest iteration quite a bit. Unfortunately, it looks like BatchSpanProcessorTest.forceExport() is quite flaky with this implementation. It failed about 1 in 5 times for me locally. I'm not sure what's going on.
@jkwatson could you paste the error? I ran the test 10 times and couldn't reproduce it locally.
It just fails the first assertion with a '0', rather than a '49'.
This seems like a relatively small change with a significant improvement in signaling strength. Thanks!
sdk/trace/src/main/java/io/opentelemetry/sdk/trace/export/BatchSpanProcessor.java
I made a minor update to use an AtomicInteger instead of an AtomicLong.
Thanks for the hard work on this!
Thanks @sbandadd!
Description:
The batch span processor currently is aggressive in the sense that any new spans are handed to the exporter right away; this involves lots of signaling overhead under heavy load, and constant-polling overhead on the exporter thread under light load. This PR makes the exporter thread wait for maxExportBatchSize spans to avoid busy polling of the queue.
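Pieced together from the diff snippets quoted in the conversation above, the exporter thread's loop ends up roughly like the following; this is an illustrative reconstruction (continueWork stands in for the worker's real loop condition), not the exact merged code.

```java
while (continueWork) {
  // Drain up to one batch worth of spans from the shared queue.
  while (!queue.isEmpty() && batch.size() < maxExportBatchSize) {
    batch.add(queue.poll().toSpanData());
  }

  // Export when a full batch is ready or the schedule delay has elapsed.
  if (batch.size() >= maxExportBatchSize || System.nanoTime() >= nextExportTime) {
    exportCurrentBatch();
    updateNextExportTime();
  }

  // Otherwise block on the signal instead of busy polling the queue.
  if (queue.isEmpty()) {
    try {
      long pollWaitTime = nextExportTime - System.nanoTime();
      if (pollWaitTime > 0) {
        signal.poll(pollWaitTime, TimeUnit.NANOSECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return;
    }
  }
}
```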
Benchmark results
BatchSpanProcessorMultiThreadBenchmark.java result: https://user-images.githubusercontent.com/62265954/111420486-893c7300-86a8-11eb-8f87-feb2f86f00fc.png
BatchSpanProcessorCpuBenchmark.java result: https://user-images.githubusercontent.com/62265954/111420492-8e012700-86a8-11eb-800e-7de1fbe2c2b1.png