Use Caffeine for weak maps #2601

Merged 20 commits on Mar 24, 2021

@anuraaga anuraaga commented Mar 19, 2021

I'd like to use weak maps in library instrumentation too in some cases, and we already have them as part of the caching API. I ran some very basic benchmarks: weaklockfree looks better at low contention (about 12.8 vs 5.3 ops/us with 1 thread) and Caffeine slightly better at high contention (about 13.8 vs 13.1 ops/us with 10 threads), so the two converge as the thread count rises. I did notice that Datadog seems to use Caffeine for Java 8+, and they have presumably benchmarked much more thoroughly than I have, so I think this is OK: DataDog/dd-trace-java#2044. It wouldn't be hard to switch to weaklockfree in the future if needed.

Edit: squashed just has a silly bug

Benchmark                                                                              Mode  Cnt          Score           Error   Units
WeakMapBenchmark.threads1_caffeine                                                    thrpt   15          5.298 ±         0.787  ops/us
WeakMapBenchmark.threads1_caffeine:heap.total.after                                   thrpt   15  157146589.867 ±   2315509.766   bytes
WeakMapBenchmark.threads1_caffeine:heap.total.before                                  thrpt   15   45857723.733 ±   4631019.532   bytes
WeakMapBenchmark.threads1_caffeine:heap.used.after                                    thrpt   15   57109048.533 ±  26323528.636   bytes
WeakMapBenchmark.threads1_caffeine:heap.used.before                                   thrpt   15    8902554.133 ±     22938.052   bytes
WeakMapBenchmark.threads1_caffeine:·gc.alloc.rate                                     thrpt   15        810.426 ±       150.078  MB/sec
WeakMapBenchmark.threads1_caffeine:·gc.alloc.rate.norm                                thrpt   15        240.566 ±        21.685    B/op
WeakMapBenchmark.threads1_caffeine:·gc.churn.G1_Eden_Space                            thrpt   15        784.116 ±       156.570  MB/sec
WeakMapBenchmark.threads1_caffeine:·gc.churn.G1_Eden_Space.norm                       thrpt   15        231.973 ±        22.565    B/op
WeakMapBenchmark.threads1_caffeine:·gc.churn.G1_Survivor_Space                        thrpt   15          0.001 ±         0.001  MB/sec
WeakMapBenchmark.threads1_caffeine:·gc.churn.G1_Survivor_Space.norm                   thrpt   15         ≈ 10⁻³                    B/op
WeakMapBenchmark.threads1_caffeine:·gc.count                                          thrpt   15        337.000                  counts
WeakMapBenchmark.threads1_caffeine:·gc.time                                           thrpt   15         91.000                      ms
WeakMapBenchmark.threads1_weakConcurrentMap                                           thrpt   15         12.751 ±         0.269  ops/us
WeakMapBenchmark.threads1_weakConcurrentMap:heap.total.after                          thrpt   15  140648994.133 ±  41056597.241   bytes
WeakMapBenchmark.threads1_weakConcurrentMap:heap.total.before                         thrpt   15   42502280.533 ±   2315509.766   bytes
WeakMapBenchmark.threads1_weakConcurrentMap:heap.used.after                           thrpt   15   54642184.000 ±  26072074.735   bytes
WeakMapBenchmark.threads1_weakConcurrentMap:heap.used.before                          thrpt   15    8487418.133 ±      7680.149   bytes
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.alloc.rate                            thrpt   15        518.330 ±        11.124  MB/sec
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.alloc.rate.norm                       thrpt   15         64.043 ±         0.019    B/op
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.churn.G1_Eden_Space                   thrpt   15        490.203 ±        22.257  MB/sec
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.churn.G1_Eden_Space.norm              thrpt   15         60.557 ±         2.076    B/op
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.churn.G1_Survivor_Space               thrpt   15          0.001 ±         0.001  MB/sec
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.churn.G1_Survivor_Space.norm          thrpt   15         ≈ 10⁻⁴                    B/op
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.count                                 thrpt   15        319.000                  counts
WeakMapBenchmark.threads1_weakConcurrentMap:·gc.time                                  thrpt   15         94.000                      ms
WeakMapBenchmark.threads1_weakConcurrentMap_inline                                    thrpt   15         12.130 ±         0.588  ops/us
WeakMapBenchmark.threads1_weakConcurrentMap_inline:heap.total.after                   thrpt   15  148478361.600 ±  29049531.099   bytes
WeakMapBenchmark.threads1_weakConcurrentMap_inline:heap.total.before                  thrpt   15   43061521.067 ±   3155507.910   bytes
WeakMapBenchmark.threads1_weakConcurrentMap_inline:heap.used.after                    thrpt   15   54266351.467 ±  23349224.517   bytes
WeakMapBenchmark.threads1_weakConcurrentMap_inline:heap.used.before                   thrpt   15    8487630.400 ±      7705.972   bytes
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.alloc.rate                     thrpt   15        493.376 ±        23.946  MB/sec
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.alloc.rate.norm                thrpt   15         64.042 ±         0.015    B/op
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.churn.G1_Eden_Space            thrpt   15        465.397 ±        29.450  MB/sec
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.churn.G1_Eden_Space.norm       thrpt   15         60.396 ±         1.966    B/op
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.churn.G1_Survivor_Space        thrpt   15          0.001 ±         0.001  MB/sec
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.churn.G1_Survivor_Space.norm   thrpt   15         ≈ 10⁻⁴                    B/op
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.count                          thrpt   15        297.000                  counts
WeakMapBenchmark.threads1_weakConcurrentMap_inline:·gc.time                           thrpt   15         95.000                      ms
WeakMapBenchmark.threads5_caffeine                                                    thrpt   15          9.564 ±         1.927  ops/us
WeakMapBenchmark.threads5_caffeine:heap.total.after                                   thrpt   15  470321288.533 ± 217982634.552   bytes
WeakMapBenchmark.threads5_caffeine:heap.total.before                                  thrpt   15   76895573.333 ±  18539577.291   bytes
WeakMapBenchmark.threads5_caffeine:heap.used.after                                    thrpt   15  171915765.867 ±  91269976.829   bytes
WeakMapBenchmark.threads5_caffeine:heap.used.before                                   thrpt   15    9419922.667 ±      7977.655   bytes
WeakMapBenchmark.threads5_caffeine:·gc.alloc.rate                                     thrpt   15       1315.245 ±       262.496  MB/sec
WeakMapBenchmark.threads5_caffeine:·gc.alloc.rate.norm                                thrpt   15        216.595 ±         0.315    B/op
WeakMapBenchmark.threads5_caffeine:·gc.churn.G1_Eden_Space                            thrpt   15       1227.209 ±       307.958  MB/sec
WeakMapBenchmark.threads5_caffeine:·gc.churn.G1_Eden_Space.norm                       thrpt   15        200.576 ±        11.320    B/op
WeakMapBenchmark.threads5_caffeine:·gc.churn.G1_Survivor_Space                        thrpt   15          1.002 ±         0.925  MB/sec
WeakMapBenchmark.threads5_caffeine:·gc.churn.G1_Survivor_Space.norm                   thrpt   15          0.188 ±         0.172    B/op
WeakMapBenchmark.threads5_caffeine:·gc.count                                          thrpt   15        310.000                  counts
WeakMapBenchmark.threads5_caffeine:·gc.time                                           thrpt   15        812.000                      ms
WeakMapBenchmark.threads5_weakConcurrentMap                                           thrpt   15         10.367 ±         0.858  ops/us
WeakMapBenchmark.threads5_weakConcurrentMap:heap.total.after                          thrpt   15   64592281.600 ±  50126527.278   bytes
WeakMapBenchmark.threads5_weakConcurrentMap:heap.total.before                         thrpt   15   41943040.000 ±         0.001   bytes
WeakMapBenchmark.threads5_weakConcurrentMap:heap.used.after                           thrpt   15   18035037.333 ±   7965927.844   bytes
WeakMapBenchmark.threads5_weakConcurrentMap:heap.used.before                          thrpt   15    8501754.133 ±      8465.091   bytes
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.alloc.rate                            thrpt   15        421.795 ±        34.749  MB/sec
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.alloc.rate.norm                       thrpt   15         64.080 ±         0.020    B/op
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.churn.G1_Eden_Space                   thrpt   15        421.753 ±        34.835  MB/sec
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.churn.G1_Eden_Space.norm              thrpt   15         64.076 ±         0.775    B/op
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.churn.G1_Survivor_Space               thrpt   15          0.002 ±         0.002  MB/sec
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.churn.G1_Survivor_Space.norm          thrpt   15         ≈ 10⁻⁴                    B/op
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.count                                 thrpt   15        482.000                  counts
WeakMapBenchmark.threads5_weakConcurrentMap:·gc.time                                  thrpt   15        127.000                      ms
WeakMapBenchmark.threads5_weakConcurrentMap_inline                                    thrpt   15         10.778 ±         1.856  ops/us
WeakMapBenchmark.threads5_weakConcurrentMap_inline:heap.total.after                   thrpt   15  199509060.267 ±  59960638.436   bytes
WeakMapBenchmark.threads5_weakConcurrentMap_inline:heap.total.before                  thrpt   15   51729749.333 ±   7657828.750   bytes
WeakMapBenchmark.threads5_weakConcurrentMap_inline:heap.used.after                    thrpt   15   77113805.333 ±  26556091.191   bytes
WeakMapBenchmark.threads5_weakConcurrentMap_inline:heap.used.before                   thrpt   15    8501231.467 ±      7230.036   bytes
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.alloc.rate                     thrpt   15        441.126 ±        73.258  MB/sec
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.alloc.rate.norm                thrpt   15         64.502 ±         0.356    B/op
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.churn.G1_Eden_Space            thrpt   15        407.061 ±        82.762  MB/sec
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.churn.G1_Eden_Space.norm       thrpt   15         59.255 ±         2.253    B/op
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.churn.G1_Survivor_Space        thrpt   15          0.760 ±         0.682  MB/sec
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.churn.G1_Survivor_Space.norm   thrpt   15          0.125 ±         0.112    B/op
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.count                          thrpt   15        211.000                  counts
WeakMapBenchmark.threads5_weakConcurrentMap_inline:·gc.time                           thrpt   15        556.000                      ms
WeakMapBenchmark.threads10_caffeine                                                   thrpt   15         13.823 ±         1.765  ops/us
WeakMapBenchmark.threads10_caffeine:heap.total.after                                  thrpt   15  589719142.400 ± 301587686.785   bytes
WeakMapBenchmark.threads10_caffeine:heap.total.before                                 thrpt   15   78433484.800 ±  13204062.656   bytes
WeakMapBenchmark.threads10_caffeine:heap.used.after                                   thrpt   15  182798754.667 ± 127869846.746   bytes
WeakMapBenchmark.threads10_caffeine:heap.used.before                                  thrpt   15    9443289.067 ±      6207.509   bytes
WeakMapBenchmark.threads10_caffeine:·gc.alloc.rate                                    thrpt   15       1898.692 ±       241.840  MB/sec
WeakMapBenchmark.threads10_caffeine:·gc.alloc.rate.norm                               thrpt   15        216.350 ±         0.143    B/op
WeakMapBenchmark.threads10_caffeine:·gc.churn.G1_Eden_Space                           thrpt   15       1812.612 ±       296.898  MB/sec
WeakMapBenchmark.threads10_caffeine:·gc.churn.G1_Eden_Space.norm                      thrpt   15        205.841 ±        10.249    B/op
WeakMapBenchmark.threads10_caffeine:·gc.churn.G1_Survivor_Space                       thrpt   15          0.869 ±         0.854  MB/sec
WeakMapBenchmark.threads10_caffeine:·gc.churn.G1_Survivor_Space.norm                  thrpt   15          0.109 ±         0.112    B/op
WeakMapBenchmark.threads10_caffeine:·gc.count                                         thrpt   15        344.000                  counts
WeakMapBenchmark.threads10_caffeine:·gc.time                                          thrpt   15        735.000                      ms
WeakMapBenchmark.threads10_weakConcurrentMap                                          thrpt   15         13.146 ±         0.634  ops/us
WeakMapBenchmark.threads10_weakConcurrentMap:heap.total.after                         thrpt   15  282276659.200 ±   5070196.003   bytes
WeakMapBenchmark.threads10_weakConcurrentMap:heap.total.before                        thrpt   15   62914560.000 ±         0.001   bytes
WeakMapBenchmark.threads10_weakConcurrentMap:heap.used.after                          thrpt   15   88683574.933 ±  40029276.726   bytes
WeakMapBenchmark.threads10_weakConcurrentMap:heap.used.before                         thrpt   15    8519905.067 ±      5701.110   bytes
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.alloc.rate                           thrpt   15        540.695 ±        26.257  MB/sec
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.alloc.rate.norm                      thrpt   15         64.816 ±         0.084    B/op
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.churn.G1_Eden_Space                  thrpt   15        507.180 ±        42.638  MB/sec
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.churn.G1_Eden_Space.norm             thrpt   15         60.742 ±         3.177    B/op
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.churn.G1_Survivor_Space              thrpt   15          1.582 ±         0.952  MB/sec
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.churn.G1_Survivor_Space.norm         thrpt   15          0.189 ±         0.112    B/op
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.count                                thrpt   15        120.000                  counts
WeakMapBenchmark.threads10_weakConcurrentMap:·gc.time                                 thrpt   15       1313.000                      ms
WeakMapBenchmark.threads10_weakConcurrentMap_inline                                   thrpt   15         12.884 ±         1.553  ops/us
WeakMapBenchmark.threads10_weakConcurrentMap_inline:heap.total.after                  thrpt   15  287170013.867 ±  13649728.764   bytes
WeakMapBenchmark.threads10_weakConcurrentMap_inline:heap.total.before                 thrpt   15   63473800.533 ±   2315509.766   bytes
WeakMapBenchmark.threads10_weakConcurrentMap_inline:heap.used.after                   thrpt   15  106408707.733 ±  55857715.459   bytes
WeakMapBenchmark.threads10_weakConcurrentMap_inline:heap.used.before                  thrpt   15    8520441.600 ±      5482.238   bytes
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.alloc.rate                    thrpt   15        530.420 ±        63.078  MB/sec
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.alloc.rate.norm               thrpt   15         64.882 ±         0.218    B/op
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.churn.G1_Eden_Space           thrpt   15        485.882 ±        51.651  MB/sec
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.churn.G1_Eden_Space.norm      thrpt   15         59.620 ±         3.907    B/op
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.churn.G1_Survivor_Space       thrpt   15          3.615 ±         2.581  MB/sec
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.churn.G1_Survivor_Space.norm  thrpt   15          0.468 ±         0.350    B/op
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.count                         thrpt   15        127.000                  counts
WeakMapBenchmark.threads10_weakConcurrentMap_inline:·gc.time                          thrpt   15       1327.000                      ms

@anuraaga anuraaga mentioned this pull request Mar 20, 2021
@@ -55,6 +65,9 @@ void bounded() {
void unbounded() {
Cache<String, String> cache = Cache.newBuilder().setWeakKeys().build();

assertThat(cache.computeIfAbsent("bear", unused -> "roar")).isEqualTo("roar");
cache.remove("bear");

CaffeineCache<?, ?> caffeineCache = ((CaffeineCache<?, ?>) cache);
String cat = new String("cat");

I detect that this code is problematic. According to the Performance (PERFORMANCE), Dm: Method invokes inefficient new String(String) constructor (DM_STRING_CTOR).
Using the java.lang.String(String) constructor wastes memory because the object so constructed will be functionally indistinguishable from the String passed as a parameter.  Just use the argument String directly.

Contributor Author

@lillieMaiBauer Can you please disable this organization from your code review bot? We have satisfactory checks in place, including exclusions that need to be made in some cases like this.

ok, I will
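
For context on why `new String("cat")` is a deliberate exclusion in a weak-key test: a string literal is interned and stays strongly reachable from the class that uses it, so a weak reference to it would not be expected to clear while that class is loaded. The constructor yields a distinct instance that can become unreachable. A minimal JDK-only sketch of the distinction follows; the class and method names are hypothetical and not taken from the PR.

```java
import java.lang.ref.WeakReference;

// Hypothetical illustration; not code from the PR.
final class WeakKeyInstances {
  // True when both references point at the very same object.
  static boolean sameInstance(String a, String b) {
    return a == b;
  }

  public static void main(String[] args) {
    String literal = "cat";
    String copy = new String("cat");

    System.out.println(literal.equals(copy));         // true: same characters
    System.out.println(sameInstance(literal, copy));  // false: distinct object
    System.out.println(sameInstance(literal, "cat")); // true: interned literal

    // Only the copy can become weakly reachable once the local is dropped.
    // GC timing is not guaranteed, so nothing is asserted about clearing here.
    WeakReference<String> ref = new WeakReference<>(copy);
    copy = null;
    System.out.println(ref.get() == null ? "cleared" : "still present");
  }
}
```

A weak-key cache that compares keys by identity treats `copy` and `literal` as different keys, which is what makes the constructor useful in this kind of test.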

@@ -125,7 +126,7 @@ public static HelperInjector forDynamicTypes(
classLoader = BOOTSTRAP_CLASSLOADER_PLACEHOLDER;
}

- if (!injectedClassLoaders.containsKey(classLoader)) {
+ if (Boolean.FALSE == injectedClassLoaders.get(classLoader)) {

I detect that this code is problematic. According to the Bad practice (BAD_PRACTICE), RC: Suspicious reference comparison of Boolean values (RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN).
This method compares two Boolean values using the == or != operator. Normally, there are only two Boolean values (Boolean.TRUE and Boolean.FALSE), but it is possible to create other Boolean objects using the new Boolean(b) constructor. It is best to avoid such objects, but if they do exist, then checking Boolean objects for equality using == or != will give results that are different from those you would get using .equals(...).
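
To make the warning concrete, here is a small JDK-only sketch of how a reference comparison against `Boolean.FALSE` behaves for the three results a map lookup can produce: absent (`null`), false, and true. The class and method names are hypothetical. For autoboxed values the reference check and `equals` agree, because autoboxing goes through `Boolean.valueOf`, which returns the canonical instances. They can differ only for a `Boolean` built with the deprecated constructor, which this sketch avoids on purpose.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not code from the PR.
final class BooleanComparison {
  // Reference comparison, as in the flagged line.
  static boolean isFalseByReference(Boolean value) {
    return Boolean.FALSE == value;
  }

  // Null-safe value comparison, the form the warning recommends.
  static boolean isFalseByEquals(Boolean value) {
    return Boolean.FALSE.equals(value);
  }

  public static void main(String[] args) {
    Map<String, Boolean> injected = new HashMap<>();
    injected.put("a", false); // autoboxed to Boolean.FALSE
    injected.put("b", true);  // autoboxed to Boolean.TRUE

    for (String key : new String[] {"a", "b", "missing"}) {
      Boolean value = injected.get(key); // null when the key is absent
      System.out.println(
          key + ": byReference=" + isFalseByReference(value)
              + " byEquals=" + isFalseByEquals(value));
    }
  }
}
```

Either form treats an absent key (`null`) differently from a stored `false`, which a plain `containsKey` check cannot do.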

@Threads(5)
public void caffeineMap_fiveThreads() {
caffeineMap.put(key, "foo");
caffeineMap.get(key);

Contributor

Inject a Blackhole and consume the returned value. Otherwise who knows what the compiler will do; it may eliminate the unused call as dead code.

@@ -1,127 +1,127 @@

#javaagent
##Dependency License Report

Member

👍

@anuraaga
Contributor Author

Looking at dd-trace more closely, I noticed they still use weaklockfree for unbounded caches, so I've restored its use to reduce gaps. There seem to be some tradeoffs, but in our common case of adding a field to a user object we'd expect low contention, where weaklockfree is probably a bit better.

@@ -18,35 +22,15 @@ dependencies {
}

jmh {
timeUnit = 'ms' // Output time unit. Available time units are: [m, s, ms, us, ns].

Contributor Author

I removed these since they override the annotations. Benchmarks often need specific settings, so copy-pasting annotations is better than global control.

Member

👍

private static final WeakConcurrentMap<String, String> weakConcurrentMap =
new WeakConcurrentMap<>(true, true);

private static final WeakConcurrentMap<String, String> weakConcurrentMapInline =

Contributor Author

Added an inline expunction benchmark too. I found very little difference in performance, so I settled on inline expunction, since it means not having to worry about cleaner threads, which are hard to manage for library instrumentation (we were using a common task executor before).
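
As a rough illustration of what inline expunction means, the sketch below drains a `ReferenceQueue` on the calling thread at the start of each operation instead of relying on a background cleaner thread. It is a simplified, hypothetical class built only on the JDK, not the weaklockfree implementation. Keys are assumed non-null and are compared by identity.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: a weak-key, identity-based map that removes
// cleared entries on the calling thread (no cleaner thread needed).
final class InlineExpungingWeakMap<K, V> {
  private final ConcurrentMap<Object, V> target = new ConcurrentHashMap<>();
  private final ReferenceQueue<K> queue = new ReferenceQueue<>();

  V get(K key) {
    Objects.requireNonNull(key, "key");
    expungeStaleEntries();
    return target.get(new LookupKey(key));
  }

  V put(K key, V value) {
    Objects.requireNonNull(key, "key");
    expungeStaleEntries();
    return target.put(new WeakKey<>(key, queue), value);
  }

  int approximateSize() {
    expungeStaleEntries();
    return target.size();
  }

  // Inline expunction: drop every entry whose key the GC has cleared.
  private void expungeStaleEntries() {
    Reference<? extends K> stale;
    while ((stale = queue.poll()) != null) {
      target.remove(stale);
    }
  }

  // Stored key: weak referent plus a cached identity hash, so the entry
  // can still be located and removed after the referent is cleared.
  private static final class WeakKey<K> extends WeakReference<K> {
    private final int hash;

    WeakKey(K referent, ReferenceQueue<K> queue) {
      super(referent, queue);
      this.hash = System.identityHashCode(referent);
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object other) {
      if (this == other) return true;
      K mine = get();
      if (mine == null) return false;
      if (other instanceof WeakKey) return ((WeakKey<?>) other).get() == mine;
      if (other instanceof LookupKey) return ((LookupKey) other).key == mine;
      return false;
    }
  }

  // Lookup-only key: a strong reference that is never registered with the queue.
  private static final class LookupKey {
    private final Object key;
    private final int hash;

    LookupKey(Object key) {
      this.key = key;
      this.hash = System.identityHashCode(key);
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object other) {
      if (this == other) return true;
      if (other instanceof WeakKey) return ((WeakKey<?>) other).get() == key;
      if (other instanceof LookupKey) return ((LookupKey) other).key == key;
      return false;
    }
  }

  public static void main(String[] args) {
    InlineExpungingWeakMap<String, String> map = new InlineExpungingWeakMap<>();
    String key = new String("cat");
    map.put(key, "meow");
    System.out.println(map.get(key));               // meow
    System.out.println(map.get(new String("cat"))); // null: keys compare by identity
  }
}
```

The tradeoff in this design is that cleanup cost is paid by whichever thread touches the map next, and a map that is never touched again keeps its stale entries until it is itself collected.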


@Override
public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
V value = get(key);

Contributor Author

The implementation is mostly the same as WeakMapSuppliers before it, though it uses putIfAbsent, so it's a bit better I think.
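
A sketch of the get-then-putIfAbsent pattern described in this comment, written against a plain `ConcurrentMap` rather than the PR's cache class (the helper class name is hypothetical). Under a race the mapping function may run more than once, but every caller ends up with the value that was stored first. With a plain `put`, a later racing writer would overwrite the earlier value instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper illustrating the pattern; not the PR's implementation.
final class GetThenPutIfAbsent {
  static <K, V> V computeIfAbsent(
      ConcurrentMap<K, V> map, K key, Function<? super K, ? extends V> mappingFunction) {
    V value = map.get(key);
    if (value != null) {
      return value; // fast path: already present
    }
    value = mappingFunction.apply(key);
    if (value == null) {
      return null; // nothing to store
    }
    // If another thread stored a value in the meantime, keep theirs.
    V previous = map.putIfAbsent(key, value);
    return previous != null ? previous : value;
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    System.out.println(computeIfAbsent(cache, "bear", unused -> "roar"));  // roar
    System.out.println(computeIfAbsent(cache, "bear", unused -> "growl")); // roar
  }
}
```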

@anuraaga anuraaga requested review from trask and iNikem March 22, 2021 09:00
@anuraaga
Contributor Author

Made some fairly large changes to continue to use weaklockfree for unbounded like dd-trace, PTAL

@@ -18,35 +22,15 @@ dependencies {
}

jmh {
timeUnit = 'ms' // Output time unit. Available time units are: [m, s, ms, us, ns].

Member

👍

- boolean value = hasResources(cl);
- cache.put(cl, value);
- return value;
+ return cache.computeIfAbsent(cl, this::hasResources);

Member

👍

Comment on lines +120 to +124
// We only have compileOnly dependencies on these to make sure they don't leak into POMs.
licenseReportDependencies(deps.caffeine) {
transitive = false
}
licenseReportDependencies deps.weaklockfree

Member

👍

@anuraaga anuraaga merged commit 79d7e88 into open-telemetry:main Mar 24, 2021