apisec: new sampler implementation by RomainMuller · Pull Request #39 · DataDog/appsec-internal-go

RomainMuller · 2025-02-25T17:37:18Z

Implementation for a new API Security sampler that aimes at providing
better information as the sampling decision is made per endpoint (method
plus route) and response status code; instead of being made across all.

This implementation is based on an open addressing hash table using
copy-update-replace semantics, so that it is able to operate completely
lock-less. This is described by RFC-1021.

Here is the result of running the benchmark on my machine with
1,000,000,000 operations distributed over concurrency * 10 goroutines
for each configuration:

goos: darwin
goarch: arm64
pkg: github.com/DataDog/appsec-internal-go/apisec
cpu: Apple M1 Max
                                         │      1       │                 1000                  │
                                         │    sec/op    │    sec/op      vs base                │
Sampler/keySpaceSize=2048/interval=1s-10   76.27n ±  2%    88.40n ±  1%  +15.91% (p=0.000 n=10)
Sampler/keySpaceSize=8192/interval=1s-10   77.77n ± 26%   119.85n ± 25%  +54.11% (p=0.000 n=10)
geomean                                    77.01n          102.9n        +33.65%

                                         │       1       │                 1000                 │
                                         │     keep%     │    keep%      vs base                │
Sampler/keySpaceSize=2048/interval=1s-10    15.77m ±  1%   18.23m ±  3%  +15.60% (p=0.000 n=10)
Sampler/keySpaceSize=8192/interval=1s-10   1314.5m ± 26%   204.4m ± 38%  -84.45% (p=0.000 n=10)
geomean                                     144.0m         61.04m        -57.60%

                                         │      1       │                  1000                   │
                                         │     B/op     │    B/op     vs base                     │
Sampler/keySpaceSize=2048/interval=1s-10   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
Sampler/keySpaceSize=8192/interval=1s-10   1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
geomean                                               ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                         │      1       │                1000                 │
                                         │  allocs/op   │ allocs/op   vs base                 │
Sampler/keySpaceSize=2048/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Sampler/keySpaceSize=8192/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Implementation for a new API Security sampler that aimes at providing better information as the sampling decision is made per endpoint (method plus route) and response status code; instead of being made across all. This implementation is based on a ccache-style bucketed LRU cache that allows distributing lock contention over several buckets, 16 by default, which can optionally be scaled to achieve better concurrency. Here is the result of running the benchmark on my machine with `1,000,000,000` operations distribted over `concurrency * 10` goroutines for each configuration: ``` goos: darwin goarch: arm64 pkg: github.com/DataDog/appsec-internal-go/apisec cpu: Apple M1 Max │ 1 │ 1000 │ │ sec/op │ sec/op vs base │ Sampler/keySpaceSize=2048/interval=1s-10 79.74n ± 3% 94.39n ± 0% +18.37% (p=0.000 n=10) Sampler/keySpaceSize=8192/interval=1s-10 88.58n ± 1% 94.81n ± 0% +7.03% (p=0.000 n=10) geomean 84.04n 94.60n +12.56% │ 1 │ 1000 │ │ keep% │ keep% vs base │ Sampler/keySpaceSize=2048/interval=1s-10 16.38m ± 3% 19.41m ± 1% +18.50% (p=0.000 n=10) Sampler/keySpaceSize=8192/interval=1s-10 780.60m ± 6% 73.97m ± 1% -90.52% (p=0.000 n=10) geomean 113.1m 37.89m -66.49% │ 1 │ 1000 │ │ B/op │ B/op vs base │ Sampler/keySpaceSize=2048/interval=1s-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Sampler/keySpaceSize=8192/interval=1s-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ 1 │ 1000 │ │ allocs/op │ allocs/op vs base │ Sampler/keySpaceSize=2048/interval=1s-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Sampler/keySpaceSize=8192/interval=1s-10 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```

…rces

…sting)

Implementation of the new API Security sampler defined by the [latest RFC](https://docs.google.com/document/d/1PYoHms9PPXR8V_5_T5-KXAhoFDKQYA8mTnmS12xkGOE/edit?tab=t.0). This change makes the API Security sampler make decisions specific to a given endpoint (method + route + response status code) instead of using a simplistic sampling rate. This allows for improved coverage and accuracy of schema extraction as part of API Security. This change uses the sampler implementation from github.com/DataDog/appsec-internal-go#39.

Julio-Guerra · 2025-03-19T21:51:04Z

+			// We're already holding the maximium number of items, so we will rebuild
+			// in order to perform an eviction pass. Updates made in the meantime will
+			// be lost.
+			go m.rebuild(table, threshold)


are we sure about this unbounded goroutine creation?

This can only happen if m.rebuilding.CompareAndSwap(false, true) swaps... so there can be only 1 of these at a time.

Julio-Guerra · 2025-03-19T21:52:51Z

+	})
+}
+
+func BenchmarkSampler(b *testing.B) {


can we add it to the benchmarking platform?

We could... Would have to check how it's done...

IIRC, you just need to add the benchmark name in the config file at the root of the repo (gitlab file IIRC)

In dd-trace-go yes.... but in other repo we have to create a whole new pipeline in the multiple benchmarking-platform repository full of shell script...

If you really feel like it there is an attempt at doing this for go-libddwaf as a branch in there

eliottness · 2025-03-28T13:28:12Z

+	for {
+		var exists bool
+		entry, exists = table.FindEntry(key)
+		if exists {
+			// The entry already exists, so we can proceed...
+			break
+		}
+
+		// We're adding a new entry to the table, so we need to:
+		// 1. Ensure we have capacity (possibly trigger an eviction rebuild)
+		// 2. Claim the slot (or look for another slot if it's already claimed)
+		newCount := table.count.Add(1)
+		if newCount > config.MaxItemCount && m.rebuilding.CompareAndSwap(false, true) {
+			// We're already holding the maximium number of items, so we will rebuild
+			// in order to perform an eviction pass. Updates made in the meantime will
+			// be lost.
+			go m.rebuild(table, threshold)
+		}
+		if newCount > capacity {
+			// We don't have space to add any new item, so we'll ignore this and
+			// decide to DROP it (we may otherwise cause a surge of inconditional
+			// keep decisions, that is not desirable). This only happens in the most
+			// dire of circumstances (a table rebuild did not complete fast enough
+			// to make up free space).
+			table.count.Add(-1)
+			return false
+		}
+
+		if entry.Key.CompareAndSwap(0, key) {
+			// We have successfully claimed the slot, so now we can proceed to set it
+			// up. If we fail to swap, another goroutine has sampled this slot just
+			// before this one, so we can DROP the sample.
+			return entry.Data.CompareAndSwap(0, newEntryData(now, now))
+		}
+
+		if entry.Key.Load() == key {
+			// Another goroutine has already used this slot for this key, and very
+			// little time has passed since then, so we can DROP this sample... This
+			// is extremely unlikely to happen (and nearly impossible to reliably
+			// cover in unit tests).
+			return false
+		}
+
+		// Another goroutine has already used this slot for another key... We
+		// will try to find another slot then...
+		table.count.Add(-1)
+	}


This for loop can definitively be made into a "LoadOrStore" sub function or equivalent

I tried twice and it made things worse from a "following what's going on" standpoint...

eliottness · 2025-03-28T14:36:02Z

+	})
+}
+
+func BenchmarkSampler(b *testing.B) {


In dd-trace-go yes.... but in other repo we have to create a whole new pipeline in the multiple benchmarking-platform repository full of shell script...

If you really feel like it there is an attempt at doing this for go-libddwaf as a branch in there

Co-authored-by: Eliott Bouhana <47679741+eliottness@users.noreply.github.com>

Implementation of the new API Security sampler defined by the [latest RFC](https://docs.google.com/document/d/1PYoHms9PPXR8V_5_T5-KXAhoFDKQYA8mTnmS12xkGOE/edit?tab=t.0). This change makes the API Security sampler make decisions specific to a given endpoint (method + route + response status code) instead of using a simplistic sampling rate. This allows for improved coverage and accuracy of schema extraction as part of API Security. This change uses the sampler implementation from DataDog/appsec-internal-go#39.

RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch 2 times, most recently from 37e0277 to 981e1b3 Compare February 26, 2025 08:54

RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch from 981e1b3 to 48f9926 Compare March 5, 2025 13:07

datadog-datadog-prod-us1 Bot reviewed Mar 6, 2025

View reviewed changes

Comment thread apisec/internal/timed/set.go

switch to open addressing hash table, as it consumes a lot less resou…

64eeac0

…rces

RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch from 6354163 to 64eeac0 Compare March 6, 2025 18:15

Host sampler in APISec config, add option to override it (e.g, for te…

cf70774

…sting)

RomainMuller marked this pull request as ready for review March 19, 2025 17:07

RomainMuller requested a review from a team as a code owner March 19, 2025 17:07

RomainMuller mentioned this pull request Mar 19, 2025

feat(apisec): implement new API Security sampler DataDog/dd-trace-go#3315

Merged

7 tasks

Julio-Guerra reviewed Mar 19, 2025

View reviewed changes

eliottness reviewed Mar 28, 2025

View reviewed changes

RomainMuller and others added 2 commits March 31, 2025 10:02

Documentation adjustments

33bf2ea

Co-authored-by: Eliott Bouhana <47679741+eliottness@users.noreply.github.com>

curData.Kept()

9ff0f53

RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch from 392cd58 to 9ff0f53 Compare March 31, 2025 08:44

RomainMuller added 6 commits March 31, 2025 10:53

Clarify some commentary

ccb72cf

Clarify loop conditions

acb5697

Simplify sort for eviction

bc6da63

Make sampler test more specific on keep rates

6ddc5d4

Improve Set documentation a bit

0b8d34d

Rename Set to LRU

b94c22d

eliottness approved these changes Apr 2, 2025

View reviewed changes

RomainMuller merged commit 5688dad into main Apr 2, 2025
41 checks passed

RomainMuller deleted the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch April 2, 2025 09:01

campaigner-prod Bot mentioned this pull request Apr 23, 2026

fix(deps): vuln major upgrades — 23 packages (major: 1 · unstable: 1 · minor: 21) [exp/khaudit] DataDog/KubeHound#419

Closed

6 tasks

Conversation

RomainMuller commented Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RomainMuller commented Feb 25, 2025 •

edited

Loading