Skip to content
This repository was archived by the owner on Apr 3, 2026. It is now read-only.

apisec: new sampler implementation#39

Merged
RomainMuller merged 11 commits intomainfrom
romain.marcadier/APPSEC-56547/api-sec-sampling-v2
Apr 2, 2025
Merged

apisec: new sampler implementation#39
RomainMuller merged 11 commits intomainfrom
romain.marcadier/APPSEC-56547/api-sec-sampling-v2

Conversation

@RomainMuller
Copy link
Copy Markdown
Contributor

@RomainMuller RomainMuller commented Feb 25, 2025

Implementation for a new API Security sampler that aimes at providing
better information as the sampling decision is made per endpoint (method
plus route) and response status code; instead of being made across all.

This implementation is based on an open addressing hash table using
copy-update-replace semantics, so that it is able to operate completely
lock-less. This is described by RFC-1021.

Here is the result of running the benchmark on my machine with
1,000,000,000 operations distributed over concurrency * 10 goroutines
for each configuration:

goos: darwin
goarch: arm64
pkg: github.com/DataDog/appsec-internal-go/apisec
cpu: Apple M1 Max
                                         │      1       │                 1000                  │
                                         │    sec/op    │    sec/op      vs base                │
Sampler/keySpaceSize=2048/interval=1s-10   76.27n ±  2%    88.40n ±  1%  +15.91% (p=0.000 n=10)
Sampler/keySpaceSize=8192/interval=1s-10   77.77n ± 26%   119.85n ± 25%  +54.11% (p=0.000 n=10)
geomean                                    77.01n          102.9n        +33.65%

                                         │       1       │                 1000                 │
                                         │     keep%     │    keep%      vs base                │
Sampler/keySpaceSize=2048/interval=1s-10    15.77m ±  1%   18.23m ±  3%  +15.60% (p=0.000 n=10)
Sampler/keySpaceSize=8192/interval=1s-10   1314.5m ± 26%   204.4m ± 38%  -84.45% (p=0.000 n=10)
geomean                                     144.0m         61.04m        -57.60%

                                         │      1       │                  1000                   │
                                         │     B/op     │    B/op     vs base                     │
Sampler/keySpaceSize=2048/interval=1s-10   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
Sampler/keySpaceSize=8192/interval=1s-10   1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
geomean                                               ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                         │      1       │                1000                 │
                                         │  allocs/op   │ allocs/op   vs base                 │
Sampler/keySpaceSize=2048/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Sampler/keySpaceSize=8192/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@RomainMuller RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch 2 times, most recently from 37e0277 to 981e1b3 Compare February 26, 2025 08:54
Implementation for a new API Security sampler that aimes at providing
better information as the sampling decision is made per endpoint (method
plus route) and response status code; instead of being made across all.

This implementation is based on a ccache-style bucketed LRU cache that
allows distributing lock contention over several buckets, 16 by default,
which can optionally be scaled to achieve better concurrency.

Here is the result of running the benchmark on my machine with
`1,000,000,000` operations distribted over `concurrency * 10` goroutines
for each configuration:

```
goos: darwin
goarch: arm64
pkg: github.com/DataDog/appsec-internal-go/apisec
cpu: Apple M1 Max
                                         │      1      │                1000                 │
                                         │   sec/op    │   sec/op     vs base                │
Sampler/keySpaceSize=2048/interval=1s-10   79.74n ± 3%   94.39n ± 0%  +18.37% (p=0.000 n=10)
Sampler/keySpaceSize=8192/interval=1s-10   88.58n ± 1%   94.81n ± 0%   +7.03% (p=0.000 n=10)
geomean                                    84.04n        94.60n       +12.56%

                                         │      1       │                1000                 │
                                         │    keep%     │    keep%     vs base                │
Sampler/keySpaceSize=2048/interval=1s-10    16.38m ± 3%   19.41m ± 1%  +18.50% (p=0.000 n=10)
Sampler/keySpaceSize=8192/interval=1s-10   780.60m ± 6%   73.97m ± 1%  -90.52% (p=0.000 n=10)
geomean                                     113.1m        37.89m       -66.49%

                                         │      1       │                1000                 │
                                         │     B/op     │    B/op     vs base                 │
Sampler/keySpaceSize=2048/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Sampler/keySpaceSize=8192/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                         │      1       │                1000                 │
                                         │  allocs/op   │ allocs/op   vs base                 │
Sampler/keySpaceSize=2048/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Sampler/keySpaceSize=8192/interval=1s-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
@RomainMuller RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch from 981e1b3 to 48f9926 Compare March 5, 2025 13:07
Comment thread apisec/internal/timed/set.go
@RomainMuller RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch from 6354163 to 64eeac0 Compare March 6, 2025 18:15
@RomainMuller RomainMuller marked this pull request as ready for review March 19, 2025 17:07
@RomainMuller RomainMuller requested a review from a team as a code owner March 19, 2025 17:07
RomainMuller added a commit to DataDog/dd-trace-go that referenced this pull request Mar 19, 2025
Implementation of the new API Security sampler defined by the
[latest RFC](https://docs.google.com/document/d/1PYoHms9PPXR8V_5_T5-KXAhoFDKQYA8mTnmS12xkGOE/edit?tab=t.0).

This change makes the API Security sampler make decisions specific to a
given endpoint (method + route + response status code) instead of using
a simplistic sampling rate. This allows for improved coverage and
accuracy of schema extraction as part of API Security.

This change uses the sampler implementation from
github.com/DataDog/appsec-internal-go#39.
// We're already holding the maximium number of items, so we will rebuild
// in order to perform an eviction pass. Updates made in the meantime will
// be lost.
go m.rebuild(table, threshold)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure about this unbounded goroutine creation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can only happen if m.rebuilding.CompareAndSwap(false, true) swaps... so there can be only 1 of these at a time.

Comment thread apisec/sampler_test.go
})
}

func BenchmarkSampler(b *testing.B) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add it to the benchmarking platform?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could... Would have to check how it's done...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, you just need to add the benchmark name in the config file at the root of the repo (gitlab file IIRC)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dd-trace-go yes.... but in other repo we have to create a whole new pipeline in the multiple benchmarking-platform repository full of shell script...

If you really feel like it there is an attempt at doing this for go-libddwaf as a branch in there

Comment thread apisec/internal/timed/set.go Outdated
Comment thread apisec/internal/timed/set.go
Comment on lines +104 to +150
for {
var exists bool
entry, exists = table.FindEntry(key)
if exists {
// The entry already exists, so we can proceed...
break
}

// We're adding a new entry to the table, so we need to:
// 1. Ensure we have capacity (possibly trigger an eviction rebuild)
// 2. Claim the slot (or look for another slot if it's already claimed)
newCount := table.count.Add(1)
if newCount > config.MaxItemCount && m.rebuilding.CompareAndSwap(false, true) {
// We're already holding the maximium number of items, so we will rebuild
// in order to perform an eviction pass. Updates made in the meantime will
// be lost.
go m.rebuild(table, threshold)
}
if newCount > capacity {
// We don't have space to add any new item, so we'll ignore this and
// decide to DROP it (we may otherwise cause a surge of inconditional
// keep decisions, that is not desirable). This only happens in the most
// dire of circumstances (a table rebuild did not complete fast enough
// to make up free space).
table.count.Add(-1)
return false
}

if entry.Key.CompareAndSwap(0, key) {
// We have successfully claimed the slot, so now we can proceed to set it
// up. If we fail to swap, another goroutine has sampled this slot just
// before this one, so we can DROP the sample.
return entry.Data.CompareAndSwap(0, newEntryData(now, now))
}

if entry.Key.Load() == key {
// Another goroutine has already used this slot for this key, and very
// little time has passed since then, so we can DROP this sample... This
// is extremely unlikely to happen (and nearly impossible to reliably
// cover in unit tests).
return false
}

// Another goroutine has already used this slot for another key... We
// will try to find another slot then...
table.count.Add(-1)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for loop can definitively be made into a "LoadOrStore" sub function or equivalent

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried twice and it made things worse from a "following what's going on" standpoint...

Comment thread apisec/internal/timed/set.go Outdated
Comment thread apisec/internal/timed/set.go Outdated
Comment thread apisec/internal/timed/set.go Outdated
Comment thread apisec/internal/timed/set.go Outdated
Comment thread apisec/internal/timed/set.go
Comment thread apisec/internal/timed/table.go Outdated
Comment thread apisec/internal/timed/table.go Outdated
Comment thread apisec/internal/timed/table.go Outdated
Comment thread apisec/sampler_test.go
})
}

func BenchmarkSampler(b *testing.B) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dd-trace-go yes.... but in other repo we have to create a whole new pipeline in the multiple benchmarking-platform repository full of shell script...

If you really feel like it there is an attempt at doing this for go-libddwaf as a branch in there

Comment thread apisec/sampler_test.go Outdated
RomainMuller and others added 2 commits March 31, 2025 10:02
Co-authored-by: Eliott Bouhana <47679741+eliottness@users.noreply.github.com>
@RomainMuller RomainMuller force-pushed the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch from 392cd58 to 9ff0f53 Compare March 31, 2025 08:44
@RomainMuller RomainMuller merged commit 5688dad into main Apr 2, 2025
41 checks passed
@RomainMuller RomainMuller deleted the romain.marcadier/APPSEC-56547/api-sec-sampling-v2 branch April 2, 2025 09:01
RomainMuller added a commit to DataDog/dd-trace-go that referenced this pull request Apr 3, 2025
Implementation of the new API Security sampler defined by the [latest RFC](https://docs.google.com/document/d/1PYoHms9PPXR8V_5_T5-KXAhoFDKQYA8mTnmS12xkGOE/edit?tab=t.0).

This change makes the API Security sampler make decisions specific to a given endpoint (method + route + response status code) instead of using a simplistic sampling rate. This allows for improved coverage and accuracy of schema extraction as part of API Security.

This change uses the sampler implementation from DataDog/appsec-internal-go#39.
darccio pushed a commit to DataDog/dd-trace-go that referenced this pull request Apr 10, 2025
Implementation of the new API Security sampler defined by the [latest RFC](https://docs.google.com/document/d/1PYoHms9PPXR8V_5_T5-KXAhoFDKQYA8mTnmS12xkGOE/edit?tab=t.0).

This change makes the API Security sampler make decisions specific to a given endpoint (method + route + response status code) instead of using a simplistic sampling rate. This allows for improved coverage and accuracy of schema extraction as part of API Security.

This change uses the sampler implementation from DataDog/appsec-internal-go#39.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants