Add BPF Batch Ops Methods #207

nathanjsweet · 2021-01-29T22:42:56Z

As of kernel v5.6 batch methods allow for the
fast lookup, deletion, and updating of bpf maps
so that the syscall overhead (repeatedly calling
into any of these methods) can be avoided.

Add support for BatchUpdate, BatchLookup, BatchLookupAndDelete, and BatchDelete
as well as tests for all of the above.

Signed-off-by: Nate Sweet [email protected]

lmb

Thank you for your PR, it's cool to have you contributing again :) I think it makes sense to expose the low level batch primitives. I'm a bit worried that Map is getting bigger and bigger, but NextKey, etc. is already there.

Looking at the PR I realised that the current situation with unmarshalBytes, unmarshalMap, marshalPtr is pretty confusing. I wanted to suggest folding some of the []*Map checks into unmarshalBytes but now I'm not so sure how that would work. I'll try to clean up that code and get back to you. Take a look at my comments on the test, maybe we can get rid of the new map / program unmarshaling in the first place.


ti-mo · 2021-02-11T13:46:08Z

Hi Nate, I realize I'm a little late to the party, sorry about that. I have some gripes with the API proposed here, but that should not necessarily be a blocker to merge this as long as we don't slap a version number on it.

- It feels more like C than Go (taking mostly references as function arguments, etc.)
- interface{} everywhere and generous use of reflect
- It expects the caller to allocate everything (I know this was likely done to avoid allocs)
- Function documentation is rather sparse and there are no examples provided at all

I would suggest a MapIterator-like approach here, although I would also like to propose some changes to MapIterator to reduce its reliance on interface{}.

Let's call it.. MapBatchIterator? Instead of Next() returning a slice of results, it pulls the results into an internal preallocated scratch buffer to avoid any iterator-related allocations. After a successful pull, the caller invokes Batch() or Results() or something similar, which returns the result set. The efficiency gains here may be non-existent, but it's meant to be consistent with what I had in mind for MapIterator, where this approach could eliminate any use of interface{} for primitive k/v types.

Iterator pagination state is kept internally, and must be reset using a Reset() method to restart the iteration, allowing them to be re-used. Iterators are not thread-safe, that would be up to the caller to implement.
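To make the proposal above concrete, here is a minimal sketch of what such an iterator could look like. Everything in it is hypothetical: `batchSource`, `sliceSource`, `errDone`, and the `MapBatchIterator` methods are illustrative names and not part of the library, and the in-memory source stands in for the real batch syscall.

```go
package main

import (
	"errors"
	"fmt"
)

// errDone is a hypothetical sentinel meaning "no more entries".
var errDone = errors.New("iteration finished")

// batchSource is a hypothetical stand-in for a map with batch lookups.
type batchSource interface {
	lookupBatch(cursor int, keys, values []uint32) (n, next int, err error)
}

// sliceSource serves entries from memory instead of from the kernel.
type sliceSource struct{ keys, values []uint32 }

func (s sliceSource) lookupBatch(cursor int, keys, values []uint32) (int, int, error) {
	if cursor >= len(s.keys) {
		return 0, cursor, errDone
	}
	n := copy(keys, s.keys[cursor:])
	copy(values[:n], s.values[cursor:cursor+n])
	return n, cursor + n, nil
}

// MapBatchIterator pulls each batch into preallocated scratch buffers.
// It is not safe for concurrent use.
type MapBatchIterator struct {
	src          batchSource
	cursor, n    int
	keys, values []uint32
	err          error
}

func NewMapBatchIterator(src batchSource, batchSize int) *MapBatchIterator {
	return &MapBatchIterator{
		src:    src,
		keys:   make([]uint32, batchSize),
		values: make([]uint32, batchSize),
	}
}

// Next pulls the next batch. It returns false once the source is
// exhausted or an error occurred; check Err to tell the two apart.
func (it *MapBatchIterator) Next() bool {
	if it.err != nil {
		return false
	}
	n, next, err := it.src.lookupBatch(it.cursor, it.keys, it.values)
	if err != nil {
		it.n = 0
		if !errors.Is(err, errDone) {
			it.err = err
		}
		return false
	}
	it.n, it.cursor = n, next
	return true
}

// Batch returns views into the scratch buffers. They are only valid
// until the next call to Next.
func (it *MapBatchIterator) Batch() (keys, values []uint32) {
	return it.keys[:it.n], it.values[:it.n]
}

// Err returns the first non-sentinel error seen by Next.
func (it *MapBatchIterator) Err() error { return it.err }

// Reset restarts the iteration so the iterator can be re-used.
func (it *MapBatchIterator) Reset() { it.cursor, it.n, it.err = 0, 0, nil }

func main() {
	src := sliceSource{
		keys:   []uint32{0, 1, 2, 3, 4},
		values: []uint32{10, 11, 12, 13, 14},
	}
	it := NewMapBatchIterator(src, 2)
	for it.Next() {
		k, v := it.Batch()
		fmt.Println(k, v)
	}
	if err := it.Err(); err != nil {
		fmt.Println("error:", err)
	}
}
```

Because `Batch` hands out slices of the internal buffers, a caller that wants to keep results across calls to `Next` has to copy them; that is the price of avoiding per-batch allocations.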

I know this omits a lot of the details like how we would design typing for arbitrary key/value types, but I think that's a separate discussion we should have for the lib in general.

WDYT?

lmb · 2021-02-11T17:57:42Z

You're right, it's not a very nice API. For better or worse I think it's useful to have it though: there is always something lost in translation when we build a higher-level interface, some use case that we can't foresee. This kind of bare bones API is a pressure release valve for such situations: users can just drop down to the ugly interface and aren't blocked. Map.Update and Map.NextKey are other examples of such an API. Their downside is that the ugly interfaces make it harder to find and use the nicer ones. I don't have a good solution for this except moving low level stuff into a separate package. That seems like a lot of boilerplate though.

So my take is: we should merge this once we've got the array vs hashmap ErrNotExist thing figured out.

> I would suggest a MapIterator-like approach here, although I would also like to propose some changes to MapIterator to reduce its reliance on interface{}.
> Let's call it.. MapBatchIterator? Instead of Next() returning a slice of results, it pulls the results into an internal preallocated scratch buffer to avoid any iterator-related allocations.

I kind of figured that we could make MapIterator use batch lookups behind the scenes, if they are available. So Next() still operates on single elements but in the background we only issue one batch lookup every X calls to Next.

Is this something you were considering @nathanjsweet?
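A rough sketch of that idea, under stated assumptions: `fetchFunc`, `bufferedIterator`, and `countingFetch` are hypothetical names, and the in-memory fetch function stands in for a real batch lookup. `Next` still yields one element per call, but only refills its buffer once the previous batch is consumed.

```go
package main

import "fmt"

// fetchFunc is a hypothetical batch primitive: it fills keys and values
// starting at offset and returns how many entries it wrote (0 = exhausted).
type fetchFunc func(offset int, keys, values []uint32) int

// bufferedIterator hands out single elements from an internal batch buffer.
type bufferedIterator struct {
	fetch        fetchFunc
	keys, values []uint32
	pos, n       int // read position and number of valid entries
	offset       int // how many entries have been fetched so far
	done         bool
}

func newBufferedIterator(fetch fetchFunc, batchSize int) *bufferedIterator {
	return &bufferedIterator{
		fetch:  fetch,
		keys:   make([]uint32, batchSize),
		values: make([]uint32, batchSize),
	}
}

// Next copies the next element into key and value. It issues a batch
// fetch only when the buffer has been fully consumed.
func (it *bufferedIterator) Next(key, value *uint32) bool {
	if it.pos == it.n {
		if it.done {
			return false
		}
		it.n = it.fetch(it.offset, it.keys, it.values)
		it.pos = 0
		it.offset += it.n
		if it.n == 0 {
			it.done = true
			return false
		}
	}
	*key, *value = it.keys[it.pos], it.values[it.pos]
	it.pos++
	return true
}

// countingFetch serves data from memory and counts batch calls, so the
// amortization is visible.
func countingFetch(data []uint32) (fetchFunc, *int) {
	calls := new(int)
	return func(offset int, keys, values []uint32) int {
		*calls++
		n := 0
		for n < len(keys) && offset+n < len(data) {
			keys[n] = uint32(offset + n)
			values[n] = data[offset+n]
			n++
		}
		return n
	}, calls
}

func main() {
	fetch, calls := countingFetch([]uint32{7, 8, 9, 10, 11})
	it := newBufferedIterator(fetch, 4)
	var k, v uint32
	for it.Next(&k, &v) {
		fmt.Println(k, v)
	}
	fmt.Println("batch calls:", *calls)
}
```

Whether this is feasible for the real iterator depends on the batch API reporting the end of the map in a way the iterator can detect, which is the point discussed further down the thread.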

> I know this omits a lot of the details like how we would design typing for arbitrary key/value types, but I think that's a separate discussion we should have for the lib in general.

Can you create an issue and describe your ideas for MapIterator and/or key value typing? I agonized about interface{} in the beginning, but in practice I can't find much wrong with it. Also keep in mind that Go generics are on the horizon (another year?) which will surely make this a lot nicer.

nathanjsweet · 2021-02-11T19:11:08Z

> Hi Nate, I realize I'm a little late to the party, sorry about that. I have some gripes with the API proposed here, but that should not necessarily be a blocker to merge this as long as we don't slap a version number on it.
>
> - It feels more like C than Go (taking mostly references as function arguments, etc.)
> - interface{} everywhere and generous use of reflect
> - It expects the caller to allocate everything (I know this was likely done to avoid allocs)
> - Function documentation is rather sparse and there are no examples provided at all

I share your concerns @ti-mo. I'm not wild about it either. All I can say is that I'm excited for generics to land. My only real defense is that I tried to be generous in the error messaging. Also, I wouldn't say there are no examples. The tests offer some decent examples for the 3 different map types batch supports.

> I would suggest a MapIterator-like approach here, although I would also like to propose some changes to MapIterator to reduce its reliance on interface{}.
>
> Let's call it.. MapBatchIterator? Instead of Next() returning a slice of results, it pulls the results into an internal preallocated scratch buffer to avoid any iterator-related allocations. After a successful pull, the caller invokes Batch() or Results() or something similar, which returns the result set. The efficiency gains here may be non-existent, but it's meant to be consistent with what I had in mind for MapIterator, where this approach could eliminate any use of interface{} for primitive k/v types.
>
> Iterator pagination state is kept internally, and must be reset using a Reset() method to restart the iteration, allowing them to be re-used. Iterators are not thread-safe, that would be up to the caller to implement.

I'm not opposed to this at all. It's a good idea, but I really don't like obfuscating basic operations from the users of this library. There are just too many variables with using the batch operations that prevent us from abstracting them in a way that would be satisfactory to everyone. Philosophically, I don't think we should be too scared about lots of methods piling up. There are lots of methods/operations in eBPF. I do think we can go beyond the harshness of libbpf, but I never want to hide basic functionality.

> I know this omits a lot of the details like how we would design typing for arbitrary key/value types, but I think that's a separate discussion we should have for the lib in general.

We do need to have a better design philosophy that we can spell out in the repository. It's getting big enough that we should have a document we can all point to for justifying an approach. Maybe we could do a megathread on slack or have a zoom meeting.

lmb

I made the following change to the Array test:

diff --git a/map_test.go b/map_test.go
index 1fb5732..6abcedc 100644
--- a/map_test.go
+++ b/map_test.go
@@ -168,7 +168,7 @@ func TestBatchAPIArray(t *testing.T) {
 		Type:       Array,
 		KeySize:    4,
 		ValueSize:  4,
-		MaxEntries: 10,
+		MaxEntries: 2,
 	})
 	if err != nil {
 		t.Fatal(err)
@@ -218,6 +218,28 @@ func TestBatchAPIArray(t *testing.T) {
 		t.Errorf("BatchUpdate and BatchLookup values disagree: %v %v", values, lookupValues)
 	}
 
+	count, err = m.BatchLookup(uint32(1), &nextKey, lookupKeys, lookupValues, nil)
+	if !errors.Is(err, ErrKeyNotExist) {
+		t.Error("Expected ErrKeyNotExist when batch runs into end of array, got", err)
+	}
+	if count != 1 {
+		t.Error("Expected a single result, got", count)
+	}
+	if lookupKeys[0] != 1 {
+		t.Error("Expected first key to be 1, got", lookupKeys[0])
+	}
+	if lookupValues[0] != 4242 {
+		t.Error("Expected first value to be 4242, got", lookupValues[0])
+	}
+
+	count, err = m.BatchLookup(uint32(2), &nextKey, lookupKeys, lookupValues, nil)
+	if !errors.Is(err, ErrKeyNotExist) {
+		t.Error("Expected ErrKeyNotExist, got", err)
+	}
+	if count != 0 {
+		t.Error("Expected no result, got", count)
+	}
+
 	_, err = m.BatchLookupAndDelete(nil, &nextKey, deleteKeys, deleteValues, nil)
 	if !errors.Is(err, ErrBatchOpNotSup) {
 		t.Fatalf("BatchLookUpDelete: expected error %v, but got %v", ErrBatchOpNotSup, err)

This is roughly how I would expect the API to behave based on our conversation. Here is what I get:

=== RUN   TestBatchAPIArray
    /home/lorenz/dev/ebpf/map_test.go:223: Expected ErrKeyNotExist when batch runs into end of array, got <nil>
    /home/lorenz/dev/ebpf/map_test.go:226: Expected a single result, got 0
    /home/lorenz/dev/ebpf/map_test.go:229: Expected first key to be 1, got 0
    /home/lorenz/dev/ebpf/map_test.go:232: Expected first value to be 4242, got 0
    /home/lorenz/dev/ebpf/map_test.go:237: Expected ErrKeyNotExist, got <nil>
    /home/lorenz/dev/ebpf/map_test.go:240: Expected no result, got 2

PTAL.


nathanjsweet · 2021-02-17T23:11:58Z

> This is roughly how I would expect the API to behave based on our conversation. Here is what I get:
>
> === RUN   TestBatchAPIArray
>     /home/lorenz/dev/ebpf/map_test.go:223: Expected ErrKeyNotExist when batch runs into end of array, got <nil>
>     /home/lorenz/dev/ebpf/map_test.go:226: Expected a single result, got 0
>     /home/lorenz/dev/ebpf/map_test.go:229: Expected first key to be 1, got 0
>     /home/lorenz/dev/ebpf/map_test.go:232: Expected first value to be 4242, got 0
>     /home/lorenz/dev/ebpf/map_test.go:237: Expected ErrKeyNotExist, got <nil>
>     /home/lorenz/dev/ebpf/map_test.go:240: Expected no result, got 2

There's a couple of things missing from your assumptions.

- startKey is excluded from the batch processing (i.e. it starts with the key after startKey).
- The kernel returns ENOENT to indicate the end of the list even if a successful partial result set is returned. I decided against returning the error because it is not typical in golang to return an error even if everything went fine.

A possible workaround is that we can add a done return value that is "true" when the batch operation has reached the end of the map or array. Otherwise, I think the way the library is working maps to libbpf's behavior pretty cleanly.

lmb

> startKey is excluded from the batch processing (i.e. it starts with the key after startKey).

Ah ok, that is quite confusing. Can you mention that in the docs? Maybe rename startKey -> prevKey?

> The kernel returns ENOENT to indicate the end of the list even if a successful partial result set is returned. I decided against returning the error because it is not typical in golang to return an error even if everything went fine.

I think the API is already plenty weird, so returning ErrKeyNotExist (or some other sentinel) doesn't feel too onerous ;) It's important to be able to use the API in a generic fashion, for example as it stands it can't be used to optimize MapIterator.

> A possible workaround is that we can add a done return value that is "true" when the batch operation has reached the end of the map or array.

That's what the lookup APIs were originally: Lookup(key, value) (bool, error). It turns out this is actually really cumbersome to use in the common case where we want to treat an absent key as an error:

// Old style: Lookup(key, value) (bool, error)
var value uint32
if ok, err := m.Lookup(key, &value); err != nil {
    return fmt.Errorf("bla: %s", err)
} else if !ok {
    return fmt.Errorf("doesn't exist")
}

// vs

// Current style: an absent key is reported through err
var value uint32
if err := m.Lookup(key, &value); err != nil {
    return fmt.Errorf("bla: %s", err) // NB: err already contains a stringification of key here for free
}

lmb · 2021-02-18T17:08:36Z

syscalls.go

@@ -345,6 +377,10 @@ func wrapMapError(err error) error {
 		return ErrKeyExist
 	}

+	if errors.Is(err, unix.ENOTSUPP) {
+		return ErrBatchOpNotSup


I didn't see this before; it is going to trigger false positives if some other non-batch map command returns ENOTSUPP. Why not just return ErrNotSupported in this case? Seems like you have a specific use case for ErrBatchOpNotSup in mind?

nathanjsweet · 2021-02-18

I did when we were doing this by map type, but now that we're not, your suggestion is correct.


nathanjsweet · 2021-02-22T18:15:36Z

@lmb I don't understand why the push test didn't work, but the PR one did.

As of kernel v5.6 batch methods allow for the fast lookup, deletion, and updating of bpf maps so that the syscall overhead (repeatedly calling into any of these methods) can be avoided.

The batch methods are as follows:

* BatchUpdate
* BatchLookup
* BatchLookupAndDelete
* BatchDelete

Only the "array" and "hash" types currently support batch operations, and the "array" type does not support batch deletion.

Tests are in place to test every scenario, and helper functions have been written to catch errors that the kernel would normally return, in order to be helpful to users of the library.

Signed-off-by: Nate Sweet <[email protected]>

lmb · 2021-02-23T10:39:50Z

@nathanjsweet I pushed three small clean up commits, if you agree with them feel free to squash + merge.

As a follow up to cilium#207, add support for PerCPU Hash and Array maps to the following methods:

- BatchLookup()
- BatchLookupAndDelete()
- BatchUpdate()
- BatchDelete()

This provides a significant performance improvement by amortizing the overhead of the underlying syscall.

In this change, the API contract for the batches is a flat slice of values []T:

    batch0cpu0,batch0cpu1,..batch0cpuN,batch1cpu0...batchNcpuN

In order to avoid confusion and panics for users, the library is strict about the expected lengths of slices passed to these methods, rather than padding slices with zeros or writing partial results.

An alternative design that was considered was [][]T:

    batch0{cpu0,cpu1,..cpuN},batch1{...},..batchN{...}

[]T was partly chosen as it matches the underlying semantics of the syscall, although without correctly aligned data it cannot be a zero copy pass through.

Caveats:

* Array maps of any type do not support batch delete.
* Batched ops support for PerCPU Array Maps was only added in 5.13: https://lore.kernel.org/bpf/[email protected]/

Signed-off-by: Alun Evans <[email protected]>
Co-developed-by: Lorenz Bauer <[email protected]>

lmb requested changes Feb 2, 2021

nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch 2 times, most recently from 5adb79a to 1ed71e7 Compare February 4, 2021 04:41

nathanjsweet marked this pull request as ready for review February 4, 2021 04:44

nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch 2 times, most recently from 0edde42 to e979c3c Compare February 4, 2021 18:21

lmb requested changes Feb 5, 2021

nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch from e979c3c to aeb1ca1 Compare February 9, 2021 15:27

lmb mentioned this pull request Feb 12, 2021

add an ARCHITECTURE.md #215

Merged

lmb requested changes Feb 16, 2021

nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch from aeb1ca1 to 97bb871 Compare February 17, 2021 23:04

lmb requested changes Feb 18, 2021

nathanjsweet force-pushed the pr/nathanjsweet/batch-map-ops branch 2 times, most recently from 682ad4e to dadfc3e Compare February 22, 2021 17:49

nathanjsweet and others added 3 commits February 23, 2021 10:31

map: remove mention of startKey from batch methods

847ab24

startKey is now called prevKey, remove mentions of the former.

map: remove stray fmt.Println

29d1a38

lmb force-pushed the pr/nathanjsweet/batch-map-ops branch from dadfc3e to b5f3b14 Compare February 23, 2021 10:39

lmb approved these changes Feb 23, 2021

map: remove left over per-CPU marshaler changes

95b312f

lmb force-pushed the pr/nathanjsweet/batch-map-ops branch from b5f3b14 to 95b312f Compare February 23, 2021 10:40

nathanjsweet merged commit 0ad1835 into master Feb 23, 2021

nathanjsweet deleted the pr/nathanjsweet/batch-map-ops branch February 23, 2021 17:36

kolyshkin mentioned this pull request May 24, 2021

cgroup2: devices filtering cleanup opencontainers/runc#2951

Merged
