Border router restructure #4351

rohrerj · 2023-06-06T16:03:43Z

This pullrequest is the first out of 3 with the goal to restructure the border router as described in the Design Document.

This pullrequest contains the restructuring of the router/dataplane.go to use receiver, processing routines and forwarders instead of doing this all in a per interface goroutine and the buffer reuse support.

This change is

Restructure the border router into receiver, processing routines and forwarders.

matzf · 2023-06-07T08:24:39Z

Great stuff, thanks @rohrerj. I'll take a closer look at this tomorrow.

matzf

Reviewed 1 of 5 files at r1, all commit messages.
Reviewable status: 1 of 5 files reviewed, 25 unresolved discussions (waiting on @rohrerj)

a discussion (no related file):
Structure: the main run loop is spread across too many places, too many methods access fields of the Dataplane struct. This "global" state makes it harder than necessary to see that the main loops are actually quite simple. The configuration options (numProcRoutines etc.) are also fields in the Dataplane, which is a too wide scope for these.
Here's what I would prefer:

the Run function takes configuration values as parameters (e.g. in a struct RunConfig{BatchSize int, NumProcessorRoutines int} or something like that). This will make it easier to make these configurable later.
the main setup is contained in Run. All variables are defined here and passed / returned from free-standing helper functions; use methods only where it's too awkward to pass the data explicitly
use helpers to simplify the main loops, so that the important bits stand out, namely the way the I/O and processing calls and the moving of the buffers through the queues. The rest should

a discussion (no related file):
Currently packets are passed through the channels as pointer, *packet. Why not pass them by value, packet instead? It's only a little bit wider than a pointer to it and I would expect that this does not affect the performance of the channel.

This would allow to have the pool only take care of the []byte buffers, not the packets. The rest of the information is all passed by value and does not have to be recycled.
The advantage is that it's impossible to forget resetting a field. And it would seem to simplify the handling in a bunch of places.
Maybe it could then even make sense to use chan processResult for the forwarder queue.

router/dataplane.go line 598 at r1 (raw file):

	// The address to where we are forwarding the packet.
	// Will be set by the processing routine
	dstAddr *net.UDPAddr

Use netip.AddrPort for new code wherever possible.
Once golang/go#45886 or the corresponding functionality for the x/net package comes available, we'll use that to avoid the memory allocations for the addresses.

router/dataplane.go line 126 at r3 (raw file):

	// the random value is hashed together with the flowID and the address header
	// to identify the processing routine that is responsible to process a packet
	randomValue []byte

See comment on structure; none of these fields need to be here. These can all be local variables in the Run function instead.

router/dataplane.go line 484 at r3 (raw file):

	d.processorQueueSize = int(math.Max(
		float64(len(d.interfaces)*d.interfaceBatchSize/int(d.numProcRoutines)),
		float64(d.interfaceBatchSize)))

Don't use floats for this stuff, please. Define a max(a, b int) int somewhere or just use an if.

router/dataplane.go line 510 at r3 (raw file):

// initComponents initializes the channels, buffers, the receivers, the forwarders,
// processing routines and the bfd sessions
func (d *DataPlane) initComponents(ctx context.Context) {

This does all the work of the actual running, so the name initComponents seems deceptive; I think just having this all in Run is just fine.

router/dataplane.go line 584 at r3 (raw file):

		log.Debug("Forwarder error", "err", err)
	}
	log.Debug("all components terminated")

The shutdown logic of waiting for the goroutines to stop and closing the channels does not seem to add anything. It also seems to be in a race against the process termination.
Just fire-and-forget the go routines. Consequently, remove the error groups.

router/dataplane.go line 591 at r3 (raw file):

	Addr        net.Addr
	Conn        BatchConn
}

I don't think we need this struct. Addr is unused, so this is the same information as eg. a map[uint16]BatchConn which is already present in Dataplane.

router/dataplane.go line 614 at r3 (raw file):

	for d.running {
		// collect packets
		for newBufferCount+unusedBufferCount < d.interfaceBatchSize {

No need to maintain the newBufferCount variable:

Suggestion:

for i :=0; i < d.interfaceBatchSize - unusedBufferCount; i++ {

router/dataplane.go line 616 at r3 (raw file):

		for newBufferCount+unusedBufferCount < d.interfaceBatchSize {
			p := <-d.packetPool
			msgs[newBufferCount].Buffers[0] = p.rawPacket[:bufSize]

Suggestion:

p.rawPacket[:cap(p.rawPacket)]

router/dataplane.go line 622 at r3 (raw file):

		// read batch
		numPkts, err := ni.Conn.ReadBatch(msgs[:newBufferCount+unusedBufferCount])

It's always the full batch size

Suggestion:

ni.Conn.ReadBatch(msgs)

router/dataplane.go line 625 at r3 (raw file):

		if err != nil {
			log.Debug("Error while reading batch", "interfaceId", ni.InterfaceId, "err", err)
			unusedBufferCount += newBufferCount

Just set the unusedBufferCount directly after the read (before error handling), so it only needs to be updated once:
unusedBufferCount = len(msgs) - numPkts.

Also, variable naming: I'd suggest
numPkts -> numRead.
unusedBufferCount -> numUnused or numReusable

router/dataplane.go line 646 at r3 (raw file):

			if err != nil {
				log.Debug("Error while computing procId", "err", err)
				continue

oops, dropped packet.

router/dataplane.go line 663 at r3 (raw file):

func (d *DataPlane) computeProcId(data []byte) (uint32, error) {
	srcHostAddrLen := ((data[9] & 0x3) + 1) * 4

Panics on short data

router/dataplane.go line 664 at r3 (raw file):

func (d *DataPlane) computeProcId(data []byte) (uint32, error) {
	srcHostAddrLen := ((data[9] & 0x3) + 1) * 4
	dstHostAddrLen := ((data[9] >> 4 & 0x3) + 1) * 4

Can use slayers.AddrType
Also, maybe put dest first (dest type field is first in header)

Suggestion:

	dstHostAddrLen := slayers.AddrType(data[9] >> 4 & 0xf).Length()
	srcHostAddrLen := slayers.AddrType(data[9] & 0xf).Length()

router/dataplane.go line 665 at r3 (raw file):

	srcHostAddrLen := ((data[9] & 0x3) + 1) * 4
	dstHostAddrLen := ((data[9] >> 4 & 0x3) + 1) * 4
	addrHdrLen := 16 + int(srcHostAddrLen+dstHostAddrLen)

Suggestion:

addrHdrLen := 2*addr.IABytes + int(srcHostAddrLen+dstHostAddrLen)

router/dataplane.go line 676 at r3 (raw file):

	flowIDBuffer[0] &= 0xF // the left 4 bits don't belong to the flowID

	hasher := fnv.New32a()

flowIDBuffer, hasher very likely escape to the heap. Bad, should be pre-allocated and reused.

It's annoying that fnv only exposes the hash.Hasher interface. Conceptually, it would make sense to add the randomValue only once and then always restart from there. Oh well...

router/dataplane.go line 679 at r3 (raw file):

	hasher.Write(d.randomValue)
	hasher.Write(flowIDBuffer[:])
	hasher.Write(data[slayers.CmnHdrLen : slayers.CmnHdrLen+addrHdrLen+1])

What's with the +1?

router/dataplane.go line 712 at r3 (raw file):

			result.OutAddr = p.srcAddr
			// SCMP does not use the same buffer as we provide.
			// Because of that we have to copy it back to our buffer

Adapt the SCMP handling logic instead (i.e. do the copy there).
Also move the logic for the lines above ("SCMP go back the way they came") into the processor code. Here, on this level, we shouldn't have exceptional handling for SCMPs (except for debug log / metrics perhaps).

router/dataplane.go line 792 at r3 (raw file):

			prepareMsg(p, i)
			byteLen += len(p.rawPacket)
		}

This read-loop could nicely be extracted into a function readUpTo(chan packet, n int, []packet) int.

Code quote:

		p, ok := <-c
		if !ok {
			continue
		}
		prepareMsg(p, 0)
		byteLen = len(p.rawPacket)
		availablePacket := int(math.Min(float64(len(c)), float64(d.interfaceBatchSize-1)) + 1)
		for i := 1; i < availablePacket; i++ {
			p, ok = <-c
			if !ok {
				availablePacket = i
				break
			}
			prepareMsg(p, i)
			byteLen += len(p.rawPacket)
		}

router/dataplane.go line 793 at r3 (raw file):

			byteLen += len(p.rawPacket)
		}
		k, err := ni.Conn.WriteBatch(writeMsgs[:availablePacket], syscall.MSG_DONTWAIT)

This MSG_DONTWAIT is no longer necessary. This was simple way to avoid having a slow write blocking a read. As this is now decoupled, this can go away, and the EAGAIN error special case is not needed anymore.

Also, sendmmsg returns after the first error. We can skip the erroring packet without retry (I'd say it's too hard to figure out whether retry would work), but we need to retry all the remaining ones. For this, we ideally first check if we can add in some new packets from the channel, to always use a full batch size when possible.
sendmmsg returns an error only if the error occured on the first packet, so the error is effectively useless, we just need the number

router/dataplane.go line 832 at r3 (raw file):

type processResult struct {
	EgressID uint16
	OutConn  BatchConn

OutConn can be removed.

router/dataplane_test.go line 204 at r3 (raw file):

	numProcs := 10000
	dp.SetRandomValue(randomValue)
	dp.ConfigureProcChannels(numProcs, 1)

Make the hash computation a free function so you don't need to initialize the dataplane here.

router/dataplane_test.go line 210 at r3 (raw file):

		hasher.Write(randomValue)
		hasher.Write(flowBuf[1:4])
		hasher.Write(pktBuf[slayers.CmnHdrLen : slayers.CmnHdrLen+s.AddrHdrLen()+1])

Same +1 bug as in implementation?
Write RawSrc/DstAddr fields instead, then you don't even need to access the serialized packet to compute this.

router/export_test.go line 78 at r3 (raw file):

}

func (d *DataPlane) ComputeProcId(data []byte) (uint32, error) {

Your new tests are tests of the internals, and that's fine! To test the internals, it makes more sense to have a ..._test.go file with package router (vs package router_test). Then you have access to all the internals without jumping through these hoops of exposing all kinds of weird things here.

rohrerj

Reviewable status: 0 of 6 files reviewed, 25 unresolved discussions (waiting on @matzf)

a discussion (no related file):

Previously, matzf (Matthias Frei) wrote…

Structure: the main run loop is spread across too many places, too many methods access fields of the Dataplane struct. This "global" state makes it harder than necessary to see that the main loops are actually quite simple. The configuration options (numProcRoutines etc.) are also fields in the Dataplane, which is a too wide scope for these.
Here's what I would prefer:

the Run function takes configuration values as parameters (e.g. in a struct RunConfig{BatchSize int, NumProcessorRoutines int} or something like that). This will make it easier to make these configurable later.

the main setup is contained in Run. All variables are defined here and passed / returned from free-standing helper functions; use methods only where it's too awkward to pass the data explicitly

use helpers to simplify the main loops, so that the important bits stand out, namely the way the I/O and processing calls and the moving of the buffers through the queues. The rest should

I restructured the run functions. Is it better now?

a discussion (no related file):

Previously, matzf (Matthias Frei) wrote…

Currently packets are passed through the channels as pointer, *packet. Why not pass them by value, packet instead? It's only a little bit wider than a pointer to it and I would expect that this does not affect the performance of the channel.

This would allow to have the pool only take care of the []byte buffers, not the packets. The rest of the information is all passed by value and does not have to be recycled.
The advantage is that it's impossible to forget resetting a field. And it would seem to simplify the handling in a bunch of places.
Maybe it could then even make sense to use chan processResult for the forwarder queue.

I changed the implementation to pass packets instead of *packets and the buffer pool to store only byte slices.

router/dataplane.go line 598 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…

Use netip.AddrPort for new code wherever possible.
Once golang/go#45886 or the corresponding functionality for the x/net package comes available, we'll use that to avoid the memory allocations for the addresses.

I think it makes more sense to change this as soon as that functionality is available because the ipv4.Message struct still requires a net.Addr address and I was not able to find an efficient way to convert from netip.AddrPort back to net.Addr.

router/dataplane.go line 126 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

See comment on structure; none of these fields need to be here. These can all be local variables in the Run function instead.

I moved all fields except those for the packet pool and the channels to a new struct called RunConfig.

router/dataplane.go line 484 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Don't use floats for this stuff, please. Define a max(a, b int) int somewhere or just use an if.

Done.

router/dataplane.go line 510 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

This does all the work of the actual running, so the name initComponents seems deceptive; I think just having this all in Run is just fine.

Done.

router/dataplane.go line 584 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

The shutdown logic of waiting for the goroutines to stop and closing the channels does not seem to add anything. It also seems to be in a race against the process termination.
Just fire-and-forget the go routines. Consequently, remove the error groups.

Done.

router/dataplane.go line 591 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

I don't think we need this struct. Addr is unused, so this is the same information as eg. a map[uint16]BatchConn which is already present in Dataplane.

Done.

router/dataplane.go line 614 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

No need to maintain the newBufferCount variable:

Done.

router/dataplane.go line 616 at r3 (raw file):

		for newBufferCount+unusedBufferCount < d.interfaceBatchSize {
			p := <-d.packetPool
			msgs[newBufferCount].Buffers[0] = p.rawPacket[:bufSize]

Done.

router/dataplane.go line 622 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

It's always the full batch size

Done.

router/dataplane.go line 625 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Just set the unusedBufferCount directly after the read (before error handling), so it only needs to be updated once:
unusedBufferCount = len(msgs) - numPkts.

Also, variable naming: I'd suggest
numPkts -> numRead.
unusedBufferCount -> numUnused or numReusable

Done.

router/dataplane.go line 646 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

oops, dropped packet.

Done.

router/dataplane.go line 663 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Panics on short data

Done.

router/dataplane.go line 664 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Can use slayers.AddrType
Also, maybe put dest first (dest type field is first in header)

Done.

router/dataplane.go line 665 at r3 (raw file):

	srcHostAddrLen := ((data[9] & 0x3) + 1) * 4
	dstHostAddrLen := ((data[9] >> 4 & 0x3) + 1) * 4
	addrHdrLen := 16 + int(srcHostAddrLen+dstHostAddrLen)

Done.

router/dataplane.go line 676 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

flowIDBuffer, hasher very likely escape to the heap. Bad, should be pre-allocated and reused.

It's annoying that fnv only exposes the hash.Hasher interface. Conceptually, it would make sense to add the randomValue only once and then always restart from there. Oh well...

I changed the implementation such that the calling function has to provide a hasher and a flowIDBuffer and the function just resets the hasher.

router/dataplane.go line 679 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

What's with the +1?

Should be removed now.

router/dataplane.go line 712 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Adapt the SCMP handling logic instead (i.e. do the copy there).
Also move the logic for the lines above ("SCMP go back the way they came") into the processor code. Here, on this level, we shouldn't have exceptional handling for SCMPs (except for debug log / metrics perhaps).

Moved the copy logic to the packSCMP() function.

router/dataplane.go line 792 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

This read-loop could nicely be extracted into a function readUpTo(chan packet, n int, []packet) int.

I created such a function.

router/dataplane.go line 793 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

This MSG_DONTWAIT is no longer necessary. This was simple way to avoid having a slow write blocking a read. As this is now decoupled, this can go away, and the EAGAIN error special case is not needed anymore.

Also, sendmmsg returns after the first error. We can skip the erroring packet without retry (I'd say it's too hard to figure out whether retry would work), but we need to retry all the remaining ones. For this, we ideally first check if we can add in some new packets from the channel, to always use a full batch size when possible.
sendmmsg returns an error only if the error occured on the first packet, so the error is effectively useless, we just need the number

I changed the implementation. Additionally to what you suggested I rearranged the packets to prevent packet reordering.
E.g. we have the packets 1, 2, 3, ... 8 in the queue and the batch [1,2,3,4]. If we have an error on packet 3, packet 1 and 2 were sent successfully, packet 3 is dropped and the next batch would look like [4,5,6,7].

router/dataplane.go line 832 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

OutConn can be removed.

We still have to identify the BFD case. Should I just add a new field "isBFD" or do you have any other suggestion?

router/dataplane_test.go line 204 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Make the hash computation a free function so you don't need to initialize the dataplane here.

Done.

router/dataplane_test.go line 210 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Same +1 bug as in implementation?
Write RawSrc/DstAddr fields instead, then you don't even need to access the serialized packet to compute this.

Instead of manually extracting the address header from the scion header I now use the s.SerializeAddrHeader() function.

router/export_test.go line 78 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Your new tests are tests of the internals, and that's fine! To test the internals, it makes more sense to have a ..._test.go file with package router (vs package router_test). Then you have access to all the internals without jumping through these hoops of exposing all kinds of weird things here.

I moved a new test file called dataplane_internal_test.go. With that I was able to remove all those functions here.

matzf

Reviewed 3 of 6 files at r4, all commit messages.
Reviewable status: 3 of 6 files reviewed, 19 unresolved discussions (waiting on @rohrerj)

a discussion (no related file):

Previously, rohrerj wrote…

I restructured the run functions. Is it better now?

👍

router/dataplane.go line 598 at r1 (raw file):

Previously, rohrerj wrote…

I think it makes more sense to change this as soon as that functionality is available because the ipv4.Message struct still requires a net.Addr address and I was not able to find an efficient way to convert from netip.AddrPort back to net.Addr.

Fair enough.

router/dataplane.go line 126 at r3 (raw file):

Previously, rohrerj wrote…

I moved all fields except those for the packet pool and the channels to a new struct called RunConfig.

Better. IMO it would be better still to move all of these into Run and pass them around explicitly, but ok to leave it.

At least the interfaces one needs to be fixed though; either remove it entirely, keep it as a local variable, or move it up to the existing fields external/internal and initialize it together with these.

router/dataplane.go line 792 at r3 (raw file):

Previously, rohrerj wrote…

I created such a function.

What I had in mind when I suggested this function was that this would block for the first packet (only) and then use non-blocking read using select to read the available entries.
Now we're using the channel len to determine the number of available packets and use max(1, len(c)) to wait block for at least one. This is not ideal; if the channel is initially empty, this will always read only a single message. However, it is possible (even likely) that multiple packets are added to the channel "simultaneously" (while this goroutine is sleeping) and so we needlessly use a low batch size.
As we now don't want to block when we have re-try packets, perhaps extend the interface with a number of packets for blocking read, func readUpTo(c chan<- packet, nBlocking, nNonblocking, int, msg[]ipv4.Message) int.

router/dataplane.go line 793 at r3 (raw file):

Previously, rohrerj wrote…

I changed the implementation. Additionally to what you suggested I rearranged the packets to prevent packet reordering.
E.g. we have the packets 1, 2, 3, ... 8 in the queue and the batch [1,2,3,4]. If we have an error on packet 3, packet 1 and 2 were sent successfully, packet 3 is dropped and the next batch would look like [4,5,6,7].

Great! The logic looks good now!
I think it could be clarified a bit though; instead of re-arranging right after reading, first return the processed buffers back to the pool, then we can simply move up the packets from the back (simply overwriting the packets previously there):

	numWritten, _ := conn.WriteBatch(msg, 0)
        // accounting for metrics
        bytesWritten := sumBytes(msg[:numWritten])
        // return consumed packets
        for i := 0; i < numWritten + 1 && i < len(msg); i++ {
          returnPacketToPool(msg[i].Buffer[0])
        }
        // move up the re-try packets
        for i := numWritten + 1, j :=0 ; i < len(msg); i++, j++ {
          msg[j].Buffer[0] = msg[i].Buffer[0]
        }

router/dataplane.go line 832 at r3 (raw file):

Previously, rohrerj wrote…

We still have to identify the BFD case. Should I just add a new field "isBFD" or do you have any other suggestion?

Make OutPkt nil.

router/dataplane.go line 473 at r4 (raw file):

	d.populateInterfaces()

	cfg.ProcessorQueueSize = max(

Make this a local variable instead.
Same for ForwarderQueueSize, initialized from InterfaceBatchSize.

router/dataplane.go line 484 at r4 (raw file):

			defer log.HandlePanic()
			d.runReceiver(interfaceId, conn, cfg)
		}(interfaceId, conn)

Perhaps the runXXX functions could be documented to be run as a goroutine and do the log.HandlePanic internally. Then this can be shortened for better readability:

for interfaceId, conn := range interfaces {
   interfaceId, conn := interfaceId, conn // sic
   go d.runReceiver(interfaceId, conn, cfg)
   go d.runForwarder(interfaceId, conn, cfg)
}

And the same for the other goroutine invocations here.

And btw, as pointed out in the comment on the fields of Dataplane, I think the overall structure would be even clearer if the queues are local variables here and the runXXX would take their queues explicitly here, e.g.

   go runReceiver(cfg, interfaceId, conn, processorQueues)
   go runForwarder(cfg, interfaceId, forwarderQueues[interfaceId], conn)
// ...

   go runProcessor(cfg, processorQueues[i], forwarderQueues)

router/dataplane.go line 520 at r4 (raw file):

	r.RandomValue = make([]byte, 16)
	if _, err := rand.Read(r.RandomValue); err != nil {
		log.Error("Error while generating random value", "err", err)

This should likely be fatal. It's not really a configuration property anyway, there is no point Perhaps it's easier to move the random value as a local into the Run func, then you don't need to add an error return value to this function.

Btw, the receivers could also have different random values each, so you could move this even further down.

router/dataplane.go line 576 at r4 (raw file):

	numReusable := 0 // unused buffers from previous loop
	metrics := d.forwardingMetrics[interfaceId]
	currentPacketsFromPool := make([][]byte, cfg.InterfaceBatchSize)

Same as with the forwarder loop, there doesn't seem to be a need to track this currentPacketsFromPool, all the information is in the msgs.

router/dataplane.go line 580 at r4 (raw file):

	hasher := fnv.New32a()

	preparePacketAndForward := func(k int, pkt ipv4.Message) {

Nit: the naming is a bit unfortunate here, as "forward" sounds like it would be enqueued for the forwarder. There's also not really any preparation going on. Maybe enqueueForProcessing?

router/dataplane.go line 585 at r4 (raw file):

		srcAddr := pkt.Addr.(*net.UDPAddr)
		currPkt := currentPacketsFromPool[k]

This currPkt is not needed here, just use the msg.Buffers[0] instead.

router/dataplane.go line 602 at r4 (raw file):

				IP:   srcAddr.IP,
				Port: srcAddr.Port,
			},

We don't need to copy srcAddr here, it's newly allocated on every read anyway (cannot be avoided) and can just be assigned.

router/dataplane.go line 603 at r4 (raw file):

				Port: srcAddr.Port,
			},
		}:

gofmt picks a rather puzzling way to indent this; it looks like this is closing the select brace and I was quite puzzled for a moment here.
Can you assign this to a local variable first?

outPkt := packet {
   ....
}
select {
case d.queue <- outPkt:
   ...
default
   ...
}

router/dataplane.go line 647 at r4 (raw file):

	}
	if addrHdrLen > 48 {
		return 0, serrors.New("Address header is invalid")

This is simply impossible, the format of the DL/SL fields is such that they can never be invalid. The check is also not really necessary as a longer address header wouldn't seem to break the logic below. If you want to keep the check, you can assert with a panic.

router/dataplane.go line 675 at r4 (raw file):

		processor.ingressID = p.ingress
		metrics := d.forwardingMetrics[p.ingress]
		result, err := processor.processPkt(p.rawPacket, p.srcAddr)

Previously the processor was tied to a single interface. As this is no longer the case, move the ingressID to the processPkt function interface and remove it from newPacketProcessor. Like all the other reused fields in there, it should now be reset in the reset() function.

(Complete aside: I strongly dislike all the mutable state in the scionPacketProcessor. But that's something for another day.)

router/dataplane.go line 703 at r4 (raw file):

		}
		p.rawPacket = result.OutPkt
		if result.OutAddr != nil {

This check is not needed here; p.dstAddr is the same type as result.OutAddr (both *net.UDPAddr). The nil check only needs to be there when assigning this to a net.Addr variable.

router/dataplane.go line 731 at r4 (raw file):

	successfullWrites := 0
	readPackets := 0
	availablePackets := 0

Move all variables into tightest scope possible. Only nextFreeIndex is stateful with the current logic (and I don't find this a particularly descriptive name, tbh).

router/dataplane.go line 749 at r4 (raw file):

		}
		byteLen := 0
		for _, pkt := range currentPacketsToSend[:successfullWrites] {

If I understand this correctly, this is now counting the wrong packets as the currentPacketsToSend has been rearranged. Perhaps a good fix is to do this accounting first and only then rearrange (see suggestion in other comment on the line with WriteBatch).

router/dataplane.go line 791 at r4 (raw file):

			msg[i].Addr = p.dstAddr
		}
		pkts[i] = &p

Surely this escapes to the heap, bad.
Btw. why do we even bother with keeping the []*packet? All the information we need is in the msg. Having only a single structure to deal with would simplify the logic a bit.

router/dataplane_test.go line 210 at r3 (raw file):

Previously, rohrerj wrote…

Instead of manually extracting the address header from the scion header I now use the s.SerializeAddrHeader() function.

💯

router/dataplane_test.go line 208 at r4 (raw file):

					func(ms underlayconn.Messages, flags int) (int, error) {
						if totalCount == 0 {
							return 0, nil

Shouldn't this be a test failure? More packets are written than expected.

router/export_test.go line 31 at r4 (raw file):

func GetMetrics() *Metrics {
	return metrics
}

Is this really needed, or something left-over from a previous variant?

rohrerj

Reviewable status: 2 of 6 files reviewed, 19 unresolved discussions (waiting on @matzf)

router/dataplane.go line 126 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Better. IMO it would be better still to move all of these into Run and pass them around explicitly, but ok to leave it.

At least the interfaces one needs to be fixed though; either remove it entirely, keep it as a local variable, or move it up to the existing fields external/internal and initialize it together with these.

Now it will be initialized together with external / internal when AddInternalInterface() / AddExternalInterface() is called.

router/dataplane.go line 792 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

What I had in mind when I suggested this function was that this would block for the first packet (only) and then use non-blocking read using select to read the available entries.
Now we're using the channel len to determine the number of available packets and use max(1, len(c)) to wait block for at least one. This is not ideal; if the channel is initially empty, this will always read only a single message. However, it is possible (even likely) that multiple packets are added to the channel "simultaneously" (while this goroutine is sleeping) and so we needlessly use a low batch size.
As we now don't want to block when we have re-try packets, perhaps extend the interface with a number of packets for blocking read, func readUpTo(c chan<- packet, nBlocking, nNonblocking, int, msg[]ipv4.Message) int.

Done.

router/dataplane.go line 793 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Great! The logic looks good now!
I think it could be clarified a bit though; instead of re-arranging right after reading, first return the processed buffers back to the pool, then we can simply move up the packets from the back (simply overwriting the packets previously there):
	numWritten, _ := conn.WriteBatch(msg, 0)
        // accounting for metrics
        bytesWritten := sumBytes(msg[:numWritten])
        // return consumed packets
        for i := 0; i < numWritten + 1 && i < len(msg); i++ {
          returnPacketToPool(msg[i].Buffer[0])
        }
        // move up the re-try packets
        for i := numWritten + 1, j :=0 ; i < len(msg); i++, j++ {
          msg[j].Buffer[0] = msg[i].Buffer[0]
        }

Done.

router/dataplane.go line 832 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Make OutPkt nil.

Done.

router/dataplane.go line 473 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Make this a local variable instead.
Same for ForwarderQueueSize, initialized from InterfaceBatchSize.

Done.

router/dataplane.go line 484 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Perhaps the runXXX functions could be documented to be run as a goroutine and do the log.HandlePanic internally. Then this can be shortened for better readability:
for interfaceId, conn := range interfaces {
   interfaceId, conn := interfaceId, conn // sic
   go d.runReceiver(interfaceId, conn, cfg)
   go d.runForwarder(interfaceId, conn, cfg)
}
And the same for the other goroutine invocations here.

And btw, as pointed out in the comment on the fields of Dataplane, I think the overall structure would be even clearer if the queues are local variables here and the runXXX would take their queues explicitly here, e.g.
   go runReceiver(cfg, interfaceId, conn, processorQueues)
   go runForwarder(cfg, interfaceId, forwarderQueues[interfaceId], conn)
// ...

   go runProcessor(cfg, processorQueues[i], forwarderQueues)

The top part will not compile: "go statement should always call a func lit. (gocall)".
Regarding the lower part, I changed the code to pass de processor and forwarder queues instead of storing them in the dataplane.

router/dataplane.go line 520 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

This should likely be fatal. It's not really a configuration property anyway, there is no point Perhaps it's easier to move the random value as a local into the Run func, then you don't need to add an error return value to this function.

Btw, the receivers could also have different random values each, so you could move this even further down.

The random variable is now a local variable initialized in the runReceiver function. If it has an error it panics.

router/dataplane.go line 576 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Same as with the forwarder loop, there doesn't seem to be a need to track this currentPacketsFromPool, all the information is in the msgs.

You are right. Since we changed the buffer pool to store only byte slices instead of packets there is no use for this anymore.

router/dataplane.go line 580 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: the naming is a bit unfortunate here, as "forward" sounds like it would be enqueued for the forwarder. There's also not really any preparation going on. Maybe enqueueForProcessing?

Renamed.

router/dataplane.go line 585 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

This currPkt is not needed here, just use the msg.Buffers[0] instead.

I changed the function to pass already the currPkt instead of the index.

router/dataplane.go line 602 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

We don't need to copy srcAddr here, it's newly allocated on every read anyway (cannot be avoided) and can just be assigned.

Done.

router/dataplane.go line 603 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

gofmt picks a rather puzzling way to indent this; it looks like this is closing the select brace and I was quite puzzled for a moment here.
Can you assign this to a local variable first?
outPkt := packet {
   ....
}
select {
case d.queue <- outPkt:
   ...
default
   ...
}

Done.

router/dataplane.go line 647 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

This is simply impossible, the format of the DL/SL fields is such that they can never be invalid. The check is also not really necessary as a longer address header wouldn't seem to break the logic below. If you want to keep the check, you can assert with a panic.

Removed that if condition.

router/dataplane.go line 675 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Previously the processor was tied to a single interface. As this is no longer the case, move the ingressID to the processPkt function interface and remove it from newPacketProcessor. Like all the other reused fields in there, it should now be reset in the reset() function.

(Complete aside: I strongly dislike all the mutable state in the scionPacketProcessor. But that's something for another day.)

I removed the ingress parameter from the newPacketProcessor function and added it to the processPkt function. But I don't think it is necessary to add it to the reset() function since we always overwrite it at the beginning of the processPkt function. (same as for srcAddr).

router/dataplane.go line 703 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

This check is not needed here; p.dstAddr is the same type as result.OutAddr (both *net.UDPAddr). The nil check only needs to be there when assigning this to a net.Addr variable.

You are right. This is a copy-leftover from the "old" router.

router/dataplane.go line 731 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Move all variables into tightest scope possible. Only nextFreeIndex is stateful with the current logic (and I don't find this a particularly descriptive name, tbh).

Right. I renamed it to "batchFillStartIndex".

router/dataplane.go line 749 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

If I understand this correctly, this is now counting the wrong packets as the currentPacketsToSend has been rearranged. Perhaps a good fix is to do this accounting first and only then rearrange (see suggestion in other comment on the line with WriteBatch).

Done.

router/dataplane.go line 791 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Surely this escapes to the heap, bad.
Btw. why do we even bother with keeping the []*packet? All the information we need is in the msg. Having only a single structure to deal with would simplify the logic a bit.

Done.

router/dataplane_test.go line 208 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Shouldn't this be a test failure? More packets are written than expected.

Added a test failure.

router/export_test.go line 31 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Is this really needed, or something left-over from a previous variant?

The function NewMetrics can only be called once. Otherwise it will throw an error. Because of that I initialized it here and added a function that gives access to the metrics for both the internal test and the normal test.

matzf

Reviewable status: 2 of 6 files reviewed, 7 unresolved discussions (waiting on @rohrerj)

router/dataplane.go line 792 at r3 (raw file):

Previously, rohrerj wrote…

Done.

Nit: Looking at this and how its used now, it seems that it would be somewhat more natural to pass the total number of packets to read instead of the number of non-blocking. Or alternatively, create a separate function for the blocking vs the non-blocking read.

router/dataplane.go line 484 at r4 (raw file):

Previously, rohrerj wrote…

The top part will not compile: "go statement should always call a func lit. (gocall)".
Regarding the lower part, I changed the code to pass de processor and forwarder queues instead of storing them in the dataplane.

Meh, linter getting in the way, ok.

router/dataplane.go line 585 at r4 (raw file):

Previously, rohrerj wrote…

I changed the function to pass already the currPkt instead of the index.

Better, it's still redundant though. currPkt == pkt.Buffers[0].

Btw, rename pkt -> msg (consistency)

router/dataplane.go line 675 at r4 (raw file):

Previously, rohrerj wrote…

I removed the ingress parameter from the newPacketProcessor function and added it to the processPkt function. But I don't think it is necessary to add it to the reset() function since we always overwrite it at the beginning of the processPkt function. (same as for srcAddr).

Ok. Also remove the processor.ingressID = p.ingress above.
The reset() is a bit defensive as the recycling of the state is just inherently somewhat dangerous and its easy to mess up. I'd prefer to keep resetting everything, even if it's pointless.

router/dataplane.go line 731 at r4 (raw file):

Previously, rohrerj wrote…

Right. I renamed it to "batchFillStartIndex".

Hmm, that doesn't sound very good to me either, maybe just remaining?

Generally the identifiers used here look a bit jarring. Suggestion:
batchFillStartIndex -> remaining
availablePackets -> available
k -> written
byteLen -> writtenBytes or bytesWritten

router/dataplane.go line 645 at r5 (raw file):

}

func (d *DataPlane) runProcessingRoutine(id int, c chan packet,

c <-chan packet (read-only)

Ideally, we'd also make the forwarder queues write-only, fwChans map[uint16]chan<- packet, and the same for the processing queues in the receiver routines. I don't see a very elegant way to do this without making the initialization overly clumsy.

router/dataplane.go line 698 at r5 (raw file):

func (d *DataPlane) runForwarder(interfaceId uint16, conn BatchConn,
	cfg *RunConfig, c chan packet) {

<-chan packet (read only)

router/dataplane.go line 748 at r5 (raw file):

func readUpTo(c chan packet, nBlocking int, nNonBlocking int, msg []ipv4.Message) int {
	assign := func(p *packet, m *ipv4.Message) {

Sure this p does not escape to the heap here? Pass p packet to make sure.

router/dataplane.go line 1613 at r5 (raw file):

		}
		// OHP should always be directed to the correct BR.
		if _, ok := p.d.external[ohp.FirstHop.ConsEgress]; ok {

This check safely be removed now. There is an explicit check for the validity of the ConsEgress above and this is sufficient as long as the router's internal state is consistent. Also, the same check is now done when looking up the forwarder-queue.

rohrerj

Reviewable status: 2 of 6 files reviewed, 7 unresolved discussions (waiting on @matzf)

router/dataplane.go line 585 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Better, it's still redundant though. currPkt == pkt.Buffers[0].

Btw, rename pkt -> msg (consistency)

You are totally right. I removed it.

router/dataplane.go line 675 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Ok. Also remove the processor.ingressID = p.ingress above.
The reset() is a bit defensive as the recycling of the state is just inherently somewhat dangerous and its easy to mess up. I'd prefer to keep resetting everything, even if it's pointless.

Then my question is to what should we reset it? Unlike the other fields we cannot set it to nil, and if we set it to 0, we cannot distinguish between a reseted 0 and a normal 0. So I don't really see the point of reseting it. As I wrote above, you also don't reset srcAddr.

router/dataplane.go line 731 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Hmm, that doesn't sound very good to me either, maybe just remaining?

Generally the identifiers used here look a bit jarring. Suggestion:
batchFillStartIndex -> remaining
availablePackets -> available
k -> written
byteLen -> writtenBytes or bytesWritten

Done.

router/dataplane.go line 645 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

c <-chan packet (read-only)

Ideally, we'd also make the forwarder queues write-only, fwChans map[uint16]chan<- packet, and the same for the processing queues in the receiver routines. I don't see a very elegant way to do this without making the initialization overly clumsy.

Done.

router/dataplane.go line 698 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

<-chan packet (read only)

Done.

router/dataplane.go line 748 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Sure this p does not escape to the heap here? Pass p packet to make sure.

Done.

router/dataplane.go line 1613 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

This check safely be removed now. There is an explicit check for the validity of the ConsEgress above and this is sufficient as long as the router's internal state is consistent. Also, the same check is now done when looking up the forwarder-queue.

Done.

matzf

Reviewed 3 of 4 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @rohrerj)

router/dataplane.go line 675 at r4 (raw file):

Previously, rohrerj wrote…

Then my question is to what should we reset it? Unlike the other fields we cannot set it to nil, and if we set it to 0, we cannot distinguish between a reseted 0 and a normal 0. So I don't really see the point of reseting it. As I wrote above, you also don't reset srcAddr.

Ah, sorry, I misunderstood and didn't see that srcAddr is also not reset. My preference would be to reset it anyway (srcAddr to nil, interfaceID to -1), but not worth a long discussion.

router/dataplane.go line 453 at r6 (raw file):

	NumProcessorRoutines int
	InterfaceBatchSize   int
}

Collecting a bunch of comments on identifier names here:
-NumProcessorRoutines: everywhere else it's process_ing_. Keep it consistent, NumProcessingRoutines. To me, also just shorter NumProcessors and runProcessor etc. would seem good.

InterfaceBatchSize -> BatchSize
interfaceId -> ifID (I know it's not entirely consistent in the existing code, but ifID is the preferred form
...Id -> ...ID
initializePacketPool, initializeChannels -> shorten to init.....
forwardingChannels, fwChannels, procChannels -> better queue, initQueues, fwQs, procQs, etc.

router/dataplane.go line 620 at r6 (raw file):

func computeProcId(data []byte, numProcRoutines int, randomValue []byte,
	flowIDBuffer [3]byte, hasher hash.Hash32) (uint32, error) {

flowIDBuffer should be passed as a slice here; not entirely sure, but I believe this will now allocate on the heap on each call because of the hasher.Write.

router/dataplane.go line 642 at r6 (raw file):

func (d *DataPlane) returnPacketToPool(pkt []byte) {
	d.packetPool <- pkt[:bufSize]

Nit: pkt[:cap(pkt)] might be clearer.

More important: decide on whether to reset the length here or when popping from the pool in the receiver. We currently do it twice.

router/dataplane.go line 727 at r6 (raw file):

		for i, pkt := range writeMsgs[:written] {
			writtenBytes += len(pkt.Buffers[0])
			d.returnPacketToPool(writeMsgs[i].Buffers[0])

Nit: use either pkt or writeMsgs[i], not both. My preference iswriteMsgs[i] (as it seems more consistent with the expressions int the written!=available case) and drop the pkt identifier.

router/dataplane_internal_test.go line 75 at r5 (raw file):

						spkt, gopacket.Payload(payload))
					require.NoError(t, err)
					raw := buffer.Bytes()

Replace prepBaseMsg with a helper that returns a fully serialized SCION message. Here we don't care at all what is in this message, so hide everything away.
In fact, I think this doesn't even have to be a proper SCION packet, as long as it's sufficiently long for the computeProcID to succeed.

router/dataplane_internal_test.go line 94 at r5 (raw file):

		return ret
	}
	dp := prepareDP(ctrl)

If we don't have multiple subtests, then there is no reason to keep the prepDP in a func. Just inline it here, and leave out everything that is not relevant for this testcase.

router/dataplane_internal_test.go line 111 at r5 (raw file):

		case <-procCh[0]:
			// make sure that the pool size has decreased
			assert.Greater(t, initialPoolSize, len(dp.packetPool))

Also check that the buffer and address are returned correctly

router/dataplane_internal_test.go line 146 at r5 (raw file):

					totalCount--
					// 1/5 of the packets (randomly chosen) are errors
					if scrypto.RandInt64()%5 == 0 {

The scrypto stuff is a bit odd. Just math/rand suffices.

router/dataplane_internal_test.go line 159 at r5 (raw file):

							done <- struct{}{}
						}
						expectedPktId++

Perhaps also add a check for the dstAddr (both for nil and non-nil case).

router/dataplane_internal_test.go line 223 at r5 (raw file):

		hasher.Write(randomValue)
		hasher.Write(flowBuf[1:4])
		if err := s.SerializeAddrHdr(tmpBuffer); err != nil {

No need to be efficient here, allocate the tmpBuffer := make([]byte, s.AddrHdrLen()] here for max simplicity.
Also move the extraction of the flow ID into this function. Then the signature is just func(s *slayers.SCION) uint32.

router/dataplane_internal_test.go line 263 at r5 (raw file):

	dpath.HopFields[2].Mac = computeMAC(t, key,
		dpath.InfoFields[0], dpath.HopFields[2])
	spkt.Path = dpath

Again, use a helper function to create some opaque SCION packet here, the path etc is not relevant here.

router/dataplane_internal_test.go line 283 at r5 (raw file):

		compareHash(payload, spkt)
		spkt.TrafficClass++
	}

Great that this is testing on multiple inputs.
Subtests are typically organized the other way around; instead of defining a helper function that does the actual check, we typically create tables/maps/... (whatever works) with test cases and then have a loop that contains the test logic with each defined test case. This helps to make it very obvious which state (ideally non) that is kept between subtests and allows the test runner to identify the failing subtest cases. This is an article about this idea, linked in our style guide: https://dave.cheney.net/2019/05/07/prefer-table-driven-tests

There are many ways to express this, just as an example of how this could look like here:

testCases := map[string]struct {
  pkt *slayers.SCION 
} {
   "base": {
     pkt: prepBasePkt(),
   },
   "traffic class 1": {
     pkt: func() *slayer.SCION { 
        pkt := prepBasePkt()
        pkt.TrafficClass = 1 
        return pkt
      }(),
     }
}

for name, tc := range testCases {
   t.Run(name, func(t *testing.T) {
       expected := hashForSCIONPacket(tc.pkt)
       raw :=  // serialize tc.pkt
       actual := computeProcID(raw, ...)
       assert.Equal(t, expected, actual)
   })
}

What I see is currently missing from this test are:

error cases (i.e. truncated messages).
cases with different address types (IPv4, IPv6, SVC, or even for currently undefined ones)

rohrerj

Reviewable status: 2 of 6 files reviewed, 14 unresolved discussions (waiting on @matzf)

router/dataplane.go line 675 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Ah, sorry, I misunderstood and didn't see that srcAddr is also not reset. My preference would be to reset it anyway (srcAddr to nil, interfaceID to -1), but not worth a long discussion.

We use as a datatype to store the interface ids uint16. Because of that we cannot set it to -1. But to not overcomplicate that change I reset it now to 0.

router/dataplane.go line 453 at r6 (raw file):

Previously, matzf (Matthias Frei) wrote…

Collecting a bunch of comments on identifier names here:
-NumProcessorRoutines: everywhere else it's process_ing_. Keep it consistent, NumProcessingRoutines. To me, also just shorter NumProcessors and runProcessor etc. would seem good.

InterfaceBatchSize -> BatchSize

interfaceId -> ifID (I know it's not entirely consistent in the existing code, but ifID is the preferred form

...Id -> ...ID

initializePacketPool, initializeChannels -> shorten to init.....

forwardingChannels, fwChannels, procChannels -> better queue, initQueues, fwQs, procQs, etc.

I renamed everything I found within my code.

router/dataplane.go line 620 at r6 (raw file):

Previously, matzf (Matthias Frei) wrote…

flowIDBuffer should be passed as a slice here; not entirely sure, but I believe this will now allocate on the heap on each call because of the hasher.Write.

Done.

router/dataplane.go line 642 at r6 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: pkt[:cap(pkt)] might be clearer.

More important: decide on whether to reset the length here or when popping from the pool in the receiver. We currently do it twice.

Right. I decided to reset the length in the returnPacketToPool function.

router/dataplane_internal_test.go line 75 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Replace prepBaseMsg with a helper that returns a fully serialized SCION message. Here we don't care at all what is in this message, so hide everything away.
In fact, I think this doesn't even have to be a proper SCION packet, as long as it's sufficiently long for the computeProcID to succeed.

I changed the preBaseMsg function to include a lot of stuff that we don't really care about. But I think it still makes sense to return a *slayers.SCION because the TestComputeProcId function still requires it for easier comparison.

router/dataplane_internal_test.go line 94 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

If we don't have multiple subtests, then there is no reason to keep the prepDP in a func. Just inline it here, and leave out everything that is not relevant for this testcase.

Done.

router/dataplane_internal_test.go line 111 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Also check that the buffer and address are returned correctly

Done.

router/dataplane_internal_test.go line 146 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

The scrypto stuff is a bit odd. Just math/rand suffices.

Done.

router/dataplane_internal_test.go line 159 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Perhaps also add a check for the dstAddr (both for nil and non-nil case).

Done.

router/dataplane_internal_test.go line 223 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

No need to be efficient here, allocate the tmpBuffer := make([]byte, s.AddrHdrLen()] here for max simplicity.
Also move the extraction of the flow ID into this function. Then the signature is just func(s *slayers.SCION) uint32.

Done.

router/dataplane_internal_test.go line 263 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Again, use a helper function to create some opaque SCION packet here, the path etc is not relevant here.

Done.

router/dataplane_internal_test.go line 283 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Great that this is testing on multiple inputs.
Subtests are typically organized the other way around; instead of defining a helper function that does the actual check, we typically create tables/maps/... (whatever works) with test cases and then have a loop that contains the test logic with each defined test case. This helps to make it very obvious which state (ideally non) that is kept between subtests and allows the test runner to identify the failing subtest cases. This is an article about this idea, linked in our style guide: https://dave.cheney.net/2019/05/07/prefer-table-driven-tests

There are many ways to express this, just as an example of how this could look like here:
testCases := map[string]struct {
  pkt *slayers.SCION 
} {
   "base": {
     pkt: prepBasePkt(),
   },
   "traffic class 1": {
     pkt: func() *slayer.SCION { 
        pkt := prepBasePkt()
        pkt.TrafficClass = 1 
        return pkt
      }(),
     }
}

for name, tc := range testCases {
   t.Run(name, func(t *testing.T) {
       expected := hashForSCIONPacket(tc.pkt)
       raw :=  // serialize tc.pkt
       actual := computeProcID(raw, ...)
       assert.Equal(t, expected, actual)
   })
}
What I see is currently missing from this test are:

error cases (i.e. truncated messages).

cases with different address types (IPv4, IPv6, SVC, or even for currently undefined ones)

Thank you for pointing this out, I have seen this already in some other tests. I thought because it was just something small that this was not necessary. But since I added now some additional cases I restructured the test to apply this approach.

matzf

Reviewed 4 of 4 files at r7, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @rohrerj)

router/dataplane_internal_test.go line 75 at r5 (raw file):

Previously, rohrerj wrote…

I changed the preBaseMsg function to include a lot of stuff that we don't really care about. But I think it still makes sense to return a *slayers.SCION because the TestComputeProcId function still requires it for easier comparison.

Just create a separate helper function for that too that includes the serialization part, prepRawMsg or something like that.

router/dataplane_internal_test.go line 94 at r5 (raw file):

Previously, rohrerj wrote…

Done.

Are SetIA and SetKey really needed?

router/dataplane_internal_test.go line 113 at r7 (raw file):

			}
		}
	}

At the end, we can check that the pool size has decreased by exactly the number of packets read. And perhaps it could also make sense to verify that each packet buffer is returned exactly once (e.g by stuffing reflect.ValueOf(pkt.rawPacket).Pointer() into a set map[uinptr]struct{}).

router/dataplane_internal_test.go line 151 at r7 (raw file):

						}
						if totalCount <= 100 {
							if m.Addr == nil {

assert.True(t, m.Addr == nil) for nil case, and add a comment like // stronger check than assert.Nil
assert.NotNil(t, m.Addr) for not-nil (which is stronger than m.Addr != nil).

router/dataplane_internal_test.go line 233 at r7 (raw file):

		hasher.Write(flowBuf[1:4])
		if err := s.SerializeAddrHdr(tmpBuffer); err != nil {
			return 0

panic / fail test here

router/dataplane_internal_test.go line 254 at r7 (raw file):

		val2, err := computeProcID(raw, numProcs, randomValue, flowIdBuffer, hasher)
		assert.NoError(t, err)
		assert.Equal(t, val1, val2)

This is now not using the right t to show failing subtests (t here is the main test case, not a subtest).

Generally, just pass the test case t *testing.T as first argument to such helper functions. Here, there seems to be a better approach though: instead of doing the comparison in compareHash, change this into a helper function to call computeProcID (i.e. remove the call to hashForScionPacket here).
Then the invocation in the test loop becomes:

rs := tc(t);
expected := referenceHash(rs[0].s) // renamed from "hashForSionPacket"
for i, r := range rs {
	actual, err := computeProcIDHelper(r.payload, r.s)
        assert.NoError(t, err)
        assert.Equal(t, expected, actual)
}

router/dataplane_internal_test.go line 477 at r7 (raw file):

	dpath.HopFields[2].Mac = computeMAC(t, []byte("testkey_xxxxxxxx"),
		dpath.InfoFields[0], dpath.HopFields[2])
	spkt.Path = dpath

Just drop mac computation, it's not necessary here.

rohrerj

Reviewable status: 5 of 6 files reviewed, 8 unresolved discussions (waiting on @matzf)

router/dataplane_internal_test.go line 75 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Just create a separate helper function for that too that includes the serialization part, prepRawMsg or something like that.

Done.

router/dataplane_internal_test.go line 94 at r5 (raw file):

Previously, matzf (Matthias Frei) wrote…

Are SetIA and SetKey really needed?

No. You are right. This was stilll a leftover when I copied that part from another test. Since we don't call the packet processing function we don't need that. Removed.

router/dataplane_internal_test.go line 113 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

At the end, we can check that the pool size has decreased by exactly the number of packets read. And perhaps it could also make sense to verify that each packet buffer is returned exactly once (e.g by stuffing reflect.ValueOf(pkt.rawPacket).Pointer() into a set map[uinptr]struct{}).

Done.

router/dataplane_internal_test.go line 151 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

assert.True(t, m.Addr == nil) for nil case, and add a comment like // stronger check than assert.Nil
assert.NotNil(t, m.Addr) for not-nil (which is stronger than m.Addr != nil).

Done.

router/dataplane_internal_test.go line 233 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

panic / fail test here

Done.

router/dataplane_internal_test.go line 254 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

This is now not using the right t to show failing subtests (t here is the main test case, not a subtest).

Generally, just pass the test case t *testing.T as first argument to such helper functions. Here, there seems to be a better approach though: instead of doing the comparison in compareHash, change this into a helper function to call computeProcID (i.e. remove the call to hashForScionPacket here).
Then the invocation in the test loop becomes:
rs := tc(t);
expected := referenceHash(rs[0].s) // renamed from "hashForSionPacket"
for i, r := range rs {
	actual, err := computeProcIDHelper(r.payload, r.s)
        assert.NoError(t, err)
        assert.Equal(t, expected, actual)
}

Right, this look cleaner.

router/dataplane_internal_test.go line 477 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

Just drop mac computation, it's not necessary here.

Done.

matzf

Reviewed 1 of 1 files at r8.
Reviewable status: all files reviewed (commit messages unreviewed), 4 unresolved discussions (waiting on @rohrerj)

router/dataplane_internal_test.go line 151 at r7 (raw file):

Previously, rohrerj wrote…

Done.

For the second check assert.NotNil would be better as it also checks that the value contained in the interface is non-nil.

router/dataplane_internal_test.go line 105 at r8 (raw file):

		}
	}
	time.Sleep(time.Millisecond)

Is this sleep to wait for the receiver to acquire new packets from the pool?
Sleeps like this are always finicky as it's just so easy to fail on slow or over-utilized machines. Would it make sense to wait for a signal from the third ReadBatch call instead?

router/dataplane_internal_test.go line 240 at r8 (raw file):

		val2, err := computeProcID(raw, numProcs, randomValue, flowIdBuffer, hasher)
		return val2, err

Nit: just return computeProcID(...).

rohrerj

Reviewable status: 5 of 6 files reviewed, 3 unresolved discussions (waiting on @matzf)

router/dataplane_internal_test.go line 151 at r7 (raw file):

Previously, matzf (Matthias Frei) wrote…

For the second check assert.NotNil would be better as it also checks that the value contained in the interface is non-nil.

Sorry I had accidentially changed both to assert.True(). Should be fixed now.

router/dataplane_internal_test.go line 105 at r8 (raw file):

Previously, matzf (Matthias Frei) wrote…

Is this sleep to wait for the receiver to acquire new packets from the pool?
Sleeps like this are always finicky as it's just so easy to fail on slow or over-utilized machines. Would it make sense to wait for a signal from the third ReadBatch call instead?

I added a done channel to signal when its ready.

rohrerj

Reviewable status: 4 of 6 files reviewed, 3 unresolved discussions (waiting on @matzf)

router/dataplane.go line 792 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Nit: Looking at this and how its used now, it seems that it would be somewhat more natural to pass the total number of packets to read instead of the number of non-blocking. Or alternatively, create a separate function for the blocking vs the non-blocking read.

Sorry, I was not sure whether this was just a comment or whether you asked me to fix this. I changed now the readUpTo function to accept "n" and a boolean to indicate whether we have to block or not. In case of blocking, we block only on the first element.

matzf

Reviewed 2 of 2 files at r9, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @rohrerj)

router/dataplane.go line 792 at r3 (raw file):

Previously, rohrerj wrote…

Sorry, I was not sure whether this was just a comment or whether you asked me to fix this. I changed now the readUpTo function to accept "n" and a boolean to indicate whether we have to block or not. In case of blocking, we block only on the first element.

Great!

router/dataplane.go line 645 at r9 (raw file):

func (d *DataPlane) runProcessor(id int, c <-chan packet,
	fwChans map[uint16]chan packet) {

Nit: fwQs. Also, maybe c -> q?

matzf · 2023-06-23T07:53:07Z

Do you have some evaluation on the performance difference that you could attach? My own first test showed slightly lower throughput -- the test was likely overly simplistic and unreliable, but still it would be great to see some evidence that this is actually a performance improvement.

Also good to see would be some profiling info captured when processing some test traffic. The go pprof tool can visualize the allocations made; previously this has been showing that only the strictly necessary allocations were made during packet processing. It would be great to confirm that this is still the case.
It's ok to focus on the happy path for this (no errors, no SCMPs), as we know that this area will need more work anyway.

rohrerj

I will do a proper performance evaluation in the next few days. I can attach the data once I am done with it.

Reviewable status: 5 of 6 files reviewed, all discussions resolved (waiting on @matzf)

rohrerj · 2023-07-01T13:23:17Z

I ran the evaluation on my test systems. The results can be found in the short report. I also attached a pprof file for the CPU and memory allocations.
Report.pdf
cpu.pb.gz
allocs.pb.gz

rohrerj

During the evaluation I identified some additional metrics that I found useful. I added them here too. What do you think?
I also found an missing error case for WriteBatch().

Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @matzf)

router/dataplane.go line 710 at r10 (raw file):

			writeMsgs[remaining:])
		available += remaining
		written, _ := conn.WriteBatch(writeMsgs[:available], 0)

I think we should add some error handling here because conn.WriteBatch() could return "-1" for "written" and then we would panic when accessing writeMsgs[:-1].
How should we handle that case? Should we drop all packets or just the first one and try the other packets again?

matzf

Great, thanks for the performance report. Good to see that you're seeing a decent throughput improvement.

The pprof reports appears to show that during the test SCMPs were triggered for the egress interface not being up. Is this expected from your setup and the packets that you generate?
Otherwise, the allocations look good, looks like nothing escapes.

One thing that stands out for me on the profile is the huge overhead for "mmsghdr pack" in the ReadBatch. This is the temporary structure used to translate the convenient Message struct into the syscall arguments and back -- ideally, we'd "fix" this by creating a better ReadBatch API that avoids the translation overhead for the syscall, but that goes beyond the point here.
Note that (almost) the same conversion also happens for WriteBatch, but this barely shows up in the profile. The difference between the Read and Write calls is that the Read call always uses the full batch size while the Write calls only use the available packets.
So my interpretation of this is that we're transferring significantly fewer than batch size packets in the write calls. The only two explanations that I can come up with are: we are reading significantly fewer than BatchSize packets per ReadBatch, or we're dropping many packets due to lack of "processors". Can you tell which one it is?

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @rohrerj)

router/dataplane.go line 710 at r10 (raw file):

Previously, rohrerj wrote…

I think we should add some error handling here because conn.WriteBatch() could return "-1" for "written" and then we would panic when accessing writeMsgs[:-1].
How should we handle that case? Should we drop all packets or just the first one and try the other packets again?

Indeed! I'd stick with just dropping the erroring packet. Handling this more specifically would require the inspect the error and figure out if it's a temporary, recoverable condition or permanent. Unfortunately this is much harder than it sounds.

I'd just add something like if written < 0 { written = 0 } // WriteBatch returns -1 on error, we just consider this as 0 packets written.

router/metrics.go line 61 at r11 (raw file):

			},
			[]string{"interface", "isd_as", "neighbor_isd_as"},
		),

Packets dropped due to processing errors (previously only these were counted) will nee to be looked at quite differently from packets dropped due to overload. With the current setup of the metrics, this is not very nicely separable.

Perhaps it would be nicer to keep only router_dropped_pkts_total counter but use a reason label to split it up. This would be somewhat consistent with the way this is presented in the gateway (gateway_frames_discarded_total). A consumer of the metrics has a clear way to filter which "reasons" it wants to consider for an analysis.
For the implementation, this would mean: add a "reason" label to the creation of DroppedPacketsTotal below. In the forwardingMetrics you'd keep separate counters, just initialize them from the same CounterVec by using different values for the reason label.

rohrerj

The SCMPs were not intended. Does this make a huge difference for the performance evaluation?

When I look at the raw evaluation data for e.g. WSL (100bytes payload) I see an average packet drop due to busy processors of 500pps and due to busy forwarders of 36'000pps. Unfortunately I don't have the data to reason about the actual transferred batch size.

Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @matzf)

router/metrics.go line 61 at r11 (raw file):

Previously, matzf (Matthias Frei) wrote…

Packets dropped due to processing errors (previously only these were counted) will nee to be looked at quite differently from packets dropped due to overload. With the current setup of the metrics, this is not very nicely separable.

Perhaps it would be nicer to keep only router_dropped_pkts_total counter but use a reason label to split it up. This would be somewhat consistent with the way this is presented in the gateway (gateway_frames_discarded_total). A consumer of the metrics has a clear way to filter which "reasons" it wants to consider for an analysis.
For the implementation, this would mean: add a "reason" label to the creation of DroppedPacketsTotal below. In the forwardingMetrics you'd keep separate counters, just initialize them from the same CounterVec by using different values for the reason label.

Done.

matzf

The SCMPs were not intended. Does this make a huge difference for the performance evaluation?

I haven't the foggiest. Benchmarking things correctly is very hard. An evaluation that did unexpected things might or might not be reliable. 😉

I see an average packet drop [...] due to busy forwarders of 36'000pps

Together with my suspicion that the write batch size is unexpectedly low, this seems a bit worrying. It just seems like something might not quite be working as we intend it to. It would be great if you could investigate this a bit.

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @rohrerj)

router/dataplane.go line 2096 at r12 (raw file):

	}
	labels["reason"] = "any"
	c.DroppedPacketsTotal = metrics.DroppedPacketsTotal.With(labels)

The "reasons" should be non-overlapping, so that a consumer can take the sum to get the total number of drops.
Instead of "any", use e.g. "invalid". Perhaps rename the DroppedPacketsTotal to something more specific, and adapt the use of the counter accordingly.

router/dataplane.go line 2097 at r12 (raw file):

	labels["reason"] = "any"
	c.DroppedPacketsTotal = metrics.DroppedPacketsTotal.With(labels)
	labels["reason"] = "busy processor"

Conventionally the label values are snake_case (see e.g. the result values defined in pkg/private/prom/prom.go)

rohrerj

I investigated it on my local WSL. The code seems to be correct. Based on my investigation I think the packet drop is because of bursty traffic and the go scheduler. It seems like the forwarder goroutine was not selected enough often by the go scheduler. I was able to mitigate the problem by decreasing the batch size and increasing the forwarder queue size.

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @matzf)

router/dataplane.go line 2096 at r12 (raw file):

Previously, matzf (Matthias Frei) wrote…

The "reasons" should be non-overlapping, so that a consumer can take the sum to get the total number of drops.
Instead of "any", use e.g. "invalid". Perhaps rename the DroppedPacketsTotal to something more specific, and adapt the use of the counter accordingly.

I changed the implementation to use "invalid", "busy_processor" and "busy_forwarder" as reasons. All of them do not overlapp with each other.

router/dataplane.go line 2097 at r12 (raw file):

Previously, matzf (Matthias Frei) wrote…

Conventionally the label values are snake_case (see e.g. the result values defined in pkg/private/prom/prom.go)

Done.

rohrerj

Reviewable status: 2 of 7 files reviewed, 2 unresolved discussions (waiting on @matzf)

matzf

Ok, sure, with the different goroutines the scheduler becomes more important, that's the deal that we're making with this setup.
Perhaps this is good enough for now, it seems to be a bit better than before and it's not very complicated, so let's just take this.

It would be nice to have somewhat more reliable benchmark setup and results though, so we can also use the information to attempt tracking and tuning this a bit. Perhaps we can look into this as a follow up.

Reviewed 5 of 5 files at r13, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @rohrerj)

router/dataplane.go line 682 at r13 (raw file):

		}
		if result.OutPkt == nil { // e.g. BFD case no message is forwarded
			metrics.ProcessedPackets.Inc()

Shouldn't we count a packet as processed even if there was an error?

rohrerj

Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @matzf)

router/dataplane.go line 682 at r13 (raw file):

Previously, matzf (Matthias Frei) wrote…

Shouldn't we count a packet as processed even if there was an error?

Right. When I implemented that I thought about successfully processed packets but since we already have the invalid metrics it makes more sense to just count all.

matzf

If there are no further comments here or in the chat, I'll merge this tomorrow.

Reviewed 1 of 1 files at r14, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @rohrerj)

Implement configuration for the border router processing structure added in #4351. This is implementation part two of three described in the design document (#4339, doc/dev/design/BorderRouter.rst).

Add a low-priority slow path for special case packet handling, in particular SCMP errors and traceroutes, for the the asynchronous packet processing pipeline in the router added in #4351. This is implementation part three of three described in the design document (#4339, doc/dev/design/BorderRouter.rst). Closes #4334

Add a low-priority slow path for special case packet handling, in particular SCMP errors and traceroutes, for the the asynchronous packet processing pipeline in the router added in scionproto#4351. This is implementation part three of three described in the design document (scionproto#4339, doc/dev/design/BorderRouter.rst). Closes scionproto#4334

Border router restructure

8032c1d

Restructure the border router into receiver, processing routines and forwarders.

rohrerj requested a review from matzf as a code owner June 6, 2023 16:03

rohrerj added 2 commits June 6, 2023 21:41

bugfix

2451df6

copy scmp packets back to our buffers

246a01f

fixed wrong packet length after copying scmp packet

a451e9b

matzf suggested changes Jun 8, 2023

View reviewed changes

rohrerj added 3 commits June 11, 2023 20:47

improved code based on PR comments

c65b60d

fix for PR

b7a503b

fixed linter

1bc5b6b

rohrerj commented Jun 11, 2023

View reviewed changes

matzf suggested changes Jun 12, 2023

View reviewed changes

resolve PR comments

9fa9161

rohrerj commented Jun 12, 2023

View reviewed changes

matzf suggested changes Jun 15, 2023

View reviewed changes

resolve PR comments

10e4480

rohrerj commented Jun 15, 2023

View reviewed changes

matzf suggested changes Jun 16, 2023

View reviewed changes

resolved PR comments

940e6c1

rohrerj commented Jun 16, 2023

View reviewed changes

modified unit tests

ada294f

matzf suggested changes Jun 19, 2023

View reviewed changes

rohrerj commented Jun 19, 2023

View reviewed changes

empty commit to rerun build

8bb23d0

matzf suggested changes Jun 19, 2023

View reviewed changes

resolve PR comments

9397a10

rohrerj commented Jun 19, 2023

View reviewed changes

change readUpTo function

b3e22f8

rohrerj commented Jun 19, 2023

View reviewed changes

matzf approved these changes Jun 23, 2023

View reviewed changes

rename variables

f76dbcf

rohrerj commented Jun 23, 2023

View reviewed changes

additional metrics

2b05aea

rohrerj commented Jul 1, 2023

View reviewed changes

empty commit to rerun build

cdd7865

matzf suggested changes Jul 3, 2023

View reviewed changes

PR comments

e7ddb41

rohrerj commented Jul 3, 2023

View reviewed changes

matzf suggested changes Jul 7, 2023

View reviewed changes

rohrerj added 2 commits July 7, 2023 14:46

rename metrics

7ae5251

add missing metrics case

54bcf8b

rohrerj commented Jul 7, 2023

View reviewed changes

rohrerj added 2 commits July 12, 2023 14:34

Merge

319580c

rerun tests

baae33c

rohrerj commented Jul 12, 2023

View reviewed changes

matzf approved these changes Jul 13, 2023

View reviewed changes

change to processed packets metrics

5eca797

rohrerj commented Jul 13, 2023

View reviewed changes

matzf approved these changes Jul 17, 2023

View reviewed changes

matzf and others added 2 commits July 18, 2023 10:49

Merge branch 'master' into br-restructure-basic

7d31f0a

Merge branch 'master' into br-restructure-basic

b25c340

matzf merged commit ab37c7a into scionproto:master Jul 18, 2023

JordiSubira mentioned this pull request Jul 31, 2024

[router] Batch writing error handling #4589

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Border router restructure #4351

Border router restructure #4351

rohrerj commented Jun 6, 2023 •

edited

Loading

matzf commented Jun 7, 2023

matzf left a comment

rohrerj left a comment

matzf left a comment

rohrerj left a comment

matzf left a comment

rohrerj left a comment

matzf left a comment

rohrerj left a comment

matzf left a comment

rohrerj left a comment

matzf left a comment

rohrerj left a comment

rohrerj left a comment

matzf left a comment

matzf commented Jun 23, 2023

rohrerj left a comment

rohrerj commented Jul 1, 2023

rohrerj left a comment

matzf left a comment

rohrerj left a comment

matzf left a comment

rohrerj left a comment

rohrerj left a comment

matzf left a comment

rohrerj left a comment

matzf left a comment

Border router restructure #4351

Border router restructure #4351

Conversation

rohrerj commented Jun 6, 2023 • edited Loading

matzf commented Jun 7, 2023

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

matzf commented Jun 23, 2023

rohrerj left a comment

Choose a reason for hiding this comment

rohrerj commented Jul 1, 2023

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj left a comment

Choose a reason for hiding this comment

matzf left a comment

Choose a reason for hiding this comment

rohrerj commented Jun 6, 2023 •

edited

Loading