[RFC] Add a built-in pipeline abstraction #686

lachlansneff · 2022-03-29T16:51:58Z

I think it'd be beneficial to have a built-in pipeline abstraction to make pipelining more concise and less likely to have copy-paste errors. My RFC is based on Silice's pipeline feature.

Proposal

I propose that two methods be added to Module:

def Pipeline(self, stb, name="pipeline", domain="sync"):
    ...
def Stage(self, name=None):
    ...

These would be to create pipelines like this:

class MulAdd(Elaboratable):
    def __init__(self, width):
        self.a = Signal(width)
        self.b = Signal(width)
        self.c = Signal(width)
        self.stb = Signal()
        
        self.o = Signal(width * 2 + 1)
        self.o_stb = Signal()
        self.width = width

    def elaborate(self, platform):
        m = Module()

        with m.Pipeline(self.stb) as pln:
            with m.Stage():
                pln.mul = self.a * self.b
            with m.Stage("ADD_ONE"):
                pln.added_one = pln.mul + 1
            with m.Stage("ADD"):
                m.d.sync += self.o.eq(pln.added_one + self.c)
            
            m.d.comb += self.o_stb.eq(pln.o_stb)

        return m

When the stb argument to m.Pipeline() is strobed, the pipeline starts on that clock cycle and progresses through the stages. Of course, multiple stages can be running at any given time.

For data to get pipelined through each stage, you set the pipeline. property with the value you want propagated forward through the stages. You can access any value you've set in a previous stage in any future stage. The signal is set within the clock domain that the pipeline was instantiated with.

The pipeline.o_stb property is a signal that is strobed a cycle after the last stage executes (I think that's the right behavior, let me know if not).

A stage can be made invalid by calling the pipeline.invalid_stage() method. This is, of course, propagated forward through the stages.

Alternatives

No pipeline abstraction is added.
The pipeline.<signal name> support is changed to something more similar to the current way of synchronously setting signals.
Signals are automatically pipelined if set within the pipeline. This was my original thought (and is more similar to how Silice does it), but it turned out to be very difficult to implement.

The text was updated successfully, but these errors were encountered:

alyssarosenzweig · 2022-03-29T17:08:37Z

The getattr, setattr syntax seems magical. I don't know that this is a bad thing explicitly, but it seems to me that it departs from established Amaranth conventions (explicit Signal construction, explicit .eq)

Lunaphied · 2022-03-29T17:16:38Z

It does have one comparable instance which is the "magic" way that FSMs work where m.next is set to the next state of the FSM to move to when an edge occurs on the FSMs clock domain (which is, similarly to this design of pipelines, established when the FSM context is created). I've always found that syntax to feel a bit odd and magical too but I think both the FSM and the Pipeline case might have a similar difficulty; both have a link to a clock domain where the logic will operate but it is hidden inside the abstraction to some extent.

I would be in favor of finding a way to solve both the issue of the "magic" and the desire to not have multiple references to the domain the pipeline operates in.

Before design is considered at all though, I think it's worth thinking about if this is an abstraction that Amaranth should have at all, and how well it fits. What use case does this address and how well does it do so? I get the feeling that a lot of the use cases for pipelines either overlap with the existing desire for streams as discussed in #317 or wouldn't likely be served well by a one-size-fits-all approach (I have in mind here processors).

lachlansneff · 2022-03-29T17:40:44Z

Setting pipeline signals could also be done like this:

pln.mul.eq(a + b)

Allowing explicit signal creation would add even more magic since the pipeline would need to rewrite assignments to point to pipelined versions of the signals.

whitequark · 2022-03-29T17:44:17Z

It does have one comparable instance which is the "magic" way that FSMs work where m.next is set to the next state of the FSM to move to when an edge occurs on the FSMs clock domain (which is, similarly to this design of pipelines, established when the FSM context is created). I've always found that syntax to feel a bit odd and magical too but I think both the FSM and the Pipeline case might have a similar difficulty; both have a link to a clock domain where the logic will operate but it is hidden inside the abstraction to some extent.

The m.next syntax is something I'd like to eliminate, probably as a part of more general FSM improvements. So I don't think it should be referred to as a reason to add more similar syntax.

FSMs should, probably, use something like with m.FSM() as fsm:..m.d.sync += fsm.next("FOO") rather than FSM(domain="...")..m.next = "FOO".

I think it's worth thinking about if this is an abstraction that Amaranth should have at all, and how well it fits. What use case does this address and how well does it do so?

I agree.

lachlansneff · 2022-03-29T17:59:49Z

The FFT butterfly implementation from https://github.com/lambdaconcept/mfcc/blob/master/mfcc/misc/fft.py#L98-L192, which is over 100 lines long, becomes around 40 lines when the pipeline abstraction is used:

with m.Pipeline(self.i.stb) as pln:
    with m.Stage("LATCH"):
        pln.x0 = self.i.x0
        pln.x1 = self.i.x1
        pln.tw = self.i.tw
    with m.Stage():
        # Re(x₁) + Im(x₁)
        pln.add0 = pln.x1.real + pln.x1.imag
    with m.Stage():
        # Re(x₁)Re(ω) + Im(x₁)Re(ω)
        pln.mul0 = pln.add0 * pln.tw.real
    with m.Stage():
        pln.mul0 = pln.mul0 + bias
        # Re(ω) + Im(ω)
        pln.add1 = pln.tw.real + pln.tw.imag
        # Re(ω) - Im(ω)
        pln.sub0 = pln.tw.real - pln.tw.imag
    with m.Stage():
        # Im(x₁)Re(ω) + Im(x₁)Im(ω)
        pln.mul1 = pln.x1.imag * pln.add1
        # Re(x₁)Re(ω) - Re(x₁)Im(ω)
        pln.mul2 = pln.x1.real * pln.sub0
    with m.Stage():
        # Re(x₁)Re(ω) - Im(x₁)Im(ω)
        pln.sub1 = pln.mul0 - pln.mul1
        # Im(x₁)Re(ω) + Re(x₁)Im(ω)
        pln.sub2 = pln.mul0 - pln.mul2
    with m.Stage():
        m.d.sync += [
            # Re(y₀) = Re(x₀) +  Re(x₁)Re(ω) - Im(x₁)Im(ω)
            self.o.y0.real.eq((pln.x0.real + pln.sub1[self.bias_width:])[self.scale_bit:]),
            # Im(y₀) = Im(x₀) +  Im(x₁)Re(ω) + Re(x₁)Im(ω)
            self.o.y0.imag.eq((pln.x0.imag + pln.sub2[self.bias_width:])[self.scale_bit:]),
            # Re(y₁) = Re(x₀) - (Re(x₁)Re(ω) - Im(x₁)Im(ω))
            self.o.y1.real.eq((pln.x0.real - pln.sub1[self.bias_width:])[self.scale_bit:]),
            # Im(y₁) = Im(x₀) - (Im(x₁)Re(ω) + Re(x₁)Im(ω))
            self.o.y1.imag.eq((pln.x0.imag - pln.sub2[self.bias_width:])[self.scale_bit:]),
        ]

    m.d.comb += self.o.stb.eq(pln.o_stb)

Lunaphied · 2022-03-29T18:09:09Z

The m.next syntax is something I'd like to eliminate, probably as a part of more general FSM improvements. So I don't think it should be referred to as a reason to add more similar syntax.

FSMs should, probably, use something like with m.FSM() as fsm:..m.d.sync += fsm.next("FOO") rather than FSM(domain="...")..m.next = "FOO".

This is welcome news, I was under the impression that the FSM interface was somewhat "fixed". The suggested change would also mostly handle the concern I had about domains nicely. It does have a slight downside of making it a little bit more challenging to move a FSM to a different domain at a later point. This is an entirely different issue though, I agree that less magic is better in any case.

The FFT butterfly implementation from https://github.com/lambdaconcept/mfcc/blob/master/mfcc/misc/fft.py#L98-L192, which is over 100 lines long, becomes around 40 lines when the pipeline abstraction is used:

This implementation could be a lot smaller if it made more use of code generation to eliminate the duplicate structures.

lachlansneff · 2022-03-29T18:16:02Z

I couldn't find an example of the stream abstraction mentioned earlier would actually look like. Is there some code or example available?

whitequark · 2022-03-29T18:19:38Z

Is there some code or example available?

Take a look at LUNA; beyond that, not currently.

lachlansneff · 2022-03-29T18:38:41Z

Perhaps I should move this abstraction to a separate package that can be layered on top of Amaranth—also to test out my ideas with adding automated FSM construction to support things like while loops, jumps, etc.

Lunaphied · 2022-03-29T18:46:13Z

That sounds like the sort of higher level language constructed on top of Amaranth that I would expect you would want in the first place. It is also a good place for you to define your own machinery to generate pipelines that understand the specific constraints of what is being solved.

You could easily expand a concept like that to allow automatic extraction of a pipeline out of a set of computations.

lekcyjna123 · 2022-04-03T15:24:15Z

Perhaps I should move this abstraction to a separate package that can be layered on top of Amaranth

If you create your own package please share a link. We are currently starting implementation of our own Risc-V OoO processor and I think that such abstraction will be very useful in construction of subunits (for example: functional units or in branch predictors).

lachlansneff · 2022-04-03T18:36:04Z

When I make the package, I'll make sure to update you.

whitequark added the design label Apr 5, 2022

lachlansneff closed this as completed Apr 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Add a built-in pipeline abstraction #686

[RFC] Add a built-in pipeline abstraction #686

lachlansneff commented Mar 29, 2022

alyssarosenzweig commented Mar 29, 2022

Lunaphied commented Mar 29, 2022 •

edited

Loading

lachlansneff commented Mar 29, 2022

whitequark commented Mar 29, 2022

lachlansneff commented Mar 29, 2022

Lunaphied commented Mar 29, 2022

lachlansneff commented Mar 29, 2022

whitequark commented Mar 29, 2022

lachlansneff commented Mar 29, 2022

Lunaphied commented Mar 29, 2022

lekcyjna123 commented Apr 3, 2022

lachlansneff commented Apr 3, 2022

[RFC] Add a built-in pipeline abstraction #686

[RFC] Add a built-in pipeline abstraction #686

Comments

lachlansneff commented Mar 29, 2022

Proposal

Alternatives

alyssarosenzweig commented Mar 29, 2022

Lunaphied commented Mar 29, 2022 • edited Loading

lachlansneff commented Mar 29, 2022

whitequark commented Mar 29, 2022

lachlansneff commented Mar 29, 2022

Lunaphied commented Mar 29, 2022

lachlansneff commented Mar 29, 2022

whitequark commented Mar 29, 2022

lachlansneff commented Mar 29, 2022

Lunaphied commented Mar 29, 2022

lekcyjna123 commented Apr 3, 2022

lachlansneff commented Apr 3, 2022

Lunaphied commented Mar 29, 2022 •

edited

Loading