Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add a built-in pipeline abstraction #686

Closed
lachlansneff opened this issue Mar 29, 2022 · 12 comments
Closed

[RFC] Add a built-in pipeline abstraction #686

lachlansneff opened this issue Mar 29, 2022 · 12 comments
Labels

Comments

@lachlansneff
Copy link

I think it'd be beneficial to have a built-in pipeline abstraction to make pipelining more concise and less likely to have copy-paste errors. My RFC is based on Silice's pipeline feature.

Proposal

I propose that two methods be added to Module:

def Pipeline(self, stb, name="pipeline", domain="sync"):
    ...
def Stage(self, name=None):
    ...

These would be to create pipelines like this:

class MulAdd(Elaboratable):
    def __init__(self, width):
        self.a = Signal(width)
        self.b = Signal(width)
        self.c = Signal(width)
        self.stb = Signal()
        
        self.o = Signal(width * 2 + 1)
        self.o_stb = Signal()
        self.width = width

    def elaborate(self, platform):
        m = Module()

        with m.Pipeline(self.stb) as pln:
            with m.Stage():
                pln.mul = self.a * self.b
            with m.Stage("ADD_ONE"):
                pln.added_one = pln.mul + 1
            with m.Stage("ADD"):
                m.d.sync += self.o.eq(pln.added_one + self.c)
            
            m.d.comb += self.o_stb.eq(pln.o_stb)

        return m

When the stb argument to m.Pipeline() is strobed, the pipeline starts on that clock cycle and progresses through the stages. Of course, multiple stages can be running at any given time.

For data to get pipelined through each stage, you set the pipeline. property with the value you want propagated forward through the stages. You can access any value you've set in a previous stage in any future stage. The signal is set within the clock domain that the pipeline was instantiated with.

The pipeline.o_stb property is a signal that is strobed a cycle after the last stage executes (I think that's the right behavior, let me know if not).

A stage can be made invalid by calling the pipeline.invalid_stage() method. This is, of course, propagated forward through the stages.

Alternatives

  1. No pipeline abstraction is added.
  2. The pipeline.<signal name> support is changed to something more similar to the current way of synchronously setting signals.
  3. Signals are automatically pipelined if set within the pipeline. This was my original thought (and is more similar to how Silice does it), but it turned out to be very difficult to implement.
@alyssarosenzweig
Copy link
Contributor

The getattr, setattr syntax seems magical. I don't know that this is a bad thing explicitly, but it seems to me that it departs from established Amaranth conventions (explicit Signal construction, explicit .eq)

@Lunaphied
Copy link
Contributor

Lunaphied commented Mar 29, 2022

It does have one comparable instance which is the "magic" way that FSMs work where m.next is set to the next state of the FSM to move to when an edge occurs on the FSMs clock domain (which is, similarly to this design of pipelines, established when the FSM context is created). I've always found that syntax to feel a bit odd and magical too but I think both the FSM and the Pipeline case might have a similar difficulty; both have a link to a clock domain where the logic will operate but it is hidden inside the abstraction to some extent.

I would be in favor of finding a way to solve both the issue of the "magic" and the desire to not have multiple references to the domain the pipeline operates in.

Before design is considered at all though, I think it's worth thinking about if this is an abstraction that Amaranth should have at all, and how well it fits. What use case does this address and how well does it do so? I get the feeling that a lot of the use cases for pipelines either overlap with the existing desire for streams as discussed in #317 or wouldn't likely be served well by a one-size-fits-all approach (I have in mind here processors).

@lachlansneff
Copy link
Author

Setting pipeline signals could also be done like this:

pln.mul.eq(a + b)

Allowing explicit signal creation would add even more magic since the pipeline would need to rewrite assignments to point to pipelined versions of the signals.

@whitequark
Copy link
Member

It does have one comparable instance which is the "magic" way that FSMs work where m.next is set to the next state of the FSM to move to when an edge occurs on the FSMs clock domain (which is, similarly to this design of pipelines, established when the FSM context is created). I've always found that syntax to feel a bit odd and magical too but I think both the FSM and the Pipeline case might have a similar difficulty; both have a link to a clock domain where the logic will operate but it is hidden inside the abstraction to some extent.

The m.next syntax is something I'd like to eliminate, probably as a part of more general FSM improvements. So I don't think it should be referred to as a reason to add more similar syntax.

FSMs should, probably, use something like with m.FSM() as fsm:..m.d.sync += fsm.next("FOO") rather than FSM(domain="...")..m.next = "FOO".

I think it's worth thinking about if this is an abstraction that Amaranth should have at all, and how well it fits. What use case does this address and how well does it do so?

I agree.

@lachlansneff
Copy link
Author

The FFT butterfly implementation from https://github.com/lambdaconcept/mfcc/blob/master/mfcc/misc/fft.py#L98-L192, which is over 100 lines long, becomes around 40 lines when the pipeline abstraction is used:

with m.Pipeline(self.i.stb) as pln:
    with m.Stage("LATCH"):
        pln.x0 = self.i.x0
        pln.x1 = self.i.x1
        pln.tw = self.i.tw
    with m.Stage():
        # Re(x₁) + Im(x₁)
        pln.add0 = pln.x1.real + pln.x1.imag
    with m.Stage():
        # Re(x₁)Re(ω) + Im(x₁)Re(ω)
        pln.mul0 = pln.add0 * pln.tw.real
    with m.Stage():
        pln.mul0 = pln.mul0 + bias
        # Re(ω) + Im(ω)
        pln.add1 = pln.tw.real + pln.tw.imag
        # Re(ω) - Im(ω)
        pln.sub0 = pln.tw.real - pln.tw.imag
    with m.Stage():
        # Im(x₁)Re(ω) + Im(x₁)Im(ω)
        pln.mul1 = pln.x1.imag * pln.add1
        # Re(x₁)Re(ω) - Re(x₁)Im(ω)
        pln.mul2 = pln.x1.real * pln.sub0
    with m.Stage():
        # Re(x₁)Re(ω) - Im(x₁)Im(ω)
        pln.sub1 = pln.mul0 - pln.mul1
        # Im(x₁)Re(ω) + Re(x₁)Im(ω)
        pln.sub2 = pln.mul0 - pln.mul2
    with m.Stage():
        m.d.sync += [
            # Re(y₀) = Re(x₀) +  Re(x₁)Re(ω) - Im(x₁)Im(ω)
            self.o.y0.real.eq((pln.x0.real + pln.sub1[self.bias_width:])[self.scale_bit:]),
            # Im(y₀) = Im(x₀) +  Im(x₁)Re(ω) + Re(x₁)Im(ω)
            self.o.y0.imag.eq((pln.x0.imag + pln.sub2[self.bias_width:])[self.scale_bit:]),
            # Re(y₁) = Re(x₀) - (Re(x₁)Re(ω) - Im(x₁)Im(ω))
            self.o.y1.real.eq((pln.x0.real - pln.sub1[self.bias_width:])[self.scale_bit:]),
            # Im(y₁) = Im(x₀) - (Im(x₁)Re(ω) + Re(x₁)Im(ω))
            self.o.y1.imag.eq((pln.x0.imag - pln.sub2[self.bias_width:])[self.scale_bit:]),
        ]

    m.d.comb += self.o.stb.eq(pln.o_stb)

@Lunaphied
Copy link
Contributor

The m.next syntax is something I'd like to eliminate, probably as a part of more general FSM improvements. So I don't think it should be referred to as a reason to add more similar syntax.

FSMs should, probably, use something like with m.FSM() as fsm:..m.d.sync += fsm.next("FOO") rather than FSM(domain="...")..m.next = "FOO".

This is welcome news, I was under the impression that the FSM interface was somewhat "fixed". The suggested change would also mostly handle the concern I had about domains nicely. It does have a slight downside of making it a little bit more challenging to move a FSM to a different domain at a later point. This is an entirely different issue though, I agree that less magic is better in any case.

The FFT butterfly implementation from https://github.com/lambdaconcept/mfcc/blob/master/mfcc/misc/fft.py#L98-L192, which is over 100 lines long, becomes around 40 lines when the pipeline abstraction is used:

This implementation could be a lot smaller if it made more use of code generation to eliminate the duplicate structures.

@lachlansneff
Copy link
Author

I couldn't find an example of the stream abstraction mentioned earlier would actually look like. Is there some code or example available?

@whitequark
Copy link
Member

Is there some code or example available?

Take a look at LUNA; beyond that, not currently.

@lachlansneff
Copy link
Author

Perhaps I should move this abstraction to a separate package that can be layered on top of Amaranth—also to test out my ideas with adding automated FSM construction to support things like while loops, jumps, etc.

@Lunaphied
Copy link
Contributor

That sounds like the sort of higher level language constructed on top of Amaranth that I would expect you would want in the first place. It is also a good place for you to define your own machinery to generate pipelines that understand the specific constraints of what is being solved.

You could easily expand a concept like that to allow automatic extraction of a pipeline out of a set of computations.

@lekcyjna123
Copy link

Perhaps I should move this abstraction to a separate package that can be layered on top of Amaranth

If you create your own package please share a link. We are currently starting implementation of our own Risc-V OoO processor and I think that such abstraction will be very useful in construction of subunits (for example: functional units or in branch predictors).

@lachlansneff
Copy link
Author

When I make the package, I'll make sure to update you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

5 participants