Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Placer fails with long CARRY4 chains #34

Open
kazkojima opened this issue Mar 1, 2022 · 4 comments
Open

Placer fails with long CARRY4 chains #34

kazkojima opened this issue Mar 1, 2022 · 4 comments

Comments

@kazkojima
Copy link
Contributor

Hi,

A test case which is a 256-bit version of the carry stress test in SymbiFlow/f4pga-arch-defs

module top (
    input clk,
    output out
);
    wire [256-1:0] a;
    wire [256-1:0] b;
    wire [256-1:0] c;

    LFSR #(.POLY(5)) a_src(
        .clk(clk),
        .out(a)
        );

    LFSR #(.POLY(9)) b_src(
        .clk(clk),
        .out(b)
        );

    assign carry = 1;
    assign c = a + b + carry;
    assign out = c[256-1];
endmodule

module LFSR (
    input clk,
    output [256-1:0] out
    );
    parameter POLY = 1;

    reg [256-1:0] r = 1;
    assign out = r;
    wire f;
    assign f = ^(POLY & ~r);
    always @( posedge clk)
      r <= {r[256-2:0], ~f};
endmodule

which causes a hang up of HeAP placer. SA placer fails with

ERROR: failed to place chain starting at cell '$auto$alumacc.cc:485:replace_alu$1627.genblk1.slice[0].genblk1.carry4$split$muxcy0$PACKED_CARRY4$'
ERROR: Placing design failed.

It seems that all xc7 have no CARRY4 at grid_y which is a multiple of 26, as far as I've checked prjxray/database tilegrid.json files. gdb shows that that hang up happens in placer_heap.cc:legalise_placement_strict() looking for vertically contiguous chain of length > 25.
The patch below works for me, though I can test it for one kintex-7 target only.

diff --git a/xilinx/pack_carry_xc7.cc b/xilinx/pack_carry_xc7.cc
index 85faa39d..a56f8407 100644
--- a/xilinx/pack_carry_xc7.cc
+++ b/xilinx/pack_carry_xc7.cc
@@ -240,7 +240,8 @@ void XC7Packer::pack_carries()
                 c4->constr_parent = root;
                 root->constr_children.push_back(c4);
                 c4->constr_x = 0;
-                c4->constr_y = -i / 4;
+		// Looks no CARRY4 on the tile of which grid_y is a multiple of 26. Skip them
+                c4->constr_y = -(i / 4 + i / (4*25));
                 c4->constr_abs_z = true;
                 c4->constr_z = BEL_CARRY4;
             }
@@ -345,7 +346,7 @@ void XC7Packer::pack_carries()
                 root->constr_children.push_back(s_lut);
                 s_lut->constr_parent = root;
                 s_lut->constr_x = 0;
-                s_lut->constr_y = -i / 4;
+                s_lut->constr_y = -(i / 4 + i / (4*25));
                 s_lut->constr_abs_z = true;
                 s_lut->constr_z = (z << 4 | BEL_6LUT);
             }
@@ -353,7 +354,7 @@ void XC7Packer::pack_carries()
                 root->constr_children.push_back(di_lut);
                 di_lut->constr_parent = root;
                 di_lut->constr_x = 0;
-                di_lut->constr_y = -i / 4;
+                di_lut->constr_y = -(i / 4 + i / (4*25));
                 di_lut->constr_abs_z = true;
                 di_lut->constr_z = (z << 4 | BEL_5LUT);
             }
@programmerjake
Copy link

We at Libre-SOC encountered this bug before, when trying to build microwatt, a workaround is to tell yosys to not generate CARRY4 at all with synth_xilinx -nocarry.

cc @toshywoshy

@kazkojima
Copy link
Contributor Author

Thanks. That would certainly be a workaround.
It would disable fast carry chains, which is a bit damaging for long bit-length additions, though. I ran into this issue with an x25519 calculation and -nocarry reduced the max clock freq. by ~1/3.

@programmerjake
Copy link

programmerjake commented Mar 11, 2022

Luke (the person at Libre-SOC who originally encountered this bug) thinks this bug is best fixed fixed by making yosys split long carry4 chains. imho vpr and nextpnr-xilinx are where that would be handled since they know the details of how many carry4 blocks are available and should be able to insert the necessary routing between them. imho yosys is the wrong place to try to fix that.

additional context: https://libre-soc.org/irclog/%23libre-soc.2022-03-11.log.html#t2022-03-11T18:23:08

Contents of email Luke (lkcl) asked me to post
<lkcl> marzoul: fyi the same bug in symbiflow related to CARRY4, which
i discussed with acomodi when this was the #symbiflow channel, is also
present in nextpnr-xilinx
<lkcl> gatecat ^
<lkcl> it does not show up when using RISC-V 32-bit cores because
those never create carry-chains of more than 23-25 CARRY4 blocks
<lkcl> only if you attempt a 64-bit core using e.g. a divide unit with
128-bit remainder/quotients e.g. in Power ISA 64-bit do you run smack
into this problem
<lkcl> toshywoshy, as discussed yesterday ^
<lkcl> the symptoms for symbiflow are that it bombs out with an error
<lkcl> the symptoms for nextpnr-xilinx are that it completely locks up
and fails to complete, going into an infinite loop
<lkcl> as long as you stay below about 4x23-or-so (95) bit add,
subtract or compare, you're "fine"
<lkcl> the "workaround" is to use "-nocarry" to yosys "synth_xilinx"
<lkcl>     -nocarry
<lkcl>         do not use XORCY/MUXCY/CARRY4 cells in output netlist
<lkcl> the proper solution is to fix the synth_xilinx techmap so that
it splits anything above 95-or-thereabouts bits into separate
adds/subs/cmps with an explicit carry-signal
<lkcl> x[128], y[128]
<lkcl> add(x, y)
<lkcl> -->
<lkcl> z[0:97] = add(x[0:96], y[0:96])
<lkcl> z[96:128] = add(z[96], x[96:128], y[96:128])
<lkcl> you get the general idea i'm sure

@kazkojima
Copy link
Contributor Author

Ok if yosys can solve the problem during synthesis, though it seems more natural to resolve that during placement. I found that the split_carry_chain() in nextpnr/ecp5/pack.cc splits the carry chain into multiple legal chains based on the chip information. There appears to be a quite similar function in nextpnr/ice40/chains.cc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants