🐛 fix minor regression bug; minor RTL optimizations (#998)
stnolting authored Aug 31, 2024
2 parents 0377316 + 2d9896b commit e3136f4
Showing 8 changed files with 115 additions and 110 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 30.08.2024 | 1.10.2.9 | :bug: fix PC reset bug (introduced in v1.10.2.8); minor RTL optimizations (size and critical path) | [#998](https://github.com/stnolting/neorv32/pull/998) |
| 25.08.2024 | 1.10.2.8 | :warning: remove user-mode HPM counters; add individual `mcounteren` bits (`CY` and `IR`); rework Vivado IP module; minor RTL cleanups and optimization | [#996](https://github.com/stnolting/neorv32/pull/996) |
| 16.08.2024 | 1.10.2.7 | minor CPU area and critical path optimizations; minor code cleanups | [#990](https://github.com/stnolting/neorv32/pull/990) |
| 09.08.2024 | 1.10.2.6 | :warning: re-organize RTL files; all core files are now located in `rtl/core`; remove `mem` sub-folder | [#985](https://github.com/stnolting/neorv32/pull/985) |
20 changes: 13 additions & 7 deletions docs/datasheet/soc.adoc
@@ -298,11 +298,11 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
:sectnums:
=== Processor Clocking

The processor is implemented as fully-synchronous logic design using a single clock domain that is driven entirely by the
top's `clk_i` signal. This clock signal is used by all internal registers and memories, which trigger on the rising edge of
this clock signal - except for the <<_processor_reset>> and the clock switching gate that trigger on a falling edge.
External "clocks" like the OCD's JTAG clock or the SDI's serial clock are synchronized into the processor's clock domain
before being further processed.
The processor is implemented as a fully-synchronous logic design using a single clock domain that is driven entirely
by the top's `clk_i` signal. This clock signal is used by all internal registers and memories. All of them trigger
on the **rising edge** of this clock signal - the only exception is the default <<_clock_gating>> module. External
"clocks" like the OCD's JTAG clock or the SDI's serial clock are synchronized into the processor's clock domain
before being used as a "general logic signal" (and not as a dedicated clock).

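The paragraph above states that external "clocks" (such as the OCD's JTAG clock or the SDI's serial clock) are synchronized into the `clk_i` domain and then treated as ordinary logic signals. A minimal sketch of that pattern, assuming a classic two-flip-flop synchronizer followed by an edge detector; the entity `sync_edge_detect` and all of its port names are hypothetical and not taken from the NEORV32 sources:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: sample an external "clock" (e.g. JTAG TCK) in the
-- clk_i domain and turn each of its rising edges into a one-cycle pulse.
entity sync_edge_detect is
  port (
    clk_i   : in  std_ulogic; -- processor clock, rising edge
    rstn_i  : in  std_ulogic; -- async reset, low-active
    async_i : in  std_ulogic; -- external, asynchronous "clock" signal
    rise_o  : out std_ulogic  -- one clk_i cycle pulse per rising edge of async_i
  );
end entity;

architecture rtl of sync_edge_detect is
  -- sync(1 downto 0): two-stage synchronizer, sync(2): previous synchronized value
  signal sync : std_ulogic_vector(2 downto 0);
begin
  sample: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      sync <= (others => '0');
    elsif rising_edge(clk_i) then
      sync <= sync(1 downto 0) & async_i; -- async_i is only used as data, never as a clock
    end if;
  end process sample;

  -- rising edge of the synchronized signal
  rise_o <= sync(1) and (not sync(2));
end architecture;
```

The external signal only ever feeds the data input of a register clocked by `clk_i`, so nothing in this sketch is clocked by it and the design stays within the single `clk_i` domain.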
==== Clock Gating

@@ -371,8 +371,8 @@ The actual reset cause can be determined via the <<_watchdog_timer_wdt>>.

If any of these sources trigger a reset, the internal reset will be triggered for at least 4 clock cycles ensuring
a valid reset of the entire processor. The internal global reset is asserted _asynchronously_ if triggered by the external
`rstn_i` signal. For internal reset sources, the global reset is asserted _synchronously_. If the reset cause gets inactive
the internal reset is de-asserted _synchronously_ at a falling clock edge.
`rstn_i` signal. For internal reset sources, the global reset is asserted _synchronously_. If the reset cause is de-asserted,
the internal reset is de-asserted _synchronously_ at the next rising clock edge.

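The reset behavior described above (asynchronous assertion from `rstn_i`, synchronous assertion from internal sources, synchronous release on a rising `clk_i` edge, and a minimum pulse length of several cycles) can be sketched as a small shift register. This is an illustrative model under those assumptions, not the code from `neorv32_top.vhd`; the entity `reset_gen`, the generic `HOLD_CYCLES`, and the port names are hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch (not the NEORV32 implementation): global reset with
-- asynchronous assertion from an external pin, synchronous assertion from an
-- internal source, and synchronous de-assertion on the rising edge of clk_i.
entity reset_gen is
  generic (
    HOLD_CYCLES : integer range 2 to 32 := 4 -- minimum number of cycles the reset stays active
  );
  port (
    clk_i      : in  std_ulogic; -- clock, rising edge
    ext_rstn_i : in  std_ulogic; -- external reset pin, low-active, asynchronous
    int_rstn_i : in  std_ulogic; -- internal reset request (e.g. watchdog), low-active
    rstn_o     : out std_ulogic  -- global reset, low-active
  );
end entity;

architecture rtl of reset_gen is
  signal sreg : std_ulogic_vector(HOLD_CYCLES-1 downto 0);
begin
  reset_sreg: process(ext_rstn_i, clk_i)
  begin
    if (ext_rstn_i = '0') then -- asynchronous assertion
      sreg <= (others => '0');
    elsif rising_edge(clk_i) then
      if (int_rstn_i = '0') then -- synchronous assertion
        sreg <= (others => '0');
      else -- shift in ones; release after HOLD_CYCLES rising edges
        sreg <= sreg(HOLD_CYCLES-2 downto 0) & '1';
      end if;
    end if;
  end process reset_sreg;

  rstn_o <= sreg(HOLD_CYCLES-1);
end architecture;
```

Because ones are shifted in one position per rising edge, the output can only go high on a rising edge and only after `HOLD_CYCLES` consecutive edges without a reset request, so with `HOLD_CYCLES => 4` every reset lasts at least 4 clock cycles.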
Internally, **all registers** that are not meant for mapping to blockRAM (like the register file) do provide a dedicated and
low-active **asynchronous hardware reset**. This asynchronous reset ensures that the entire processor logic is reset to a
@@ -646,6 +646,12 @@ package file (`rtl/core/neorv32_package.vhd`).
constant base_io_dma_c : std_ulogic_vector(31 downto 0) := x"ffffed00";
----

.IO Access Latency
[IMPORTANT]
In order to shorten the critical path of the IO system, the IO switch contains a partial register stage that
buffers the address and strobe signals of the request bus. Hence, accesses to the processor-internal IO region
require an additional clock cycle to complete.


:sectnums:
==== Boot Configuration
2 changes: 1 addition & 1 deletion rtl/core/neorv32_cpu_alu.vhd
@@ -115,7 +115,7 @@ begin
addsub_res <= std_ulogic_vector(unsigned(opa_x) - unsigned(opb_x)) when (ctrl_i.alu_sub = '1') else
std_ulogic_vector(unsigned(opa_x) + unsigned(opb_x));

add_o <= addsub_res(XLEN-1 downto 0); -- direct output of adder result
add_o <= addsub_res(XLEN-1 downto 0); -- direct output


-- ALU Operation Select -------------------------------------------------------------------
101 changes: 31 additions & 70 deletions rtl/core/neorv32_cpu_control.vhd

Large diffs are not rendered by default.

42 changes: 25 additions & 17 deletions rtl/core/neorv32_cpu_cp_fpu.vhd
@@ -139,6 +139,7 @@ architecture neorv32_cpu_cp_fpu_rtl of neorv32_cpu_cp_fpu is
instr_addsub : std_ulogic;
instr_mul : std_ulogic;
funct : std_ulogic_vector(2 downto 0);
valid : std_ulogic;
end record;
signal cmd : cmd_t;
signal funct_ff : std_ulogic_vector(2 downto 0);
@@ -313,24 +314,31 @@ begin
-- Instruction Decoding -------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
-- one-hot re-encoding --
cmd.instr_class <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "11100") else '0';
cmd.instr_comp <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "10100") else '0';
cmd.instr_i2f <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "11010") else '0';
cmd.instr_f2i <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "11000") else '0';
cmd.instr_sgnj <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "00100") else '0';
cmd.instr_minmax <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "00101") else '0';
cmd.instr_addsub <= '1' when (ctrl_i.ir_funct12(11 downto 8) = "0000" ) else '0';
cmd.instr_mul <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "00010") else '0';
cmd.instr_class <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "11100") and (ctrl_i.ir_funct3 = "001") else '0'; -- FCLASS
cmd.instr_comp <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "10100") and (ctrl_i.ir_funct3(2) = '0') else '0'; -- FEQ/FLT/FLE
cmd.instr_i2f <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "11010") and (ctrl_i.ir_funct12(4 downto 1) = "0000") else '0'; -- FCVT
cmd.instr_f2i <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "11000") and (ctrl_i.ir_funct12(4 downto 1) = "0000") else '0'; -- FCVT
cmd.instr_sgnj <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "00100") and (ctrl_i.ir_funct3(2) = '0') else '0'; -- FSGNJ
cmd.instr_minmax <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "00101") and (ctrl_i.ir_funct3(2 downto 1) = "00") else '0'; -- FMIN/FMAX
cmd.instr_addsub <= '1' when (ctrl_i.ir_funct12(11 downto 8) = "0000") else '0'; -- FADD/FSUB
cmd.instr_mul <= '1' when (ctrl_i.ir_funct12(11 downto 7) = "00010") else '0'; -- FMUL

-- valid FPU operation? --
cmd.valid <= '1' when (ctrl_i.ir_funct12(6 downto 5) = "00") and -- single-precision format only
((cmd.instr_class = '1') or (cmd.instr_comp = '1') or
(cmd.instr_i2f = '1') or (cmd.instr_f2i = '1') or
(cmd.instr_sgnj = '1') or (cmd.instr_minmax = '1') or
(cmd.instr_addsub = '1') or (cmd.instr_mul = '1')) else '0';

-- binary re-encoding --
cmd.funct <= op_mul_c when (cmd.instr_mul = '1') else
op_addsub_c when (cmd.instr_addsub = '1') else
op_minmax_c when (cmd.instr_minmax = '1') else
op_sgnj_c when (cmd.instr_sgnj = '1') else
op_f2i_c when (cmd.instr_f2i = '1') else
op_i2f_c when (cmd.instr_i2f = '1') else
op_comp_c when (cmd.instr_comp = '1') else
op_class_c;--when (cmd.instr_class = '1') else (others => '-');
cmd.funct <= op_mul_c when (cmd.instr_mul = '1') else
op_addsub_c when (cmd.instr_addsub = '1') else
op_minmax_c when (cmd.instr_minmax = '1') else
op_sgnj_c when (cmd.instr_sgnj = '1') else
op_f2i_c when (cmd.instr_f2i = '1') else
op_i2f_c when (cmd.instr_i2f = '1') else
op_comp_c when (cmd.instr_comp = '1') else
op_class_c;


-- Input Operands: Check for subnormal numbers (flush to zero) ----------------------------
@@ -425,7 +433,7 @@ begin
fpu_operands.frm <= ctrl_i.ir_funct3(2 downto 0);
end if;
--
if (start_i = '1') then
if (start_i = '1') and (cmd.valid = '1') then
-- operand data --
fpu_operands.rs1 <= op_data(0);
fpu_operands.rs1_class <= op_class(0);
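The new `cmd.valid` signal only lets `start_i` load the operand registers for single-precision encodings that the FPU actually implements. The following behavioral model restates that check as a VHDL function so it can be exercised with concrete `funct12`/`funct3` values; the package `fpu_decode_model` and the function `fpu_op_valid` are hypothetical helpers, and the field comments assume the standard RISC-V OP-FP layout (`funct5 = instr[31:27]`, `fmt = instr[26:25]`, `rs2 = instr[24:20]`):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package (not part of NEORV32): behavioral model of the
-- "valid FPU operation?" decode added to neorv32_cpu_cp_fpu.vhd.
package fpu_decode_model is
  function fpu_op_valid(funct12 : std_ulogic_vector(11 downto 0);
                        funct3  : std_ulogic_vector(2 downto 0)) return boolean;
end package;

package body fpu_decode_model is
  function fpu_op_valid(funct12 : std_ulogic_vector(11 downto 0);
                        funct3  : std_ulogic_vector(2 downto 0)) return boolean is
    constant funct5 : std_ulogic_vector(4 downto 0) := funct12(11 downto 7); -- instr[31:27]
    constant fmt    : std_ulogic_vector(1 downto 0) := funct12(6 downto 5);  -- instr[26:25]
    constant rs2    : std_ulogic_vector(4 downto 0) := funct12(4 downto 0);  -- instr[24:20]
    variable op_ok  : boolean;
  begin
    case funct5 is
      when "11100"           => op_ok := (funct3 = "001");             -- FCLASS.S
      when "10100"           => op_ok := (funct3(2) = '0');            -- FEQ/FLT/FLE.S
      when "11010" | "11000" => op_ok := (rs2(4 downto 1) = "0000");   -- FCVT.S.W[U] / FCVT.W[U].S
      when "00100"           => op_ok := (funct3(2) = '0');            -- FSGNJ[N|X].S
      when "00101"           => op_ok := (funct3(2 downto 1) = "00");  -- FMIN/FMAX.S
      when "00000" | "00001" => op_ok := true;                         -- FADD/FSUB.S
      when "00010"           => op_ok := true;                         -- FMUL.S
      when others            => op_ok := false;                        -- anything else
    end case;
    return (fmt = "00") and op_ok; -- single-precision format only
  end function;
end package body;
```

Under this model, for example, `FDIV.S` (funct5 `00011`) and any non-single-precision `fmt` decode as invalid, as does `FMV.X.W` (same funct5 as `FCLASS.S`, but `funct3 = 000`).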
46 changes: 39 additions & 7 deletions rtl/core/neorv32_intercon.vhd
@@ -363,8 +363,9 @@ end neorv32_bus_gateway_rtl;
-- ================================================================================ --
-- NEORV32 SoC - Processor Bus Infrastructure: IO Switch --
-- -------------------------------------------------------------------------------- --
-- Simple switch for accessing one out of several (IO) devices. --
-- [Note] Enabled ports do not have to be contiguous. --
-- Simple switch for accessing one out of several (IO) devices. The main request --
-- input bus provides a partial register stage to relax timing. Thus, accesses --
-- require an additional clock cycle. --
-- -------------------------------------------------------------------------------- --
-- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 --
-- Copyright (c) NEORV32 contributors. --
@@ -382,7 +383,7 @@ use neorv32.neorv32_package.all;
entity neorv32_bus_io_switch is
generic (
DEV_SIZE : natural; -- size of a single IO device, has to be a power of two
-- device port enable and base address --
-- device port enable and base address; enabled ports do not have to be contiguous --
DEV_00_EN : boolean := false; DEV_00_BASE : std_ulogic_vector(31 downto 0) := (others => '-');
DEV_01_EN : boolean := false; DEV_01_BASE : std_ulogic_vector(31 downto 0) := (others => '-');
DEV_02_EN : boolean := false; DEV_02_BASE : std_ulogic_vector(31 downto 0) := (others => '-');
@@ -417,6 +418,9 @@ entity neorv32_bus_io_switch is
DEV_31_EN : boolean := false; DEV_31_BASE : std_ulogic_vector(31 downto 0) := (others => '-')
);
port (
-- global control --
clk_i : in std_ulogic; -- global clock, rising edge
rstn_i : in std_ulogic; -- global reset, low-active, async
-- host port --
main_req_i : in bus_req_t; -- host request
main_rsp_o : out bus_rsp_t; -- host response
@@ -489,6 +493,9 @@ architecture neorv32_bus_io_switch_rtl of neorv32_bus_io_switch is
signal dev_req : dev_req_t;
signal dev_rsp : dev_rsp_t;

-- (partial) register stage --
signal main_req : bus_req_t;

begin

-- Combine Device Ports -------------------------------------------------------------------
@@ -527,18 +534,43 @@ begin
dev_31_req_o <= dev_req(31); dev_rsp(31) <= dev_31_rsp_i;


-- Input Buffer ---------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
request_reg: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
main_req.addr <= (others => '0');
main_req.stb <= '0';
elsif rising_edge(clk_i) then
if (main_req_i.stb = '1') then -- reduce switching activity on IO bus system
main_req.addr <= main_req_i.addr;
end if;
main_req.stb <= main_req_i.stb;
end if;
end process request_reg;

-- no need to register these signals; they are stable for the entire transfer and do not impact the critical path --
main_req.data <= main_req_i.data;
main_req.ben <= main_req_i.ben;
main_req.rw <= main_req_i.rw;
main_req.src <= main_req_i.src;
main_req.priv <= main_req_i.priv;
main_req.rvso <= main_req_i.rvso;
main_req.fence <= main_req_i.fence;


-- Request --------------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
bus_request_gen:
for i in 0 to (num_devs_c-1) generate

bus_request_port_enabled:
if dev_en_list_c(i) generate
bus_request: process(main_req_i)
bus_request: process(main_req)
begin
dev_req(i) <= main_req_i;
if (main_req_i.addr(addr_hi_c downto addr_lo_c) = dev_base_list_c(i)(addr_hi_c downto addr_lo_c)) then
dev_req(i).stb <= main_req_i.stb; -- propagate transaction strobe if address match
dev_req(i) <= main_req;
if (main_req.addr(addr_hi_c downto addr_lo_c) = dev_base_list_c(i)(addr_hi_c downto addr_lo_c)) then
dev_req(i).stb <= main_req.stb; -- propagate transaction strobe if address match
else
dev_req(i).stb <= '0';
end if;
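The IO switch now registers only the address and strobe of the host request, while data, byte enables and the remaining control signals pass straight through. A simplified sketch of this partial register stage, assuming just two hypothetical 256-byte devices selected by address bit 8 (the entity `io_switch_sketch` and its ports are illustrative, not the real `bus_req_t`-based interface), shows where the extra cycle of latency comes from:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical, simplified model of a partial register stage in front of an
-- address decoder: only address and strobe are registered, so the per-device
-- strobe is asserted one clk_i cycle after the host strobe.
entity io_switch_sketch is
  port (
    clk_i     : in  std_ulogic;                     -- clock, rising edge
    rstn_i    : in  std_ulogic;                     -- async reset, low-active
    addr_i    : in  std_ulogic_vector(31 downto 0); -- host address
    stb_i     : in  std_ulogic;                     -- host strobe (single cycle)
    dev_stb_o : out std_ulogic_vector(1 downto 0)   -- per-device strobe
  );
end entity;

architecture rtl of io_switch_sketch is
  signal addr_q : std_ulogic_vector(31 downto 0);
  signal stb_q  : std_ulogic;
begin
  request_reg: process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      addr_q <= (others => '0');
      stb_q  <= '0';
    elsif rising_edge(clk_i) then
      if (stb_i = '1') then -- capture a new address only when a transfer starts
        addr_q <= addr_i;
      end if;
      stb_q <= stb_i;
    end if;
  end process request_reg;

  -- address decode starts from register outputs (hypothetical map: addr(8) selects the device)
  dev_stb_o(0) <= stb_q when (addr_q(8) = '0') else '0';
  dev_stb_o(1) <= stb_q when (addr_q(8) = '1') else '0';
end architecture;
```

In this sketch the device strobe appears one `clk_i` cycle after the host strobe, which corresponds to the additional access cycle described in the datasheet's "IO Access Latency" note; in exchange, the address comparison starts from a register output instead of from the upstream bus logic.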
7 changes: 1 addition & 6 deletions rtl/core/neorv32_package.vhd
@@ -29,7 +29,7 @@ package neorv32_package is

-- Architecture Constants -----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01100208"; -- hardware version
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01100209"; -- hardware version
constant archid_c : natural := 19; -- official RISC-V architecture ID
constant XLEN : natural := 32; -- native data path width

@@ -285,11 +285,6 @@ package neorv32_package is

-- RISC-V Floating-Point Stuff ------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
constant float_single_c : std_ulogic_vector(1 downto 0) := "00"; -- single-precision (32-bit)
--constant float_double_c : std_ulogic_vector(1 downto 0) := "01"; -- double-precision (64-bit)
--constant float_half_c : std_ulogic_vector(1 downto 0) := "10"; -- half-precision (16-bit)
--constant float_quad_c : std_ulogic_vector(1 downto 0) := "11"; -- quad-precision (128-bit)

-- number class flags --
constant fp_class_neg_inf_c : natural := 0; -- negative infinity
constant fp_class_neg_norm_c : natural := 1; -- negative normal number
6 changes: 4 additions & 2 deletions rtl/core/neorv32_top.vhd
@@ -387,7 +387,7 @@ begin
rstn_ext <= '0';
rstn_sys_sreg <= (others => '0');
rstn_sys <= '0';
elsif falling_edge(clk_i) then -- inverted clock to release reset _before_ all FFs trigger (rising edge)
elsif rising_edge(clk_i) then -- reset is released synchronously with the rising clock edge
-- external reset --
rstn_ext_sreg <= rstn_ext_sreg(rstn_ext_sreg'left-1 downto 0) & '1'; -- active for at least <rstn_ext_sreg'size> clock cycles
rstn_ext <= and_reduce_f(rstn_ext_sreg);
@@ -408,7 +408,7 @@ begin
begin
if (rstn_ext = '0') then
rst_cause <= "00"; -- reset from external pin
elsif falling_edge(clk_i) then
elsif rising_edge(clk_i) then
if (dci_ndmrstn = '0') then
rst_cause <= "01"; -- reset from on-chip debugger
elsif (rstn_wdt = '0') then
@@ -1033,6 +1033,8 @@ begin
DEV_31_EN => false, DEV_31_BASE => (others => '-') -- reserved
)
port map (
clk_i => clk_i,
rstn_i => rstn_sys,
main_req_i => io_req,
main_rsp_o => io_rsp,
dev_00_req_o => iodev_req(IODEV_OCD), dev_00_rsp_i => iodev_rsp(IODEV_OCD),
