⚠️ Rework IO/peripheral address space #1126

Merged 20 commits on Dec 26, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 23.12.2024 | 1.10.7.9 | :warning: rework IO/peripheral address space; :sparkles: increase device size from 256 bytes to 64kB | [#1126](https://github.com/stnolting/neorv32/pull/1126) |
| 22.12.2024 | 1.10.7.8 | :warning: rename CPU tuning options / generics | [#1125](https://github.com/stnolting/neorv32/pull/1125) |
| 22.12.2024 | 1.10.7.7 | :warning: move clock gating switch from processor top to CPU clock; `CLOCK_GATING_EN` is now a CPU tuning option | [#1124](https://github.com/stnolting/neorv32/pull/1124) |
| 21.12.2024 | 1.10.7.6 | minor rtl cleanups and optimizations | [#1123](https://github.com/stnolting/neorv32/pull/1123) |
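The hunk header above shows how the `mimpid` CSR encodes the version: judging from the example, each byte holds two BCD digits, so `0x01040312` reads as 01.04.03.12, i.e. v1.4.3.12. The following C sketch decodes such a value under that assumption. The function name `mimpid_decode` is hypothetical and not part of the NEORV32 software framework.

```c
#include <stdint.h>

/* Hypothetical helper: split a mimpid value into its four version fields.
 * Assumption (from the example 0x01040312 -> v1.4.3.12): every byte holds
 * two BCD digits, most significant field first. */
static void mimpid_decode(uint32_t mimpid, unsigned ver[4]) {
  for (int i = 0; i < 4; i++) {
    uint32_t byte = (mimpid >> (24 - 8 * i)) & 0xFFu;       /* byte i, MSB first */
    ver[i] = (unsigned)((byte >> 4) * 10u + (byte & 0xFu)); /* BCD -> decimal */
  }
}
```

Following the same encoding, version 1.10.7.9 from the table would correspond to `0x01100709`, which this sketch decodes to 1, 10, 7, 9.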
127 changes: 55 additions & 72 deletions docs/datasheet/soc.adoc
@@ -468,55 +468,47 @@ table (the channel number also corresponds to the according FIRQ priority: 0 = h
=== Address Space

As a 32-bit architecture the NEORV32 can access a 4GB physical address space. By default, this address space is
split into six main regions. Each region provides specific _physical memory attributes_ ("PMAs") that define
the access capabilities (`rwxac`; `r` = read permission, `w` = write permission, `x` = execute permission,
`a` = atomic access support, `c` = cached CPU access, `p` = privileged access only).
split into four main regions. All accesses to "unmapped" addresses (a.k.a. "the void") are redirected to the
<<_processor_external_bus_interface_xbus>>. For example, if the internal IMEM is disabled, all accesses to the
_entire_ address space between `0x00000000` and `0x7FFFFFFF` are converted into XBUS requests. If the XBUS interface
is not enabled, any access to the void raises a bus error exception.

.NEORV32 Processor Address Space (Default Configuration)
image::address_space.png[900]

.The "Void" (Unmapped Addresses)
[NOTE]
All accesses to "unmapped" addresses (= "void") are redirected to the <<_processor_external_bus_interface_xbus>>.
For example, if the internal IMEM is disabled, all accesses to the _entire_ address space between `0x00000000` and
`0x7FFFFFFF` are converted into XBUS requests. If the XBUS interface is not enabled, any access to the void will
raise a bus error exception.
Each region provides specific _physical memory attributes_ ("PMAs") that define the access capabilities (`rwxac`;
`r` = read access, `w` = write access, `x` = execute access, `a` = atomic access, `c` = cached CPU access).
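To make the `rwxac` notation concrete, the following C sketch turns a PMA string from the region table into a bit mask and checks whether a given access type is granted. The enum and function names are hypothetical; the sketch only models the notation and says nothing about how the hardware enforces the attributes.

```c
/* Hypothetical bit encoding of the five PMA letters, in table order. */
enum { PMA_R = 1 << 0, PMA_W = 1 << 1, PMA_X = 1 << 2, PMA_A = 1 << 3, PMA_C = 1 << 4 };

/* Parse a PMA string such as "rwxac" or "r-xac": position i holds either
 * the i-th attribute letter or '-' if the attribute is not provided. */
static unsigned pma_parse(const char *s) {
  static const char letters[5] = { 'r', 'w', 'x', 'a', 'c' };
  unsigned mask = 0u;
  for (int i = 0; i < 5 && s[i] != '\0'; i++) {
    if (s[i] == letters[i]) {
      mask |= 1u << i;
    }
  }
  return mask;
}

/* Nonzero if every attribute in 'required' is granted by 'pma'. */
static int pma_allows(unsigned pma, unsigned required) {
  return (pma & required) == required;
}
```

For instance, a store to the XIP window (`r-xac`) fails the check for `PMA_W`, matching the read-only nature of that region in the table.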

.Custom PMAs
[TIP]
Custom physical memory attributes enforced by the CPU's _physical memory protection_ (<<_smpmp_isa_extension>>)
can be used to further constrain the physical memory attributes.

.Main Address Regions
[cols="<1,^4,^2,<7"]
[options="header",grid="rows"]
|=======================
| # | Region | PMAs | Description
| 1 | Internal IMEM address space | `rwxac-` | For instructions (=code) and constants; mapped to the internal <<_instruction_memory_imem>>.
| 2 | Internal DMEM address space | `rwxac-` | For application runtime data (heap, stack, etc.); mapped to the internal <<_data_memory_dmem>>.
| 3 | Memory-mapped XIP flash | `r-xac-` | Memory-mapped access to the <<_execute_in_place_module_xip>> SPI flash.
| 4 | Bootloader address space | `r-xa-p` | Read-only memory for the internal <<_bootloader_rom_bootrom>> containing the default <<_bootloader>>.
| 5 | IO/peripheral address space | `rwxa-p` | Processor-internal peripherals / IO devices.
| 6 | The "**void**" | `rwxac-` | Unmapped address space. All accesses to these unmapped regions are redirected to the <<_processor_external_bus_interface_xbus>> (if implemented).
| # | Region | PMAs | Description
| 1 | Internal IMEM address space | `rwxac` | For instructions / code and constants; mapped to the internal <<_instruction_memory_imem>> if implemented.
| 2 | Internal DMEM address space | `rwxac` | For application runtime data (heap, stack, etc.); mapped to the internal <<_data_memory_dmem>> if implemented.
| 3 | Memory-mapped XIP flash | `r-xac` | Transparent memory-mapped access to an external <<_execute_in_place_module_xip>> SPI flash.
| 4 | IO/peripheral address space | `rwxa-` | Processor-internal peripherals / IO devices including the <<_bootloader_rom_bootrom>>.
| - | The "**void**" | `rwxa[c]` | Unmapped address space. All accesses to these unmapped regions are redirected to the <<_processor_external_bus_interface_xbus>> if implemented.
|=======================

.Privileged IO and BOOTROM Access Only
[IMPORTANT]
Only privileged accesses (M-mode) to the IO/peripheral and bootloader address spaces are allowed.
If an unprivileged application tries to access these address spaces, a bus access error exception is raised.

.Custom PMAs
[TIP]
Custom physical memory attributes enforced by the CPU's _physical memory protection_ (<<_smpmp_isa_extension>>)
can be used to further constrain the physical memory attributes.


:sectnums:
==== Bus System

The CPU can access all of the 32-bit address space from the instruction fetch interface and also from the data access
interface. Both CPU interfaces can be equipped with optional caches (<<_processor_internal_data_cache_dcache>> and
<<_processor_internal_instruction_cache_icache>>). The two CPU interfaces are multiplexed by a simple bus switch into
a _single processor-internal bus_. Optionally, this bus is further switched by another instance of the bus switch so the
<<_direct_memory_access_controller_dma>> controller can also access the entire address space. Accesses via the
resulting SoC bus are split by the <<_bus_gateway>> that redirects accesses to the corresponding main address regions
(see table above). Accesses to the processor-internal IO/peripheral devices are further redirected via a
dedicated <<_io_switch>>.
The CPU provides individual interfaces for instruction fetch and data access. It can access the entire 32-bit
address space from each of these interfaces. Both of them can be equipped with optional caches (<<_processor_internal_data_cache_dcache>>
and <<_processor_internal_instruction_cache_icache>>).

The two CPU interfaces are multiplexed by a simple bus switch into a _single processor-internal bus_. Optionally,
this bus is further multiplexed by another instance of the bus switch so the <<_direct_memory_access_controller_dma>>
controller can also access the entire address space. Accesses via the resulting SoC bus are split by the <<_bus_gateway>>
which redirects them to the corresponding main address regions (see table above). Accesses to the processor-internal
IO/peripheral devices are further redirected via a dedicated <<_io_switch>>.

.Processor-Internal Bus Architecture
image::neorv32_bus.png[1300]
@@ -533,31 +525,27 @@ See sections CPU <<_architecture>> and <<_bus_interface>> for more information r
:sectnums:
==== Bus Gateway

The central bus gateway serves two purposes: **redirect** core accesses to the corresponding modules (e.g. memory accesses
vs. memory-mapped IO accesses) and **monitor** all bus transactions. The redirection of access requests is based on a
The central bus gateway serves two purposes: it **redirects** accesses to the corresponding modules (e.g. memory accesses
vs. memory-mapped IO accesses) and also **monitors** all bus transactions. The redirection of access requests is based on a
customizable memory map implemented via VHDL constants in the main package file (`rtl/core/neorv32_package.vhd`):

.Main Address Regions Configuration in the VHDL Package File
[source,vhdl]
----
-- Main Address Regions ---
constant mem_imem_base_c : std_ulogic_vector(31 downto 0) := x"00000000";
constant mem_dmem_base_c : std_ulogic_vector(31 downto 0) := x"80000000";
constant mem_xip_base_c : std_ulogic_vector(31 downto 0) := x"e0000000";
constant mem_imem_base_c : std_ulogic_vector(31 downto 0) := x"00000000"; -- IMEM size via generic
constant mem_dmem_base_c : std_ulogic_vector(31 downto 0) := x"80000000"; -- DMEM size via generic
constant mem_xip_base_c : std_ulogic_vector(31 downto 0) := x"e0000000"; -- page (4 MSBs) only!
constant mem_xip_size_c : natural := 256*1024*1024;
constant mem_boot_base_c : std_ulogic_vector(31 downto 0) := x"ffffc000";
constant mem_boot_size_c : natural := 8*1024;
constant mem_io_base_c : std_ulogic_vector(31 downto 0) := x"ffffe000";
constant mem_io_size_c : natural := 8*1024;
constant mem_io_base_c : std_ulogic_vector(31 downto 0) := x"ffe00000";
constant mem_io_size_c : natural := 32*64*1024; -- = 32 * iodev_size_c
----
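The constants above can be read as a decoding rule. The C sketch below models it for illustration only: the helper names, the `imem_size`/`dmem_size` parameters (standing in for the IMEM/DMEM size generics, 0 meaning "not implemented") and the assumption that IMEM and DMEM claim exactly their configured size are not taken from the RTL. Because `mem_xip_size_c` is 256 MB and page-aligned, the XIP range check is equivalent to comparing the 4 MSBs, as the `page (4 MSBs) only!` comment suggests. The 2 MB IO window starting at `0xffe00000` ends exactly at the top of the 32-bit address space.

```c
#include <stdint.h>

/* Defaults mirrored from the VHDL package constants above. */
#define MEM_IMEM_BASE 0x00000000u
#define MEM_DMEM_BASE 0x80000000u
#define MEM_XIP_BASE  0xE0000000u
#define MEM_XIP_SIZE  (256u * 1024u * 1024u) /* 256 MB = one 4-MSB page */
#define MEM_IO_BASE   0xFFE00000u
#define MEM_IO_SIZE   (32u * 64u * 1024u)    /* 32 devices x 64 kB = 2 MB */

typedef enum { REGION_IMEM, REGION_DMEM, REGION_XIP, REGION_IO, REGION_VOID } region_t;

/* Wrap-safe range check: true if base <= addr < base + size. */
static int in_range(uint32_t addr, uint32_t base, uint32_t size) {
  return (uint32_t)(addr - base) < size;
}

/* Hypothetical model of the gateway's region decoding. */
static region_t classify(uint32_t addr, uint32_t imem_size, uint32_t dmem_size, int xip_en) {
  if (in_range(addr, MEM_IMEM_BASE, imem_size)) return REGION_IMEM;
  if (in_range(addr, MEM_DMEM_BASE, dmem_size)) return REGION_DMEM;
  if (xip_en && in_range(addr, MEM_XIP_BASE, MEM_XIP_SIZE)) return REGION_XIP;
  if (in_range(addr, MEM_IO_BASE, MEM_IO_SIZE)) return REGION_IO;
  return REGION_VOID; /* forwarded to the XBUS, or bus error if there is none */
}
```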

Besides the delegation of bus requests the gateway also implements a bus monitor (aka "the bus keeper") that tracks all
active bus transactions to ensure _safe_ and _deterministic_ operations.

Whenever a memory-mapped device is accessed (a real memory, a memory-mapped IO or some processor-external module) the bus
monitor starts an internal timer. The accessed module has to respond ("ACK") to the bus request within a specific
**time window**. This time window is defined by a global constant in the processor's VHDL package file
(`rtl/core/neorv32_package.vhd`).
Besides redirecting bus requests, the gateway also implements a bus monitor (aka "the bus keeper") that tracks all
active bus transactions to ensure _safe_ and _deterministic_ operations. Whenever a memory-mapped device is accessed (a
real memory, a memory-mapped IO or some processor-external module), the bus monitor starts an internal countdown. The
accessed module has to respond ("ACK") to the bus request within a bounded **time window**. This time window is defined
by a global constant in the processor's VHDL package file (`rtl/core/neorv32_package.vhd`).
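The countdown mechanism described above can be modeled in a few lines of C. This is a behavioral sketch only: `timeout_cycles` stands in for the package constant (whose value is not shown in this excerpt), all names are hypothetical, and the exact cycle at which the RTL gives up may differ by one from this model.

```c
/* Hypothetical cycle-level model of the bus monitor: a countdown is armed
 * when a request is issued and the device must ACK before it runs out.
 * ack_cycle:      cycle (0 = first cycle after the request) in which the
 *                 device responds, or -1 if it never responds.
 * timeout_cycles: stand-in for the timeout constant in the package file. */
typedef enum { BUS_OK, BUS_TIMEOUT } bus_result_t;

static bus_result_t bus_keeper(int ack_cycle, int timeout_cycles) {
  int cycle = 0;
  for (int countdown = timeout_cycles; countdown > 0; countdown--, cycle++) {
    if (cycle == ack_cycle) {
      return BUS_OK;   /* response arrived inside the window */
    }
  }
  return BUS_TIMEOUT;  /* window expired: treated as a bus error in this model */
}
```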

.Internal Bus Timeout Configuration
[source,vhdl]
@@ -662,12 +650,6 @@ constant base_io_slink_c : std_ulogic_vector(31 downto 0) := x"ffffec00";
constant base_io_dma_c : std_ulogic_vector(31 downto 0) := x"ffffed00";
----
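With the reworked layout, the IO region (`mem_io_base_c = 0xffe00000`, `mem_io_size_c = 32 * iodev_size_c`) is divided into 32 windows of 64 kB, which matches the updated register tables (e.g. DMA at `0xffed0000`, CRC at `0xffee0000`, GPIO at `0xfffc0000`). Note that the `base_io_dma_c` value quoted above (`x"ffffed00"`) follows the previous 256-byte spacing. The C sketch below splits an IO address into a window ("slot") number and an offset; the slot numbering and helper name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define IO_BASE    0xFFE00000u /* mem_io_base_c */
#define IODEV_SIZE 0x00010000u /* 64 kB per device (iodev_size_c) */
#define IODEV_NUM  32u         /* mem_io_size_c = 32 * iodev_size_c */

/* Hypothetical helper: return the 64 kB slot an IO address falls into and
 * store the byte offset inside that slot; returns -1 outside the IO region. */
static int io_decode(uint32_t addr, uint32_t *offset) {
  uint32_t rel = addr - IO_BASE;
  if (rel >= IODEV_NUM * IODEV_SIZE) {
    return -1;
  }
  if (offset != NULL) {
    *offset = rel & (IODEV_SIZE - 1u);
  }
  return (int)(rel / IODEV_SIZE);
}
```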

.IO Access Latency
[IMPORTANT]
In order to shorten the critical path of the IO system, the IO switch contains a partial register stage that
buffers the address bus. Hence, accesses to the processor-internal IO region require an additional clock cycle
to complete.


<<<
// ####################################################################################################################
@@ -737,30 +719,31 @@ need for an explicit initialization / executable upload.
:sectnums:
=== Processor-Internal Modules

.Privileged IO Access Only
.Full-Word Write Accesses Only
[IMPORTANT]
Only privileged accesses (M-mode) to the IO/peripheral modules are allowed. If an unprivileged application
tries to access this address space a bus access error exception is raised.
All peripheral/IO devices should only be accessed in full-word mode (i.e. 32-bit).
Byte or half-word (8/16-bit) write accesses might cause undefined behavior.

.Full-Word Write Accesses Only
[NOTE]
All peripheral/IO devices should only be written in full-word mode (i.e. 32-bit). Byte or half-word (8/16-bit) write accesses
might cause undefined behavior.
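When only one byte of a register needs to change, a read-modify-write using full-word accesses respects the rule above. A minimal C sketch follows; the helper name is hypothetical. It is only suitable for registers whose read has no side effects (e.g. not for FIFO data registers), and writing back read-only fields should be harmless since, unless otherwise specified, writes to read-only registers are ignored.

```c
#include <stdint.h>

/* Hypothetical helper: replace byte 'byte_idx' (0 = bits 7:0) of a 32-bit
 * register using one full-word read and one full-word write, instead of an
 * 8-bit store. Only suitable for registers whose read has no side effects. */
static void write_byte_fullword(volatile uint32_t *reg, unsigned byte_idx, uint8_t value) {
  const uint32_t shift = 8u * (byte_idx & 3u);
  uint32_t word = *reg;              /* full-word (32-bit) read */
  word &= ~(0xFFu << shift);         /* clear the target byte */
  word |= (uint32_t)value << shift;  /* insert the new byte */
  *reg = word;                       /* full-word (32-bit) write */
}
```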
.IO Module Address Space
[IMPORTANT]
Each peripheral/IO module occupies an address space of 64kB. Most devices do not fully utilize this
address space and will _mirror_ the available memory-mapped registers across the entire 64kB address space.
However, software should only access the specified register addresses and not rely on these mirrored copies.
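As an illustration of mirroring, assume a device that implements a power-of-two number of 32-bit registers and decodes only the address bits needed to select them. This is an assumption about the decoding, not a statement about any specific module. The CRC unit, for example, lists four registers (`CTRL`, `POLY`, `DATA`, `SREG`); under this assumption, offset `0x10` would alias `CTRL`.

```c
#include <stdint.h>

/* Hypothetical: index of the register that a byte offset inside a 64 kB
 * device window aliases to, assuming only the low address bits are decoded.
 * num_regs must be a power of two. */
static uint32_t mirrored_reg_index(uint32_t offset, uint32_t num_regs) {
  return (offset >> 2) & (num_regs - 1u); /* drop byte bits, keep index bits */
}
```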

.Writing to Read-Only Registers
.Unimplemented Modules / Address Holes
[NOTE]
Unless otherwise specified, writing to registers that are listed as read-only does not trigger an exception.
The write access is simply ignored by the corresponding hardware module.
When accessing an IO device that has not been implemented (disabled via the corresponding generic)
or when accessing an address that is actually unused, a load/store access fault exception is raised.

.IO Module's Address Space
.Writing to Read-Only Registers
[NOTE]
Each peripheral/IO module occupies an address space of 256 bytes (64 words). Most devices do not fully utilize this address
space and will simply _mirror_ the available interface registers across the entire 256 bytes of address space.
Unless otherwise specified, writing to registers that are listed as read-only does not trigger an exception
as the write access is simply ignored by the corresponding hardware module.

.Unimplemented Modules / Address Holes
.IO Access Latency
[NOTE]
When accessing an IO device that has not been implemented (disabled via the corresponding generic)
or when accessing an address that is actually unused, a load/store access fault exception is raised.
In order to shorten the critical path of the IO system, the IO switch provides register stages for the request and
response buses. Hence, accesses to the processor-internal IO region require two additional clock cycles to complete.

.Module Interrupts
[NOTE]
9 changes: 5 additions & 4 deletions docs/datasheet/soc_bootrom.adoc
@@ -19,10 +19,11 @@

The boot ROM contains the executable image of the default NEORV32 <<_bootloader>>. When the
<<_boot_configuration>> is set to _bootloader_ mode (0) via the `BOOT_MODE_SELECT` generic, the
boot ROM is automatically enabled and the CPU boot address is automatically adjusted to the
base address of the boot ROM.
boot ROM is automatically enabled and the CPU boot address is adjusted to the base address of the boot ROM.

.Bootloader Image
[IMPORTANT]
The boot ROM is initialized during synthesis with the default bootloader image
(`rtl/core/neorv32_bootloader_image.vhd`). Note that the BOOTROM size is constrained to 4kB.
The bootloader ROM is initialized during synthesis with the default bootloader image
(`rtl/core/neorv32_bootloader_image.vhd`). The physical size of the ROM is automatically
adjusted to the next power of two of the image size. However, note that the BOOTROM is
constrained to a maximum size of 64kB.
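The sizing rule above can be sketched in C: the physical ROM size is the image size rounded up to the next power of two, capped at 64kB. The function name and the convention of returning 0 when the image does not fit are assumptions for illustration; this excerpt does not say how the RTL handles an oversized image.

```c
#include <stdint.h>

#define BOOTROM_MAX_SIZE (64u * 1024u) /* 64 kB upper bound */

/* Hypothetical helper: round the bootloader image size (in bytes) up to the
 * next power of two; returns 0 if the result would exceed the 64 kB limit. */
static uint32_t bootrom_phys_size(uint32_t image_bytes) {
  uint32_t size = 1u;
  while (size < image_bytes) {
    size <<= 1;
    if (size > BOOTROM_MAX_SIZE) {
      return 0u;
    }
  }
  return size;
}
```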
12 changes: 6 additions & 6 deletions docs/datasheet/soc_cfs.adoc
@@ -22,7 +22,7 @@
**Overview**

The custom functions subsystem is meant for implementing custom tightly-coupled co-processors or interfaces.
It provides up to 64 32-bit memory-mapped read/write registers (`REG`, see register map below) that can be
It provides up to 16384 32-bit memory-mapped read/write registers (`REG`, see register map below) that can be
accessed by the CPU via normal load/store operations. The actual functionality of these registers has to be
defined by the hardware designer. Furthermore, the CFS provides two IO conduits to implement custom on-chip
or off-chip interfaces.
@@ -94,9 +94,9 @@ If the CFU output signals are to be used outside the chip, it is recommended to
[options="header",grid="all"]
|=======================
| Address | Name [C] | Bit(s) | R/W | Function
| `0xffffeb00` | `REG[0]` |`31:0` | (r)/(w) | custom CFS register 0
| `0xffffeb04` | `REG[1]` |`31:0` | (r)/(w) | custom CFS register 1
| ... | ... |`31:0` | (r)/(w) | ...
| `0xffffebf8` | `REG[62]` |`31:0` | (r)/(w) | custom CFS register 62
| `0xffffebfc` | `REG[63]` |`31:0` | (r)/(w) | custom CFS register 63
| `0xffeb0000` | `REG[0]` |`31:0` | (r)/(w) | custom CFS register 0
| `0xffeb0004` | `REG[1]` |`31:0` | (r)/(w) | custom CFS register 1
| ... | ... |`31:0` | (r)/(w) | ...
| `0xffebfff8` | `REG[16382]` |`31:0` | (r)/(w) | custom CFS register 16382
| `0xffebfffc` | `REG[16383]` |`31:0` | (r)/(w) | custom CFS register 16383
|=======================
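The register table follows a simple rule: `REG[i]` sits at `0xffeb0000 + 4*i`, so 16384 registers fill exactly the 64kB window that each IO device now occupies. A small C sketch of that arithmetic; the macro and function names are hypothetical and not taken from the NEORV32 headers.

```c
#include <stdint.h>

#define CFS_BASE     0xFFEB0000u /* address of REG[0] in the table above */
#define CFS_NUM_REGS 16384u      /* 16384 x 4 bytes = 64 kB device window */

/* Address of custom register REG[index]; valid for index < CFS_NUM_REGS. */
static uint32_t cfs_reg_addr(uint32_t index) {
  return CFS_BASE + 4u * index;
}

/* On the target, a register could then be accessed as a volatile word
 * (hypothetical macro, full-word accesses only): */
#define CFS_REG(i) (*(volatile uint32_t *)(uintptr_t)cfs_reg_addr(i))
```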
8 changes: 4 additions & 4 deletions docs/datasheet/soc_crc.adoc
@@ -63,10 +63,10 @@ and for CRC32-mode the entire 32-bit of `POLY` and `SREG` are used.
[options="header",grid="all"]
|=======================
| Address | Name [C] | Bit(s), Name [C] | R/W | Function
.2+<| `0xffffee00` .2+<| `CTRL` <|`1:0` ^| r/w <| CRC mode select (`00`: CRC8, `01`: CRC16, `10`: CRC32)
.2+<| `0xffee0000` .2+<| `CTRL` <|`1:0` ^| r/w <| CRC mode select (`00`: CRC8, `01`: CRC16, `10`: CRC32)
<|`31:2` ^| r/- <| _reserved_, read as zero
| `0xffffee04` | `POLY` |`31:0` | r/w | CRC polynomial
.2+<| `0xffffee08` .2+<| `DATA` <|`7:0` ^| r/w <| data input (single byte)
| `0xffee0004` | `POLY` |`31:0` | r/w | CRC polynomial
.2+<| `0xffee0008` .2+<| `DATA` <|`7:0` ^| r/w <| data input (single byte)
<|`31:8` ^| r/- <| _reserved_, read as zero, writes are ignored
| `0xffffee0c` | `SREG` |`31:0` | r/w | current CRC shift register value (set start value on write)
| `0xffee000c` | `SREG` |`31:0` | r/w | current CRC shift register value (set start value on write)
|=======================
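For checking CRC results in software, a bitwise reference model can help. The C sketch below is a generic non-reflected (MSB-first) CRC with the polynomial (cf. `POLY`) and start value (cf. `SREG`) as parameters and one data byte per update (cf. `DATA`). Whether the NEORV32 CRC unit uses exactly this bit order and alignment is an assumption not confirmed by this excerpt, so compare against the hardware before relying on it. The test vectors are the standard "123456789" check values for CRC-8 (poly 0x07, start 0x00), CRC-16/XMODEM (poly 0x1021, start 0x0000) and CRC-32/MPEG-2 (poly 0x04C11DB7, start 0xFFFFFFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Generic non-reflected (MSB-first) CRC update for one data byte.
 * width: 8, 16 or 32 (cf. the CRC8/CRC16/CRC32 modes in CTRL)
 * poly:  polynomial without the implicit top bit (cf. POLY)
 * sreg:  current shift register value (cf. SREG) */
static uint32_t crc_update_byte(uint32_t sreg, uint32_t poly, unsigned width, uint8_t data) {
  const uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  const uint32_t top  = 1u << (width - 1u);
  sreg ^= (uint32_t)data << (width - 8u); /* align byte with the register MSBs */
  for (int bit = 0; bit < 8; bit++) {
    sreg = (sreg & top) ? ((sreg << 1) ^ poly) : (sreg << 1);
    sreg &= mask;                          /* keep only 'width' bits */
  }
  return sreg;
}

/* Run a whole buffer through the CRC, starting from 'init'. */
static uint32_t crc_buffer(const uint8_t *buf, size_t len, uint32_t init, uint32_t poly, unsigned width) {
  uint32_t sreg = init;
  for (size_t i = 0; i < len; i++) {
    sreg = crc_update_byte(sreg, poly, width, buf[i]);
  }
  return sreg;
}
```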
8 changes: 4 additions & 4 deletions docs/datasheet/soc_dma.adoc
@@ -142,7 +142,7 @@ register).
[options="header",grid="all"]
|=======================
| Address | Name [C] | Bit(s), Name [C] | R/W | Function
.12+<| `0xffffed00` .12+<| `CTRL` <|`0` `DMA_CTRL_EN` ^| r/w <| DMA module enable
.12+<| `0xffed0000` .12+<| `CTRL` <|`0` `DMA_CTRL_EN` ^| r/w <| DMA module enable
<|`1` `DMA_CTRL_AUTO` ^| r/w <| Enable automatic mode (FIRQ-triggered)
<|`2` `DMA_CTRL_FENCE` ^| r/w <| Issue a downstream FENCE operation when DMA transfer completes (without errors)
<|`7:3` _reserved_ ^| r/- <| reserved, read as zero
@@ -154,9 +154,9 @@ register).
<|`15` `DMA_CTRL_FIRQ_TYPE` ^| r/w <| Trigger on rising-edge (`0`) or high-level (`1`) of the selected FIRQ channel
<|`19:16` `DMA_CTRL_FIRQ_SEL_MSB : DMA_CTRL_FIRQ_SEL_LSB` ^| r/w <| FIRQ trigger select (FIRQ0=0 ... FIRQ15=15)
<|`31:20` _reserved_ ^| r/- <| reserved, read as zero
| `0xffffed04` | `SRC_BASE` |`31:0` | r/w | Source base address (shows the last-accessed source address when read)
| `0xffffed08` | `DST_BASE` |`31:0` | r/w | Destination base address (shows the last-accessed destination address when read)
.6+<| `0xffffed0c` .6+<| `TTYPE` <|`23:0` `DMA_TTYPE_NUM_MSB : DMA_TTYPE_NUM_LSB` ^| r/w <| Number of elements to transfer (shows the last-transferred element index when read)
| `0xffed0004` | `SRC_BASE` |`31:0` | r/w | Source base address (shows the last-accessed source address when read)
| `0xffed0008` | `DST_BASE` |`31:0` | r/w | Destination base address (shows the last-accessed destination address when read)
.6+<| `0xffed000c` .6+<| `TTYPE` <|`23:0` `DMA_TTYPE_NUM_MSB : DMA_TTYPE_NUM_LSB` ^| r/w <| Number of elements to transfer (shows the last-transferred element index when read)
<|`26:24` _reserved_ ^| r/- <| reserved, read as zero
<|`28:27` `DMA_TTYPE_QSEL_MSB : DMA_TTYPE_QSEL_LSB` ^| r/w <| Quantity select (`00` = byte -> byte, `01` = byte -> zero-extended-word, `10` = byte -> sign-extended-word, `11` = word -> word)
<|`29` `DMA_TTYPE_SRC_INC` ^| r/w <| Constant (`0`) or incrementing (`1`) source address
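The `CTRL` and `TTYPE` layouts above can be assembled with plain shifts and masks. The C sketch below restates the bit positions from the tables as local stand-ins (the NEORV32 headers may already define equivalents, so do not combine the two); the helper names are hypothetical, and `TTYPE` bits above 29 are not listed in this excerpt and are left at zero.

```c
#include <stdint.h>

/* Bit positions restated from the CTRL/TTYPE tables above. */
enum {
  DMA_CTRL_EN = 0, DMA_CTRL_AUTO = 1, DMA_CTRL_FENCE = 2,
  DMA_CTRL_FIRQ_TYPE = 15, DMA_CTRL_FIRQ_SEL_LSB = 16,
  DMA_TTYPE_NUM_LSB = 0, DMA_TTYPE_QSEL_LSB = 27, DMA_TTYPE_SRC_INC = 29
};

/* CTRL value: module enabled, automatic (FIRQ-triggered) mode on FIRQ 'firq',
 * rising-edge (0) or high-level (1) trigger. */
static uint32_t dma_ctrl_auto(unsigned firq, int level_triggered) {
  return (1u << DMA_CTRL_EN) | (1u << DMA_CTRL_AUTO)
       | ((uint32_t)(level_triggered != 0) << DMA_CTRL_FIRQ_TYPE)
       | ((uint32_t)(firq & 0xFu) << DMA_CTRL_FIRQ_SEL_LSB);
}

/* TTYPE value: element count (24 bits), quantity select (2 bits) and
 * constant (0) or incrementing (1) source address. */
static uint32_t dma_ttype(uint32_t num, uint32_t qsel, int src_inc) {
  return ((num & 0x00FFFFFFu) << DMA_TTYPE_NUM_LSB)
       | ((qsel & 0x3u) << DMA_TTYPE_QSEL_LSB)
       | ((uint32_t)(src_inc != 0) << DMA_TTYPE_SRC_INC);
}
```

For example, transferring 100 words with an incrementing source (`qsel = 3`, word -> word) gives `0x38000064` in this sketch.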
8 changes: 4 additions & 4 deletions docs/datasheet/soc_gpio.adoc
@@ -41,8 +41,8 @@ be performed within a single clock cycle.
[options="header",grid="rows"]
|=======================
| Address | Name [C] | Bit(s) | R/W | Function
| `0xfffffc00` | `INPUT[0]` | 31:0 | r/- | parallel input port pins 31:0
| `0xfffffc04` | `INPUT[1]` | 31:0 | r/- | parallel input port pins 63:32
| `0xfffffc08` | `OUTPUT[0]` | 31:0 | r/w | parallel output port pins 31:0
| `0xfffffc0c` | `OUTPUT[1]` | 31:0 | r/w | parallel output port pins 63:32
| `0xfffc0000` | `INPUT[0]` | 31:0 | r/- | parallel input port pins 31:0
| `0xfffc0004` | `INPUT[1]` | 31:0 | r/- | parallel input port pins 63:32
| `0xfffc0008` | `OUTPUT[0]` | 31:0 | r/w | parallel output port pins 31:0
| `0xfffc000c` | `OUTPUT[1]` | 31:0 | r/w | parallel output port pins 63:32
|=======================
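The 64 GPIO pins are split across two 32-bit words per direction: pins 0..31 live in `INPUT[0]`/`OUTPUT[0]` and pins 32..63 in `INPUT[1]`/`OUTPUT[1]`. The C sketch below computes the register address and bit mask for a given pin from the table above; the helper names are hypothetical.

```c
#include <stdint.h>

#define GPIO_BASE 0xFFFC0000u /* address of INPUT[0] in the table above */

/* Hypothetical helpers for the 64-pin GPIO port. */
static uint32_t gpio_input_addr(unsigned pin)  { return GPIO_BASE + 0x0u + 4u * (pin / 32u); }
static uint32_t gpio_output_addr(unsigned pin) { return GPIO_BASE + 0x8u + 4u * (pin / 32u); }
static uint32_t gpio_pin_mask(unsigned pin)    { return 1u << (pin % 32u); }
```

Setting pin 40, for instance, means setting bit 8 (mask `0x100`) of `OUTPUT[1]` at `0xfffc000c`, ideally with a full-word read-modify-write.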