Releases: POETSII/tinsel

Tinsel 0.8

01 Jul 07:18
d11655e

This release introduces per-board programmable routers, extending mailbox-local multicast (Tinsel 0.7) with global multicast features. In particular, a thread can efficiently send a message to multiple threads distributed over multiple boards using programmable routers. Programmable routers also allow the work of sending a message to multiple destinations to be completely offloaded from the cores; a single message can be sent to the router, which will take care of the multicast. There is a trade-off here between offloading work from the cores and overloading the programmable routers. In our experience, a mix of local sending and offloading works best.
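The trade-off can be made concrete by counting how many sends the core itself has to issue for one fan-out under each strategy. The sketch below is only an illustrative model: the `Dest` struct and the three counting functions are hypothetical and are not part of the Tinsel API, and the model ignores the load placed on the routers.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical destination address: a board, a mailbox on that board, and a
// thread on that mailbox. This is NOT the real Tinsel address encoding.
struct Dest {
  int board;
  int mailbox;
  int thread;
};

// Sends issued by the core if every destination gets its own message.
std::size_t sendsUnicast(const std::vector<Dest>& dests) {
  return dests.size();
}

// Sends issued by the core using mailbox-local multicast only: one message
// per distinct (board, mailbox) pair among the destinations.
std::size_t sendsMailboxMulticast(const std::vector<Dest>& dests) {
  std::set<std::pair<int, int>> mailboxes;
  for (const Dest& d : dests) mailboxes.insert({d.board, d.mailbox});
  return mailboxes.size();
}

// Sends issued by the core when the whole fan-out is offloaded: a single
// message to the programmable router (zero if there is nothing to send).
std::size_t sendsViaRouter(const std::vector<Dest>& dests) {
  return dests.empty() ? 0 : 1;
}
```

For six destinations spread over four mailboxes on two boards, this model gives six sends for unicast, four for mailbox-local multicast, and one when offloading to the router. The work saved on the core is moved to the router rather than eliminated, which is why a mix of the two can work best.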

The Tinsel API is backwards compatible with Tinsel 0.7. We've added support for two send slots per thread, as requested by colleagues. See tinselSendSlotExtra() and HostLinkParams::useExtraSendSlot.

POLite has been updated to use programmable routers (currently for any message destined for multiple boards). We have also improved the runtime of the POLite mapper in a couple of ways: (1) mailbox-local routing keys are now calculated much more efficiently; (2) we use OpenMP to parallelise the hierarchical partitioning stage. Various static and dynamic parameters to POLite have been added and documented.

See the README for more details.

Tinsel 0.7

02 Dec 12:58

This release introduces mailbox-local multicast, i.e. the ability for a thread to send a message simultaneously to any subset of threads on a specified destination mailbox. POLite has been extended to use the feature automatically.

There have been some changes to the Tinsel API. The tinselSlot() and tinselAlloc() API calls have been dropped. Instead, a fixed number of message slots is reserved per thread for sending messages, and a pointer to one of these slots can be obtained by calling tinselSendSlot(). All other message slots are implicitly made available for receiving messages. After receiving a message via tinselRecv() and processing it, a thread indicates that it has finished with the message by calling a new function, tinselFree(). For termination detection, a message is considered in-flight until it has been freed.
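The slot lifecycle described above can be sketched with a toy, host-side model. This is not Tinsel code: `ToyMailbox`, `Msg`, `deliver()` and the slot counts are hypothetical, and the member functions only borrow the names tinselSendSlot(), tinselRecv() and tinselFree() from the release notes; their real signatures may differ.

```cpp
#include <array>
#include <cstdint>
#include <deque>

// Hypothetical message type and receive-slot count, chosen for illustration.
struct Msg { uint32_t payload; };
constexpr int kRecvSlots = 3;

// Toy model of one thread's view of its mailbox (assumption, not real API).
struct ToyMailbox {
  Msg sendSlot{};                            // slot reserved for sending
  std::array<Msg, kRecvSlots> recvSlots{};   // all other slots: receiving
  std::array<bool, kRecvSlots> inUse{};      // true until tinselFree()
  std::deque<int> ready;                     // delivered, not yet received
  int inFlight = 0;                          // messages not yet freed

  // Pointer to the reserved send slot.
  Msg* tinselSendSlot() { return &sendSlot; }

  // Model of the network delivering a message; fails if no slot is free.
  bool deliver(const Msg& m) {
    for (int i = 0; i < kRecvSlots; i++) {
      if (!inUse[i]) {
        recvSlots[i] = m;
        inUse[i] = true;
        ready.push_back(i);
        inFlight++;
        return true;
      }
    }
    return false;
  }

  bool tinselCanRecv() const { return !ready.empty(); }

  // Hand the next delivered message to the thread; the slot stays occupied.
  Msg* tinselRecv() {
    int i = ready.front();
    ready.pop_front();
    return &recvSlots[i];
  }

  // The thread is done with the message: the slot can be reused and the
  // message no longer counts as in-flight.
  void tinselFree(Msg* m) {
    int i = static_cast<int>(m - recvSlots.data());
    inUse[i] = false;
    inFlight--;
  }
};
```

In this model a message written into the send slot and delivered still counts as in-flight after tinselRecv() returns it, and only stops counting once tinselFree() is called. A receiver that never frees its messages therefore exhausts its receive slots.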

There have also been a few minor changes to the POLite API. The PMessage type is no longer parameterised by the edge type. Edge properties are now edge states, i.e. they can be modified in the receive handler. The POLITE_MAX_FANOUT macro has been dropped. A new macro POLITE_NUM_PINS is provided, which is 1 by default (applications requiring more than one pin per device will need to set this accordingly). The mapEdgesToDRAM variable has now been split in two: mapOutEdgesToDRAM and mapInEdgesToDRAM (there are now routing tables at both sender and receiver sides).

See the Tinsel documentation and API listings for more details.

Tinsel 0.6

11 Apr 07:52

This release enables the Tinsel overlay to operate over multiple POETS boxes, so applications can utilise all the FPGA boards in a multi-box POETS system. Any sub-mesh of boxes can be used by an application, and applications using non-overlapping sub-meshes can run independently, i.e. concurrently and sandboxed from one another.

The release introduces some software changes to the HostLink and DebugLink APIs (which are not backwards-compatible) and a few hardware tweaks (none of which affect the Tinsel API).

The main hardware change is that the coordinates of each FPGA board, and the number of boards being used by an application, are now settable from software, over HostLink/DebugLink.

Each FPGA bridge board now has two connections to the FPGA mesh. All bridge boards are connected at the east and west sides of the FPGA mesh.

The main software changes are:

  • A new Board Control Daemon (boardctrld) that runs on each box, forwarding DebugLink traffic to/from the individual UARTs via a TCP port.

  • There is now a version of the HostLink constructor that lets you specify the number of boxes you wish to use. (Specifically, you provide the dimensions of the box sub-mesh.)

  • The array of DebugLinks (one DebugLink per FPGA) in the HostLink class is now replaced by a single DebugLink to all FPGAs in the system.

  • DebugLink methods are now parameterised by the mesh coordinates of the board you want to talk to.

Although these changes are not backwards-compatible, they are fairly straightforward. See the Tinsel documentation and API listings for details.
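As a rough illustration of the non-overlapping sub-mesh condition, the sketch below checks whether two rectangular sub-meshes of boxes share any box. The `BoxSubMesh` struct and both functions are hypothetical; in particular, the origin coordinates are an assumption made for the example, since the notes above only say that the HostLink constructor takes the sub-mesh dimensions.

```cpp
// Hypothetical description of a rectangular sub-mesh of boxes: the origin
// (x, y) of its first box plus its dimensions. Not a HostLink type.
struct BoxSubMesh {
  int x;
  int y;
  int width;
  int height;
};

// Two rectangles overlap unless they are separated along X or along Y.
bool overlaps(const BoxSubMesh& a, const BoxSubMesh& b) {
  bool disjointX = a.x + a.width <= b.x || b.x + b.width <= a.x;
  bool disjointY = a.y + a.height <= b.y || b.y + b.height <= a.y;
  return !(disjointX || disjointY);
}

// Applications can run at the same time only on non-overlapping sub-meshes.
bool canRunTogether(const BoxSubMesh& a, const BoxSubMesh& b) {
  return !overlaps(a, b);
}
```

For example, a 2x1 sub-mesh at the origin and a 1x1 sub-mesh starting at x = 2 do not overlap, so under this model the two applications could run side by side.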

This release also includes early support for custom accelerators.

Tinsel 0.5

08 Jan 11:15

This release provides a new DE5-Net worker image and a new DE5-Net bridge image. The new worker image provides:

  • 64 cores
  • 16 threads per core
  • (1024 threads in total)
  • 16 mailboxes
  • 16 caches
  • 16 floating-point units
  • 2D network-on-chip
  • two DDR3 DRAM controllers
  • four QDRII++ SRAM controllers
  • four 10Gbps reliable links
  • a JTAG UART
  • hardware idle-detection (new)

The clock frequency is 250MHz and the resource utilisation is 61% of the DE5-Net.

The POLite frontend has been updated to use the new idle detection feature for termination detection and (optional) synchronous execution. More POLite benchmarks have been added and the PDevice interface has been generalised to allow edge weights.

Performance counters for cache hit/miss and CPU utilisation have also been added.

Tinsel 0.4

10 Sep 15:44

This release provides a new DE5-Net worker image. The new worker image provides:

  • 64 cores
  • 16 threads per core
  • (1024 threads in total)
  • 16 mailboxes
  • 16 caches
  • 16 floating-point units
  • 2D network-on-chip (new)
  • two DDR3 DRAM controllers
  • four QDRII++ SRAM controllers (new)
  • four 10Gbps reliable links
  • a JTAG UART

The clock frequency is 250MHz and the resource utilisation is 135K ALMs, 58% of the DE5-Net. Notably, the DE5-Net's off-chip SRAMs are now memory mapped into the Tinsel address space.
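The figures quoted for the worker image can be cross-checked with a little arithmetic. The sketch below uses only numbers from these notes. The per-mailbox values assume cores are spread evenly over mailboxes, and the implied device capacity is an estimate derived from the rounded 58% figure, not a datasheet value.

```cpp
// Figures taken from the worker image description above.
constexpr int kCores = 64;
constexpr int kThreadsPerCore = 16;
constexpr int kMailboxes = 16;

// Total hardware threads on one board.
constexpr int kThreads = kCores * kThreadsPerCore;
static_assert(kThreads == 1024, "matches the '1024 threads in total' bullet");

// Assumption: cores (and hence threads) are shared evenly between mailboxes.
constexpr int kCoresPerMailbox = kCores / kMailboxes;      // 4
constexpr int kThreadsPerMailbox = kThreads / kMailboxes;  // 64

// Resource utilisation as quoted: 135K ALMs is 58% of the device.
constexpr double kUsedALMs = 135000.0;
constexpr double kUtilisation = 0.58;

// Device capacity implied by those two (rounded) figures.
constexpr double impliedTotalALMs() { return kUsedALMs / kUtilisation; }
```

Dividing 135K by 0.58 suggests a device with roughly 233K ALMs. Because the percentage is rounded, this is only a ballpark figure.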

The POLite frontend has also been extended to use the off-chip SRAMs and to support more efficient multicasting, although there is still no support for R-devices.

This release also includes tweaks that make the power-reset mechanism more robust.

Tinsel 0.3

11 Jun 09:53

This release provides two DE5-NET images: worker and bridge.
The worker image uses around 50% of the DE5-NET and provides:

  • 64 RV32IMF cores
  • 16 threads per core
  • (1024 threads in total)
  • 2 DDR3 controllers
  • 16 caches
  • 16 mailboxes (+ NoC)
  • 16 FPUs
  • 4 10Gbps reliable links
  • a JTAG UART

The bridge image allows a PC to be connected to a network of worker
boards. It provides:

  • 1 PCI Express core
  • 1 10Gbps reliable link

The release also includes the following software:

  • Kernel driver giving bridge DMA to host memory
  • PCIe stream daemon providing stream abstraction over sockets
  • HostLink API to load programs, read/write memory, inject/receive messages
  • Sample POETS frontends: POLite and Synch
  • Sample POETS applications: heat transfer, path finding, sorting
  • Various programs for benchmarking and testing

The release has been tested on a prototype 4-board box and works with
various software stacks and applications from Imperial.

Tinsel v0.2

12 Apr 17:13

This release provides a single-board configuration for Terasic's DE5-NET with:

  • 64 RV32IM cores
  • 16 threads per core
  • (1024 threads in total)
  • 2 DDR3 controllers
  • 16 caches
  • 16 mailboxes
  • a JTAG UART

It uses under 40% of the DE5-NET so there's plenty of space to add interboard comms, floating-point, and more cores in the near future.

Also included are a Tinsel version of the heat diffusion application used in the SpiNNaker project and a selection of micro-benchmarks.

The hardware is unchanged since the January 2017 release candidate, but the accompanying software and documentation have been improved.

Tinsel v0.2 RC

17 Jan 23:08
Pre-release

This release candidate includes a single-board configuration for Terasic's DE5-NET with:

  • 64 RV32IM cores
  • 16 threads per core
  • (1024 threads in total)
  • 2 DDR3 controllers
  • 16 caches
  • 16 mailboxes
  • a JTAG UART

It uses under 40% of the DE5-NET so there's plenty of space to add interboard comms in the near future.

Also included is a Tinsel version of the heat diffusion application used in the SpiNNaker project.