
\section{Requirements \& Design Considerations}
\label{sec:requirements}
\label{sec:design}
This section captures the design requirements considered throughout the \bus
design process and the rationale for the tradeoffs ultimately made.

\paragraph{Define bus PHY, MAC and application logic interface}
We take a vertical integration approach, controlling more of the complete
stack in an effort to obtain every available ounce of energy efficiency within
our other design constraints.

\paragraph{Must be fully-synthesizable}
In an effort to make \bus ubiquitous, it is important that other designers can
easily add \bus to their systems/chips. In practice, this means a
fully-synthesizable design such that simple, pure Verilog can be given to any
individual wishing to implement \bus. This provides added advantages in
pre-silicon testing and validation.

\paragraph{Optimize for synthesis simplicity over speed}
Synthesizability is paramount. As an example, standard cell libraries contain
both positive and negative edge triggered flip-flops, but a dual-edge
flip-flop is not available in many processes. It is important then that no
element of \bus design require an implementation that double-clocks flops.

\paragraph{Must be low-power, low-leakage}
The \bus was designed for the Michigan Micro Mote~(M3) project. The system is
designed to run off a 0.5~$\mu$Ah battery. To serve as a feasible
communication bus, the power consumption must be on the order of
picowatts~(pW).

This extreme power constraint eliminates traditional analog open-collector
style designs as options. Sufficiently low-leakage circuits would have
untenable drive strength requirements and/or rise time constraints. The
previously stated portability requirements eliminate custom crafted analog
elements as an option, restricting \bus design to a purely digital approach.
%XXX: Ballpark expected transaction sizes, frequency, etc -- actual energy budget.

\paragraph{Support Multi-master operation / bus arbitration / avoid race
conditions}
As every layer is capable of entering an extreme low-power state, any layer
must be capable of waking the system. The waking layer must also be capable of
communicating (i) who woke the system and (ii) why. While such a system could
query (ii) given (i), determining who woke the system requires either a
complex dedicated controller or that the waking layer simply announce itself.
A multi-master bus greatly simplifies the software design and allows for
things such as DMA from an imager chip to a radio chip without necessarily
requiring CPU intervention.

Once multi-master is agreed upon, some form of arbitration scheme must be
devised, as well as consideration for races as multiple layers attempt to
acquire the bus.

\paragraph{Minimize pins}
\paragraph{Minimize gates}
\paragraph{Limit the number of I/O pads}
The physical area of the M3 system is extremely constrained. There is only
room for four I/O {\em pads} on each layer of the system (keep in mind that
forming a net/bus still requires two {\em pads} when wirebonding).

\paragraph{Minimize timing uncertainty/latency/jitter}

\paragraph{Allow for wakeup that takes time}
As the system supports extremely low-power sleep, the bus must tolerate and
account for a lag in waking layers of the system. In practice, the first
concern is the bus controller of each layer, followed by layer-specific
concerns of how much local buffering is necessary and when and how to wake
the proper layer.

\paragraph{Active vs sleep mode; want backward compatibility}
The bus needs an idle state that consumes very little power. Ideally, each
layer not participating in the current transaction also burns very little
power.

There should also be some consideration for future-proof design, that is, the
ability to introduce a newer protocol on the bus that older controllers can
silently ignore.

\paragraph{Ensure that bus reset is possible}
Stuff happens. It is important that there exists some kind of escape hatch to
rescue / reset the bus. For deployed systems with multi-year lifetimes, the
ability to reset to a known state from nearly any possible erroneous state
becomes critical. Details of reset design are discussed
in~\ref{sec:design-reset}~\nameref{sec:design-reset}.

\paragraph{Support variable-length packet sizes}
Short packet length limits (e.g. order 4~bytes) lead to unacceptably high
protocol overheads for large (order 1~kB+) messages. An upper bound on
packet length can force complex software-level fragmentation schemes (or,
worse, hardware-level ones if the bus packet length $<$ message buffer).

A leading length field with a reasonably large upper bound would then be
acceptable, although not ideal. The goal is to allow for true arbitrary length
packets if possible.
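
As a rough illustration of the protocol overhead that short packets incur (a
sketch under stated assumptions, not a figure from the \bus specification),
let $h$ be the fixed per-message cost in bit-times and $p$ the payload length
in bits. The value $h = 21$ used here is an assumption: it sums the
arbitration~(4), short address~(8), interrupt~(6), control~(2) and idle~(1)
terms tallied in the back-off footnote later in this section. The fraction of
bus time spent on payload is then
\[
  \eta(p) = \frac{p}{p + h}, \qquad
  \eta(32) = \frac{32}{53} \approx 0.60, \qquad
  \eta(8192) = \frac{8192}{8213} \approx 0.997 .
\]
Under the same assumption, fragmenting a 1~kB message into 4~byte packets
costs $256 \times 53 = 13\,568$ bit-times against $8213$ for a single
message, roughly 65\% more bus time.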

\paragraph{Consider automatic address space management protocol?}
There is an extensibility / simplicity trade-off. I2C only allowed for 128
addresses, which for an individual bus instance was seen as more than enough
(still largely true given the loading requirements, though repeaters /
extenders make it less so). The real issue, though, is address space
collisions between generic components. Building a plug-and-play bus from
arbitrary pieces requires the whole corpus of parts to have non-conflicting
addresses. This motivates extensibility.

\bus strikes a balance in its address design, using short, 8~bit addresses in
the common case but defining a full, 32~bit address to resolve address space
collisions. Addressing details are discussed
in~\ref{sec:spec-address}~\nameref{sec:spec-address}.

\paragraph{Allow interposing of the bus to read out / insert data from
external source}
Sniffing pretty much any bus design is easy, but adding devices post-hoc is a
harder requirement. There is great motivation for transient connection of
devices (programmer, debugger, etc), and also some motivation for the ability
to permanently add new devices to an existing bus.

As \bus employs a ring style connection, injecting a new device into an
existing bus may not be trivial. Interposition of external devices is not a
requirement that \bus itself defines; it is noted here simply as a desirable
capability that system implementers should plan for. For one example of bus
injection with minimal pad/pin overhead, consult the M3~Implementation
document.

\paragraph{All wires must be uni-directional}
Bidirectional analog hardware is complex and not synthesizable.

\paragraph{Granularity}
\label{sec:design-granularity}
Transmissions are required to be a multiple of 8~bits in length. The
motivation for this is largely a function of the arbitrary nature of the last
two bits received. If truly arbitrary lengths were allowed, an interruption
occurring after 9~bits were transmitted could be any of 0, 1, or 2~bytes long
(0~bytes: out-of-band interruption, 1~byte: End~of~Message where receiver
precedes transmitter and thus discards two bits, 2~bytes: End~of~Message where
receiver follows transmitter and thus all 9~bits are valid).

This hypothetical receiver's design is further complicated if the receiving
node is (locally) clockless. The bus interface cannot reliably hand off any
bytes until it receives the \textit{\nth{11}} bit. It cannot reliably learn
that it has received two bytes until the Begin~Control edge. There are only
three bus clocks between that edge and Idle, and only one edge until the
receiver would be obligated to ACK or NAK. That is a very challenging design
constraint to accept merely for the ability to occasionally omit up to 7~bits
from a message.

\paragraph{Endianness}
\subparagraph{Byte-Level}
No specific design consideration here per se. Given that \bus supports
messages of an arbitrary length, it simply felt more logical to send bytes
from 0{\ldots}N instead of N{\ldots}0.
\subparagraph{Bit-Level}
Again an arbitrary decision. The first test implementation elected to send
the MSB first, and that choice was kept.

\paragraph{Retransmission}
Hardware retransmission carries risks of accidentally locking up the bus,
endlessly retransmitting. While ideas such as a maximum retry count (SMBus)
mitigate this, the added complexity and risk of endless hardware
retransmission tip the scales in favor of pushing the retransmission decisions
to software.

To provide the software layer with more information, we choose to require the
indication of the number of bytes actually transmitted. While transient faults
could certainly occur, it is reasonable that the software can infer different
{\em probable} causes from the amount of the message that was transmitted.

\paragraph{Flow Control}
Any form of flow control requires communication {\em mid-transmission} between
the transmitting and receiving node. This could be accomplished by
periodically (e.g. every word) switching roles, but this introduces a large
amount of complexity. Another solution is to enable some form of clock
stretching, but these designs are either analog in nature (e.g.  I2C) or
prohibitively complex to implement.

As \bus supports arbitrary interruptions, a simpler design is possible.
If a receiving node's RX buffer is overrun, it interrupts the bus and
indicates a buffer overrun. Not interrupting the bus acts as an {\em implicit}
flow control.
In practice, transmitting nodes should consider sending some form of {\em
ping} message to a destination node to validate its presence before sending a
large transmission to minimize waste if the destination node is not actually
present. \bus is not designed for a changing topology; as such, discovery of
this nature need only be performed once and the amortized cost is considered
negligible.

\paragraph{Acknowledgment granularity}
As \bus supports messages of arbitrary length, a natural question arises for
the granularity of acknowledgments. At one extreme, designs such as I2C elect
to acknowledge every byte. \bus takes the opposite approach, providing only a
single acknowledgment at the end of a transmission in an all-or-nothing
fashion. The rationale for this decision again returns to simplicity and
efficiency in design. While a receiver-driven acknowledgment cycle could
easily be added every $n$th bit, the relative merits of this are not great.
Assuming a receiver exists on the bus, by not electing to Interrupt
transmission, a receiver implicitly ACKs every bit sent. A receiver that has
occasion to NAK a transmission may do so at any time during transmission. The
obvious weakness of the implicit ACK is an absent receiver; however, \bus
expects a static topology and thus this is not considered a significant
concern.

\paragraph{Streaming}
Some bus designs incorporate a ``streaming'' mode, which allows for a series
of physical bus transactions to be considered one contiguous message at the
receiving layer. As \bus allows for arbitrary length messages, such a feature
becomes largely unnecessary, and \bus defines no streaming primitive.

\paragraph{Avoid ratioed-logic, avoid timing constraints}

\paragraph{Idle State Should Be High}
For designs that aggressively power-gate components, it is easier to build
designs that pull low in minimal power modes than pull high. As a consequence,
the Idle state in \bus elects to leave all lines high.
%We probably don't have to mention this, but it's important that they are high
%since we can pull down a node with minimum power consumption (vs pulling a
%node high). When a node pulls low, the pmu is still in standby mode so,
%there's not much power.  I'm not sure if we want to get into the power budgets
%of the nodes in different modes - it will depend on the pmu design...

\subsection{Reset Design}
\label{sec:design-reset}
The reset mechanism is the \bus escape hatch. Its reliability is critical such
that no matter what state \bus nodes are in, they can all be reliably forced
back to a known state. Special care must be taken that no assumptions are made
about the state of any node in the system. As an example, in normal operation
only one node should ever not be forwarding. The reset mechanism, however,
must correct for more broken cases where any N nodes are not forwarding and
return all nodes to a common and known state.

\subsubsection{The Reset Detector}
\bus's reset detector is advantageous in that it is completely isolated from
the \bus state machine. To reset \bus, the Interrupt sequence is entered. This
entry saturates, allowing nodes to be `endlessly-interrupted' until all of the
nodes have been interrupted. A misaligned state machine, or a timing glitch,
or other failures related to normal operation cannot affect the interrupt
entry detection, as the detector is an isolated, dedicated circuit. The exact
requirements for the Interrupt detector are discussed
in~\ref{sec:bus-interjection}~\nameref{sec:bus-interjection}.

\subsubsection{Hung Nodes}
\label{sec:reset-hung}
We define ``hung'' as an erroneous state from which a node would never depart
without external intervention. We are not concerned with how a node got into
an erroneous state (a pathological series of SEUs, for example); the focus is
on how a node extricates itself.

\paragraph{Busy During Bus Idle}
In this state, the \bus is Idle, but a node believes there is still a
transmission taking place and will wait forever for a new clock edge,
constantly reporting the \bus as busy.

The window for entering this state is small, as a node must err during one of
the bits in the control sequence. A node will be stuck in this state until
another transmission occurs. The behavior of an erroneous node is not
well-defined.

\begin{quote}
\textit{Aside:}
A node with its own local sense of time could include a mechanism that bounds
the period of time without a pulse on the clock or data lines, but such a
mechanism is outside the scope of this document. The \bus neither prohibits
nor requires clock stretching. A minimum clock period is specified as a
function of \bus topology, but a maximum is not. Designers building such an
escape mechanism must therefore be conscious of the implementation details of
the system's master node.
\end{quote}

\paragraph{Endless TX}
\label{sec:reset-hung-endless-tx}
If a transmitter enters an error state and never interrupts the bus, other
nodes will not be able to distinguish an endless transmission from a
transmission with a long string of {\tt 0}'s (or {\tt 1}'s). \bus relies on
the controller to detect the error.  While \bus does allow for arbitrary
message lengths, a master node must define a maximum message length for a
particular \bus instantiation\footnote{
  It is suggested that this value be programmable, for maximum flexibility.
  }. The master node counts the number of bits received and can forcefully
reset the bus if necessary. Address bits count towards this limit. Interrupt
entry edges do not count towards this total as the rising clock edges will not
be seen by the master node's {\tt CLK\_IN}. The limit is not intended ever to
be reached during correct \bus operation, and thus should be set well above
the maximum expected message size. The absolute minimum limit for a
conforming \bus is 1~kb ($1024 - \mathit{addr\_len}$ data bits).
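
For concreteness, with the two address lengths that \bus defines (8 and
32~bits), the data capacity at this minimum limit works out to
\[
  1024 - 8 = 1016\ \mathrm{bits} = 127\ \mathrm{bytes}, \qquad
  1024 - 32 = 992\ \mathrm{bits} = 124\ \mathrm{bytes}.
\]
Both are whole numbers of bytes, consistent with the byte granularity required
of transmissions.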

\subsection{Power Gated Nodes}
\label{sec:design-power-gated}
As \bus is designed for extremely low-power systems, it is reasonable to
address the needs for the coordinated wake-up of power gated nodes. A power
gated device requires a series of well-organized steps to awake in a known
state. This waking is much like a standard chip power-on-reset sequence, only
\bus design seeks to avoid imposing custom delay elements on the construction
of a node's bus controller.

The following sequences are designed for nodes that are possibly receiving the
transmission. Waking the transmitting node is left to alternative means by the
implementation, with the tacit assumption that if a node has something it
wishes to transmit, it is probably already awake\footnote{
  Though, a design that wished to exploit these pulses could take advantage of
  the spurious wakeup rules described in~\ref{sec:spec-spurious}. Allow the
  first transmission to fail and use the available pulses to wake itself. This
  design would still require some care, as the node would have to begin
  forwarding its data line again before the arbitration pulse. Using the
  falling edge of the clock line as the transition from pulling data low to
  forwarding would be sufficient to achieve this goal.
}.

The generalized wake-up sequence for power-gated devices requires up to four
signals:
%
\begin{itemize}
  \item Power on (remove power-gating)
  \item Warm up local oscillators
  \item De-assert Reset
  \item Remove isolation
\end{itemize}
%
Power gated \bus member nodes are able to use the bus's clock line to provide
all four of these edges. The first four positive clock edges of a \bus
transmission are: arbitration, priority drive, priority latch, and begin
transmission, none of which require receiver intervention. The fifth clock
edge latches the first bit of the transmitted address. The implication is that
a node using \bus clock edges to wake from power-gating must reset into a
state that is prepared to receive the first address bit.

Sleeping a node is a simpler process, requiring only two signals:
%
\begin{itemize}
  \item Isolate device
  \item Power off (power-gate)
\end{itemize}
%
By the time a node is sending (or forwarding) the second control bit, it has
decided whether it will be putting itself to sleep or not. If a node intends
to sleep, it can use the clock pulse that latches the second control bit and
the pulse that formally transitions the bus to idle as the two required edges.

\subsection{Design Caveats}
Here we draw attention to some of the systems-level issues that users of \bus
may encounter.

\subsubsection{Broadcast Message ``Acknowledgment''}
For point-to-point messages, \bus acknowledgments provide a strong guarantee
to higher layers:
%
\begin{quote}
\textit{The message was received in full or the message was not received at
all.}
\end{quote}
%
For broadcast messages, the \bus semantics are weaker:
%
\begin{quote}
\textit{The message was received in full \uline{by at least one node} or the
message was not received at all \uline{by any node}.}
\end{quote}
%
Systems cannot rely on a ``successfully''
broadcast transmission as {\em guaranteeing} that every node has received the
message.  If an absolute guarantee is required, systems must build one atop
point-to-point messages.


\subsubsection{Interruptions: Arbitrary Length Messages and Forward Progress}
\bus design provides a powerful tool: the ability for any node to interrupt
any message at any point. Such a mechanism invites risk of live-lock on the
bus, however.

As a first example, consider two nodes that simultaneously wish to send an
interrupt-worthy high-priority message. One node will win arbitration and
begin transmitting, only to be immediately interrupted by the other. The two
nodes will continuously interrupt one another in an effort to send their very
high-priority message, and neither will ever succeed.

Another form of live-lock stems from periodic low-latency messages. Consider a
system with a timer chip that needs to generate periodic 100~ms pulses that
another chip will need to respond to in a bounded latency. The responsiveness
requirement will place an interrupt-class priority message on the bus every
100~ms. On a 400~kHz instantiation, if another node (say a camera node)
attempts a bulk transfer (an image) it will only be able to send approximately
40~kilo{\em bits} before being interrupted. A picture over 5~kB will never
send.
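
As a rough check of these figures (assuming one bit per bus clock cycle and
ignoring arbitration, addressing and control overhead; the symbols
$f_{\mathrm{bus}}$ and $T_{\mathrm{int}}$ are introduced here for illustration
only), the longest transfer that fits between two interruptions is
\[
  f_{\mathrm{bus}} \times T_{\mathrm{int}}
  = 400\,\mathrm{kHz} \times 100\,\mathrm{ms}
  = 40\,000\ \mathrm{bits} = 5\,\mathrm{kB}.
\]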

As the only physical bus available for a complex system, \bus is required to
support both timing-critical messages and large data transfers. To provide
sufficient flexibility to efficiently handle both types of messages while
maintaining very simple node architectures (e.g. no suspending transfers),
\bus pushes additional considerations to the systems design.

\bus provides both some requirements and some guidelines to help system
designers:

\paragraph{Requirements} These are elements of the \bus specifications
designed to help systems with bus contention issues.

\begin{itemize}
  \item{The first 32~bits of a message may not be interrupted}
  \begin{itemize}
    \item There is no means in \bus arbitration to ensure all nodes can
          distinguish priority and regular messages\footnote{
            e.g. in \cref{fig:arbitration}, Node~2 would have seen the
            exact same signals if Node~3 had not participated at all.}.
          As such, \bus guarantees that at a minimum 32~bits of data may be
          transmitted without
          interruption~(\ref{sec:spec-interrupt}~\nameref{sec:spec-interrupt}).
  \end{itemize}
\end{itemize}

\paragraph{Guidelines}
These guidelines operate under the assumption that a priority message is of
sufficiently high priority that it would warrant interrupting an active
transmission. Without preemption, the livelock scenarios are averted; however,
the issues of starvation remain.

\begin{itemize}
  \item{Priority Message Length}
  \begin{itemize}
    \item Priority messages are permitted to be any length, {\em however}, only
          the first 32~bits---even for priority messages---are protected from
          interruption. A design that sends two interrupt-worthy high-priority
          messages longer than 32~bits {\em will} livelock.
    \item \bus strongly {\em recommends} that all priority messages be
          restricted to 32~bits. With great care and caution designers may
          violate this constraint.
  \end{itemize}
  \item{Priority Message Frequency}
  \begin{itemize}
    \item As priority messages also have a topology-dependent component, \bus
          advises that after sending a priority message a node back off from
          sending {\em any} message (regular or high priority) for at least
          80\footnote{
            Arbitration~(4) + $t_{long}$~(2) + address~(8~or~32) + data~(32) +
            interrupt~(6) + control~(2) + idle~(1) =~79 with a full 32~bit
            address (55 with a short 8~bit address).
          }~bit-times per \bus node.
    \item This is not a formal \bus requirement as it is a recognized design
          constraint that not all member nodes are required to have any sense
          of time. Furthermore, it is believable that such time-less nodes may
          be sensors that require a low-latency response. It is not the
          intention of a bus design to define how such an issue is dealt with.
          It is, however, then the obligation of the designer of such a sensor
          to carefully consider how to be a good \bus citizen and how to avoid
          saturating the system's only communication bus.
  \end{itemize}
  \item{Normal Message Length}
  \begin{itemize}
    \item On a system with interrupts
    \begin{itemize}
      \item The first concern must be successful forward progress in the face
            of interrupts. Long messages must be packetized (most
            pathologically down to 32~bit windows, less headers). Future \bus
            designs consider more amenable solutions to this drawback,
            see:~\ref{sec:todo-extensions-resume}~\nameref{sec:todo-extensions-resume}.
      \item In practice, consider the shortest interval between interrupting
            messages your system may encounter. Divide this window by the
            bit~time (equivalently, multiply it by the bus clock frequency)
            and halve the resulting value to get an approximate maximum
            packet size in bits.
    \end{itemize}
    \item On a system without interrupts (or with infrequent ones)
    \begin{itemize}
      \item Long messages cause starvation. You must consider the acceptable
            average latency and use this coupled with the bus speed to define
            a reasonable maximum message size.
      \item Frequent short messages can cause starvation. While nodes are not
            required to have a sense of time, those that do should back off
            for a window of time between messages. Nodes without a sense of
            time should be designed with avoiding flooding in mind.
    \end{itemize}
  \end{itemize}
\end{itemize}
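
As a worked sketch of the guideline arithmetic above (illustrative only; the
symbols $f_{\mathrm{bus}}$ and $T_{\mathrm{int}}$ and the four-node ring are
assumptions introduced here), reuse the 400~kHz bus and 100~ms interrupt
interval from the earlier live-lock example, assuming one bit per bus clock
cycle. Expressing the interrupt window in bit-times and halving it suggests a
maximum packet size of about
\[
  \frac{T_{\mathrm{int}} \times f_{\mathrm{bus}}}{2}
  = \frac{100\,\mathrm{ms} \times 400\,\mathrm{kHz}}{2}
  = 20\,000\ \mathrm{bits} = 2.5\,\mathrm{kB}.
\]
The suggested priority back-off of 80~bit-times per node would, on a
hypothetical four-node ring, amount to $80 \times 4 = 320$ bit-times, that is
\[
  \frac{320}{400\,\mathrm{kHz}} = 0.8\,\mathrm{ms},
\]
which is under 1\% of the 100~ms interval.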