Skip to content

Protocol

paidforby edited this page May 11, 2020 · 20 revisions

LoRa Layer 2 Networking (LoRa Mesh)

A LoRaLayer2 (LL2) network is a hetreogenous wireless mesh network that utilizes the LoRa modulation scheme for point-to-point or point-to-multipoint connections between nodes. The following document describes the LoRaLayer2 protocol developed for use on LoRa mesh networks. The LoRaLayer2 protocol is a minimal distance vector routing protcol that incorporates elements of dynamic source routing (DSR).

Outline:

Existing Work

The LoRaLayer2 protocol is heavily reliant on existing LoRa technology and development.

Physical properties

LoRa refers to a proprietary modulation scheme developed by Semtech. For more info, the physical modulation was reverse engineered by Matt Knight, https://www.link-labs.com/blog/what-is-lora. The modulation scheme should not be confused with LoRaWAN, the protocol stack for the Things Network, https://www.link-labs.com/blog/what-is-lorawan.

Mesh routing protocols

There are a number of mesh routing protocols and alogrithms that have been implemented for TCP/IP networks. The work done on the LoRaLayer2 routing protocol is particularly inspired by jech's work on the Babel routing protocol.

Layer 2 protocol

There are a few network stacks already being devleoped for LoRa-based networks that are worth looking into,

  • Things Network - star topology, mostly for sensor networks
  • Reticulum - pre-alpha, may be capabale of ad-hoc style meshing, physical layer agnostic
  • 6LoWPAN - IPv6 network stack for low power wireless personal area networks, "industry" standard (i.e. little publicly-available documentation), related to Thread
  • IEEE 802.15.4e - MAC amendement to IEEE standard for wireless personal area networks (WPANs)
  • PJON - an open-source network protocol able to connect devices using most physical layers and media, some LoRa support

Goals, Challenges, Assumptions

Below are some of the initial goals, known limitations, and baseline assumptions for building the LoRaLayer2 protocol.

Desired capabilities

Implement a network stack for LoRa mesh networks that is capable of,

  • Communication between any two nodes on network via LoRa, without the need for a server or gateway.
  • Distance-vector, loop-avoiding, routing of packets on network with limited knowledge of network topology
  • Minimal design that could be easily reimplemented on many devices (e.g. Arduino microcontrollers, OpenWrt routers, Linux computers)
  • Connectionless protocol that does not require acknowledgements, so as not to over use air space

Challenges and limitations

Network structure

The structure of a LoRa mesh network is assumed to be x number of devices (or nodes) with a single radio transceiver interface that transmits messages omnidirectionally such that, under ideal conditions, all nodes should receive every message transmitted by a neighboring node. LoRa packets take a relatively long time (200ms - 1s) to be transmited, therefore the duty-cycle of devices should be proportional to the number of devices on the network (or number of neighbors?) to guarentee a certain RF air quality.

Network stack

This article deals mostly with the Layer2 aspects of the network, for an overview of the full stack used by disaster radio nodes, see Layered Model.

LoRa airtime

The airtime of a LoRa packet means the time it takes for a LoRa transceiver to transmit a single packet. Due to LoRa's modulation scheme it takes a fairly long time to transmit a packet. This transmit time (or packet airtime) is a factor of size of the packet being sent and a few transceiver settings that are set in advance and for our purpose represent constants. LoRa airtime for packet of size N can be calculated as follows,

airtime(N) = 8 + (max(ceil((8 * N - 4 * SF + 28 + 16 - 20 * (1 - HD)) / (4 * (SF - 2 * DR))) * CR, 0));

where, N is the number of bytes in the packet's payload
SF is the spreading factor (between 6-12) for our simulations this is currently set to 9
HD is a header bit, 0 for implicit header, 1 for explicit header
DR is a bit for low data optimization, 1 when enabled, 0 when disabled
CR is the coding rate (1 corresponding to 4/5 and 4 to 4/8)

adapted from page 31 of the Semtech's SX1276 documentation

Addressing

There are two possible ways of looking at addressing on a LoRa mesh.

  1. single node addressing
  2. multicast addressing

Single node addressing implies that messages are sent with a single destination address. Every disaster radio node has a unique 4 byte address. Intitially, we are using the last 4 bytes of MAC address of the nodes WiFi interface as its unique address. This should be unique across the network since it corresponds to the last byte of the OID and the three bytes of device ID which are given to the ESP32 mircocontroller by Espressif. If we begin supporting other microcontrollers we may need to reevalute how addresses are determines.

Multicast addressing implies that messages are sent with multiple destinations, these destinations have a shared 4 byte multicast address. This would enable nodes to subscribe to a channel by accepting messages for a specific multicast address. This may assume prior knowledge of the multicast address, though we may choose to implement some form of discovery eventually.

There are two addresses reserved for device function,

  • FF:FF:FF:FF limited broadcast to all neighboring nodes
  • aF:FF:FF:FF packets containing routing table information meant for LL2 routing table managment

Packet Structure

A packet is the Layer 2 data structure. It is rewritten at every hop in the route, while the datagram inside of the packet remains the same over the course of the route.

Semetech's specifications for LoRa tranceivers include a header (set explicit or implicit). When set in explicit mode, a header will be sent that contains the payload's length, coding rate, and the presence of a CRC, (link to RFM doc) https://revspace.nl/DecodingLora. In implicit mode, only a preamble is sent to intitiate communication.

Currently, we are using an explicit header, though we should consider switching to implicit,

Byte 0 Byte 1 Byte 2 - 5 Byte 6 - 9 Byte 10 Byte 11 - 14 Byte 15 Byte 16 Byte 17 - 255
ttl totalLength sender receiver sequence source hopCount metric datagram

where,
ttl is the "time to live", i.e. the number of hops allowed before message is discarded by a node
totalLength is the entire length of the paket (header length + datagram length)
sender is the 4 byte address of node that is transmitting the message
receiver is the 4 byte address of the intender receiver of the packet
sequence is the global message counter of messages transmitted by sender, determines packet loss rate
source is the 4 byte address of the node that orginated the datagram hopCount is the number of hops that the sender is from the source, this is incremented by the receiver metric is the quality of the link between the sender and the source datagram is the content of packet to be interpreted by Layer 3 applications

The datagram contains a destination address as well as a one byte type indicator which are intended to be written by the Layer3 client that is creates the datagram. While not formally part of the Layer 2 packet header, the destination address is inspected by a node's LoRaLayer2 code in order to route the datagram and update the routing table.

An optional CRC may be included at the end of the payload (it's unclear how or when this is checked, it may be internal to the SX1276?).

Routing

Devices on a LoRa mesh network are assumed to only have a single omni-directional radio transceiver (though we plan on adding support for two). This means that every routing decision comes down to a single question that every node asks upon receiving a packet, Should I retransmit the datagram or not? This decision is based on a few of conditions,

  1. Has the packet exceeded it's time to live?
  2. Am I the intended receiver of the packet?
  3. Is the datagram's destination in my routing table?

LL2 routing is managed by two tables that answer two questions,

  • the neighbor table - who's around me?
  • the routing table - who's around who's around me...?

There are two options for building these tables, proactive and reactive

  • proactive routing is table-driven in the sense that it involves neighbors sharing their routing tables with one another to build a complete table of the network
  • reactive routing is packet-driven in the sense that routes are built on-demand when packets containing messages are sent

Neighbor table

The neighbor table consists of node addresses that are one hop away. It is used to keep track of neighbors for the purpose of creting metrics for the links between the local node and it's neighbors. A neighbor table entry consists of the following,

struct NeighborTableEntry{
    uint8_t address[ADDR_LENGTH];
    uint8_t lastReceived;
    uint8_t packet_success;
    uint8_t metric;
};

address is 4 byte address of neighbor node
lastReceived is the sequence number of last received message from this address used to recalculate packet loss
packet_success is the instanteanous success rate of packets from this address, equal to 255-packet_loss
metric is a 0 to 255 representation of quality of link to the neighbor address

Routing table

The routing table keeps track of all possible routes avaible from the local node. It is consulted to build the packet header when a datagram is requested to be transmitted by a Layer 3 application. A routing table entry on a node in a LoRa mesh network consists of the following,

struct RoutingTableEntry{
    uint8_t destination[ADDR_LENGTH];
    uint8_t nextHop[ADDR_LENGTH];
    uint8_t distance;
    uint8_t lastReceived;
    uint8_t metric;
};

where,
destination is the 4 byte address of end point of route
nextHop is the 4 byte address of neighbor through which end point maybe reached
distance is the number of hops from end point
lastReceived is the sequence number of last received message from destination used to recalculate packet loss
metric is a 0 to 255 representation of quality of link between the destination address and its final hop (should be average metric of entire route?)

Neighbor table entries are routing entries where the destination and next hop are the same and the number of hops is known to be one (zero hops being the localhost).
When a neighbor is found and added to the neighbor table, it is also added to the routing table as a route via itself and a single hop.

Building routes

LoRaLayer2 uses reactive routing to manage and update the routing table, though proactive routing can be enabled to help build routes more quickly.
Routes are discovered by nodes listening to every packet they "hear", even if they are not the intended recipient or next hop node. A node inspect packets for all the included addresses and decides if each address is a possible routes.

Upon "hearing" a packet from a neighbor, a node will parse the packet for the following,

  • The sender address, to update the associated neighbor table entry and one hop route.

  • The receiver address, if this is not the node's local address or the broadcast address, then the receiver address is added as a possible route of unknown quality 2 hops away (although it may eventually discover that is has a shorter, better route).

  • The source address, if this is different from the sender address, then the source address is added as possible route with a distance of hopCount + 1 and metric calculated as the weighted average of the metric between listener and the neighbor and the metric between the neighbor and the source (weighted by the number of hops).

  • The destination address, if this is not the node's local address, the broadcast address, or same as the receiver address, then this is added as possible route of unknown distance and quality.

Routing table packets

If a node operator or network installer would like their network to con verge more quickly, they can enable routing table packets to be transmitted at regular intervals. A routing table packet advertises a node's routes to it's neighbors. Routing tables packets have a ttl of 1 and are addressed to the multicast address 0xaFFFFF. The data of this packet containing n routes is structured like so,

Byte 17 to 20 Byte 21 Byte 22 ... Byte n * 6 to (n * 6)+3 Byte (n * 6)+4 Byte (n * 6)+5
routeTable[0].destination routeTable[0].distance routeTable[0]. metric routeTable[n].destination routeTable[n].distance routeTable[n].metric

On reception by a neighboring node, this data is parsed, the distance values are incremented to reflect the hop by which the routes were received, and the routes contained are compared against all exisiting routes in the node's routing table. After compairsion, the node either, adds the route to its table, updates an existing route with a new metric or better nextHop, or drops the route and does nothing because it already has a better route or the route refers to the nodes local address.

TODO: Dues to limited packet size (240 bytes of data), a routing table packet is limited to 40 routes (6 bytes/route). After discovering more than 40 routes, a node needs to send multiple packets to share its entire routing table. Rather than assuming that neighboring nodes can receive sequential packet and append them to one another, we assume the worst-case scenario of a lossy network and instead send a random assortment of 40 routes at a time. This way, it is less likely neighboring node will completely miss the existence of a route because of a poor quality link to the transmitting node (e.g. what if the neighboring node is missing every other third packet? then it might miss the entire last third of the transmitting node's routing table).

Metrics

In order to avoid over using air space (exceeding duty cycle), the routing protocol is built into normal packet structure. This way every packet can be used to update the routing table and route metrics. If a node reaches a timeout period, based on acceptable duty cycle, without transmitting it should transmit a "hello, I'm alive" packet, so other nodes do not drop it from their routing table.

For route selection, a node uses two metrics,

  • hop-count
  • packet success

A node will almost always prefer the shortest number of hops, unless the packet success rate drops below a certain threshold. This threshold can be tuned on a per node basis.

Packet success is calculated from the sequence number in the header of packets and is represented by a 0 - 255 value. If a node misses a packet from neighbor, the packet success value is decreased by 16, meaning that after 16 packets are missed packet success will be reduced to 0.

It is possible to add additional metrics as weighted averages of packet success. For development purposes, we are implementing a random weighted metric that represents potential packet loss on the network.

Multi-hop metrics are an average of the metric of each hop. They are calculated as follows

hopRatio = 1/route.distance;
metric = neighborTable[entry].metric * (hopRatio) + route.metric * (1-hopRatio)

where, route.distance is the number of hops between node X and route destination, node Z neighborTable[entry] is the neighbor table entry corresponding to node Y which sent routing packet neighborTable[entry].metric is the metric for the link between node X and node Y route.metric is the "rest of route" metric sent in routing packet representing the average metric between node Y and node Z, calculated as shown above

This way the routing metrics are recursviely calculated as a average without node X knowing the metric of every link between Y and Z.

Node States

A disaster radio node can be said to be in a constant state of simulatenous learning and forwarding. WHen a node joins a LoRa mesh network it can begin sending messages to it's neighbors immeadiately, but it has to wait to hear messages sent to nodes more than 1 hop away before it can begin routing outside of it's immediate neighbors.

Learning

While learning, a node either listens for packets from or intended for other nodes (in reactive routing) or listens for routing table packets from its neighbors (in proactive routing).
The learning phase is arguably never complete, as new nodes could appear at any moment, however, after a certain point, a node should have enough routes to begin routing packets outside of it's neighbors and therefore enter the forwarding state

After learning phase has reached an adequate point, a routing table may look like so,

Routing table example:

1 hops from 26a8f7dd via 26a8f7dd metric 195 
1 hops from 77339dd7 via 77339dd7 metric 215 
1 hops from 36110216 via 36110216 metric 214 
2 hops from e5093e8c via 77339dd7 metric 205 
2 hops from 9bd3a4aa via 77339dd7 metric 187 
2 hops from 6767e872 via 26a8f7dd metric 228 
2 hops from ecca8bd9 via 36110216 metric 244 
3 hops from 65a48843 via 36110216 metric 244 
3 hops from 89d39582 via 36110216 metric 213 
5 hops from 0bfeb0f4 via 36110216 metric 246 
6 hops from 590f4402 via 36110216 metric 210 

Forwarding

In the forwarding state, a node has discovered enough neighbors and learned enough about the network to be considered a good route by some (at least one) of its neighboring node(s). This should be consider the normal operating state of a disaster radio node, as it can now be used to send addressed message and retransmit messages addressed to nodes for which it has routes.

Convergence

Convergence implies that every node has knowledge of a route to every other node in the network.

Similar to the Babel, a TCP/IP mesh networking protocol, a node in a LoRa mesh network can be said to know the address of every node in the network and the nextHop to reach that address, but it does not know the entire topology of the network. This reduces the memory required to maintain a routing table and the time needed for convergence.

Given the physical constraints of LoRa modulation, a LoRa mesh network converges relatively slowly (approx. on order of 1 minute for every 6 hops and perhaps longer under non-ideal conditions). Attempting to increase the convergence time can result in high packet loss or, on a physical network, RF interference from neighboring nodes. For convergence to occur, a route must be relayed from one end of the network to the other. At a ~10s transmission intervals, routes may take more than a minute to traverse a 6 hop wide network.
The ideal convergence time for a LoRa mesh can be calculated as follows,

convergence(n) = (airtime + delay) * (width) * 2

where, airtime is the time is takes for node to transmit a single routing table packet delay is the time between a node transmiting routing table packets width is the maximum hops possible on the network, depends on network structure, but often one half the total nodes

For example, in a 15 node network, assuming a network width of 7 hops, a message delay of 10s, and an approximate transmit time of 200ms,

converegence(15) = (0.2 + 10) * (7) * 2 = 142.8s

TODO: Add duty cycle description

Routed Packets

After entering a forwarding state (ideally once the network has converged), routed packets can be formed by adding the destination address as the first four bytes of the datagram.

Packets must be given an intended destination by a Layer 3 application. For example, if you receive a chat message from a user at specific node, the chat app will need to reply to that message by placing the correct address in the

A datagram is formed like so,

Byte 17 to 20 Byte 21 Byte 22 - 255
destination type message

The originating node checks their routing table and finds the entry which matches the datagram's destination and then creates a LL2 packet with the corresponding nextHop as the receiver in the packet header. This header is then prepended to the orginal datagram, preserving the destination, type, and message.

When this packet is received by neighboring nodes, they check if the nextHop specified matches their local address. If it does not match, the packet is dropped. If the nextHop does match their local address, they decrement the ttl and increment the hopCount in the header and then check their routing table to find the nextHop for the packet and replace their local address in the original packet with the nextHop from their routing table. The message is then pushed on to buffer and queued to be transmitted.

Route selection

Route selection is done on node-by-node basis, there is no intention to implement source-specific routing. Each node maintains only possible routes with lowest distance to a destination in it routing table. When a node receives a routed packet that it is intended to retransmit, it will refer to it's routing table find the best nextHop. By default, a node will select the nextHop that gives the lowest distance route.

Current Implementations

Actively being developed for use on ESP32 dev boards:
https://github.com/sudomesh/LoRaLayer2

The network stack and routing protocol outlined in this wiki were originally developed in a network simulator, https://github.com/sudomesh/disaster-radio-simulator
Note: the simulator is out-of-date and does not reflect many of the latest improvements to the LL2 protocol.