Experiments in mesh networking with Rust and STM32WL LoRa chips

Published on July 24, 2024.

Filed under: lora.

Using a radio on a hill

Testing on a warm day in Edinburgh, Scotland. About the first time it didn't rain while testing something outside.

I've written some experimental software which has enough functionality to talk to people who are running the Meshtastic software. I encountered a few problems along the way, so I thought I would write this (my first blog post) in the hope it helps other people who are also interested in such a thing. It's maybe a bit information dense, but I hope it is useful and interesting.

Why?

When I was at school I used to love exploring packet radio bulletin boards in the UK. Hobbyists set up computers which linked via radio to other hobbyists. These in turn linked to others and so on. I found it fascinating that I could talk to people on the other side of the country, and sometimes across the Atlantic via satellites - all without telephone lines or phone bills.

What could be better to a young nerd than a semi-underground network of information that was inaccessible to teachers!

I've since had crazy ideas about geographic routing algorithms and things, but never got round to experimenting with them. I had some LoRa radios left over from a previous project (I built gateways to send LoRaWAN water level measurements via satellites) and thought it was time to put these to use.

I've wanted to learn some more embedded Rust software development, so implementing a Meshtastic-compatible client in Rust seemed like it would be a good learning exercise.

Meshtastic

Meshtastic seems to be becoming popular. If I want to build on a mesh network it makes sense to talk with whatever is out there. Reticulum also uses LoRa and I'd like to experiment with it - but there are Meshtastic people around here, and I can't find anyone using Reticulum. So Meshtastic it is to begin with.

Meshtastic uses little radios (I bought a Heltec v3) which talk to an app on a mobile phone. This lets you send text messages or GPS positions to other people with these radios. No mobile phone base stations or any other infrastructure is needed.

How does it work?

The message is encrypted.
It is sent over a radio and received by other people with similar radios.
These radios form a "mesh network" and bounce the message between each other until it reaches the recipient.

Gossip protocols

Most computer networks are planned by an IT department, who spend time drawing up network maps and allocating IP addresses to each computer or server. These maps define who can talk to who, and where in buildings network firewalls and routers and cables are installed. People configure these routers to tell them where messages should go - whether to a server or other router on a different floor of the building, or via a fibre-optic cable to another company.

Meshtastic radios form what is known as an opportunistic network. Nobody is in charge of planning the network, and it can come and go as people move around carrying their radios with them. So how do you know who to talk to when you don't know who is there?

Meshtastic currently uses a mechanism sometimes known as a Gossip protocol - if you hear something from someone then shout it out again. Repeat this, and when someone tells their next-door neighbour what they saw last night, soon the whole town knows.

However - engineering is all about trade-offs. This is fine when there are a small number of people. But if everyone is continually shouting out what they've just heard you end up with a cacophony, and nobody can be heard if they try to say something new. So beyond some point this kind of network can't carry as much information as one where people take turns to talk and the number of people retransmitting the latest rumour is limited. So this kind of algorithm runs into problems as the network grows.
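To make the "shout it out again" idea concrete, here is a minimal sketch of a flooding rule in Rust. It is not the code from this project or from Meshtastic: the `Packet` and `Flooder` types are hypothetical, and it assumes each packet carries a sender, an identifier and a count of remaining hops. A node repeats a packet only the first time it hears it, and only while hops remain.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified packet: only the fields needed for flooding.
#[derive(Clone, Debug, PartialEq)]
pub struct Packet {
    pub sender: u32,
    pub id: u32,
    pub hops_left: u8,
}

/// Remembers which (sender, id) pairs this node has already heard.
#[derive(Default)]
pub struct Flooder {
    seen: HashSet<(u32, u32)>,
}

impl Flooder {
    /// Returns the packet to retransmit, or None if it should be dropped.
    pub fn on_receive(&mut self, packet: &Packet) -> Option<Packet> {
        // insert() returns false when the pair was already present,
        // i.e. we have heard this rumour before and stay quiet.
        if !self.seen.insert((packet.sender, packet.id)) {
            return None;
        }
        // Out of hops: keep the packet for ourselves but don't repeat it.
        if packet.hops_left == 0 {
            return None;
        }
        Some(Packet {
            hops_left: packet.hops_left - 1,
            ..packet.clone()
        })
    }
}
```

The `HashSet` grows without bound, which is fine for a desktop experiment but not for a microcontroller. On a small device a fixed-size ring buffer of recent identifiers would be a more realistic choice.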

Researching algorithms which can scale to lots of people but without needing a central co-ordinator is half the fun of this kind of thing.

LoRa radio modulation

Meshtastic uses LoRa radios to communicate. LoRa modulation is a way of encoding information onto a radio signal - in a similar manner to AM and FM for broadcast radio. While FM broadcast radio is typically broadcast around the 100MHz range, LoRa often uses the 869MHz frequency band (at least in Europe - the US and other countries use the 915MHz frequency band).

Where LoRa shines compared to FM is its use of some clever maths and signal processing to be able to receive signals below the noise floor. This is like being able to hear an FM radio station beneath all the hiss - which is pretty neat. However, you don't get something for nothing - LoRa trades off the amount of information that can be sent against range and power. So LoRa signals can reach very long distances, but they can't send HD video - they're more suitable for short text messages. The default settings of Meshtastic give a data rate of around 1 kilobit/second. This is something like 1/600,000th of the rate available from a 5G mobile connection nowadays - engineering is all about trade-offs.

(As an aside, I find it fascinating that Claude Shannon was thinking of these trade-offs in 1948 before any of this was possible in practice. I just looked and his "A mathematical theory of communication" paper has 154779 citations on Google Scholar, so I have a very unlikely goal to beat with this post).

A top-down technical journey of Meshtastic

Message types

Meshtastic allows different types of messages to be sent. The type of message is identified with a "Port number". Interesting messages include:

Text messages (Port 1) - This carries a text message.
Position messages (Port 3) - This carries a latitude/longitude/altitude message, which can be plotted on a map.
User messages (Port 4) - This carries the information which is displayed for each user - such as a descriptive name and the model of radio they are using.
Routing messages (Port 5) - This carries acknowledgements to messages.
Telemetry messages (Port 67) - This carries information such as battery percentage, but can also include things like weather information.

The details of these messages are contained in the Meshtastic source.
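As a small illustration, the port numbers listed above could be written as a Rust enum. The variant names and the `from_number` helper are my own choices for this sketch, not names taken from the Meshtastic source.

```rust
/// The port numbers listed above. Variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Port {
    Text = 1,
    Position = 3,
    User = 4,
    Routing = 5,
    Telemetry = 67,
}

impl Port {
    /// Maps a raw port number to a known port, or None for anything else.
    pub fn from_number(n: u32) -> Option<Port> {
        match n {
            1 => Some(Port::Text),
            3 => Some(Port::Position),
            4 => Some(Port::User),
            5 => Some(Port::Routing),
            67 => Some(Port::Telemetry),
            _ => None,
        }
    }
}
```

Returning `Option` rather than panicking matters here, because a radio will hear plenty of packets with port numbers it doesn't understand.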

Protobufs

These messages are then encoded into binary using Protobuf encoding. There are various protobuf libraries and compilers which usually take a ".proto" file and generate code to parse these. However - as this project was a learning exercise I was curious what was involved in doing this myself - so here goes:

Protobuf messages are encoded as a sequence of Tag/Length/Value records.

Each field in the message has a tag (or a key) that indicates the field it represents. So the Meshtastic position message uses key 1 for latitude, key 2 for longitude, key 4 for time, etc.

The field can either be of a predefined type (e.g. a fixed-length number) or a LEN type. A LEN type is a sequence of bytes that can recursively contain a further protobuf message, or a text string.

The tag is usually a single byte. The bottom 3 bits encode the data type of the field (0 = VARINT, 1 = 64-bit fixed number, 2 = LEN type etc). The top bit is a "more data" flag, which leaves the 4 bits in between to encode the tag or key of the field.

However, 4 bits only allows a number up to 15 - so this is where the VARINT encoding is used. If the number fits into 4 bits - great - it all fits and we're good.

If it doesn't, the top-bit of this byte is set to indicate "more data in the varint", and the higher bits are encoded into the next byte. This can also have its top bit set, indicating another byte, and so on.

So the VARINT encoding allows small numbers to be encoded into 1 byte, or larger numbers into as many bytes as are required. This cleverly keeps messages short, while allowing flexibility.
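A minimal sketch of VARINT encoding and decoding in Rust follows. It is a stand-alone illustration rather than the state machine parser from this project, and the function names are my own. Each byte carries 7 bits of the number, lowest bits first, with the top bit meaning "another byte follows".

```rust
/// Decodes one varint from the start of `buf`.
/// Returns the value and the number of bytes consumed, or None if the
/// input ends too early or runs past 10 bytes (the most a u64 can need).
pub fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        // The low 7 bits are data; each later byte holds higher bits.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear top bit marks the final byte.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Appends `value` to `out` as a varint.
pub fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let low = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(low);
            return;
        }
        // More bits remain, so set the continuation flag.
        out.push(low | 0x80);
    }
}
```

For example, the number 150 does not fit in 7 bits, so it is encoded as the two bytes 0x96 0x01, while the number 1 is just the single byte 0x01.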

I wrote an experimental parser, implemented as a state machine. This maps a Protobuf message into a dictionary. There are functions to access the dictionary - so to access the latitude of a position message you say "Give me key 1 as an integer".

Meshtastic top-level protobuf messages generally contain the following:

A key "1" which indicates the message type/port.
A key "2" which contains a LEN type with the payload. This payload then needs to be further protobuf encoded/decoded to access the individual fields of the message - or, in the case of a text message, decoded as a UTF-8 encoded string.
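Putting the pieces together, here is a hedged sketch of pulling key 1 and key 2 out of a decrypted top-level message. It is not the dictionary-based parser described above. It simply walks the fields in one pass, and the function names are hypothetical. Fields it does not care about are skipped according to their data type.

```rust
/// Reads one varint, returning (value, bytes consumed).
fn varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Extracts the port (key 1) and payload (key 2) from a top-level message.
/// Returns None if the message is malformed or either key is missing.
pub fn decode_data(mut buf: &[u8]) -> Option<(u32, &[u8])> {
    let mut port = None;
    let mut payload = None;
    while !buf.is_empty() {
        let (key, n) = varint(buf)?;
        buf = &buf[n..];
        let field = key >> 3; // the tag or key of the field
        match key & 0x07 {
            // VARINT: the value follows directly.
            0 => {
                let (v, n) = varint(buf)?;
                buf = &buf[n..];
                if field == 1 {
                    port = Some(v as u32);
                }
            }
            // LEN: a varint length, then that many bytes.
            2 => {
                let (len, n) = varint(buf)?;
                let end = n.checked_add(len as usize)?;
                let body = buf.get(n..end)?;
                buf = &buf[end..];
                if field == 2 {
                    payload = Some(body);
                }
            }
            // Fixed 64-bit and 32-bit numbers: skip over them.
            1 => buf = buf.get(8..)?,
            5 => buf = buf.get(4..)?,
            _ => return None,
        }
    }
    Some((port?, payload?))
}
```

A text message saying "hi" would be the six bytes 08 01 12 02 68 69: key 1 as a VARINT with value 1, then key 2 as a LEN of two bytes. For a text message the returned payload can be handed straight to `core::str::from_utf8`.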

Encryption

Meshtastic messages are encrypted using AES encryption in "counter mode". A 128-bit counter is initialised with a combination of the packet identifier and sender radio address.

The message is split into 128-bit chunks.
The counter is encrypted using the encryption key and then the first 128 bits of the message are XORed with the encryption output.
The counter is then incremented and the process is repeated for the next 128-bit chunk, and so on until the end of the message.
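The steps above can be sketched in Rust as follows. This is an illustration of counter mode only, not the hardware AES code from this project. The AES-128 block encryption is passed in as a closure so the sketch stays self-contained. The counter layout (packet identifier as 8 little-endian bytes, then the sender address as 4 little-endian bytes, then 4 zero bytes) and the big-endian increment are assumptions based on my reading of the Meshtastic firmware, so check them against the source before relying on them.

```rust
/// Builds the initial 128-bit counter block.
/// ASSUMPTION: packet id in bytes 0..8 and sender in bytes 8..12, both
/// little-endian, with the last 4 bytes starting at zero.
pub fn initial_counter(packet_id: u32, sender: u32) -> [u8; 16] {
    let mut block = [0u8; 16];
    block[..8].copy_from_slice(&u64::from(packet_id).to_le_bytes());
    block[8..12].copy_from_slice(&sender.to_le_bytes());
    block
}

/// Applies counter mode to `data` in place.
/// `encrypt_block` stands in for AES-128 encryption with the channel key.
/// Encryption and decryption are the same operation.
pub fn ctr_apply<F>(encrypt_block: F, mut counter: [u8; 16], data: &mut [u8])
where
    F: Fn(&[u8; 16]) -> [u8; 16],
{
    // Work through the message in 128-bit (16-byte) chunks.
    for chunk in data.chunks_mut(16) {
        // Encrypt the counter, then XOR the result into the message.
        let keystream = encrypt_block(&counter);
        for (byte, k) in chunk.iter_mut().zip(keystream.iter()) {
            *byte ^= k;
        }
        // ASSUMPTION: increment the counter as one big-endian number,
        // carrying leftwards from the last byte.
        for b in counter.iter_mut().rev() {
            *b = b.wrapping_add(1);
            if *b != 0 {
                break;
            }
        }
    }
}
```

Because the message is only ever XORed with the encrypted counter, running `ctr_apply` a second time with the same key and counter recovers the original bytes. This is also why the last chunk doesn't need padding.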

I had trouble figuring out what the encryption key is.

The default Meshtastic encryption key for the "Long Fast" channel is described as "AQ==". This looked like a base64 encoded value to me - which decodes to the single byte 0x01. So how to pad this for the other 120 bits? I tried various options (e.g. 0x00000....) but nothing worked. I was a rubbish hacker! Even with the encryption keys I couldn't decrypt the secrets! Grr.

After a bit of hair pulling and online searching it turns out the full key is actually 1PG7OiApB1nwvP+rz05pAQ==. Base64 decoding this led to the following key code - which decrypted a message! Hurrah. I'm a hacker after all!

let key: [u8; 16] = [0xd4, 0xf1, 0xbb, 0x3a, 0x20, 0x29, 0x07, 0x59, 0xf0, 0xbc, 0xff, 0xab, 0xcf, 0x4e, 0x69, 0x01];
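If you want to check the key for yourself, a small base64 decoder is enough. This is a minimal sketch written for this post (the function name is my own). In a real project a maintained base64 crate would be the more sensible choice.

```rust
/// Decodes standard base64, ignoring '=' padding.
/// Returns None if the text contains a character outside the alphabet.
pub fn base64_decode(text: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of not-yet-emitted bits in `acc`
    for c in text.bytes().filter(|&c| c != b'=') {
        // Each character contributes 6 bits.
        let value = ALPHABET.iter().position(|&a| a == c)? as u32;
        acc = (acc << 6) | value;
        bits += 6;
        // Emit a byte as soon as 8 bits are available.
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
        }
    }
    Some(out)
}
```

Decoding "1PG7OiApB1nwvP+rz05pAQ==" with this gives the 16 bytes shown above, and decoding "AQ==" gives the single byte 0x01, which is also the last byte of the full key.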

Packet header and addressing

Each packet contains a 16-byte binary header. This header is not encrypted. This allows radios to repeat packets which they cannot decrypt.

Each radio has a 32-bit (4-byte) address which is encoded into the hardware. Messages can be directed to a specific radio.

The packet contains:

The 4 byte radio address of the source.
The 4 byte radio address of the destination (A packet can be sent to 0xFFFFFFFF to indicate a broadcast).
A 4 byte unique, random identifier for the packet.
Other details such as a count of hops through the network.
An encrypted, protobuf-encoded payload.
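As a rough sketch, splitting a received packet into its header and encrypted payload might look like this in Rust. The byte layout here is an assumption based on my understanding of the Meshtastic documentation: little-endian fields with the destination first, then the source, then the packet identifier, then a flags byte whose low 3 bits hold the remaining hop count, then a channel hash byte. The `Header` type and its field names are hypothetical, so verify the layout against the Meshtastic documentation before using it.

```rust
/// Address used to send a packet to everyone.
pub const BROADCAST: u32 = 0xFFFF_FFFF;

/// Hypothetical view of the unencrypted 16-byte header.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub dest: u32,
    pub source: u32,
    pub id: u32,
    pub hop_limit: u8,
    pub channel_hash: u8,
}

/// Splits a raw packet into its header and the encrypted payload.
/// Returns None if the packet is shorter than the header.
pub fn parse_packet(packet: &[u8]) -> Option<(Header, &[u8])> {
    if packet.len() < 16 {
        return None;
    }
    let (head, payload) = packet.split_at(16);
    // ASSUMPTION: all multi-byte fields are little-endian.
    let word = |i: usize| u32::from_le_bytes([head[i], head[i + 1], head[i + 2], head[i + 3]]);
    let header = Header {
        dest: word(0),
        source: word(4),
        id: word(8),
        // ASSUMPTION: the low 3 bits of the flags byte are the hop limit.
        hop_limit: head[12] & 0x07,
        channel_hash: head[13],
    };
    Some((header, payload))
}
```

Since the header is not encrypted, a repeater only needs `parse_packet` to decide whether to retransmit. The payload can be passed on untouched even when the node has no key for it.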

Further details are described here.

Meshtastic channels

The first step in my understanding was to figure out how Meshtastic uses LoRa. The channels are described here but there were still some settings I needed to figure out. For reference, the "Long Fast" channel which is used as a default in the UK has the following settings. It took me some trial and error to figure these out, so hopefully this is useful.

Radio frequency: 869.525MHz
Spreading factor: SF11
Bandwidth: 250kHz
Preamble length: 16 symbols
Invert IQ: False
Coding rate: 4/5
Low data rate enable: False
Header type: Variable
Header CRC enabled: True
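As a sanity check on these settings, the raw LoRa bit rate can be estimated as SF × (bandwidth / 2^SF) × coding rate. The sketch below uses that standard formula with my own function names. It ignores preamble and header overhead, so the usable rate will be somewhat lower.

```rust
/// Raw LoRa bit rate in bits per second:
/// spreading factor * symbol rate * coding rate.
pub fn lora_bit_rate(sf: u32, bandwidth_hz: f64, cr_num: f64, cr_den: f64) -> f64 {
    // One symbol spans 2^SF chips, sent at one chip per Hz of bandwidth.
    let symbol_rate = bandwidth_hz / f64::from(1u32 << sf);
    f64::from(sf) * symbol_rate * cr_num / cr_den
}

/// Duration of one LoRa symbol in milliseconds.
pub fn symbol_time_ms(sf: u32, bandwidth_hz: f64) -> f64 {
    f64::from(1u32 << sf) / bandwidth_hz * 1000.0
}
```

With SF11, 250kHz bandwidth and a 4/5 coding rate, `lora_bit_rate(11, 250_000.0, 4.0, 5.0)` comes to roughly 1074 bits/second, which matches the "around 1 kilobit/second" figure. The symbol time is about 8.2ms. The low data rate optimisation is generally recommended only when symbols last longer than about 16ms, which is consistent with it being disabled here.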

Seeed Studio Wio-E5 mini and STM32WLE5JC

I had some Wio-E5 and Grove Wio E5 modules which have an STM32WLE5JC microcontroller chip on them. This has a 48MHz ARM Cortex-M4 CPU, 64k RAM, 256k Flash and a LoRa radio - all on the same chip. 32 bit luxury! A 1960s super-computer in a chip costing much much less and using milliwatts of power.

Unlocking the device firmware protection

The Seeed devices come with some neat software on them that contains a LoRaWAN stack which is controlled with AT commands. The STM32 devices have "read-out protection" enabled to try to prevent copying of this software. However, this also stops us writing to the device. So the first thing we need to do is remove this protection.

I believe this is easier with the official ST-Cube software and programmer. I don't have this so I used openocd and the programmer part of an ST-Nucleo64 STM32L476 board I already had.

The boards are programmed using the standard ARM Serial Wire Debug (SWD) wiring. I wired up the GND, SWDIO and SWCLK wires from the board:

SWD wiring

The Nucleo64 STM32L476 programmer board I used has the following pinout on connector CN4:

Pin 1: VDD_TARGET - VDD from application
Pin 2: SWCLK - SWD clock
Pin 3: GND - Ground
Pin 4: SWDIO - SWD data input/output
Pin 5: NRST - RESET of target STM32
Pin 6: SWO - Reserved

..so I wired accordingly:

Nucleo SWD wiring

My wiring complete, I fired up openocd ready to go. But disaster: no chips were found! Gah! What had I done? Had I wired it wrong and blown up my chip? Panic!

My brain never believes this, but a calm approach is usually better than panic. After some research I found that the STM32WL apparently uses a slightly different SWD protocol, which is only supported by later versions of OpenOCD. After upgrading to version 0.12.0 the chip was found. Progress! I hadn't blown it up.

I used what I thought was the correct openocd command line incantation to remove the protection:

openocd -f interface/stlink.cfg -f target/stm32wlx.cfg -c 'init;stm32l4x mass_erase 0'

but it didn't work! "mass erase failed"

Erase failed

I tried various things and got it unlocked eventually. To be honest, I'm not 100% sure what it was that worked. I tried wiring up the reset line, but I don't think this was necessary. I think this is the correct mechanism:

In one command line window run openocd:

openocd -f interface/stlink.cfg -f target/stm32wlx.cfg

In a second command line window, telnet to openocd. I didn't have telnet, so used nc instead:

nc 127.0.0.1 4444

...and into the window enter:

reset halt
stm32wlx unlock 0

Then power cycle the STM32, and then enter:

reset halt
stm32wlx mass_erase 0

...which should report stm32l4x mass erase complete and then power cycle again. (I think I had to power cycle a second time otherwise GDB seemed to go off into the woods after running the code).

I never like "I switched it off and on again and it seemed to work" solutions, so if anyone knows how to correct this process let me know. But some combination of the above and appropriate cursing got the chip unlocked and programmed.

After erasing the device the readout-protection is disabled, which is what we want - RDP is level 0:

after erase no RDP

Embedded Rust

Embedded Rust has some nice properties, and useful frameworks like RTIC, which cunningly hacks the ARM Cortex nested vectored interrupt controller (NVIC) to provide a co-operative multitasking framework, with message passing.

This would have been no use if I had to write all of the hardware interfacing for this new chip myself. But luck was on my side in the form of the STM32WLXX HAL code.

I did have to make some additions to this, however:

The ability to set a custom LoRa radio sync byte, as needed for Meshtastic.
Adding code to perform the AES-CTR mode encryption on the hardware encryption unit.
Adding code to enable serial interrupts.

There was a further gotcha which caught me out:

The STM32WLxx HAL appears to be written for the ST Nucleo board. The LoRa chip needs an external switch to switch the radio between receive, high power transmit and low power transmit. The Seeed boards only have high power transmit wired - but I was telling it to transmit using the low power output. This appeared to work, but didn't transmit very far!

A further tip was to enable the "Boost" receiver mode:

radio.set_rx_gain(PMode::Boost);

which apparently improves receiver sensitivity at the expense of power consumption. The whole thing looked to be using about 20mW of power on receive before, and I haven't measured the change yet.

Building, installing and using

The code needs 3 command line windows open to run:

Running a serial terminal emulator, at 115200 baud to the USB serial port
Running openOCD
Compiling the code and running gdb

The code can be run by running openocd in one window, with the STM32 connected:

openocd -f interface/stlink.cfg -f target/stm32wlx.cfg

and then in another:

git clone https://codeberg.org/shortcolin/stm32wl_rust_mesh
cd stm32wl_rust_mesh
cargo run

This will program the device. Once programmed it can be disconnected from the programmer and will run stand-alone.

I used a serial terminal emulator application on my Android phone connected via USB. I send a text message to it and the Meshtastic app connected to the Heltec v3 receives it. Success!

Phone showing a circuit board and Meshtastic

Bugs, and things still to do

Document, tidy up and release the code.
The encryption hardware faults when compiled into release mode.
Find the inevitable other bugs.
Don't repeat messages destined to us.
Figure out the correct timing Meshtastic uses for repeating packets.
Persistence of settings and some kind of better interface.
Think about implementing other routing algorithms.