Score:5

Authenticated encryption scheme for one-way radio telemetry

ru flag

I'm designing an authenticated encryption scheme for a noninteractive one-way radio telemetry system. A number of devices in the field send back telemetry to a base station periodically, but no communications are sent from the base station to the devices.

My requirements and threat model are as follows:

  • The devices can only send telemetry, they cannot receive anything. Once they're deployed they will operate autonomously for many years.
  • An attacker must not be able to determine the contents of the telemetry packets.
  • An attacker must not be able to identify which device sent a specific telemetry packet.
  • An attacker must not be able to tamper with or spoof telemetry packets.
  • Replay attacks are not considered critical, but resistance to replay attacks would be nice to have anyway.
  • Physical attacks against a device that sends telemetry should compromise only that device, not other devices.
  • DoS is inherently possible via jammers so is not part of the threat model here.
  • Telemetry packets should not be excessively inflated in size. Plaintext is typically 20 bytes or less. The longer the overall packet, the less likely it comes through intact, even after the error correcting code that is applied before transmission. A target size of 64 bytes would be preferable.
  • There are enough devices that schemes where the base station tries to decrypt an incoming message with every key are not feasible.

The devices run a modern ARM SoC with a HRNG. They spend most of their operating lifetime in a low-power standby state, waking only periodically to collect some data and transmit new telemetry. The total amount of data transmitted over the lifetime of the device is around 15MB split across 100k packets.

My initial thought was to use XChaCha20-Poly1305 as an AEAD for the data with a factory-programmed unique PSK for each device, encrypt the device ID using an RSA public key, and put a hash of the RSA portion into the AAD. The receiver would then decrypt the device ID using its RSA private key, use the device ID to look up the correct PSK, decrypt and authenticate the message, and check that the hash in the AAD matches the hash of the RSA blob. However, this vastly inflates the message size and feels a bit clunky.

My next thought was a semi-static ECDH (X25519) scheme. The base station would have a static ECDH key. Its public value would be programmed into the devices as part of the firmware. A device would generate a random ECDH private key for every telemetry message, combine the private key with the public key to derive a secret, use that to encrypt and authenticate the message with ChaCha20-Poly1305.

This seems perfect - this avoids RSA, avoids symmetric PSK, is simpler to decrypt, guarantees indistinguishability of devices because the devices do not have a long-term key associated with them, avoids adding a hash to the AAD, and allows the use of ChaCha20's shorter 96-bit nonce because each transmitted message uses a random key, and the messages are 68 bytes for a 20 byte plaintext. Unfortunately it has a fatal flaw: if you compromise any device and get hold of the base station's public X25519 point, you can send messages claiming to be any device ID you like, because there's no way for the base station to authenticate that a message came from a specific device.

I could replace the 32-bit device ID in the telemetry with a 96-bit random identifier that maps to the original 32-bit ID via a lookup on the base station, but that adds another 12 bytes to the message size which is not really ideal, especially when I'm already exceeding the 64 byte target size a little.

What're my options here? Is there a better scheme?

forest avatar
vn flag
_DoS is inherently possible via jammers so is not part of the threat model here._ – If using LPI/LPD techniques like DSSS with OFDM and time-hopping, you can make jamming extremely power-intensive and virtually rule it out as a threat.
forest avatar
vn flag
_An attacker must not be able to identify which device sent a specific telemetry packet._ – Would it be feasible to have each device coalesce telemetry and send it at regular intervals, whether or not the device has anything to send? That would prevent an attacker from determining which of the packets sent at regular intervals actually contains anything of interest.
ru flag
@forest the challenge is simply signposting the correct decryption key without an attacker who observes the traffic being able to tie that to a specific sender. I'm not bothered about RF triangulation or anything like that; only protocol level stuff is in scope here. the radio hardware and backhaul is already concretely designed and I have no choice over that.
Score:3
mx flag

There's two reasonable strategies:

  • symmetric crypto + database to speed up search
    • relies on nonces being mostly sequential and not too many messages getting lost
  • asymetric crypto

Note:both overheads include the 4 byte deviceID. That doesn't have to be included in the plaintext.

Symmetric is very very compelling due to low overhead 24 bytes (16 byte nonce+deviceID hint, 8 bit MAC) (16 total if you can find a fast 64 bit block cipher for the hint). You get the sequence number for free here.

Asymmetric has 44 bytes of overhead (32 byte point + 4 byte deviceID +8 byte MAC) but low code complexity on the base station/backend since there's no database or search code.

Symmetric system

Devices keep a sequential nonce counter which starts at zero.

we derive a k_nonce_hint key from the main device symmetric key.

Each nonce is used to generate a nonce_hint byte string. This can be as simple as nonce_hint(nonce)=AES_enc(nonce,k_nonce_hint)[:HINTLEN] or a hash function.

messages sent over the air look like nonce_hint || AEAD(M,nonce) where AEAD is some encrypt+MAC scheme.

Note:Device ID and message sequence number are sent implicitly via the nonce hint. The MAC tag provides message integrity and further validates the device ID and nonce.

At the other end

For each (device,nonce) pair there's an associated nonce hint that can be stored in a database. Assuming not too many messages get lost, you only need to keep some small number 10-100 for each device in the DB.

The application server after receiving a message queries the DB to find a matching (Device,Nonce) pair(s) and tries decrypting the message with that(or those) device key(s) and nonce(s). If it works, they update the list of nonce hints for that device in the DB.

Note that if the nonce hint is derived reversibly (EG:16 byte nonce length derived using AES encryption) nonce hints can be checked against all device keys as a fallback option. A lightweight 64 bit block cipher would be perfect. I suggest SPECK.

Why This Works

Symmetric crypto and databases are cheap. Adding a DB and structuring appropriately turns an O(k) problem (try all k device keys) into an O(1) problem (database lookup). For sufficiently large values of k this is worthwhile. If the hint is generated with a reversible keyed permutation, search can be very fast for the fallback case.

Code

from Crypto.Cipher import AES
from os import urandom
import struct
uint32_to_bytes=struct.Struct("!I").pack #pack 4 byte integer
bytes_to_uint=lambda s:sum(256**i * b for i,b in enumerate(bytearray(s[::-1])))

import hashlib
sha256=lambda s:hashlib.sha256(s).digest()
import hmac
sha256_hmac=lambda k,s:hmac.new(k,s,"sha256").digest()


def AES_CTR(m,k,nonce):
    return AES.new(key=k,mode=AES.MODE_CTR,nonce=nonce).encrypt(m)
def nhint_make(nonce,k):
    return AES.new(key=k,mode=AES.MODE_ECB).encrypt(
        b"\0"*12+nonce)

class device():
    def __init__(self,dev_id,dev_key,dev_key_nhint):
        self.dev_id = dev_id
        self.dev_key = dev_key
        self.dev_key_nhint = dev_key_nhint
        self.next_nonce = 0
    def encrypt_message(self,m):
        nb=uint32_to_bytes(self.next_nonce)        
        nh=nhint_make(nb, self.dev_key_nhint)
        self.next_nonce+=1
        ct=AES_CTR(m, self.dev_key[:16],b"\0"*4+nb)
        tag=sha256_hmac(self.dev_key[16:],nh+ct)[:8]
        return nh+ct+tag

class backend():
    HINT_RUN_LENGTH=10
    def __init__(self):
        self.master_secret=urandom(32)#used to derive device keys
        #FIXME:consider using a real database or explicitly designed data structure
        #python dicts and strings are oversized for this
        self.hint_db_meta={} #dev_id:(nonce_start,nonce_end)
        self.hint_db={} #nhint:dev_ID
        #in theory, nhint collisions should be handled by storing both dev_IDs in a list
        #but collisions are infrequent and this saves a lot of memory
        self.hint_keys_all=bytearray() #dense list of keys used for fallback route
        self._next_device_ID=0
    def dev_key      (self,dev_id):return sha256_hmac(self.master_secret,b"dev_key:"+dev_id)
    def dev_key_nhint(self,dev_id):return sha256_hmac(self.master_secret,b"nhint_key:"+dev_id)[:16]
    def provision_device(self):
        dev_id=uint32_to_bytes(self._next_device_ID)
        self._next_device_ID+=1
        self.update_hints(dev_id, 0)
        self.hint_keys_all+=self.dev_key_nhint(dev_id)
        return device(dev_id,
                      self.dev_key(dev_id),
                      self.dev_key_nhint(dev_id))
    def update_hints(self,dev_id,start,end=None):
        if end is None:end=start+self.HINT_RUN_LENGTH        
        assert end>=start
        p_set=set(range(*self.hint_db_meta.get(dev_id,(0,0))))
        self.hint_db_meta[dev_id]=(start,end)
        n_set=set(range(start,end))
        #derive nhint key
        key_nhint = self.dev_key_nhint(dev_id)
        for i in p_set-n_set:
            nb=uint32_to_bytes(i)
            nh=nhint_make(nb, key_nhint)
            try:del self.hint_db[nh]
            except KeyError:pass
        for i in n_set-p_set:
            nb=uint32_to_bytes(i)
            nh=nhint_make(nb, key_nhint)
            self.hint_db[nh]=dev_id
            
    def decrypt(self,data):
        assert len(data)>=(16+8)#nhint_make,tag
        nh,ct,tag=data[:16],data[16:-8],data[-8:]
        #look up the nonce hint(s)
        try:hints=[self.hint_db[nh]]
        except KeyError:
            print("warning:failed to look up hint, trying fallback")
            hints=self.fallback_nonce_candidates(nh)
        for dev_id in hints:
            #check the nonce_hint is well formed
            block=AES.new(key=self.dev_key_nhint(dev_id),mode=AES.MODE_ECB).decrypt(nh)
            if not block.startswith(b"\0"*12):continue
            nb=block[12:]
            #check the tag
            key=self.dev_key(dev_id)
            tag_correct=sha256_hmac(key[16:],nh+ct)[:8]
            if hmac.compare_digest(tag,tag_correct):
                m=AES_CTR(ct, key[:16],b"\0"*4+nb)
                nonce=bytes_to_uint(nb)
                self.update_hints(dev_id, nonce+1)
                return dev_id,nonce,m
        raise ValueError("couldn't decrypt the data")
    def fallback_nonce_candidates(self,nh):
        for i in range(0,len(self.hint_keys_all),16):
            key=self.hint_keys_all[i:i+16]
            block=AES.new(key=key,mode=AES.MODE_ECB).decrypt(nh)
            if block.startswith(b"\0"*12):yield uint32_to_bytes(i//16)

if __name__=="__main__":
    import time
    def timeinc(a=[time.monotonic()]):then,a[0]=a[0],time.monotonic();return "%.6f"%(a[0]-then)
    def printres(res):print("result:{devID:%r seq:%i, message:%r}"%res)
    base=backend()
    n=20000
    print("provisioning %i devices ... "%n,end="")
    base.HINT_RUN_LENGTH=0#don't populate hint db yet
    lots_of_devs=[base.provision_device() for i in range(n)]
    print(timeinc()+ "\nfilling hint DB ... ",end="")
    
    import psutil
    process = psutil.Process() 
    mem1=process.memory_info().rss
    del base.HINT_RUN_LENGTH#populate afterwards
    for d in lots_of_devs:
        base.update_hints(d.dev_id, 0)
    mem2=process.memory_info().rss
    mem_used=mem2-mem1
    print(timeinc()+"\nmemory used:%i (%.2f/device,%.2f/hint)"%(mem_used,mem_used/n,mem_used/n/base.HINT_RUN_LENGTH))
    
    print("check round trip for first device")
    dev1=lots_of_devs[0]
    m=b"hello world!"
    encd=dev1.encrypt_message(m)
    #check it decrypts correctly
    result=base.decrypt(encd)
    printres(result)
    #result:{devID:b'\x00\x00\x00\x00' seq:0, message:b'hello world!'}
    print("1 message processed in "+timeinc())
    assert (result[0],result[2])==(dev1.dev_id,m)
    
    m=b"some telemetry"
    dev2=lots_of_devs[n//2]
    print("\ncheck round trip for device %i"%(n//2))
    for i in range(1000):
        encd=dev2.encrypt_message(m)
        try:result=base.decrypt(encd)
        except Exception:
            print(i)
            raise
        assert result[::2]==(dev2.dev_id,m)
    print("1k messages processed in "+timeinc())
    print("\ncheck desynchronisation recovery")
    for i in range(100):
        encd=dev2.encrypt_message(m)
    print("100 messages discarded to desynchronise "+str(timeinc()));
    result=base.decrypt(encd)
    print("decryption done "+str(timeinc()))
    printres(result)
    #result:{devID:b"\x00\x00'\x10" seq:1099, message:b'some telemetry'}

Performance

I'm getting 500µs for an encrypt/decrypt round trip but this is using an in memory dict as a database with 20k simulated devices * 10 hints. Memory usage is about 120B per hint (*10 hints *20k devices = 24MB). This is ~10x larger than needed.

The optimal data structure for this is a sorted list of nonce_hint || device_ID strings. These can be stored in a radix trie allowing sorted chunks with common prefixes to lose a byte or two.

A static precomputed data structure can use sparse nonce_hint values to index into a dense array of device_IDs requiring only a few bytes per entry at the cost of needing to try a few dozen values per lookup.

When there's no database match for the hint, trying all device keys takes ~10µs per device. That's not going to get much faster since that's a single AES decryption and leading zeroes check on the result.

A block cipher without complex key setup could be a lot faster if vectorised. Maybe run it on a GPU? That plausibly gets you to 1B block cipher ops per second.

Time Based Nonce

If an attacker jams messages from one device, they can cause desynchronisation and subsequently sent nonce hints won't be in the database.

If the devices have a clock inside them, you could do something time based (EG:32-bit day || 32 bit sequence counter) to ensure resynchronisation or go fully time based with ~1 minute granularity. This gives an authenticated sending time for free. For more accurate timestamps, a few bytes of the message can hold a finer grained time offset allowing messages that use "future" time values to indicate the real sending time.

Some BLE location beacons use this to restrict use to paying customers. Beacons generate time based pseudorandom values every 15s or so and end-user devices submit these values to a commercial API. The service generates every expected value for the current time (±1 min to account for clock drift) and then does a lookup for submitted values. The cost to run such a service for a few million beacons is a fraction of a CPU core + <1KB RAM per device.

Security

Message validity for random attacker supplied garbage requires:

  • a well formed nonce hint
    • bits of security:32 - log2(device_count)
    • unless you make >4 billion of these things you won't have to cut into your MAC security margin
  • a well formed MAC tag
    • bits of security:(whatever you decide on)
    • I suggest 64 bits
  • total security margin is MAC_bits + 32 - log2(device_count)
    • attacker has 2^-margin chance of forging a message by guessing randomly
    • attacker has 2^-MAC_bits chance of modifying legitimate message payload to forge a new valid message

Note that in the case devices sending 1M messages over their lifetime, that's 12MB of DB space per device, plausibly a 12TB database for 1M devices which isn't that big allows decrypting all possible messages with just a DB lookup. Optimizations are possible of course.

Asymmetric option

Cheapest secure option that meets your requirements would be unauthenticated ECDH encryption + symmetric MAC using per device keys.

Here's some python code demonstrating this option. It's not very exciting.

from Crypto.Cipher import AES
from nacl import bindings
from os import urandom
import struct
uint32_to_bytes=struct.Struct("!I").pack #pack 4 byte integer

import hashlib
sha256=lambda s:hashlib.sha256(s).digest()
import hmac
sha256_hmac=lambda k,s:hmac.new(k,s,"sha256").digest()

def AES_CTR(m,k,nonce=b"\0"*8):
    assert len(nonce)==8
    return AES.new(key=k,mode=AES.MODE_CTR,nonce=nonce).encrypt(m)

class device():
    def __init__(self,backend_pubkey,device_ID,device_key):
        self.backend_pubkey = backend_pubkey
        self.device_ID = device_ID
        self.device_key = device_key
    def encrypt_message(self,m):
        pk_eph,sk_eph=bindings.crypto_kx_keypair()
        k=bindings.crypto_kx_client_session_keys(pk_eph, sk_eph, self.backend_pubkey)[0]
        ct=AES_CTR(self.device_ID+m, k[:16])
        tag=sha256_hmac(self.device_key,k+ct)[:8]
        return pk_eph+ct+tag

class backend():
    def __init__(self):
        self.pk_srv,self.sk_srv=bindings.crypto_kx_keypair()
        self.master_secret=urandom(32)#used to derive device keys
        self._next_device_ID=0
    def dev_mac_key(self,dev_id):return sha256_hmac(self.master_secret,b"device_mac_key:"+dev_id)
    def provision_device(self):
        dev_id=uint32_to_bytes(self._next_device_ID)
        self._next_device_ID+=1
        return device(self.pk_srv,dev_id,self.dev_mac_key(dev_id))
    def decrypt(self,data):
        assert len(data)>=(32+4+8)#pk_eph,device_id,tag
        pk_eph,ct,tag=data[:32],data[32:-8],data[-8:]
        k=bindings.crypto_kx_server_session_keys(self.pk_srv, self.sk_srv, pk_eph)[1]
        pt=AES_CTR(ct, k[:16])
        dev_id,m=pt[:4],pt[4:]
        device_mac_key=self.dev_mac_key(dev_id)
        correct_tag=sha256_hmac(device_mac_key,k+ct)[:8]
        if not hmac.compare_digest(correct_tag,tag):
            raise ValueError("bad decryption:bad MAC")
        return dev_id,m

if __name__=="__main__":
    base=backend()
    dev1=base.provision_device()
    dev2=base.provision_device()
    
    m=b"hello world!"
    encd=dev1.encrypt_message(m)
    #check it decrypts correctly
    result=base.decrypt(encd)
    assert result==(dev1.device_ID,m)
    print("result:{devID:%r, message:%r}"%result)
    # (b'\x00\x00\x00\x00', b'hello world!')
    
    m=b"some telemetry"
    encd=dev2.encrypt_message(m)
    result=base.decrypt(encd)
    assert result==(dev2.device_ID,m)
    print("result:{devID:%r, message:%r}"%result)
    # (b'\x00\x00\x00\x01', b'some telemetry')
```
Score:2
ng flag

Caution: truly anonymous transmission is hard!

It's stated this constraint:

An attacker must not be able to identify which device sent a specific telemetry packet

That's hard, because a powerful enough adversary may use one or more receiver(s) that analyze characteristics of the signal with more details than the standard receiver in the base station:

  • Operate on a lower protocol layer where device identifiers exist.
  • Measure the power of the received signal, allowing e.g. to discriminate a far sensor from a close one.
  • Measure the delay in reflections of the main signal carrying a packet, which can be characteristic enough to discriminate two sensors even when their received power is comparable.
  • Measure the received signal frequency to high precision, allowing to recognize transmitters according to their deviation from nominal.
  • Measure when a packet reaches the antenna to high precision. That may allow to recognize the origin of packets according to deviation from nominal of the clock source timing when packets are sent.
  • Use multiple receivers with the above timing capability. The location of the transmitter, the the originating device, can be pinpointed by the technique that allow GPS to work: a difference $\Delta_T$ in receive instant of a given packet by two receivers implies the source of the packet is at distances differing by $c\,\Delta_T$ from the two receivers, where $c$ is the speed of light.

In the rest of the answer I disregard these issues, and assume the attacker has only access to the packet's data, at the same protocol layer as the transmitting device that generates them; and to timing information about when the packet is received with precision similar to the one needed for proper operation of a legitimate base station receiver.

Authenticated encryption with one symmetric key per device

We should not rule out the simple solution of one Pre-Shared Key (PSK) per device, allowing to ascertain using authenticated encryption (e.g. XChaCha20-Poly1305) that a packet is from a given device. First, given how fast modern AEAD is, we may never reach the point where the base station gets hit by the premise

There are enough devices that schemes where the base station tries to decrypt an incoming message with every key are not feasible.

A first issue with many devices is that each must have a PSK somewhat known to the base station. If using one random PSK per device, there's a potential secure storage size issue, as well as a work issue for accessing the PSK being tried from secure storage. However constant secure storage is enough if we use derived keys: each device's PSK is generated as a PRF or PRP of an incremental device identifier $I$ with a single master key $K_M$ known to the base station. We could use the round function of Chacha20 (fed $K_M$ and $I$) to generate a 256-bit PSK for device $I$, so that determining the PSK costs only a fraction of the XChaCha20-Poly1305 time.

There are means to avoid the curse that the base station's work grows linearly with the number of devices (and thus PSK) to try:

  • we can arrange things such that the base station is able to tell from an earlier decoded packet from a given device when that device will send it's next packet(s), with accuracy limited only by clock drift. E.g. a packet may contain (encrypted and authenticated) a sequence number $N$ (useful anyway against replay), and the delay from one packet to the next that device sends may be function of PSK and $N$ This way, the number of candidate PSK for a given packet is low, unless collisions are frequent or clock drift high. Scheduled time for the next packet of each given device can be arranged in a priority queue in the base station, which has low work cost, and only requires a few bytes of memory per device.
  • If that's not enough to reduce PSK guesswork to an acceptable level, or not practicable (e.g. because clock drift in devices is huge, or because devices must send packets at very unpredictable intervals, or because the base station is unable to time when a packet is received), we can add to each packet a header that's a PRF of $N$ and PSK, added as a fixed-size header of the packet. The base station can precompute the next few such headers, and only try the PSK that match. A 4-byte header will reduce guesswork by up to billions.
  • If the header's added size was an issue, it can be arranged that it conveys some of the sequence number, by using as header part of the sequence number encrypted under PSK with a PRP/Format Preserving Encryption, the rest of the encrypted sequence number being part of the packet. We leak practically negligible (and controllable) information about the origin of a packet: two packets with identical headers have a slightly lower than average probability to originate from the same device.

However, using symmetric crypto and fixed long-term storage, it seems inevitable that the work necessary to recover from arbitrary long power loss of the base station is proportional to the number $n$ of devices, and that for $O(\log n)$ work in continuous operating condition the memory requirement for the base station grows linearly with $n$.

Asymmetric encryption plus one symmetric key per device

If we can tolerate the increase is packet size, we can use asymmetric encryption to have base station work and storage idependent of the number of devices. E.g with X25519 ECDH, a long-term private key in the base station with matching public key in each device, and in each device an identification $I$ and matching secret PSK generated from master key $M_K$ and $I$:

  • each device maintains a sequence number $N$ incremented at each packet
  • the current $N$ and payload data $D$ is integrity-protected by a Message Authentication Code $M$ computed with key PSK on $N\mathbin\|D$
  • device draws a random ephemeral secret to generate ephemeral public key (256-bit) sent in clear as header of the packet, and computes an ECDH ephemeral shared secret that seeds a PRG
  • rest of the packet is $I\mathbin\|N\mathbin\|D\mathbin\|M$ XORed with a keystream generated from this PRG.

Base station receiving a packet

  • extracts ephemeral public key from the packed, computes ECDH ephemeral shared secret using the private key, seeds the PRG
  • deciphers the rest of packet which is alleged $I\mathbin\|N\mathbin\|D\mathbin\|M$, extracts these fields
  • derives device PSK from $I$ and $M_K$
  • checks MAC $M$ computed on $N\mathbin\|D$ with PSK, reject packet if that fails
  • checks sequence number $N$ is higher than earlier accepted sequence number for device $I$, else rejects the packet as a replay
  • process data $D$ as genuinely from device $I$.
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.