\thesubsection Side channel detection
Timing side channel detection: Bauer et al. [bauer2016towards] propose an approach that uses cycle-accurate behavioral emulation of relevant CPU behavior, such as instruction pipeline flushes and bus connection, to detect timing side channels and thereby defend against information leakage (see I20).
Project | Real-world app/Benchmark | Source |
---|---|---|
uSFI | MiBench [guthaus2001mibench] | https://github.com/mageec/beebs |
 | mbed TLS [pallister2013beebs] | https://github.com/ARMmbed/mbedtls |
uSFI, TZmCFI | FreeRTOS [coremark2021] | https://github.com/FreeRTOS/FreeRTOS |
EPOXY, ACES, RAI, PicoXOM | STM32Cube firmware | https://github.com/STMicroelectronics/STM32CubeF4 |
MINION | 3DR Pixhawk | https://github.com/3drobotics/Pixhawk_OS_Hardware |
uTango | BenchIoT [almakhdhub2019benchiot] | https://github.com/embedded-sec/BenchIoT |
Silhouette, TZmCFI | CoreMark [coremark2021] | https://github.com/eembc/coremark |
Silhouette, PicoXOM | BEEBS [guthaus2001mibench] | https://github.com/mageec/beebs |
CaRE | OTP, HMAC | |
 | Dhrystone [weicker1984dhrystone] | https://github.com/Keith-S-Thompson/dhrystone |
uXOM | RIOT-OS [RIOT] | https://github.com/RIOT-OS/RIOT |
PicoXOM | CoreMark-Pro [coremarkpro2021] | https://github.com/eembc/coremark-pro |
C-FLAT, OAT, Tiny-CFA, DIALED | Open syringe pump | https://github.com/naroom/OpenSyringePump/blob/master/syringePump/syringePump.ino |
OAT | House alarm system | https://github.com/ddrazir/alarm4pi |
 | Remote movement controller | https://github.com/bskari/pirc/tree/pi2 |
 | Rover controller | http://github.com/Gwaltrip/RoverPi/tree/master/tcpRover |
 | Light controller | https://github.com/Barro/light-controller |
DIAT | Pixhawk (flight controller for drones) | https://pixhawk.org |
VRASED, APEX | original OpenMSP430 core | openMSP430, 2009 |
DIALED | FireSensor | https://github.com/Seeed-Studio/LaunchPad_Kit/tree/master/Grove_Modules/temp_humi_senso |
 | UltrasonicRanger | https://github.com/Seeed-Studio/LaunchPad_Kit/tree/master/Grove_Modules/ultrasonic_ranger |
P2IM | NuttX, RIOT, and Arduino | |
 | Self-balancing robot | https://github.com/mbocaneg/Inverted-Pendulum-Robot |
 | Programmable logic controller | https://github.com/CONTROLLINO-PLC/CONTROLLINO_Library/tree/master/MAXI |
 | Gateway | https://github.com/firmata/arduino |
 | Drone | https://github.com/heethesh/eYSIP-2017_Control_and_Algorithms_development_for_Quadcopter |
 | CNC | https://github.com/deadsy/grbl_stm32f4 |
 | Reflow oven | https://github.com/rocketscream/Reflow-Oven-Controller |
 | RIOT console | https://github.com/RIOTOS/RIOT/tree/master/examples/default |
 | Steering control | https://github.com/jabelone/car-controller |
 | Soldering iron | https://github.com/Ralim/ts100 |
 | Heat press | https://en.wikipedia.org/wiki/Heat_press |
DICE | too much | |
ASSURED | ?? | |
\thesubsection Other IoT Platforms
Some papers target other embedded devices and are not specific to Cortex-M processors. nesCheck [midi2017memory] provides low-overhead spatial memory safety enforcement for TinyOS [TinyOS], a popular embedded OS for wireless sensor network nodes that is developed in the nesC language. It uses static analysis, including type tracking, to identify memory safety vulnerabilities at compile time. It then maintains metadata for each pointer that describes the memory area the pointer refers to, and checks the bounds of memory accesses at run-time. nesCheck is specifically designed for TinyOS applications and does not currently support Cortex-M. HART [du2020hart] proposes an ETM-assisted kernel module on Arm processors. It reconstructs the program execution by decoding the ETM trace packets. Built on HART, HASAN can detect memory corruptions.
Class | Subclass | # Bugs | Critical | High | Medium |
Hardware Limitation | Information leakage through cache | 2 | 2 | ||
Improper stack selection | 1 | 1 | |||
Architectural Issue | Unprotected hardware block | 1 | 1 | ||
Inf. leakage through state switch | 1 | 1 | |||
\multirow4*Software | Network communication | 61 | 33 | 19 | 9 |
\multirow4*Implementation Issue | Validation bugs in privileged code | 28 | 4 | 20 | 4 |
Vulnerable crypto implementation | 4 | 1 | 3 | ||
Functional implementation bugs | 23 | 2 | 14 | 7 | |
Timing side channel | 15 | 1 | 1 | 13 |
CVE reports from 2015 to 2021 on software implementation issues in Mbed, RIOT-OS, TockOS, FreeRTOS, Contiki-NG, NuttX, Zephyr, wolfSSL, and STM32.
Class | Subclass | # Bugs | Critical | High | Medium |
---|---|---|---|---|---|
Hardware Limitation | Information leakage through cache | 2 | - | - | 2 (100%) |
Architectural Issue | Improper stack selection | 1 | - | - | 1 (100%) |
Implementation Issue | Network Communication | 61 | 33 (54.10%) | 19 (31.15%) | 9 (14.75%) |
 | Other privileged code | 28 | 4 (14.29%) | 20 (71.43%) | 4 (14.29%) |
 | Vulnerable Crypto Implementation | 4 | - | 1 (25%) | 3 (75%) |
 | Other implementation bugs | 23 | 2 (8.70%) | 14 (60.87%) | 7 (30.43%) |
 | Side channel | 15 | 1 (6.67%) | 1 (6.67%) | 13 (86.67%) |
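The percentages in the table follow directly from the per-severity counts. The short sketch below recomputes them for the implementation-issue rows; the dictionary and function names are ours and purely illustrative, while the counts are copied from the table.

```python
# Bug counts per subclass, copied from the table: (critical, high, medium).
COUNTS = {
    "Network Communication": (33, 19, 9),
    "Other privileged code": (4, 20, 4),
    "Vulnerable Crypto Implementation": (0, 1, 3),
    "Other implementation bugs": (2, 14, 7),
    "Side channel": (1, 1, 13),
}


def severity_shares(counts):
    """Return the percentage of each severity level, rounded to two decimals."""
    total = sum(counts)
    return tuple(round(100.0 * c / total, 2) for c in counts)


for subclass, counts in COUNTS.items():
    crit, high, med = severity_shares(counts)
    print(f"{subclass}: total={sum(counts)} "
          f"critical={crit:.2f}% high={high:.2f}% medium={med:.2f}%")
```

The row totals (61, 28, 4, 23, and 15) match the "# Bugs" column, which gives a quick consistency check when the table is updated with new CVEs.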
Type | OS/App/Hardware | CVE ID | Description |
---|---|---|---|
Access uninitialized pointer | FreeRTOS | CVE-2018-16522 | |
Zephyr Project | CVE-2020-10060 | ||
Buffer overflow | Mbed TLS | CVE-2018-0487 | |
RIOT-OS | CVE-2019-1000006 | ||
FreeRTOS | CVE-2018-16526, CVE-2018-16525 | ||
Contiki-NG | CVE-2020-24336, CVE-2018-19417 | ||
CVE-2018-16665, CVE-2018-16664 | |||
CVE-2018-1000804 | |||
Mbed TLS | CVE-2015-8036, CVE-2015-5291 | Heap-based | |
Mbed CoAP lib | CVE-2020-12883, CVE-2019-17212 | ||
RIOT-OS | CVE-2021-27698, CVE-2021-27697 | Buffer Copy without Checking Size of Input | |
CVE-2021-27357, CVE-2020-15350 | |||
RIOT-OS | CVE-2017-8289 | Stack based | |
Zephyr Project | CVE-2020-10071, CVE-2020-10070 | ||
CVE-2020-10023, CVE-2020-10022 | |||
CVE-2020-10019, CVE-2017-14202 | |||
CVE-2017-14199 | |||
Confused Deputy | FreeRTOS | CVE-2018-16598 | |
Divide by zero | FreeRTOS | CVE-2018-16523 | |
Side channel attack | Mbed TLS | CVE-2020-16150, CVE-2020-10941 | |
CVE-2020-10932, CVE-2019-18222 | |||
CVE-2019-16910, CVE-2018-0498 | |||
CVE-2018-0497 | |||
Out-of-bounds read | Mbed TLS | CVE-2018-9989, CVE-2018-9988 | |
Mbed CoAP lib | CVE-2020-12886, CVE-2020-12884 | ||
FreeRTOS | CVE-2019-13120 | ||
Contiki-NG | CVE-2020-24335, CVE-2020-24334 | ||
CVE-2020-14937, CVE-2018-16667 | |||
Improper privilege management | Mbed TLS | CVE-2018-19608 | |
Out-of-bounds write | Mbed TLS | CVE-2018-0488 | |
Mbed CoAP lib | CVE-2019-17212 | ||
Contiki-NG | CVE-2020-14937, CVE-2020-14936 | ||
CVE-2020-14935, CVE-2020-14934 | |||
CVE-2019-8359, CVE-2018-20579 | |||
CVE-2018-16666, CVE-2018-16663 | |||
NuttX | CVE-2020-17528, CVE-2020-17529 | ||
wolfSSL | CVE-2020-36177 | ||
Zephyr Project | CVE-2020-10021 | ||
Improper Certificate Validation | Mbed TLS | CVE-2017-2784 | |
Zephyr Project | CVE-2020-10059 | ||
Integer Overflow or wraparound | Mbed OS | CVE-2021-27435, CVE-2021-27433 | |
Mbed TLS | CVE-2017-18187 | Bound-check bypass | |
Mbed CoAP lib | CVE-2019-17211 | ||
FreeRTOS | CVE-2021-31572, CVE-2021-31571 | ||
CVE-2018-16601 | |||
Contiki-NG | CVE-2019-9183 | ||
RIOT-OS | CVE-2021-27427 | ||
TI-RTOS | CVE-2021-27429, CVE-2021-22636 | ||
CVE-2021-27502 | |||
NuttX | CVE-2021-26461 | ||
Zephyr Project | CVE-2020-10067, CVE-2020-10063 | ||
Improper Authentication | Mbed TLS | CVE-2017-14032 | |
Improper Input Validation | Mbed TLS | CVE-2016-3739 | |
Mbed MQTT lib | CVE-2019-17210 | ||
FreeRTOS | CVE-2018-16528 | ||
Zephyr Project | CVE-2020-10068, CVE-2020-10058 | ||
CVE-2020-10028 | |||
Incorrect comparison | Zephyr Project | CVE-2020-10027, CVE-2020-10024 | |
Off-by-one Error | Zephyr Project | CVE-2020-10062 | |
Socket related | RIOT-OS | CVE-2019-17389 | |
NULL Pointer Dereference | RIOT-OS | CVE-2019-16754 | |
NuttX | CVE-2020-1939 | ||
Zephyr Project | CVE-2018-1000800 | ||
Infinite loop | Mbed CoAP lib | CVE-2020-12885 | |
RIOT-OS | CVE-2019-15702 | ||
NuttX | CVE-2018-20578 | ||
Memory/data leakage | Mbed CoAP lib | CVE-2020-12887 | |
RIOT-OS | CVE-2019-15134 | ||
FreeRTOS | CVE-2018-16603, CVE-2018-16602 | ||
CVE-2018-16600, CVE-2018-16599 | |||
CVE-2018-16527, CVE-2018-16524 | |||
TockOS | CVE-2018-1000660 | ||
Unknown | FreeRTOS | CVE-2021-32020 | |
Use after free | FreeRTOS | CVE-2019-18178 | |
Zephyr Project | CVE-2017-14201 | ||
Inadequate Encryption Strength | Stm32Cube crypto extension | CVE-2020-20949 | |
Stack underflow | Armv8-M TrustZone | CVE-2020-16273 |
OAT [sun2020oat] formulates the property of operation execution integrity, in which an operation comprises the code and data of an application sub-task (e.g., moving a robotic arm). OAT enables a remote verifier to detect control-flow and data-only attacks on embedded devices. Compared to C-FLAT, OAT enables complete control-flow verification as well as attack flow reconstruction by combining hashes and execution traces in the measurement report. OAT instruments conditional branches and indirect transfers with calls to a trampoline function that reports the destination information to the trusted measurement engine, which runs in the secure state. OAT also instruments load instructions that access control-dependent variables, such as branch and loop condition variables, and compares each variable with the copy recorded in TrustZone to attest data integrity. OAT was evaluated on Cortex-A with a performance overhead of 2.73% and a binary size overhead of 13%.

\defenseHybrid attestation: VRASED [nunes2019vrased] instantiates a hardware/software co-design for remote attestation aimed at low-end embedded systems, which include Cortex-M devices. The hardware module enforces access control to the prover’s key and secures the execution of the software implementation, which computes the attestation report. The security of VRASED is formally verified using linear temporal logic [cimatti2002nusmv]. The hardware module of VRASED requires 122 additional LUTs and 37 additional registers, and it takes 0.45 seconds to attest 4KB of RAM on an 8MHz device. A line of work builds on VRASED to support proof of execution, i.e., APEX [nunes2020apex], control-flow attestation, i.e., Tiny-CFA [nunes2020tiny], and data integrity attestation, i.e., DIALED [nunes2021dialed]. APEX [nunes2020apex] provides unforgeable remote proofs of execution based on VRASED.
APEX implements a verified architecture for proofs of execution (PoX) on embedded devices, which remains secure even if the software on the device is fully compromised. It designs a hardware module that allows the verifier to request unforgeable remote proofs showing that the attested software executed successfully and, additionally, produced certain authenticated output. It composes with VRASED [nunes2019vrased] and monitors a fixed memory region called METADATA, which stores information about the software execution status, including pointers to the memory boundaries of the executable region, the memory boundaries of the expected output, and an EXEC flag that indicates to the verifier that the attested code executed successfully. APEX uses the same technique as VRASED to verify and prove its implementation. Compared with plain VRASED, APEX incurs 2% extra registers and 12% extra LUTs and takes around 900ms on an 8MHz device. Tiny-CFA [nunes2020tiny] is built on APEX [nunes2020apex]. It instruments all control-flow instructions and conditional branches to log their destinations to CF-Log, which is stored in the METADATA memory region. Tiny-CFA incurs 80% code size overhead and 50% run-time overhead on three real-world applications. DIALED [nunes2021dialed] uses APEX to securely log and authenticate data inputs: it instruments the Tiny-CFA-enabled program to log all data inputs during execution, along with its control flow, to the input log region. On top of Tiny-CFA, DIALED incurs 20% overhead in code size and run-time.
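To make the idea of log-based control-flow attestation concrete, the following host-side sketch models a prover that authenticates a log of branch destinations and a verifier that replays the log against an expected control-flow graph. It is not the implementation of Tiny-CFA or OAT: the function names, the HMAC-SHA256 construction, the 32-bit little-endian log encoding, and the toy graph are all assumptions chosen for illustration.

```python
import hashlib
import hmac


def attest(cf_log, challenge, key):
    """Prover side: MAC the verifier's challenge together with the logged destinations."""
    data = challenge + b"".join(dest.to_bytes(4, "little") for dest in cf_log)
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(cf_log, challenge, key, report, entry, allowed_edges):
    """Verifier side: check the MAC, then replay the log against the expected graph."""
    expected = attest(cf_log, challenge, key)
    if not hmac.compare_digest(expected, report):
        return False  # the log or the report was tampered with
    current = entry
    for dest in cf_log:
        if dest not in allowed_edges.get(current, set()):
            return False  # a transfer that the expected graph does not contain
        current = dest
    return True


# Hypothetical key, challenge, and control-flow graph (node -> allowed successors).
KEY = b"illustrative-device-key"
CFG = {0x100: {0x120, 0x140}, 0x120: {0x160}, 0x140: {0x160}}
nonce = b"\x01\x02\x03\x04"
log = [0x120, 0x160]
report = attest(log, nonce, KEY)
print(verify(log, nonce, KEY, report, 0x100, CFG))
```

In this model the verifier rejects both a report whose log was modified after the MAC was computed and a correctly authenticated log that contains a transfer outside the expected graph.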
1 Future Directions
A section to point out interesting research directions, their difficulty level, and possible solutions.
Maybe a table here.
\zimingFiner-grained stack cookies.
A future work direction is collecting an unbiased Cortex-M firmware dataset.
0. New attacks
1. Just implement efficient versions of existing mitigations on Cortex-M
1.1 trace-based control-flow violation
2. Redesign existing versions of mitigations for Cortex-M
3. Protect platform-specific assets. PC and mobile platforms may not have such assets.
4. What issues will we have when we implement it in Rust? performance?
ret2ns Countermeasures:
To mitigate this attack, one feasible approach is to have the compiler instrument checks before the BXNS and BLXNS instructions.
The dynamic checks can look up the non-secure state’s MPU configuration to identify the privilege level of the code at the destination and set the CONTROL_NS.nPRIV bit accordingly.
Address masking can be used as a static check with less run-time overhead.
To facilitate this, the privileged and unprivileged programs in the non-secure world need to be placed in separate address ranges.
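The two kinds of checks can be contrasted with a small host-side model. This is only a sketch of one possible reading of the countermeasure: the memory layout, the choice of bit 20 as the bit that separates privileged from unprivileged code, the region tuple format, and both function names are hypothetical, and unmapped addresses are assumed to be privileged-only.

```python
# Hypothetical layout: bit 20 of a non-secure code address is set only for
# unprivileged code. This split is an assumption, not a Cortex-M requirement.
UNPRIV_BIT = 1 << 20


def npriv_from_mpu(dest, mpu_regions):
    """Dynamic check: walk a model of the non-secure MPU regions.

    Each region is (base, limit, unpriv_ok) with an inclusive limit. The result
    is the value to write to CONTROL_NS.nPRIV: 1 if the destination is
    unprivileged code, 0 if it is privileged-only. Unmapped addresses are
    treated as privileged-only (an assumption of this model).
    """
    for base, limit, unpriv_ok in mpu_regions:
        if base <= dest <= limit:
            return 1 if unpriv_ok else 0
    return 0


def npriv_from_mask(dest):
    """Static check: a single AND with a constant replaces the region walk."""
    return 1 if dest & UNPRIV_BIT else 0


# Privileged code in 0x0020_0000..0x002F_FFFF, unprivileged code in
# 0x0030_0000..0x003F_FFFF, so the two ranges differ exactly in bit 20.
REGIONS = [
    (0x00200000, 0x002FFFFF, False),
    (0x00300000, 0x003FFFFF, True),
]

for dest in (0x00200100, 0x00300100):
    print(hex(dest), npriv_from_mpu(dest, REGIONS), npriv_from_mask(dest))
```

The mask-based variant gives the same answer as the region walk only because the two kinds of code occupy disjoint, bit-distinguishable ranges, which is why the reorganization of the non-secure address space is a prerequisite.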
Microkernel.
Would this be a new direction? https://www.cs.columbia.edu/~rgu/publications/osdi22-li.pdf. In general, explore whether strategies that succeeded on other platforms carry over to Cortex-M and IoT.
PACBTI for Cortex-M: how does it differ from its Cortex-A counterpart?
2 Discussion
\thesubsection What else can we do with existing hardware features?
Besides the projects discussed in §LABEL:sec:projects, which mainly rely on the MPU and TrustZone, several other hardware features of the Cortex-M architecture can be used to build secure designs. For example, the MTB and ETM can be used to trace and reconstruct control-flow execution. The DWT provides data and instruction access monitoring, which can be used to watch sensitive information. The newly introduced PACBTI can also be used to detect temporal memory safety violations, in the manner of PTAuth [farkhani2021ptauth].
\thesubsection What else can we do to mitigate implementation bugs?
As Table \thetable shows, most vulnerabilities are caused by implementation bugs. We discuss several directions for mitigating such bugs.
\thesubsubsection Formal verification
Formally verified software systems, such as seL4 [klein2009sel4], can provide high assurance of isolation between applications. However, seL4 does not currently support Cortex-M processors.
\thesubsubsection Bug finding
The main challenge in applying bug-finding techniques to embedded software systems is emulating the variety of peripherals and hardware-dependent operations. Although several research efforts seek efficient and accurate peripheral emulation to support dynamic firmware analysis, there is still a long way to go to keep up with fast-growing hardware diversity.
\thesubsubsection Software fault isolation
MPU-assisted techniques are constrained by the limited number of configurable regions, so how to build purely software-based yet efficient isolation and confinement remains an open question. Software fault isolation (SFI) [tan2017principles] protects program partitions from being affected by an adversarial partition (see LABEL:No_memory_virtualization:). It consists of untrusted fault domains (a.k.a. compartments) and a trusted external component that manages those domains. SFI defines a data and code access policy that constrains read/write accesses to the data region, and a control transfer policy that keeps the control flow within the code region or at a safe external address. However, the control transfer policy alone still leaves the domain exposed to ROP attacks.
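The two SFI policies can be illustrated with a small model of a fault domain. The class, its field names, and the addresses below are hypothetical; the sketch assumes power-of-two region sizes with size-aligned bases so that the data policy can be enforced by masking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultDomain:
    """Toy model of one SFI fault domain (sizes are powers of two, bases aligned)."""
    code_base: int
    code_size: int
    data_base: int
    data_size: int
    safe_externals: frozenset

    def sandbox_data(self, addr):
        """Data access policy: force any address into the domain's data region."""
        return self.data_base | (addr & (self.data_size - 1))

    def check_transfer(self, target):
        """Control transfer policy: stay in the code region or use a safe external address."""
        in_code = self.code_base <= target < self.code_base + self.code_size
        return in_code or target in self.safe_externals


domain = FaultDomain(
    code_base=0x08000000, code_size=0x1000,
    data_base=0x20000000, data_size=0x1000,
    safe_externals=frozenset({0x08010000}),
)

print(hex(domain.sandbox_data(0x40001234)))  # a peripheral address is redirected
print(domain.check_transfer(0x08000ABC))     # any address inside the code region passes
print(domain.check_transfer(0x20000000))     # a jump into data is rejected
```

Note that `check_transfer` accepts every address inside the code region, including the middle of a function, which is why a region-granular control transfer policy by itself does not stop code-reuse attacks within the domain.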
\thesubsubsection Other artificial software diversity mechanisms
As discussed in LABEL:No_memory_virtualization:, ASLR as used on modern operating systems is hard to deploy on Cortex-M systems because of the low entropy. Coarse-grained ASLR is another research direction. Besides ASLR, which is a program-level diversity technique, other methods such as instruction-level diversity, including instruction reordering, equivalent instruction substitution, and garbage code insertion, also show potential for security protection.
\thesubsubsection Address or memory sanitizer
\thesubsection Attack prediction
Some Cortex-M processors (e.g., Cortex-M55) are equipped with caches and branch prediction, which exposes an attack surface for Meltdown and Spectre attacks.
\futureFormally verified embedded systems:
Like seL4 [klein2009sel4]
\futureSoftware fault isolation:
Software fault isolation (SFI) [tan2017principles] protects program partitions from being affected by an adversarial partition (see I01).
It consists of untrusted fault domains (a.k.a. compartments) and a trusted external component that manages those domains.
SFI defines a data and code access policy that constrains read/write accesses to the data region, and a control transfer policy that keeps the control flow within the code region or at a safe external address.
However, the control transfer policy alone still leaves the domain exposed to ROP attacks.
\futureInstruction diversity:
Discuss why other diversity does not work.
Instruction level diversity includes instruction reordering, equivalent instruction substitution, and garbage code insertion, etc.
\futureVirtualization:
\futureHardware Security Enhancement:
Notes
To-do list: (1) Crawl as many Cortex-M firmware images as possible. The idea is to generate intelligence of real-world Cortex-M systems. (2) Also think about crawling Cortex-M projects from GitHub. (3) May add back the binary trick section and justify our claim that ``document is incomplete''. To-do list Nov 21st 2002 [] work on the big figure and [3.6] [X] Rewrite the summary for Section 4. [X] Exokernel - TrustZone based. [X] Update Section 6 with new CVEs. [X] I20 and Table 1 need to be updated [] Section 7.7 [] Implement PSPLIM identification in Ghidra [] All the summary -
\thesubsection Adversary model
We consider that anything can be malicious: secure code can be malicious, and privileged code can be malicious. Entities: (1) unprivileged software in the secure state, (2) privileged software in the secure state (no need to differentiate handler and thread mode), (3) unprivileged software in the non-secure state, (4) privileged software in the non-secure state (no need to differentiate handler and thread mode), (5) peripherals (including DMA). Potential problems: 1. How difficult is it to achieve a non-executable stack on Cortex-M, given that there may not be enough MPU regions to do so? 2. Confused deputy. An NSC function can be called from both the secure and the non-secure state. How, then, does the function know who called it? The origin could be very useful in some cases. 3. Multiple-entry kernel, implemented by running the kernel in the secure privileged state instead of the non-secure privileged state, while apps run at (U, N). Lack of abstraction: no OS, some bare-metal. Vendors compromise on security to reduce footprint, for example, by using a common class key rather than supporting key provisioning and per-device secrets. We should have a section that discusses the software security defenses deployed in existing OSes and compilers. We should compare the code generated by different compilers (especially ARMCC). Use the paper [SoK: Eternal War in Memory] as an exhaustive list of available software protection approaches. Add a comparison table of threats for x86, Cortex-A, and RISC-V, then explain why they are unique for Cortex-M.
Bug reports
Instruction diversity: Artificial software diversity [larsen2014sok] randomizes programs at different levels to hide the real information from attackers. (i) Instruction-level diversity includes instruction reordering, equivalent instruction substitution, and garbage code insertion; (ii) basic-block-level diversity includes basic block reordering, opaque predicate insertion, and branch function insertion; (iii) function-level diversity includes stack layout randomization, function parameter randomization, and control flow flattening; (iv) program-level diversity includes function reordering and randomizing base addresses (e.g., ASLR), program encoding, data, or even library entry points.
3 Test Cases
\thesubsection Our Test Cases
A table of all test cases and how they map to different sections in the paper. \zimingWhere to output? Only to an LCD? Some devices may not have an LCD. Can we output to the serial port or JTAG debug as well? To check the output in the debugging view (Keil IDE windows), we can use printf() via the ITM (a debug and trace feature of Cortex-M; ITM may need more description). Some processors do not implement ITM; there we can retarget the I/O to UART and read it in a serial window via the serial port. However, the UART interrupt has a lower priority than HardFault_handler and the other exceptions in which we want to print the stack or register information while in handler mode, so I/O retargeted to an interrupt-driven UART does not work when a hardware fault happens. One possible solution is to use the UART without interrupts, e.g., with DMA. Another solution is to raise the UART interrupt priority above that of the fault handlers; this latter solution may introduce a security problem. Alternatively, if the device has an LCD screen, we can use it to print the debug information. \zimingWe should redesign the code to run many tests in one image. If a test generates a fault and the tests cannot continue, we software-reset the system. \zimingThe beginning of the test suite should read system registers and print the hardware configuration of the system: 1) CPU model, frequency, SRAM/SSRAM. Also print a banner stating that CactiLab developed this tool. \zimingEach test case should print the purpose of the test, the test number, and the expected outcome.
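The single-image test suite described in these notes can be prototyped on a host before it is ported to firmware. The sketch below is only a model: faults are simulated with a Python exception, the software reset is simulated by returning to the caller, and the `persisted` dictionary stands in for a variable that survives a reset (e.g., in retained RAM, which is an assumption). All names are hypothetical.

```python
class SimulatedFault(Exception):
    """Stands in for a hardware fault that a real test case would trigger."""


def run_suite(tests, hw_info, persisted, log):
    """Run tests from persisted['next'] onward and stop at the first fault.

    Returns True when every test has run, False when a (simulated) reset is needed.
    """
    if persisted["next"] == 0:
        # First boot: print the banner and the hardware configuration once.
        log.append("Cortex-M test suite, developed by CactiLab")
        log.append("HW: " + ", ".join(f"{k}={v}" for k, v in hw_info.items()))
    while persisted["next"] < len(tests):
        number = persisted["next"]
        purpose, expected, body = tests[number]
        # Advance the index before running, so a reset skips a faulting test.
        persisted["next"] = number + 1
        log.append(f"[{number}] {purpose} (expected: {expected})")
        try:
            body()
            log.append(f"[{number}] completed")
        except SimulatedFault as fault:
            log.append(f"[{number}] fault: {fault}; resetting")
            return False
    return True


def read_register():
    pass  # placeholder for a test that finishes normally


def execute_from_stack():
    raise SimulatedFault("HardFault")


TESTS = [
    ("read CPUID", "value printed", read_register),
    ("execute code on the stack", "fault", execute_from_stack),
    ("read CONTROL", "value printed", read_register),
]
```

A driver loop such as `while not run_suite(TESTS, hw, state, log): pass` then plays the role of the reset vector: each iteration corresponds to one boot, and the suite resumes with the test that follows the one that faulted.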
\thesubsection Cortex-M projects
Target | Project | Prototype |
---|---|---|
Fuzzing | P2IM [feng2020p] | Cortex-M |
 | DICE [mera2020dice] | Cortex-M/MIPS M4K/M-Class |
Symbolic Execution | FIE [davidson2013fie] | TI MSP430 |
Deep learning | MCUNet [lin2020mcunet] | Cortex-M4, M7 |
Neural network | CMSIS-NN [lai2018cmsis] | Cortex-M7 |
Secure software | Keccak-based secure PRNG [van2014software] | Cortex-M0(+), M3, and M4(F) |
 | Teaching real-time DSP [wickert2015using] | Cortex-M4 |
 | Communications of Smart Meters [abbasinezhad2017ultra] | Cortex-M3 |
 | Modular Multiplication [seo2020memory] | Cortex-M4 |
Secure software updating | ASSURED [asokan2018assured] | HYDRA/Cortex-M33 |
Cryptography | ECC [de2014ultra] | Cortex-M0+ |
 | NewHope [alkim2016newhope] | Cortex-M0, M4 |
 | AES [wardhani2017fast] | Cortex-M3 |
 | Round5 [saarinen2018shorter] | Cortex-M4 |
 | Round2 [seo2019sike] | Cortex-M4 |
Benchmark | BEEBS [pallister2013beebs] | Cortex-M0 |
 | CoreMark [coremark2021] | |
 | CoreMark-Pro [coremarkpro2021] | |
 | BenchIoT [almakhdhub2019benchiot] | |
 | pqm4 [kannwischer2019pqm4] | Cortex-M4 |
\thesubsection Cortex-M TrustZone-enabled Platforms
Show a complete table of devices that can use Cortex-M TrustZone.
Platform | SoC | Processor | Multicore | Publicly? | Price |
---|---|---|---|---|---|
GD32E23x [] | GigaDevice | M23 | single-core | Yes | |
GD32E235 [] | GigaDevice | M33 | single-core | Yes | |
M2351 [] | NuMicro | M23 | single-core | Yes | |
SAML11 Xplained Pro [] | Microchip SAML11 | M23 | single-core | Yes | |
Renesas S1JA [] | Renesas | M23 | |||
Renesas RA2A1 [] | Renesas | M23 | |||
Arm MSP2+ FPGA [] | Arm | M23/M33 | Yes | - | |
Arm MSP3 FPGA [] | Arm | M23/M33 | Yes | - | |
Cortex-M33 | - |
Company | Series | CPU |
---|---|---|
STM32 | STM32WB55xx | M4 + M0+ |
NXP | i.MXx | A9/7/53 + M4/7 |
 | LPC435x/3x/2x/1x | M4 + M0 |
Texas Instruments | OMAP4430, OMAP4460 | A9 + M3 |
 | OMAP5xx | A15 + M4 |
Xilinx | Zynq-7000 | A9 + FPGA (M1/3 soft core) |
Microsoft | MediaTek 3620 chip | A7 + M4 |
Benchmarks
Dhrystone [weicker1984dhrystone] is a synthetic systems programming benchmark that measures processor and compiler performance. It models the distribution of different types of high-level language statements, operators, operand types, and locality, drawn from contemporary systems programming statistics, to represent actual programming practice.

MiBench [guthaus2001mibench] is adapted to the Arm instruction set and characterizes embedded programs via instruction distribution, memory behavior, and available parallelism. It contains 35 embedded applications written in C, divided into six suites that target automotive and industrial control, consumer devices, office automation, networking, security, and telecommunications.

BEEBS [pallister2013beebs] measures the energy consumption of embedded devices. It can also be used to evaluate performance and code size overhead (see Silhouette [zhou2020silhouette] and PicoXOM [shen2020fast]) since it contains a wide range of embedded applications, such as AES and integer and floating-point matrix multiplications.

BenchIoT [almakhdhub2019benchiot] is designed to evaluate the security, performance, memory usage, and energy consumption of MCUs. The security evaluation covers minimizing privileged execution (increasing SVC cycles), enforcing memory isolation, and control-flow hijacking protection. It provides a curated set of five real-world applications, including a smart light and a smart locker, that can run either bare-metal or on an OS. BenchIoT currently supports Armv7-M.

CoreMark [coremark2021] and CoreMark-Pro [coremarkpro2021] are processor benchmark suites that support both high-performance and low-end processors. CoreMark-Pro evaluates the CPU and memory with five integer workloads (e.g., JPEG image compression and SHA-256) and four floating-point workloads (e.g., a neural network and fast Fourier transform).
The support library accompanying the Dhrystone benchmark contains both direct and indirect subroutine calls, and indirect returns.

\archBare-metal systems and unikernels: In the privileged application architecture, all code runs at the privileged level, and the non-privileged level of the microcontroller is not utilized, as shown in Figure LABEL:fig:overview(a). Both bare-metal applications, which run directly on the hardware without an operating system layer, and RTOSes that execute themselves and applications at the privileged level fall into this category. These applications are compiled and statically linked with libraries, e.g., libc and the Cortex Microcontroller Software Interface Standard (CMSIS), and RTOSes, e.g., the Mbed bare-metal profile [MbedOS], into one big executable. This architecture provides efficient execution and is easy to implement. However, it has many security issues, which we discuss in LABEL:No_or_weak_privilege_separation:, LABEL:No_or_weak_memory_access_control;_executable_stack:, LABEL:No_or_weak_stack_separation:, and LABEL:Statically_linked_executables_and_no_dynamic_linker:.

\archMonolithic kernels: In the monolithic kernel architecture, the RTOSes or privileged services run at the privileged level, whereas applications run at the non-privileged level, as shown in Figure LABEL:fig:overview(b). In this architecture, the code base executing at the privileged level, e.g., the Hardware Abstraction Layer (HAL), is significantly larger than the non-privileged application, resulting in a monolithic model. Example RTOSes that adopt this architecture include FreeRTOS with the MPU enabled, Tock [levy2017multiprogramming], uClinux [uclinux], RT-Thread [RTthread], and Mbed OS [MbedOS].

\archExokernels: As shown in Figure LABEL:fig:overview(c), in this architecture the size of the code executing at the privileged level is significantly reduced compared with LABEL:Monolithic_kernel_architecture:.
Only the sensitive services and a trusted separation kernel have the highest privileged level, the unprivileged level is divided into multiple zones to support RTOSes or bare-metal applications, which are managed by the separation kernel. We will discuss a software-based virtualization system, namely Hermes [klingensmith2018hermes], in LABEL:Software-based_virtualization:, a least kernel privilege system, namely EPOXY [clements2017EPOXY], and MultiZone [pinto2020multi] in LABEL:MPU-assisted_isolation_and_confinement:. \archDual-world privileged application architecture: Architecture A01 is extended into two worlds if it utilizes TrustZone as shown in Figure LABEL:fig:overview(d). In this architecture, a secure application is loaded first, then gives control to the non-secure application. With non-secure callable functions, the non-secure application can access secure services. Most startup projects use this architecture to help developers to learn the TrustZone technique, such as the TrustZone Lab on SAM L11 Xplained Pro board [saml11demo], TrustZone Blinky project on Arm V2M MPS2+ board [mps2plusiotkit], etc. \archDual-world systems: As shown in Figure LABEL:fig:overview(e), it is the extension of architecture A02. It has more sophisticated isolation levels with privileged separation and Trusted Execution Environment (TEE). RTOSes and applications run in the non-secure state, whereas the secure services run in the secure state. Arm Trusted Firmware for Cortex-M (TF-M) [ATFM] provides a HAL to use Cortex-M TrustZone, which is a representation of this architecture. 
TF-M consists of (i) a secure boot module running at the privileged level that authenticates the integrity of the secure state and non-secure state images; (ii) a core module running at the secure state privileged level that controls isolation, communication, and execution; and (iii) other security services running at the secure state unprivileged level, including crypto, internal trusted storage, protected storage, and attestation [TFMtech]. Keil RTX5 [rtx5], Mbed OS [MbedOS], FreeRTOS [freertoskernel], RT-Thread [RTthread], Zephyr [Zephyr], etc., integrate TF-M [ATFM]. A05 can serve as an advanced design for building a secure software system. However, it only supports one TEE and suffers from the growing size of TEE TCBs, which makes it more prone to vulnerabilities [van2019tale].

\archMulti-world systems: As shown in Figure LABEL:fig:overview(f), this architecture extends the dual-world design into multiple equally-secure TEEs within the non-secure state. The trusted kernel at the secure privileged level handles non-secure environment switches and resource access control. We will discuss uTango [oliveira2021utango], one example of building multiple TEEs by utilizing Cortex-M TrustZone, in LABEL:TrustZone-assisted_multiple_TEEs:.
\thesubsubsection (SVC misuse)
\issueNo or weak stack separation:
RTOSs, including FreeRTOS [freertosstack] and Zephyr [zephyrstack], support multi-tasking, so each task has its own stack.
However, stack separation between the kernel and application is rarely used in bare-metal firmware.
The 10 samples that adopt privilege separation (discussed in LABEL:No_or_weak_privilege_separation:) leverage both the MSP- and PSP-based stacks.
In addition, another 124 samples in our dataset use both the MSP- and PSP-based stacks without privilege separation.
All other samples (1,663; 92.54%) only adopt a single MSP-based stack.
\issueNo or weak memory access control; executable stack:
Even though some Cortex-M devices have an MPU, previous research suggests that most real-world systems do not use it [clements2017EPOXY, zhou2019good, clements2018aces].
We confirm that 1,773 of the 1,797 firmware samples in our dataset do not use the MPU, which means the address space in code, SRAM, and RAM is executable.
For the same reason, data execution prevention (DEP) is not enforced on these systems.
Without memory access control, malicious code can also read and write arbitrary memory. Of the 24 firmware samples in our dataset that use an MPU, 5 use the MPU defined by Arm. The remaining 19 use a vendor-specific implementation (i.e., Nordic’s simplified MPU (sMPU) [nordicmpu]), which only supports a subset of the features defined by Arm.
Specifically, sMPU only supports read and write permissions and can only divide memory into two protection domains.
\issueWeak readback protection:
Readback protection prevents the leakage of data in memory and flash through the debugging interface.
Besides completely disabling the hardware debug interface [sultan2020readback], some systems include specific features for readback protection.
For example, sMPU can disable debugger access to the flash memory region.
However, it still allows register access and single stepping of the processor, which was exploited to dump the full flash content [kris2015dumping].
This flaw was addressed in newer processors [nordicnrf52].
Nevertheless, among the Nordic firmware in our dataset, only 34 out of 1,462 samples have enabled readback protection.
Empirical Analysis on Real-world Firmware:
Even though the aforementioned compilers offer the canary mechanism, only one of the 1,797 firmware samples in our dataset adopts it.
On FreeRTOS, each task has its own stack. FreeRTOS has a compile-time configuration option, configCHECK_FOR_STACK_OVERFLOW, to enable task stack overflow checking [freertostaskovfchk].
When switching tasks, the RTOS kernel can check either that the processor stack pointer remains within the valid stack space, or that the last 16 bytes within the valid stack range still hold the values written when the stack was first created for the task.
The latter method is stronger but less efficient than the former.
Mbed OS uses the Keil RTX5 RTOS kernel, which implements software stack overflow checking that can be enabled by defining OS_STACK_CHECK [rtx5config].
During a thread switch, the kernel checks both that the stack pointer of the currently running thread is within the stack space and that the stack magic word at the bottom of the stack is intact.
The stack magic word is a fixed value defined in the kernel header file.
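The difference between the two kinds of checks can be seen in a small host-side model of a full-descending task stack. This is a sketch, not kernel code: the class, the function names, the 64-byte stack size, and the 0xA5 fill pattern are assumptions made for illustration.

```python
FILL_BYTE = 0xA5      # assumed pattern written into the stack at task creation
WATERMARK_BYTES = 16  # size of the checked area at the lowest stack addresses


class TaskStack:
    """Model of a full-descending stack occupying [base, base + size)."""

    def __init__(self, base, size):
        self.base = base
        self.mem = bytearray([FILL_BYTE]) * size
        self.sp = base + size  # empty stack: SP points just past the top

    def push(self, data):
        """Write bytes below SP without a bounds check, as unchecked code would."""
        self.sp -= len(data)
        start = self.sp - self.base
        for i, value in enumerate(data):
            index = start + i
            if 0 <= index < len(self.mem):
                self.mem[index] = value  # writes below base are dropped in this model

    def pop(self, count):
        self.sp += count


def sp_in_bounds(stack):
    """First method: the stack pointer must lie within the valid stack space."""
    return stack.base <= stack.sp <= stack.base + len(stack.mem)


def watermark_intact(stack):
    """Second method: the lowest bytes must still hold the fill pattern."""
    return all(b == FILL_BYTE for b in stack.mem[:WATERMARK_BYTES])


# A task that used almost its whole stack and then returned before the switch.
stack = TaskStack(base=0x20000000, size=64)
stack.push(bytes(60))
stack.pop(60)
print(sp_in_bounds(stack), watermark_intact(stack))
```

At the time of the simulated task switch the stack pointer is back inside the valid range, so the pointer check alone passes, while the pattern check still notices that the reserved area was overwritten. A pattern check can in turn miss an overflow that happens to write the fill value, so neither check is a guarantee.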
\thesubsubsection Missing barrier instructions
Statically linked executables and no dynamic linker: Most software systems on Cortex-M are statically linked into one big executable, namely the firmware, and no dynamic linker is available in most RTOSs. Therefore, load-time ASLR, which has been a standard feature of modern operating systems for more than a decade, cannot be implemented. Even if boot-time ASLR is possible, many IoT devices do not reboot for a very long time.

(1) Cortex-M processors do not support some techniques we take for granted on x86/64 or Cortex-A, and their security features are less known. For example, virtual memory management is not available for lack of a Memory Management Unit (MMU), but Cortex-M does offer a Memory Protection Unit (MPU) that provides access control in the physical memory space. Also, Cortex-M has recently introduced its own version of TrustZone, which has different underlying mechanisms from its Cortex-A counterpart; (2) Even if Cortex-M offers some security-related features, existing embedded and IoT software systems barely use them and largely lack protection against code injection, control-flow hijacking, data corruption, and other attacks. In other words, software technologies for security on such devices significantly lag behind the development of not only mobile and personal computer security but also their own hardware security offerings.

\issueWeak security configuration: As discussed in argXtract [sivakumaran2021argxtract], configuration information, which includes the devices and protocols or special hardware they use, application repositories, and website interfaces, can indicate the security status of firmware binaries and introduce vulnerabilities.
For example, a man-in-the-middle attack becomes possible when Bluetooth communication is not encrypted or authenticated [hackerbluetooth], and a default root password configuration [2880] exposes root privilege to attackers.