\thetable Comparative evaluation of security projects for Cortex-M or other embedded and IoT systems

{comment}

Table \thetable: Comparative evaluation of security projects for Cortex-M or other embedded and IoT systems

\scalebox

0.935 {threeparttable} Project Defenses Hardware Run-time overhead (%) \rotatebox90Input (C: source code; B: binary) \rotatebox90Target (B: bare-metal; R: RTOS; M: Mbed) \rotatebox90Prototype implementation (ISA) \rotatebox90MPU-assisted isolation and confinement \rotatebox90TrustZone-assisted multiple TEEs \rotatebox90Software-based virtualization \rotatebox90Shadow Stack \rotatebox90SafeStack \rotatebox90Return address integrity \rotatebox90Forward control flow enforcement \rotatebox90ROP gadget removal \rotatebox90Artificial software diversity \rotatebox90Execute only memory \rotatebox90Software-based control flow attestation \rotatebox90Software-based data integrity attestation \rotatebox90Hybrid code integrity attestation \rotatebox90Hybrid Proof of Execution \rotatebox90Hybrid control flow attestation \rotatebox90Hybrid data integrity attestation \rotatebox90Secure software update \rotatebox90Peripheral-aware static analysis \rotatebox90Peripheral-aware symbolic execution \rotatebox90Peripheral-aware fuzzing \rotatebox90MPU \rotatebox90Unprivileged store/load instructions \rotatebox90TrustZone \rotatebox90DWT \rotatebox90Synthesized hardware \rotatebox90Code, binary size increasement (Flash %) \rotatebox90Memory overhead (RAM %) \rotatebox90Energy consumption overhead (%) \rotatebox90Bare-metal applications \rotatebox90RTOSs \rotatebox90BEEBS [pallister2013beebs] \rotatebox90CoreMark [coremark2021] \rotatebox90CoreMark-Pro [coremarkpro2021] \rotatebox90MiBench [guthaus2001mibench] \rotatebox90Dhrystone [weicker1984dhrystone] \rotatebox90BenchIoT [almakhdhub2019benchiot] uSFI [aweke2018usfi] C R v7 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkblue10 \textcolordarkblue0.7 \textcolordarkblue9.6 \textcolordarkblue1.1 $\mu$ Visor [uVisor] C M v7 \supportlevel10 \supportlevel10 \supportlevel10 EPOXY [clements2017EPOXY] C B v7 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkblue8 \textcolordarkblue1.1 
\textcolordarkblue1.8 \textcolordarkblue1.6 ACES [clements2018aces] C B v7 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkorange26 \textcolordarkblue5.7 MINION [kim2018securing] C R v7 \supportlevel10 \supportlevel10 \textcolordarkblue2 Utango [oliveira2021utango] B R v8 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkblue4.6 \textcolordarkgreen.05 Hermes [klingensmith2018hermes] B R v7 \supportlevel10 Silhouette [zhou2020silhouette] C B v7 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkblue8.9 \textcolordarkblue3.4 \textcolordarkblue1.3 CaRE [nyman2017cfi] B B v8 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkred513 TZmCFI [kawada2020TZmCFI] C R v8 \supportlevel10 \supportlevel10 \textcolordarkorange84 \textcolordarkorange14 $\mu$ RAI [almakhdhubmurai2020] C B v7 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkorange54.1 \textcolordarkblue15.2 \textcolordarkgreen.1 \textcolordarkgreen.1 uXOM [kwon2019uxom] C R v7 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkblue15.7 \textcolordarkblue7.5 \textcolordarkblue7.3 PicoXOM [shen2020fast] C R v7 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkblue5.89 \textcolordarkgreen.02 \textcolordarkgreen.46 \textcolordarkgreen.11 C-FLAT [abera2016c] B B A \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkorange25 \textcolordarkorange76 LAPE [huo2020lape] C B v7 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkorange30 \textcolordarkblue7.5 \textcolordarkblue2.2 OAT [sun2020oat] C B A \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkblue13 \textcolordarkblue2.7 DIAT [abera2019diat] C R v7 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 VRASED [nunes2019vrased] C B * \supportlevel10 \supportlevel10 \textcolordarkblue3.5 APEX [nunes2020apex] C B * \supportlevel10 \supportlevel10 \textcolordarkgreen.01 TinyCFA [nunes2020tiny] B B * 
\supportlevel10 \supportlevel10 \supportlevel10 \textcolordarkorange80 \textcolordarkorange50 DIALED [nunes2021dialed] B B * \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 ASSURED [asokan2018assured] B R v8 \supportlevel10 \supportlevel10 \textcolordarkorange67 \textcolordarkblue6 PASAN [kim2021pasan] C R v7 \supportlevel10 argXtract [sivakumaran2021argxtract] B R v6/7 \supportlevel10 FIE [davidson2013fie] C R * \supportlevel10 P2IM [feng2020p] C R v7 \supportlevel10 DICE [mera2020dice] C R v7 \supportlevel10 Laelaps [cao2020device] B R v7 \supportlevel10 \supportlevel10 $\mu$ Emu [zhou2021automatic] B R v7 \supportlevel10 \supportlevel10 {tablenotes} v7: Armv7-M, v8: Armv8-M, A: Cortex-A, *: TI MSP430 [MSP430]. \supportlevel10: Implement a specific security feature or need a specific hardware support. \colordarkgreen green: $\leq$ 1% negligible overhead, \colordarkblue blue: $<$ 20% practical, \colordarkorange orange: $<$ 100% noticeable, \colordarkred red: $>$ 100% heavy overhead.

{comment}

\thesubsection Side channel detection

\defense

Timing side channel detection: Bauer et al. [bauer2016towards] propose an approach that uses cycle-accurate behavioral emulation of relevant CPU behavior, such as instruction pipeline flushes and bus connection, to detect timing side channels and defend against information leakage (see I20). {comment}

Table \thetable: Real-world project

Project	Real-world app/Benchmark	Source
uSFI	MiBench [guthaus2001mibench]	https://github.com/mageec/beebs
	mbed TLS [pallister2013beebs]	https://github.com/ARMmbed/mbedtls
uSFI, TZmCFI	FreeRTOS [coremark2021]	https://github.com/FreeRTOS/FreeRTOS
EPOXY, ACES, $\mu$ RAI, PicoXOM	STM32Cube firmware	https://github.com/STMicroelectronics/STM32CubeF4
MINION	3DR Pixhawk	https://github.com/3drobotics/Pixhawk_OS_Hardware
uTango	BenchIoT [almakhdhub2019benchiot]	https://github.com/embedded-sec/BenchIoT
Silhouette, TZmCFI	CoreMark [coremark2021]	https://github.com/eembc/coremark
Silhouette, PicoXOM	BEEBS [pallister2013beebs]	https://github.com/mageec/beebs
CaRE	OTP, HMAC
	Dhrystone [weicker1984dhrystone]	https://github.com/Keith-S-Thompson/dhrystone
uXOM	RIOT-OS [RIOT]	https://github.com/RIOT-OS/RIOT
PicoXOM	CoreMark-Pro [coremarkpro2021]	https://github.com/eembc/coremark-pro
C-FLAT, OAT, Tiny-CFA, DIALED	Open syringe pump	https://github.com/naroom/OpenSyringePump/blob/master/syringePump/syringePump.ino
OAT	House alarm system	https://github.com/ddrazir/alarm4pi
	Remote movement controller	https://github.com/bskari/pirc/tree/pi2
	Rover controller	http://github.com/Gwaltrip/RoverPi/tree/master/tcpRover
	Light Controller	https://github.com/Barro/light-controller
DIAT	Pixhawk (flight controller for drones)	https://pixhawk.org
VRASED, APEX	original OpenMSP430 core	openMSP430, 2009
DIALED	FireSensor	https://github.com/Seeed-Studio/LaunchPad_Kit/tree/master/Grove_Modules/temp_humi_senso
	UltrasonicRanger	https://github.com/Seeed-Studio/LaunchPad_Kit/tree/master/Grove_Modules/ultrasonic_ranger
p²IM	NuttX, RIOT, and Arduino
	Self-balancing robot	https://github.com/mbocaneg/Inverted-Pendulum-Robot
	Programmable logic controller	https://github.com/CONTROLLINO-PLC/CONTROLLINO_Library/tree/master/MAXI
	Gateway	https://github.com/firmata/arduino
	Drone	https://github.com/heethesh/eYSIP-2017_Control_and_Algorithms_development_for_Quadcopter
	CNC	https://github.com/deadsy/grbl_stm32f4
	Reflow oven	https://github.com/rocketscream/Reflow-Oven-Controller
	RIOT console	https://github.com/RIOTOS/RIOT/tree/master/examples/default
	Steering control	https://github.com/jabelone/car-controller
	Soldering iron	https://github.com/Ralim/ts100
	Heat press	https://en.wikipedia.org/wiki/Heat_press
DICE	too much
ASSURED	??

{comment}

\thesubsection Other IoT Platforms

Some papers target other embedded devices and are not specific to Cortex-M processors. nesCheck [midi2017memory] provides low-overhead spatial memory safety enforcement for TinyOS [TinyOS], a popular embedded OS for wireless sensor network nodes that is developed in the nesC language. It uses static analysis, including type tracking, to identify memory safety vulnerabilities at compile time. It then maintains metadata for each pointer that describes the memory area the pointer refers to, and checks the bounds of each memory address at run-time. nesCheck is designed specifically for TinyOS applications, which does not currently support Cortex-M. HART [du2020hart] proposes an ETM-assisted kernel module on Arm processors. It reconstructs the program execution by decoding the ETM trace packets. Built on HART, HASAN can detect memory corruptions. {comment}
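The run-time half of the per-pointer metadata idea described above can be illustrated with a minimal C sketch. The `ptr_meta_t` layout and the helper names are assumptions made for illustration only; they are not nesCheck's actual data structures or instrumentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pointer metadata: the bounds of the memory area the
 * pointer is allowed to reference. Not nesCheck's actual layout. */
typedef struct {
    uintptr_t base; /* address of the first valid byte */
    size_t    size; /* length of the area in bytes */
} ptr_meta_t;

/* Returns true if an access of `len` bytes at `addr` stays inside the area. */
static bool in_bounds(const ptr_meta_t *m, uintptr_t addr, size_t len)
{
    assert(m != NULL);
    if (addr < m->base) {
        return false;
    }
    size_t off = addr - m->base;
    /* Written to avoid overflow in off + len. */
    return off <= m->size && len <= m->size - off;
}

/* Checked store: writes only when the run-time bounds check passes. */
static bool checked_store_u8(const ptr_meta_t *m, uint8_t *p, uint8_t v)
{
    if (!in_bounds(m, (uintptr_t)p, sizeof *p)) {
        return false; /* a real system would trap or report here */
    }
    *p = v;
    return true;
}
```

A compile-time analysis such as the one described above would aim to remove checks that it can prove always pass, leaving checks like `checked_store_u8` only where the access cannot be verified statically.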

Table \thetable: CVEs for Cortex-M Software Systems

{threeparttable}

Class	Subclass	\rotatebox90# Bugs	\rotatebox90Critical	\rotatebox90High	\rotatebox90Medium
Hardware Limitation	Information leakage through cache	2			2
	Improper stack selection	1			1
Architectural Issue	Unprotected hardware block	1			1
	Inf. leakage through state switch	1			1
\multirow4*Software	Network communication	61	33	19	9
\multirow4*Implementation Issue	Validation bugs in privileged code	28	4	20	4
	Vulnerable crypto implementation	4		1	3
	Functional implementation bugs	23	2	14	7
	Timing side channel	15	1	1	13

{tablenotes}

CVE reports for Mbed, RIOT-OS, TockOS, FreeRTOS, Contiki-NG, NuttX, Zephyr, wolfSSL, and STM32 from 2015 to 2021 for software implementation issues.

{comment}

Table \thetable: Summary of CVEs for Cortex-M Software

Class	Subclass	# Bugs	Critical	High	Medium
Hardware Limitation	Information leakage through cache	2	-	-	2 (100%)
Architectural Issue	Improper stack selection	1	-	-	1 (100%)
\multirow6*Implementation Issue	Network Communication	61	33 (54.10%)	19 (31.15%)	9 (14.75%)
	Other privileged code	28	4 (14.29%)	20 (71.43%)	4 (14.29%)
	Vulnerable Crypto Implementation	4	-	1 (25%)	3 (75%)
	Other implementation bugs	23	2 (8.70%)	14 (60.87%)	7 (30.43%)
	Side channel	15	1 (6.67%)	1 (6.67%)	13 (86.67%)

{comment}

Table \thetable: CVEs

Type	OS/App/Hardware	CVE ID	Description
Access uninitialized pointer	FreeRTOS	CVE-2018-16522
	Zephyr Project	CVE-2020-10060
Buffer overflow	Mbed TLS	CVE-2018-0487
	RIOT-OS	CVE-2019-1000006
	FreeRTOS	CVE-2018-16526, CVE-2018-16525
	Contiki-NG	CVE-2020-24336, CVE-2018-19417
		CVE-2018-16665, CVE-2018-16664
		CVE-2018-1000804
	Mbed TLS	CVE-2015-8036, CVE-2015-5291	Heap-based
	Mbed CoAP lib	CVE-2020-12883, CVE-2019-17212
	RIOT-OS	CVE-2021-27698, CVE-2021-27697	Buffer Copy without Checking Size of Input
		CVE-2021-27357, CVE-2020-15350
	RIOT-OS	CVE-2017-8289	Stack based
	Zephyr Project	CVE-2020-10071, CVE-2020-10070
		CVE-2020-10023, CVE-2020-10022
		CVE-2020-10019, CVE-2017-14202
		CVE-2017-14199
Confused Deputy	FreeRTOS	CVE-2018-16598
Divide by zero	FreeRTOS	CVE-2018-16523
Side channel attack	Mbed TLS	CVE-2020-16150, CVE-2020-10941
		CVE-2020-10932, CVE-2019-18222
		CVE-2019-16910, CVE-2018-0498
		CVE-2018-0497
Out-of-bounds read	Mbed TLS	CVE-2018-9989, CVE-2018-9988
	Mbed CoAP lib	CVE-2020-12886, CVE-2020-12884
	FreeRTOS	CVE-2019-13120
	Contiki-NG	CVE-2020-24335, CVE-2020-24334
		CVE-2020-14937, CVE-2018-16667
Improper privilege management	Mbed TLS	CVE-2018-19608
Out-of-bounds write	Mbed TLS	CVE-2018-0488
	Mbed CoAP lib	CVE-2019-17212
	Contiki-NG	CVE-2020-14937, CVE-2020-14936
		CVE-2020-14935, CVE-2020-14934
		CVE-2019-8359, CVE-2018-20579
		CVE-2018-16666, CVE-2018-16663
	NuttX	CVE-2020-17528, CVE-2020-17529
	wolfSSL	CVE-2020-36177
	Zephyr Project	CVE-2020-10021
Improper Certificate Validation	Mbed TLS	CVE-2017-2784
	Zephyr Project	CVE-2020-10059
Integer Overflow or wraparound	Mbed OS	CVE-2021-27435, CVE-2021-27433
	Mbed TLS	CVE-2017-18187	Bound-check bypass
	Mbed CoAP lib	CVE-2019-17211
	FreeRTOS	CVE-2021-31572, CVE-2021-31571
		CVE-2018-16601
	Contiki-NG	CVE-2019-9183
	RIOT OS	CVE-2021-27427
	TI-RTOS	CVE-2021-27429, CVE-2021-22636
		CVE-2021-27502
	NuttX	CVE-2021-26461
	Zephyr Project	CVE-2020-10067, CVE-2020-10063
Improper Authentication	Mbed TLS	CVE-2017-14032
Improper Input Validation	Mbed TLS	CVE-2016-3739
	Mbed MQTT lib	CVE-2019-17210
	FreeRTOS	CVE-2018-16528
	Zephyr Project	CVE-2020-10068, CVE-2020-10058
		CVE-2020-10028
Incorrect comparison	Zephyr Project	CVE-2020-10027, CVE-2020-10024
Off-by-one Error	Zephyr Project	CVE-2020-10062
Socket related	RIOT-OS	CVE-2019-17389
NULL Pointer Dereference	RIOT-OS	CVE-2019-16754
	NuttX	CVE-2020-1939
	Zephyr Project	CVE-2018-1000800
Infinite loop	Mbed CoAP lib	CVE-2020-12885
	RIOT-OS	CVE-2019-15702
	NuttX	CVE-2018-20578
Memory/data leakage	Mbed CoAP lib	CVE-2020-12887
	RIOT-OS	CVE-2019-15134
	FreeRTOS	CVE-2018-16603, CVE-2018-16602
		CVE-2018-16600, CVE-2018-16599
		CVE-2018-16527, CVE-2018-16524
	TockOS	CVE-2018-1000660
Unknown	FreeRTOS	CVE-2021-32020
Use after free	FreeRTOS	CVE-2019-18178
	Zephyr Project	CVE-2017-14201
Inadequate Encryption Strength	Stm32Cube crypto extension	CVE-2020-20949
Stack underflow	Armv8-M TrustZone	CVE-2020-16273

{comment}

OAT [sun2020oat] formulates the property of operation execution integrity, in which an operation represents the code and data of a sub-application for a specific task (e.g., moving a robotic arm). OAT enables a remote verifier to detect whether there is a control-flow or data-only attack on embedded devices. Compared to C-FLAT, OAT enables complete control-flow verification as well as attack flow reconstruction by combining hashes and execution traces in the measurement report. OAT instruments a trampoline function corresponding to conditional branches and indirect transfers to report the destination information to the trusted measurement engine, which runs in the secure state. OAT also instruments load instructions that are supposed to access control-dependent variables, such as branch/loop condition variables, and compares each variable with the copy recorded in TrustZone to attest the data integrity. OAT was evaluated on Cortex-A with a performance overhead of 2.73% and a binary size overhead of 13%. {comment} \defenseHybrid attestation: VRASED [nunes2019vrased] instantiates a hardware/software co-design for remote attestation aimed at low-end embedded systems, which include Cortex-M devices. The hardware module enforces access control to the prover’s key and secures the execution of the software implementation, which computes the attestation report. The security of VRASED is formally verified using linear temporal logic [cimatti2002nusmv]. The hardware module of VRASED requires 122 additional LUTs and 37 additional registers, and it takes 0.45 seconds to attest 4KB of RAM on an 8MHz device. A line of work builds on VRASED to support proof of execution, i.e., APEX [nunes2020apex], control-flow attestation, i.e., Tiny-CFA [nunes2020tiny], and data integrity attestation, i.e., DIALED [nunes2021dialed]. APEX [nunes2020apex] provides unforgeable remote proofs of execution based on VRASED. 
It implements a verified architecture for proofs of execution (PoX) on embedded devices, which protects the software system even if it is fully compromised. It designs a hardware module that allows the verifier to request unforgeable remote proofs that the attested software executed successfully and, additionally, produced certain authenticated output. APEX composes with VRASED [nunes2019vrased] and monitors a fixed memory region called METADATA, which stores information about the software execution status, including pointers to the memory boundaries of the executable region, the memory boundaries of the expected output, and an EXEC flag that indicates to the verifier that the attested code executed successfully. APEX uses the same technique as VRASED to verify and prove its implementation. Compared with VRASED alone, APEX requires 2% more registers and 12% more LUTs and takes around 900ms on an 8MHz device. Tiny-CFA [nunes2020tiny] is built on APEX [nunes2020apex]. It instruments all control-flow instructions and conditional branches to log their destinations to CF-Log, which is stored in the METADATA memory region. Tiny-CFA incurs 80% code size overhead and 50% run-time overhead on three real-world applications. DIALED [nunes2021dialed] uses APEX to securely log and authenticate data inputs by instrumenting the Tiny-CFA-enabled program to log all data inputs of the program during execution, along with its control flow, to the input log region. On top of Tiny-CFA, DIALED incurs 20% overhead in both code size and run-time.
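The logging step behind control-flow attestation can be sketched in C as follows. This is a simplified model under stated assumptions: the `cf_log_t` structure, its capacity, the 16-bit destination width (chosen because the prototypes target MSP430), and the overflow handling are all hypothetical and do not reproduce the Tiny-CFA instrumentation. The verifier here compares against a single expected path, whereas a real verifier would check the log against the program's control-flow graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CF_LOG_CAPACITY 8u /* hypothetical size of the log region */

/* Hypothetical in-memory model of a control-flow log. */
typedef struct {
    uint16_t entries[CF_LOG_CAPACITY]; /* logged branch destinations */
    size_t   count;                    /* number of valid entries */
    bool     overflow;                 /* set when the log ran out of space */
} cf_log_t;

/* Called by the instrumentation before each control-flow transfer. */
static void cf_log_append(cf_log_t *cfl, uint16_t dest)
{
    assert(cfl != NULL);
    if (cfl->count == CF_LOG_CAPACITY) {
        cfl->overflow = true; /* assumed policy: mark the report invalid */
        return;
    }
    cfl->entries[cfl->count++] = dest;
}

/* Verifier side: compare the reported log with one expected path. */
static bool cf_log_matches(const cf_log_t *cfl,
                           const uint16_t *expected, size_t n)
{
    if (cfl->overflow || cfl->count != n) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (cfl->entries[i] != expected[i]) {
            return false;
        }
    }
    return true;
}
```

Logging one entry per transfer is also why this class of defenses tends to show noticeable code size and run-time overhead: every instrumented branch pays for an extra store and a bounds test.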

Table \thetable: Vulnerability discovery for Cortex-M firmware

\rowcolors

2whitegray!15 {threeparttable} \rowcolorwhite Project Year \cellcolorwhite \rotatebox[origin=c]90Input \cellcolorwhite \rotatebox[origin=c]90Target \cellcolorwhite \rotatebox[origin=c]90ISA \cellcolorwhitePeripheral \cellcolorwhiteModeling \cellcolorwhiteI/O \cellcolorwhiteInteraction \cellcolorwhiteSupport \cellcolorwhiteFuzzing \TBstrut4ex-2.7ex \cellcolorwhite FirmWare [firmwire] 2022 \cellcolorwhite P²IM [feng2020p] 2020 C R v7 \supportlevel10 \cellcolorwhite DICE [mera2020dice] 2020 C R v7 \supportlevel10 \cellcolorwhite Laelaps [cao2020device] 2020 B R v7 \supportlevel10 \cellcolorwhite $\mu$ Emu [zhou2021automatic] 2021 B R v7 \supportlevel10 \cellcolorwhite Jetset [jetset] 2021 \cellcolorwhite Fuzzware [fuzzware] 2022 \supportlevel10 \cellcolorwhite SEmu [semu] 2022 \multirow-9*\cellcolorwhiteLABEL:Full_firmware_rehosting: HALucinator [clements2020halucinator] 2020 \supportlevel10 \cellcolorwhite Avatar [avatar2] 2018 \cellcolorwhite Inception [cor2018] 2018 C R v7 \multirow-3*\cellcolorwhiteLABEL:Hardware-in-the-loop_rehosting: Frankenstein [Frankenstein] 2020 \cellcolorwhiteLABEL:On-device_fuzzing: $\mu$ AFL [li2022mu] 2022 B B v7 \supportlevel10 \cellcolorwhite PASAN [kim2021pasan] 2021 C R v7 \cellcolorwhite FirmXRay [wen2020firmxray] 2020 B B v7 \multirow-3*\cellcolorwhiteLABEL:Static_methods: HEAPSTER [gritti2022heapster] 2022 B B - {tablenotes} v6: Armv6-M, v7: Armv7-M, A: Cortex-A, R: Cortex-R, M: Cortex-M. \supportlevel10: Implement a specific security feature or need specific hardware support.

Table \thetable: Cortex-M MCUs and Their Security-related Features

\rowcolors

2gray!15white {threeparttable} CPU ISA \rotatebox90Max MPU Regions \rotatebox90TrustZone-M \rotatebox90Max SAU Regions \rotatebox90PACBTI \rotatebox90PXN \rotatebox90XOM \rotatebox90MSPLIM/PSPLIM \rotatebox90UP Store/Load Ins. \rotatebox90Branch Prediction \rotatebox90ETM \rotatebox90MTB \rotatebox90DWT \rotatebox90FPB \rotatebox90Year Introduced \Tstrut18ex M0 v6-M \supportlevel10 \supportlevel5 2009 M1 v6-M \supportlevel5 \supportlevel5 2007 M0+ v6-M 8 \supportlevel10 \supportlevel10 \supportlevel5 2012 M3 v7-M 8 \supportlevel10 \supportlevel10 \supportlevel5 \supportlevel10 \supportlevel10 2004 M4 v7-M 8 \supportlevel10 \supportlevel5 \supportlevel10 \supportlevel10 2010 M7 v7-M 16 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 2014 M23 v8-M Baseline 16 \supportlevel10 8 \supportlevel10 \supportlevel10 \supportlevel5 \supportlevel10 \supportlevel10 \supportlevel10 2016 M33 v8-M Mainline 16 \supportlevel10 8 \supportlevel10 \supportlevel10 \supportlevel5 \supportlevel10 \supportlevel10 \supportlevel5 2016 M35P v8-M Mainline 16 \supportlevel10 8 \supportlevel10 \supportlevel10 \supportlevel5 \supportlevel10 \supportlevel10 \supportlevel10 2018 M55 v8.1-M Mainline 16 \supportlevel10 8 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel5 \supportlevel10 \supportlevel5 2020 M85 v8.1-M Mainline 16 \supportlevel10 8 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel10 \supportlevel5 \supportlevel10 \supportlevel5 2022 {tablenotes} Whitespace: Not support, \supportlevel10: Support, \supportlevel5: Partially Support. On M23, the ETM and the MTB are exclusive to each other.

{comment}

1 Future Directions

\ziming

A section to point out interesting research directions, their difficulty level, and possible solutions. Maybe a table here. \zimingFiner-grained stack cookies. A future work direction is collecting an unbiased Cortex-M firmware dataset. 0. New attacks 1. Just implement efficient versions of existing mitigations on Cortex-M 1.1 trace-based control-flow violation 2. Redesign existing versions of mitigations for Cortex-M 3. Protect platform-specific assets. PC and mobile may not have such assets. 4. What issues will we have when we implement it in Rust? Performance? ret2ns Countermeasures: To mitigate this attack, one feasible way is to let the compiler instrument checks before the BXNS and BLXNS instructions. The dynamic checks can look up the non-secure state’s MPU to identify the code privilege level at the destination, and change the CONTROL_NS.nPRIV bit accordingly. Address masking can be used as a static check with less run-time overhead. To facilitate this, the privileged and unprivileged programs in the non-secure world need to be reorganized into separate address spaces. Microkernel. Would this be a new direction? https://www.cs.columbia.edu/~rgu/publications/osdi22-li.pdf. In general, explore on Cortex-M and IoT the strategies that have been successful on other platforms. PACBTI for Cortex-M: how is it different from the Cortex-A counterpart?
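The static address-masking check mentioned in the ret2ns countermeasure can be sketched as below. The memory layout is purely an assumption for illustration: non-secure unprivileged code is taken to be linked at addresses with bit 20 set and privileged code at addresses with that bit clear. The function names are hypothetical, and the sketch only computes the value that instrumentation would write to CONTROL_NS.nPRIV; it does not touch the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: non-secure unprivileged code lives at addresses
 * with bit 20 set; non-secure privileged code has bit 20 clear. */
#define NS_UNPRIV_BIT (UINT32_C(1) << 20)

/* Static check: classify a BXNS/BLXNS destination with a single mask,
 * instead of walking the non-secure MPU configuration at run-time. */
static bool ns_target_is_unprivileged(uint32_t dest)
{
    return (dest & NS_UNPRIV_BIT) != 0u;
}

/* Value the instrumentation would write to CONTROL_NS.nPRIV before the
 * state transition (1 = unprivileged thread mode, 0 = privileged). */
static uint32_t npriv_for_target(uint32_t dest)
{
    return ns_target_is_unprivileged(dest) ? 1u : 0u;
}
```

The single-bit test is what makes the static variant cheaper than an MPU lookup, at the cost of requiring the linker layout to separate the two privilege levels by address.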

2 Discussion

\thesubsection What else can we do with existing hardware features?

Besides the projects discussed in §LABEL:sec:projects, which mainly rely on the MPU and TrustZone, several other hardware features of the Cortex-M architecture can be used to build secure designs. For example, the MTB and ETM can be used to trace and reconstruct control-flow execution. The DWT provides checks on specific data and instruction accesses, which can be used to monitor sensitive information. The newly introduced PACBTI can also be used to detect temporal memory safety violations, as in PTAuth [farkhani2021ptauth].

\thesubsection What else can we do to mitigate implementation bugs?

As Table \thetable shows, most vulnerabilities are caused by implementation bugs. We discuss several ways to mitigate such bugs.

\thesubsubsection Formal verification

Formally verified software systems, such as seL4 [klein2009sel4], can provide high assurance of isolation between applications. However, seL4 does not currently support Cortex-M processors.

\thesubsubsection Bug finding

The main challenge in applying bug-finding techniques to embedded software systems is emulating the wide variety of peripherals and hardware-dependent operations. Although several research efforts seek efficient and accurate peripheral emulation to support dynamic firmware analysis, there is still a long way to go to keep up with the fast-growing hardware diversity.

\thesubsubsection Software fault isolation

MPU-assisted techniques always suffer from the limited number of configurable regions, so how to build a purely software-based yet efficient isolation and confinement mechanism is still worth discussing. Software fault isolation (SFI) [tan2017principles] protects program partitions from being affected by an adversarial partition (see LABEL:No_memory_virtualization:). It consists of untrusted fault domains (a.k.a. compartments) and a secure external component that manages those domains. SFI defines a data and code access policy that constrains read/write accesses to the data region, and a control transfer policy that keeps the control flow within the code region or at a safe external address. However, the control transfer policy still leaves the code exposed to ROP attacks.
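The two SFI policies described above are classically enforced by address masking, which the following C sketch illustrates. The region bases, the 4 KiB region size, and the function names are hypothetical values chosen for the example; they are not taken from any of the surveyed systems.

```c
#include <stdint.h>

/* Hypothetical fault domain: a 4 KiB data region and a 4 KiB code region,
 * both aligned to their size so that one mask confines an address. */
#define DATA_BASE   UINT32_C(0x20001000)
#define CODE_BASE   UINT32_C(0x08004000)
#define REGION_MASK UINT32_C(0x00000FFF)

/* Data access policy: force a load/store address into the data region. */
static uint32_t sfi_mask_data(uint32_t addr)
{
    return DATA_BASE | (addr & REGION_MASK);
}

/* Control transfer policy: force an indirect branch target into the code
 * region, or allow the one designated safe external entry point. */
static uint32_t sfi_mask_target(uint32_t target, uint32_t safe_external)
{
    if (target == safe_external) {
        return target;
    }
    return CODE_BASE | (target & REGION_MASK);
}
```

Note that `sfi_mask_target` only guarantees that the branch lands somewhere inside the code region. It does not constrain which instruction is reached, which is consistent with the remark above that the control transfer policy alone still leaves room for ROP attacks within the domain.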

\thesubsubsection Other artificial software diversity mechanism

As discussed in LABEL:No_memory_virtualization:, ASLR as used on modern operating systems is hard to deploy on Cortex-M systems because of the low entropy. Coarse-grained ASLR is another research direction. Besides ASLR, which is program-level diversity, other diversity methods also show potential for security protection, such as instruction-level diversity, which includes instruction reordering, equivalent instruction substitution, and garbage code insertion.

\thesubsubsection Address or memory sanitizer

\thesubsection Attack prediction

Some Cortex-M processors (e.g., Cortex-M55) are equipped with caches and branch prediction, which exposes them to Meltdown and Spectre attacks. \futureFormally verified embedded systems: Like seL4 [klein2009sel4] \futureSoftware fault isolation: Software fault isolation [tan2017principles] protects program partitions from being affected by an adversarial partition (see I01). It consists of untrusted fault domains (a.k.a. compartments) and a secure external component that manages those domains. SFI defines a data and code access policy that constrains read/write accesses to the data region, and a control transfer policy that keeps the control flow within the code region or at a safe external address. However, the control transfer policy still leaves the code exposed to ROP attacks. \futureInstruction diversity: Discuss why other diversity does not work. Instruction-level diversity includes instruction reordering, equivalent instruction substitution, and garbage code insertion, etc. \futureVirtualization: \futureHardware Security Enhancement:

Notes

To-do list: (1) Crawl as many Cortex-M firmware images as possible. The idea is to generate intelligence about real-world Cortex-M systems. (2) Also think about crawling Cortex-M projects from GitHub. (3) May add back the binary trick section and justify our claim that ”document is incomplete”. To-do list Nov 21st 2002 [] work on the big figure and [3.6] [X] Rewrite the summary for Section 4. [X] Exokernel - TrustZone based. [X] Update Section 6 with new CVEs. [X] I20 and Table 1 need to be updated [] Section 7.7 [] Implement PSPLIM identification in Ghidra [] All the summary -

\thesubsection Adversary model

We consider that anything can be malicious. Secure code is malicious. Privileged code is malicious. $\mathcal{S}_{<U,T,S>}$ Entities: (1) unprivileged software in secure, (2) privileged software in secure (do not need to differentiate handler or thread), (3) unprivileged software in non-secure, (4) privileged software in non-secure (do not need to differentiate handler or thread), (5) peripheral (including DMA). Potential problems: 1. How difficult is it to achieve a non-executable stack on Cortex-M, given that we do not have enough MPU regions to do so? 2. Confused deputy. An NSC function can be called from both the secure and the non-secure state. Then, how does the function know who called it? The origin could be very useful in some cases. 3. Multiple-entry kernel, by implementing the kernel in the secure privileged state instead of the non-secure privileged state. But the app runs at (U, N). Lack of abstraction, no OS, some bare-metal. Vendors compromise on security to reduce footprint. For example, commonplace class key usage versus supporting key provisioning and per-device secrets. We should have a section to discuss deployed software security defenses in existing OSes and compilers. We should compare the code generated by different compilers (especially ARMCC). Use the paper [SoK: Eternal War in Memory] as an exhaustive list of available software protection approaches. A comparison table about threats for x86, Cortex-A, and RISC-V. Then explain why they are unique for Cortex-M.

Bug reports

Instruction diversity: Artificial software diversity [larsen2014sok] randomizes programs at different levels to hide the real information from attackers. (i) Instruction-level diversity includes instruction reordering, equivalent instruction substitution, and garbage code insertion; (ii) basic-block-level diversity includes basic block reordering, opaque predicate insertion, and branch function insertion; (iii) function-level diversity includes stack layout randomization, function parameter randomization, and control flow flattening; (iv) program-level diversity includes function reordering, base address randomization (e.g., ASLR), and randomization of program encoding, data, or even library entry points.

3 Test Cases

\thesubsection Our Test Cases

A table of all test cases and how they map to different sections in the paper. \zimingWhere to output? Only to the LCD? Some devices may not have an LCD. Can we output to the serial port or the JTAG debug interface as well? To check the output in the debugging view (Keil IDE windows), we can use printf() via the ITM (a debug and trace feature of Cortex-M; may need more description of the ITM). Some processors do not implement the ITM; in that case we can retarget the I/O to the UART and use a serial window via the serial port. However, the UART interrupt has a lower priority than HardFault_handler and the other exceptions in which we want to print the stack or register information while in handler mode, so I/O retargeted to the UART cannot work when a hardware fault happens. One possible solution is to use the UART without the interrupt, such as DMA. Another solution is to raise the UART interrupt priority above that of the fault handlers. The latter solution may introduce a security problem. If the device has an LCD screen, we can also use the LCD to print the debug information. \zimingWe should redesign the code to run a ton of tests in one image. If a test generates a fault and the tests cannot continue, we software-reset the system. \zimingThe beginning of the test case suite should read system registers and print out the hardware configuration of the system: 1) CPU model, frequency, SRAM/SSRAM. Also print out a banner stating that CactiLab developed this tool. \zimingEach test case should print out the purpose of the test, the test number, and the expected outcome.
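One interrupt-free way to emit output from a fault handler is polled (busy-wait) transmission, sketched below in C. Polling is an assumption of this sketch, chosen as a simple alternative that does not depend on interrupt priorities. The `uart_regs_t` register block, its bit layout, and the function names are hypothetical, since real UART layouts are vendor-specific. Passing the register block as a pointer lets the same code run against a fake peripheral on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UART register block; real layouts are vendor-specific. */
typedef struct {
    volatile uint32_t status; /* bit 0: transmitter ready (assumed) */
    volatile uint32_t data;   /* writing a byte here transmits it */
} uart_regs_t;

#define UART_TX_READY UINT32_C(1)

/* Busy-wait until the transmitter is ready, then send one byte.
 * No interrupt is involved, so this can run inside a fault handler. */
static void uart_putc_polled(uart_regs_t *uart, char c)
{
    while ((uart->status & UART_TX_READY) == 0u) {
        /* spin until the hardware can accept another byte */
    }
    uart->data = (uint8_t)c;
}

/* Send a NUL-terminated string; returns the number of bytes sent. */
static size_t uart_puts_polled(uart_regs_t *uart, const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        uart_putc_polled(uart, s[n]);
        n++;
    }
    return n;
}
```

On a target board, `uart` would point at the peripheral's memory-mapped base address taken from the vendor's device header. The trade-off of polling is that the handler blocks for the duration of the transmission.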

\thesubsection Cortex-M projects

Table \thetable: Projects Based on Cortex-M

Target	Project	Prototype
Fuzzing	P2IM [feng2020p]	Cortex-M
	DICE [mera2020dice]	Cortex-M/MIPS M4K/M-Class
Symbolic Execution	FIE [davidson2013fie]	TI MSP430
Deep learning	MCUNet [lin2020mcunet]	Cortex-M4, M7
Neural network	CMSIS-NN [lai2018cmsis]	Cortex-M7
Secure software	Keccak-based secure PRNG [van2014software]	Cortex-M0(+), M3, and M4(F
	Teaching real-time DSP [wickert2015using]	Cortex-M4
	Communications of Smart Meters [abbasinezhad2017ultra]	Cortex-M3
	Modular Multiplication [seo2020memory]	Cortex-M4
Secure software updating	ASSURED [asokan2018assured]	HYDRA/Cortex-M33
Cryptography	ECC [de2014ultra]	Cortex-M0+
	NewHope [alkim2016newhope]	Cortex-M0, M4
	AES [wardhani2017fast]	Cortex-M3
	Round5 [saarinen2018shorter]	Cortex-M4
	Round2 [seo2019sike]	Cortex-M4
Benchmark	BEEBS [pallister2013beebs]	Cortex-M0
	CoreMark [coremark2021]
	CoreMark-Pro [coremarkpro2021]
	BenchIoT []
	pqm4 [kannwischer2019pqm4]	Cortex-M4

\thesubsection Cortex-M TrustZone-enabled Platforms

Show a complete table of devices that can use Cortex-M TrustZone.

Table \thetable: TrustZone-enabled Platforms

Platform	SoC	Processor	Multicore	Publicly?	Price
GD32E23x []	GigaDevice	M23	single-core	Yes
GD32E235 []	GigaDevice	M33	single-core	Yes
M2351 []	NuMicro	M23	single-core	Yes
SAML11 Xplained Pro []	Microchip SAML11	M23	single-core	Yes
Renesas S1JA []	Renesas	M23
Renesas RA2A1 []	Renesas	M23
Arm MSP2+ FPGA []	Arm	M23/M33		Yes	-
Arm MSP3 FPGA []	Arm	M23/M33		Yes	-
		Cortex-M33			-

Table \thetable: multi-core

Company	Series	CPU
STM32	STM32WB55xx	M4 + M0+
NXP	i.MXx	A9/7/53 + M4/7
	LPC435x/3x/2x/1x	M4 + M0
Texas Instruments	OMAP4430, OMAP4460	A9 + M3
	OMAP5xx	A15 + M4
Xilinx	Zynq-7000	A9 + FPGA (M1/3 soft core)
Microsoft	MediaTek 3620 chip	A7 + M4

Benchmarks

Dhrystone [weicker1984dhrystone] is a synthetic systems programming benchmark that measures processor and compiler performance. It models the distribution of different types of high-level language statements, operators, operand types, and locality, sourced from contemporary systems programming statistics, to represent actual programming practice. MiBench [guthaus2001mibench] is adapted to the Arm instruction set to characterize embedded programs via instruction distribution, memory behavior, and available parallelism. It contains 35 embedded applications written in C that are divided into six suites, which target automotive and industrial control, consumer devices, office automation, networking, security, and telecommunications. BEEBS [pallister2013beebs] measures the energy consumption of embedded devices. It can also be used to evaluate performance and code size overhead (see Silhouette [zhou2020silhouette] and PicoXOM [shen2020fast]) since it contains a wide range of embedded applications, such as AES and integer and floating-point matrix multiplications. BenchIoT [almakhdhub2019benchiot] is designed to evaluate the security, performance, memory usage, and energy consumption of MCUs. The security evaluation covers minimizing privileged execution (increasing SVC cycles), enforcing memory isolation, and control-flow hijacking protection. It provides a curated set of five real-world applications that can run either bare-metal or on an OS, including a smart light and a smart locker. BenchIoT currently supports Armv7-M. CoreMark [coremark2021] and CoreMark-Pro [coremarkpro2021] are processor benchmark suites that support both high-performance and low-end processors. CoreMark-Pro evaluates the CPU and memory with five integer workloads (e.g., JPEG image compression, SHA-256) and four floating-point workloads (e.g., neural network and fast Fourier transform). 
The support library accompanying the Dhrystone benchmark contains both direct and indirect subroutine calls, as well as indirect returns. \archBare-metal systems and unikernels: In the privileged application architecture, all code runs at the privileged level, and the non-privileged level of the microcontroller is not used, as shown in Figure LABEL:fig:overview(a). Both bare-metal applications, which run directly on the hardware without an operating system layer, and RTOSes that execute themselves and their applications at the privileged level fall into this category. These applications are compiled and statically linked with libraries, e.g., libc and the Cortex Microcontroller Software Interface Standard (CMSIS), and with RTOSes, e.g., the Mbed bare-metal profile [MbedOS], into one big executable. This architecture executes efficiently and is easy to implement, but it has many security issues, which we discuss in LABEL:No_or_weak_privilege_separation:, LABEL:No_or_weak_memory_access_control;_executable_stack:, LABEL:No_or_weak_stack_separation:, and LABEL:Statically_linked_executables_and_no_dynamic_linker:. \archMonolithic kernels: In the monolithic kernel architecture, the RTOS or privileged services run at the privileged level, whereas applications run at the non-privileged level, as shown in Figure LABEL:fig:overview(b). In this architecture, the code base executing at the privileged level, e.g., the Hardware Abstraction Layer (HAL), is significantly larger than the non-privileged application, resulting in a monolithic model. Example RTOSes that adopt this architecture include FreeRTOS with MPU enabled, Tock [levy2017multiprogramming], uClinux [uclinux], RT-Thread [RTthread], and Mbed OS [MbedOS]. \archExokernels: As shown in Figure LABEL:fig:overview(c), in this architecture the size of the code executing at the privileged level is significantly reduced compared with LABEL:Monolithic_kernel_architecture:. 
Only the sensitive services and a trusted separation kernel run at the highest privilege level; the unprivileged level is divided into multiple zones that host RTOSes or bare-metal applications and are managed by the separation kernel. We will discuss a software-based virtualization system, namely Hermes [klingensmith2018hermes], in LABEL:Software-based_virtualization:, and a least kernel privilege system, namely EPOXY [clements2017EPOXY], together with MultiZone [pinto2020multi], in LABEL:MPU-assisted_isolation_and_confinement:. \archDual-world privileged application architecture: Architecture A01 extends into two worlds when it uses TrustZone, as shown in Figure LABEL:fig:overview(d). In this architecture, a secure application is loaded first and then hands control to the non-secure application. Through non-secure callable functions, the non-secure application can access secure services. Most introductory projects use this architecture to help developers learn TrustZone, such as the TrustZone Lab on the SAM L11 Xplained Pro board [saml11demo] and the TrustZone Blinky project on the Arm V2M MPS2+ board [mps2plusiotkit]. \archDual-world systems: As shown in Figure LABEL:fig:overview(e), this architecture extends architecture A02. It offers more sophisticated isolation by combining privilege separation with a Trusted Execution Environment (TEE). RTOSes and applications run in the non-secure state, whereas the secure services run in the secure state. Arm Trusted Firmware for Cortex-M (TF-M) [ATFM], which provides a HAL for using Cortex-M TrustZone, is representative of this architecture. 
TF-M consists of (i) a secure boot module running at the privileged level that authenticates the integrity of the secure-state and non-secure-state images; (ii) a core module running at the secure-state privileged level that controls isolation, communication, and execution; and (iii) other security services running at the secure-state unprivileged level, including crypto, internal trusted storage, protected storage, and attestation [TFMtech]. Keil RTX5 [rtx5], Mbed OS [MbedOS], FreeRTOS [freertoskernel], RT-Thread [RTthread], Zephyr [Zephyr], etc., integrate TF-M [ATFM]. A05 can serve as an advanced design for building a secure software system. However, it supports only one TEE and suffers from the growing size of the TEE TCB, which makes it more prone to vulnerabilities [van2019tale]. \archMulti-world systems: As shown in Figure LABEL:fig:overview(f), this architecture extends the dual-world design into multiple equally-secure TEEs within the non-secure state. The trusted kernel at the secure privileged level handles non-secure environment switches and resource access control. We will discuss uTango [oliveira2021utango], one example of building multiple TEEs with Cortex-M TrustZone, in LABEL:TrustZone-assisted_multiple_TEEs:.

\thesubsubsection (

SVC misuse) \issueNo or weak stack separation: RTOSs, including FreeRTOS [freertosstack] and Zephyr [zephyrstack], support multi-tasking, so each task has its own stack. However, stack separation between the kernel and the application is rarely used in bare-metal firmware. The 10 samples that adopt privilege separation (discussed in LABEL:No_or_weak_privilege_separation:) use both the MSP- and PSP-based stacks. Another 124 samples in our dataset use both the MSP- and PSP-based stacks without privilege separation. All other samples (1,663; 92.54%) use only a single MSP-based stack. \issueNo or weak memory access control; executable stack: Even though some Cortex-M devices have an MPU, previous research suggests that most real-world systems do not use it [clements2017EPOXY, zhou2019good, clements2018aces]. We confirm that 1,773 of the 1,797 firmware samples in our dataset do not use the MPU, which means the address space in the code, SRAM, and RAM regions is executable. For the same reason, data execution prevention (DEP) is not enforced on these systems. Without memory access control, malicious code can also read and write arbitrary memory. Of the 24 firmware samples in our dataset that use an MPU, 5 use the MPU defined by Arm. The remaining 19 use a vendor-specific implementation (i.e., Nordic’s simplified MPU (sMPU) [nordicmpu]), which supports only a subset of the features defined by Arm. Specifically, sMPU supports only read and write permissions and can divide memory into only two protection domains. {comment} \issueWeak readback protection: Readback protection prevents data in memory and flash from leaking through the debugging interface. Besides completely disabling the hardware debug interface [sultan2020readback], some systems include specific features for readback protection. For example, sMPU can disable debugger access to the flash memory region. 
However, it still allows register access and single-stepping of the processor, which was exploited to dump the full flash content [kris2015dum**]. This flaw was addressed in newer processors [nordicnrf52]. But among the Nordic firmware in our dataset, only 34 of 1,462 samples enable readback protection. Empirical Analysis on Real-world Firmware: Even though the aforementioned compilers offer the canary mechanism, only one of the 1,797 firmware samples in our dataset adopts it. {comment} On FreeRTOS, each task has its own stack. The compile-time option configCHECK_FOR_STACK_OVERFLOW enables task stack overflow checking [freertostaskovfchk]. When switching tasks, the RTOS kernel can check either that the processor stack pointer remains within the valid stack space, or that the last 16 bytes within the valid stack range remain the same as when the stack was first created. The latter method is stronger but less efficient than the former. Mbed OS uses the Keil RTX5 RTOS kernel, which implements software stack overflow checking that can be enabled by defining OS_STACK_CHECK [rtx5config]. During a thread switch, the kernel checks both that the stack pointer of the currently running thread is within the stack space and that the stack magic word at the bottom of the stack is intact. The stack magic word is a fixed value defined in the kernel header file.
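
The two context-switch checks described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the FreeRTOS or RTX5 implementation: the record type task_stack_t, the three function names, and the fill value 0xA5 are hypothetical and chosen for illustration, and the stack is modeled as a plain byte buffer that grows downward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BYTES 16u   /* watched area at the stack limit, as in the text */
#define GUARD_FILL  0xA5u /* assumed fill pattern, chosen for this sketch */

/* Hypothetical per-task record; the stack grows down toward stack_low. */
typedef struct {
    uint8_t *stack_low;  /* lowest valid address of the task stack */
    size_t   stack_size; /* size of the stack in bytes */
    uint8_t *sp;         /* stack pointer saved at the last context switch */
} task_stack_t;

/* Fill the whole stack with the pattern and start with an empty stack. */
void task_stack_init(task_stack_t *t, uint8_t *buf, size_t size)
{
    assert(size >= GUARD_BYTES);
    memset(buf, GUARD_FILL, size);
    t->stack_low = buf;
    t->stack_size = size;
    t->sp = buf + size; /* empty descending stack: sp at the top */
}

/* Check 1: the saved stack pointer still lies inside the task stack. */
bool stack_sp_in_range(const task_stack_t *t)
{
    return t->sp >= t->stack_low && t->sp <= t->stack_low + t->stack_size;
}

/* Check 2: the GUARD_BYTES bytes at the stack limit still hold the fill
 * pattern. This also catches an overflow that happened between two
 * context switches even if sp has since moved back into range. */
bool stack_guard_intact(const task_stack_t *t)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (t->stack_low[i] != GUARD_FILL) {
            return false;
        }
    }
    return true;
}
```

In this model, a kernel would call one or both checks for the outgoing task at every context switch. The second check costs a 16-byte scan per switch, which reflects the efficiency trade-off noted above, but it detects overflows that the pointer-range check alone would miss.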

\thesubsubsection Missing barrier instructions

{comment} \issueStatically linked executables and no dynamic linker: Most software systems on Cortex-M are statically linked into one big executable, namely the firmware, and most RTOSs provide no dynamic linker. Therefore, load-time ASLR, which has been a standard feature of modern operating systems for more than a decade, cannot be implemented. Even if boot-time ASLR is possible, many IoT devices do not reboot for a very long time, so the randomized layout would rarely change. {comment} (1) Cortex-M processors do not support some techniques we take for granted on x86/64 or Cortex-A, and their security features are less well known. For example, virtual memory management is not available because there is no Memory Management Unit (MMU), but Cortex-M does offer a Memory Protection Unit (MPU) that provides access control in the physical memory space. Also, Cortex-M has recently introduced its own version of TrustZone, whose underlying mechanisms differ from those of its Cortex-A counterpart; (2) Even though Cortex-M offers some security-related features, existing embedded and IoT software systems barely use them and largely lack protection against code injection, control-flow hijacking, data corruption, and other attacks. In other words, software technologies for security on such devices significantly lag behind the development of not only mobile and personal computer security but also their own hardware security offerings. {comment} \issueWeak security configuration: As discussed in argXtract [sivakumaran2021argxtract], configuration information, which covers the devices, the protocols or special hardware they use, application repositories, and website interfaces, can indicate the security status of firmware binaries and introduce vulnerabilities into them. 
For example, a man-in-the-middle attack becomes possible when Bluetooth communication is not encrypted or authenticated [hackerbluetooth]. A default root password configuration [2880] exposes root privileges to attackers.