
ASCENT: Amplifying Power Side-Channel Resilience via Learning & Monte-Carlo Tree Search

Jitendra Bhandari1*,Animesh Basak Chowdhury1*, Ozgur Sinanoglu2, Siddharth Garg1,
Ramesh Karri1, Johann Knechtel2
1New York University, USA, 2New York University Abu Dhabi, UAE
(2024)
Abstract.

Power side-channel (PSC) analysis is pivotal for securing cryptographic hardware. Prior art focused on securing gate-level netlists obtained as-is from chip design automation, neglecting all the complexities and potential side-effects for security arising from the design automation process. That is, automation traditionally prioritizes power, performance, and area (PPA), sidelining security. We propose a “security-first” approach, refining the logic synthesis stage to enhance the overall resilience of PSC countermeasures. We introduce ASCENT, a learning-and-search-based framework that (i) drastically reduces the time for post-design PSC evaluation and (ii)  explores the security-vs-PPA design space. Thus, ASCENT enables an efficient exploration of a large number of candidate netlists, leading to an improvement in PSC resilience compared to regular PPA-optimized netlists. ASCENT is up to 120x faster than traditional PSC analysis and yields a 3.11x improvement for PSC resilience of state-of-the-art PSC countermeasures.

Hardware Security, Power Side-Channel, Logic Synthesis, Design-Space Exploration, Monte Carlo Tree Search.
* J. Bhandari and A. B. Chowdhury contributed equally to this work.

1. Introduction

Hardware implementations of cryptographic and other sensitive algorithms are well known to be vulnerable to side-channel attacks. Kocher et al. (kocher) first demonstrated the power side-channel attack (PSCA) by exploiting variations in power profiles to extract the secret keys; various advanced PSCA versions followed throughout the years (survey-PSC). To counter such attacks, there are many ongoing efforts to enhance the PSCA resilience of hardware implementations. For example, the state-of-the-art (SOTA) countermeasures (mask-2005; Moos_Moradi_2021; 7324539) augment secret data with random noise to obscure the dependence of power profiles on secret data. However, all these countermeasures incur large power, performance, and area (PPA) overheads.

Figure 1. PSCA resilience post-integration of the QuadSeal countermeasure (7324539) vs. area-delay of various AES netlists.

To tackle PPA overheads, security researchers typically take the outputs of chip design automation processes (which are optimized for PPA by default) as a starting point, perform PSCA analysis, and then propose/apply countermeasures to mitigate PSCAs. However, such an approach can easily overlook circuit configurations that might be inherently more effective in supporting the resilience of PSCA countermeasures, even if they have some PPA disadvantages. We demonstrate this in Figure 1, where we show the area-delay plot of various synthesized netlists versus the final PSCA resilience, for a representative AES hardware implementation with the QuadSeal countermeasure (7324539) applied (see also Sec. 2.3 for the latter).

Our findings in Figure 1 clearly demonstrate that optimizing the baseline netlist for PPA alone does not guarantee the strongest possible defense against PSCAs in the end. This highlights the need to explore alternative circuit configurations for inherently better PSCA resilience, leading to our core research questions:

  1. What is the right circuit implementation to start with, such that, after the countermeasure is applied, the hardware would have the best PSCA resilience?

  2. How can we guide chip design automation to generate such highly PSCA-resilient circuits, yet with low PPA overheads?

In this work, we systematically tackle the fact that different logic synthesis approaches can significantly impact PSCA resilience (for better or worse). This is because different optimization steps result in different netlists with varying characteristics for the type, number, driver strengths, etc. of the standard cells used. Naturally, all these directly impact the power profiles, thereby impacting the prospects of PSCAs as well. However, the related design-space exploration is a tedious, parameter-rich challenge. Furthermore, with commercial synthesis tools, we face a black-box optimization problem, where the relationship between synthesis choices and PSC resilience is difficult to model.

Figure 2. High-level view on ASCENT framework for optimized PSC resilience of countermeasures.

To address these challenges, we propose ASCENT, a framework for amplifying PSC resilience via learning and Monte-Carlo tree search (MCTS) (Figure 2). Our key contributions are:

  • ASCENT provides a hybrid learning-and-search approach, enabling efficient and effective exploration of the security-centric design space. ASCENT enables us to explore $120\times$ more configurations compared to a naive search.

  • ASCENT helps us to achieve up to $3.11\times$ improvements in PSC resilience for SOTA countermeasure integration when compared to regular, PPA-optimized baseline netlists. At the same time, the PPA impact is well controlled and limited by ASCENT, namely only up to 6.61% more area.

  • We open-source ASCENT as a commitment towards reproducible research. All the algorithms, experiments, and benchmarks are publicly available at https://github.com/NYU-MLDA/scarl.git.

2. Background and Motivation

2.1. Power Side-Channel Attacks (PSCAs)

Cryptographic hardware is vulnerable to PSCAs, which exploit the fluctuations in a device’s power consumption to reveal sensitive information like keys (survey-PSC). Thus, these attacks leverage the fundamental connection between a device’s power consumption and its internal state during operations. There are different types of PSCAs, including simple power analysis (SPA), differential power analysis (DPA), and correlation power analysis (CPA) (brier2004CPA).\footnote{SPA directly interprets the power profiles during specific cryptographic operations, aiming to deduce sensitive data. DPA goes further by comparing power consumption across multiple similar operations with varying inputs. This technique seeks to isolate variations in power profiles that directly correlate with the influence of the secret key. CPA employs statistical tools, most commonly the Pearson correlation coefficient (PCC), to match hypothesized power consumption patterns against the actual power measurements. The highest correlation often reveals the correct key.} Note that we utilize CPA in this work; more details are provided further below.

Technology Implications. With continuous advancements for the ever-shrinking technology nodes, the threat of static power side-channel attacks (S-PSCAs) has significantly increased over the years (leakage-2010-TCAS; amstatic2014; giorgetti2007). Unlike dynamic PSCAs, which focus on power fluctuations during active computations, S-PSCAs exploit the relationship between stored data and static power consumption (leakage power). More specifically, advanced nodes utilize standard cells of various types with different power-performance characteristics, e.g., low-threshold voltage (LVT) and ultra-low threshold voltage (ULVT) cells are faster than the regular (RVT) cells, thereby helping to meet faster timing constraints, albeit at the expense of significantly higher leakage power. LVT and ULVT cells are essential for timing closure, i.e., the careful final-stage efforts in design automation. In short, these implications highlight the need for dedicated countermeasures that protect against S-PSCAs.

Countermeasures. To combat S-PSCAs, security-aware designers can employ masking (mask-2005; bhandari2024lightweight), shuffling, and/or balancing (Moos_Moradi_2021; 7324539) schemes.\footnote{Further countermeasures and details are discussed in Section 2.2. Also note that the specific countermeasures employed for this work are discussed in Section 2.3.} In general, these strategies seek to obscure the power profiles. Despite their demonstrated effectiveness, they all increase design overheads considerably, necessitating a careful balance between security and PPA during the design process.

As indicated, here we tackle this challenge through our novel, learning-and-search based framework for logic synthesis.

Simulation-Based Power Analysis. This is crucial for understanding a design’s vulnerability to PSCAs before investing in actual tape-outs. Commonly utilized procedures work along the following lines; we employ such a procedure as well in this work.

First, through gate-level simulations, a value change dump (VCD) file is obtained. This captures all the relevant state information of the device under test. In combination with the post-synthesis netlist and the library files, this allows for accurate power analysis. Second, power simulation tools calculate the power consumption of each cell, including static/leakage power and internal power from input/output switching transitions. Importantly, for S-PSCA assessments, zero-delay simulations enable precise static power capture during specific operations (unlike an averaged leakage power analysis provided by full-timing simulations).

Correlation Power Analysis (CPA). This attack hinges on identifying a correlation between a device’s power consumption and the intermediate data processed throughout cryptographic operations. An attacker collects power consumption data and then hypothesizes on all possible intermediate data values, which often involve direct correlations to parts or derivatives of the secret key.

More specifically, power consumption is predicted for each hypothetical intermediate value, typically using models like Hamming weight or Hamming distance, which relate binary data representation to power. The core of CPA involves calculating the PCC between actual power measurements and the predictions for each hypothesis. The hypothesis yielding the highest correlation often represents the correct assignment for some part/derivative of the key, allowing the attacker to reconstruct the full key eventually.
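The CPA procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation flow used in this work: the S-box is a random byte permutation (`TOY_SBOX`), the "traces" are single simulated samples following a Hamming-weight model plus Gaussian noise, and all names (`cpa_recover_byte`, `SECRET_KEY_BYTE`, etc.) are hypothetical.

```python
import math
import random


def hamming_weight(x: int) -> int:
    """Number of set bits; a common proxy for data-dependent power."""
    return bin(x).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def cpa_recover_byte(plaintexts, traces, sbox):
    """Return (best key guess, |PCC|) over all 256 key-byte hypotheses."""
    best_guess, best_corr = 0, -1.0
    for guess in range(256):
        # Hypothesized intermediate value and its predicted power.
        predicted = [hamming_weight(sbox[p ^ guess]) for p in plaintexts]
        corr = abs(pearson(predicted, traces))
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess, best_corr


# Toy setup (all values hypothetical): a random byte permutation stands in
# for a real S-box, and each "trace" is one noisy power sample.
rng = random.Random(0)
TOY_SBOX = list(range(256))
rng.shuffle(TOY_SBOX)
SECRET_KEY_BYTE = 0x3C

plaintexts = [rng.randrange(256) for _ in range(1000)]
traces = [hamming_weight(TOY_SBOX[p ^ SECRET_KEY_BYTE]) + rng.gauss(0.0, 1.0)
          for p in plaintexts]

recovered, corr = cpa_recover_byte(plaintexts, traces, TOY_SBOX)
print(f"recovered key byte: {recovered:#04x}, |PCC| = {corr:.2f}")
```

In a real assessment, the traces would come from power simulation or measurement, and the number of traces needed before the correct hypothesis stands out is the quantity of interest.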

2.2. Related Work

S-PSC Attacks and Countermeasures. (giorgetti2007) showed, for the first time, the potential of S-PSCAs as a severe threat. (amstatic2014) conducted one of the first practical experiments for S-PSCAs using FPGAs, with some follow-up work presented in (7927198). (leakage-2010-TCAS) highlighted the importance of leakage power and its effect on PSCAs, especially for advanced nodes. (Moos_Moradi_2021) proposed various countermeasures against S-PSCAs, albeit with considerable PPA overheads. (bhandari2023lightweight) showed the impact of various types of standard cells on S-PSCAs. (bhandari2024lightweight) proposed a lightweight masking scheme against S-PSCAs. (Karimi_Moos_Moradi_2019) demonstrated the important side-effect of aging for S-PSCAs in advanced technology nodes. (10.1007/978-3-319-57339-7_5) studied multivariate techniques focused on leakage power consumption to enhance cryptographic security assessments. (9040870; cryptography5030016) proposed standard-cell, delay-based dual-rail pre-charge logic (SC-DDPL) as a countermeasure. However, due to its structural complexity, this countermeasure is incompatible with commercial design flows.

Design Frameworks for Advancing PSC Resilience. (karna) introduced a framework that scores and optimizes design parts to minimize PSC vulnerabilities. This approach is limited by the impractical assumption of timing slacks being ubiquitously available. It also lacks an actual PSC evaluation. (rtl_psc) proposed a framework for assessing PSC vulnerabilities at the register-transfer level (RTL), with the goal to aid countermeasure implementation. (7364404) studied circuit replication and SRAM sharing for PSC resilience in FPGAs. This approach notably increases design costs but significantly improves security while maintaining FPGA configurability. (10.1145/3488932.3517415) emphasized the importance of automated modeling for early, system-level detection of potential leaks. (Tiri2004SecureLS) proposed a methodology for so-called wave dynamic differential logic (WDDL) on FPGAs.

Summary. Prior art for S-PSC countermeasures is focused on detailed empirical studies, with only limited considerations for generalized design-time integration of the countermeasures. At the same time, prior art for frameworks is limited to D-PSCAs, not S-PSCAs, and was proposed for high-level design stages and/or for FPGAs. Thus, there is no prior art that proposes a security-first approach toward the complex, yet critical, challenge of design-space exploration for S-PSCA countermeasure integration in ASICs. Also recall the exploratory finding from Section 1. This gap provides the main motivation for our work.

2.3. Representative Countermeasures

As motivated in Sections 2.1 and 2.2, S-PSCAs are becoming ever-more relevant for advanced technology nodes, and various countermeasures have been proposed. In our study, we consider the following two representative, SOTA countermeasures against S-PSCAs.\footnote{While these two SOTA countermeasures appear similar from a high-level view, their implementation details and, thus, efficiency to hinder S-PSCAs still differ (Moos_Moradi_2021, Table 5).} Importantly, ASCENT is agnostic to the countermeasures a designer likes to explore and eventually integrate.

Quadruple Algorithmic Symmetrizing (QuadSeal) (7324539). This technique can protect against both dynamic and static PSCAs. It focuses on achieving a balance in Hamming weights/distances for the cryptographic operations in hardware. It operates by quadrupling the unprotected circuit structure and balancing the arrangement of the so-called substitution boxes (S-Boxes) in three of those circuit copies in a specific manner. Additionally, it involves rotating inputs to the resulting balanced structure, to mitigate other real-world dependencies introduced by, e.g., manufacturing process variations, timing-path imbalances, aging, etc.

Exhaustive Logic Balancing (ELB) (Moos_Moradi_2021). To reduce the correlation between input data and the leakage current of standard cells, selected sensitive cells are duplicated and fed with inverted input data, which is akin to differential logic. Cell duplication is scaled based on the number of possible input vectors—a single-input cell is duplicated once, while two-input cells are quadrupled. This, along with the inverted inputs, ensures a constantly uniform input distribution across all related cells.
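The balancing idea behind ELB can be illustrated with a small, idealized sketch. It assumes a hypothetical leakage table for a two-input cell (`TOY_LEAKAGE_PA`, integer picoamps chosen only for illustration) and models the balanced group as one cell instance per combination of true/inverted inputs. It does not reproduce the actual ELB implementation from (Moos_Moradi_2021).

```python
from itertools import product

# Hypothetical, input-dependent leakage currents (in pA) of a 2-input cell.
# Real values would come from the standard-cell library characterization.
TOY_LEAKAGE_PA = {(0, 0): 10, (0, 1): 25, (1, 0): 18, (1, 1): 40}


def unbalanced_leakage(inputs):
    """Leakage of a single, unprotected cell for a given input vector."""
    return TOY_LEAKAGE_PA[inputs]


def balanced_leakage(inputs):
    """Total leakage of the balanced group: one cell instance per
    combination of true/inverted inputs (2**n instances for n inputs)."""
    total = 0
    for invert_mask in product((0, 1), repeat=len(inputs)):
        seen_by_instance = tuple(b ^ inv for b, inv in zip(inputs, invert_mask))
        total += TOY_LEAKAGE_PA[seen_by_instance]
    return total


all_inputs = list(product((0, 1), repeat=2))
unbalanced = [unbalanced_leakage(v) for v in all_inputs]
balanced = [balanced_leakage(v) for v in all_inputs]
print("unbalanced:", unbalanced)  # varies with the input data
print("balanced:  ", balanced)    # identical for every input vector
```

Because the group always sees every input vector exactly once, its summed leakage is the same for all data values in this idealized model; in silicon, process variations and mismatches would leave some residual dependence.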

Figure 3. An exemplary PSC-aware logic synthesis framework. Despite the focus on security, conventional flows utilize PPA-optimized netlists as starting point to integrate PSC countermeasures right-away on top (red arrows), without further feedback loops. The latter omission is due to the significant runtime taken by PSC evaluation (light-red, dotted box). Naturally, this can result in sub-optimal countermeasure integration. In contrast, our work utilizes such essential feedback (green arrows) and achieves this by an efficient learning-and-search approach (detailed in Figure 4).

2.4. Logic Synthesis

Logic synthesis transforms a high-level hardware design (e.g., in RTL) into an optimized and technology-specific gate-level netlist. This process offers significant flexibility, as any single design can be mapped to many functionally equivalent but structurally different netlists, all with distinct PPA characteristics. Toward that end, so-called recipes are devised, which are sequences of optimization steps. However, the sheer number of possible recipes and netlists makes this a complex problem, in fact a $\Sigma^{P}_{2}$-complete problem.

Setting. Commercial tools like Synopsys DC and Cadence Genus are tuned for PPA optimization and use proprietary optimization algorithms toward that end. Aside from scripting interfaces tailored for such PPA optimization, these tools lack direct mechanisms to tune synthesis for other objectives like PSC resilience.

Thus, research into novel optimization techniques, including our work on security-first synthesis, often relies on the open-source and customizable Yosys framework (yosys). In fact, Yosys is the most widely adopted, SOTA synthesis framework for and by academia.

AIG Representation, Generic Problem Formulation. Within Yosys, ABC (abc) is used for combinational optimization. First, ABC converts the design into a homogeneous logic-network implementation called and-inverter graph (AIG). Next, ABC’s algorithms employ various transformations at the sub-graph-level (abc). Importantly, users are free to tweak these algorithms in general and the selection and order of transformations in particular, all to optimize the AIG circuit representation according to their objectives.

More formally, in line with earlier works (chowdhury2021openabc; chowdhury2022bulls; chowdhury2024retrieval), a synthesis recipe $a^{T}$ is a sequence of $T$ transformation steps operating on an AIG structure to optimize for PPA and/or other metrics (e.g., security (basak2023almost)), all while preserving the original functionality. We denote by $\mathcal{A}$ the set of $L$ unique synthesis transformations, $\{a_{0}, a_{1}, \ldots, a_{L-1}\}$ ($a_{i} \in \mathcal{A}$), from which a synthesis recipe $a^{T}$ is composed. Thus, the number of synthesis recipes of length $T$ is $L^{T}$, as transformations may be repeated. This search space is denoted by $\mathcal{A}^{T}$. The problem of generating a PPA-optimal synthesis recipe for an AIG is:

\begin{equation}
\operatorname*{arg\,max}_{a^{T} \in \mathcal{A}^{T}} PPA(AIG_{T}), \quad \text{s.t.} \quad AIG_{t+1} = \eta(AIG_{t}, a_{t}) \;\; \forall t \in [0, T-1]
\tag{1}
\end{equation}

where $\eta$ is the synthesis function defined as $\eta : AIG \times \mathcal{A} \longrightarrow AIG$.
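To make the formulation concrete, the following sketch enumerates a tiny search space $\mathcal{A}^{T}$ and applies each recipe step by step. Everything here is a toy stand-in: the "AIG" is just a `(node_count, depth)` pair, `eta` and `ppa` are hypothetical functions with made-up effects, and only the action names are borrowed from typical ABC transformations. A real flow would invoke the synthesis tool instead.

```python
from functools import reduce
from itertools import product

# Hypothetical stand-ins: three transformation names and a toy "AIG" that is
# just a (node_count, depth) pair. Real flows would call ABC instead.
ACTIONS = ("rewrite", "balance", "refactor")


def eta(aig, action):
    """Toy synthesis function eta: AIG x A -> AIG (deterministic)."""
    nodes, depth = aig
    if action == "rewrite":
        return (max(1, nodes - 3), depth + 1)
    if action == "balance":
        return (nodes + 1, max(1, depth - 2))
    return (max(1, nodes - 1), depth)  # "refactor"


def apply_recipe(aig0, recipe):
    """Apply a_0, ..., a_{T-1} in order: AIG_{t+1} = eta(AIG_t, a_t)."""
    return reduce(eta, recipe, aig0)


def ppa(aig):
    """Toy PPA score to maximize: fewer nodes and lower depth are better."""
    nodes, depth = aig
    return -(nodes + 2 * depth)


T = 4
AIG0 = (40, 12)
recipes = list(product(ACTIONS, repeat=T))  # the search space A^T
best = max(recipes, key=lambda r: ppa(apply_recipe(AIG0, r)))
print(len(recipes), best, apply_recipe(AIG0, best))
```

Exhaustive enumeration is only feasible here because $L^{T} = 3^{4} = 81$; the same loop is hopeless for realistic values of $L$ and $T$, which is what motivates guided search.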

2.5. Monte-Carlo Tree Search

MCTS is an optimization algorithm best suited for tree-structured search-space exploration. It has been used in selected prior art for logic synthesis (yu2020flowtune; pei2023alphasyn; chowdhury2024retrieval; delorenzo2024make), albeit only for PPA optimization.

Structure. The MCTS search tree contains a root node representing the initial state ($S_{0}$). A node is called a leaf if there exists an $a_{t} \in \mathcal{A}$ that still remains unexplored or if the node is a terminal state. Each node preserves two attributes: (1) node visit count $N(S_{t}, a_{t})$ and (2) cumulative reward $R(S_{t}, a_{t})$.\footnote{$N(S_{t}, a_{t})$ is the number of times the node is visited during exploration. $R(S_{t}, a_{t})$ is the total reward obtained while exploring the sub-tree rooted at that node.}

MCTS operates in four stages as follows.

1) Selection: Starting at the root, the “most promising” child node is selected repeatedly until a leaf node is reached for further exploration. The selection is based on the upper confidence tree (UCT) computation as follows:

\begin{equation}
\pi_{MCTS}(S_{t}) = \operatorname*{arg\,max}_{a_{t} \in \mathcal{A}} \left( \underbrace{\frac{R(S_{t}, a_{t})}{N(S_{t}, a_{t})}}_{\text{Exploitation}} + c \cdot \underbrace{\sqrt{\frac{\log \sum_{a_{t} \in \mathcal{A}} N(S_{t}, a_{t})}{N(S_{t}, a_{t})}}}_{\text{Exploration}} \right)
\tag{2}
\end{equation}

That is, UCT computation considers exploitation and exploration.

Exploitation computes the ratio of the reward $R(S_{t}, a_{t})$ accumulated over the sub-tree rooted at that node and the visit count $N(S_{t}, a_{t})$. This average reward is obtained by exploring the sub-tree which, in turn, is done by picking an action $a_{t}$ from state $S_{t}$.

Exploration prioritizes nodes which have been less explored so far. It loosely computes the ratio of the parent’s visit count ($\sum_{a_{t} \in \mathcal{A}} N(S_{t}, a_{t})$) and the current node’s visit count ($N(S_{t}, a_{t})$). Thus, the score increases if the parent node has been frequently visited whereas the current node has been less explored.

Starting from state $S_{0}$, $\pi_{MCTS}$ navigates the search space by selecting the “best” action $a_{t}$ that achieves the maximum score combining exploitation and exploration terms (Equation 2), effectively performing a best-first exploration. The selection process continues until a leaf node is encountered.

2 and 3) Expansion and Rollout: Once a leaf node is selected, an action $a_{t}$ is chosen (at random) from the set of unexplored actions. A node is added to the MCTS tree and node attributes are initialized. We call the sequence of nodes visited during MCTS selection and expansion the trajectory $\tau$, which is represented by $\{S_{0}, (S_{0}, a_{0}), (S_{1}, a_{1}), \ldots, (S_{\tau}, a_{\tau})\}$.

4) Backpropagation: After computing the score for the terminal state $S_{T}$, the trajectory $\tau$ is backtracked from the leaf node to the root node. The cumulative reward $R(S_{t}, a_{t})$ and node visit count $N(S_{t}, a_{t})$ for each node in $\tau$ are incremented by $R(S_{T})$ and $1$, respectively. Next, the process repeats from Stage 1) again.
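The four stages can be put together in a compact sketch. It is a generic UCT-style MCTS over fixed-length action sequences, written under several assumptions: the rollout completes the recipe with uniformly random actions, the reward is a toy function (`toy_reward`) that compares a recipe against a hidden, hypothetical target, and names such as `Node`, `uct_score`, and `mcts` are illustrative rather than taken from any released code.

```python
import math
import random


class Node:
    def __init__(self, recipe, parent=None):
        self.recipe = recipe   # actions taken from the root so far
        self.parent = parent
        self.children = {}     # action -> Node
        self.visits = 0        # N(S_t, a_t)
        self.reward = 0.0      # cumulative reward R(S_t, a_t)


def uct_score(child, parent_visits, c):
    exploit = child.reward / child.visits
    explore = math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + c * explore


def mcts(actions, horizon, reward_fn, iterations=200, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(recipe=())
    best_recipe, best_reward = None, -math.inf
    for _ in range(iterations):
        node = root
        # 1) Selection: descend while fully expanded and non-terminal.
        while len(node.recipe) < horizon and len(node.children) == len(actions):
            parent_visits = sum(ch.visits for ch in node.children.values())
            node = max(node.children.values(),
                       key=lambda ch: uct_score(ch, parent_visits, c))
        # 2) Expansion: add one unexplored action, chosen at random.
        if len(node.recipe) < horizon:
            untried = [a for a in actions if a not in node.children]
            action = rng.choice(untried)
            child = Node(node.recipe + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3) Rollout: complete the recipe with random actions up to length T.
        recipe = node.recipe + tuple(
            rng.choice(actions) for _ in range(horizon - len(node.recipe)))
        reward = reward_fn(recipe)
        if reward > best_reward:
            best_recipe, best_reward = recipe, reward
        # 4) Backpropagation: update R and N along the trajectory to the root.
        while node is not None:
            node.visits += 1
            node.reward += reward
            node = node.parent
    return best_recipe, best_reward, root


ACTIONS = ("rewrite", "balance", "refactor")
TARGET = ("balance", "rewrite", "rewrite", "refactor")  # hypothetical optimum


def toy_reward(recipe):
    """Fraction of steps matching a hidden target recipe (illustrative only)."""
    return sum(a == b for a, b in zip(recipe, TARGET)) / len(TARGET)


best_recipe, best_reward, root = mcts(ACTIONS, len(TARGET), toy_reward)
print(best_recipe, best_reward, root.visits)
```

The parent visit count in `uct_score` is computed as the sum over the children's visit counts, mirroring the exploration term of Equation 2. The constant `c` trades off exploration against exploitation; its value here is an arbitrary choice.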

3. Problem Formulation

We believe that logic synthesis offers ample opportunities to discover inherently more S-PSCA-resilient netlist structures that can amplify the resilience of SOTA countermeasures even further. However, while traditional PPA-focused synthesis does impact S-PSCA outcomes, recall that a dedicated security-first approach is fundamentally missing (Section 2.2). We understand and emphasize that realizing such an approach requires enormous efforts in practice. This is due to two key facts:

  1. synthesis in general is already complex and its search space is computationally expensive to explore (see Section 2.4); and, coming on top,

  2. actual S-PSC evaluation, which is essential for accurate guidance of a security-first synthesis method, incurs significant further computation cost (see further below).

Figure 3 illustrates this problem for an exemplary, PSC-aware synthesis framework. Next, we formalize this problem in general. Subsequently, we evidence the practical challenges outlined above in more detail. We also indicate the techniques we utilize to address these. Finally, we provide the specific problem formulation, leading to the proposed ASCENT framework.

Figure 4. The ASCENT framework. ➊ Train a $PT_{score}$ predictor for ultra-fast PSC evaluation of synthesized netlists. ➋ MCTS-based search-space exploration to obtain a synthesis recipe that maximizes $PT_{score}$. ➌ Use the obtained recipe for security-first synthesis, apply the countermeasure of choice on top, and validate $PT_{score}$ of the final netlist.

Formulation. Our goal is to find a netlist that maximizes S-PSCA resilience after application of a SOTA S-PSCA countermeasure. To address the challenges outlined above, we formulate security-first synthesis as an optimization problem, guided by a Markov decision process (MDP), with distinct states, actions, transitions, and rewards.

  • State $S_{t}$ at step $t$ is the AIG of the design $D$ after applying a partial synthesis recipe of length $t$. $AIG_{0}$ is the initial AIG extracted from $D$. The terminal state $AIG_{T}$ is the AIG generated after applying a synthesis recipe of maximum length $T$.

  • Actions $\mathcal{A}$ is the set of $L$ functionality-preserving transformations $\{a_{0}, a_{1}, \ldots, a_{L-1}\}$ ($a_{i} \in \mathcal{A}$) provided by a synthesis tool.

  • State transition $\eta(S_{t+1} | S_{t}, a_{t})$ is the transformation by applying action $a_{t}$ on state $S_{t}$ resulting in state $S_{t+1}$. Here, the transition function is deterministic, i.e., a given state and action always yield the same AIG.

  • Reward expresses the S-PSCA resilience as so-called $PT_{score}$, i.e., the number of power traces required for successful key extraction, of the post-countermeasure netlist. We consider a delayed reward model and assign zero reward to every action until we reach terminal state $S_{T}$.

We define the overall problem as $PT_{score}$ maximization:

\begin{equation}
\operatorname*{arg\,max}_{a_{T}\in\mathcal{A}^{T}} PT_{score}(\mathcal{C}(S_{T})),\,\,s.t.\,\,S_{t+1}=\eta(S_{t},a_{t})\,\,\forall t\in[0,T-1]
\tag{3}
\end{equation}

where $\mathcal{C}$ is the countermeasure applied on the synthesized netlist.
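To make the formulation concrete, the following minimal Python sketch models a recipe as a sequence of actions applied to a state, with a deterministic transition and a delayed reward that is only paid out at the terminal state. The action names, the tuple-based state, and the `toy_pt_score` function are placeholders for illustration; they stand in for real AIG transformations and a real S-PSCA evaluation of the post-countermeasure netlist.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical action set; real actions would be synthesis transformations.
ACTIONS: Tuple[str, ...] = ("balance", "rewrite", "refactor", "resub")

State = Tuple[str, ...]  # placeholder: the state is the recipe prefix applied so far


def transition(state: State, action: str) -> State:
    """Deterministic transition eta(S_t, a_t) -> S_{t+1}."""
    return state + (action,)


def rollout(recipe: Sequence[str],
            terminal_reward: Callable[[State], float]) -> Tuple[State, List[float]]:
    """Apply a T-length recipe; every step gets zero reward except the last."""
    state: State = ()
    rewards: List[float] = []
    for t, action in enumerate(recipe):
        state = transition(state, action)
        is_terminal = t == len(recipe) - 1
        rewards.append(terminal_reward(state) if is_terminal else 0.0)
    return state, rewards


def toy_pt_score(state: State) -> float:
    """Stand-in for PT_score(C(S_T)); NOT a real side-channel evaluation."""
    return 1000.0 * len(set(state))


final_state, rewards = rollout(["balance", "rewrite", "balance"], toy_pt_score)
```

Only the last entry of `rewards` is non-zero, which mirrors the delayed reward model described above.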

Practical Challenges. As indicated, there are critical barriers in terms of computational complexity associated with this problem. For example, the search space for synthesis in general is of complexity $L^{T}$, which, for $L=13$ and $T=18$, amounts to $13^{18}\approx 10^{20}$ recipes.\footnote{This exemplary choice of $L=13$ and $T=18$ is in line with the number of unique synthesis transformations and the length of synthesis recipes, respectively, available for Yosys' compress2rs recipe.}

As our problem has a non-analytical form and no closed-form solution is available, we must rely on gradient-free optimization methods. This mandates means for inexpensive reward evaluation. However, our experimentation shows that running an accurate PSC attack evaluation on post-countermeasure netlists requires up to 100k test vectors, which takes $\approx 6$ hours of simulation runtime. Thus, even when evaluating only 100 samples using any gradient-free optimizer, the process would take $\approx 25$ days.
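The scale of these numbers can be checked with a few lines of Python. The variable names are ours; the inputs are the figures quoted above.

```python
import math

L, T = 13, 18                      # transformations per step, recipe length
search_space = L ** T              # number of distinct T-length recipes
magnitude = math.log10(search_space)

hours_per_eval = 6                 # one post-countermeasure S-PSCA evaluation
samples = 100
total_days = samples * hours_per_eval / 24

print(f"search space ~ 10^{magnitude:.1f} recipes")
print(f"{samples} evaluations ~ {total_days:.0f} days")
```

Exhaustive enumeration is therefore out of reach, and even a modest sample count is costly when every sample needs a full evaluation.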

These important observations raise the following questions toward computationally-efficient S-PSC-aware logic synthesis:

  • Given a runtime budget, how can we quickly, yet accurately, evaluate S-PSC attacks for some post-countermeasure netlist?

  • How can we efficiently explore the search space of synthesis recipes to obtain S-PSC-resilient netlists?

Outline of Method. Addressing these challenges necessitates an optimization approach that balances search efficiency with accurate S-PSCA assessment. This leads us to the design of ASCENT (Section 4), which employs a hybrid learning-and-search strategy. More specifically, ASCENT utilizes (i) a zero-shot predictor $\hat{PT}_{score}(\mathcal{C}(S_{T}),\theta)$ to significantly speed up the PSC evaluation, without loss of accuracy, and (ii) MCTS (Section 2.5) to explore the large and complex search space for security-first synthesis in an effective and efficient manner.

Extended Formulation. Assume a predictor $\hat{PT}_{score}(\mathcal{C}(S_{T}),\theta)$, which predicts the number of power traces required for PSCAs for a design $\mathbf{D}$ synthesized using a $T$-length recipe. Then, we seek to solve the following problem:

\begin{equation}
\operatorname*{arg\,max}_{a_{T}\in\mathcal{A}^{T}} \hat{PT}_{score}(\mathcal{C}(S_{T}),\theta),\,\,s.t.\,\,S_{t+1}=\eta(S_{t},a_{t})\,\,\forall t\in[0,T-1]
\tag{4}
\end{equation}

where $\theta$ represents the predictor's parameters. In simple terms, the predictor shall serve as a computationally efficient surrogate for direct S-PSCA evaluation.

4. ASCENT Framework

We solve the problem formulation in Equation 4 in three steps, as illustrated in Figure 4. Next, we outline these three steps.

First, we provide an exploratory experiment which clearly demonstrates that maximizing the S-PSCA resilience of a netlist post-countermeasure integration, i.e., maximizing $PT_{score}(\mathcal{C}(S_{T}))$, is the same as maximizing $PT_{score}(S_{T})$. Therefore, we train a zero-shot $\hat{PT}_{score}$ predictor using $PT_{score}$ values obtained from diverse synthesized netlists. We show that it suffices to start with some representative data points to train such a zero-shot predictor.

Second, we employ the zero-shot predictor as the S-PSCA evaluator for the MCTS-based exploration of the search space. This predictor within MCTS is essential to drive the sequential decision-making process. Note that we also improve the predictor on the fly: we simulate the actual power traces for the top-$k$ and bottom-$k$ recipes during the MCTS process and accordingly fine-tune the predictor.

Third, we put all parts together and conduct an end-to-end optimization process, including a final validation of the PSC resilience by actual PSC evaluation after the optimization process is done.

4.1. Zero-Shot Predictor (➊)

To obtain a zero-shot predictor, we have to collect historical S-PSCA data in a one-time effort. Key challenges here are (i) high runtime cost of S-PSCA evaluation even for such one-time effort and (ii) ensuring a representative dataset of diverse netlists with varying S-PSCA resilience. Next, we discuss how we tackle these challenges and finally outline the actual training approach.

4.1.1. Pre- vs Post-Countermeasure Evaluation

We observe a monotonic relationship between pre- and post-countermeasure $PT_{score}$ values. Figure 5 and Figure 6 show $PT_{score}$ values for 50 randomly synthesized netlists pre- and post-countermeasure application for the two representative countermeasures of choice.\footnote{The gap between 6k and 7k for the baseline/pre-countermeasure netlists is only due to chance, i.e., none of the 50 random recipes provides a score in that range.}

Importantly, this confirms that aiming for pre-countermeasure resilience suffices for a guided design-space exploration toward best post-countermeasure resilience. This observation helps to significantly limit the runtime cost for obtaining training data. In fact, running S-PSCA evaluations on pre-countermeasure netlists provides a speed-up of at least $7.2\times$: it takes only 40–50 minutes as compared to around 6 hours for post-countermeasure netlists. This is due to the lower resilience of the pre-countermeasure netlists; S-PSCAs can find the correct key with fewer traces and in shorter time.
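As a quick check of the quoted speed-up, using only the runtimes stated above (about 6 hours, i.e., 360 minutes, post-countermeasure versus 40–50 minutes pre-countermeasure):

```latex
\[
\frac{360\ \text{min}}{50\ \text{min}} = 7.2, \qquad
\frac{360\ \text{min}}{40\ \text{min}} = 9.0 .
\]
```

That is, the factor of 7.2 corresponds to the slower end of the pre-countermeasure runtimes, and the faster end gives a factor of 9.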

Figure 5. $PT_{score}$ on AES before and after applying QuadSeal.

Figure 6. $PT_{score}$ on AES before and after applying ELB.

4.1.2. Dataset Diversity

Our sampling is inspired by the approaches described in (bai2023towards), which lead us to focus on exploring netlists that result in diverse scores. Without loss of generality, we utilize simulated annealing (SA) toward this end. Note that we tune the annealing schedule for more diversity during exploration.
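As an illustration of how annealing can be steered toward diverse scores, the sketch below runs a basic simulated-annealing loop over recipes and records every visited recipe together with its score. It is a simplified stand-in under stated assumptions, not the implementation used in this work: the action names, the `toy_score` function, and the slow geometric cooling schedule (which keeps the acceptance probability for worse moves high for longer, and hence visits a wider range of scores) are all hypothetical.

```python
import math
import random

ACTIONS = ["balance", "rewrite", "refactor", "resub"]  # hypothetical names


def toy_score(recipe):
    """Placeholder for an actual S-PSCA evaluation of the synthesized netlist."""
    switches = sum(a != b for a, b in zip(recipe, recipe[1:]))
    return 1000 * len(set(recipe)) + 100 * switches


def anneal(length=6, steps=200, t_start=2000.0, cooling=0.99, seed=0):
    rng = random.Random(seed)
    current = [rng.choice(ACTIONS) for _ in range(length)]
    current_score = toy_score(current)
    temperature = t_start
    visited = {tuple(current): current_score}  # dataset of recipe -> score
    for _ in range(steps):
        # Neighbor move: replace one randomly chosen step of the recipe.
        candidate = list(current)
        candidate[rng.randrange(length)] = rng.choice(ACTIONS)
        candidate_score = toy_score(candidate)
        visited[tuple(candidate)] = candidate_score
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
        temperature *= cooling  # slow cooling favors exploration
    return visited


dataset = anneal()
distinct_scores = sorted(set(dataset.values()))
```

Every evaluated candidate is stored, not only the accepted ones, since the goal here is a diverse training set and not a single optimum.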

4.1.3. Training of the Predictor

Our predictor is a regression model. It is trained on pre-countermeasure netlists, using three handcrafted features as inputs and $PT_{score}$ values (obtained by actual S-PSCA evaluation) as labels. The features are: (1) Overall Diversity: count of the various cell types found in the synthesized netlist; (2, 3) Specific Diversity: percentage of area consumed by LVT (2) and HVT (3) cells. We trained and evaluated various predictor versions, exploring a wide range of other handcrafted features (e.g., area, delay, etc.) as well as more synthesized designs and more varied recipes. However, the best performance, as in the best prediction of post-countermeasure resilience based on pre-countermeasure resilience, was obtained for the three features above.
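The three features are simple to compute once a netlist is available as a list of cell instances. The sketch below assumes a hypothetical in-memory representation in which each cell is a `(cell_type, vt_class, area)` tuple; the cell names and areas are made up, and a real flow would parse them from the synthesized netlist and the technology library. Treating the same logic function in different VT flavors as one cell type is also an assumption.

```python
from typing import List, Tuple

Cell = Tuple[str, str, float]  # (cell type, VT class, area in um^2); hypothetical format


def extract_features(netlist: List[Cell]) -> Tuple[int, float, float]:
    """Return (overall diversity, % area in LVT cells, % area in HVT cells)."""
    total_area = sum(area for _, _, area in netlist)
    cell_types = {cell_type for cell_type, _, _ in netlist}

    def area_share(vt: str) -> float:
        vt_area = sum(area for _, vt_class, area in netlist if vt_class == vt)
        return 100.0 * vt_area / total_area if total_area else 0.0

    return len(cell_types), area_share("LVT"), area_share("HVT")


example = [
    ("NAND2", "LVT", 1.0),
    ("NAND2", "HVT", 1.0),
    ("XOR2", "LVT", 2.0),
    ("DFF", "SVT", 4.0),
]
features = extract_features(example)
```

The resulting feature vector, paired with the measured $PT_{score}$ as label, forms one training sample for the regressor.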

4.2. Monte-Carlo Tree Search & Fine-Tuning (➋)

Having established a fast and accurate zero-shot predictor, we now have to integrate it into a search strategy for maximizing PSCA resilience. Monte-Carlo tree search (MCTS), with its ability to intelligently balance exploration and exploitation, is a natural fit for this complex task. Its delayed reward model aligns with our problem.

4.2.1. MCTS Implementation

We employ MCTS as outlined in Sections 2.5 and 3. More details are provided next.

We implement the critical reward component as follows. We assign normalized $PT_{score}(S_{T})$ values, obtained from the zero-shot predictor $\hat{PT}_{score}$, to a terminal state $S_{T}$.

\begin{equation}
PT^{norm}_{score}(S_{T})=\begin{dcases}0&\text{if }PT_{score}\leq PT_{threshold}\\ \frac{PT_{score}}{PT_{threshold}}&\text{otherwise}\end{dcases}
\tag{5}
\end{equation}

In plain words, we compare the predicted score against a user-defined threshold ($PT_{threshold}$) and assign a reward of $0$ if the score does not exceed the threshold. This reward formulation allows MCTS to skip any unpromising paths in the search space. The reward for promising paths is normalized so that the UCT computation maintains the balance between exploitation and exploration.
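Equation 5 translates directly into a small function. The sketch below is a minimal rendering of it; the threshold of 5000 traces used in the example is an arbitrary illustrative value, not one taken from this work.

```python
def normalized_reward(pt_score: float, pt_threshold: float) -> float:
    """Equation 5: zero at or below the threshold, score/threshold above it."""
    if pt_threshold <= 0:
        raise ValueError("pt_threshold must be positive")
    if pt_score <= pt_threshold:
        return 0.0
    return pt_score / pt_threshold


# Example with an arbitrary threshold of 5000 traces.
rewards = [normalized_reward(s, 5000.0) for s in (4000.0, 5000.0, 7500.0)]
```

Note that non-zero rewards are always greater than 1, since they are only assigned to scores above the threshold.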

For the expansion and rollout stages, note the following. A sequence of actions is taken, which in turn synthesizes the netlist until the terminal state is reached. Then, the $\hat{PT}_{score}$ predictor is used to obtain the $PT_{score}$ value of the netlist. A new node is added and assigned the updated $R(S_{t},a_{t})$ value.
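The four MCTS stages can be sketched compactly as follows. This is a toy illustration under several assumptions: the action names and the recipe length are hypothetical, `toy_predictor` replaces the trained predictor and returns values in (0, 1], and the selection rule uses the common UCT form (mean reward plus an exploration bonus), which may differ in detail from the variant in Section 2.5.

```python
import math
import random

ACTIONS = ["balance", "rewrite", "refactor"]  # hypothetical transformation names
T = 4                                          # recipe length for this toy example


class Node:
    def __init__(self, recipe=(), parent=None):
        self.recipe, self.parent = recipe, parent
        self.children = {}       # action -> Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.recipe) == T

    def fully_expanded(self):
        return len(self.children) == len(ACTIONS)


def toy_predictor(recipe):
    """Stand-in for the normalized predictor output; NOT a real PSC model."""
    return len(set(recipe)) / len(ACTIONS)


def uct_child(node, c=1.4):
    # Common UCT form: mean reward plus an exploration bonus.
    return max(
        node.children.values(),
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes via UCT.
        while not node.is_terminal() and node.fully_expanded():
            node = uct_child(node)
        # Expansion: add one untried action as a new child.
        if not node.is_terminal():
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(node.recipe + (action,), node)
            node = node.children[action]
        # Rollout: complete the recipe randomly, then query the predictor once.
        remaining = T - len(node.recipe)
        recipe = node.recipe + tuple(rng.choice(ACTIONS) for _ in range(remaining))
        reward = toy_predictor(recipe)
        # Back-propagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return root


root = mcts()
```

Because the reward is only computed for complete recipes, intermediate nodes obtain their value estimates solely through back-propagation, which matches the delayed reward model.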

4.2.2. Online Fine-Tuning of Predictor

As indicated, we simulate the actual power traces for the top-$k$ and bottom-$k$ recipes during the MCTS process and accordingly fine-tune the predictor $\hat{PT}_{score}$. This is done to ensure the predictor is continually updated with relevant corner-case data points, all without hampering the ongoing search-space exploration. For efficiency, we conduct the S-PSCA evaluation for these netlists in parallel.
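The data-collection part of this fine-tuning step could look as follows. The sketch selects the top-$k$ and bottom-$k$ recipes by predicted score and evaluates them in parallel; `simulate_pt_score` is a hypothetical placeholder for the full power simulation and S-PSCA evaluation, the recipes and predicted scores are made up, and the actual model update is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def select_corner_cases(predicted, k):
    """Return the k best and k worst recipes by predicted score (no duplicates)."""
    if k <= 0:
        return []
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    top, bottom = ranked[:k], ranked[-k:]
    return top + [r for r in bottom if r not in top]


def simulate_pt_score(recipe):
    """Placeholder for a full power simulation plus S-PSCA evaluation."""
    return 1000 * len(set(recipe))


def collect_finetuning_data(predicted, k=2, workers=4):
    recipes = select_corner_cases(predicted, k)
    # The expensive evaluations are independent, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        labels = list(pool.map(simulate_pt_score, recipes))
    return list(zip(recipes, labels))  # new (recipe, ground-truth score) pairs


predicted_scores = {
    ("balance", "rewrite"): 5200.0,
    ("rewrite", "rewrite"): 1800.0,
    ("balance", "refactor"): 4100.0,
    ("resub", "resub"): 900.0,
    ("refactor", "resub"): 3000.0,
}
new_samples = collect_finetuning_data(predicted_scores, k=2)
```

The returned pairs would then be appended to the training set before the predictor is refit.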

4.2.3. Justification of MCTS

We note that reinforcement learning (RL)-based methods like (hosny2020drills; zhu2020exploring) are gaining significant attention for advancing logic synthesis. The key differences between our work and these works, which make MCTS uniquely suitable, are as follows.

In RL-based methods, the asynchronous actor-critic approach encounters challenges in delayed reward systems. Thus, RL-based methods tuned for PPA optimization typically assign immediate rewards, like reduced depth of AIGs, which also correlate well with the final PPA. However, there is no such direct correlation between AIGs and S-PSCA resilience. Therefore, we cannot utilize immediate rewards and, thus, chose MCTS. In fact, its notion of back-propagation (Section 2.5) helps to accurately estimate average rewards even for intermediate nodes in the search tree.

4.3. Integration and End-to-End Validation (➌)

After completion of the MCTS process, we obtain the best synthesis recipe using a greedy approach following standard procedures (most-rewarding-node selection and most-visited-node selection). This recipe is expected to generate the most S-PSCA-resilient design post-countermeasure application. Finally, we synthesize the circuit using this recipe, apply the countermeasure of choice, and run an actual S-PSCA evaluation to report the final $PT_{score}$.
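The greedy extraction of the final recipe can be sketched as a descent through the search tree. The example below uses a tiny hand-built tree with made-up statistics, stored as nested dictionaries, and interprets "most rewarding" as highest mean reward; both the data structure and this interpretation are assumptions.

```python
def best_recipe(node, criterion="visits"):
    """Greedily descend the search tree and return the selected action sequence."""
    def key(child):
        if criterion == "visits":
            return child["visits"]
        # Mean reward; assumes every child has been visited at least once.
        return child["total_reward"] / child["visits"]

    recipe = []
    while node["children"]:
        action, node = max(node["children"].items(), key=lambda item: key(item[1]))
        recipe.append(action)
    return recipe


def leaf(visits, total_reward):
    return {"visits": visits, "total_reward": total_reward, "children": {}}


# Tiny hand-built tree (hypothetical statistics) with two levels of decisions.
tree = {
    "visits": 10, "total_reward": 9.0,
    "children": {
        "balance": {"visits": 6, "total_reward": 4.8,
                    "children": {"rewrite": leaf(4, 3.0), "resub": leaf(2, 1.8)}},
        "rewrite": {"visits": 4, "total_reward": 4.2,
                    "children": {"balance": leaf(3, 3.3), "resub": leaf(1, 0.9)}},
    },
}
by_visits = best_recipe(tree, "visits")
by_reward = best_recipe(tree, "reward")
```

In this toy tree the two criteria disagree, which shows why both selections are worth checking before the final validation run.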

5. Experiments

5.1. Setup

AES Implementation. We use non-masked Electronic Code Book (ECB) mode AES with the S-Box implemented as a look-up table. In ECB mode, data is padded to the block length of 128 bits.

We use a commercial 28nm technology library, focusing on the TT corner (25 degrees Celsius, 0.9V), and using different VT cells.

S-PSCA Setup. We implement a C++-based CPA attack following (knechtel22_PSC). It incrementally increases the number of traces through coarse and thorough sampling for 128 trials, aiming for a 90% success rate with thorough sampling. In other words, the attack is thoroughly assessed in multiple runs, not only one-shot trials.
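The way a $PT_{score}$ can be derived from repeated attack trials is sketched below in Python (the actual attack in this work is implemented in C++). The split into a cheap coarse sweep followed by a thorough sweep with 128 trials and a 90% success target reflects our reading of the description above; the step sizes, the number of coarse trials, and the `toy_attack` callback are hypothetical.

```python
import random


def success_rate(attack, n_traces, trials, rng):
    """Fraction of independent trials in which the attack recovers the key."""
    return sum(attack(n_traces, rng) for _ in range(trials)) / trials


def estimate_pt_score(attack, max_traces, coarse_step, fine_step,
                      coarse_trials=8, trials=128, target=0.9, seed=0):
    rng = random.Random(seed)
    # Coarse sweep: few trials per point, only to bracket the region of interest.
    lower = 0
    for n in range(coarse_step, max_traces + 1, coarse_step):
        if success_rate(attack, n, coarse_trials, rng) >= target:
            break
        lower = n
    # Thorough sweep: full number of trials, finer trace increments.
    for n in range(lower + fine_step, max_traces + 1, fine_step):
        if success_rate(attack, n, trials, rng) >= target:
            return n
    return None  # key not recovered reliably within max_traces


def toy_attack(n_traces, rng):
    """Deterministic stand-in: succeeds once enough traces are available."""
    return n_traces >= 2900


pt_score = estimate_pt_score(toy_attack, max_traces=10_000,
                             coarse_step=1000, fine_step=100)
```

A real attack callback would run the correlation analysis on `n_traces` randomly drawn traces and return whether the correct key ranked first.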

The total number of traces depends on the case study; for baseline AES, up to 10K traces are sufficient, whereas QuadSeal and ELB countermeasures on top require 50K and 100K traces, respectively.

$PT_{score}$ Predictor. We collected a corpus of 1,000 random, different synthesis recipes for the baseline AES, with features extracted as in Section 4. This took us $\approx 30$ days. We split the dataset into 80% and 20% for training and testing, respectively.

We used XGBoost (chen2016xgboost), a scalable and distributed, gradient-boosted decision tree (GBDT) library to implement the predictor model. We trained our model on an AMD EPYC 7542 server equipped with 128 CPUs and 1TB RAM, running Red Hat Enterprise Linux Server Release 7.9 (Maipo). The CAD flow utilized both open-source and commercial tools, including Synopsys VCS M-2017.03-SP1 for RTL and gate-level simulations, Yosys for logic synthesis, and Synopsys PrimeTime PX M-2017.06 for power simulations.

ASCENT Framework. We developed ASCENT in-house by implementing the MCTS algorithm and plugging in the $\hat{PT}_{score}$ predictor. We compute the handcrafted features from the synthesized netlist and pass them to the predictor, which provides quick feedback with a latency of 100 ms. We use Yosys (v0.38) to perform end-to-end synthesis, which takes roughly 2.8 minutes in a single-threaded run on our Intel server (frequency: 2.3 GHz, memory: 256 GB).

5.2. Results

5.2.1. $PT_{score}$ Predictor

Figure 7. Performance of the $\hat{PT}_{score}$ predictor on test data.

Figure 7 shows our predictor’s performance on the test datasets, where the x-axis and y-axis denote “Ground Truth” and “Predicted Scores”, respectively. The Root Mean Squared Error (RMSE) score for the model on the test datasets is 565.58. However, recall that the predictor will be continuously improved by fine-tuning to further enhance its performance.
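For reference, the RMSE over $N$ test samples with ground-truth scores $PT_{score}^{(i)}$ and predictions $\hat{PT}_{score}^{(i)}$ follows the standard definition (the index notation is ours):

```latex
\[
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{PT}_{score}^{(i)} - PT_{score}^{(i)}\right)^{2}}
\]
```

The RMSE is expressed in the same unit as $PT_{score}$, i.e., number of traces. With the 80/20 split of the 1,000 recipes, $N = 200$.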

5.2.2. ASCENT Framework

Table 1. Comparison of $PT_{score}$ across recipes. Shown are only recipes with maximal $PT_{score}$.
\begin{tabular}{l|rrrr|rrrr|rrrr}
\hline
 & \multicolumn{4}{c|}{Baseline AES} & \multicolumn{4}{c|}{AES + QuadSeal} & \multicolumn{4}{c}{AES + ELB} \\
Synthesis recipe & $PT_{score}$ & Area ($\mu$m$^2$) & Power (mW) & Delay (ns) & $PT_{score}$ & Area ($\mu$m$^2$) & Power (mW) & Delay (ns) & $PT_{score}$ & Area ($\mu$m$^2$) & Power (mW) & Delay (ns) \\
\hline
compress2rs & 2800 & 12391 & 0.65 & 0.28 & 12500 & 16430 & 0.83 & 0.31 & 28000 & 24321 & 1.31 & 0.36 \\
resyn2rs & 2900 & 12380 & 0.62 & 0.3 & 13000 & 16712 & 0.86 & 0.33 & 29000 & 24627 & 1.39 & 0.39 \\
SA & 4900 & 12542 & 0.73 & 0.31 & 17500 & 17123 & 0.91 & 0.34 & 42000 & 25242 & 1.48 & 0.41 \\
MCTS & 5300 & 13723 & 0.76 & 0.31 & 18500 & 18204 & 0.93 & 0.35 & 47000 & 26111 & 1.54 & 0.43 \\
ASCENT & 9100 & 13155 & 0.74 & 0.3 & 42500 & 18097 & 0.92 & 0.35 & 87000 & 25928 & 1.53 & 0.44 \\
\hline
\end{tabular}

We started by running the S-PSCA evaluation for the compress2rs recipe, which we consider as the baseline; all other methods used in this work are compared against it. The recipe for compress2rs is provided in (github). We consider three different settings for our experiments, namely {AES, AES+QuadSeal, AES+ELB}, where AES denotes the baseline without countermeasures, AES+QuadSeal denotes AES with the countermeasure QuadSeal, and AES+ELB represents AES with countermeasure ELB incorporated (Section 2.3).

Table 1 details our results and the comparison across the various methods studied in this work. To better understand the results, we compare the $PT_{score}$ and overheads with the results from the baseline compress2rs run. For example, for the baseline AES, we compare the results of each method with the corresponding values from compress2rs. For the resyn2rs recipe, we observe a {$1.04\times$, $1.04\times$, $1.04\times$} improvement in the $PT_{score}$ values, along with an additional {-0.09%, 1.7%, 1.26%} area, {-4.6%, 3.61%, 6.11%} power, and {7.14%, 6.45%, 8.33%} delay overhead across the three scenarios, respectively. Thus, with the introduction of the countermeasures, we can see that the $PT_{score}$ is improved; this is expected.

We then utilize SA, the same black-box optimizer that was used to generate the training data for our predictor model, to also explore the security-first search space. Allowing a timeout of 3 days, there is an improvement across all three scenarios: {$1.75\times$, $1.4\times$, $1.5\times$} higher scores, albeit at an overhead of {1.22%, 4.22%, 3.79%} area, {12.31%, 9.64%, 12.98%} power, and {10.71%, 9.68%, 13.89%} delay, respectively.

We then utilized standalone MCTS, i.e., without the predictor (row MCTS in Table 1), to obtain the synthesis recipes with maximized $PT_{score}$ values. Here, an increase in resilience of {$1.89\times$, $1.48\times$, $1.68\times$}, along with {10.75%, 10.78%, 7.36%} area, {16.92%, 12.05%, 17.56%} power, and {10.71%, 12.90%, 19.44%} delay overheads, respectively, is obtained.

Table 2. Runtime performance achieved by ASCENT and competitive methods. Iterations denotes the number of times the full S-PSCA evaluation can be run. We set a common timeout of 72 hours for fair comparison.
\begin{tabular}{l|r|r}
\hline
Method & Iterations & Speed-up \\
\hline
SA (AES with countermeasure) & 12 & $1.0\times$ \\
MCTS (AES with countermeasure) & 12 & $1.0\times$ \\
SA (baseline AES) & 108 & $9.0\times$ \\
MCTS (baseline AES) & 107 & $8.3\times$ \\
ASCENT & 1460 & $121.3\times$ \\
\hline
\end{tabular}

Finally, we used our full ASCENT framework to more effectively explore the design space, again with a timeout of 3 days. Recall that, thanks to the predictor model, the time-consuming actual S-PSCA evaluation can be bypassed in the MCTS exploration phase. We pick the top-3 recipes in terms of max $PT_{score}$ and run actual S-PSCA evaluation on them, for final validation. From Table 1, it can be seen that, relative to the baseline, we achieved on average across these top-3 recipes {$3.25\times$, $3.40\times$, $3.11\times$} higher PSC resilience with an overhead of only {6.16%, 10.15%, 6.61%} area, {13.85%, 10.84%, 16.79%} power, and {7.14%, 12.90%, 22.22%} delay, respectively.
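The improvement factors and overheads are plain ratios against the compress2rs row of Table 1. As a minimal sketch, the snippet below recomputes them for the baseline AES setting; the dictionary layout and function name are ours, and the numbers are copied from Table 1.

```python
# PT_score, area (um^2), power (mW), delay (ns) for baseline AES, taken from Table 1.
TABLE = {
    "compress2rs": (2800, 12391, 0.65, 0.28),
    "ASCENT": (9100, 13155, 0.74, 0.30),
}


def compare(method, baseline="compress2rs"):
    """Return (PT_score gain factor, area %, power %, delay % overhead)."""
    pt, area, power, delay = TABLE[method]
    pt0, area0, power0, delay0 = TABLE[baseline]

    def overhead(new, old):
        return round(100.0 * (new - old) / old, 2)

    return (round(pt / pt0, 2), overhead(area, area0),
            overhead(power, power0), overhead(delay, delay0))


gain, area_oh, power_oh, delay_oh = compare("ASCENT")
```

The same function applies to the QuadSeal and ELB settings once their rows are added to the dictionary; small last-digit differences to the reported percentages can arise from rounding versus truncation.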

Table 2 reports the runtime comparison of the various methods. The first two cases, {SA (AES with countermeasures), MCTS (AES with countermeasures)}, require running the S-PSCA evaluation end-to-end, with countermeasures in place. These require collecting 50k and 100k traces for QuadSeal and ELB, respectively, compared to only 10k traces for the baseline AES. Thus, $5\times$ and $10\times$ more time is required, respectively. For the next two cases, namely SA and MCTS on the baseline AES, we obtain a speed-up of $9.0\times$ and $8.3\times$, respectively. In short, ASCENT is able to explore a much larger design space much more quickly when compared to other methods; e.g., $120\times$ faster than the default black-box optimizer SA.
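The iteration counts in Table 2 for the evaluation-bound methods are consistent with the 72-hour budget and the per-evaluation runtimes reported in Section 4.1.1 (about 6 hours post-countermeasure, and 40 minutes at the fast end pre-countermeasure):

```latex
\[
\frac{72\ \text{h}}{6\ \text{h}} = 12, \qquad
\frac{72\ \text{h}}{40\ \text{min}} = \frac{4320\ \text{min}}{40\ \text{min}} = 108 .
\]
```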

6. Conclusion

In this work, we proposed ASCENT, a novel synthesis framework. We have successfully enhanced the final resilience of existing power side-channel countermeasures by carefully guided, yet efficient, exploration of the complex search space. In fact, ASCENT enables a $3.11\times$ post-countermeasure improvement when compared to conventional synthesis (tailored for PPA optimization). For future work, we plan to tailor ASCENT to harden circuits also against other threats like fault injection.

\printbibliography