Practical Boolean Decomposition for Delay-driven LUT Mapping

Alessandro Tempia Calvino^1, Alan Mishchenko^2, Giovanni De Micheli^1, Robert Brayton^2
^1 Integrated Systems Laboratory, EPFL, Lausanne, Switzerland
^2 Department of EECS, University of California, Berkeley, USA
Abstract

Ashenhurst-Curtis decomposition (ACD) is a decomposition technique used, in particular, to map combinational logic into lookup table (LUT) structures when synthesizing hardware designs. However, available implementations of ACD suffer from excessive complexity, search-space restrictions, and slow run time, which limit their applicability and scalability. This paper presents a novel, fast, and versatile ACD technique suitable for delay optimization. We use this new formulation to compute two-level decompositions into a variable number of LUTs and enhance delay-driven LUT mapping by performing ACD on the fly. Experiments with heavily optimized benchmarks show an average delay improvement of 12.39% and an area reduction of 2.20% compared to state-of-the-art LUT mapping, with affordable run time. Additionally, our method improves the best-known delay for 4 benchmarks in the EPFL synthesis competition.

Index Terms:
Logic synthesis, Boolean decomposition, technology mapping, FPGA

I Introduction

Ashenhurst-Curtis decomposition (ACD) [1, 2], also known as Roth-Karp decomposition [3], is a powerful technique that finds a decomposition of a Boolean function into a set of sub-functions and a composition function with reduced support. ACD finds applications in logic optimization and technology mapping. The noteworthy use cases of ACD are in mapping into standard cells [4] and field-programmable gate arrays (FPGAs) [5], decomposition of multi-valued relations [6], and encoding of multi-valued networks [7].

Traditional applications rely on the original formulation of ACD [1, 2, 3], breaking the input variables into two groups: the bound set (BS) and the free set (FS). Other approaches to ACD [5] allow for a shared set (SS) when one or more LUTs in terms of the BS variables are single-variable functions (buffers). The larger the SS size, the fewer LUTs are required. For instance, Figure 1 shows an ACD of a function with BS, FS, and SS resulting in three 5-input LUTs. In [5], maximizing the SS is implemented using binary decision diagrams (BDDs) [8]. More recently, truth-table-based implementations eliminated the need for explicitly constructing a BDD, resulting in a faster decomposition [9, 10].

ACD has been applied to map into fixed lookup table (LUT) structures [10] as a way to mitigate structural bias and improve the quality of standard LUT mapping. This approach utilizes heuristic variable re-ordering to find an ACD, supporting up to 1 SS variable. Additionally, ACD has been used in post-mapping resynthesis [9], when logic cones composed of several LUTs are collapsed into single-output Boolean functions and re-expressed using fewer LUTs. The authors proposed to use disjoint-support decomposition (DSD) and Shannon’s expansion to pack logic into LUTs while supporting up to 3 SS variables.

[Figure 1 content: three LUTs $L_1$, $L_2$, $L_3$ over the inputs $x_0, \dots, x_7$, which are grouped into bound set, shared set, and free set.]
Figure 1: ACD of an 8-input Boolean function into three 5-input LUTs with a 5-variable bound set (BS), a 1-variable shared set (SS), and a 2-variable free set (FS).

Since ACD is often applied only to functions up to 11 or 16 inputs (for LUT structures composed of two or three 6-LUTs, respectively), state-of-the-art LUT mapping is performed through local substitutions applied to an initial graph representation, called the subject graph. Generally, delay-optimal mapping w.r.t. the subject graph is feasible in polynomial time [11], while area-optimal mapping is NP-hard [12]. However, the structure of the subject graph highly impacts the result. This phenomenon is known as structural bias. To mitigate structural bias, methods in the literature generate a set of structural choices (or decompositions) available during mapping [13, 14, 15].

This paper offers two main contributions. First, we revisit the formulation of ACD with SS to enhance its computational efficiency in LUT mappers and post-mapping resynthesis engines performing delay optimization. Based on the ideas presented in [16], our algorithm is truth-table-based and flexible in the number of FS, BS, and SS variables, and in the number of BS functions. Our ACD runs up to 2x faster than [10] and up to 80x faster than [9] when performing decompositions into two 6-LUTs. Furthermore, it also finds considerably more solutions.

Second, we use ACD for the delay optimization of LUT networks. The idea is to compute functional decompositions using the timing-critical variables in the FS and the rest of the variables in the BS and SS. We integrate our ACD into the state-of-the-art LUT mapper for delay optimization. To our knowledge, this is the first practical and scalable work that uses ACD for delay-driven LUT mapping.

We experimentally evaluate the performance of ACD and compare mapping based on Boolean decomposition against state-of-the-art methods:

  1. We compare our ACD method against other decomposition methods in ABC, showing better quality with a competitive or better run time.

  2. We demonstrate that mapping with ACD can efficiently mitigate structural bias and considerably reduce the delay. We compare the default LUT mapper in ABC, the LUT mapper with Boolean decomposition in ABC, and the proposed mapper with integrated ACD. We show that mapping with ACD outperforms the other mappers in delay by 7.52% on average with and without structural choices [15]. Moreover, we show that an additional mapping round using the network obtained by ACD as a structural choice can further improve the delay, compared to the standard LUT mapper, by 12.39% with an area reduction of 2.20%.

  3. We present 4 new best results in the EPFL competition.

II Preliminaries

This section introduces the basic notations and background related to logic networks, decomposition, and LUT mapping.

II-A Definitions

A Boolean function is a mapping from a $k$-dimensional Boolean space into a 1-dimensional one: $\{0,1\}^k \rightarrow \{0,1\}$.

A truth table representation of a $k$-input Boolean function $f:\{0,1\}^k \rightarrow \{0,1\}$ can be encoded as a bit string $b = b_{l-1} \dots b_0$, i.e., a sequence of bits, of length $l = 2^k$. A bit $b_i \in \{0,1\}$ at position $0 \leq i < l$ is equal to the value taken by $f$ under the input assignment $\vec{a} = (a_0, \dots, a_{k-1})$ where

$$2^{k-1} \cdot a_{k-1} + \dots + 2^{0} \cdot a_0 = i.$$
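The encoding above can be illustrated with a minimal Python sketch that stores the bit string as an integer, with bit $i$ holding the value of $f$ under the assignment whose weighted sum is $i$. The helper name `truth_table` and the majority example are assumptions made for illustration only.

```python
def truth_table(f, k):
    """Pack a k-input Boolean function into an integer; bit i is f(a_0, ..., a_{k-1})."""
    tt = 0
    for i in range(1 << k):
        # Decode index i into the assignment (a_0, ..., a_{k-1}); a_0 is the least significant bit.
        a = [(i >> j) & 1 for j in range(k)]
        if f(*a):
            tt |= 1 << i
    return tt

# Illustrative function: 3-input majority.
def maj(a0, a1, a2):
    return (a0 + a1 + a2) >= 2

tt_maj = truth_table(maj, 3)
print(format(tt_maj, "08b"))  # bit string b_7 ... b_0
```

Storing the table as an integer keeps the later operations (cofactor extraction, variable swaps) as plain shifts and masks.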

The positive cofactor of a Boolean function $f$ with respect to a variable $x_i$, represented as $f_{x_i}$, is the Boolean function obtained by setting $x_i = 1$. Similarly, the negative cofactor $f_{\bar{x}_i}$ is the Boolean function obtained by setting $x_i = 0$.

In the classical representation, we refer to the leftmost input column of a truth table as the most significant variable ($a_{k-1}$) and the rightmost input column as the least significant variable ($a_0$). A swap of two variables results in the interchange of the corresponding two-variable cofactors, thereby altering the truth table.

Figure 2 depicts two truth tables represented as bit strings, one in binary and one in hexadecimal. Notably, the rightmost truth table can be derived from the leftmost one by swapping the variables $x_0$ and $x_2$. Marked next to both truth tables are the cofactors with respect to the two most significant variables.

[Figure 2 content: on the left, the truth table of a 3-input function over ($x_2$, $x_1$, $x_0$), encoded as $f = 10110101$, with the cofactors $f_{\bar{x}_1\bar{x}_2}$, $f_{x_1\bar{x}_2}$, $f_{\bar{x}_1 x_2}$, and $f_{x_1 x_2}$ marked; on the right, the truth table after the swap $x_0 \leftrightarrow x_2$, over ($x_0$, $x_1$, $x_2$), encoded as $f =$ 0xA7, with the cofactors $f_{\bar{x}_0\bar{x}_1}$, $f_{\bar{x}_0 x_1}$, $f_{x_0\bar{x}_1}$, and $f_{x_0 x_1}$ marked.]
Figure 2: Truth table representations and their encoding, cofactor extraction w.r.t. the two most significant variables, and variable swapping of $x_0$ with $x_2$.
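The variable swap of Figure 2 can be reproduced with a short Python sketch. The function name `swap_vars` is hypothetical, and the bit-by-bit loop is chosen for clarity; it is not meant to reflect how an optimized implementation would perform the swap.

```python
def swap_vars(tt, k, i, j):
    """Return the truth table of a k-input function with variables x_i and x_j exchanged."""
    out = 0
    for idx in range(1 << k):
        bi, bj = (idx >> i) & 1, (idx >> j) & 1
        # Source index: the same assignment with the values of x_i and x_j exchanged.
        src = (idx & ~((1 << i) | (1 << j))) | (bj << i) | (bi << j)
        out |= ((tt >> src) & 1) << idx
    return out

f = 0b10110101                   # left truth table of Figure 2
f_swapped = swap_vars(f, 3, 0, 2)
print(hex(f_swapped))

# Cofactors w.r.t. the two most significant variables are consecutive 2-bit groups.
cofactors = [(f >> (2 * c)) & 0b11 for c in range(4)]
```

Applying the same swap twice restores the original table, which is a convenient sanity check.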

A completely specified Boolean function $f$ essentially depends on a variable $v$ if there exists an input combination such that the value of the function changes when the variable is toggled ($\frac{\partial f}{\partial v} = 1$). The support of $f$ is the set of all variables on which function $f$ essentially depends. The supports of two functions are disjoint if they do not contain common variables. A set of functions is disjoint if their supports are pair-wise disjoint.
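As a small illustration of essential dependence, the following Python sketch computes the support of a function given as an integer truth table (bit $i$ is the value under assignment $i$). The names `depends_on` and `support` are assumptions for this example.

```python
def depends_on(tt, k, v):
    """True if toggling x_v changes the function value for at least one assignment."""
    for idx in range(1 << k):
        if not (idx >> v) & 1:
            lo = (tt >> idx) & 1                 # value with x_v = 0
            hi = (tt >> (idx | (1 << v))) & 1    # value with x_v = 1
            if lo != hi:
                return True
    return False

def support(tt, k):
    """List the variable indices on which the function essentially depends."""
    return [v for v in range(k) if depends_on(tt, k, v)]

# f = x0 AND x2 expressed over three variables: x1 is not in the support.
sup_and = support(0xA0, 3)
sup_fig2 = support(0b10110101, 3)
```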

A Boolean network is modeled as a directed acyclic graph (DAG) with nodes represented by Boolean functions. The sources of the graph are the primary inputs (PIs), and the sinks are the primary outputs (POs). For any node $n$, the fanins of $n$ are the nodes driving $n$, i.e., nodes that have an outgoing edge towards $n$. Similarly, the fanouts of $n$ are the nodes driven by $n$, i.e., nodes that have an incoming edge from $n$. A $k$-LUT network is a Boolean network composed of $k$-input lookup tables ($k$-LUTs) capable of realizing any $k$-input Boolean function. An and-inverter graph (AIG) [17] is a Boolean network where nodes are 2-input ANDs and edges may implement inverters.

A cut $C$ of a Boolean network is a pair ($n$, $\mathcal{K}$), where $n$ is a node called root, and $\mathcal{K}$ is a set of nodes, called leaves, such that 1) every path from any PI to node $n$ passes through at least one leaf and 2) for each leaf $v \in \mathcal{K}$, there is at least one path from a PI to $n$ passing through $v$ and not through another leaf. The size of a cut is the number of leaves. A cut is $k$-feasible if its size does not exceed $k$.

II-B Ashenhurst-Curtis decomposition

Ashenhurst-Curtis decomposition (ACD) [1, 2, 3] of a single-output Boolean function $f$ can be expressed as follows:

$$f(\vec{x}_{bs}, \vec{x}_{ss}, \vec{x}_{fs}) = g(\vec{h}(\vec{x}_{bs}, \vec{x}_{ss}), \vec{x}_{ss}, \vec{x}_{fs}), \qquad (1)$$

where $\vec{x}_{bs}$ is the bound set (BS), $\vec{x}_{ss}$ is the shared set (SS), and $\vec{x}_{fs}$ is the free set (FS). These sets are disjoint variable subsets, which together form the support of $f$. The function $\vec{h}$ may be multi-output with the number of outputs less than the BS size. The single-output functions in $\vec{h}$ are referred to as BS functions. The function $g$ is referred to as the composition function. When decomposing into $k$-LUTs, the composition function is typically chosen to fit into one $k$-input LUT. Figure 1 shows an ACD of an 8-input function into three 5-input LUTs with a 5-variable BS, a 1-variable SS, and a 2-variable FS. The decomposition generates two BS functions ($L_2$, $L_3$) and a composition function ($L_1$) with 5 inputs.

II-C FPGA technology mapping

LUT mapping is the process of expressing a Boolean network in terms of $k$-input lookup tables ($k$-LUTs). Before mapping, the network is represented as a $k$-bounded Boolean network called the subject graph, which contains nodes with a maximum fanin size of $k$. The AIG is the most common subject graph representation. The subject graph is transformed into a mapped network by applying local substitutions to sections of the circuit defined by cuts, which are computed using cut enumeration [18]. A LUT mapper computes a mapping solution by selecting a subset of the cuts that cover the subject graph while minimizing a cost function. The state-of-the-art LUT mapper computes cuts and refines the mapping solution in several mapping passes using heuristics based on delay, area, and edge count. For further details, refer to [19].

III Improvements to ACD

This section discusses a fast and versatile truth-table-based implementation of ACD for single-output functions with support for a shared set. We propose several novelties that make ACD practical within LUT mappers and resynthesis methods. Figure 3 illustrates the ACD computation. The BS, SS, FS, and the number of BS functions used are flexible and determined during the decomposition. The composition function ($L_1$) is implemented as a multiplexer of cofactors with respect to BS functions and the shared set. Functions dependent on the FS ($g_i$), called FS functions, are the data inputs of the multiplexer found inside the composition function. BS functions and the shared set are instead the selection inputs.

This definition of decomposition reflects the one used by previous approaches [5]. Specifically, the decomposition is generic, i.e., it includes other types of decomposition. For instance, a Shannon’s expansion:

$$f = x f_x + \bar{x} f_{\bar{x}},$$

where $x$ is a selector of a multiplexer, can be re-expressed in our ACD format:

$$f = f_x f_{\bar{x}} \cdot 1 + f_x \bar{f}_{\bar{x}} \cdot x + \bar{f}_x f_{\bar{x}} \cdot \bar{x} + \bar{f}_x \bar{f}_{\bar{x}} \cdot 0,$$

where $x$ is an FS variable, $f_x$ and $f_{\bar{x}}$ are BS functions, and the FS functions $g_i$ are $1$, $x$, $\bar{x}$, and $0$.
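The equivalence between Shannon's expansion and its ACD-format rewriting can be checked exhaustively. The sketch below treats the values of the two cofactors as free Boolean inputs and compares both forms on all eight combinations; the function names are illustrative assumptions.

```python
from itertools import product

def shannon(x, fx, fnx):
    """Shannon's expansion: x selects between the positive and negative cofactor values."""
    return (x & fx) | ((1 - x) & fnx)

def acd_form(x, fx, fnx):
    """ACD format: the cofactor values (BS functions) select one FS function of x."""
    fs_functions = {(1, 1): 1, (1, 0): x, (0, 1): 1 - x, (0, 0): 0}
    return fs_functions[(fx, fnx)]

forms_agree = all(
    shannon(x, fx, fnx) == acd_form(x, fx, fnx)
    for x, fx, fnx in product((0, 1), repeat=3)
)
print(forms_agree)
```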

In this section, we first present how to efficiently check the existence of a feasible ACD and assign variables to the FS, BS, and SS (Section III-A). Next, we show how to compute the decomposition while minimizing the number of BS functions and their support (Section III-B).

[Figure 3 content: the function $f$ is realized by the composition function $L_1$, a multiplexer whose data inputs $g_0$, $g_1$, $g_2$, $g_3$ (labeled 00, 01, 10, 11) depend on the free set and whose select inputs come from $L_2$, driven by the bound set, and from the shared set.]
Figure 3: Illustrating the AC decomposition of a Boolean function

III-A Finding a feasible decomposition

After defining the properties of ACD, in this section we present an efficient method to check the existence of a Boolean decomposition and find an assignment of support variables to the FS and the BS (and SS). In particular, we focus on decomposition into two levels of $k$-input LUTs. For simplicity, in this section we consider SS variables a part of the BS.

The first step to derive a decomposition is to partition the variables into FS and BS. Given a truth table, our approach enumerates different free sets. Let $N$ be the number of variables in the support of a function to decompose. Let $P$ be the number of variables to consider in the FS. The remaining $N-P$ variables are considered in the BS. The number of different free sets is $\binom{N}{P}$. Regarding the choice of value $P$ when searching for a feasible two-level decomposition, for an $N$-input function and $k$-input LUTs, it is convenient to consider $N-k$ variables in the FS, so that at most $k$ variables are considered in the BS. For instance, when $N=8$ and $k=6$, we can choose $P=2$ and evaluate $8 \cdot 7 / 2 = 28$ different 2-variable free sets.
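The count in this example can be reproduced by enumerating the free sets directly. This is only a sketch: `itertools.combinations` yields the sets in lexicographic order, which is an assumption of the example and not necessarily the enumeration order used by the method.

```python
from itertools import combinations
from math import comb

N, k = 8, 6
P = N - k                                    # free-set size, so at most k variables stay in the BS
free_sets = list(combinations(range(N), P))  # each tuple lists the FS variable indices

print(len(free_sets), free_sets[:3])
```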

For each FS, the truth table is transformed to have the FS variables as the least significant ones, compared to the BS variables. The variable reordering is performed using a dedicated procedure, which swaps two variables. Note that when enumerating all the free sets, the first FS, composed of the $P$ least significant variables in the support of the function, does not need variable swapping since the original truth table already reflects this order. Then, every consecutive FS can be derived from a previous FS by swapping one variable in $x_{fs}$ with one in $x_{bs}$. Exploring all the free sets therefore takes $\binom{N}{P}$ swap operations. Figure 2 shows how a variable swap affects the truth table.

Each input assignment to the BS variables selects one $P$-input function in terms of the FS variables. Specifically, each $P$-input function is a cofactor with respect to $x_{bs}$. From a truth table in this format, FS functions are easily computed by extracting groups of $2^P$ bits at $i \cdot 2^P$ offsets with $i \in [0, 2^{(N-P)})$. Informally, FS functions are listed next to each other. Figure 2 graphically depicts the extraction of cofactors with respect to the two most significant variables.

Example 1: Let us consider the 6-variable function represented in hexadecimal format as a truth table $f =$ 0x8804800184148111. Let us assume that the FS variables are the two least significant variables and the BS variables are the four most significant ones. The functions in terms of FS variables have truth tables with $2^P = 2^2 = 4$ bits. There are $2^{(N-P)} = 16$ of them, corresponding to hexadecimal digits in the truth table (0x8, 0x8, 0x0, 0x4, etc.). $\triangle$
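Example 1 can be sketched in a few lines of Python, assuming the truth table is held as an integer and the FS variables are already the least significant ones. The helper name `fs_functions` is hypothetical.

```python
def fs_functions(tt, n, p):
    """Split an n-variable truth table into 2^(n-p) cofactors of 2^p bits each.

    Entry i is the FS function selected by BS assignment i (offset i * 2^p).
    """
    width = 1 << p
    mask = (1 << width) - 1
    return [(tt >> (i * width)) & mask for i in range(1 << (n - p))]

f = 0x8804800184148111
cols = fs_functions(f, 6, 2)

# Read from the most significant end, the cofactors are the hex digits of f.
digits = "".join(format(c, "x") for c in reversed(cols))
print(digits)
```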

The target function can be realized using $M$ bound set functions if the number of unique FS functions, known as column multiplicity $\mu$, does not exceed $2^M$, hence $M \geq \lceil \log_2(\mu) \rceil$. If $P + M \leq k$, the composition function can be implemented as a $k$-LUT.

Example 2: Continuing Example 1, there are 16 FS functions of which only 4 are unique. The FS functions are 0x8, 0x0, 0x4, and 0x1. Hence, the column multiplicity $\mu = 4$, which needs at least $M = \lceil \log_2(4) \rceil = 2$ BS functions. Hence, this partition of variables into FS and BS produces a feasible support-reducing decomposition into 4-input LUTs. Using Figure 3, ACD assigns FS functions to $g_i$. Then, two BS functions of at most 4 inputs are necessary to select the correct FS function. $\triangle$
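The column multiplicity of Example 2 can be computed by counting distinct cofactors. In this sketch, `column_multiplicity` and `min_bs_functions` are illustrative names, and $\lceil \log_2(\mu) \rceil$ is evaluated with integer arithmetic to avoid floating-point rounding.

```python
def column_multiplicity(tt, n, p):
    """Number of distinct 2^p-bit cofactors when the p FS variables are least significant."""
    width = 1 << p
    mask = (1 << width) - 1
    return len({(tt >> (i * width)) & mask for i in range(1 << (n - p))})

def min_bs_functions(mu):
    """Smallest M with 2^M >= mu, i.e. ceil(log2(mu))."""
    return (mu - 1).bit_length()

f = 0x8804800184148111
mu = column_multiplicity(f, 6, 2)
M = min_bs_functions(mu)
print(mu, M)
```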

We employ the enumeration of free sets while counting the number of unique cofactors to check if a support-reducing decomposition exists. In practice, a sufficient condition for a 2-level decomposition is to have $M + P \leq k$ and $N - P \leq k$, i.e., the composition function is $k$-feasible, and the number of remaining variables in the BS does not exceed $k$.
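The feasibility check can be sketched as a loop over free sets. This is a simplified illustration under stated assumptions: it takes $P = N - k$ with $N > k$, and it recomputes a full variable permutation for every free set instead of deriving each free set from the previous one with a single swap. The names `move_to_lsb` and `find_feasible` are hypothetical.

```python
from itertools import combinations

def move_to_lsb(tt, n, fs):
    """Reorder variables so that those in fs occupy the least significant positions."""
    order = list(fs) + [v for v in range(n) if v not in fs]  # new position -> old variable
    out = 0
    for idx in range(1 << n):
        src = 0
        for new_pos, old_var in enumerate(order):
            src |= ((idx >> new_pos) & 1) << old_var
        out |= ((tt >> src) & 1) << idx
    return out

def find_feasible(tt, n, k):
    """Return (free set, multiplicity, M) for the first free set meeting both conditions."""
    p = n - k
    width = 1 << p
    mask = (1 << width) - 1
    for fs in combinations(range(n), p):
        t = move_to_lsb(tt, n, fs)
        mu = len({(t >> (i * width)) & mask for i in range(1 << (n - p))})
        m = (mu - 1).bit_length()            # ceil(log2(mu))
        if m + p <= k and n - p <= k:
            return fs, mu, m
    return None

result = find_feasible(0x8804800184148111, 6, 4)
print(result)
```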

After identifying a partition of variables into FS and BS, and the corresponding unique FS functions, our method uses the techniques in Section III-B to produce a decomposition while minimizing the number of BS functions and their support.
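To make the structure of the decomposition concrete before encodings are discussed, the sketch below builds a decomposition with a naive encoding: unique FS functions are simply numbered in sorted order. This is an assumption made for illustration and is not the support-minimizing encoding of Section III-B; it ignores the shared set, and the names `naive_acd` and `compose` are hypothetical.

```python
def naive_acd(tt, n, p):
    """Return (unique FS functions, BS function truth tables) under a naive binary encoding."""
    width = 1 << p
    mask = (1 << width) - 1
    cols = [(tt >> (i * width)) & mask for i in range(1 << (n - p))]
    uniq = sorted(set(cols))                      # arbitrary but deterministic code order
    code = {c: j for j, c in enumerate(uniq)}
    m = (len(uniq) - 1).bit_length()
    # h[b] is the truth table, over the n-p BS variables, of bit b of the code.
    h = [sum(((code[c] >> b) & 1) << i for i, c in enumerate(cols)) for b in range(m)]
    return uniq, h

def compose(uniq, h, n, p):
    """Rebuild f: for each BS assignment, the BS function outputs select one FS function."""
    width = 1 << p
    out = 0
    for i in range(1 << (n - p)):
        sel = sum(((hb >> i) & 1) << b for b, hb in enumerate(h))
        out |= uniq[sel] << (i * width)
    return out

f = 0x8804800184148111
uniq, h = naive_acd(f, 6, 2)
rebuilt = compose(uniq, h, 6, 2)
print(len(h), rebuilt == f)
```

Any encoding that assigns distinct codes to distinct FS functions reproduces $f$; the choice of encoding only affects how many BS functions are needed and how large their supports are.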

III-B Functional encoding and support minimization

Once a partition of variables into FS and BS with a feasible decomposition is found, the BS functions are extracted by assigning each FS function to an encoding. Informally, an encoding represents the assignment of FS functions to the data inputs of the MUX of Figure 3 (e.g., the encoding of $g_1$ is 01). While any encoding that distinguishes FS functions is a valid solution, a good encoding also minimizes the number of BS functions required (by maximizing the shared set) and the functional support. In particular, it is crucial to find an encoding that minimizes the support for three reasons. First, if $N - P > k$, by minimizing the support, each BS function would ideally fit into a $k$-LUT, and the decomposition is feasible in 2 levels. Second, minimizing the support maximizes the shared set (buffer BS functions), reducing the number of required LUTs. Third, the number of edges required is reduced, helping routability. Finding a feasible encoding is similar to solving constrained encoding problems [20, 21, 22].

An encoding is an assignment of a code $T = t_{M-1} \dots t_0$ of length $M$ to each FS function. A variable $t_i$ takes one of the three values, $1$, $0$, or $-$, indicating the ON-set, OFF-set, and DC-set, respectively. Let i-sets be the set of $\mu$ Boolean functions in terms of the BS variables encoding FS functions using one-hot encoding. Precisely, an i-set represents one FS function and takes value $1$ when an input assignment to the BS variables results in the corresponding FS function.

Example 3: Using Example 2, the i-set corresponding to the FS function 0x8 is 1100100010001000 in binary format. Note that the truth table has $N - P$ variables and has value $1$ for every BS assignment whose cofactor of the original function is 0x8. $\triangle$
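The i-sets of Example 3 can be derived with one pass over the cofactors. In this sketch the function name `i_sets` and the dictionary representation (FS function mapped to a bit mask over BS assignments) are assumptions for illustration.

```python
def i_sets(tt, n, p):
    """Map each unique FS function to a mask over the 2^(n-p) BS assignments that select it."""
    width = 1 << p
    mask = (1 << width) - 1
    sets = {}
    for i in range(1 << (n - p)):
        col = (tt >> (i * width)) & mask
        sets[col] = sets.get(col, 0) | (1 << i)
    return sets

isets = i_sets(0x8804800184148111, 6, 2)
print(format(isets[0x8], "016b"))
```

Because every BS assignment selects exactly one FS function, the i-sets are pairwise disjoint and together cover all assignments.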

I-sets are used to derive a more compact encoding with a two-step procedure. The first one enumerates candidate BS functions. The second one solves a unate covering problem in which columns are candidate BS functions and rows are pairs of FS functions to be distinguished.
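The covering step can be illustrated with a brute-force sketch. It is an assumption-laden simplification: candidates are given as (ON-set, OFF-set) pairs of FS-function indices, the only objective is the number of selected candidates (support is ignored), and exhaustive search over subsets stands in for whatever covering solver is actually used. All names are hypothetical.

```python
from itertools import combinations

def distinguishes(cand, pair):
    """A candidate covers a pair if one element is in its ON-set and the other in its OFF-set."""
    on, off = cand
    a, b = pair
    return (a in on and b in off) or (a in off and b in on)

def min_cover(cands, mu):
    """Smallest subset of candidates (columns) covering every pair of FS functions (rows)."""
    rows = list(combinations(range(mu), 2))
    for size in range(1, len(cands) + 1):
        for subset in combinations(cands, size):
            if all(any(distinguishes(c, r) for c in subset) for r in rows):
                return list(subset)
    return None

# Four FS functions and the three balanced candidates that split them two against two.
cands = [({0, 1}, {2, 3}), ({0, 2}, {1, 3}), ({0, 3}, {1, 2})]
cover = min_cover(cands, 4)
print(cover)
```

With four FS functions, no single candidate separates all six pairs, so the smallest cover has two candidates, matching the two BS functions of Example 2.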

Candidate BS functions are functions depending on BS variables whose output can be used as $t_i$ to encode FS functions. They are enumerated by combining i-sets. To leverage all the functional degrees of freedom of a strict encoding, i-sets in a BS candidate can be either in the ON-set, OFF-set, or don’t-care (DC) set. Since BS candidates are used as select inputs of a multiplexer, a BS candidate distinguishes the elements in its ON-set (where it takes value $1$) from the elements in its OFF-set (where it takes value $0$). In encoding problems, BS functions are called dichotomies, while the pairs of functions to be distinguished are referred to as seed dichotomies [22]. Don’t-cares in BS candidates are also important to minimize the support, which translates into fewer LUT edges.

Example 4: Continuing Example 3, let us consider the candidate bound set function $h$ that has the i-sets {0x8, 0x1} in the ON-set and the i-set {0x4} in the OFF-set. Its function in binary format is $h =$ 11-01--110101111, where “-” is a don’t care. When $h = 1$, either 0x8 or 0x1 is selected. When $h = 0$, 0x4 is selected. The corresponding dichotomy is {{0x8, 0x1},{0x4}}. In this case, function $h$ distinguishes 0x8 from 0x4 and 0x1 from 0x4, covering the two seed dichotomies {{0x8},{0x4}} (or {{0x4},{0x8}}) and {{0x1},{0x4}} (or {{0x4},{0x1}}). $\triangle$
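The candidate of Example 4 can be rebuilt from the i-sets of the running example. The sketch recomputes the i-sets from the truth table so that it is self-contained; the helper names are assumptions for illustration.

```python
def i_sets(tt, n, p):
    """Map each unique FS function to a mask over the BS assignments that select it."""
    width = 1 << p
    mask = (1 << width) - 1
    sets = {}
    for i in range(1 << (n - p)):
        col = (tt >> (i * width)) & mask
        sets[col] = sets.get(col, 0) | (1 << i)
    return sets

def candidate_string(isets, on, off, nbits):
    """Render a candidate BS function, most significant assignment first ('-' is a don't care)."""
    on_mask = sum(isets[c] for c in on)     # i-sets are disjoint, so the sum is a bitwise OR
    off_mask = sum(isets[c] for c in off)
    return "".join(
        "1" if (on_mask >> i) & 1 else "0" if (off_mask >> i) & 1 else "-"
        for i in reversed(range(nbits))
    )

isets = i_sets(0x8804800184148111, 6, 2)
h = candidate_string(isets, on=[0x8, 0x1], off=[0x4], nbits=16)
seeds = {(a, b) for a in [0x8, 0x1] for b in [0x4]}   # seed dichotomies covered by h
print(h)
```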

A candidate bound set function is generated by assigning each i-set to be in the ON-set, OFF-set, or DC-set. Hence, the total number of possible BS candidates is $3^{\mu}$. Nonetheless, some BS candidates are interchangeable, i.e., one candidate can be obtained by swapping the ON-set and the OFF-set of another BS candidate. Our enumeration removes these symmetries by fixing one i-set to be only in the ON-set or DC-set, enumerating only $2 \cdot 3^{\mu-1}$ BS candidates. Moreover, candidates not distinguishing any pair of FS functions are removed. As a special case, if $\mu$ is a power of $2$, the number of possible BS candidates reduces to $\binom{\mu}{\mu/2}/2$ by splitting the FS functions to be equally distributed between ON-set and OFF-set, i.e., each BS candidate must distinguish half of the FS functions from the other half.
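A minimal sketch of this enumeration is given below, assuming FS functions are identified by their hexadecimal values and a candidate is represented as an (ON, OFF) pair of sets. The function names are hypothetical; the sketch only illustrates the counting argument.

```python
from itertools import combinations, product
from math import comb


def enumerate_candidates(fs_funcs):
    """All (ON, OFF) splits with the first i-set restricted to ON or DC."""
    rest = fs_funcs[1:]
    candidates = []
    for first_val in "1-":                       # symmetry breaking
        for vals in product("10-", repeat=len(rest)):
            assign = dict(zip(fs_funcs, (first_val,) + vals))
            on = frozenset(g for g, v in assign.items() if v == "1")
            off = frozenset(g for g, v in assign.items() if v == "0")
            candidates.append((on, off))
    return candidates


def useful(candidates):
    """Drop candidates that distinguish no pair of FS functions."""
    return [(on, off) for on, off in candidates if on and off]


def balanced_candidates(fs_funcs):
    """Special case: half of the FS functions in ON, the other half in OFF."""
    first, rest = fs_funcs[0], fs_funcs[1:]
    half = len(fs_funcs) // 2
    out = []
    for extra in combinations(rest, half - 1):   # first i-set is always ON
        on = frozenset((first,) + extra)
        out.append((on, frozenset(fs_funcs) - on))
    return out


fs_funcs = [0x8, 0x4, 0x0, 0x1]
all_cands = enumerate_candidates(fs_funcs)
```

For the four FS functions of the running example, the balanced enumeration yields three candidates, which agrees with the three columns of Figure 4.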

One limitation of this method is that the number of BS candidates grows exponentially with the column multiplicity. However, we may further reduce the number of BS candidates when it is too large. In particular, for an ACD into 6-LUTs, the maximum column multiplicity to support is 16. Consequently, the highest number of BS candidates is 9.5 million for $\mu = 15$. To maintain a reasonable number of BS candidates, our method does not use don’t cares for problems with $\mu > 8$, enumerating $2^{\mu-1}$ candidates and reducing the highest number of candidates to 16 thousand. Through experimentation, we have observed that imposing this limitation scarcely affects the quality of the encoding, while substantially enhancing run-time efficiency. Conversely, extending this method to lower multiplicity values noticeably compromises the solution quality.
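As a quick check of the figures quoted above, the two counts for $\mu = 15$ can be worked out directly from the enumeration formulas:

$$2 \cdot 3^{\mu-1}\Big|_{\mu=15} = 2 \cdot 3^{14} = 2 \cdot 4{,}782{,}969 = 9{,}565{,}938 \approx 9.5 \times 10^{6}, \qquad 2^{\mu-1}\Big|_{\mu=15} = 2^{14} = 16{,}384 \approx 16 \times 10^{3}.$$

The worst case appears at $\mu = 15$ rather than $\mu = 16$ presumably because $16$ is a power of $2$ and therefore falls under the balanced special case.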

Each BS candidate function is associated with a cost that depends on the number of variables in its support. The number of variables is computed with a special procedure that considers don’t cares. Then, a covering table is constructed by having all the pairs of FS functions to be distinguished (seed dichotomies) as rows and the BS candidates as columns. A row-column entry $(i, j)$ is $1$ if the BS candidate of column $j$ distinguishes the seed dichotomy $i$. A solution that minimizes the support is computed by solving a minimum-cost covering problem [22]. The solution must cover all the rows while minimizing the cost. We use greedy covering followed by local search to compute a cost-minimizing cover. A single iteration of greedy covering extracts one column covering the most non-covered rows while minimizing the cost. The process is iterated until a solution is found. Then, the solution is iteratively improved by replacing one column with another having a lower cost.

Example 5: Figure 4 shows a covering table reflecting the examples in this section. Each column in the table is a candidate BS function shown as a truth table in hexadecimal format on 4 variables. Each BS candidate has a cost based on the number of variables in its support. Each row is a seed dichotomy. An element $(i, j)$ in the table is $1$ if $\text{BS}_j$ distinguishes the seed dichotomy $i$. The best solution, with cost 6, takes the second and third columns and results in two BS functions depending on 3 variables. $\triangle$

| Seed dichotomy | C9AF (cost 4) | 1177 (cost 3) | 2727 (cost 3) |
|---|---|---|---|
| {{0x8},{0x0}} | 1 | 0 | 1 |
| {{0x8},{0x4}} | 1 | 1 | 0 |
| {{0x8},{0x1}} | 0 | 1 | 1 |
| {{0x0},{0x4}} | 0 | 1 | 1 |
| {{0x0},{0x1}} | 1 | 1 | 0 |
| {{0x4},{0x1}} | 1 | 0 | 1 |
Figure 4: Covering table to solve the encoding problem.
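The greedy covering and local search steps can be sketched on the table of Figure 4. This is a simplified illustration, not the authors' code: rows are indexed 0 to 5 in the order of the figure, ties in greedy selection are assumed to be broken by lower cost, and all names are hypothetical.

```python
# Columns of Figure 4: cost and the set of row indices each one covers.
COST = {"C9AF": 4, "1177": 3, "2727": 3}
COVERS = {
    "C9AF": {0, 1, 4, 5},
    "1177": {1, 2, 3, 4},
    "2727": {0, 2, 3, 5},
}
NUM_ROWS = 6


def greedy_cover(covers, cost, num_rows):
    """Pick the column covering most uncovered rows, cheapest first on ties."""
    uncovered = set(range(num_rows))
    chosen = []
    while uncovered:
        remaining = [c for c in covers if c not in chosen]
        best = max(remaining,
                   key=lambda c: (len(covers[c] & uncovered), -cost[c]),
                   default=None)
        if best is None or not covers[best] & uncovered:
            raise ValueError("rows cannot be covered")
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


def local_search(chosen, covers, cost, num_rows):
    """Swap one column for a cheaper one while the cover stays complete."""
    chosen = list(chosen)
    all_rows = set(range(num_rows))
    improved = True
    while improved:
        improved = False
        for i, old in enumerate(chosen):
            for new in covers:
                if new in chosen or cost[new] >= cost[old]:
                    continue
                trial = chosen[:i] + [new] + chosen[i + 1:]
                if set().union(*(covers[c] for c in trial)) == all_rows:
                    chosen, improved = trial, True
                    break
            if improved:
                break
    return chosen


solution = local_search(greedy_cover(COVERS, COST, NUM_ROWS), COVERS, COST, NUM_ROWS)
total_cost = sum(COST[c] for c in solution)
```

On this table the greedy step already returns the second and third columns, so the local search has nothing left to improve.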

Given a solution, an encoding of the FS functions is obtained by assigning a code $T = t_{M-1} \dots t_0$, in which each variable $t_i$ corresponds to a selected $\text{BS}_i$ candidate.

Example 6: Continuing Example 5, a minimum cover involves $\text{BS}_0 =$ 0x1177, by taking 0x4 and 0x1 in the ON-set, and $\text{BS}_1 =$ 0x2727, by taking 0x0 and 0x1 in the ON-set. Given the BS functions, the encoding of the FS functions assigns the following codes to $g_i$ in Figure 3: $T_{\text{0x8}} =$ 00, $T_{\text{0x4}} =$ 01, $T_{\text{0x0}} =$ 10, and $T_{\text{0x1}} =$ 11. Finally, the composition function is computed using the FS and its encoding, resulting in function 0x1048 when represented in hexadecimal format. Consequently, the function has been successfully decomposed using three 4-LUTs. $\triangle$
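The last step of the example, deriving the codes and the composition function from the selected BS functions, can be sketched as follows. The i-set values are again an assumed reconstruction from Example 3 and Figure 4, the helper names are hypothetical, and the sketch assumes that the composition LUT takes the FS variables as its least significant inputs and $t_0$, $t_1$ as its most significant ones.

```python
ISETS = {0x8: 0xC888, 0x4: 0x1050, 0x0: 0x2600, 0x1: 0x0127}
BS_FUNCS = [0x1177, 0x2727]  # BS_0 and BS_1 from the minimum cover


def derive_codes(isets, bs_funcs):
    """Code bit i of an FS function is the value of BS_i on its i-set."""
    codes = {}
    for g, iset in isets.items():
        code = 0
        for i, bs in enumerate(bs_funcs):
            if iset & bs == iset:        # i-set entirely in the ON-set
                code |= 1 << i
            elif iset & bs:              # partially ON: not a valid encoding
                raise ValueError("BS function splits an i-set")
        codes[g] = code
    if len(set(codes.values())) != len(codes):
        raise ValueError("two FS functions share a code")
    return codes


def composition(codes, p):
    """Place each FS function at the block selected by its code."""
    block_bits = 1 << p
    comp = 0
    for g, code in codes.items():
        comp |= g << (code * block_bits)
    return comp


codes = derive_codes(ISETS, BS_FUNCS)
comp = composition(codes, p=2)
print(hex(comp))  # 0x1048
```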

IV Technology mapping with ACD

In this section, we leverage the Ashenhurst-Curtis decomposition (ACD) methods described in Section III to improve the delay of LUT networks. ACD can be used in two ways: 1) as part of LUT mapping or 2) as a post-mapping resynthesis method to compact logic and decrease the delay. In this work, we focus on the former usage since it has more flexibility and optimization opportunities. Although post-mapping resynthesis is not covered in this work, its implementation would follow a methodology similar to [9]. First, this section discusses how to perform delay-oriented functional decomposition for any number of FS variables and BS functions. Then, it describes the integration of ACD in a technology mapper.

IV-A Delay-oriented ACD

Let us consider a node $n$ in a $k$-LUT network and a cut $C$ rooted in $n$ that contains leaves in the input sub-network of $n$. Among all the leaves, some are timing-critical and some are not. Let $D$ be the latest arrival delay of a leaf in $C$. We use ACD to find an implementation that realizes the function of cut $C$ with delay $D+1$ where $|C| > k$, assuming a unit-delay model. Specifically, we place the timing-critical leaves of $C$ in the FS and the non-critical ones in the BS or SS. This transformation may reduce the worst delay of a LUT network when applied on the critical path.

The ACD-based transformation is performed in two steps. First, our method verifies the existence of a delay-minimizing decomposition. Second, if a decomposition exists, it solves the encoding problem and returns a solution.

IV-A1 Checking the existence of a decomposition

Algorithm 1 shows the procedure evaluate, which checks the existence of an ACD. The algorithm receives the function, represented as a truth table $tt$, of a large cut of size $N$, where $N > k$. Set $S$ contains the timing-critical variables with delay $D$. First, the truth table is transformed to have the critical variables as the least significant ones, since they must be in the FS (call to reorder_variables). The proposed approach limits $N - P \leq k$ to ensure a two-level decomposition without solving the encoding problem. Hence, the number of variables in the FS must satisfy $P \geq N - k$, and $P \geq |S|$ to include all the delay-critical variables (lower bound of the for loop). For each FS of $P_i$ variables, the column multiplicity is computed using the method described in Section III-A, and the smallest one is returned (call to compute_smallest_multiplicity). In this case, since delay-critical variables are always part of the FS, $\binom{N}{P_i - |S|}$ different combinations are enumerated. If the smallest multiplicity found can be implemented using at most $k - P_i$ BS functions, i.e., $\mu \leq 2^{k - P_i}$, a delay-minimizing ACD exists. In this case, variables in the FS have a delay increase of $1$, while the other variables have a delay increase of $2$ (call to compute_propagation_delay). If, on the other hand, a decomposition with $P_i$ does not exist, the function is not decomposable.

The for loop begins by checking the existence of a decomposition with the smallest admissible value of $P$. This approach is based on the theoretical property that if a function is not decomposable for a given value of $P$, it is also not decomposable for $P+1$. Then, if a decomposition exists, the loop attempts to increase the number of variables in the free set. Specifically, maximizing the free set to include non-critical variables has multiple benefits. Primarily, the decomposition would have a reduced column multiplicity, which simplifies the encoding problem. Additionally, maximizing the free set may increase the required time of the associated non-critical signals, facilitating the area-recovery process of technology mapping.

Input: truth table $tt$, LUT size $k$, late-variable set $S$
Output: propagation delay

reorder_variables($tt$, $S$);
$\mu_{best} \leftarrow \infty$;
$x_{fs} \leftarrow \emptyset$;
for $P_i \leftarrow \max(\text{num\_vars}(tt) - k, |S|)$ to $k - 1$ do
  $\{\mu, x'_{fs}\} \leftarrow$ compute_smallest_multiplicity($tt$, $P_i$, $|S|$);
  if $\mu \leq 2^{k - P_i}$ and $\mu < \mu_{best}$ then
    $\mu_{best} \leftarrow \mu$;
    $x_{fs} \leftarrow x'_{fs}$;
    continue;
  break;
if $\mu_{best} \neq \infty$ then
  return compute_propagation_delay($tt$, $x_{fs}$);
return infinite_propagation_delay();
Algorithm 1 ACD evaluation
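A runnable Python sketch of the evaluation procedure is shown below. It is a simplified illustration under several assumptions: truth tables are stored as integers, the column multiplicity is computed by brute-force counting of distinct columns rather than by the method of Section III-A, variable reordering is replaced by explicit variable index lists, and the result is returned as a per-variable delay increase (1 for FS variables, 2 otherwise) or `None` when no decomposition exists.

```python
from itertools import combinations


def column_multiplicity(tt, num_vars, fs_vars):
    """Count distinct FS functions (columns) for the given free set."""
    bs_vars = [v for v in range(num_vars) if v not in fs_vars]
    columns = set()
    for bs_assign in range(1 << len(bs_vars)):
        base = sum(1 << v for i, v in enumerate(bs_vars) if (bs_assign >> i) & 1)
        col = 0
        for fs_assign in range(1 << len(fs_vars)):
            m = base | sum(1 << v for i, v in enumerate(fs_vars)
                           if (fs_assign >> i) & 1)
            col |= ((tt >> m) & 1) << fs_assign
        columns.add(col)
    return len(columns)


def smallest_multiplicity(tt, num_vars, p, late):
    """Try every free set of size p that contains all late variables."""
    others = [v for v in range(num_vars) if v not in late]
    best = None
    for extra in combinations(others, p - len(late)):
        fs = tuple(late) + extra
        mu = column_multiplicity(tt, num_vars, fs)
        if best is None or mu < best[0]:
            best = (mu, fs)
    return best


def evaluate(tt, num_vars, k, late):
    """Follow the loop of Algorithm 1; return delay increases or None."""
    mu_best, fs_best = None, None
    for p in range(max(num_vars - k, len(late)), k):
        mu, fs = smallest_multiplicity(tt, num_vars, p, late)
        if mu <= (1 << (k - p)) and (mu_best is None or mu < mu_best):
            mu_best, fs_best = mu, fs
            continue
        break
    if mu_best is None:
        return None
    return {v: (1 if v in fs_best else 2) for v in range(num_vars)}


# f = (a AND b) XOR (c AND d) with a = variable 0, ..., d = variable 3.
tt = 0x7888
print(evaluate(tt, num_vars=4, k=3, late=[0]))
```

With $k = 3$ and variable 0 marked as late, the sketch first finds $\mu = 4$ for a one-variable free set and then $\mu = 2$ after adding variable 1 to the free set.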

IV-A2 Computing the decomposition

After applying evaluate, another procedure decompose is used to compute the actual decomposition using the methods described in Section III-B.

IV-B LUT mapping with ACD

The methods described in Section IV-A have been integrated into the LUT mapping algorithm in [19]. Each mapping iteration computes $k$-feasible cuts rooted in nodes of the subject graphs and selects one best cut for each node based on cost functions and slack. Typically, enumerated cuts are $k$-feasible, i.e., any cut abstracts a $k$-LUT. In our implementation, cut enumeration computes large cuts up to size $l$, with $k < l \leq 11$, where $l$ is provided by the user. During cut enumeration, the mapper computes cut functions as truth tables. For the non-$k$-feasible computed cuts, the mapper uses Algorithm 1 to check the existence of a delay-minimizing decomposition into $k$-LUTs. If a decomposition is not feasible, the cut is discarded. If a decomposition exists, the cut delay is computed using the propagation delay returned by Algorithm 1. The area is computed pessimistically, neglecting the existence of a shared set, i.e., $Area = \lceil \log_2 \mu \rceil + 1$. To have precise area information, i.e., the number of required LUTs, ACD would need to solve the encoding problem and compute the decomposition. However, experimentally, not running the decomposition on the fly reduces the run time considerably with negligible impact on the final circuit area.

The mapper uses $l$-feasible cuts with ACD in the delay mapping pass, while it uses $k$-feasible cuts in the following area-recovery iterations. Note that area recovery aims at improving the solution over non-critical paths and can always re-use the best cuts from the previous pass, such that the required times are met. After the last mapping pass, a cover is generated consisting of $k$- and $l$-feasible cuts. At this stage, the mapper decomposes the non-$k$-feasible cuts into $k$-LUTs.
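The pessimistic area estimate is a one-line computation. The sketch below uses a hypothetical helper name and integer arithmetic to avoid floating-point rounding in the logarithm; it assumes $\mu \geq 2$.

```python
def estimated_area(mu):
    """Pessimistic LUT count: ceil(log2(mu)) BS LUTs plus one composition LUT."""
    if mu < 2:
        raise ValueError("column multiplicity must be at least 2")
    return (mu - 1).bit_length() + 1  # (mu - 1).bit_length() == ceil(log2(mu))


print(estimated_area(4))  # 3
```

For $\mu = 4$ the estimate is three LUTs, which matches the three 4-LUTs obtained in Example 6.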

V Experiments

This section presents an experimental evaluation of the proposed LUT mapping with ACD. First, the ACD algorithm proposed in this paper is compared with other state-of-the-art methods for decomposing practical functions. Then, we evaluate ACD for delay-driven LUT mapping. While the experiments are reported for 6-input LUTs, similar improvements have been obtained for 4-input LUTs as well.

The proposed methods have been implemented in ABC [23]. For our experiments, we use the EPFL combinational benchmark suite [24] containing several circuits provided as and-inverter graphs (AIGs). The baseline has been obtained using the commands and scripts “dfraig; resyn; resyn2; resyn2rs; if -y -K 6; resyn2rs” in ABC, which perform a high-effort size and depth AIG optimization. In particular, it combines SAT sweeping [25], scripts for delay-oriented AIG optimization [17], and lazy man’s logic synthesis [26], which is the most aggressive depth minimization command in ABC. The experiments have been conducted on a quad-core 2 GHz Intel i5 machine running macOS. The results have been verified using combinational equivalence checking in ABC. We extended the LUT mapper if in ABC to perform ACD as discussed in Section IV. The following commands are used in the experiments:

  • dch (-f): computes structural choices used to mitigate the structural bias [15], where -f stands for “fast”;

  • if -K 6: performs delay-oriented technology mapping with choices into 6-LUTs using 6-feasible cuts;

  • if -s -S 66 -K 8: performs delay-oriented technology mapping using 8-feasible cuts and decomposes logic for minimal delay into two 6-LUTs using a SAT-based formulation (available in ABC but not published);

  • if -Z 6 -K 8: performs technology mapping into 6-LUTs using the proposed implementation of delay-oriented ACD described in Section IV for 8-feasible cuts;

  • st: derives an AIG from an LUT network.
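The four mapping flows compared later in Table III can be assembled from these commands. The sketch below only builds the command lines and does not run them. It assumes an `abc` binary on the path that includes the `-Z` extension of `if` described in this paper, and it adds the standard ABC commands `read` and `ps` (print statistics) around each flow; the strategy labels and the file name are hypothetical.

```python
import shlex

# Command sequences as listed in the header of Table III.
STRATEGIES = {
    "baseline": "dch; if -K 6",
    "s66": "dch; if -s -S 66 -K 8",
    "acd": "dch; if -Z 6 -K 8",
    "acd_remap": "dch; if -Z 6 -K 8; st; dch -f; if -K 6",
}


def abc_command(aig_path, strategy):
    """Return a shell command that maps one benchmark with one strategy."""
    script = f"read {aig_path}; {STRATEGIES[strategy]}; ps"
    return f"abc -c {shlex.quote(script)}"


print(abc_command("adder.aig", "acd"))
```

The resulting string can be passed to a shell or split with `shlex.split` for `subprocess.run`.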

V-A Decomposition success rate

TABLE I: Decomposition success ratio into two 6-LUTs for practical functions using different ACD methods.

| ACD type | 7 vars (41071): Success (%) | 7 vars (41071): Time (s) | 8 vars (107466): Success (%) | 8 vars (107466): Time (s) | 9 vars (195602): Success (%) | 9 vars (195602): Time (s) | 10 vars (313649): Success (%) | 10 vars (313649): Time (s) | 11 vars (404991): Success (%) | 11 vars (404991): Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| lutpack [9] | 98.34% | 20.39 | 83.47% | 64.37 | 69.92% | 154.38 | 48.95% | 334.79 | 26.87% | 897.55 |
| S66 [10] | 84.18% | 0.60 | 69.24% | 2.57 | 52.13% | 4.99 | 37.36% | 6.99 | 19.14% | 9.79 |
| 66 1-SS | 97.30% | 0.28 | 82.23% | 1.41 | 74.24% | 4.20 | 63.06% | 9.39 | 32.88% | 16.43 |
| 66 M-SS | 99.82% | 0.30 | 92.94% | 3.08 | 84.71% | 9.92 | 63.06% | 9.73 | 32.88% | 16.58 |

In this experiment, we evaluate the performance of ACD in decomposing functions by comparing it against other implementations in ABC. Specifically, we test the number of functions that can be successfully decomposed into two 6-LUTs and the run time needed. We run this experiment on practical functions, i.e., functions that are observable in designs and benchmarks, which include fully-, partially-, and non-DSD-decomposable functions. We extract practical functions from the EPFL benchmarks. Since the number of practical functions can be large, we classify them into NPN-equivalence classes employing the heuristic sifting algorithm [27, 28].

Table I shows the percentage of decomposable functions and the run time for different methods and support sizes. For instance, the first column contains results for decomposing practical 7-input functions, where (41071) indicates the number of unique NPN classes collected. Each row of the table shows one ACD method. The first method, lutpack [9], performs a heuristic ACD using DSD and Shannon’s expansion, supporting up to 3 SS variables. The second method, S66 [10], performs ACD using heuristic variable re-ordering, supporting at most 1 SS variable. Finally, we present two variants of our decomposition method restricted to use two 6-LUTs. One uses up to 1 SS variable (66 1-SS); the other (66 M-SS) has no restrictions on the number of SS variables. The approaches described in this paper outperform the state of the art in quality for a competitive or better run time.

V-B Decomposition success rate for delay optimization

TABLE II: Success ratio when decomposing practical functions into two levels of 6-LUTs with the given late-arriving variables.

| N late | ACD type | 7 vars | 8 vars | 9 vars | 10 vars | 11 vars |
|---|---|---|---|---|---|---|
| 0 | 66 M-SS | 99.82% | 92.94% | 84.71% | 63.06% | 32.88% |
| 0 | Generic | 100.00% | 100.00% | 98.05% | 90.20% | 32.88% |
| 1 | 66 M-SS | 96.59% | 79.60% | 61.51% | 37.35% | 16.54% |
| 1 | Generic | 100.00% | 100.00% | 97.57% | 83.23% | 16.54% |
| 2 | 66 M-SS | 86.22% | 59.78% | 39.28% | 23.74% | 10.95% |
| 2 | Generic | 100.00% | 100.00% | 94.19% | 66.56% | 10.95% |
| 3 | 66 M-SS | 65.11% | 36.37% | 21.25% | 13.78% | 6.96% |
| 3 | Generic | 93.78% | 86.03% | 76.82% | 44.51% | 6.96% |
| 4 | 66 M-SS | 36.96% | 17.00% | 8.62% | 7.21% | 4.43% |
| 4 | Generic | 54.55% | 40.42% | 25.45% | 23.70% | 4.43% |
| 5 | 66 M-SS | 14.52% | 5.42% | 2.96% | 2.84% | 2.61% |
| 5 | Generic | 14.52% | 5.42% | 2.96% | 2.84% | 2.61% |

We extend the previous experiment to evaluate delay minimization using the proposed ACD method. This experiment tests the success rate of the decomposition for practical functions given delay-critical variables, which are required to be in the free set. Informally, for delay-critical variables with delay $D$, this experiment checks the existence of a decomposition with delay $D+1$. We only consider 66 M-SS and generic ACD since other known methods do not perform delay minimization using the input arrival time. For each function, we randomly generate up to 10 unique sets of delay-critical variables and test the decomposition for each one of them. While 66 M-SS is limited to two LUTs, generic can use up to 4 LUTs.

Table II presents the success rate based on the number of delay-critical variables, shown in the column “N late”. The table highlights the advantage of supporting multiple BS functions. Generic ACD has a high success rate in most cases. Limitations occur when the number of delay-critical variables exceeds 3 or the number of variables in the support is 10 or more. Generally, 11-input functions are rarely decomposable. However, many 10-input functions are still decomposable.

V-C Delay-driven LUT mapping

TABLE III: Comparison of delay-driven LUT mapping, LUT mapping into LUT structure “66”, and LUT mapping using ACD.

| Benchmark | ABC: dch; if -K 6 | | | | ABC: dch; if -s -S 66 -K 8 | | | | ACD | | | | ACD; st; dch -f; if -K 6 | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | LUTs | Edges | Depth | Time (s) | LUTs | Edges | Depth | Time (s) | LUTs | Edges | Depth | Time (s) | LUTs | Edges | Depth | Time (s) |
| adder | 363 | 1433 | 22 | 0.18 | 362 | 1465 | 20 | 0.28 | 383 | 1519 | 16 | 0.20 | 353 | 1518 | 10 | 0.39 |
| bar | 1664 | 9344 | 4 | 0.44 | 1664 | 9344 | 4 | 0.57 | 1664 | 9344 | 4 | 0.47 | 1006 | 5274 | 4 | 0.76 |
| div | 8618 | 32394 | 406 | 6.62 | 9107 | 33665 | 397 | 13.42 | 11644 | 44496 | 326 | 7.16 | 9068 | 39167 | 271 | 21.19 |
| hyp | 58393 | 239097 | 1864 | 5.43 | 61701 | 247699 | 1840 | 31.82 | 65615 | 264998 | 1396 | 11.13 | 61769 | 263254 | 1034 | 19.76 |
| log2 | 9712 | 43562 | 58 | 17.05 | 10172 | 44943 | 58 | 30.06 | 10313 | 46365 | 56 | 17.81 | 9429 | 42533 | 57 | 39.09 |
| max | 831 | 3804 | 14 | 0.37 | 840 | 3668 | 14 | 0.63 | 1211 | 5578 | 12 | 0.42 | 871 | 4277 | 11 | 1.39 |
| multiplier | 7383 | 34137 | 36 | 6.01 | 7334 | 32781 | 36 | 12.11 | 7693 | 35798 | 33 | 6.82 | 6800 | 31705 | 31 | 13.32 |
| sin | 1928 | 8445 | 30 | 1.31 | 1948 | 8463 | 30 | 4.94 | 2052 | 8913 | 29 | 1.50 | 1830 | 8178 | 30 | 2.91 |
| sqrt | 7515 | 29573 | 663 | 4.17 | 7972 | 30610 | 638 | 12.66 | 10156 | 38558 | 519 | 4.73 | 9292 | 36030 | 476 | 8.77 |
| square | 4122 | 17319 | 23 | 1.98 | 4165 | 17547 | 22 | 3.91 | 4107 | 17924 | 18 | 2.22 | 4118 | 18285 | 14 | 5.15 |
| arbiter | 1833 | 8982 | 6 | 1.64 | 1879 | 8836 | 6 | 2.02 | 1850 | 8987 | 6 | 1.70 | 2037 | 8780 | 6 | 3.33 |
| cavlc | 137 | 707 | 4 | 0.13 | 104 | 491 | 4 | 0.56 | 137 | 707 | 4 | 0.15 | 123 | 655 | 4 | 0.20 |
| ctrl | 30 | 133 | 2 | 0.07 | 28 | 127 | 2 | 0.08 | 30 | 133 | 2 | 0.08 | 29 | 126 | 2 | 0.08 |
| dec | 287 | 684 | 2 | 0.09 | 287 | 1404 | 2 | 0.1 | 287 | 684 | 2 | 0.10 | 284 | 816 | 2 | 0.12 |
| i2c | 312 | 1360 | 3 | 0.16 | 306 | 1316 | 3 | 0.36 | 319 | 1378 | 3 | 0.19 | 297 | 1329 | 3 | 0.27 |
| int2float | 52 | 258 | 3 | 0.08 | 46 | 205 | 3 | 0.18 | 52 | 258 | 3 | 0.09 | 50 | 251 | 3 | 0.11 |
| mem_ctrl | 11037 | 48812 | 18 | 10.24 | 10830 | 46368 | 18 | 31.67 | 11232 | 49483 | 17 | 11.40 | 10398 | 45793 | 16 | 20.57 |
| priority | 178 | 725 | 6 | 0.11 | 182 | 736 | 6 | 0.18 | 185 | 736 | 6 | 0.12 | 171 | 698 | 6 | 0.17 |
| router | 89 | 285 | 4 | 0.09 | 61 | 283 | 4 | 0.14 | 92 | 290 | 4 | 0.09 | 89 | 279 | 4 | 0.12 |
| voter | 1838 | 8596 | 13 | 2.23 | 1784 | 8624 | 13 | 4.14 | 1838 | 8583 | 13 | 2.32 | 1777 | 8426 | 13 | 4.82 |
| Improvement | | | | | 2.57% | -2.57% | 1.04% | | -8.13% | -7.87% | 7.52% | | 2.20% | -0.30% | 12.39% | |
| Total | | | | 58.40 | | | | 149.83 | | | | 68.70 | | | | 142.52 |

Table III compares four technology mapping strategies for delay minimization during mapping into 6-LUTs, assuming a unit-delay model. Each strategy takes the baseline as an input and computes structural choices before mapping. Structural choices have not been used for the benchmark hyp due to a known bug in ABC. The proposed method is compared against standard LUT mapping and mapping into LUT structures. Command ACD denotes our mapper with Boolean decomposition using the sequence “dch; if -Z 6 -K 8”. We do not compare against [10] and [9] because those methods do not support delay minimization. Furthermore, we do not compare against the recent mapper with gate decomposition based on bin-packing [29]. Nevertheless, the mapper in [29] would improve the average delay of ABC if by only 0.31%.

Mapping into LUT structure “66” composed of two 6-LUTs, which is a SAT-based version of structural ACD, reduces depth by 1.04% and the area by 2.57% on average, at the cost of increasing the number of edges by 2.57%. The proposed LUT mapping with ACD improves the depth of the LUT network by 7.52% on average while increasing the number of LUTs and edges by 8.13% and 7.87%, respectively.

Note that most of the improvement is concentrated in the first 10 benchmarks since the others are already close to their best known depth [30]. For 4 of them, the delay reduction exceeds 20% and is up to 27.27%. Practically, part of the area increase can be reduced by area-recovery methods [9, 31, 32], using delay relaxation, or by an additional mapping step applied after ACD. The rightmost strategy performs the latter option. The LUT count and edge count are reduced considerably, leading to an area improvement of 2.20%, compared to traditional technology mapping with choices. Also, the logical depth further decreases, by up to 54.55%. Specifically, the result after ACD is used as a choice to improve the next round of technology mapping because choices extracted from mapping with ACD are more structurally suited to delay-oriented mapping, compared to the original AIG. Moreover, structural choices help reduce the area over the non-critical paths. Note that a second mapping round does not provide practical benefits if applied to the default LUT mapper (leftmost column) since the network after deriving the AIG is structurally similar to the baseline. Furthermore, benchmark hyp is noticeably improved by remapping both in area and delay without using structural choices. Regarding the run time, mapping with ACD is faster than mapping into LUT structures while being more general.
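The per-benchmark figures quoted above can be recomputed from the depth columns of Table III. The sketch below assumes that a reduction is measured relative to the depth of the baseline mapping “dch; if -K 6”; it does not attempt to reproduce the averages of the Improvement row, since the averaging method is not stated.

```python
# Depths of the first 10 benchmarks in Table III:
# (baseline, ACD, ACD followed by remapping).
DEPTH = {
    "adder": (22, 16, 10),
    "bar": (4, 4, 4),
    "div": (406, 326, 271),
    "hyp": (1864, 1396, 1034),
    "log2": (58, 56, 57),
    "max": (14, 12, 11),
    "multiplier": (36, 33, 31),
    "sin": (30, 29, 30),
    "sqrt": (663, 519, 476),
    "square": (23, 18, 14),
}


def reduction(base, new):
    """Depth reduction in percent relative to the baseline."""
    return 100.0 * (base - new) / base


acd = {name: reduction(b, a) for name, (b, a, _) in DEPTH.items()}
remap = {name: reduction(b, r) for name, (b, _, r) in DEPTH.items()}

above_20 = sorted(name for name, pct in acd.items() if pct > 20.0)
print(above_20)                       # ['adder', 'hyp', 'sqrt', 'square']
print(round(max(acd.values()), 2))    # 27.27
print(round(max(remap.values()), 2))  # 54.55
```

Both maxima come from adder, whose depth drops from 22 to 16 with ACD and to 10 after remapping.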

V-D EPFL synthesis competition

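TABLE IV: LUT mapping in the EPFL synthesis competition.

| Benchmark | Best [30]: LUTs | Best [30]: Depth | dch -f; if -K 6: LUTs | dch -f; if -K 6: Depth | dch -f; if -Z 6 -K 10: LUTs | dch -f; if -Z 6 -K 10: Depth |
|---|---|---|---|---|---|---|
| adder | 347 | 5 | 360 | 6 | 445 | 5 |
| bar | 512 | 4 | 512 | 4 | 512 | 4 |
| div | 25318 | 175 | 23461 | 192 | 31526 | 175 |
| hyp | 182723 | 483 | 122394 | 511 | 154903 | 473 |
| log2 | 8617 | 52 | 8778 | 60 | 9613 | 51 |
| max | 1114 | 6 | 1113 | 7 | 1250 | 6 |
| multiplier | 7785 | 25 | 6839 | 28 | 6903 | 25 |
| sin | 680530 | 10 | 1820 | 33 | 2379 | 27 |
| sqrt | 29593 | 162 | 30945 | 172 | 41626 | 156 |
| square | 3732 | 10 | 4189 | 11 | 4275 | 10 |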

This experiment shows that mapping using ACD can improve well-optimized LUT networks, resulting in best known results for 4 benchmarks in the EPFL synthesis competition. The previous best results were obtained using a portfolio of heavy logic optimizations applied to various representations, such as AIGs and LUT networks. In recent years, results have been further improved using design-space exploration (DSE) techniques that incrementally generate optimization scripts.

We obtain the optimized AIGs by repeatedly running the script used in the baseline of Table III along with additional delay-oriented AIG commands in ABC. From the obtained AIG, we compare traditional LUT mapping with choices to LUT mapping with ACD. Notably, results from the traditional mapper are quite far from the best results. This observation shows, as expected, that our technology-independent optimization finds worse AIGs than those used to obtain the best results. However, LUT mapping with ACD matches or improves the depth for almost all benchmarks. The improved benchmarks are hyp, log2, multiplier, and sqrt. Remarkably, our method reduces the depth of hyp by 10 levels, compared to the state of the art, while reducing area by 15%. In the benchmark multiplier, our result matches the depth but improves the number of LUTs. Benchmark sin is the only one where there is a large gap compared to the best result. In particular, the best result for sin requires significant logic duplication that is not performed in our synthesis flow. Contrary to many other methods used to produce the best results, our results in Table III are obtained directly by LUT mapping without employing post-mapping optimization.

VI Conclusion

This work proposes a novel formulation of Ashenhurst-Curtis decomposition (ACD) that enables efficient technology mapping and post-mapping resynthesis. The algorithm is truth-table-based and works for any size of the free set, bound set, and shared set, which makes it well-suited for delay optimization. We have shown that the proposed Boolean decomposition improves on the state of the art in decomposition quality with a competitive run time. We have implemented and integrated the proposed method into a delay-driven LUT mapper. The experiments have shown that LUT mapping with ACD can improve the average delay by 12.39%, compared to the traditional structural LUT mapping with choices. Furthermore, the proposed approach has produced best results for 4 test cases in the EPFL synthesis competition.

Acknowledgments

This research was supported by the SNF grant “Supercool: Design methods and tools for superconducting electronics”, 200021_1920981, and Synopsys Inc.

References

  • [1] R. L. Ashenhurst, “The decomposition of switching functions,” 1957, pp. 74–116.
  • [2] J. P. Curtis, “A new approach to the design of switching circuits,” 1962.
  • [3] J. P. Roth and R. M. Karp, “Minimization over boolean graphs,” IBM Journal of Research and Development, vol. 6, no. 2, pp. 227–238, 1962.
  • [4] V. N. Kravets and K. A. Sakallah, “Constructive multi-level synthesis by way of functional properties,” Ph.D. dissertation, 2001.
  • [5] C. Legl, B. Wurth, and K. Eckl, “Computing support-minimal subfunctions during functional decomposition,” Trans. VLSI, vol. 6, no. 3, pp. 354–363, 1998.
  • [6] M. Perkowski, M. Marek-Sadowska, L. Jozwiak, T. Luba, S. Grygiel, M. Nowicka, R. Malvi, Z. Wang, and J. Zhang, “Decomposition of multiple-valued relations,” in Proc. Inter. Symp. on Mult.- Valued Logic, 1997, pp. 13–18.
  • [7] J.-H. Jiang, Y. Jiang, and R. K. Brayton, “An implicit method for multi-valued network encoding,” in Proc. IWLS, 2001, pp. 127–131.
  • [8] R. Bryant, “Graph-based algorithms for boolean function manipulation,” IEEE Trans. on Computers, vol. C-35, no. 8, pp. 677–691, 1986.
  • [9] A. Mishchenko, R. Brayton, and S. Chatterjee, “Boolean factoring and decomposition of logic networks,” in Proc. ICCAD, 2008, pp. 38–44.
  • [10] S. Ray, A. Mishchenko, N. Een, R. Brayton, S. Jang, and C. Chen, “Mapping into LUT structures,” in Proc. DATE, 2012.
  • [11] J. Cong and Y. Ding, “FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs,” Trans. CAD, vol. 13, no. 1, pp. 1–12, 1994.
  • [12] A. H. Farrahi and M. Sarrafzadeh, “Complexity of the lookup-table minimization problem for FPGA technology mapping,” IEEE Trans. CAD, 1994.
  • [13] E. Lehman, Y. Watanabe, J. Grodstein, and H. Harkness, “Logic decomposition during technology mapping,” Trans. CAD, 1997.
  • [14] G. Chen and J. Cong, “Simultaneous logic decomposition with technology mapping in FPGA designs,” in Proc. FPGA, 2001, pp. 48–55.
  • [15] S. Chatterjee, A. Mishchenko, R. Brayton, X. Wang, and T. Kam, “Reducing structural bias in technology mapping,” in Proc. ICCAD, 2005.
  • [16] A. Mishchenko, R. Brayton, A. Tempia Calvino, and G. De Micheli, “Boolean decomposition revisited,” in Proc. IWLS, 2023.
  • [17] A. Mishchenko and R. Brayton, “Scalable logic synthesis using a simple circuit structure,” in Proc. IWLS, 2006.
  • [18] J. Cong, C. Wu, and Y. Ding, “Cut ranking and pruning: Enabling a general and efficient FPGA mapping solution,” in Proc. FPGA, 1999.
  • [19] A. Mishchenko, S. Cho, S. Chatterjee, and R. Brayton, “Combinational and sequential mapping with priority cuts,” in Proc. ICCAD, 2007.
  • [20] G. De Micheli, R. Brayton, and A. Sangiovanni-Vincentelli, “Optimal state assignment for finite state machines,” Trans. CAD, vol. 4, no. 3, pp. 269–285, 1985.
  • [21] T. Villa and A. Sangiovanni-Vincentelli, “NOVA: state assignment of finite state machines for optimal two-level logic implementation,” Trans. CAD, vol. 9, no. 9, pp. 905–924, 1990.
  • [22] S. Yang and M. Ciesielski, “Optimum and suboptimum algorithms for input encoding and its relationship to logic minimization,” Trans. CAD, vol. 10, no. 1, pp. 4–12, 1991.
  • [23] R. Brayton and A. Mishchenko, “ABC: An academic industrial-strength verification tool,” in Computer Aided Verification, T. Touili, B. Cook, and P. Jackson, Eds., 2010. [Online]. Available: https://github.com/berkeley-abc/abc
  • [24] L. Amarù, P.-E. Gaillardon, and G. D. Micheli, “The EPFL combinational benchmark suite,” in Proc. IWLS, 2015.
  • [25] A. Mishchenko, S. Chatterjee, and R. Brayton, “FRAIGs: A unifying representation for logic synthesis and verification,” EECS Dep., UC Berkeley, Tech. Rep., 2005.
  • [26] W. Yang, L. Wang, and A. Mishchenko, “Lazy man’s logic synthesis,” in Proc. ICCAD, 2012, pp. 597–604.
  • [27] Z. Huang, L. Wang, Y. Nasikovskiy, and A. Mishchenko, “Fast boolean matching based on NPN classification,” in Intern. Conf. on Field-Programmable Technology, 2013.
  • [28] M. Soeken, A. Mishchenko, A. Petkovska, B. Sterin, P. Ienne, R. K. Brayton, and G. De Micheli, “Heuristic NPN classification for large functions using AIGs and LEXSAT,” in Theory and Applications of Satisfiability Testing, N. Creignou and D. Le Berre, Eds., 2016.
  • [29] L. Fan and C. Wu, “FPGA technology mapping with adaptive gate decomposition,” in Proc. FPGA, 2023, pp. 135–140.
  • [30] “EPFL synthesis competition best results [2023].” [Online]. Available: https://github.com/lsils/benchmarks/tree/v2023.1/best_results
  • [31] A. Mishchenko, R. Brayton, J.-H. R. Jiang, and S. Jang, “Scalable don’t-care-based logic optimization and resynthesis,” ACM Trans. Reconfigurable Technol. Syst., vol. 4, no. 4, 2011.
  • [32] B. Schmitt, A. Mishchenko, and R. Brayton, “SAT-based area recovery in structural technology mapping,” in Proc. ASP-DAC, 2018, pp. 586–591.