# Gate Sizing for Crosstalk Reduction under Timing Constraints by Lagrangian Relaxation Debjit Sinha and Hai Zhou Electrical and Computer Engineering Northwestern University Evanston, IL 60208 Abstract—This paper presents a post-route, timing-constrained gate-sizing algorithm for crosstalk reduction. Gate-sizing has emerged as a practical and feasible method to reduce crosstalk in deep sub-micron VLSI circuits. It is however critical to ensure that the timing constraints of the circuit are not violated after sizing. We present an iterative gate-sizing algorithm for crosstalk reduction based on Lagrangian Relaxation that optimizes area and power while ensuring that the given timing constraints are met. Experimental results demonstrating the effectiveness of the algorithm are reported for the ISCAS benchmarks and other large circuits with comparisons to an alternative design methodology. #### I. INTRODUCTION As technology scales into the deep submicron regime, increased aspect ratios make the significance of coupling capacitances profound. For present day processes, the coupling capacitance can be as high as the sum of the area capacitance and the fringing capacitance, and trends indicate that the role of coupling capacitance will be even more dominant in the future as feature-sizes shrink [1]. Noise induced on nets due to coupling can cause functional failures in the circuit. This makes crosstalk reduction critical. In the rest of the paper we shall use the terms coupling-noise and crosstalk interchangeably. Crosstalk on a net is dependent on the size of its driving-gate and the driving-gate size of each net coupled to it. Various design methodologies try to minimize crosstalk in early stages of design. In the post-routing stage, coupling-noise on some nets may still be critical which need to be fixed. In this scenario, gate-sizing is an attractive approach to noise reduction since it does not require re-routing. Scalable libraries and existing fill space can be used for gate-sizing. Crosstalk on a net can be reduced if the size of its driving-gate is increased, or if driving-gate sizes of the coupling nets are decreased. However, when the driving-gate of a net is sized up, it may increase the noise it induces on other nets as an aggressor. On the other hand, when the driving gate-size of an aggressor is reduced, the noise on itself may increase. Since coupling is symmetric on the coupled nets, it is artificial to classify them into aggressors or victims since a net could be an aggressor and a victim at the same time. Though it is plausible to classify a net either as an aggressor or a victim based on the strength of the noise on itself and other coupled nets, with changing driving-gate sizes, the role of a net may change. We therefore do not classify nets as aggressors or victims. Gate-sizing for circuit optimization and crosstalk reduction has been shown to be effective by researchers in the past. Crosstalk aware static timing analysis has been used by Xiao et al. [2] in a gate-sizing method for elimination of crosstalk induced timing violation. The gate-sizing algorithm for crosstalk noise optimization proposed by Hashimoto et al. [3] is limited by the fact that it only incorporates sizing down the aggressor gates. The algorithm proposed by Becer et al. [4] does not guarantee an optimal solution. An optimal algorithm for simultaneous gate and wire sizing under path delay constraints was presented by Chen et al [5], which used a non-linear optimization technique called Lagrangian Relaxation [6]-[8]. Crosstalk reduction was however not considered in their work. Jiang et al. [9] proposed a similar algorithm which considered crosstalk reduction by trying to minimize physical coupling capacitance by wire-sizing. Sinha et al. [10] proposed an optimal gate-sizing algorithm for coupling-noise reduction. The optimal gate-sizing algorithm finds minimal gate-sizes under given physical and timing constraints that ensure no coupling-noise violations in the circuit. However, the algorithm does not consider the effect of sizing on the path delay, but assumes that the given gate-size bounds guarantee that the timing constraint of the circuit is met. It requires timing budgeting before the size bound generations and is over constrained. In this paper, we propose an iterative gate-sizing algorithm for crosstalk reduction, which considers timing constraints on the path, and attains to minimize the weighted sum of gate-sizes. Lagrangian relaxation is used to solve the non-linear optimization problem, and the associated problem of crosstalk reduction is translated into a fixpoint computation problem. We develop gate-sizing transformations, and use them to obtain the solution. The effectiveness of the algorithm is validated by experimental simulations on the ISCAS benchmarks [11] and larger circuits. We compare our results to the design methodology where successive iterations of gate-sizing are performed for timing and for crosstalk reduction independently. This alternative design methodology was based on algorithms from [5] and [10]. The rest of the paper is organized as follows. We formulate the gate-sizing for crosstalk reduction under timing constraints problem in section II and explain the theoretical concepts of gate-sizing by Lagrangian Relaxation in section III. We present the crosstalk aware gate-sizing algorithm in section IV and experimental results with comparisons to the alternative design methodology in section V respectively. We finally draw conclusions in section VI. #### II. PROBLEM FORMULATION #### A. Motivation Gate-sizes have been traditionally used in area and power metrics for VLSI circuits. A very common objective of most VLSI circuit optimizations is to optimize the area and power of a circuit without compromising on its performance. Performance is related to the delay of critical paths in a circuit. Consequently, the acceptable delay between primary inputs and outputs are constrained to attain a certain performance level, which cannot be sacrificed during other optimizations. Optimizations involving gate sizing for crosstalk reduction must not violate the timing constraints of the circuit. Successive iterations of gate-sizing for crosstalk reduction and timing become necessary. This motivates the development of a gate-sizing algorithm for crosstalk reduction which considers the timing constraints on the primary outputs, thus saving independent iterations of gate-sizing for crosstalk reduction and timing. #### B. Circuit Modeling We model a circuit as a directed acyclic graph (DAG). Nodes of the DAG represent gates in the circuit and the edges represent the corresponding nets. A coupling-graph [10] having coupling information is then superimposed on the DAG. A pseudo node (*PO* henceforth) is added to represent a single primary output node having incoming edges from all the output nodes of the DAG. Similarly, a pseudo input node (*PI* henceforth) is added having outgoing edges to all input nodes of the DAG. The graph is topologically sorted with *PO* having index 0 and *PI* having the largest index for ease of notation. Gates and nets are modeled as standard switch level and $\pi$ -type RC circuits respectively. The gate-size of gate i is denoted by s(i), while $\hat{r_i}$ and $\hat{c_i}$ represent unit gate-size output resistance and input capacitance per unit gate-size respectively. Output resistance of gate i can now be expressed as $r_i = \hat{r_i}/s(i)$ and the capacitance of an input pin of the gate can be expressed as $c_i = \hat{c_i}s(i) + f(i)$ , where f(i) represents the gate perimeter capacitance. For simplicity, we assume input capacitances of all input pins of a gate to be the same and ignore intrinsic gate delay. The algorithm can however be simply extended to handle them. We use the Elmore delay model for timing calculations. #### C. Problem Definition Gate-sizes of a circuit can be represented by a gate-size vector S as $$S \stackrel{\triangle}{=} (i: 0 \leq i < n: s(i))$$ Given the size of gates in a circuit, the noise induced on each net can be calculated using a noise-model. It is not necessary that the noise-model be an analytical one. Coupling-noise N(i) on a given net can be represented as $$N(i) \stackrel{\triangle}{=} f_i(s(i), s(i_1), \dots, s(i_k))$$ where s(i) represents the driving-gate size of the net and $s(i_1), \ldots, s(i_k)$ represent driving-gate sizes of the coupled nets respectively. Given a weight vector $W=(i:0\leq i< n:w_i),\ w_i\geq 0$ , the objective of the optimal gate-sizing problem for crosstalk reduction is to find the minimal weighted-sum of gate-sizes under the constraints that the maximal arrival time at the PO with respect to the primary input is at most $A_0$ , noise N(i) On every net is at most U(i) and $I(i)\leq s(i)\leq u(i)$ . Here I(i) represents the maximum noise the corresponding net can tolerate, while I(i) and I(i) represent the minimum and maximum driving gate-sizes of the net given by physical constraints respectively. The timing constraints require that for all paths I(i) from I(i) for I(i) the sum of the delays across components in each path I(i) must be at most I(i). We can thus express the gate-sizing problem formally as $$Minimize \sum_{i=0}^{n-1} w_i s(i) \tag{1}$$ s.t. $$\sum_{i \in p} D_i \le A_0 \qquad \forall p \in P$$ $$N(i) \le U(i) \qquad \forall i \in W$$ $$l(i) \le s(i) \le u(i) \quad \forall i \in G$$ where, $D_i$ is the Elmore delay associated with component i, P is the set of all paths p from PI to PO, W is the set of all nets, and G is the set of all gates in the circuit. Since the total number of paths in P can be exponential, the above formulation is impractical for analysis and optimization. We therefore associate a variable $a_i$ to each component which represents the output arrival time for that component. This gives an alternate formulation of the above problem, which can be formally expressed as $$Minimize \sum_{i=0}^{n-1} w_i s(i)$$ (2) s.t. $$a_{j} \leq A_{0} \qquad \forall j \in input(PO)$$ $$a_{j} + D_{i} \leq a_{i} \qquad \forall i \wedge j \in input(i)$$ $$N(i) \leq U(i) \qquad \forall i \in W$$ $$l(i) \leq s(i) \leq u(i) \qquad \forall i \in G$$ where, input(i) gives the set of components that are inputs to component i. If we consider the weighted sum of gate-sizes in an area metric, the gate-sizing problem can be interpreted as finding the minimum area of the circuit such that there are no noise violations and the timing constraints are met. Similarly, power consumption can be expressed as $\sum_{i=0}^{n-1} p_i s(i)$ for some non-negative constants $p_0, p_1, \ldots, p_{n-1}$ . The given gate-sizing problem thus attains to find a gate-size vector which minimizes area and power under no noise violations, ensures that the timing constraints are met and all gates are within their respective size bounds. # III. GATE SIZING BY LAGRANGIAN RELAXATION # A. Lagrangian Relaxation Lagrangian Relaxation (LR) is a widely acclaimed technique for solving constrained optimization problems. In LR, "troublesome" constraints are "relaxed" and incorporated into the objective function after multiplying them with nonnegative constants called *Lagrange multipliers*, one multiplier for each constraint. For each fixed vector $\lambda$ of the *Lagrange multiplier* introduced, we obtain an easier sub-problem called the Lagrangian relaxation sub-problem associated with $\lambda$ ( $\mathcal{LRS}/\lambda$ ). The problem of finding a $\lambda$ such that the solution to the sub-problem is also the solution to the original constrained problem is called the Lagrangian Dual problem or $\mathcal{LDP}$ . We use LR to solve the gate sizing problem for crosstalk reduction under timing constraints. Kuhn-Tucker conditions [8] are used to simplify the sub-problem to a simpler sub-problem denoted by $\mathcal{LRS}/\mu$ . We use the method of sub-gradient optimization [6], [8] to solve the $\mathcal{LDP}$ . We relax the constraints on the arrival time of the components since they are difficult to handle. Analytical noise models are complex in form and cannot be expressed as posynomials. We leave the noise and size bound constraints unrelaxed. This facilitates the use of the lattice-fixpoint framework in [10]. The sub-problem objective function can be expressed as a posynomial additionally. Following the Lagrangian relaxation procedure, we introduce a Lagrange multiplier for each constraint on arrival time. For all $j \in input(PO)$ , we introduce $\lambda_{j0}$ for the constraint $a_j \leq A_0$ . For all $j \in input(i) \land i \neq 0$ , we introduce $\lambda_{ji}$ for the constraint $a_j + D_i \leq a_i$ . We assume that the arrival time of PI is 0. We denote the vector of Lagrange multipliers introduced by $\lambda$ and the vector of arrival times using $A = (i : 0 < i < n : a_i)$ respectively. Thus the Lagrangian sub-problem associated with the Lagrangian multiplier $\lambda$ can be expressed as $$Minimize\ L_{\lambda}(S,A)$$ (3) s.t. $$\begin{split} N(i) & \leq U(i) & \forall i \in W \\ l(i) & \leq s(i) \leq u(i) & \forall i \in G \end{split}$$ where, $L_{\lambda}(S, A)$ is given by $$\sum_{i=0}^{n-1} w_i s(i) + \sum_{j \in input(PO)} \lambda_{j0} (a_j - A_0)$$ $$+\sum_{i=0}^{n-1}\sum_{i\in Input(i)}\lambda_{ji}(a_j+D_i-a_i)$$ Kuhn Tucker [8] conditions imply the following optimality condition on $\lambda$ . The proof is similar to that in [5]. $$\sum_{i \in output(k)} \lambda_{ki} = \sum_{j \in input(k)} \lambda_{jk} \ \forall \ k$$ The above conditions on $\lambda$ can be used to simplify $\mathcal{LRS}/\lambda$ . We define $\Omega_{\lambda} = (\lambda \geq 0 : \lambda)$ to denote the set of $\lambda$ which satisfies the above conditions. For any $\lambda \in \Omega_{\lambda}$ , $(\mathcal{LRS}/\lambda)$ is now equivalent to solving a new sub-problem called $(\mathcal{LRS}/\mu)$ , which is defined as $$Minimize\ L_{\mu}(S)$$ (4) s.t. $$N(i) \le U(i) \quad \forall i \in W$$ $l(i) \le s(i) \le u(i) \quad \forall i \in G$ where $\mu=(i:0\leq i< n:\mu_i)$ , $\mu_i=\sum_{j\in input(i)}\lambda_{ji}$ , and $L_{\mu}(S)=\sum_{i=0}^{n-1}\mu_iD_i+\sum_{i=0}^{n-1}w_is(i)$ . We can now solve $\mathcal{LRS}/\lambda$ by solving $\mathcal{LRS}/\mu$ instead. The vector A can then be evaluated by topologically considering each variable $a_i$ from the PI and setting it to the smallest possible value that satisfies the timing constraints. #### B. Solving $LRS/\mu$ A local refinement function is derived to solve the Lagrangian sub-problem, which is applied to gates iteratively until convergence. We define upstream(i) to be the set of resistor indexes on the path(s) from gate i to the nearest upstream gate(s) or input driver(s). Let $R_i = \sum_{j \in upstream(i)} \mu_j r_j$ and $C_i$ denote the weighted upstream resistance and the downstream capacitance of gate i respectively. We define $h_i$ to be a function of gate-sizes in a circuit that returns the optimal size of gate i that minimizes $L_{\mu}$ , given other gate-sizes and no other constraints (similar to [5]). Formally, this is represented as $$h_i(S) \stackrel{\triangle}{=} (min \ s(i) : \ L_{\mu}) \tag{5}$$ $L_{\mu}$ can be expressed as a function of s(i) assuming all other gate-sizes are fixed [5]. $$L_{\mu} = A_i(S) \cdot s(i) + \frac{B_i(S)}{s(i)} + E_i(S)$$ (6) where $A_i(S)$ , $B_i(S)$ , and $E_i(S)$ are functions of gate-sizes of the circuit, but independent of s(i). $A_i(S)$ is given by $\hat{c_i}R_i + w_i$ , and $B_i(S)$ is given by $\mu_i\hat{r_i}C_i$ . From the above equation, the optimal value of s(i) can be obtained that minimizes $L_\mu$ . We can formally evaluate $h_i$ as $$h_i(S) = \sqrt{\frac{B_i(S)}{A_i(S)}} = \sqrt{\frac{\mu_i \hat{r}_i C_i}{\hat{c}_i R_i + w_i}}$$ (7) We next consider the gate-sizing transformation $g_i$ as in [10]. Based on the monotonic property of the noise function with gate sizes, $g_i$ is defined as a transformation that gives the minimal gate-size of a gate such that all its driven nets satisfy - Algorithm: Solve $\mathcal{LRS}/\mu$ - Output: optimized S - begin - 1) initialize all driver-sizes to their minimum $(\forall i: 0 \le i < n: s(i) = l(i))$ - 2) do - 3) for i := 1 to n 1 - 4) evaluate $C_i$ - 5) for i := n 1 downto 0 - 6) evaluate $R_i$ - $(i) = \phi_i(S)$ - 8) while (no further improvement) - end Fig. 1. Algorithm to solve $\mathcal{LRS}/\mu$ noise constraints, given other gate-sizes. Formally, we express this as $$g_i(S) \stackrel{\triangle}{=} min \ \{s(i) : N(i) \le U(i)\}$$ (8) Having defined $g_i(S)$ and $h_i(S)$ , we define $\phi_i(S)$ to be the local refinement function for a gate i which returns its optimal size that minimizes $L_{\mu}$ , satisfies the noise constraint on its driven nets and is within its physical size constraints. Formally, we express $\phi_i$ as $$\phi_i(S) \stackrel{\triangle}{=} \min \left( u(i), \max(l(i), g_i(S), h_i(S)) \right)$$ (9) $\mathcal{LRS}/\mu$ can be solved by a greedy algorithm based on iteratively resizing the gates. Local refinement is performed on every gate iteratively assuming all other gate sizes are fixed. $R_i$ and $C_i$ are evaluated for each gate by traversing the circuit in a reverse topological order and in a topological order respectively. $g_i$ for every gate is evaluated based on the noise model used. The iteration continues till convergence. The pseudo code of the algorithm is given in Fig 1. We present the proof of convergence next. A system of equations is formed when locally refined gatesizes for all nets are combined: $$( \forall i : 0 \le i < n : s(i) = \phi_i(s(0), \dots, s(n-1) )$$ If $\Phi = (i : 0 \le i < n : \phi_i)$ is used to represent the vector transformation, the above equation can be written as: $$S = \Phi(S) \tag{10}$$ A solution to Eqn (10) is a gate-size vector S, which is called a fixpoint of $\Phi$ . We shall prove in the following theorem that $\Phi$ is a monotonic and convergent transformation over the partial order $\leq$ as explained next. Consider gate-size vectors $X = (i: 0 \leq i < n: x(i))$ and $Y = (i: 0 \leq i < n: y(i))$ . Definition 1: We say that $X \leq Y$ iff $(\forall i : 0 \leq i < n : x(i) \leq y(i))$ . Theorem 1: For any $S_1 \leq S_2$ , we have $\Phi(S_1) \leq \Phi(S_2)$ . *Proof:* For two gate-size vectors $S_1 = (i: 0 \le i < n: s_1(i))$ and $S_2 = (i: 0 \le i < n: s_2(i))$ , given $S_1 \le S_2$ , from Defn (1) we can state that $$(\forall i : 0 \le i < n : s_1(i) \le s_2(i)) \tag{11}$$ We prove the above theorem using contradiction. If $\Phi(S_1) \not \leq \Phi(S_2)$ we must have $$(\exists j : 0 \le j < n : \phi_j(S_1) > \phi_j(S_2))$$ (12) From the definition of $\phi$ , we express this as $$\min\left(u(j), \max(l(j), g_j(S_1), h_j(S_1))\right) > \min\left(u(j), \max(l(j), g_j(S_2), h_j(S_2))\right)$$ $$\Rightarrow \max(g_j(S_1), h_j(S_1)) > \max(g_j(S_2), h_j(S_2))$$ $$\Rightarrow g_i(S_1) > g_i(S_2) \qquad \lor$$ $$h_i(S_1) > h_i(S_2)$$ However, $g_i$ and $h_i$ are monotonic [5], [10]. This implies that for the given gate size vectors $S_1$ and $S_2$ , we have $$(\forall i : 0 \le i < n : g_i(S1) \le g_i(S_2)) \quad \land \tag{13}$$ $$(\forall i : 0 \le i < n : h_i(S1) \le h_i(S_2)) \tag{14}$$ The above monotonic properties of $g_i$ and $h_i$ contradict (12). $\Phi$ is thus shown to be a monotonic and a convergent transformation, such that $\Phi(S_1) \leq \Phi(S_2)$ for $S_1 \leq S_2$ . # C. Lagrangian Dual Problem Let the function $Q(\lambda)$ be the solution to the Lagrangian sub-problem. We define the Lagrangian Dual problem $\mathcal{LDP}$ as $$Maximize \ Q(\lambda) \qquad \qquad (15)$$ $$subject \ to \qquad \qquad \lambda \geq 0$$ $Q(\lambda)$ is not differentiable in general. Methods like steepest descent, which depend on gradient directions are not necessarily applicable. The sub-gradient optimization method is used instead, and can be viewed as a generalization of the steepest descent method in which the gradient direction is substituted by a sub-gradient based direction [8]. This method however, cannot guarantee the optimal solution in all cases. Starting with an arbitrary value of $\lambda$ , the method iteratively moves from the current point to a new point following the sub-gradient direction. At step k, we solve the Lagrangian sub-problem using the $Solve\ \mathcal{LRS}/\mu$ algorithm. Subsequently, for each relaxed constraint, we define the sub-gradient to be the right side minus the left hand side of the constraint, evaluated at the current solution. The sub-gradient direction is the vector of all the sub-gradients. We move to a new point by multiplying a step size $\rho_k$ to the sub-gradient direction and adding it to $\lambda$ . Finally, $\lambda$ is projected back to the nearest point in $\Omega_{\lambda}$ , so that we can solve $\mathcal{LRS}/\mu$ instead of $\mathcal{LRS}/\lambda$ in the next iteration. This is repeated till convergence. - Algorithm: Solve $\mathcal{L}\mathcal{DP}$ - Output: $\lambda$ that maximizes $\mathcal{LRS}/\lambda$ - begin - 1) k := 1 - 2) $\lambda := \text{arbitrary initial vector in } \Omega_{\lambda}$ - 3) evaluate $\mu$ from $\lambda$ ( $\mu_i = \sum_{j \in input(i)} \lambda_{ji}$ ) - 4) solve $\mathcal{LRS}/\mu$ - 5) evaluate $A = (i : 0 \le i \le n : a_i)$ - 6) for i := 0 to n 1 do - for all $j \in input(i)$ do 8) $$\lambda_{ji} := \begin{cases} \lambda_{ji} + \rho_k(a_j - A_0) & \text{if } i = PO \\ \lambda_{ji} + \rho_k(a_j + D_i - a_i) & \text{otherwise} \end{cases}$$ 9) project $\lambda$ to the nearest point in $\Omega_{\lambda}$ - 10) k := k + 1 - 11) Repeat steps 3-10 until $(\sum_{i=0}^{n-1} w_i s(i) Q(\lambda))$ < Error bound. - end Fig. 2. Algorithm to solve $\mathcal{LDP}$ If the step size sequence $\rho_k$ satisfies the two conditions $\lim_{k\to\infty}\rho_k=0$ and $\sum_{k=1}^{\infty}\rho_k=\infty$ , then the sub-gradient optimization method will always converge to the optimal solution. The pseudo code of the algorithm is given in Fig 2. # IV. CROSSTALK AWARE GATE SIZING ALGORITHM Based on our problem formulation and the Lagrangian Relaxation procedure described in the previous section, the Crosstalk Aware Gate Sizing Algorithm is used to optimize the weighted sum of gate-sizes under given noise, timing and physical gate-size constraints. The pseudo code of the algorithm is given in Fig 3. Lemma 1: The solution obtained by the Optimal Gatesizing algorithm [10] $S_{opt}$ is a lower bound to the optimal solution of the gate-sizing problem. The Optimal Gate-sizing algorithm can be used to obtain the lower size bound of every gate in the given gate-sizing problem, since $S_{opt}$ gives the minimum size of each gate for which noise violations do not exist. Here we assume that the size bounds used in the optimal gate-sizing algorithm for coupling-noise reduction are the same as the given bounds for our problem. We can now use $S_{opt}$ as the lower bound of gate-sizes during our optimization process. Lemma 2: If the optimal solution to the gate-sizing problem without noise constraints $(S^*)$ as in [5] has a partial order over the relation < with the optimal solution to our gate-sizing problem $(S^{**})$ , the converged solution is guaranteed to be the optimal solution. If $S^* \leq S^{**}$ , then $h(S^*) = S^* \leq h(S^{**}) \leq S^{**}$ . Also the optimal solution must be a fixpoint of g. This implies that the optimal solution $S^{**}$ is a fixpoint of $\Phi$ ( $\Phi(S^{**}) = S^{**}$ ). Since $S^{**}$ minimizes the area metric, it is also the least fixpoint and the crosstalk aware gate-sizing algorithm is guaranteed to converge to it due to the monotonic property of $\Phi$ . - Algorithm: Crosstalk Aware Gate Sizing - **Input:** Layout extraction results - **Output:** Optimized gate-size vector S - begin - 1) construct DAG for circuit and superimpose coupling-graph based on layout extraction - 2) Call Solve $\mathcal{LDP}$ to get optimal $\lambda$ - 3) evaluate $\mu$ from $\lambda$ ( $\mu_i = \sum_{j \in input(i)} \lambda_{ji}$ ) 4) Call Solve $\mathcal{LRS}/\mu$ to get optimized S - end Fig. 3. Crosstalk Aware Gate Sizing Algorithm ### V. EXPERIMENTAL RESULTS We present results of the Crosstalk Aware Gate Sizing algorithm on the ISCAS'85 benchmarks [11] and three larger circuits, parameters for which were randomly generated with realistic values in a $0.18\mu$ technology. We used the $2-\pi$ noise model [12] for noise estimations and consider the following optimization stages - Unoptimized: Unoptimized Circuit with given timing constraints - Timing: Circuit after optimization for area without crosstalk consideration (similar to [5]) - *Timing* → *Noise* : Optimizing *Timing* for crosstalk reduction by Optimal Coupling-Noise Reduction algorithm [10] - Timing + Noise: Optimization using the developed Crosstalk Aware Gate Sizing algorithm Table I presents the number of nodes (# of Nodes), number of coupling edges in the coupling graph for the circuit (# of C Edges) and results for the four stages mentioned above. Each stage contains the weighted-sum of all gates (Area), the maximal delay of the circuit (Ta), and number of noise violations in the circuit (NV). Unoptimized represents a circuit stage satisfying a given timing constraint, but without area optimization or crosstalk consideration. For ease of comparison, we normalize the Area and Ta values for this stage and the corresponding values for the other stages are given with respect to this stage. Timing represents the circuit after optimizing for Area under the given timing constraints without crosstalk consideration. This is similar to the work in [5]. As expected, this stage is usually the optimal in terms of area, but has the most number of noise violations. $Timing \rightarrow Noise$ represents the circuit after gates of circuit in *Timing* are sized up using the Optimal Gate Sizing algorithm [10] to reduce coupling noise. Results demonstrate increase in area during this stage, and decrease in number of noise-violations, with little effect on timing. In a few cases, the circuit is timed faster, but experiments did not show any timing violation caused by the optimal coupling-noise reduction algorithm. *Timing* + Noise presents the circuit after the Unoptimized circuit is optimized with our Crosstalk Aware Gate Sizing algorithm. Though the area obtained was found to be more than the Timing stage, it was very effective in reducing noise violations, and proved that it could obtain lower final violations with $\label{eq:table I} \text{Noise Reduction Results.} \ U(i) = 0.2 V_{dd}$ | Circuit | # of | # of | Unoptimized | | | Timing | | | Timing → Noise | | | Timing + Noise | | | |---------|-------|---------|-------------|------|-----|--------|------|-----|----------------|------|----|----------------|------|----| | | Nodes | C Edges | Area | Ta | NV | Area | Ta | NV | Area | Ta | NV | Area | Ta | NV | | C432 | 198 | 553 | 1.00 | 1.00 | 3 | 0.59 | 0.99 | 2 | 0.60 | 0.99 | 1 | 0.54 | 0.99 | 0 | | C499 | 245 | 621 | 1.00 | 1.00 | 4 | 0.48 | 0.99 | 3 | 0.49 | 0.99 | 0 | 0.43 | 0.99 | 0 | | C880 | 445 | 1240 | 1.00 | 1.00 | 2 | 0.75 | 0.99 | 3 | 0.76 | 0.99 | 0 | 0.73 | 1.00 | 0 | | C1355 | 589 | 1653 | 1.00 | 1.00 | 6 | 1.11 | 0.99 | 12 | 1.13 | 0.99 | 0 | 1.12 | 0.99 | 0 | | C1908 | 915 | 2655 | 1.00 | 1.00 | 10 | 0.66 | 0.95 | 10 | 0.67 | 0.94 | 0 | 0.66 | 0.96 | 0 | | C2670 | 1428 | 3851 | 1.00 | 1.00 | 14 | 0.58 | 1.00 | 32 | 0.60 | 1.00 | 1 | 0.60 | 1.00 | 1 | | C3540 | 1721 | 5086 | 1.00 | 1.00 | 10 | 0.78 | 0.96 | 16 | 0.79 | 0.93 | 3 | 0.80 | 0.92 | 3 | | C5315 | 2487 | 7076 | 1.00 | 1.00 | 15 | 0.50 | 1.00 | 17 | 0.50 | 1.00 | 0 | 0.50 | 1.00 | 0 | | C6288 | 2450 | 7239 | 1.00 | 1.00 | 16 | 0.86 | 0.99 | 15 | 0.87 | 0.99 | 0 | 0.87 | 0.99 | 0 | | C7552 | 3721 | 10823 | 1.00 | 1.00 | 20 | 0.49 | 1.00 | 33 | 0.50 | 1.00 | 0 | 0.50 | 1.00 | 0 | | CKT_1 | 15K | 42017 | 1.00 | 1.00 | 53 | 0.21 | 1.00 | 67 | 0.22 | 1.00 | 0 | 0.22 | 1.00 | 0 | | CKT_2 | 18K | 53114 | 1.00 | 1.00 | 73 | 0.23 | 0.98 | 83 | 0.23 | 0.98 | 1 | 0.24 | 1.00 | 0 | | CKT_3 | 20K | 59389 | 1.00 | 1.00 | 116 | 0.28 | 0.99 | 140 | 0.28 | 0.99 | 0 | 0.28 | 0.99 | 0 | $\label{eq:table II} \text{Noise Reduction Results.} \ U(i) = 0.1 V_{dd}$ | Circuit | # of | # of | Unoptimized | | | Timing | | | Timing → Noise | | | Timing + Noise | | | |---------|-------|---------|-------------|------|------|--------|------|------|----------------|------|-----|----------------|------|-----| | | Nodes | C Edges | Area | Ta | NV | Area | Ta | NV | Area | Ta | NV | Area | Ta | NV | | C432 | 198 | 553 | 1.00 | 1.00 | 24 | 0.59 | 0.99 | 28 | 0.78 | 0.97 | 5 | 0.70 | 0.99 | 3 | | C499 | 245 | 621 | 1.00 | 1.00 | 32 | 0.67 | 1.00 | 35 | 0.84 | 0.96 | 12 | 0.75 | 1.00 | 9 | | C880 | 445 | 1240 | 1.00 | 1.00 | 37 | 0.75 | 0.99 | 52 | 0.87 | 0.99 | 15 | 0.80 | 0.99 | 10 | | C1355 | 589 | 1653 | 1.00 | 1.00 | 43 | 1.11 | 0.99 | 65 | 1.22 | 0.92 | 21 | 1.16 | 0.99 | 21 | | C1908 | 915 | 2655 | 1.00 | 1.00 | 88 | 0.66 | 0.95 | 111 | 0.93 | 0.94 | 27 | 0.64 | 0.96 | 25 | | C2670 | 1428 | 3851 | 1.00 | 1.00 | 146 | 0.58 | 1.00 | 172 | 0.74 | 1.00 | 40 | 0.62 | 1.00 | 30 | | C3540 | 1721 | 5086 | 1.00 | 1.00 | 166 | 0.78 | 0.96 | 220 | 0.88 | 0.93 | 61 | 0.62 | 0.98 | 39 | | C5315 | 2487 | 7076 | 1.00 | 1.00 | 264 | 0.50 | 1.00 | 297 | 0.66 | 0.99 | 50 | 0.54 | 1.00 | 38 | | C6288 | 2450 | 7239 | 1.00 | 1.00 | 220 | 0.86 | 0.99 | 282 | 0.99 | 0.90 | 80 | 0.83 | 0.94 | 62 | | C7552 | 3721 | 10823 | 1.00 | 1.00 | 360 | 0.49 | 1.00 | 414 | 0.65 | 0.99 | 72 | 0.56 | 1.00 | 57 | | CKT_1 | 15K | 42017 | 1.00 | 1.00 | 1136 | 0.32 | 1.00 | 1272 | 0.42 | 0.89 | 108 | 0.31 | 1.00 | 87 | | CKT_2 | 18K | 53114 | 1.00 | 1.00 | 1544 | 0.23 | 0.98 | 1573 | 0.34 | 0.92 | 100 | 0.32 | 0.98 | 94 | | CKT_3 | 20K | 59389 | 1.00 | 1.00 | 1783 | 0.28 | 0.99 | 1860 | 0.40 | 0.95 | 158 | 0.35 | 0.99 | 132 | lesser area as compared to the Timing o Noise stage. The upper noise bound U(i) on the nets was set to $0.2V_{dd}$ . Table II presents similar results for a more constrained noise upper bound $(U(i)=0.1V_{dd})$ . Experimental results prove the effectiveness of our algorithm over the approach of optimizing for area and then for crosstalk. The algorithm is shown to reduce more number of noise violations, which is significant in cases of constrained U(i) and its efficiency in optimizing area can also be observed. The algorithm can be used consistently for both continuous as well as discrete gate-sizing. The algorithm took 0.8 seconds for the smallest benchmark and 1.4 minutes for the largest benchmark. All simulations have been run on a Pentium 2.4GHz Xeon processor server, having 1GB RAM and running RedHat Linux 8.0. #### VI. CONCLUSION The paper presents a post-route, timing constrained gatesizing algorithm for crosstalk reduction based on Lagrangian Relaxation. Experimental results validate the effectiveness of the algorithm over an alternative design methodology of independent gate-sizing for timing and crosstalk successively. #### VII. ACKNOWLEDGEMENT This research was supported by the National Science Foundation under grant CCR-0238484. #### REFERENCES - S. I. Association, "National technology roadmap for semiconductors," 1999. - [2] T. Xiao and M. Marek-Sadowska, "Gate sizing to eliminate crosstalk induced timing violation," in *ICCD*, 2001. - [3] M. Hashimoto, M. Takahashi, and H. Onodera, "Crosstalk noise optimization by post-layout transistor sizing," in ISPD, 2002. - [4] M. R. Becer, D. Blauuw, I. Algor, R. Panda, C. Oh, V. Zolotov, and I. N. Hajj, "Post-route gate sizing for crosstalk noise reduction," in *DAC*, 2003, pp. 954–957. - [5] C. Chen, C. Chu, and D. F. Wong, "Fast and exact simultaneous gate and wire sizing by lagrangian relaxation," in *IEEE Transactions on Computer-Aided Design, July 1999*. - [6] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network flows: Theory, algorithms, and application. Prentice Hall, 1993. - [7] M. L. Fisher, "An applications oriented guide to lagrangian relaxation," in *Interfaces*, vol 15, March/April 1985, pp. 10–21. - [8] D. G. Luenberger, *Linear and nonlinear programming*. Addison Wesley Publishing Company, 1984. - [9] I. H. Jiang, Y. Vhang, and J. Jao, "Crosstalk-driven interconnect optimization by simultaneous gate and wire sizing," in *IEEE Transactions* on Computer-Aided Design, September 2000, 2000, pp. 999–1010. - [10] D. Sinha, H. Zhou, and C. Chu, "Optimal gate sizing for coupling noise reduction," in ISPD, 2004. - [11] F. Brglez and H. Fujiwara, "A neutral netlist of 10 combinatorial benchmark circuits," in ISCAS, 1985, pp. 695–698. - [12] J. Cong, D. Z. Pang, and P. V. Srinivas, "Improved crosstalk modeling for noise constrained interconnect optimization," in ACM Intl. Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2000.