# Study of Power-Delay Characteristics of a Mixed-Logic-Style Novel Adder Circuit at 90nm Gate Length 

Parameshwara M. C.<br>Dept. of E \& CE<br>Vemana Institute of Technology<br>Koramangala, Bangalore

Srinivasaiah H. C.<br>Dept. of TCE<br>Dayananda Sagar College of Engineering, Bangalore


#### Abstract

This paper discusses a rail-to-rail swing, mixed-logic-style 1-bit 28-transistor (28T) full adder based on a novel architecture. The performance metrics of the proposed 1-bit adder, namely power, delay, and power-delay product (PDP), are compared with those of two other high-performance 1-bit adder architectures reported to date. At 90 nm technology, the proposed 1-bit adder shows up to $50 \%$ improvement in delay and up to $49 \%$ improvement in PDP over the two reported architectures, while its power dissipation remains comparable with theirs, within $8 \%$. This analysis has been done at supply voltage $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$, load capacitance $\mathrm{C}_{\mathrm{L}}=150 \mathrm{fF}$, and a maximum input signal frequency $\mathrm{f}_{\mathrm{MAX}}=200 \mathrm{MHz}$. Also, the worst-case performance metrics of the proposed 1-bit adder circuit are seen to be less sensitive to variations in $\mathrm{V}_{\mathrm{DD}}$ and $\mathrm{C}_{\mathrm{L}}$ over wide ranges, from 0.6 V to 1.8 V and from 0 fF to 200 fF, respectively.


## Keywords

Full adder, Mixed logic style, 28T 1-bit adder, Power-delay product, Worst-case delay, Worst-case power, Carry-dependent sum, Input vector transition, Adder architecture.

## 1. INTRODUCTION

Modern embedded devices are characterized by the integration of a variety of functionalities, enabled by advances in architecture, design, manufacturing, and related technologies. One such leading-edge functionality is Digital Signal Processing (DSP). DSP functionalities are an integral part of real-time multimedia processors, high-speed digital transceivers, data acquisition systems, etc., in modern Internet technology. The most fundamental of all digital operations is addition. Thus, the performance of the Arithmetic Logic Unit (ALU) plays a major role in characterizing the performance of all DSP systems, and the design of an efficient full adder is the most basic requirement for high-speed, real-time DSP at a given process technology node. The focus of this paper is to develop an efficient (e.g., high-speed, low-power, small-area, low-cost) 1-bit adder circuit for integration into DSP systems.

In the design of an efficient digital circuit, e.g., an n-bit adder, the most important design metrics are power, speed, size, and cost $[1,2]$. These metrics compete with each other during optimization; e.g., reduced delay results in increased power dissipation due to either increased leakage current (i.e., a small $\mathrm{V}_{\mathrm{T}}$) or faster clocking. The simultaneous optimization of power, speed, size, and cost requires proper knowledge of how these metrics relate to each other, a relationship rooted in the process-voltage-temperature (PVT) space. Modelling these design metrics in terms of PVT parameters is a complex problem involving statistical design of experiments (DoE) and response surface modelling (RSM) techniques, followed by optimization [3, 4]. This approach becomes very complex because of the large number of PVT parameters that significantly influence each design metric. To reduce this complexity, simple heuristics are followed: sub-circuits are designed and optimized first and then integrated to obtain next-level circuit modules. For example, the design optimization of a 64-bit adder circuit involves the design optimization of its 1-bit adder sub-circuit.

One approach is to take the product of competing design metrics and minimize it, which optimizes the metrics in the product simultaneously; e.g., the power-delay product (PDP) [5]. Minimizing the PDP seeks a joint optimum of power and delay, rather than improving one at the expense of the other.
In this paper, we achieve minimum PDP for the proposed 1-bit adder circuit through architectural innovation. Accordingly, we designed and implemented a novel 1-bit, 28T 'carry-dependent sum' adder circuit based on a mixed logic style, using 90 nm generic process design kits (PDKs). The circuit is simulated using Cadence's Spectre simulator. This architecture is referred to herein as 'alternative logic-3' (AL-3), as an alternative to the 1-bit adder circuits recently reported in [6, 7], called herein 'alternative logic-1' (AL-1) and 'alternative logic-2' (AL-2), which are discussed in subsequent sections. We simulated and compared the worst-case delay, worst-case power, and worst-case PDP of the AL-1, AL-2, and AL-3 circuits. We also simulated these three circuits to determine their worst-case performance metrics as a function of $\mathrm{V}_{\mathrm{DD}}$ and $\mathrm{C}_{\mathrm{L}}$. In this comparison, AL-3 performed remarkably well, with the minimum worst-case delay and PDP. The rest of this paper is organized as follows: Section 2 reviews the fundamentals of existing 1-bit full adder circuits, including AL-1 and AL-2. Section 3 presents the proposed 1-bit AL-3 adder circuit, highlighting its salient features. Section 4 discusses the methodology used to compare the AL-1, AL-2, and AL-3 adder circuits. Section 5 discusses all the performance metrics as a function of $V_{D D}$ and $C_{L}$. Finally, conclusions are drawn in Section 6.

## 2. CLASSIFICATION OF FULL ADDER CELL ARCHITECTURES

The full adder architectures have been broadly classified into three main categories, viz., XOR-XOR based, XNOR-XNOR based, and XOR-XNOR based $[1,2]$, depending upon the circuit approach used to realize the two outputs: sum $\mathrm{S}_{\mathrm{i}}$ and carry $\mathrm{C}_{\mathrm{i}+1}$. The mixed logic style architectures AL-1 and AL-2 reported in $[6,7]$, and AL-3 proposed in this paper, are classified together herein as a fourth category. The AL-1 and AL-2 architectures are realized using double pass-transistor logic (DPL) and CMOS transmission gate logic, resulting in a mixed logic style implementation; their block diagrams and circuits are given in Fig. 1 and discussed in Section 2.2. A variant of the AL-1 and AL-2 architectures, called AL-3, is proposed in this paper and discussed in Section 3. The performance of all three architectures is compared with respect to the delay, power, and PDP metrics in Sections 4 and 5.

### 2.1 XOR-XOR, XNOR-XNOR and XOR-XNOR Based Full Adder Architectures

In the general XOR-XOR based full adder architecture, the output bits $\mathrm{S}_{\mathrm{i}}$ and $\mathrm{C}_{\mathrm{i}+1}$ of the $i^{\text {th }}$ adder are given by the following equations:

$$
\begin{align*}
& S_{i}=H \oplus C_{i}=H^{\prime} \cdot C_{i}+H \cdot C_{i}^{\prime}  \tag{1A}\\
& C_{\mathrm{i}+1}=A_{i} \cdot H^{\prime}+C_{i} \cdot H \tag{1B}
\end{align*}
$$

where $H$ represents the XOR of $A_{i}$ and $B_{i}$, and $H^{\prime}$ is the inversion of $H$.
The output bits of the XNOR-XNOR based full adder are given by the following equations:

$$
\begin{align*}
& S_{\mathrm{i}}=\overline{\overline{A_{i} \oplus B_{i}} \oplus C_{i}}=\overline{H^{\prime} \oplus C_{i}}=\overline{H^{\prime} \cdot C_{i}^{\prime}+H \cdot C_{i}}  \tag{2A}\\
& C_{\mathrm{i}+1}=A_{i} \cdot H^{\prime}+C_{i} \cdot H \tag{2B}
\end{align*}
$$

Equations 1(B) and 2(B) show that $\mathrm{C}_{\mathrm{i}+1}$ is obtained by multiplexing the inputs $A_{i}$ and $C_{i}$, with $H$ as the select signal.
In the XOR-XNOR based full adder, the output bits are given by the following equations:

$$
\begin{align*}
& S_{i}^{\prime}=\overline{A_{i} \oplus B_{i} \oplus C_{i}}=\overline{H \oplus C_{i}}=\overline{H^{\prime} \cdot C_{i}+H \cdot C_{i}^{\prime}}  \tag{3A}\\
& C_{i+1}=A_{i} \cdot H^{\prime}+C_{i} \cdot H \tag{3B}
\end{align*}
$$

Equation 3(A) shows that $S_{i}^{\prime}$ is obtained by multiplexing $H$ and $H^{\prime}$, with $C_{i}$ as the select signal. Equation 3(B) shows that $\mathrm{C}_{\mathrm{i}+1}$ is obtained by multiplexing $A_{i}$ and $C_{i}$, with $H$ as the select signal.
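As a quick sanity check of Equations (1)-(3), the short Python sketch below enumerates all eight input combinations and compares each Boolean form with ordinary binary addition, $A_i + B_i + C_i = 2C_{i+1} + S_i$. It is an illustrative check written for this discussion, not part of the authors' design flow, and the helper names are hypothetical. For the carry it uses the multiplexer reading stated above ($A_i$ when $H=0$, $C_i$ when $H=1$).

```python
from itertools import product


def neg(x: int) -> int:
    """Logical inversion of a 0/1 value."""
    return 1 - x


def check_full_adder_forms() -> bool:
    """Compare the XOR-XOR, XNOR-XNOR and XOR-XNOR forms with a + b + c = 2*carry + sum."""
    for a, b, c in product((0, 1), repeat=3):
        h = a ^ b            # H  = A_i XOR B_i
        hn = neg(h)          # H' = A_i XNOR B_i
        carry_ref, sum_ref = divmod(a + b + c, 2)

        s1 = (hn & c) | (h & neg(c))           # Eq. (1A)
        s2 = neg((hn & neg(c)) | (h & c))      # Eq. (2A)
        s3_bar = neg((hn & c) | (h & neg(c)))  # Eq. (3A): complemented sum S_i'
        carry = (a & hn) | (c & h)             # carry mux: A_i if H = 0, C_i if H = 1

        if not (s1 == s2 == sum_ref and s3_bar == neg(sum_ref) and carry == carry_ref):
            return False
    return True


print(check_full_adder_forms())  # True
```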

### 2.2 Alternative Logic AL-1 and AL-2 Based Full Adder Architectures


Fig. 1: Alternative logic 1-bit adders of [6, 7]: (a) and (b) general block diagrams of AL-1 and AL-2, respectively; (c) and (d) circuit implementations of AL-1 and AL-2, respectively.

The block diagrams of Fig. 1(a) and (b) show the general architectures of the 1-bit adders AL-1 and AL-2, and Fig. 1(c) and (d) show their circuit implementations using DPL, which are 28T and 26T implementations, respectively. In AL-1, the XOR, XNOR, AND, and OR gates are implemented independently. In AL-2, the XOR and XNOR gates are integrated into a single block, which saves 2 transistors, while the AND and OR gates are still implemented independently, as shown in Fig. 1(b). In Fig. 1, for better correlation between the block diagrams and the circuits, the corresponding sub-blocks are circled and labelled.

In Fig. 1, the output bits are given by the following equations:

$$
\begin{align*}
& S_{i}=H \oplus C_{i}=H \cdot C_{i}^{\prime}+H^{\prime} \cdot C_{i} \tag{4A}\\
& C_{i+1}=\left(A_{i} \cdot B_{i}\right) \cdot C_{i}^{\prime}+\left(A_{i}+B_{i}\right) \cdot C_{i} \tag{4B}
\end{align*}
$$

In Equation 4(A), the signals $H$ and $H^{\prime}$ are multiplexed with $C_{i}$ as the select signal; in Equation 4(B), the AND and OR of the inputs $A_{i}$ and $B_{i}$ are multiplexed, again with $C_{i}$ as the select signal. In the circuits of Fig. 1, the number next to each transistor indicates its $W / L_{g}$ ratio (with gate length $L_{g}=2 \lambda=90 \mathrm{~nm}$). The $W / L_{g}$ ratios of the PMOS transistors are twice those of the NMOS transistors to compensate for their lower carrier (hole) mobility. Transistor sizing is discussed in Section 4.
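Equation (4) can be read directly as two 2:1 multiplexers sharing the select signal $C_i$. The Python sketch below models an ideal multiplexer (the helper `mux2` is hypothetical and ignores the threshold drops and transmission gate behaviour of the real circuits) and checks Equation (4) against binary addition for all eight input vectors.

```python
from itertools import product


def mux2(sel: int, in0: int, in1: int) -> int:
    """Ideal 2:1 multiplexer: passes in0 when sel = 0 and in1 when sel = 1."""
    return in1 if sel else in0


def al_sum_carry(a: int, b: int, c: int) -> tuple:
    """Boolean model of Eq. (4): both outputs are muxes selected by C_i."""
    h = a ^ b                       # XOR block
    h_bar = 1 - h                   # XNOR block
    s = mux2(c, h, h_bar)           # (4A): H when C_i = 0, H' when C_i = 1
    carry = mux2(c, a & b, a | b)   # (4B): AND when C_i = 0, OR when C_i = 1
    return s, carry


for a, b, c in product((0, 1), repeat=3):
    carry_ref, sum_ref = divmod(a + b + c, 2)
    assert al_sum_carry(a, b, c) == (sum_ref, carry_ref)
```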

## 3. PROPOSED 1-BIT FULL ADDER AL-3 ARCHITECTURE AND ITS MIXED LOGIC STYLE IMPLEMENTATION

The proposed adder architecture is based on the truth table shown in Table 1. Examining the truth table, it can be observed that the carry out $\left(C_{i+1}\right)$ equals $\left(A_{i} \cdot B_{i}\right)$ when the carry in $\left(C_{i}\right)$ is '0', and $\left(A_{i}+B_{i}\right)$ when $C_{i}$ is '1'. Thus the carry out $C_{i+1}$ can be generated by multiplexing the Boolean functions $\left(A_{i} \cdot B_{i}\right)$ and $\left(A_{i}+B_{i}\right)$ [6, 7]. In addition to evaluating $C_{i+1}$ in this way, we propose generating the sum $S_{i}$ also by multiplexing, selecting between $A_{i} \cdot B_{i} \cdot C_{i}$ and $A_{i}+B_{i}+C_{i}$ with $C_{i+1}$ as the select signal: $S_{i}$ equals $A_{i} \cdot B_{i} \cdot C_{i}$ when $C_{i+1}$ is '1', and $A_{i}+B_{i}+C_{i}$ when $C_{i+1}$ is '0'. This holds because $C_{i+1}=1$ means at least two inputs are '1', so $S_{i}=1$ only if all three are, while $C_{i+1}=0$ means at most one input is '1', so $S_{i}=1$ exactly when any input is '1'. This approach leads to the carry $\left(C_{i+1}\right)$ dependent sum $\left(S_{i}\right)$ AL-3 architecture, whose sum and carry outputs are given by the following logic expressions:

$$
\begin{align*}
& S_{i}=\left(A_{i} \cdot B_{i} \cdot C_{i}\right) \cdot C_{i+1}+\left(A_{i}+B_{i}+C_{i}\right) \cdot C_{i+1}^{\prime} \tag{5A}\\
& C_{i+1}=\left(A_{i} \cdot B_{i}\right) \cdot C_{i}^{\prime}+\left(A_{i}+B_{i}\right) \cdot C_{i} \tag{5B}
\end{align*}
$$

Table 1. Truth Table for the implementation of 1-bit AL-3 Full Adder

| Inputs |  |  | Outputs |  |
| :---: | :---: | :---: | :---: | :---: |
| $\mathbf{A i}$ | $\mathbf{B i}$ | $\mathbf{C i}$ | $\mathbf{S i}$ | $\mathbf{C}_{\mathbf{i}+\mathbf{1}}$ |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |
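The following Python sketch, an illustrative check rather than the authors' tooling, encodes the rows of Table 1 and confirms that the carry-dependent sum of Equations 5(A) and 5(B) reproduces every row. The function name `al3` is hypothetical.

```python
# Rows of Table 1: (A_i, B_i, C_i, S_i, C_{i+1})
TABLE_1 = [
    (0, 0, 0, 0, 0),
    (0, 0, 1, 1, 0),
    (0, 1, 0, 1, 0),
    (0, 1, 1, 0, 1),
    (1, 0, 0, 1, 0),
    (1, 0, 1, 0, 1),
    (1, 1, 0, 0, 1),
    (1, 1, 1, 1, 1),
]


def al3(a: int, b: int, c: int) -> tuple:
    """Carry first (Eq. 5B), then the sum selected by the carry (Eq. 5A)."""
    carry = (a & b) if c == 0 else (a | b)
    s = (a & b & c) if carry == 1 else (a | b | c)
    return s, carry


mismatches = [row for row in TABLE_1 if al3(*row[:3]) != row[3:]]
print(mismatches)  # []
```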


Fig. 2: The proposed alternative logic-3 (AL-3): (a) block diagram, and (b) mixed logic style circuit implementation.

Fig. 2(a) is the block schematic derived from Equations 5(A) and 5(B), and Fig. 2(b) is the corresponding circuit schematic, implemented in a mixed logic style. In Fig. 2(a), the NAND and NOR gates are implemented using static CMOS logic and the multiplexers using CMOS transmission gate logic; hence we call it a mixed logic style. For better correlation between Fig. 2(a) and Fig. 2(b), the transistors in Fig. 2(b) forming each gate of Fig. 2(a) are circled together. The circuit of Fig. 2(b) uses 28 transistors (28T), the same as the AL-1 implementation (Fig. 1(c)); yet AL-3 proves to be better in terms of delay and PDP than both AL-1 (28T) and AL-2 (26T), with comparable power, as presented and discussed in Section 5.

## 4. SIMULATION ENVIRONMENT

### 4.1 Simulation Setup

To generate standard test signals at each input of the circuit under test (CUT), a chain of buffers is used. The transistors of these buffers are sized such that their outputs resemble standard CMOS signals [1, 2]. For functionality verification and for measuring the worst-case delay and worst-case power, we use the standard input vectors suggested in [8, 9], as discussed in Section 5.

AL-1, AL-2, and AL-3 are the CUTs, simulated under identical PVT conditions to compare their power, delay, and PDP. The performance metrics of the three circuits are compared by parameterizing $V_{D D}$ and $\mathrm{C}_{\mathrm{L}}$, as discussed in Section 5.

### 4.2 Transistor Sizing

The $W / L_{g}$ ratios of the N/PMOS transistors are indicated next to each transistor in Figs. 1(c), 1(d), and 2(b). We have adopted the transistor sizing methodology suggested in [6, 8, 9], whose steps are as follows:
a) Set all the NMOS transistors to the minimum size. If $n$ transistors are connected in series, set the width (W) of each transistor in the chain to $n$ times the minimum NMOS width.
b) Set all the PMOS transistors to double the minimum size (to compensate for the mobility difference between NMOS and PMOS transistors). If $p$ transistors are connected in series, set the width of each transistor in the chain to $p$ times this PMOS size.
c) Simulate the circuit with an input pattern that covers all input combinations, as discussed later.
d) Identify the switching transition in the $\mathrm{S}_{\mathrm{i}}$ and $\mathrm{C}_{\mathrm{i}+1}$ outputs with the highest propagation delay, and resize the transistor widths along this critical path.
Repeat steps (c) and (d) until the propagation delay no longer improves.
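The sizing procedure above can be sketched in Python as follows. This is an assumption-laden illustration: widths are expressed in multiples of a hypothetical minimum NMOS width `w_min`, and the `simulate` and `resize` callbacks are hypothetical stand-ins for a Spectre run and the designer's resizing decision in steps (c) and (d); no simulator API is implied.

```python
from typing import Callable, Dict, Tuple

Widths = Dict[str, float]


def initial_width(kind: str, stack_depth: int, w_min: float = 1.0) -> float:
    """Steps (a)-(b): width of one transistor in a series stack of `stack_depth`."""
    if stack_depth < 1:
        raise ValueError("stack_depth must be >= 1")
    if kind == "nmos":
        return stack_depth * w_min
    if kind == "pmos":
        return stack_depth * 2 * w_min  # 2x to offset the lower hole mobility
    raise ValueError(f"unknown device kind: {kind}")


def refine(widths: Widths,
           simulate: Callable[[Widths], Tuple[float, str]],
           resize: Callable[[Widths, str], Widths],
           max_iter: int = 50) -> Tuple[Widths, float]:
    """Steps (c)-(d): re-simulate and resize the critical path until the worst
    propagation delay stops improving. `simulate` returns (worst_delay, path_id)."""
    best_delay, path = simulate(widths)
    for _ in range(max_iter):
        candidate = resize(widths, path)
        delay, new_path = simulate(candidate)
        if delay >= best_delay:  # no further improvement: stop
            break
        widths, best_delay, path = candidate, delay, new_path
    return widths, best_delay
```

The loop keeps the last improving sizing, so a resize that makes the delay worse is simply discarded.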

## 5. SIMULATION RESULTS AND DISCUSSION

In this section, the simulation results of the 1-bit full adder circuits are presented under common PVT conditions. All three 1-bit adder circuits, AL-1, AL-2, and AL-3, are simulated with Cadence's Spectre using generic 90 nm PDKs to determine their worst-case delay, worst-case average power, and worst-case PDP. The study is performed in two steps. In the first step, the performance metrics of the three adder circuits are compared as a function of the supply voltage $\mathrm{V}_{\mathrm{DD}}$, varied from 0.6 V to 1.8 V, at a load capacitance $\mathrm{C}_{\mathrm{L}}$ of 20 fF and a maximum input speed of 200 MHz $[6,7]$. This input frequency, the reciprocal of the smallest pulse width $(5 \mathrm{~ns})$, is common to all three full adder circuits; the actual maximum operating frequencies of AL-1, AL-2, and AL-3 are well beyond 1 GHz, as verified through simulation. In the second step, the performance metrics are studied as a function of the load capacitance $\mathrm{C}_{\mathrm{L}}$, varied from 0 fF to 200 fF, at $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$ and a maximum input speed of 200 MHz. The propagation delay $t_{p d}$ is measured from the instant the input signal crosses $50 \%$ of its swing, in either a 0-to-1 or a 1-to-0 transition, to the instant the corresponding output signal crosses $50 \%$ of its swing, again in either direction.
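The 50%-to-50% delay definition above can be applied to sampled waveforms exported from a transient simulation. The Python sketch below is a minimal illustration, assuming time points `t` with input and output voltages `v_in`, `v_out` and linear interpolation between samples; it is not Spectre's built-in measurement facility, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float,
                  start: float = 0.0) -> Optional[float]:
    """First time (searching samples at or after `start`) where v crosses `level`,
    linearly interpolated between samples; None if there is no crossing."""
    for k in range(1, len(t)):
        if t[k] < start:
            continue
        v0, v1 = v[k - 1], v[k]
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    return None


def propagation_delay(t: Sequence[float], v_in: Sequence[float],
                      v_out: Sequence[float], vdd: float) -> Optional[float]:
    """t_pd: from the 50% point of the input transition to the 50% point
    of the following output transition (either direction)."""
    half = 0.5 * vdd
    t_in = crossing_time(t, v_in, half)
    if t_in is None:
        return None
    t_out = crossing_time(t, v_out, half, start=t_in)
    return None if t_out is None else t_out - t_in
```

In practice, the worst-case delay would be the maximum of such measurements over all input vector transitions.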

For a 1-bit adder with three inputs $A_{i}, B_{i}$, and $C_{i}$, there are $2^{3}=8$ possible input vectors. An exhaustive delay analysis of $S_{i}$ or $C_{i+1}$ must consider all possible input vector transitions, of which there are $2^{k}\left(2^{k}-1\right)=56$ for $k=3$. All 56 input vector transitions are defined as standard input test patterns [8, 9] to determine the worst-case delay in $\mathrm{S}_{\mathrm{i}}$ or $\mathrm{C}_{\mathrm{i}+1}$, as shown in the waveforms of Fig. 3. For better visualization, a transition matrix recording the delays in $S_{i}$ and $C_{i+1}$ is prepared for each of the three circuits, AL-1, AL-2, and AL-3, as shown in Fig. 4(a)-(c), respectively. Each delay matrix consists of 64 cells; the 8 diagonal cells correspond to 'transitions' from an input vector to itself, i.e., $000 \rightarrow 000, 001 \rightarrow 001, \cdots, 111 \rightarrow 111$, and are insignificant. Each of the remaining $64-8=56$ off-diagonal cells is partitioned into two sub-cells, viz., sub-cell $S_{i}$ (first in a cell) and sub-cell $C_{i+1}$ (second in a cell). For each output, 24 of the 56 input vector transitions produce no change in that output; the corresponding sub-cells are labelled 'Not Applicable' (NA) in Fig. 4(a)-(c). The remaining $56-24=32$ delays in $S_{i}$ and 32 delays in $C_{i+1}$ have been simulated, measured, and tabulated in the respective sub-cells of Fig. 4. The worst-case delay over all 56 input vector transitions on the $\mathrm{A}_{\mathrm{i}}, \mathrm{B}_{\mathrm{i}}, \mathrm{C}_{\mathrm{i}}$ inputs is the maximum of the $S_{i}$ and $C_{i+1}$ propagation delays, $\operatorname{MAX}\left(S_{i}, C_{i+1}\right)$ (Fig. 4).
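The counts quoted above (56 transitions, and 32 measurable delays per output, leaving 24 NA sub-cells per output) can be reproduced by enumerating the transitions, as in this short Python sketch. It uses ideal Boolean outputs, $S_i = A_i \oplus B_i \oplus C_i$ and $C_{i+1}$ as the majority function, rather than simulated waveforms; the variable names are illustrative.

```python
from itertools import product

VECTORS = list(product((0, 1), repeat=3))  # all A_i B_i C_i input vectors


def outputs(v: tuple) -> tuple:
    """Ideal (S_i, C_{i+1}) for input vector v = (A_i, B_i, C_i)."""
    a, b, c = v
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


# Off-diagonal cells of the delay matrix: (from_vector, to_vector)
transitions = [(u, w) for u in VECTORS for w in VECTORS if u != w]
s_changes = [p for p in transitions if outputs(p[0])[0] != outputs(p[1])[0]]
c_changes = [p for p in transitions if outputs(p[0])[1] != outputs(p[1])[1]]

print(len(transitions), len(s_changes), len(c_changes))  # 56 32 32
```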

The worst-case delay of the AL-1 circuit, 283.9 ps, occurs at the carry output $\mathrm{C}_{\mathrm{i}+1}$ for the input vector transition '010' to '101' (Fig. 4(a)). For AL-2, the worst-case delay of 309.2 ps occurs at the sum output $S_{i}$ when the input vector changes from '100' to '011' (Fig. 4(b)), whereas the worst-case delay of AL-3 is 245.2 ps, at $S_{i}$, for the '110' to '100' transition (Fig. 4(c)). In Fig. 4, the sub-cell holding the $\operatorname{MAX}\left(S_{i}, C_{i+1}\right)$ delay, i.e., the worst-case delay, is highlighted. In this paper, the worst-case power is determined as the average power dissipated over the 9 input frequency patterns of Table 2, applied to the inputs $\mathrm{A}_{\mathrm{i}}, \mathrm{B}_{\mathrm{i}}$, and $\mathrm{C}_{\mathrm{i}}$ and producing valid logic levels at the sum $\mathrm{S}_{\mathrm{i}}$ and carry $\mathrm{C}_{\mathrm{i}+1}$ [8, 9].


Fig. 3: Waveforms corresponding to the 56 input vector transitions at the inputs $\mathrm{A}_{\mathrm{i}}, \mathrm{B}_{\mathrm{i}}$, and $\mathrm{C}_{\mathrm{i}}$ and at the outputs $\mathrm{S}_{\mathrm{i}}$ and $\mathrm{C}_{\mathrm{i}+1}$ for the three 1-bit adder circuits AL-1, AL-2, and AL-3.

The frequencies of the inputs $\mathrm{A}_{\mathrm{i}}, \mathrm{B}_{\mathrm{i}}$, and $\mathrm{C}_{\mathrm{i}}$ are labelled $\mathrm{f}_{\mathrm{Ai}}, \mathrm{f}_{\mathrm{Bi}}$, and $\mathrm{f}_{\mathrm{Ci}}$. The first 6 frequency patterns are equivalent to applying the 56 input vector transitions of Fig. 4 to the input vector $A_{i} B_{i} C_{i}$. The average power $P_{\text {avg }}$ is the sum of three components, as given in Equation 6.

| From ↓ / To → | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 000 |  | 213 | 218.4 | NA | 211.4 | NA | NA | 199.2 |
|  |  | NA | NA | 202.1 | NA | 234 | 216.2 | 219.9 |
| 001 | 206.5 |  | NA | 211.6 | NA | 259.1 | 260.9 | NA |
|  | NA |  | NA | 200 | NA | 236.2 | 278.5 | 221.8 |
| 010 | 217 | NA |  | 199.9 | NA | 270.7 | 247.1 | NA |
|  | NA | NA |  | 192.7 | NA | 283.9 | 223.9 | 201.7 |
| 011 | NA | 237.2 | 219.3 |  | 255.5 | NA | NA | 207.9 |
|  | 198.3 | 218 | 202.9 |  | 203 | NA | NA | NA |
| 100 | 214.8 | NA | NA | 232.5 |  | 231.6 | 265.8 | NA |
|  | NA | NA | NA | 251.1 |  | 216.3 | 224.3 | 213.6 |
| 101 | NA | 230.7 | 263.5 | NA | 212.2 |  | NA | 211.2 |
|  | 201.4 | 214.9 | 203.2 | NA | 214.9 |  | NA | NA |
| 110 | NA | 262.9 | 236.8 | NA | 221.2 | NA |  | 202.2 |
|  | 211.3 | 216.4 | 208.3 | NA | 231.2 | NA |  | NA |
| 111 | 206.6 | NA | NA | 215.9 | NA | 245.3 | 237.7 |  |
|  | 211 | 218.7 | 204.9 | NA | 227.6 | NA | NA |  |

(a)

| From ↓ / To → | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 000 |  | 153.5 | 251.3 | NA | 203.3 | NA | NA | 150.9 |
|  |  | NA | NA | 205.1 | NA | 234.8 | 214 | 218.6 |
| 001 | 208.9 |  | NA | 299.5 | NA | 246.4 | 255.3 | NA |
|  | NA |  | NA | 214.1 | NA | 239.8 | 262.8 | 218.6 |
| 010 | 257.2 | NA |  | 232.3 | NA | 260.4 | 299.5 | NA |
|  | NA | NA |  | 206.1 | NA | 286.1 | 207.7 | 197.9 |
| 011 | NA | 226.3 | 158.3 |  | 227.2 | NA | NA | 243.7 |
|  | 198.4 | 212.5 | 196.6 |  | 209.3 | NA | NA | NA |
| 100 | 256.5 | NA | NA | 309.2 |  | 206.6 | 287.8 | NA |
|  | NA | NA | NA | 264.5 |  | 221.3 | 207.9 | 205.9 |
| 101 | NA | 230 | 262.9 | NA | 159.5 |  | NA | 235.4 |
|  | 198.8 | 209.7 | 260.6 | NA | 210.1 |  | NA | NA |
| 110 | NA | 203.3 | 257.7 | NA | 235.9 | NA |  | 153.7 |
|  | 208.4 | 213.7 | 208.7 | NA | 237.9 | NA |  | NA |
| 111 | 214.1 | NA | NA | 296.7 | NA | 256.7 | 238.8 |  |
|  | 208.6 | 216.6 | 204 | NA | 227.4 | NA | NA |  |

(b)

| From ↓ / To → | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 000 |  | 155.4 | 140.2 | NA | 136.6 | NA | NA | 183.5 |
|  |  | NA | NA | 115 | NA | 110.7 | 118.6 | 101.3 |
| 001 | 204.9 |  | NA | 180.8 | NA | 176.6 | 235.6 | NA |
|  | NA |  | NA | 111.8 | NA | 107.8 | 170.2 | 98.67 |
| 010 | 201.6 | NA |  | 163.7 | NA | 222.2 | 181.1 | NA |
|  | NA | NA |  | 95.31 | NA | 157.8 | 112.2 | 91.7 |
| 011 | NA | 200.7 | 205 |  | 205.1 | NA | NA | 193 |
|  | 123.1 | 128.9 | 127.9 |  | 125.48 | NA | NA | NA |
| 100 | 196.3 | NA | NA | 225.8 |  | 163.7 | 184.1 | NA |
|  | NA | NA | NA | 163.2 |  | 95.31 | 115.2 | 91.97 |
| 101 | NA | 195.1 | 206.1 | NA | 205 |  | NA | 195.3 |
|  | 123.5 | 123.3 | 125.1 | NA | 127.8 |  | NA | NA |
| 110 | NA | 216.6 | 231.6 | NA | 245.2 | NA |  | 210 |
|  | 125.2 | 133.2 | 152.8 | NA | 164.7 | NA |  | NA |
| 111 | 135.6 | NA | NA | 164.1 | NA | 177.5 | 162.9 |  |
|  | 137.8 | 130.9 | 156.6 | NA | 167.9 | NA | NA |  |

(c)

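Fig. 4: Simulated delay matrices (in ps) for $\mathrm{S}_{\mathrm{i}}$ (first sub-cell in a cell) and $\mathrm{C}_{\mathrm{i}+1}$ (second sub-cell in a cell) for (a) AL-1, (b) AL-2, and (c) AL-3 circuits; rows give the initial and columns the final input vector $A_{i} B_{i} C_{i}$.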

$$
\begin{equation*}
\mathrm{P}_{\mathrm{avg}}=\mathrm{P}_{\text {dynamic }}+\mathrm{P}_{\text {static }}+\mathrm{P}_{\mathrm{sc}} \tag{6}
\end{equation*}
$$

where $\mathrm{P}_{\text {dynamic }}$ is the average dynamic power, $\mathrm{P}_{\text {static }}$ is the average static power, and $\mathrm{P}_{\mathrm{sc}}$ is the average short-circuit power dissipated in each of the three circuits AL-1, AL-2, and AL-3. The static power is dissipated by steady-state leakage currents, the dynamic power by the switching of node capacitances throughout the circuit, and the short-circuit power by the simultaneous conduction of series-connected NMOS and PMOS transistors between the power rails.

In Equation 6, $\mathrm{P}_{\text {avg }}$ is computed over 9 frequency pattern assignments at the inputs $\mathrm{A}_{\mathrm{i}}, \mathrm{B}_{\mathrm{i}}$, and $\mathrm{C}_{\mathrm{i}}$ (Table 2). Six of the 9 frequency patterns are combinations of three frequencies, viz., $\mathrm{f}_{\mathrm{H}}=200 \mathrm{MHz}$, $\mathrm{f}_{\mathrm{M}}=\mathrm{f}_{\mathrm{H}} / 2=100 \mathrm{MHz}$, and $\mathrm{f}_{\mathrm{L}}=\mathrm{f}_{\mathrm{H}} / 4=50 \mathrm{MHz}$, taken in 3! (3 factorial) ways; they constitute the first 6 rows of Table 2. This power analysis is done with $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$ and $\mathrm{C}_{\mathrm{L}}=20 \mathrm{fF}$. In the last 3 rows of the table, $\mathrm{f}_{\mathrm{MD}}$, assigned to $\mathrm{B}_{\mathrm{i}}$, is $\mathrm{f}_{\mathrm{M}}$ delayed by $50 \%$ of its pulse width. The frequency patterns in these last 3 rows emulate the worst-case power loss due to glitches, which is evident in the waveforms of Fig. 5. Fig. 5 depicts all 9 frequency patterns of Table 2 graphically.

The three frequencies $f_{A i}, f_{B i}$, and $f_{C i}$ (i.e., a row of Table 2) of each sub-pattern are labelled in Fig. 5.
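The nine input sub-patterns of Table 2 can be generated programmatically, for example to build a stimulus file. The Python sketch below assumes, following the convention in Section 5 that frequency is the reciprocal of the smallest pulse width, that an input "at f" holds each logic level for $1/f$ (5 ns for $f_H$), that all inputs start at logic 0, and that $f_{MD}$ is $f_M$ shifted by half of its 10 ns pulse width; times are integer picoseconds to avoid rounding. The dictionary and function names are hypothetical.

```python
from itertools import permutations

# Assumed pulse widths in ps: an input "at f" holds each level for 1/f.
PW = {"H": 5_000, "M": 10_000, "L": 20_000}

# (pulse_width_ps, delay_ps) per named frequency; "MD" is f_M delayed
# by 50% of its pulse width.
SIG = {
    "H": (PW["H"], 0),
    "M": (PW["M"], 0),
    "L": (PW["L"], 0),
    "MD": (PW["M"], PW["M"] // 2),
}

# Rows of Table 2 as (f_Ai, f_Bi, f_Ci)
TABLE_2 = [
    ("H", "M", "L"), ("H", "L", "M"), ("M", "L", "H"),
    ("M", "H", "L"), ("L", "H", "M"), ("L", "M", "H"),
    ("M", "MD", "H"), ("M", "MD", "M"), ("M", "MD", "L"),
]

# The first six rows are the 3! orderings of (f_H, f_M, f_L)
assert sorted(TABLE_2[:6]) == sorted(permutations(("H", "M", "L")))


def level(t_ps: int, name: str) -> int:
    """Logic level at time t_ps of a square wave that starts at 0 after its delay."""
    pw, delay = SIG[name]
    if t_ps < delay:
        return 0
    return ((t_ps - delay) // pw) % 2


def input_vector(t_ps: int, row: int) -> tuple:
    """(A_i, B_i, C_i) at time t_ps for sub-pattern `row` (1-based, as in Table 2)."""
    return tuple(level(t_ps, name) for name in TABLE_2[row - 1])
```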

To determine the worst-case PDP, we take the product of the worst-case power and the worst-case delay (Fig. 4), both determined as described above. Historically, the PDP has been considered a suitable performance metric for the simultaneous optimization of power and delay [5]. In long-channel transistors, power dissipation is mainly due to the switching of node capacitances, and leakage power dissipation is relatively insignificant. However, at sub-90 nm gate lengths, the total leakage power dissipation due to all leakage mechanisms, viz., subthreshold, junction, and carrier tunnelling through the oxide, becomes comparable with the dynamic power dissipation.

In this paper, the worst-case delay at the outputs, $S_{i}$ or $C_{i+1}$, and the worst-case average power are used to determine the worst-case PDP. The worst-case PDP provides a trade-off between the worst-case power and worst-case delay metrics. Traditionally, a minimum PDP implies prolonged battery life, a desirable feature for portable applications.
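A minimal Python sketch of the worst-case PDP as defined above (mean power over the frequency sub-patterns, times the largest measured propagation delay) is shown below. The function name is hypothetical, and the power and delay values passed to it would come from simulation, not from this sketch.

```python
from statistics import fmean
from typing import Sequence


def worst_case_pdp(pattern_powers_w: Sequence[float], delays_s: Sequence[float]) -> float:
    """Worst-case PDP in joules: average power over the input frequency
    sub-patterns (watts) times the maximum propagation delay (seconds)."""
    if not pattern_powers_w or not delays_s:
        raise ValueError("need at least one power value and one delay value")
    return fmean(pattern_powers_w) * max(delays_s)
```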

Table 2. Power Measurement Input Sub-Patterns

| $\#$ | $\mathrm{f}_{\mathrm{Ai}}$ | $\mathrm{f}_{\mathrm{Bi}}$ | $\mathrm{f}_{\mathrm{Ci}}$ |
| :---: | :---: | :---: | :---: |
| 1 | $\mathrm{f}_{\mathrm{H}}$ | $\mathrm{f}_{\mathrm{M}}$ | $\mathrm{f}_{\mathrm{L}}$ |
| 2 | $\mathrm{f}_{\mathrm{H}}$ | $\mathrm{f}_{\mathrm{L}}$ | $\mathrm{f}_{\mathrm{M}}$ |
| 3 | $\mathrm{f}_{\mathrm{M}}$ | $\mathrm{f}_{\mathrm{L}}$ | $\mathrm{f}_{\mathrm{H}}$ |
| 4 | $\mathrm{f}_{\mathrm{M}}$ | $\mathrm{f}_{\mathrm{H}}$ | $\mathrm{f}_{\mathrm{L}}$ |
| 5 | $\mathrm{f}_{\mathrm{L}}$ | $\mathrm{f}_{\mathrm{H}}$ | $\mathrm{f}_{\mathrm{M}}$ |
| 6 | $\mathrm{f}_{\mathrm{L}}$ | $\mathrm{f}_{\mathrm{M}}$ | $\mathrm{f}_{\mathrm{H}}$ |
| 7 | $\mathrm{f}_{\mathrm{M}}$ | $\mathrm{f}_{\mathrm{MD}}$ | $\mathrm{f}_{\mathrm{H}}$ |
| 8 | $\mathrm{f}_{\mathrm{M}}$ | $\mathrm{f}_{\mathrm{MD}}$ | $\mathrm{f}_{\mathrm{M}}$ |
| 9 | $\mathrm{f}_{\mathrm{M}}$ | $\mathrm{f}_{\mathrm{MD}}$ | $\mathrm{f}_{\mathrm{L}}$ |



Fig. 5: Waveforms of the 9 frequency patterns (Table 2) at the inputs $\mathrm{A}_{\mathrm{i}}, \mathrm{B}_{\mathrm{i}}$, and $\mathrm{C}_{\mathrm{i}}$ and the corresponding outputs $S_{i}$ and $C_{i+1}$ for the AL-1, AL-2, and AL-3 adder circuits.

### 5.1 Worst case delay as a function of supply voltage and load capacitance

Fig. 6(a) shows the worst-case delay of the three 1-bit adder circuits as a function of the supply voltage $V_{D D}$, varied from 0.6 V to 1.8 V. The proposed 1-bit adder AL-3 has less delay than the AL-1 and AL-2 circuits when $\mathrm{V}_{\mathrm{DD}}$ is increased beyond $\sim 0.7 \mathrm{~V}$. This improvement in delay for AL-3 is attributed to its relatively smaller intermediate-node and output-node parasitic capacitances (Fig. 1 and Fig. 2). For $V_{D D}$ below 0.6 V, the functionality of the three adders becomes indeterminate.

For the proposed AL-3 circuit, $S_{i}$ is the critical output, since it is evaluated using $C_{i+1}$ (Fig. 2). The speed of AL-3 is nevertheless better than that of the AL-1 and AL-2 circuits, owing to the buffering action of its static CMOS gates: CMOS NAND1 followed by CMOS NOR2 yields the minterm $\mathrm{A}_{\mathrm{i}} \mathrm{B}_{\mathrm{i}} \mathrm{C}_{\mathrm{i}}$ at one input of MUX2, and CMOS NOR1 followed by CMOS NAND2 yields the maxterm $A_{i}+B_{i}+C_{i}$ at the other input of MUX2. The minterm and maxterm are selected by MUX2 with $\mathrm{C}_{\mathrm{i}+1}$ as the select signal. The expression for $S_{i}$ is given by Equation 5(A), which simplifies to $\mathrm{S}_{\mathrm{i}}=\mathrm{A}_{\mathrm{i}} \oplus \mathrm{B}_{\mathrm{i}} \oplus \mathrm{C}_{\mathrm{i}}$, realizing the sum operation. For this circuit, the worst-case delay for $S_{i}$ is observed during the 110 to 100 transition (highlighted in Fig. 4(c)). The worst-case delays of AL-1 and AL-2 remain higher than that of AL-3, particularly at $\mathrm{V}_{\mathrm{DD}}$ above 0.7 V.
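The signal flow described above can be modelled at the gate level in Python. This sketch is hypothetical in one respect: the text does not state what drives the second inputs of NOR2 and NAND2, so the model assumes $C_i^{\prime}$ there, which is one wiring that yields the stated minterm and maxterm. The multiplexers are ideal, and any inverters needed to present $A_iB_i$ and $A_i+B_i$ to MUX1 are implied rather than modelled.

```python
from itertools import product


def nand(x: int, y: int) -> int:
    return 1 - (x & y)


def nor(x: int, y: int) -> int:
    return 1 - (x | y)


def mux2(sel: int, in0: int, in1: int) -> int:
    """Ideal 2:1 multiplexer (stand-in for a transmission gate mux)."""
    return in1 if sel else in0


def al3_gate_level(a: int, b: int, c: int) -> tuple:
    """Hypothetical gate-level reading of Fig. 2(a); returns (S_i, C_{i+1}, minterm, maxterm)."""
    c_bar = 1 - c                       # assumed second input of NOR2 and NAND2
    n1 = nand(a, b)                     # NAND1 = (A_i B_i)'
    o1 = nor(a, b)                      # NOR1  = (A_i + B_i)'
    minterm = nor(n1, c_bar)            # NOR2  -> A_i B_i C_i
    maxterm = nand(o1, c_bar)           # NAND2 -> A_i + B_i + C_i
    carry = mux2(c, 1 - n1, 1 - o1)     # MUX1: A_i B_i if C_i = 0, else A_i + B_i
    s = mux2(carry, maxterm, minterm)   # MUX2 selected by C_{i+1}
    return s, carry, minterm, maxterm


# Exhaustive check against the full adder definition
for a, b, c in product((0, 1), repeat=3):
    s, carry, mt, mx = al3_gate_level(a, b, c)
    assert (mt, mx) == (a & b & c, a | b | c)
    assert (s, carry) == (a ^ b ^ c, (a & b) | (b & c) | (a & c))
```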


Fig. 6: Plots of delay as a function of (a) supply voltage $V_{D D}$, and (b) load capacitance, $C_{L}$

Fig. 6(b) shows the worst-case delay as a function of the load capacitance $\mathrm{C}_{\mathrm{L}}$, varied from 0 fF to 200 fF. The CMOS NAND1 and NOR2 gates produce the minterm $A_{i} B_{i} C_{i}$, while the CMOS NOR1 and NAND2 gates generate the maxterm $A_{i}+B_{i}+C_{i}$. Owing to the buffering action of these CMOS gates, the output delay of AL-3 is smaller than that of AL-1 and AL-2 over the entire range of $\mathrm{C}_{\mathrm{L}}$, which suggests a higher fan-out capability for AL-3. The delay improvement of AL-3 over the other two circuits is pronounced for $C_{L}$ greater than 10 fF.

### 5.2 Worst-case power as a function of supply voltage and load capacitance

The power analysis uses the technique discussed earlier to estimate the worst-case power: the input signals $A_{i}, B_{i}$, and $C_{i}$ are driven by a sequence of frequency sub-patterns, i.e., the rows of Table 2. The analysis is done at a maximum input speed of 200 MHz with $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$ and $\mathrm{C}_{\mathrm{L}}=20$ fF. Averaging the power over the 9 frequency patterns applied at the three inputs $\mathrm{A}_{\mathrm{i}}, \mathrm{B}_{\mathrm{i}}$, and $\mathrm{C}_{\mathrm{i}}$ yields the worst-case average power.

Fig. 7(a) shows the simulated average power over the 9 frequency sub-patterns as a function of the supply voltage $\mathrm{V}_{\mathrm{DD}}$. The difference between the power dissipated by AL-3 and that of the AL-1 and AL-2 circuits is not significant; the marginal difference for AL-3 is attributed to its static CMOS NAND1-NOR2 and NOR1-NAND2 gate pairs.

Fig. 7(b) plots the average worst-case power as a function of the load capacitance $\mathrm{C}_{\mathrm{L}}$ for the AL-1, AL-2, and AL-3 1-bit adders. $\mathrm{C}_{\mathrm{L}}$ is varied from 0 fF to 200 fF, with $\mathrm{V}_{\mathrm{DD}}=1.2 \mathrm{~V}$ and maximum frequency $\mathrm{f}_{\mathrm{MAX}}=200 \mathrm{MHz}$. Again, over the entire range of $\mathrm{C}_{\mathrm{L}}$, the difference in power dissipation among the AL-1, AL-2, and AL-3 circuits is not significant; that is, AL-3 has power dissipation comparable with AL-1 and AL-2.


Fig. 7: Power dissipation as a function of (a) supply voltage $V_{D D}$, (b) load capacitance $C_{L}$.

### 5.3 Worst-case PDP as a function of supply voltage and load capacitance


Fig. 8: The worst-case PDP as a function of (a) supply voltage $V_{D D}$, and (b) load capacitance $\mathrm{C}_{\mathrm{L}}$.

The worst case PDPs of AL-1, AL-2, and AL-3 1-bit adder circuits are studied as a function of supply voltage $\mathrm{V}_{\mathrm{DD}}$ and the load capacitance $\mathrm{C}_{\mathrm{L}}$. In this analysis, $\mathrm{V}_{\mathrm{DD}}$ is varied from 0.6 V to 1.8 V , and the $\mathrm{C}_{\mathrm{L}}$ from 0 fF to 200 fF .

Fig. 8(a) shows the worst-case PDP as a function of the supply voltage $\mathrm{V}_{\mathrm{DD}}$ for the three adder circuits AL-1, AL-2, and AL-3. AL-3 has the best worst-case PDP of the three circuits when $V_{D D}$ exceeds 0.8 V. Fig. 8(b) shows the variation of the worst-case PDP as a function of $C_{L}$, varied from 0 fF to 200 fF. From 0 fF to 20 fF, the dependence of the worst-case PDP on $C_{L}$ is comparable for all three adders. When $\mathrm{C}_{\mathrm{L}}$ exceeds 20 fF, AL-3 shows the smallest PDP variation among the three circuits, which makes it attractive for portable applications.

Table 3 shows the percentage improvement ($\Delta$) in power, delay, and PDP of the AL-3 1-bit adder circuit with respect to the AL-1 and AL-2 architectures. A negative $\Delta$ Power indicates an increase in power dissipation; this increase is small compared with the improvements in delay and PDP.
Table 2. Percentage Improvement in Worst-Case Power, Delay, and PDP for AL-3 with respect to (w.r.t) AL-1 and AL-2 1-bit Adder Circuits, at $V_{DD}=1.2$ V, $C_{L}=150$ fF, and $f_{MAX}=200$ MHz.

| Improvement in <br> Performance Metric of: | $\Delta$ Power <br> $(\%)$ | $\Delta$ Delay <br> $(\%)$ | $\Delta$ PDP <br> $(\%)$ |
| :---: | :---: | :---: | :---: |
| AL-3 w.r.t AL-1 | -8.1 | 45.1 | 40.6 |
| AL-3 w.r.t AL-2 | -2.72 | 50.1 | 48.7 |
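The three columns of the table are linked. Assuming the sign convention $\Delta X = (X_{ref} - X_{AL\text{-}3})/X_{ref} \times 100$, so that a negative $\Delta$ Power means higher power for AL-3, and since PDP is the product of power and delay, $(1 - \Delta PDP) = (1 - \Delta P)(1 - \Delta D)$ with the percentages taken as fractions. The short check below (function names are illustrative) recomputes the $\Delta$ PDP column from the $\Delta$ Power and $\Delta$ Delay entries.

```python
def improvement(ref, new):
    """Percentage improvement of `new` over `ref`; negative means `new` is larger."""
    return (ref - new) / ref * 100.0


def pdp_improvement(d_power, d_delay):
    """PDP improvement (%) implied by power and delay improvements (%), as PDP = P * D."""
    return (1.0 - (1.0 - d_power / 100.0) * (1.0 - d_delay / 100.0)) * 100.0


# Delta Power and Delta Delay entries of Table 2 (AL-3 w.r.t. AL-1 and AL-2)
for label, d_p, d_d in [("AL-1", -8.1, 45.1), ("AL-2", -2.72, 50.1)]:
    print(f"AL-3 w.r.t {label}: implied PDP improvement = "
          f"{pdp_improvement(d_p, d_d):.1f} %")
```

This prints 40.7 % and 48.7 %, which agree with the tabulated 40.6 % and 48.7 % to within the rounding of the power and delay entries.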

## 6. CONCLUSION

In this paper, a novel 1-bit 28T adder circuit, designated 'alternative logic-3' (AL-3), is proposed and analyzed to determine its worst-case delay, power, and PDP in comparison with two high-performance adder circuits, designated AL-1 and AL-2, reported in [6, 7]. Relative to AL-1, AL-3 improves the worst-case delay by $\sim 45 \%$ and the worst-case PDP by $\sim 41 \%$; relative to AL-2, it improves the worst-case delay by $\sim 50 \%$ and the worst-case PDP by $\sim 49 \%$. The three adder circuits are analyzed and compared under identical PVT conditions in two steps. In the first step, the worst-case delay, power, and PDP are studied as functions of supply voltage $\mathrm{V}_{\mathrm{DD}}$, varied from 0.6 V to 1.8 V; over the entire range of $\mathrm{V}_{\mathrm{DD}}$, AL-3 has the minimum delay and comparable power, and for $\mathrm{V}_{\mathrm{DD}}$ above 0.8 V it also has the minimum PDP of the three architectures. In the second step, the worst-case delay, power, and PDP are studied as functions of load capacitance $C_{L}$, varied from 0 fF to 200 fF; over the entire range of $\mathrm{C}_{\mathrm{L}}$, AL-3 has the minimum delay, comparable power, and minimum PDP compared with AL-1 and AL-2. Thus, the AL-3 1-bit adder circuit is a suitable choice for portable applications.

## 7. REFERENCES

[1] S. Goel, A. Kumar, and M. A. Bayoumi, "Design of robust, energy-efficient full adders for deep-submicrometer design using hybrid-CMOS logic style," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 12, pp. 1309-1320, Dec. 2006.
[2] S. Goel, S. Gollamudi, A. Kumar, and M. A. Bayoumi, "On the design of low-energy hybrid CMOS 1-bit full adder cells," in Proc. Midwest Symposium on Circuits and Systems, 2004, pp. 209-212.
[3] R. H. Myers and D. C. Montgomery, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd ed. New York: John Wiley \& Sons, 2002.
[4] G. E. P. Box and N. R. Draper, Empirical Model-Building and Response Surfaces, international ed. New York: John Wiley \& Sons, 1987.
[5] D. Sengupta and R. Saleh, "Generalized power-delay metrics in deep submicron CMOS designs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 1, pp. 183-189, Jan. 2007.
[6] M. Aguirre-Hernandez and M. Linares-Aranda, "CMOS full-adders for energy-efficient arithmetic applications," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 4, pp. 718-721, Apr. 2011.
[7] M. Aguirre-Hernandez and M. Linares-Aranda, "An alternative logic approach to implement high-speed low-power full adder cells," in Proc. SBCCI, Florianopolis, Brazil, Sep. 2005, pp. 166-171.
[8] A. M. Shams and M. Bayoumi, "Performance evaluation of 1-bit CMOS adder cells," in Proc. IEEE ISCAS, Orlando, FL, May 1999, vol. 1, pp. 27-30.
[9] A. M. Shams, T. K. Darwish, and M. Bayoumi, "Performance analysis of low-power 1-bit CMOS full adder cells," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 1, pp. 20-29, Feb. 2002.

