# Analysis of Low Power CMOS Current Comparison Domino Logic Circuits in Ultra Deep Submicron Technologies

Gurjeet Kaur ACS-Division C-DAC Mohali

## ABSTRACT

The performance of high fan-in domino circuits is degraded by technology scaling because leakage current increases exponentially. Current Comparison Domino (CCD) circuits are widely used to improve performance. This work presents the design of wide fan-in, high-performance current comparison domino circuits with the goals of minimizing power dissipation and propagation delay at the 90nm and 45nm ultra deep submicron (UDSM) technology nodes. A Current Comparison Domino (CCD) 32-input wide footless OR gate is used for the design and analysis work, with Cadence GPDK 90nm and 45nm model parameters. The schematic is drawn in the Cadence Virtuoso Schematic Editor, the Spectre circuit simulator is used for simulation, and layouts are generated in the Virtuoso Layout Editor. Pre-layout and post-layout simulation results are compared to validate the results. At 45nm, the post-layout power consumption is 718.6nW, the propagation delay is 1.450ns and the power delay product (PDP) is 1.041fJ. By generating a highly customized layout of the designed circuit, a 92% improvement in power consumption at 45nm has been achieved compared with previous work (at 16nm).

## Keywords

Dynamic logic, Static logic, Domino logic, UDSM, CCD

## **1. INTRODUCTION**

Dynamic gates are essential for building wide, high-speed processors, DRAMs, SRAMs, etc. However, their design and operation are more involved and more prone to failure because of the lack of design automation, reduced noise tolerance and increased power dissipation [1]. Domino logic techniques are nevertheless extensively used in high-performance microprocessors because of their superior speed and area characteristics. For the design and analysis work, a wide fan-in footless domino OR gate, which limits the degradation in speed, has been selected.

## **1.1 Static Logic Circuits**

Static logic circuits maintain their output logic levels indefinitely as long as the inputs are unchanged, as shown in Figure 1. Static CMOS logic is widely used for its high noise margins, good performance and low power consumption with no static power dissipation; however, these circuits are limited in running at extremely high clock speeds and suffer from glitches [1]. The number of transistors required to implement an N fan-in gate is 2N, so wide fan-in static gates consume a large silicon area. An alternative logic style is dynamic CMOS logic.
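To make the 2N figure concrete, the short Python sketch below compares it with the N + 2 transistors of a basic footed dynamic gate (N pull-down NMOS plus one precharge PMOS and one evaluation NMOS). The dynamic count is an assumption taken from the standard textbook structure and ignores any keeper or output inverter.

```python
def static_cmos_transistors(n_inputs: int) -> int:
    """Complementary static CMOS: one NMOS (PDN) and one PMOS (PUN) per input."""
    return 2 * n_inputs


def dynamic_transistors(n_inputs: int) -> int:
    """Basic footed dynamic gate: N NMOS in the PDN + precharge PMOS + evaluation NMOS.

    Keepers and output inverters are deliberately not counted here.
    """
    return n_inputs + 2


for n in (2, 4, 8, 32):
    print(f"N = {n:2d}: static = {static_cmos_transistors(n):2d}, "
          f"dynamic = {dynamic_transistors(n):2d}")
```

For the 32-input case studied in this paper the comparison is 64 versus 34 transistors, which is one reason wide fan-in gates favour a dynamic style.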

Gurmohan Singh DEC-Division C-DAC Mohali



Fig 1: CMOS Static Logic [1]

## 1.2 Dynamic Logic Circuit

Dynamic Logic utilizes simple sequential circuits with memory functions. The operation depends on temporary storage of charge in parasitic node capacitances. Dynamic circuits have achieved widespread use because they require less silicon area and have superior performance over conventional Static Logic circuits. As shown in Figure 2, Dynamic logic uses a sequence of Precharge and Evaluation Phases governed by the clock to realize complex logic functions [1]- [3].

### 1.2.1 Precharge Phase

When the clock $\Phi = 0$, the output node *Out* is precharged to *VDD* by the PMOS transistor $M_p$, while the evaluation NMOS transistor $M_e$ remains off, so the pull-down path is disabled.

### 1.2.2 Evaluation Phase

When the clock $\Phi = 1$, the precharge transistor $M_p$ is OFF and the evaluation transistor $M_e$ is ON. The output is then conditionally discharged, depending on the input values and the topology of the pull-down network (PDN).
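The precharge/evaluate sequence can be captured in a minimal behavioural Python model. This is an idealised sketch, assuming no leakage, no charge sharing and instantaneous switching; the class name `DynamicGate` and its `pdn` argument are illustrative, not taken from any tool. The PDN is passed in as a Boolean function that is true when a conducting path to ground exists.

```python
from typing import Callable, Sequence


class DynamicGate:
    """Idealised behavioural model of a footed dynamic CMOS gate."""

    def __init__(self, pdn: Callable[[Sequence[int]], bool]) -> None:
        self.pdn = pdn   # True when the pull-down network conducts
        self.out = 1     # logic level held on the output node capacitance

    def step(self, phi: int, inputs: Sequence[int]) -> int:
        if phi == 0:
            # Precharge: Mp ON, Me OFF, so Out is pulled to VDD whatever the inputs.
            self.out = 1
        elif self.pdn(inputs):
            # Evaluate: Me ON and a conducting PDN path discharges Out.
            self.out = 0
        # Otherwise Out floats and keeps its stored charge (no keeper modelled).
        return self.out


# Two parallel NMOS in the PDN give a dynamic NOR2.
nor2 = DynamicGate(pdn=any)
print(nor2.step(0, [1, 0]))  # precharge -> 1
print(nor2.step(1, [0, 0]))  # evaluate, no path -> 1
print(nor2.step(1, [1, 0]))  # evaluate, path to ground -> 0
print(nor2.step(1, [0, 0]))  # inputs fall again, but Out cannot recover -> 0
```

The last call shows why inputs must not change arbitrarily during evaluation: once the node has been discharged it stays low until the next precharge.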



Fig 2: CMOS Dynamic Logic [2]

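The main disadvantage of dynamic logic circuits, however, is that they cannot be cascaded directly: at the start of evaluation, the still-precharged (high) output of one stage can discharge the next stage before the first stage has evaluated, and a discharged node cannot recover until the next precharge. Domino logic was introduced to overcome this problem.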
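The cascading problem can be reproduced with a tiny unit-delay simulation. The sketch below assumes two dynamic inverters (a single NMOS in each PDN) driven by the same clock and a one-step delay per gate, and tracks the second output from the moment the clock rises. The function name and the delay model are illustrative assumptions.

```python
def simulate_dynamic_chain(x: int, steps: int = 3) -> list[int]:
    """Two cascaded dynamic inverters on one clock, unit gate delay.

    Returns the stage-2 output after each evaluation step.
    """
    out1, out2 = 1, 1  # both outputs are precharged high when CLK rises
    trace = []
    for _ in range(steps):
        # Each gate reacts to the value its input had one delay earlier.
        new1 = 0 if x == 1 else out1      # stage 1 discharges if its input is high
        new2 = 0 if out1 == 1 else out2   # stage 2 sees stage 1's still-precharged output
        out1, out2 = new1, new2
        trace.append(out2)
    return trace


print(simulate_dynamic_chain(0))  # [0, 0, 0], correct: NOT(NOT 0) = 0
print(simulate_dynamic_chain(1))  # [0, 0, 0], wrong: NOT(NOT 1) should be 1
```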

## 1.3 Domino Logic

The name domino comes from the behavior of a chain of these gates: during evaluation, transitions ripple from one stage to the next like a row of falling dominoes. The structure is non-inverting, as shown in Figure 3, and runs 1.5-2 times faster than static logic circuits. It permits high-speed operation and enables the implementation of complex functions that cannot otherwise be achieved with static and dynamic circuits [1]-[3]. Domino logic avoids the need for a complex clocking scheme by using a single-phase clock, and it has no static power consumption because the path to ground is controlled by the clock input of the first stage. These circuits are glitch free, have a fast switching threshold and can be cascaded. Each clock cycle of a domino circuit is divided into a precharge phase and an evaluation phase.

### 1.3.1 Precharge Phase

When the clock (*CLK*) is low, the precharge PMOS (*Mpre*) is ON and the evaluation NMOS (*Meval*) is OFF. The dynamic node is precharged to *VDD*, so the output of the static inverter is low.

### 1.3.2 Evaluation Phase

When the clock (*CLK*) is high, the evaluation NMOS (*Meval*) is ON and the precharge PMOS (*Mpre*) is OFF. The dynamic node is discharged if the inputs form a conducting path to ground; otherwise it stays charged high. The inputs must be stable before the clock goes high, because once the dynamic node has been discharged it cannot go high again until the next cycle.



Fig 3: CMOS Domino Logic [4]

### 1.3.3 Cascading in Domino Logic Circuits

A domino logic gate is obtained by attaching a static inverter to the output of a dynamic gate, as shown in Figure 4. The inverter is added to make the gates reliable to cascade [4]. During the precharge phase the dynamic node rises to a high level, so the output of the static inverter falls to a low level. During the evaluation phase, the only transition that can occur at the inverter output is therefore a single rising (0 to 1) transition. Because every gate input starts low and can only rise, such gates can be cascaded without problems to form a complete design.
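Repeating the same kind of unit-delay experiment with domino stages shows why the added inverter fixes cascading. In this sketch, assumed for illustration, each stage is a dynamic inverter followed by a static inverter (a non-inverting domino buffer), the dynamic-plus-inverter delay is lumped into one step, and `x` is the chain input; the function name is hypothetical.

```python
def simulate_domino_chain(x: int, stages: int = 3, steps: int = 5) -> list[int]:
    """Chain of domino buffers on one clock; returns the last output per step."""
    dyn = [1] * stages  # dynamic nodes precharged high
    out = [0] * stages  # static inverter outputs are therefore low
    trace = []
    for _ in range(steps):
        # Stage k is driven by the previous stage's output from the last step.
        ins = [x] + out[:-1]
        for k in range(stages):
            if ins[k] == 1:
                dyn[k] = 0  # discharge; it cannot recharge before the next precharge
        out = [1 - d for d in dyn]
        trace.append(out[-1])
    return trace


print(simulate_domino_chain(0))  # [0, 0, 0, 0, 0]
print(simulate_domino_chain(1))  # [0, 0, 1, 1, 1]: the rise ripples through 3 stages
```

No stage is discharged by mistake, and the final output makes a single 0 to 1 transition once the evaluation has rippled through the chain.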



Fig 4: Cascaded Domino Logic [1]

## **1.4 Keeper Circuit**

A PMOS transistor connected between the dynamic node and *VDD* is called a keeper. If its gate is connected to ground it is always ON, so even in the evaluation phase the dynamic node remains weakly connected to *VDD*. Using a PMOS keeper in domino logic circuits reduces undesired discharging of the dynamic node caused by leakage current. The keeper ratio *K* is defined as

$$K = \frac{\mu_p \left(W/L\right)_{\text{keeper}}}{\mu_n \left(W/L\right)_{\text{evaluation}}}$$

where $\mu_n$ and $\mu_p$ are the electron and hole mobilities, respectively, and $W$ and $L$ denote the width and length of the corresponding transistor [5], [9].
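As a worked example of the definition, the Python helper below evaluates K. The numbers in the usage line are hypothetical: mobilities are normalised so that $\mu_n = 2\mu_p$ (an assumed ratio), the keeper is taken as $W/L = 2$ and the evaluation transistor as $W/L = 7$, matching the $W_{min} = 7L_{min}$ minimum width used in Section 3.2.

```python
def keeper_ratio(mu_p: float, wl_keeper: float, mu_n: float, wl_eval: float) -> float:
    """Keeper ratio K = mu_p (W/L)_keeper / (mu_n (W/L)_evaluation)."""
    return (mu_p * wl_keeper) / (mu_n * wl_eval)


# Hypothetical values: mu_n = 2 mu_p (normalised), keeper W/L = 2, evaluation W/L = 7.
k = keeper_ratio(mu_p=1.0, wl_keeper=2.0, mu_n=2.0, wl_eval=7.0)
print(f"K = {k:.3f}")  # K = 0.143
```

A larger K makes the keeper stronger against leakage-induced discharge, but it also increases contention with the pull-down network during evaluation, which slows the gate.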

## 2. PREVIOUS WORK

As technology scales down to the deep submicrometer regime, noise immunity becomes a major challenge in VLSI chip design [6]. Domino logic circuits with low leakage and low power consumption were therefore developed. The Current Comparison Domino (CCD) technique was proposed in [5]. In this circuit, the evaluation current of the gate is compared with the leakage current; the technique also reduces the parasitic capacitance on the dynamic node and lowers leakage, and the resulting increase in speed makes it suitable for high-speed applications [5].

## **3. CIRCUIT DESIGN**

In wide fan-in gates the capacitance of the dynamic node is large, which limits the speed, and noise immunity decreases because of the many parallel leakage paths. The basic domino logic circuit shown in Figure 3 consists of two functional inputs, Input 1 and Input 2, accompanied by the clock signal (CLK), and an inverter used to cascade the first stage to the next stage. In this paper, a Current Comparison Domino (CCD) 32-input wide fan-in footless OR gate is introduced, as shown in Figure 5. The circuit consists of two stages. The first stage is a pre-evaluation network that includes the pull-up network (PUN) and transistors $M_{pre}$, $M_{eval}$ and $M_1$. The second stage is a footless domino stage with node A as its input, without any charge sharing. The controlled keeper consists of transistors $M_{k1}$ and $M_{k2}$; it is used to increase performance and decrease power consumption [7], and the keeper size can be increased to speed up operation [8]. Transistor $M_{dis}$ is connected to the node that drives the output inverter. The current of the PUN, mirrored by $M_2$, is compared with the worst-case leakage current through $M_{k2}$. Transistor $M_6$, similar to $M_1$, is added, and transistors $M_3$, $M_4$, $M_5$, $M_7$ and $M_8$ provide biasing for the circuit.

## 3.1 Working of the Current Comparison Domino (CCD) Circuit

### 3.1.1 Pre-discharge Phase

The input signals are high and the clock is low [*CLK* = '0' and $\overline{CLK}$ = '1' in Figure 5]. Therefore, node A is pulled high by transistor *Mpre* and the dynamic node (*Dyn*) is pulled low by transistor $M_{Dis}$. Hence, transistors $M_1$, $M_2$ and $M_{eval}$ are OFF, and $M_{Pre}$, $M_{Dis}$, $M_{K1}$ and $M_{K2}$ are ON.



Fig 5: Schematic of CCD 32-input wide footless OR gate.

### 3.1.2 Evaluation Phase

In this phase the clock is high [CLK = '1' and $\overline{CLK} = '0'$ in Figure 5]. Hence, transistors $M_1$, $M_2$, $M_{k2}$ and $M_{eval}$ are ON, $M_{Pre}$ and $M_{Dis}$ are OFF, and transistor $M_{k1}$ can be on or off depending on the input voltages. Two states are therefore possible: in the first, all input signals remain high; in the second, at least one input falls to the low level. In the first state, the leakage current establishes a small voltage across transistor $M_1$. Although this leakage current is mirrored by transistor $M_2$, the keeper transistors of the second stage ($M_{k1}$ and $M_{k2}$) suppress the mirrored leakage current. To reduce the capacitance on the dynamic node and thus provide higher speed, only one pull-up transistor is connected to the dynamic node, instead of the $n$ transistors of a conventional $n$-bit OR gate. In the second state, when at least one conducting path exists, the pull-up current increases and the voltage of node A drops to a lower, nonzero value. The larger pull-up current increases the mirrored current in transistor $M_2$, so the dynamic node (Dyn) is charged to $V_{DD}$ and the output node starts discharging, turning off the main keeper transistor $M_{k1}$. This technique reduces the contention current between the keeper transistor and the mirror transistor and thus increases speed [5].
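The comparison at the heart of the evaluation phase can be sketched numerically. The Python model below is a deliberately simplified illustration, not a transistor-level description, and the class `CCDModel` is hypothetical. It assumes that each PUN transistor conducts an ON current when its input is low and leaks a small OFF current when its input is high, that $M_2$ mirrors the PUN current with unity gain, and that the keeper ($M_{k2}$) current is set 20% above the worst-case 32-input leakage, in the spirit of the sizing rule in Section 3.2. All current values and the margin are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CCDModel:
    """Idealised current-comparison model of the CCD evaluation phase (values hypothetical)."""
    n_inputs: int = 32
    i_on: float = 10e-6        # assumed ON current of one PUN transistor (A)
    i_off: float = 10e-9       # assumed leakage of one OFF PUN transistor (A)
    mirror_ratio: float = 1.0  # assumed M1 -> M2 mirror gain
    margin: float = 1.2        # keeper current set 20% above worst-case leakage (assumed)

    def keeper_current(self) -> float:
        # Reference current: slightly above the leakage with every input OFF.
        return self.margin * self.n_inputs * self.i_off

    def mirrored_current(self, inputs: list[int]) -> float:
        # A PMOS pull-up transistor conducts when its input is low.
        n_low = inputs.count(0)
        i_pun = n_low * self.i_on + (len(inputs) - n_low) * self.i_off
        return self.mirror_ratio * i_pun

    def evaluate(self, inputs: list[int]) -> int:
        """Output level at the end of evaluation (1 = output stays high)."""
        if len(inputs) != self.n_inputs:
            raise ValueError("expected one value per input")
        dyn_charges = self.mirrored_current(inputs) > self.keeper_current()
        return 0 if dyn_charges else 1


ccd = CCDModel()
print(ccd.evaluate([1] * 32))        # all inputs high: leakage < keeper current -> 1
print(ccd.evaluate([0] + [1] * 31))  # one input low: mirrored current wins -> 0
```

Lowering `margin` below 1 makes the all-high case evaluate incorrectly, which illustrates why the reference leakage must exceed the worst-case leakage of the gate.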

## 3.2 Transistor Sizing

Optimal sizing of all transistors is necessary to achieve the desired delay and average power consumption. The widths of transistors such as $M_{pre}$ and $M_{eval}$ in the OR gate are set to the minimum width $W_{min} = 7L_{min}$, where $L_{min}$ is the minimum length. The length of all transistors except $M_3$ (see Table 1) is fixed at $L_{min}$. Table 1 shows the transistor sizing for the designed Current Comparison Domino (CCD) 32-input wide footless OR gate. To obtain the desired delay, the widths of keeper transistors $M_{k1}$ and $M_{k2}$ are varied. The widths of transistors $M_6$, $M_7$ and $M_8$ are chosen so that the leakage current of the reference circuit is slightly higher than that of a 32-input OR gate. The width of transistor $M_6$ is also chosen to reduce the contention between $M_4$ and $M_3$ in order to achieve maximum speed.

Table 1: Transistor sizing for the designed 32-input wide CCD footless OR gate [5]

| Transistor(s)                                                              | Width (or W/L)         |
|----------------------------------------------------------------------------|------------------------|
| $W_{k1}$                                                                   | $8L_{min}$             |
| $W_{k2}$                                                                   | $9L_{min}$             |
| $(W_p / W_n)$ inverter                                                     | $14L_{min}/7L_{min}$   |
| $W_{pre}$, $W_{eval}$, $W_{dis}$, $W_1$, $W_4$, $W_5$, $W_6$, $W_8$        | $7L_{min}$             |
| $(W/L)_3$                                                                  | $15L_{min}/2L_{min}$   |
| $W_7$                                                                      | $64 \times 7L_{min}$   |
| $W_2$                                                                      | $8L_{min}$             |
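Table 1 gives widths as multiples of $L_{min}$. The Python sketch below converts them to absolute widths, assuming for illustration that $L_{min}$ equals the nominal node (45nm or 90nm); the actual minimum drawn length of each GPDK may differ, so the printed numbers are indicative only. The dictionary and function names are hypothetical.

```python
# Width multipliers from Table 1, in units of Lmin.
WIDTH_MULTIPLIERS = {
    "Mk1": 8,
    "Mk2": 9,
    "Inverter PMOS": 14,
    "Inverter NMOS": 7,
    "Mpre, Meval, Mdis, M1, M4, M5, M6, M8": 7,
    "M3": 15,  # M3 also uses L = 2 Lmin
    "M7": 64 * 7,
    "M2": 8,
}


def widths_nm(l_min_nm: float) -> dict[str, float]:
    """Absolute widths in nm for a given minimum length (assumed equal to the node)."""
    return {name: k * l_min_nm for name, k in WIDTH_MULTIPLIERS.items()}


for node in (45, 90):
    print(f"Lmin = {node} nm")
    for name, w in widths_nm(node).items():
        print(f"  {name:<40} W = {w:g} nm")
```

The very wide $M_7$ (448 $L_{min}$) stands out; it belongs to the reference circuit whose leakage is sized to slightly exceed that of the 32-input gate.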

## 4. SIMULATION RESULTS

This section presents the average power consumption and propagation delay obtained from simulation of the designed 32-input wide Current Comparison Domino (CCD) footless OR gate. The simulations are carried out at the 45nm and 90nm technology nodes with supply voltages of 1V and 1.2V, respectively. The schematic of the designed CCD circuit is created in the Virtuoso Schematic Editor, and the Cadence Spectre circuit simulator is used for simulation with BSIM3v3 models in 45nm and 90nm CMOS technology. The layouts of the CCD circuit are generated in the Virtuoso Layout Editor. Power and delay are calculated with the calculator of the Virtuoso Visualization and Analysis XL tool provided with Cadence Spectre.

## 4.1 Simulation waveforms of Domino Logic Circuits

### 4.1.1 Precharge Phase

Figure 6 shows that during the precharge phase (CLK = '0'), irrespective of the inputs (*Input 1* = '1', *Input 2* = '0'), the dynamic node charges while the output node discharges.



Fig 6: Precharge Phase of CMOS Domino Logic Circuit

### 4.1.2 Evaluation Phase

Figure 7 shows that during the evaluation phase (CLK = '1'), the dynamic node discharges, and the output node charges, only when all the inputs are high (*Input 1* = '1', *Input 2* = '1').



Fig 7: Evaluation Phase of CMOS Domino Logic Circuit

## 4.2 Post-Layout Simulation Waveforms of Current Comparison Domino (CCD) at 45nm

### 4.2.1 Pre-discharge Phase

All 32 inputs are tied together and driven by a common signal (*Input 1* = '1', *CLK 1* = '0', *CLK 2* = '1'). The dynamic node discharges and node A charges, as shown in Figure 8. The CLK2 signal in Figures 8 and 9 is the $\overline{\text{CLK}}$ signal.



Fig 8: Precharge Phase of CCD 32-input wide footless OR-Gate Circuit at 45nm technology node

### 4.2.2 Evaluation Phase

In the evaluation phase, 31 inputs are tied together and driven by the common signal *Input 1*, and the 32nd input is driven by *Input 2* (*CLK 1* = '1', *CLK 2* = '0', *Input 1* = '1', *Input 2* = '0'). The dynamic node charges and the output node discharges, as shown in Figure 9.



Fig 9: Evaluation Phase of CCD 32-input wide footless OR-Gate Circuit at 45nm technology node

### 4.2.3 Layout of Current Comparison Domino (CCD) at 45nm technology node

Layouts are generated at both the 45nm and 90nm technology nodes, and pre-layout and post-layout results are compared and verified. Figure 10 shows the layout drawn for the Current Comparison Domino schematic at 45nm. Power is supplied through the top and middle metal-1 rails, while a shared ground rail runs along the bottom of the cell. Signal routing uses only metal layers 1 and 2. The layout dimensions are $22.9\mu m \times 10.89\mu m$. Post-layout verification is also performed, and the results are validated against the pre-layout simulation results. Propagation delay, power consumption and power delay product (PDP) results for the designed CCD 32-input footless OR gate at 45nm and 90nm are listed in Table 2 and Table 3, respectively.



Fig 10: Layout of designed current comparison domino (CCD) OR gate at 45nm technology node

Table 2: Performance results for 45nm technology node

| Parameter                | Pre-charge (pre-layout) | Pre-charge (post-layout) | Evaluation (pre-layout) | Evaluation (post-layout) |
|--------------------------|-------------------------|--------------------------|-------------------------|--------------------------|
| Power (nW)               | 706.9                   | 718.6                    | 78.36                   | 83.31                    |
| Propagation delay (ns)   | 1.432                   | 1.450                    | 49.74                   | 51.44                    |
| Power delay product (fJ) | 1.012                   | 1.041                    | 3.897                   | 4.285                    |

Table 3: Performance results for 90nm technology node

| Parameter                | Pre-charge (pre-layout) | Pre-charge (post-layout) | Evaluation (pre-layout) | Evaluation (post-layout) |
|--------------------------|-------------------------|--------------------------|-------------------------|--------------------------|
| Power (µW)               | 147.0                   | 150.2                    | 23.63                   | 25.70                    |
| Propagation delay (ns)   | 1.588                   | 1.600                    | 50.85                   | 51.90                    |
| Power delay product (pJ) | 0.233                   | 0.240                    | 1.201                   | 1.332                    |
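Each PDP entry in Tables 2 and 3 is the product of the listed power and delay. The Python check below recomputes them from the tabulated values (dictionary names are illustrative); the recomputed figures agree with the tables to within 0.002 of the reported units (fJ or pJ).

```python
# (power in W, delay in s) copied from Tables 2 and 3.
TABLE_2_45NM = {
    "pre-charge, pre-layout": (706.9e-9, 1.432e-9),
    "pre-charge, post-layout": (718.6e-9, 1.450e-9),
    "evaluation, pre-layout": (78.36e-9, 49.74e-9),
    "evaluation, post-layout": (83.31e-9, 51.44e-9),
}
TABLE_3_90NM = {
    "pre-charge, pre-layout": (147.0e-6, 1.588e-9),
    "pre-charge, post-layout": (150.2e-6, 1.600e-9),
    "evaluation, pre-layout": (23.63e-6, 50.85e-9),
    "evaluation, post-layout": (25.70e-6, 51.90e-9),
}


def pdp(power_w: float, delay_s: float) -> float:
    """Power delay product in joules."""
    return power_w * delay_s


for case, (p, t) in TABLE_2_45NM.items():
    print(f"45nm {case}: PDP = {pdp(p, t) * 1e15:.3f} fJ")
for case, (p, t) in TABLE_3_90NM.items():
    print(f"90nm {case}: PDP = {pdp(p, t) * 1e12:.3f} pJ")
```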

Table 4: Comparison with previous work

| Parameter              | [5] (16nm) | This work (45nm) |
|------------------------|------------|------------------|
| Power (µW)             | 9.1        | 0.718            |
| Propagation delay (ps) | 60         | 1450             |
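The 92% power improvement quoted in the abstract and conclusion follows directly from the power values in Table 4, as the short Python calculation below shows; the helper name is illustrative.

```python
def percent_reduction(reference: float, new: float) -> float:
    """Relative reduction of `new` with respect to `reference`, in percent."""
    return 100.0 * (reference - new) / reference


power_ref_uw = 9.1     # [5], 16nm (Table 4)
power_this_uw = 0.718  # this work, 45nm (Table 4)
print(f"Power reduction: {percent_reduction(power_ref_uw, power_this_uw):.1f}%")  # 92.1%
```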

## 5. CONCLUSION

This research work analyzes CMOS current comparison domino logic styles to achieve higher speed and lower power consumption. A Current Comparison Domino (CCD) 32-input footless OR gate has been selected for the design and analysis work. This circuit is suitable for applications that require wide fan-in logic functions with lower power consumption, better noise immunity and lower delay. The CCD topology was selected to reduce the leakage current in the evaluation network, which increases as technology scales down, and it minimizes the contention between the keeper transistor and the evaluation network. CCD logic circuits can therefore help designers build high-performance, low-power digital circuits. The design and analysis work has been carried out at the 90nm and 45nm technology nodes, and layouts highly optimized for area have been generated in the Cadence Virtuoso Layout Editor. This work achieves a 92% improvement in power consumption compared with the previous work [5].

## 6. REFERENCES

- [1] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2003.
- [2] S.-M. Kang and Y. Leblebici, *CMOS Digital Integrated Circuits: Analysis and Design*, 3rd ed. New Delhi: Tata McGraw-Hill Publishing Company Ltd, 2007.
- [3] N. H. E. Weste and K. Eshraghian, *Principles of CMOS VLSI Design*, 2nd ed. Pearson Education (Asia) Pvt. Ltd., 2000.
- [4] W. Wolf, *Modern VLSI Design: IP-Based Design*, 4th ed. Pearson Education, 2008.
- [5] A. Peiravi and Md. Asyaei, "Current-Comparison-Based Domino: New Low-Leakage High-Speed Domino Circuit for Wide Fan-In Gates," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 99, pp. 934-943, May 2012.
- [6] D. S. V. S. Sarma and K. K. Mahapatra, "Improved techniques for high performance noise-tolerant domino CMOS logic circuits," Students Conference on Engineering and Systems (SCES), pp. 1-6, March 2012. DOI: 10.1109/SCES.2012.6199119
- [7] H. F. Dadgour and K. Banerjee, "A Novel Variation-Tolerant Keeper Architecture for High-Performance Low-Power Wide Fan-In Dynamic OR Gates," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 18, no. 11, pp. 1567-1577, Nov. 2010.
- [8] S. M. Sharroush, Y. S. Abdalla, A. A. Dessouki, and E.-S. A. El-Badawy, "Speeding-up wide-fan-in domino logic using a controlled strong PMOS keeper," International Conference on Computer and Communication Engineering (ICCCE 2008), pp. 633-637, 13-15 May 2008. DOI: 10.1109/ICCCE.2008.4580681
- [9] H. Mahmoodi and K. Roy, "Diode-footed domino: A leakage-tolerant high fan-in dynamic circuit design style," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 51, no. 3, pp. 495-503, March 2004.