# Design of Low Power FlipFlop to Reduce Area and Delay using Conditional Pulse Enhancement Method

G.Venkadeshkumar M.Tech VLSI Design Scholar Department of ECE Kalasalingam University

K.Pandiaraj Assistant Professor II Department of ECE Kalasalingam University

# ABSTRACT

This paper presents a low-power pulse-triggered flip-flop (P-FF) design. First, the pulse generation control logic, an AND function, is removed from the critical path to facilitate a faster discharge operation, and a simple two-transistor AND gate design is used to reduce the circuit complexity. Second, a conditional pulse enhancement technique is devised to speed up the discharge along the critical path only when needed. As a result, transistor sizes in the delay inverter and pulse generation circuit can be reduced for power saving. Post-layout simulation results based on UMC CMOS 90-nm technology reveal that the enhanced pulse-triggered FF design features the best power-delay-product performance among the seven FF designs under comparison. Its maximum power saving against rival designs is up to 38.4%. Compared with the conventional transmission-gate-based flip-flop design, the average leakage power consumption is also reduced by a factor of 3.52.

#### **General Terms**

Conditional pulse enhancement technique, Pulse triggered flip-flop, Two-transistor AND gate design, CMOS 90-nm technology.

#### **Keywords**

Flip-flop, low power, pulse triggered.

#### 1. INTRODUCTION

Flip-flops (FFs) are the basic storage elements used extensively in all kinds of digital designs. Digital designs nowadays often adopt intensive pipelining techniques and employ many FF-rich modules. It is also estimated that the power consumption of the clock system, which consists of clock distribution networks and storage elements, is as high as 20% to 45% of the total system power [14]. The pulse-triggered flip-flop (P-FF) is considered a popular alternative to the conventional master-slave-based FF in high-speed applications. Besides the speed advantage, its circuit simplicity is also beneficial to lowering the power consumption of the clock tree system. A P-FF consists of a pulse generator for generating strobe signals and a latch for data storage. Since the triggering pulses generated on the transition edges of the clock signal are very narrow, the latch acts like an edge-triggered FF. The circuit complexity of a P-FF is reduced since only one latch is needed, as opposed to the two used in the conventional master-slave configuration. P-FFs also allow time borrowing across clock cycle boundaries and feature a zero or even negative setup time. P-FFs are thus less sensitive to clock jitter. However, the pulse generation circuitry requires delicate pulse width control in the face of process variation and the configuration of the pulse clock distribution network [13].

Depending on the method of pulse generation, P-FF designs can be classified as implicit or explicit. In an implicit-type P-FF, the pulse generator is a built-in logic of the latch design, and no explicit pulse signals are generated. In an explicit-type P-FF, the designs of the pulse generator and the latch are separate. Implicit pulse generation is often considered to be more power efficient than explicit pulse generation. This is because the former merely controls the discharging path while the latter needs to physically generate a pulse train. Implicit-type designs, however, face a lengthened discharging path in the latch design, which leads to inferior timing characteristics. The situation deteriorates further when low-power techniques such as conditional capture, conditional precharge, conditional discharge, or conditional data mapping are applied. This paper presents a low-power implicit-type P-FF design featuring a conditional pulse enhancement scheme. Three additional transistors are employed to support this feature. In spite of a slight increase in total transistor count, transistors of the pulse generation logic benefit from significant size reductions, and the overall layout area is even slightly reduced. This gives rise to competitive power and power-delay-product performance against other P-FF designs [10]-[11].

# 2. IMPLICIT TYPE P-FF DESIGN WITH PULSE CONTROL SCHEME

This section reviews some conventional implicit-type P-FF designs, which are used as the reference designs in later performance comparisons.

#### 2.1 Implicit Pulse Triggered DCO FlipFlop



#### Fig.1 ip-DCO

The implicit pulse-triggered data-close-to-output (ip-DCO) flip-flop contains an AND-logic-based pulse generator and a semi-dynamic latch [12]. Inverters I5 and I6 are used to latch data, and inverters I7 and I8 are used to hold the internal node X. The pulse generator takes complementary and delay-skewed clock signals to generate a transparent window equal in size to the delay of inverters I1-I3. Two practical problems exist in this design. First, during the rising edge, nMOS transistors N2 and N3 are turned on. If data remains high, node X is discharged on every rising edge of the clock, which leads to large switching power. Second, node X controls two larger MOS transistors (P2 and N5). The large capacitive load on node X degrades both speed and power performance.
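The windowing behaviour of an AND-type pulse generator can be illustrated with a small behavioural model. The snippet below is a minimal discrete-time sketch, not a circuit simulation: the function name `pulse_from_clock`, the sampled clock list, and the parameter `delay_steps` (standing in for the delay of the inverter chain I1-I3) are illustrative assumptions that do not appear in the original design.

```python
def pulse_from_clock(clk, delay_steps):
    """Return clk AND (inverted copy of clk delayed by delay_steps samples).

    clk is a list of 0/1 samples. delay_steps is a hypothetical stand-in
    for the inverter-chain delay expressed in samples.
    """
    pulse = []
    for t, c in enumerate(clk):
        # Before enough history exists, assume the clock held its first value.
        past = clk[t - delay_steps] if t >= delay_steps else clk[0]
        clk_delayed_bar = 1 - past          # complementary, delay-skewed clock
        pulse.append(c & clk_delayed_bar)   # AND of the two clock phases
    return pulse


# Two clock periods: low for 4 samples, then high 8 / low 8 / high 8.
clk = [0] * 4 + [1] * 8 + [0] * 8 + [1] * 8
pulse = pulse_from_clock(clk, delay_steps=3)

# Indices where the transparent window is open.
window = [t for t, p in enumerate(pulse) if p]
print(window)
```

In this model the window opens only after rising edges and stays open for exactly `delay_steps` samples, which mirrors the statement that the window size equals the inverter-chain delay; falling edges produce no pulse.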

### 2.2 Modified Hybrid Latch FlipFlop



#### Fig.2 MHLFF

The modified hybrid latch flip-flop (MHLFF) employs a static latch structure in which node X is no longer precharged periodically by the clock signal [6]. A weak pull-up transistor P1, controlled by the FF output signal Q, is used to keep node X high when Q is zero. This design eliminates the unnecessary discharging problem at node X, but it encounters a longer D-to-Q delay during 0-to-1 transitions because node X is not pre-discharged. Larger transistors N3 and N4 are required to enhance the discharging capability. Another drawback of this design is that node X becomes floating when the output and input data both equal 1. Extra DC power is consumed if node X drifts from an intact 1.

#### 2.3 SCCER FlipFlop



#### Fig.3 SCCER

The single-ended conditional capture energy recovery (SCCER) flip-flop uses a conditional discharge technique [7],[2]. The keeper logic of ip-DCO is replaced by a weak pull-up transistor P1 in conjunction with an inverter I2 to reduce the load capacitance of node X. The discharge path contains nMOS transistors N2 and N1 connected in series. To eliminate superfluous switching at node X, an extra nMOS transistor N3 is employed. Since N3 is controlled by Q\_fdbk, no discharge occurs if input data remains high. The worst-case timing of this design occurs when input data is 1 and node X is discharged through four transistors in series, i.e., N1 through N4, while competing against the pull-up transistor P1.
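The benefit of conditional discharge over the unconditional behaviour described for ip-DCO can be illustrated by counting internal-node discharges for a data stream. This is a minimal behavioural sketch under stated assumptions: the function `count_discharges` and the sample data are hypothetical, the FF is assumed to capture D on every rising edge, and Q\_fdbk is assumed to be the complement of the stored output Q.

```python
def count_discharges(data_at_edges, conditional, q_init=0):
    """Count internal-node discharges over a sequence of rising clock edges.

    data_at_edges: value of D sampled at each rising edge (0 or 1).
    conditional:   False models discharge whenever D is 1 (ip-DCO style),
                   True models discharge only when D is 1 and Q is 0
                   (assumed Q_fdbk = NOT Q, conditional-discharge style).
    """
    q = q_init
    discharges = 0
    for d in data_at_edges:
        if conditional:
            fires = (d == 1 and q == 0)
        else:
            fires = (d == 1)
        discharges += int(fires)
        q = d  # the flip-flop captures D on this edge
    return discharges


data = [1, 1, 1, 1, 0, 0, 1, 1]  # illustrative data with long runs of 1s
unconditional = count_discharges(data, conditional=False)
gated = count_discharges(data, conditional=True)
print(unconditional, gated)
```

With long runs of 1s in the input, the gated model discharges only on the 0-to-1 transitions, which is the source of the switching-power saving claimed for the conditional discharge technique.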

# 3. IMPLICIT TYPE P-FF DESIGN WITH PULSE ENHANCEMENT SCHEME



Fig.4 P-FF design with pulse enhancement scheme

This design adopts two measures to overcome the problems associated with existing P-FF designs. The first is reducing the number of nMOS transistors stacked in the discharging path. The second is supporting a mechanism to conditionally enhance the pull-down strength when input data is 1. Referring to Fig. 4, the upper-part latch design is similar to the one employed in the SCCER design [2]. As opposed to the transistor stacking design in ip-DCO and SCCER, transistor N2 is removed from the discharging path. Transistor N2, in conjunction with an additional transistor N3, forms a two-input pass transistor logic (PTL) based AND gate [1],[4] to control the discharge of transistor N1. Since the two inputs to the AND logic are mostly complementary (except during the transition edges of the clock), the output node Z is kept at zero most of the time. At the rising edges of the clock, both transistors N2 and N3 are turned on and collaborate to pass a weak logic high to node Z, which then turns on transistor N1 for a time span defined by the delay inverter I1. The switching power at node Z is reduced because of the diminished voltage swing. With this design measure, the number of stacked transistors along the discharging path is reduced and the sizes of transistors N1-N5 can be reduced.
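The effect of the diminished swing can be made concrete with a first-order estimate. Assume, purely for illustration, that node Z presents a lumped capacitance $C_Z$ (a symbol introduced here, not in the original) and that it is charged from a $V_{DD}$ supply through nMOS pass transistors with threshold voltage $V_{th}$, so that it only reaches $V_{DD} - V_{th}$. The energy drawn from the supply per pulse, compared with a full-swing node, is then approximately:

$$
E_{Z,\text{weak}} = C_Z \, V_{DD} \, (V_{DD} - V_{th}), \qquad
E_{Z,\text{full}} = C_Z \, V_{DD}^{2}, \qquad
\frac{E_{Z,\text{weak}}}{E_{Z,\text{full}}} = 1 - \frac{V_{th}}{V_{DD}}
$$

For example, with a 1.0 V supply and a hypothetical $V_{th}$ of 0.3 V, the ratio is 0.7. This estimate ignores short-circuit and leakage currents and is only meant to show the direction of the saving.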

In this design, the longest discharging path is formed when input data is "1" while the Qbar output is "1." To enhance the discharging under this condition, transistor P3 is added. Transistor P3 is normally turned off because node X is pulled high most of the time. It steps in when node X is discharged to $|V_{tp}|$ below $V_{DD}$. This provides an additional boost to node Z (from $V_{DD} - V_{th}$ to $V_{DD}$).



Fig.5 P-FF with pulse enhancement scheme in Microwind

The generated pulse is taller, which enhances the pull-down strength of transistor N1. After the rising edge of the clock, the delay inverter I1 drives node Z back to zero through transistor N3 to shut down the discharging path. The voltage level of node X then rises and eventually turns off transistor P3. With the intervention of P3, the width of the generated discharging pulse is stretched out. This means that a bulky delay inverter, which accounts for most of the power consumption in the pulse generation logic, is no longer needed to create a pulse wide enough for correct data capture. This conditional pulse enhancement technique takes effect only when the FF output Q is subject to a data change from 0 to 1. This leads to better power performance than schemes using an indiscriminate pulse width enhancement approach. Another benefit of this conditional pulse enhancement scheme is the reduction in leakage power due to shrunken transistors in the critical discharging path and in the delay inverter.
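The conditional nature of the enhancement can be summarized with a small behavioural model of the pulse height at node Z. This is a sketch under assumptions rather than a circuit-level description: the function names, the 1000 mV supply (matching the 1.0 V simulation condition), and the 300 mV threshold voltage are illustrative values chosen here, and the boost is assumed to occur exactly when D is 1 and the stored Q is 0.

```python
def pulse_high_level_mv(d, q, vdd_mv=1000, vth_mv=300):
    """High level reached by node Z during one pulse, in millivolts.

    The pulse is boosted to VDD only for a 0-to-1 capture (D = 1, Q = 0);
    otherwise it stays at the weak level VDD - Vth. The threshold value
    is a hypothetical placeholder, not a figure from the paper.
    """
    boosted = (d == 1 and q == 0)
    return vdd_mv if boosted else vdd_mv - vth_mv


def trace_pulse_levels(data_at_edges, q_init=0):
    """Return the node-Z pulse height for each rising clock edge."""
    q = q_init
    levels = []
    for d in data_at_edges:
        levels.append(pulse_high_level_mv(d, q))
        q = d  # the flip-flop captures D on this edge
    return levels


levels = trace_pulse_levels([0, 1, 1, 0, 1])
print(levels)
```

Only the edges that capture a 0-to-1 data change receive the taller pulse; every other edge keeps the reduced swing, which is what distinguishes this scheme from an indiscriminate pulse enhancement.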

# 4. SIMULATION



#### **Fig.6 Simulation Setup**

The operating condition used in simulations is 500 MHz/1.0 V. Since pulse width design is crucial to the correctness of data capturing as well as the power consumption, the pulse generator logic in each design is first sized to function properly across process variation. All designs are further optimized subject to the tradeoff between power and D-to-Q delay, i.e., minimizing the product of the two terms. To mimic the signal rise and fall time delays, input signals in the simulation setup are generated through buffers. Considering the loading effect of the FF on the previous stage and the clock tree, the power consumption of the clock and data buffers is also included. The output of the FF is loaded with a 20-fF capacitor. An extra capacitance of 3 fF is also placed after the clock buffer. The power consumption and timing behaviour of these FF designs are then calculated. The power consumption of the enhanced pulse-triggered flip-flop design is the lowest in all test patterns because of its shorter discharging path.

#### 5. RESULT COMPARISON

Table 1 summarizes some important performance indexes of these P-FF designs. These include transistor count, layout area, setup and hold times, minimum D-to-Q delay, average power, and optimal PDP. The power drawn from the clock tree is also considered.

| P-FF design                   | ip-DCO | MHLFF  | SCCER  | EPTFF  |
|-------------------------------|--------|--------|--------|--------|
| No. of transistors            | 23     | 19     | 17     | 19     |
| Layout area (µm<sup>2</sup>)  | 91.88  | 93.02  | 80.07  | 79.17  |
| Setup time (ps)               | -35.8  | 8.3    | -58.1  | -39.7  |
| Hold time (ps)                | 47.4   | 82.2   | 59.3   | 85.1   |
| Min. D-to-Q delay (ps)        | 118.75 | 117.01 | 112.90 | 107.24 |
| Avg. power (µW)               | 17.50  | 18.97  | 19.40  | 12.90  |
| Power-delay product           | 4.22   | 4.89   | 3.19   | 2.65   |

#### **Table 1. Comparison of results**

The MHLFF design exhibits the largest layout area because of an oversized pulse generation circuit. The enhanced pulse-triggered flip-flop (EPTFF) design features the shortest minimum D-to-Q delay. Its hold time is longer than that of the other designs because the transistor (P3) used for pulse enhancement requires a prolonged availability of the data input. The power drawn from the clock tree is calculated to evaluate the impact of FF loading on the clock jitter. Although the EPTFF design requires the clock signal to be connected to the drain of transistor N2, the drawn current is not significant. Due to the complementary switching behaviour of N2 and N3, there exists no signal path from the entry of the clock signal to either V<sub>DD</sub> or GND. The clock tree is only liable for charging and discharging node Z. The optimal PDP value of the EPTFF design (2.65) is also the lowest among the compared designs.
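The relative gains implied by Table 1 can be recomputed directly from its entries. The snippet below is a small sketch for that purpose: the dictionary simply copies the power, delay, and PDP values from the table, while the helper names `reduction_percent` and `compare_against` are illustrative. Because the table does not state a unit for the power-delay product, that column is used for relative comparison only.

```python
# Values copied from Table 1. The PDP unit is not given in the table,
# so PDP is only compared in relative terms here.
TABLE_1 = {
    "ip-DCO": {"power_uW": 17.50, "min_dq_ps": 118.75, "pdp": 4.22},
    "MHLFF":  {"power_uW": 18.97, "min_dq_ps": 117.01, "pdp": 4.89},
    "SCCER":  {"power_uW": 19.40, "min_dq_ps": 112.90, "pdp": 3.19},
    "EPTFF":  {"power_uW": 12.90, "min_dq_ps": 107.24, "pdp": 2.65},
}


def reduction_percent(rival, proposed):
    """Relative reduction of `proposed` with respect to `rival`, in percent."""
    return 100.0 * (rival - proposed) / rival


def compare_against(proposed="EPTFF", table=TABLE_1):
    """Return {rival: {metric: reduction in %}} for every other design."""
    ref = table[proposed]
    return {
        name: {m: reduction_percent(row[m], ref[m]) for m in row}
        for name, row in table.items()
        if name != proposed
    }


gains = compare_against()
for rival, metrics in gains.items():
    print(rival, {m: round(v, 1) for m, v in metrics.items()})
```

These figures are derived only from the four designs listed in Table 1 and cover average power, minimum D-to-Q delay, and PDP; setup and hold times are left out because a percentage reduction is not meaningful for signed timing values.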

### 6. CONCLUSION

This paper presents the enhanced pulse-triggered low-power FF (EPTFF) design, which employs two new design measures. The first successfully reduces the number of transistors stacked along the discharging path by incorporating a PTL-based AND logic. The second supports conditional enhancement of the height and width of the discharging pulse so that the size of the transistors in the pulse generation circuit can be kept to a minimum. Simulation results indicate that the proposed design outperforms rival designs in performance indexes such as power, D-to-Q delay, and PDP. Coupled with these design merits is a longer hold-time requirement inherent in pulse-triggered FF designs. However, hold-time violations are much easier to fix in circuit design than failures in speed or power.

#### 7. ACKNOWLEDGEMENT

I express my sincere thanks to my guide, **Mr. K. Pandiaraj**, **M.E.**, Assistant Professor in the Electronics and Communication Engineering Department, and to my Project Coordinator, **Mr. G. Karthy**, **M.Tech.**, Assistant Professor, ECE Dept., for their able guidance and useful suggestions, which helped me in the project work.

# 8. REFERENCES

- [1] P. Zhao, J. McNeely, W. Kuang, N. Wang, and Z. Wang, 2011. "Design of sequential elements for low power clocking system," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
- [2] H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy, 2009. "Ultra low power clocking scheme using energy recovery and clock gating," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
- [3] C. K. Teh, M. Hamada, T. Fujita, H. Hara, N. Ikumi, and Y. Oowaki, 2006. "Conditional data mapping flip-flops for low-power and high-performance systems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
- [4] Y.-H. Shu, S. Tenqchen, M.-C. Sun, and W.-S. Feng, 2006. "XNOR-based double-edge-triggered flip-flop for two-phase pipelines," IEEE Trans. Circuits Syst. II, Exp. Briefs.
- [5] A. G. M. Strollo, D. De Caro, E. Napoli, and N. Petra, 2005. "A novel high speed sense-amplifier-based flip-flop," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
- [6] S. H. Rasouli, A. Khademzadeh, A. Afzali-Kusha, and M. Nourani, 2005. "Low power single- and double-edge-triggered flip-flops for high speed applications," Proc. Inst. Electr. Eng. Circuits Devices Syst.
- [7] P. Zhao, T. Darwish, and M. Bayoumi, 2004. "High-performance and low power conditional discharge flip-flop," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
- [8] V. G. Oklobdzija, 2003. "Clocking and clocked storage elements in a multi-giga-hertz environment," IBM J. Res. Devel.
- [9] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, and T. Grutkowski, 2002. "The implementation of the Itanium 2 microprocessor," IEEE J. Solid-State Circuits.
- [10] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, 2002. "Conditional precharge techniques for power-efficient dual-edge clocking," in Proc. Int. Symp. Low-Power Electron. Design.
- [11] B. Kong, S. Kim, and Y. Jun, 2001. "Conditional-capture flip-flop for statistical power reduction," IEEE J. Solid-State Circuits.
- [12] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, 2001. "Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors,"
- [13] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A. Mehta, R. Heald, and G. Yee, 1999. "A new family of semi-dynamic and dynamic flip flops with embedded logic for high-performance processors," IEEE J. Solid-State Circuits.
- [14] H. Kawaguchi and T. Sakurai, 1998. "A reduced clock-swing flip-flop (RCSFF) for 63% power reduction," IEEE J. Solid-State Circuits.
- [15] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, 1996. "Flow-through latch and edge-triggered flip-flop hybrid elements," in IEEE Tech.