### **Switching Reduction in Power Gated Design**

Priyanka Choudhury and Sambhu Nath Pradhan NIT Agartala Jirania,Agartala Pin-799055, Tripura, India.

### ABSTRACT

During synthesis of power gated finite state machine (FSM) power can be reduced by reducing switching activity of the implemented circuit. Power gating can be applied to turn OFF the inactive sub-machine which is obtained after partitioning the FSM by gating the supply voltage. During transition from the states of one sub-machine to other sub-machine, wakeup time is required to turn OFF the current sub-machine and turn ON other. Wakeup time affects the partitioning of FSMs for its power gated implementation as both the sub-machines are ON during this time. In this paper, we have calculated this wakeup time to find the boundary depth. Variation of wakeup time as a function of size of sleep transistor and sub-machine has also been studied. Power model has been developed for the power-gated design of FSM. Here, we present a Genetic Algorithm (GA) based solution of the problem of both partitioning and encoding of power gated FSM for reduced power consumption Experimental results show that more than 55% power can be saved in this approach.

### **General Terms**

Low Power VLSI Design.

### Keywords

Finite State Machine, Partitioning, Power gating, Wakeup time, Boundary depth, Power model, Low power.

### **1. INTRODUCTION**

The power of FSM can be reduced by applying power gating technique. Power gating is a technique for saving both leakage and switching power by shutting OFF the idle blocks of the circuit. The authors in [2] presented a power-gating technique for FSM decomposition targeting low power. In [2] a simulated annealing based algorithm has been proposed to perform the decomposition without timing penalty, but it did not consider state encoding. In [1] a GA based approach for simultaneous partitioning and state assignment of FSMs with power reduction as the objective was presented. Reduction of power consumption is achieved by turning OFF the inputs of the sub-machine which is not active. The authors in [3] considered the problems of FSM decomposition and state encoding together, targeting low power dissipation in powergating technique. During the cross transition power is consumed by both the sub-machines but this issue was not considered in [3]. In our paper we have taken care off this issue and we have developed power model based on state probability, considering partitioning and state encoding together. In [3] authors assumed that number of clock cycle required for all the circuits to wakeup is 2. So, they have considered boundary depth 2. In fact, for different circuit boundary depth will differs. So, we have determined boundary depth for each circuit first and then carried out experiments taking specific boundary depth of each circuit. For larger size of sleep transistor boundary depth is 1 and for smaller size boundary depth may be 1 or 2 or more. Dynamic power

consumed by the combinational part of the circuit, has been estimated by our power estimator using Total State Transition Probability(TSTP) and Hamming distance and compared with NOVA encoding [4]. Effectiveness of power gating design has been presented in [6, 7].

The rest of the paper is organized as follows: Section 2 presents partitioning of FSM and architecture for power gating implementation. In section 3 the process of steady state probability calculation is explained. Proposed power model of power gated FSM is described in section 4. Experimental results are presented in Section 5. Finally, Section 6 concludes the whole work along with the future scope.

### 2. PARTITIONING OF FSM AND POWER GATING IMPLEMENTATION

Power gating technique was used in [3] by adding a PMOS sleep transistor between  $V_{dd}$  and the combinational circuit or by adding NMOS sleep transistor between combinational circuit and ground as shown in Fig.1 which is the proposed circuit architecture for power-gating in [3]. In our work we have considered the same architecture as in Fig.1. The description and partitioning strategy of the circuit are given in [3]. Next, we state the definitions related to boundary depth.

#### Fig 1: Power-gating architecture



**Boundary depth (BD):** It is defined as the number of clock cycles needed to turn ON the sub-machine to be activated.

**Boundary states (BS):** A boundary state between two submachines is a state in one sub-machine which is within the boundary depth of another machine. We use  $D(F_1, F_2)$  to denote the set of boundary states in  $F_1$  leading to  $F_2$ . The sum of the boundary state probability (BSP) of  $D(F_1, F_2)$  is denoted as  $P_D(F_1, F_2)$ . Similarly, the sum of the boundary state probability of  $D(F_2, F_1)$  is denoted as  $P_D(F_2, F_1)$ .

# 3. STEADY STATE PROBABILITY CALCULATION

It is possible to compute the steady state probabilities of the states and the transition probabilities from the specific input line probabilities. The steady state probability  $P(S_i)$  is the probability of the FSM being in the state  $S_i$  at a time instant. The state transition probability for the transition from  $S_i$  to  $S_j$  is defined as  $P(S_{ij})$  and it is computed as:

$$P(S_{ij}) = P(S_i) * P(k)$$

where P(k) represents the probability of the primary input combination holding true for which the transition  $S_i$  to  $S_j$  takes place. The steady state probabilities of the FSM states are calculated using the following equations.

$$\sum_{i} P(S_i) = 1$$
,  $P(S_i) = \sum_{j} P(S_{ij})$ 

This system of equations is known as Chapman-Kolmogorov equations [5] for a discrete-time discrete-transition Markov process. By solving this set of linear equations using the Gauss-Jordon Elimination method, the steady state probabilities have been determined.

Once the state probability is calculated, we can develop the power model of the power-gated FSM.

### 4. POWER MODELLING

Here, we present our power model to estimate the total power consumed by the bipartitioned FSM. We need this model only during the execution of the GA based partitioning and state assignment procedure which we have used. We assume that the combinational logic has been implemented in a two-level PLA. It may be noted that our fitness calculation targets twolevel realization.

Developed power model contains four parts, each of which corresponds to a particular event based on the possible transitions.

$$Total power = power_F_1 + power_F_2 + power_F_{12} + power_F_{21}$$
(1)

First and second part of power consumption is due to an inner transition which takes place in  $F_1$  and  $F_2$  respectively. Third and fourth terms are due to the cross transitions which takes place from  $F_1$  to  $F_2$  and  $F_2$  to  $F_1$  when both the sub-machines are ON.

$$Total power = P(F_1)Power(F_1) + P(F_2)Power(F_2) + P_D(F_1, F_2)[Power(F_1) + Power(F_2)] + P_D(F_2, F_1)$$

$$[Power(F_1) + Power(F_2)]$$
(2)

From the above expression it is evident that calculation of steady state probabilities, boundary depth and power

estimation of sub-machines are required for total power estimation of FSM.

*Power estimation:* We have used our power estimator which is total expected switching activity, to calculate the power and then, a cost function is constructed using this power. Concept of switching activity calculation is explained in [8].

For constructing this cost function, the FSM is modelled as a Marcov chain which was explained in section 3.

Total transition probability  $tstp(i \rightarrow j)$  between two states  $S_i$  and  $S_j$ : It is the probability of transition from  $S_i$  to  $S_j$  which occurs in an arbitrarily long sequence and it is given by,

$$tstp(i \rightarrow j) = P_i p_{i,j}$$

Where  $P_i$  is the steady state probability and  $p_{i,j}$  is the conditional state transition probability which was described earlier.

Total transition probability  $tstp(j \rightarrow i)$  between two states  $\mathbf{S}_{\mathbf{j}}$ and  $\mathbf{S}_{\mathbf{i}}$ : It is the probability of transition from  $S_{\mathbf{j}}$  to  $S_{\mathbf{i}}$  which occurs in an arbitrarily long sequence and it is given by,

$$tstp(j \to i) = P_j p_{j,i}$$

Where  $P_i$  is the steady state probability and  $p_{j,i}$  is the conditional state transition probability.

The power consumption P is described as:

$$P = 0.5 V_{DD}^{2} f C_{sr} E_{sa}$$

Where *f* is the clock frequency of the state machine,  $V_{DD}$  is the supply voltage ,  $C_{sr}$  is the overall state register capacitance and the  $E_{sa}$  is the expected switching activity. Since the supply voltage  $V_{DD}$ , the clock frequency *f*, the state register capacitance  $C_{sr}$  are often fixed in the system specification and cannot be affected,  $E_{sa}$  is choosen to estimate the cost function.

$$E_{sa} = \sum_{i,j \in S} tstp(i \leftrightarrow j) \times h(i,j)$$

Where h(i,j) is the Hamming distance of the state codes of state  $S_i$  and state  $S_j$  and  $tstp(i \rightarrow j) = tstp(i \rightarrow j) + tstp(j \rightarrow i)$  and  $tstp(i \rightarrow j)$  is the total state transition probability of a transition from state  $S_i$  to state  $S_i$ .

Here we have partitioned the FSM into two sub-FSMs.  $E_{sa1}$  and  $E_{sa2}$  is calculated by the above mentioned process for sub-FSM,  $F_1$  and sub-FSM,  $F_2$  respectively and written as follows:

$$E_{sa1} = \sum_{i,j \in S1} tstp_1(i \leftrightarrow j) \times h_1(i,j)$$
$$E_{sa2} = \sum_{i,j \in S2} tstp_2(i \leftrightarrow j) \times h_2(i,j)$$

Where S1 and S2 are the states of  $F_1$  and  $F_2$  respectively.  $tstp_1(i \leftrightarrow j)$  and  $tstp_2(i \leftrightarrow j)$  are the total state transition probability of  $F_1$  and  $F_2$  respectively and  $h_1(i,j)$  and  $h_2(i,j)$  are the hamming distance of the state codes of the states of  $F_1$ and  $F_2$  respectively.

Then, the total power dissipation is given by the equation:

$$TotalE_{sa} = P(F_{1}) * E_{sa1} + P(F_{2}) * E_{sa2} + P_{D}(F_{1}, F_{2})[E_{sa1} + E_{sa2}] + P_{D}(F_{2}, F_{1})[E_{sa1} + E_{sa2}]$$
(3)

*Total*  $E_{sa}$  is used as cost function for the genetic algorithm.

### 5. EXPERIMENTAL RESULTS

The GA approach of partitioning and state encoding adopted here is same as [3]. The *Chromosome Structure* contains both the partition and state encoding part together.

For power calculation we have used expected switching activity as explained in Section 4. For the set of benchmark circuits listed in Table I, minimum operating frequency found to be 463 MHz. This power estimator has been used in our proposed GA-based partitioning and encoding of power gating design.

*Result for GA encoding:* To see the impact of state-encoding, we have formulated another GA in which only state-encoding has been performed for the whole FSM—it does not do any partitioning. The GA that performs state encoding only on the average requires 52% lesser dynamic power than NOVA encoding. This clearly establishes the impact of our GA-based encoding.

*Result after finding boundary depth (BD) of sub-machine:* In the experiment [3], authors have considered that boundary depth is 2 but they did not fix the boundary depth for a particular sub-machine. In this section we set the boundary depth of a each sub-machine and apply the proposed power estimation technique explained in section 5. If a sub-machine is power-gated, the numbers of cycles which are needed to wake-up the sub-machine is the boundary depth for that sub-machine. The cost of the partitioned FSM is evaluated by the developed power model in the form of Eq.(3). In this power model boundary state probability is required to get the total

power of the power gated FSM. This boundary state probability depends on the boundary depth of sub-machine. Different boundary depth of a same sub-machine gives different boundary state probability. So, first, we need to determine the boundary depth of a sub-machine and then partition the FSM using this power metric as cost. To find the boundary depth before GA-based partitioning and encoding, we have taken GA encoded FSM. We have considered in a sub-machine maximum 80% states would be there. So, 80% encoded states of a FSM has been simulated in Cadence Spectre at 45nm technology to get the delay and wakeup time. The set of all the FSM after encoding has been simulated and delays and wakeup time are noted. Among all the delay, the maximum delay which is 2.16ns is taken to calculate the frequency of operation of all the sub-machine. So, the clock period is determined from the frequency of operation. For each circuit as we know the wakeup time and clock period, the boundary depth is determined after dividing wakeup time by clock period. Form Table 1 it is clear that for a particular size of sleep transistor wakeup time differs for different submachine and consequently boundary depth also varies for different sub-machine having different size. The size of the sleep transistor greatly affect the wakeup time for a particular sub-machine as shown in Table 1. We have shown the variation in boundary depth at 45nm technology for three different width ( $w_1 = 480nm$ ,  $w_2 = 240nm$  and  $w_3 = 120nm$ ) of sleep transistor keeping length fixed. The length and width are chosen such that boundary depth for the reported benchmark circuit is less than or equal to 2.

Out of 16 circuits, the number of circuit having boundary depth 2 is 1, 4 and 6 at  $w_1$ ,  $w_2$  and at  $w_3$  respectively. Maximum power saving of our power gated design over NOVA at  $w_1$ ,  $w_2$  and at  $w_3$  is 55.3%, 55.3% and 55.46% respectively for the circuit *keyb*.

|          |                       |                       |                | boundary              |                          |                |                |                       |                |       |
|----------|-----------------------|-----------------------|----------------|-----------------------|--------------------------|----------------|----------------|-----------------------|----------------|-------|
|          | wak                   | depth                 |                |                       | % power saving over NOVA |                |                |                       |                |       |
| circuits | <b>w</b> <sub>1</sub> | <b>w</b> <sub>2</sub> | W <sub>3</sub> | <b>w</b> <sub>1</sub> | <b>w</b> <sub>2</sub>    | w <sub>3</sub> | w <sub>1</sub> | <b>w</b> <sub>2</sub> | w <sub>3</sub> | [3]   |
| bbara    | 0.88                  | 1.48                  | 1.80           | 1                     | 1                        | 1              | 16.07          | 16.07                 | 16.07          | 62.24 |
| cse      | 1.48                  | 2.38                  | 2.89           | 1                     | 2                        | 2              | 54.60          | 54.60                 | 54.60          | -     |
| dk512    | 0.70                  | 1.02                  | 1.24           | 1                     | 1                        | 1              | 22.14          | 22.14                 | 22.14          | 15.11 |
| keyb     | 1.46                  | 1.79                  | 2.17           | 1                     | 1                        | 2              | 55.30          | 55.30                 | 55.46          | -     |
| s208     | 1.22                  | 1.44                  | 1.75           | 1                     | 1                        | 1              | 33.00          | 33.00                 | 33.00          | 69.26 |
| s386     | 1.10                  | 1.32                  | 1.61           | 1                     | 1                        | 1              | 23.80          | 23.80                 | 23.80          | 59.78 |
| s820     | 2.00                  | 2.16                  | 2.63           | 1                     | 2                        | 2              | 47.00          | 40.38                 | 40.38          | 76.06 |
| sand     | 2.20                  | 2.23                  | 2.72           | 2                     | 2                        | 2              | 0.64           | 0.64                  | 0.64           | 12.20 |
| bbtas    | 0.38                  | 0.66                  | 0.80           | 1                     | 1                        | 1              | -32.80         | -32.8                 | -32.8          | 39.71 |
| dk27     | 0.37                  | 0.62                  | 0.75           | 1                     | 1                        | 1              | -40.00         | -40.00                | -40.00         | -6.48 |
| planet   | 1.82                  | 2.14                  | 2.60           | 1                     | 1                        | 2              | -38.15         | -38.15                | -17.19         | -     |
| s832     | 2.08                  | 2.18                  | 2.65           | 1                     | 2                        | 2              | 38.65          | 36.15                 | 36.15          | 75.78 |
| ex4      | 0.86                  | 1.10                  | 1.34           | 1                     | 1                        | 1              | -24.00         | -24.00                | -24.00         | 54.71 |
| donfile  | 1.14                  | 1.42                  | 1.73           | 1                     | 1                        | 1              | 4.31           | 4.31                  | 4.31           | -     |
| AVERAGE  |                       |                       |                |                       |                          |                | 5.31           | 4.69                  | 1.87           | 44.92 |

Table 1:wakeup time, boundary depth and power saving for different size of sleep transistor

*Result comparison with the literature:* Our power gated result is being compared with the result of [3]. It may also be noted that in [3] authors have developed the power model without considering power of both the sub-FSM when state is making transition from one sub-FSM to other and also they have considered boundary depth 2. In our work we have considered power of both the sub-FSM when there is a state transition from one to other sub-FSM. To account this effect, two extra terms  $P_D(F_1,F_2) \ E_{sal}$  and  $P_D(F_2,F_1)E_{sa2}$  are considered in developing the power model (Eq. (3)), described in Section 4. These two terms are not there in the power model of [3]. Due to the presence of these additional terms in the power model (Eq. 3), the results are worse than the results in [3] for the same boundary depth.

## 6. CONCLUSIONS AND FUTURE SCOPE

We have presented an efficient technique for synthesizing the FSMs using power-gating targeting dynamic power saving by reducing switching activity. The idea of combined partitioning and state encoding is introduced in the synthesis process in the genetic algorithm formulation. Existing power model has been modified to account the power consumption during cross transition. As boundary depth depends on wakeup time of sub-machine and this wakeup time differs for different size of the sleep transistor and sub-machines, for a particular size of sleep transistor, we have determined boundary depth. The technique worked well as verified by the experimentation with a number of benchmark circuits. Though, in general powergating results in good leakage power reduction as in [3], in this paper, leakage power result has not been presented. Low power partitioning and encoding of FSM may increase the area of final circuit. Area result has not been reported in this work. Our future works include, doing experiments to find the impact of switching and leakage for this power gated design of a practical circuit. The FSM partitioning can be extended to multipartitions, effectiveness of this technique may be examined targeting multilevel circuit synthesis. The synthesis may targets testability characteristic of the integrated circuit and finally other scope may be to extend the combined partitioning and state encoding process of power gated design to thermal aware FSM synthesis.

### 7. ACKNOWLEDGMENTS

This work was supported by RPS project (Ref. No.: 8023/RID/RPS-24/(NER)2011-12) sponsored by All India Council of Technical Education, New Delhi – 110 001.

### 8. REFERENCES

- [1] G. Venkataraman, S.M. Reddy, I. Pomeranz, "GALLOP: Genetic Algorithm based Low Power FSM Synthesis by Simultaneous Partitioning and State Assignment", Proceedings of 16th IEEE conf. on VLSI design, pp.533-538, (2003).
- [2] B. Liu, Y. Cai, Q. Zhou, J. Bian, X. Hong, "FSM decomposition for power gating design automation in sequential circuits", Proceedings of the ASICON, pp. 862-865, (2005).
- [3] S. N. Pradhan, M. Tilak Kumar and S. Chattopadhyay, "Low power FSM synthesis using Power-gating", Integration, the VLSI Journal, Vol. 44, No. 3, pp. 175-184, (2011).
- [4] T. Villa, A. S. Vincentell, "NOVA: State Assignment of Finite State Machines for Optimal Two-Level Logic Implementation", IEEE transactions on CAD. VOL. 9 NO. 9. pp. 905-924, Sep. (1990).
- [5] A.T. Freitas, and A.L. Oliveira, "Implicit resolution of the Chapman-Kolmogorov equations for sequential circuits: an application in power estimation", Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 764-769, (2003).
- [6] D. E Goldberg and J. H. Hollend, "Genetic Algorithms in Search, Optimization and Machine Learning", Addison-Wesley, (1988).
- [7] J. Seomun, I. Shin and Y. Shin ,"Synthesis and implementation of active mode power gating circuits", Proceedings of Design Automation Conference (DAC), 47th ACM/IEEE (2010).
- [8] S.Chattopadhyay and P.N Reddy, Finite state machine state assignment targeting low power consumption, IEE proc.-Comput. Digit. Tech., Vol. 151, No. 1, January 2004.