# Modified Booth Multiplier with Carry Select Adder using 3-stage Pipelining Technique

Kulvir Singh, Research Scholar, ACSD C-DAC, Mohali, Punjab

Dilip Kumar, Sr. Design Engineer, ACSD C-DAC, Mohali, Punjab

## ABSTRACT

This paper presents a high-speed, low-area $16 \times 16$ bit Modified Booth Multiplier (MBM) that uses a Carry Select Adder (CSA) and a 3-stage pipelining technique. These techniques improve the performance of the MBM by reducing its delay. Simulation results show that, compared with the high-speed MBM, the delay is reduced by 56% and the numbers of slices and LUTs are each reduced by about 4%. The multiplier circuit is designed in VHDL and simulated using the Xilinx ISE Simulator. The power of the MBM is evaluated using Cadence tools.

## **Keywords**

Carry select adder (CSA), pipelining, Modified Booth Multiplier, Xilinx ISIM, Cadence.

# 1. INTRODUCTION

Electronic equipment such as mobile phones, laptops and tablets plays a very important role in today's world. These devices rely on an internal processor, RAM and a hard disk, and the processor performs binary and other arithmetic operations. The demand for fast processors is increasing because of the need for high-speed data processing [9], and multipliers are central to such processing. Algorithms proposed for multiplication include the Booth algorithm, the Modified Booth algorithm, Braun and Baugh-Wooley. In this paper, a Carry Select Adder (CSA) with a 3-stage pipelining technique is used to enhance the performance and reduce the area of the Modified Booth Multiplier (MBM). The architecture of the MBM consists of three stages. The first stage contains the Booth encoder and decoder circuits [1]. The second stage contains a Wallace tree structure composed of unit adders, and the last stage is the CSA. The CSA consists of two sections, one for the higher-order bits and one for the lower-order bits; the adder result is selected based on Cin.

An improvement in any of these sections improves the overall multiplier performance. As the number of stages increases, power consumption and area also increase. This drawback can be overcome by using the CSA with a 3-stage pipelining technique [2]. The block diagram of the MBM using CSA and pipelining is shown in Figure 1.

This paper is organized as follows: Section 2 describes partial product generation, Section 3 discusses the unit adders used in the Wallace tree structure, Section 4 discusses the final adder, which is the Carry Select Adder, Section 5 presents the simulation results and comparison, and Section 6 gives the conclusion.



Fig 1: Block Diagram of Modified Booth Multiplier
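The three-stage organization described above can be illustrated with a small behavioural model. The following Python sketch is only an illustration under stated assumptions, not the authors' VHDL: the function names (`stage1_partial_products`, `stage2_reduce`, `stage3_final_add`, `run_pipeline`) are hypothetical, stage 1 uses plain unsigned shift-and-add rows instead of Booth encoding, and stage 2 uses ordinary addition as a stand-in for the Wallace tree. It only shows how registers between the stages let a new operand pair enter on every clock cycle while each result emerges on the third clock edge after it entered.

```python
def stage1_partial_products(operands):
    """Stage 1 stand-in: unsigned shift-and-add rows (NOT Booth encoding)."""
    x, y = operands
    return [(x << i) if (y >> i) & 1 else 0 for i in range(16)]


def stage2_reduce(rows):
    """Stage 2 stand-in: reduce all rows to two operands (NOT a real Wallace tree)."""
    half = len(rows) // 2
    return sum(rows[:half]), sum(rows[half:])


def stage3_final_add(pair):
    """Stage 3: final two-operand addition (the role played by the CSA)."""
    return pair[0] + pair[1]


def run_pipeline(inputs):
    """Clock the three stages; return the value of the output register per cycle."""
    reg1 = reg2 = reg3 = None              # pipeline registers, empty at reset
    outputs = []
    for item in list(inputs) + [None, None]:   # two extra cycles flush the pipe
        # All registers update on the same clock edge, so every next value
        # is computed from the OLD register contents before any is overwritten.
        next3 = stage3_final_add(reg2) if reg2 is not None else None
        next2 = stage2_reduce(reg1) if reg1 is not None else None
        next1 = stage1_partial_products(item) if item is not None else None
        reg1, reg2, reg3 = next1, next2, next3
        outputs.append(reg3)
    return outputs


if __name__ == "__main__":
    print(run_pipeline([(3, 5), (7, 9), (100, 200)]))
    # -> [None, None, 15, 63, 20000]
```

After the two empty start-up cycles, one product leaves the pipeline per cycle, which is the throughput benefit that pipelining is meant to provide.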

# 2. PARTIAL PRODUCT GENERATION

In an n-bit modified Booth multiplier, the number of Booth encoders is n/2 and the number of partial product generator (PPG) circuits is approximately $n^2$ [1]; hence the power consumption and die area of the Booth section are dominated by the PPG. Optimization of the PPG (Booth decoder) section is therefore more important than optimization of the Booth encoder (BE) block. The conventional modified Booth selector computes the partial product bit in the j<sup>th</sup> column of the i<sup>th</sup> row using equation (1).

$$PP_{ij} = (X_j \cdot X1_2 + X_{j-1} \cdot X1_1) \oplus Neg \qquad (1)$$

where $X_j$ and $X_{j-1}$ are the multiplicand inputs of weight $2^j$ and $2^{j-1}$ respectively, $X1_2$ and $X1_1$ determine whether or not the multiplicand should be doubled, and Neg is a bit that determines whether or not the multiplicand should be inverted. The operations of the Booth encoder are shown in Table 1. The Booth encoder and decoder circuits are shown in Figure 2 [10].

| Z | Y <sub>2i+1</sub> | Y <sub>2i</sub> | Y <sub>2i-1</sub> | Operation | Neg | X1_2 | X1_1 |
|---|-------------------|-----------------|-------------------|-----------|-----|------|------|
| 1 | 0                 | 0               | 0                 | +0        | 0   | 0    | 0    |
| 1 | 0                 | 0               | 1                 | +X        | 0   | 1    | 0    |
| 0 | 0                 | 1               | 0                 | +X        | 0   | 1    | 0    |
| 0 | 0                 | 1               | 1                 | +2X       | 0   | 0    | 1    |
| 0 | 1                 | 0               | 0                 | -2X       | 1   | 0    | 1    |
| 0 | 1                 | 0               | 1                 | -X        | 1   | 1    | 0    |
| 1 | 1                 | 1               | 0                 | -X        | 1   | 1    | 0    |
| 1 | 1                 | 1               | 1                 | -0        | 1   | 0    | 0    |

Table 1: Truth Table of Booth Encoder
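Equation (1) and Table 1 can be checked with a short behavioural sketch. The Python below is an illustration, not the authors' VHDL: the names `BOOTH_TABLE`, `partial_product_row` and `booth_multiply` are hypothetical, the Z column of Table 1 is not modelled, and two details are assumptions that the text does not spell out: each row is generated with n+1 sign-extended bits, and the Neg bit is added back at the least significant position of its row to turn the inverted value into a two's complement negative.

```python
# (Y_2i+1, Y_2i, Y_2i-1) -> (Neg, X1_2, X1_1), copied from Table 1
BOOTH_TABLE = {
    (0, 0, 0): (0, 0, 0),  # +0
    (0, 0, 1): (0, 1, 0),  # +X
    (0, 1, 0): (0, 1, 0),  # +X
    (0, 1, 1): (0, 0, 1),  # +2X
    (1, 0, 0): (1, 0, 1),  # -2X
    (1, 0, 1): (1, 1, 0),  # -X
    (1, 1, 0): (1, 1, 0),  # -X
    (1, 1, 1): (1, 0, 0),  # -0
}


def partial_product_row(x, n, neg, x1_2, x1_1):
    """Apply equation (1) to bits j = 0..n of a signed n-bit multiplicand x."""
    def bit(j):
        # X_{-1} is taken as 0; Python's >> sign-extends negative integers.
        return 0 if j < 0 else (x >> j) & 1

    row = 0
    for j in range(n + 1):
        pp = ((bit(j) & x1_2) | (bit(j - 1) & x1_1)) ^ neg
        row |= pp << j
    if (row >> n) & 1:            # reinterpret as (n+1)-bit two's complement
        row -= 1 << (n + 1)
    return row


def booth_multiply(x, y, n=16):
    """Multiply two signed n-bit integers (n even) using n/2 Booth rows."""
    total = 0
    prev = 0                      # Y_{-1} = 0
    for i in range(n // 2):
        y2i = (y >> (2 * i)) & 1
        y2ip1 = (y >> (2 * i + 1)) & 1
        neg, x1_2, x1_1 = BOOTH_TABLE[(y2ip1, y2i, prev)]
        row = partial_product_row(x, n, neg, x1_2, x1_1)
        total += (row + neg) << (2 * i)   # row i carries weight 4**i
        prev = y2ip1
    return total


if __name__ == "__main__":
    print(booth_multiply(-15, -15))   # -> 225
```

Only n/2 rows are summed, which is the reduction in partial products that motivates Booth recoding in the first place.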



Fig 2: (a) Booth Encoder circuit and (b) Booth Decoder circuit

This relationship consists of three parts, which correspond to areas I, II and III of the Booth encoder and Booth decoder, respectively. In most PPG circuits the sign change is performed by an XOR gate; in the proposed PPG, this function is realized by an XNOR gate followed by an output inverter, and the output inverter also overcomes the low driving capability of the transmission gates.

# 3. WALLACE TREE STRUCTURE

A Wallace tree structure can be built from compressors, full adders and various other components. In this paper the structure is built from unit adders instead of full adders. The unit adders, or carry save adders, reduce the number of partial product and sum rows [7], which increases the speed of the Booth multiplier. Instead of adding the partial products sequentially, the sum A+B+C+D is computed as (A+B) + (C+D): A and B, and C and D, are added in parallel, and the two results are then added together. This requires only two full-adder delays, whereas the sequential evaluation of A+B+C+D requires three. This is shown in Figure 3.



Fig 3: (a) Two adder delay level and (b) Three adder delay level
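The delay argument above can be made concrete by counting adder levels. The following Python sketch is a simple counting model under the assumption that every addition is a two-input adder with one unit of delay; the function names `sequential_levels` and `tree_levels` are hypothetical.

```python
def sequential_levels(operands):
    """Adder delays when operands are accumulated one after another."""
    return max(operands - 1, 0)


def tree_levels(operands):
    """Adder delays when operands are added pairwise in parallel."""
    levels = 0
    while operands > 1:
        operands = (operands + 1) // 2   # each level halves the operand count
        levels += 1
    return levels


if __name__ == "__main__":
    print(tree_levels(4), sequential_levels(4))   # -> 2 3
```

For four operands the model gives two levels for the tree and three for the chain, matching Figure 3; the gap widens as the number of operands grows.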

The Wallace tree structure composed of unit adders is shown in Figure 4. A unit adder has four data inputs and one carry input, and it generates sum bits and a carry-out as outputs.



The unit adder structure for eight partial products is shown in Figure 5. The first two rows of unit adders add the partial products: the first row adds P4, P3, P2 and P1, and the second row adds P8, P7, P6 and P5. The third row adds the sum outputs from the first two rows and the carry output from the first column. This approach is faster than the other approaches used for the Wallace multiplier.



Fig 5: Carry Save Adder for eight partial products
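The reduction of eight partial products to two operands can be sketched at word level. The Python below is an illustration under assumptions, not the circuit of Figure 5: the unit adder is modelled as two cascaded 3:2 carry-save stages (a common way to realize a four-input compressor), carries are passed between rows as whole words rather than column by column, and the names `csa_3to2`, `unit_adder` and `reduce_eight_rows` are hypothetical.

```python
def csa_3to2(a, b, c):
    """Carry-save step: a + b + c == s + carry, with no carry propagation."""
    s = a ^ b ^ c                                  # bitwise sum
    carry = ((a & b) | (b & c) | (a & c)) << 1     # bitwise majority, shifted
    return s, carry


def unit_adder(a, b, c, d):
    """Four operands in, two out: a + b + c + d == s + carry."""
    s1, c1 = csa_3to2(a, b, c)
    return csa_3to2(s1, c1, d)


def reduce_eight_rows(p):
    """Reduce eight partial products to a sum word and a carry word."""
    s1, c1 = unit_adder(p[0], p[1], p[2], p[3])    # first row:  P1..P4
    s2, c2 = unit_adder(p[4], p[5], p[6], p[7])    # second row: P5..P8
    return unit_adder(s1, c1, s2, c2)              # third row combines both


if __name__ == "__main__":
    x, y = 13, 181
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(8)]
    s, c = reduce_eight_rows(rows)
    print(s + c == x * y)   # -> True
```

Only the last step, `s + c`, needs a carry-propagating adder, which is why the choice of final adder matters for the overall delay.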

# 4. FINAL CARRY SELECT ADDER

Adders are among the most widely used components in integrated circuits, so designing efficient adders has been the goal of much research in VLSI design. Ripple Carry Adders (RCAs) have the most compact design of all adder types, but they are the slowest. Carry Look-ahead Adders (CLAs), on the other hand, are the fastest adders, but they are the worst in terms of area. Carry Select Adders (CSAs) are considered a compromise between RCAs and CLAs because they offer a good trade-off between the compact area of RCAs and the short delay of CLAs [10].

In a CSA, separate adder blocks handle the low-order and the high-order bits. The conventional P-bit CSA consists of one P/2-bit adder for the lower half of the bits and two P/2-bit adders for the upper half. Of the two upper adders, one performs the addition assuming Cin=0 and the other assuming Cin=1. A multiplexer, controlled by the carry-out propagated from the adder for the P/2 least significant bits, selects the correct value of the most significant part of the sum. Although this technique increases the area, it speeds up the addition. The blocks of the carry select adder are shown in Figure 6 [11].



Fig 6: Carry Select Adder
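The selection mechanism described above can be sketched behaviourally. The Python below is a minimal model, assuming unsigned P-bit operands with P even; the function name `carry_select_add` and its parameters are hypothetical, and the `+` operators stand in for whatever P/2-bit adder blocks the hardware uses.

```python
def carry_select_add(a, b, p=32):
    """Add two unsigned p-bit numbers; return (p-bit sum, carry-out)."""
    half = p // 2
    mask = (1 << half) - 1

    a_lo, a_hi = a & mask, (a >> half) & mask
    b_lo, b_hi = b & mask, (b >> half) & mask

    # Lower half: a single P/2-bit adder.
    lo = a_lo + b_lo
    carry_lo = lo >> half

    # Upper half: two P/2-bit adders, conceptually running in parallel
    # with the lower adder, one per assumed carry-in value.
    hi_if_cin0 = a_hi + b_hi
    hi_if_cin1 = a_hi + b_hi + 1

    # Multiplexer: the real carry from the lower half picks the result.
    hi = hi_if_cin1 if carry_lo else hi_if_cin0

    total = ((hi & mask) << half) | (lo & mask)
    carry_out = hi >> half
    return total, carry_out


if __name__ == "__main__":
    print(carry_select_add(0x0000FFFF, 0x00000001))   # -> (65536, 0)
```

Because both upper results are ready before the lower carry arrives, the critical path is roughly one P/2-bit addition plus a multiplexer, at the cost of the duplicated upper adder.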

# 5. SIMULATION RESULTS

The simulation results of the Modified Booth Multiplier with Carry Select Adder using the 3-stage pipelining technique are shown in Figures 7, 8, 9 and 10. The code is written in VHDL and synthesized using the Xilinx 12.4 navigator, and the Xilinx ISE Simulator is used for simulation. Table 2 compares the MBM using CSA with the MBM using CLA in terms of area and speed.



Fig 7: RTL view of MBM



Fig 8: Internal view of MBM



Fig 9: Multiplication of Signed and Unsigned Numbers

[Simulation waveform screenshot; legible values include an operand of -15 and a product of 225.]

Fig 10: Multiplication of Signed Numbers

Table 2: Comparison of MBM using CLA and CSA

| Logic Utilization | MBM using CLA | MBM using CSA | Percentage reduction |
|-------------------|---------------|---------------|----------------------|
| Number of slices  | 394           | 377           | 4.3%                 |
| Number of LUTs    | 749           | 718           | 4.1%                 |
| Delay             | 51.92 ns      | 22.38 ns      | 56.8%                |

It is clear from the above results that the area (number of slices) and the delay of the $16 \times 16$ bit MBM with CSA are reduced by 4.3% and 56.8%, respectively.

The power of the modified Booth multiplier is calculated using Cadence tools. As shown in Table 3, the total power consumption of the MBM with CSA is slightly higher, by about 0.11%.

**Table 3: Power Calculation**

| Parameters    | Power of MBM using CLA (nW) | Power of MBM using CSA (nW) |
|---------------|-----------------------------|-----------------------------|
| Leakage Power | 17779.032                   | 17784.481                   |
| Dynamic Power | 232994.712                  | 233266.434                  |
| Total Power   | 250773.744                  | 251050.915                  |

# 6. CONCLUSION

In this paper, a $16 \times 16$ bit modified Booth multiplier with a 3-stage pipelining technique is designed. The delay and area of the high-speed MBM, which are 51.92 ns and 394 slices, are reduced to 22.38 ns and 377 slices, respectively, by the MBM with CSA. The simulation results show that the designed architecture is more efficient than the conventional one in terms of area and delay. The designed MBM architecture is therefore low-area, high-speed, simple and efficient for VLSI hardware implementation. The improvement in delay and area comes with a trade-off in power: there is a minute increase in the power consumption of the designed MBM with CSA compared with the high-speed MBM.

# 7. REFERENCES

- [1] Wen-Chang Yeh and Chein-Wei Jen, "High-speed Booth encoded parallel multiplier design", IEEE Transactions on Computers, vol. 49, pp. 692-701, July 2000.
- [2] Hwang-Cherng Chow and I-Chyn Wey, "A 3.3V 1GHz high speed pipelined Booth multiplier", Proceedings of IEEE ISCAS, vol. 1, pp. 457-460, May 2002.
- [3] S. B. Tatapudi and J. G. Delgado-Frias, "Designing pipelined systems with a clock period approaching pipeline register delay", Proceedings of IEEE MWSCAS, vol. 1, pp. 871-874, Aug. 2005.
- [4] A. D. Booth, "A signed binary multiplication technique", Quarterly J. Mechanical and Applied Math, vol. 4, pp. 236-240, 1951.
- [5] W. C. Yeh and C. W. Jen, "High-speed Booth encoded parallel multiplier design", IEEE Transactions on Computers, vol. 49, pp. 692-701, Jul. 2000.
- [6] C. S. Wallace, "A suggestion for parallel multipliers", IEEE Transactions on Electronic Computers, vol. 13, pp. 14-17, Feb. 1964.
- [7] J. Fadavi-Ardekani, "M × N Booth encoded multiplier generator using optimized Wallace trees", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 1, pp. 120-125, 1993.
- [8] P. J.; De Michelli, G., "Circuit and Architecture Trade for High-Speed Multiplication", IEEE Journal Solid State Circuits, vol. 26, pp. 1184-1198, Sept. 1991.
- [9] V. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and Multipliers in Design of High-Performance Microprocessor Circuits", Book Chapter, Book edited by A. Chandrakasan, IEEE Press, 2000.
- [10] Soojin Kim and Kyeongsoon Cho, "Design of High-speed Modified Booth Multipliers Operating at GHz Ranges", World Academy of Science, Engineering and Technology, 2010.
- [11] B. Ramkumar and Harish M Kittur, "Low-Power and Area-Efficient Carry Select Adder", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, pp. 371-375, Feb. 2012.