# Reduction of Hardware Complexity and Truncation Error by using Fixed Width Baugh Wooley Multiplier

P. Arun Pandian, M.Tech VLSI Design Scholar, Dept. of ECE, Kalasalingam University, Krishnankoil, Virudhunagar Dist., Tamil Nadu, India

T. Vijaya Bharathi, Assistant Professor, Dept. of ECE, Kalasalingam University, Krishnankoil, Virudhunagar Dist., Tamil Nadu, India

#### ABSTRACT

The fixed-width multiplier is attractive for many multimedia and digital signal processing systems that need to maintain a fixed data format and can tolerate a small accuracy loss in the output data. In this paper, we propose a new error compensation circuit for the Baugh-Wooley multiplier that uses the dual-group minor input correction (MIC) vector to lower the input correction vector compensation error. Because the error compensation circuit is constructed mainly from the "outer" partial products, the hardware complexity increases only slightly as the number of multiplier input bits increases. In the proposed 16×16-bit fixed-width multiplier, the truncation error is lower than that of the direct-truncated multiplier and the transistor count is lower than that of the full-length multiplier.

#### **Keywords**

Baugh Wooley Multiplier, AO Block, Modified Half Adder, VHDL Simulation

#### 1. Introduction

In many high-speed digital signal processing (DSP) and multimedia applications, the multiplier plays a very important role because it dominates the chip power consumption and operation speed. In DSP applications, to avoid unbounded growth of the multiplication bit width, we usually have to reduce the number of product bits. Cutting off the n least significant bits (LSBs) of the output yields a fixed-width multiplier with n-bit inputs and an n-bit output. However, truncating the LSB part introduces a large truncation error. Many truncation error compensation techniques have been presented to design an error compensation circuit with less truncation error and less hardware overhead. The compensation methods can be divided into two categories: compensation with a constant correction value and compensation with a variable correction value. The circuit for a constant correction value can be simpler than that for a variable correction value; however, the variable correction approaches are usually more precise. Among the approaches with a variable correction value, an input-dependent method was proposed in the literature that uses probability, statistics, and linear regression analysis to find the approximate compensation value. The error compensation circuit is constructed from the partial product terms with the most significant weight in the least significant segment. The compensation value depends on the inputs and thus yields less truncation error.

A later error compensation algorithm used a binomial distribution, instead of the uniform distribution used earlier, to model the probability of occurrence of the multiplier inputs. This modification gives a more precise error compensation result. Moreover, the compensation vector can be injected directly into the fixed-width multiplier as compensation, without extra compensation logic gates. Therefore, the fixed-width multiplier area can be smaller than that of other truncated multipliers. A two-dimensional conditional estimation method was also proposed to compensate the truncation error based on the dependency among both the partial product terms and the multiplication inputs. The error compensation can be more precise; however, the hardware is too complex. Multiple-input error compensation vector designs were proposed to further enhance the error compensation precision. Instead of setting the same weight for every partial product term in the input correction (IC) vector, they applied different weights to each input correction vector element.

In one fixed-width multiplier design, the "inner" partial products were given a higher weight than the "outer" partial products. The IC vector was divided into two disjoint sets, with dual addition trees computing the error compensation value. In this way, the compensation value approximates the expected result more closely, and the design therefore achieved better error compensation. Recently, this design was further extended: a parallel configurable error compensation circuit was proposed that achieves nearly the same error compensation precision with lower computation delay.

A variable correction that includes partial products of the LSB part was proposed to trade off hardware complexity against error compensation precision. These are the state-of-the-art fixed-width multiplier designs, achieving lower error with efficient hardware. In this paper, we consider the impact of the truncated products with the second most significant bits on the error compensation, which is similar to that approach but with lower hardware complexity. We propose a new error compensation circuit that uses the dual-group minor input correction (MIC) vector to further lower the IC vector compensation error. By utilizing the symmetric property of MIC, the fan-in can be reduced to half and the hardware in up-MIC and down-MIC can be shared. Compared with the state-of-the-art design, the proposed fixed-width multiplier achieves not only lower compensation error but also lower hardware complexity, especially as the number of multiplier input bits increases.

#### 2. ERROR COMPENSATION CIRCUIT DESIGN BY USING THE DUAL-GROUP MINOR INPUT CORRECTION VECTOR

Consider a Baugh-Wooley array multiplier with two unsigned $n$-bit inputs $X$ and $Y$, which are expressed as

$$X = \sum_{i=0}^{n-1} x_i 2^i, \qquad Y = \sum_{j=0}^{n-1} y_j 2^j$$

The multiplication result is the summation of the partial products $x_i y_j$, which is expressed as

$$P = \sum_{k=0}^{2n-1} p_k 2^k = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j 2^{i+j}$$

The full-length $n$-bit unsigned Baugh-Wooley partial product array can be divided into three subsets: the most significant part (MSP), the IC vector, and the less significant part (LSP), as shown in Fig. 1. To evaluate the accuracy of a fixed-width multiplier, we use the difference between the 2n-bit full-length multiplier output and the n-bit fixed-width multiplier output, which is expressed as

$$\varepsilon = P - P_t$$

where $P$ is the output of the complete multiplier and $P_t$ is the output of the fixed-width multiplier. $P_t$ can be expressed as

$$\begin{aligned} P_{t} &= \sum_{j=1}^{n-1} y_{j} 2^{j} \sum_{i=n-j}^{n-1} x_{i} 2^{i} + f(x_{0}y_{n-1}, x_{1}y_{n-2}, \ldots, x_{n-2}y_{1}, x_{n-1}y_{0}) \\ &= \sum_{j=1}^{n-1} y_{j} 2^{j} \sum_{i=n-j}^{n-1} x_{i} 2^{i} + f(IC) \end{aligned}$$



Fig. 1: n-bit Baugh-Wooley multiplier partial product array, consisting of MSP, IC, and LSP
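The partition in Fig. 1 can be illustrated with a short Python sketch. It assumes, consistent with the index ranges in the expression for $P_t$, that the MSP holds the terms with $i + j \ge n$, the IC vector holds the column $i + j = n - 1$, and the LSP holds the rest. The function names are hypothetical and are not taken from the paper. Setting $f(IC) = 0$ gives the direct-truncated multiplier, whose error is simply the discarded IC and LSP contribution.

```python
def partition_partial_products(x: int, y: int, n: int) -> tuple:
    """Return the weighted sums (msp, ic, lsp) of the partial products.

    Assumption: MSP holds terms with i + j >= n, IC holds the column
    i + j == n - 1, and LSP holds every term with i + j < n - 1.
    """
    msp = ic = lsp = 0
    for i in range(n):
        for j in range(n):
            # Partial product bit x_i * y_j, weighted by 2**(i + j).
            term = (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
            if i + j >= n:
                msp += term
            elif i + j == n - 1:
                ic += term
            else:
                lsp += term
    return msp, ic, lsp


def direct_truncation_error(x: int, y: int, n: int) -> int:
    """Error P - P_t when f(IC) = 0, i.e. only the MSP is kept."""
    msp, _, _ = partition_partial_products(x, y, n)
    return x * y - msp
```

For any inputs, the three parts add up to the full product, so the direct-truncation error is never negative.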

Here, $f(IC)$ is the error compensation function. It is approximated as the sum of the input correction vector elements with their corresponding weights. To realize $f(IC)$, the error compensation vector is divided into two disjoint sets, and two addition trees compute the error compensation. The error compensation algorithm can be expressed as

$$f(IC) = \begin{cases} 0, & \text{if } \beta = 0 \\ \beta, & \text{if } x_{n-1}y_0 + x_{n-2}y_1 + x_1y_{n-2} + x_0y_{n-1} \in \{0, 1\} \\ \beta - 1, & \text{if } x_{n-1}y_0 + x_{n-2}y_1 + x_1y_{n-2} + x_0y_{n-1} \in \{2, 3\} \\ \beta - 2, & \text{if } x_{n-1}y_0 + x_{n-2}y_1 + x_1y_{n-2} + x_0y_{n-1} = 4 \end{cases}$$

where $\beta$ is the summation of all partial product terms in the input correction vector. The first addition tree, which handles the lower-weight partial products, is a standard ones counter built from full adders and half adders. The lower-weight partial products of IC are the four outermost partial products ($x_5 y_0$, $x_4 y_1$, $x_1 y_4$, and $x_0 y_5$ in the 6-bit multiplier), which have a weight of $2^n$ in the error compensation. The second addition tree uses modified half adders (mHAs) to take into account the contribution of the higher-weight partial products. The higher-weight partial products of IC are the remaining inner partial products ($x_3 y_2$ and $x_2 y_3$ in the 6-bit multiplier), which have a weight of $2^{n-1}$ in the error compensation. In [6], the mHA differs from the standard HA in that, when inputs A and B are both 1, the mHA produces sum = 1 and cout = 1, whereas the standard HA produces sum = 0 and cout = 1.
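As a rough illustration of the piecewise rule for $f(IC)$ and of the half-adder variants described above, the following Python sketch evaluates both at the bit level. It is a behavioral model only, not the paper's circuit: the function names are hypothetical, $n \ge 4$ is assumed so that the four outer terms are distinct, and the mHA is modeled solely from the stated property that inputs (1, 1) give sum = 1 and cout = 1.

```python
def ic_bits(x: int, y: int, n: int) -> list:
    """Partial products x_i * y_(n-1-i) of the IC column, indexed by i."""
    return [((x >> i) & 1) & ((y >> (n - 1 - i)) & 1) for i in range(n)]


def f_ic(x: int, y: int, n: int) -> int:
    """Evaluate the piecewise compensation rule as stated (assumes n >= 4)."""
    bits = ic_bits(x, y, n)
    beta = sum(bits)  # number of ones in the whole IC vector
    if beta == 0:
        return 0
    # Four outermost terms: x_0 y_(n-1), x_1 y_(n-2), x_(n-2) y_1, x_(n-1) y_0.
    outer = bits[0] + bits[1] + bits[n - 2] + bits[n - 1]
    if outer <= 1:
        return beta
    if outer <= 3:
        return beta - 1
    return beta - 2


def half_adder(a: int, b: int) -> tuple:
    """Standard half adder: returns (sum, cout)."""
    return a ^ b, a & b


def modified_half_adder(a: int, b: int) -> tuple:
    """mHA model: same as the HA except that (1, 1) gives sum = 1, cout = 1."""
    return a | b, a & b
```

The two adders differ only for the input pair (1, 1), where the mHA output represents the value 3 instead of 2.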

#### 3. PROPOSED ERROR COMPENSATION METHOD

The state-of-the-art designs achieve the most precise error compensation with efficient hardware among the previously published fixed-width multipliers. However, some compensation errors with $|\varepsilon| > 2^{n-1}$ still exist in [6]. The compensation errors can be divided into two categories. The first type is caused by insufficient error compensation, in which the output $P_t$ is smaller than the ideal value $P$; in this case, $\varepsilon = P - P_t > 0$. The second type is due to over-compensation, in which the output is larger than the ideal value; in this case, $\varepsilon = P - P_t < 0$. To balance approximation error against circuit complexity, we mainly aim at the cases of $|\varepsilon| > 2^{n-1}$ in this paper. The weight of the IC compensation circuit is $2^n$. We cannot correct all the cases of $|\varepsilon| > 2^{n-1}$ effectively if we use only the partial product terms in IC to construct the error compensation function. Therefore, in this paper we adopt IC together with MIC, where MIC is the partial product vector with the MSB of LSP, to amend the error compensation value of $f(IC)$. In this way, the cases of $|\varepsilon| > 2^{n-1}$ can be reduced effectively.
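The two error categories above can be stated compactly in code. The sketch below is only an illustration: the function name is hypothetical, and the threshold is read as $2^{n-1}$, which is an assumption about the flattened exponent in the text.

```python
def classify_compensation_error(p: int, p_t: int, n: int) -> tuple:
    """Return (kind, is_large) for the error eps = P - P_t.

    kind is "under" when eps > 0 (insufficient compensation), "over" when
    eps < 0 (over-compensation), and "exact" when eps == 0.
    is_large flags |eps| > 2**(n - 1), the cases targeted in this section.
    """
    eps = p - p_t
    kind = "under" if eps > 0 else "over" if eps < 0 else "exact"
    return kind, abs(eps) > (1 << (n - 1))
```

Both arguments are taken on the same scale here, so a fixed-width output would first be aligned with the full-length product before the comparison.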

The IC compensation circuit is constructed from dual IC compensation trees: the "inner" partial products with higher compensation weight and the "outer" partial products with lower compensation weight. According to the relation between IC and $S_{avg}(IC)$ for the cases of $|\varepsilon| > 2^{n-1}$ in Table I, the average compensation errors in the outer part and the inner part are nearly the same: 0.0285 in the outer part and 0.0300 in the inner part. Here $S_{avg}(IC)$ is the average value of the sum of the IC and LSP partial products. However, the number of higher-weight partial product terms increases with the number of bits, while the number of lower-weight partial product terms is fixed. Therefore, we analyze only the lower-weight error compensation tree to find out the cases of $|\varepsilon| > 2^{n-1}$. The performance of the multiplier can be increased by reducing the carry delay through the half-adder array. To further reduce the compensation errors, we combine IC with MIC to correct the under-compensated and over-compensated cases. Similarly, a variable correction that includes more partial product columns of the LSB part is proposed in [10] to enhance error compensation precision; however, the hardware complexity increases accordingly.



Fig. 2: Baugh-Wooley multiplier partial product array

In our proposed design, we adopt the dual-group MIC vector to further lower the compensation errors in [8] with lower hardware complexity. To further reduce the circuit complexity, we apply De Morgan's law to simplify the proposed error compensation circuit in $C_{n-1}$ and $C_n$. After simplification through De Morgan's law and hardware sharing, the transistor count of the proposed error compensation circuit is reduced from 62 to 40. We then combine IC with MIC to adjust the function $f(IC)$ so that the compensation error falls below $2^{n-1}$. In this way, the compensation error can be lowered more efficiently.

To find a precise error correction vector, we analyze the sum of total errors in the cases of $|\varepsilon| > 2^{n-1}$ and $|\varepsilon| < 2^{n-1}$ under various $\beta$ values in accordance with the compensation algorithm in (5). To achieve an efficient error correction, we amend the error compensation function $f(IC)$ only in the cases where the total error summation value of $|\varepsilon| > 2^{n-1}$ is larger than that of $|\varepsilon| < 2^{n-1}$.

The Baugh-Wooley multiplier partial product array consists of MSP, IC, and LSP, in which MIC is the partial product vector with the most significant bits of LSP. By comparing the error summation value of $|\varepsilon| > 2^{n-1}$ with that of $|\varepsilon| < 2^{n-1}$ in Table II, it can be observed that some under-compensated errors occur when $\beta = 2$ and $\beta = 4$. As a result, we combine IC with MIC to correct the under-compensated situations in the cases of $\beta = 2$ and $\beta = 4$. In the case of $\beta = 1$, there exist some over-compensation errors. However, the total error summation value of $|\varepsilon| > 2^{n-1}$ is about the same as that of $|\varepsilon| < 2^{n-1}$. We therefore combine IC with MIC to correct the over-compensation in the case of $\beta = 1$ and $S_{ICh} \neq 0$, rather than in the case of $\beta = 1$ alone.

Fig. 3: Baugh-Wooley multiplier for 16 bits

This restriction is used because, in that case, the error summation value of $|\varepsilon| > 2^{n-1}$ is much lower. Here $S_{ICh}$ is the summation of the IC terms with higher weight, which can be written as

$$S_{ICh} = x_{n-3}y_2 + x_{n-4}y_3 + \ldots + x_3y_{n-4} + x_2y_{n-3}$$
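A small Python sketch of this sum may help make the index pattern concrete. It assumes the higher-weight terms are the inner IC partial products $x_i y_{n-1-i}$ for $i = 2, \ldots, n-3$, which matches the 6-bit example ($x_3 y_2$ and $x_2 y_3$) given earlier; the function name is hypothetical.

```python
def s_ich(x: int, y: int, n: int) -> int:
    """Sum of the inner IC partial products x_i * y_(n-1-i), i = 2 .. n-3."""
    return sum(
        ((x >> i) & 1) & ((y >> (n - 1 - i)) & 1)
        for i in range(2, n - 2)
    )
```

The number of terms is $n - 4$, so this sum grows with the input width while the four outer terms stay fixed.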

The lower unit with the second most significant bits of the truncated partial products is adopted as the minor input correction (MIC) vector to reduce the compensation error, which is defined as

$$MIC = \{x_{n-2}y_0, x_{n-3}y_1, \ldots, x_1y_{n-3}, x_0y_{n-2}\}$$

In general, achieving a lower compensation error requires a more complex compensation algorithm and more complicated circuit hardware. In this paper, we combine IC with MIC to adjust the function $f(IC)$ to lower the compensation error. We also analyze only the lower-weight error compensation tree to find out the cases of $|\varepsilon| > 2^{n-1}$ in our proposed design. Therefore, the circuit complexity of most of the error compensation circuit is fixed and does not increase with the input bit number. As a result, the error compensation circuit can be relatively simple, especially as the input bit number increases. As illustrated in Fig. 5, the transistor count grows more gently with the input bit number of the fixed-width multiplier in our proposed design. Although our proposed design spends more transistors in the 16-bit fixed-width multiplier, it spends fewer transistors when the input bit number is larger than eight. The area-efficiency advantage of our design becomes more obvious as the input bit number increases.



Fig. 4: MIC is divided into up-MIC, medium term, and down-MIC

Based on Fig. 4, we define $S_{\text{up-MIC}}$ and $S_{\text{down-MIC}}$ as the summations of up-MIC and down-MIC, respectively, which can be expressed as

$$S_{\text{up-MIC}} = x_{n-2}y_0 + x_{n-3}y_1 + \ldots + x_{n/2}y_{(n/2)-2}$$

$$S_{\text{down-MIC}} = x_{(n/2)-2}y_{n/2} + \ldots + x_1y_{n-3} + x_0y_{n-2}$$
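The two sums can be sketched in Python as follows. The sketch assumes an even $n$, takes the MIC column to be the terms $x_i y_{n-2-i}$, and treats the single remaining term $x_{(n/2)-1} y_{(n/2)-1}$ as the medium term of Fig. 4; that last point is an inference from the index ranges, and the function name is hypothetical.

```python
def mic_sums(x: int, y: int, n: int) -> tuple:
    """Return (S_up_MIC, medium term, S_down_MIC) for even n.

    The MIC column is assumed to hold x_i * y_(n-2-i) for i = 0 .. n-2.
    """
    def pp(i: int) -> int:
        # Partial product bit x_i * y_(n-2-i).
        return ((x >> i) & 1) & ((y >> (n - 2 - i)) & 1)

    half = n // 2
    s_up = sum(pp(i) for i in range(half, n - 1))  # x_(n-2)y_0 .. x_(n/2)y_(n/2-2)
    medium = pp(half - 1)                          # x_(n/2-1) y_(n/2-1)
    s_down = sum(pp(i) for i in range(half - 1))   # x_0 y_(n-2) .. x_(n/2-2) y_(n/2)
    return s_up, medium, s_down
```

Swapping the two inputs swaps the up and down sums, which reflects the symmetry that allows hardware to be shared between up-MIC and down-MIC.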

#### 4. CONCLUSION

This paper has presented a low-error and area-efficient 16×16 fixed-width Baugh-Wooley multiplier that uses the dual-group minor input correction vector. Compared with the state-of-the-art design, the proposed fixed-width multiplier achieves not only lower compensation error but also lower hardware complexity, especially as the number of multiplier input bits increases.

#### 5. REFERENCES

- [1] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-efficient multipliers for digital signal processing applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, Feb. 1996, vol. 43, no. 2, pp. 90–95.
- [2] J. M. Jou, S. R. Kuang, and R. D. Chen, "Design of low-error fixed-width multipliers for DSP applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, Jun. 1999, vol. 46, no. 6, pp. 836–842.
- [3] S. J. Jou and H. H. Wang, "Fixed-width multiplier for DSP application," in Proc. IEEE Int. Symp. Comput. Design, 2000, pp. 318–32.
- [4] F. Curticapean and J. Niittylahti, "A hardware efficient direct digital frequency synthesizer," in Proc. IEEE Int. Conf. Electron., Circuits, Syst., 2001, vol. 1, pp. 51–54.
- [5] A. G. M. Strollo, N. Petra, and D. D. Caro, "Dual-tree error compensation for high performance fixed-width multipliers," IEEE Trans. Circuits Syst. II, Exp. Briefs, Aug. 2005, vol. 52, no. 8, pp. 501–507.
- [6] Y. C. Liao, H. C. Chang, and C. W. Liu, "Carry estimation for two's complement fixed-width multipliers," in Proc. Workshop Signal Process. Syst., 2006, pp. 345–350.
- [7] S. R. Kuang and J. P. Wang, "Low-error configurable truncated multipliers for multiply-accumulate applications," Electron. Lett., Aug. 2006, vol. 42, no. 16, pp. 904–905.
- [8] N. Petra, D. D. Caro, V. Garofalo, N. Napoli, and A. G. M. Strollo, "Truncated binary multipliers with variable correction and minimum mean square error," IEEE Trans. Circuits Syst. I, Reg. Papers, Jun. 2010, vol. 57, no. 6, pp. 1312–1325.
- [9] J. P. Wang and S. R. Kuang, "High-accuracy fixed-width modified Booth multipliers for lossy applications," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Jan. 2011, vol. 19, no. 1.