# Efficient Hardware Reduction of FIR Filter for Parallel Data in Mixed Signal Processing

## Jayavrinda Vrindavanam

Dept. of Electronic & Comp. Engineering, Caledonian College of Engineering, Muscat, Sultanate of Oman
Jayavrindav@gmail.com

## Resel Parameswaran

Dept. of Electronic & Comp. Engineering, Caledonian College of Engineering, Muscat, Sultanate of Oman

## **ABSTRACT**

The paper presents a novel structure for the design and implementation of a hardware-reduced FIR filter for parallel data processing in the mixed signal processing domain. Supported by a review of the literature, the paper demonstrates that the proposed methodology is superior and more economical in hardware terms, and that it can be applied to biomedical and audio signals. The results show improved performance and reduced cost, which has practical implications for such applications.

#### **General Terms**

Subband, passband, lowpass filter, mixed signal domain, ADC, DAC, FIR, Hardware reduction, parallel data in mixed signal processing, efficient hardware reduction.

#### **Keywords**

Mixed Domain, Hardware FIR, Multipliers, adders, integrator, summing, scaling resistors, decimation algorithm, stopband, and sinusoidal wave.

## 1. INTRODUCTION

The traditional finite impulse response (FIR) filter has evolved in the digital domain, built from an n-bit analog-to-digital converter (ADC), a number of delays, multipliers and adders. In this methodology, the multipliers and adders are the most complex part of the processing. In each block, the number of bits required depends on the ADC resolution; i.e., if an eight-bit ADC is used, the delays, adders and multipliers must each handle the same number of bits. As a result, the hardware requirement grows along with the complexity. Further, if the filter is implemented with a processor, the software solution cannot be reduced beyond a certain level.

Based on the above premise, the authors earlier introduced a single-bit approach for the efficient hardware reduction of FIR filters in the mixed signal processing domain [1]. In this paper, an attempt has been made to eliminate the hardware complexity by using intermediate D/A conversions and by removing the multipliers from the digital domain. In the proposed approach, the traditional circuit is replaced by an R-2R network and a scaled summing amplifier. The proposed system is designed to work in a mixed domain, which we consider a novel approach to address the existing complexities.

"Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than IJCA must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, needs an acknowledgement to IJCA."

After this brief introduction, Section 2 reviews the current literature in this area. Section 3 explains the conventional FIR filter and Section 4 elaborates the proposed system. Section 5 presents the results and discussion, and Section 6 covers the conclusions and future work.

## 2. REVIEW OF THE LITERATURE

As a critical part of both theoretical advancement and implementation, FIR filter design continues to be an active area of ongoing research. Many studies have looked into hardware simplification and software ease from this perspective. Gentili and Piazza [12] presented an efficient genetic approach to the design of digital FIR filters with coefficients constrained to be sums of power-of-two terms. To obtain such efficiency (reduction of computational cost and improvement in performance), a specific filter coefficient coding scheme was studied and implemented. The results were compared experimentally with other techniques, showing that the proposed technique attains results as good as or better than the other methods. Moreover, the authors showed that it can be easily implemented on parallel hardware. The paper therefore demonstrated how to reduce computational cost and improve performance, though much simplification of the filter hardware could not be achieved.

Vinod [2], after reviewing specific previous studies, suggested that the vertical common sub-expression elimination (CSE) method does not guarantee hardware reduction over the conventional horizontal CSE method in practical linear-phase FIR filter implementations. As an alternative, the paper introduced a revised methodology to implement FIR filters with a minimum number of adders by efficiently combining horizontal and vertical common sub-expressions. Shahramian and Carusone [3] performed partial equalization within the pipelined A/D converter, thus reducing the number of taps required for the FIR filter. The paper showed a saving of 54 flip-flops at the cost of -32 dB of SNR error for a typical Ethernet channel operator.

Our literature review reveals that Chao and Parhi have made a considerable contribution in this field with their propositions for hardware reduction. The authors [4] introduced an iterated short convolution (ISC) algorithm, based on the mixed radix algorithm and the fast convolution algorithm. This ISC-based linear convolution structure is transposed to obtain a new hardware-efficient fast parallel FIR filter structure, which saves a large amount of hardware cost, especially when the length of the FIR filter is large. The paper stated that, for a 576-tap filter, the proposed structure saves 17% to 42% of the multiplications, 17% to 44% of the delay elements, and 3% to 27% of the additions relative to prior fast parallel structures, when the level of parallelism varies from 6 to 72. In another paper [5], the authors introduced a new scheme to further reduce the complexity of parallel FIR filter structures by proposing an 'iterative short convolution' algorithm. The linear convolution structure for the FIR filter is used as a processing core to implement the subfilters instead of the parallel FIR filter structures. The authors showed good hardware savings with the new method. In an extension study [6], the authors proposed a new parallel FIR filter structure with less hardware complexity. In this case, the subfilters in the parallel FIR structures are replaced by a second-stage parallel FIR filter. The authors demonstrated that the 2-stage parallel FIR filter structures can efficiently reduce the number of required multiplications and additions at the expense of delay elements. For a 32-parallel 1152-tap FIR filter, the second-stage parallel FIR can save 5184 multiplications (67%) and 2612 additions (30%) compared to previous parallel FIR structures, at the expense of 10089 delay elements (-133%). Though the approach has brought a considerable level of hardware reduction, the digital-domain structure continues to pose processing complexity, as the parallel adders still exist.

Hai et al. [7] studied the delay properties of uniformly modulated FIR filterbanks. With the perfect reconstruction conditions relaxed, the paper developed conditions on the permissible delay for a certain class of filterbanks and then derived accurate linear approximations for the phase and group delay of the total filterbank. Further, the paper proposed a tractable quadratic optimization problem for the design of optimal analysis and synthesis filter prototypes. This supported the development of a new algorithm that solves the optimization problem for the analysis and synthesis filterbanks simultaneously. Dusan [8] derived a lower bound on filter degradation by considering the general case of a length-N filter with a discrete set of allowable coefficients. The paper presented a theorem that gives the lower bound on the increase in minimax approximation error caused by the finite word length restriction; it thus gives a theoretical limit on the performance of an FIR filter of a given length N, and the author demonstrated that it can also be used to significantly reduce the amount of computation in algorithms for optimal finite word length FIR filter design. Mehboob et al. [9], in a recent paper, presented yet another method of FIR filter design for optimized hardware implementation. The paper analysed the effects of quantization on the frequency response of a filter by successively reducing the number of bits in each coefficient, and then introduced a methodology for the optimized design of an FIR filter for hardware implementation. The results showed that a significant reduction in hardware resources can be achieved using this methodology. However, the focus of the study was on reducing the number of bits used to represent the coefficients, and the process complexity still exists. Tayab et al. [10], in a recent work, examined the performance-area trade-offs in the design of a short word length FIR filter for a Sigma-Delta modulated FIR filter designed with varying quantization levels. The paper considered the trade-offs between hardware area and performance at varying quantization levels and at oversampling ratios (OSR) of 32 and 64. Using a low-cost FPGA device, the SQNR of the filter may be increased by 6 dB at the cost of increased hardware but with a reduction in FMAX of only about 10 percent. Typically, each doubling of the OSR increases SQNR by over 9 dB at the cost of a doubling in hardware area. As is evident, hardware reduction was not the key focus of this study.

Tsao and Choi [11], with the support of fast FIR algorithms (FFA), introduced new parallel FIR filter structures that benefit symmetric convolutions in terms of hardware cost. The proposed parallel FIR structures exploit the inherent symmetry of the coefficients, halving the number of multipliers in the subfilter section at the expense of additional adders in the preprocessing and post-processing blocks. The paper stated that exchanging multipliers for adders is advantageous because adders weigh much less than multipliers in terms of silicon area; moreover, the overhead from the additional adders in the preprocessing and post-processing blocks stays fixed and does not increase with the length of the FIR filter, whereas the number of multipliers saved increases with the filter length. The authors stated that the proposed parallel FIR structures can lead to significant hardware savings for symmetric coefficients over the existing FFA parallel FIR filter, especially when the length of the filter is large.

As is evident from the literature, although considerable progress has been made in hardware and software reduction, the frontiers of efficiency in hardware reduction have not moved beyond certain limits. The authors attribute the main limiting factor to the mono-domain approach. The paper therefore proposes an alternative, mixed-domain approach to achieve better results and redraw the efficiency frontiers. The proposed methodology is explained in the ensuing sections.

## 3. CONVENTIONAL FIR FILTER

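In the normal FIR filter of order N, the output is

$$y(n) = \sum_{k=0}^{N} b_k\, x(n-k)$$

In the above equation, y(n) is the output sample value at each sampling time, x(n-k) is the input delayed by k samples and $b_k$ are the tap coefficients. The normal FIR algorithm is shown in Fig 1.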

Fig 1: Conventional FIR block diagram

As explained in the introductory section, the traditional FIR filter has evolved in the digital domain with an n-bit ADC, a number of delays, multipliers and adders. In each block, the number of bits required depends on the ADC resolution; i.e., if an eight-bit ADC is used, the delays, adders and multipliers must handle the same number of bits, so the hardware requirement keeps increasing and the complexity grows with it. With an odd number of taps, the coefficients are symmetric on both sides of the middle tap (MT). Each delayed waveform is exactly the same as the input, with equally spaced phase delays with respect to the middle tap, such that the left taps (LTs) lead and the right taps (RTs) lag in phase. The addition of all the weighted (coefficient-multiplied) lead and lag signals results in a signal that is either in phase or 180 degrees out of phase with the middle tap. The filter output is the sum of the three contributions, MT + LTs + RTs. Depending on the required frequency response, the coefficients are derived so as to maintain the desired passband and stopband levels. The paper attempts to address the trade-off between complexity and efficiency through the proposed design structure, which replaces the multipliers and adders entirely with a resistive network and a summing amplifier in a mixed domain.
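The multiply-accumulate structure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `fir_direct_form` is hypothetical, inputs before the first sample are assumed to be zero, and the coefficients are the symmetric lowpass values listed in Table 1.

```python
def fir_direct_form(samples, coeffs):
    """Direct-form FIR: y(n) = sum_k coeffs[k] * x(n - k), with x(n) = 0 for n < 0."""
    delay_line = [0.0] * len(coeffs)  # holds x(n), x(n-1), ..., x(n-N)
    output = []
    for x in samples:
        # Shift the delay line by one sample and insert the newest input.
        delay_line = [x] + delay_line[:-1]
        # One multiplier per tap, followed by the adder chain.
        output.append(sum(b * d for b, d in zip(coeffs, delay_line)))
    return output


# Symmetric coefficients from Table 1 (middle tap at index 4).
TABLE1_COEFFS = [0.0078, 0.0393, 0.1009, 0.1687, 0.1989,
                 0.1687, 0.1009, 0.0393, 0.0078]

# Feeding a unit impulse reads the coefficients back out, one per sample.
impulse = [1.0] + [0.0] * 8
impulse_response = fir_direct_form(impulse, TABLE1_COEFFS)
```

Every tap here costs one multiplication and one addition per output sample, which is the hardware the proposed design sets out to replace.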

## 4. PROPOSED FIR DESIGN

The FIR filter equation for order N is

$$y(n) = b_0 x(n) + b_1 x(n-1) + \dots + b_N x(n-N),$$

where $b_k$, $k = 0, 1, 2, \dots, N$, are the tap coefficients.

The continuous stream of tap samples in the time domain can be represented as follows:

$$\int_{0}^{\infty} y(n) = \int_{0}^{\infty} b_0 x(n) + \int_{0}^{\infty} b_1 x(n-1) + \dots + \int_{0}^{\infty} b_N x(n-N)$$

It can be discerned from the above that every term on the RHS is the input signal multiplied by the corresponding coefficient and delayed by one further sample time relative to the preceding term. The LHS shows that a single common integrator at the output can produce this integrating effect for every tap. The equation also shows that the output sinusoidal wave is the sum of the sinusoidal waves coming out of all the taps, each with an amplitude decided by the corresponding coefficient. The tap outputs differ only in phase: they are equally phase shifted, and the phase shift is directly proportional to the input frequency because the delay remains constant. A simple R/2R network can convert the digitized and delayed tap data into analogue form, and an integrator will change it from discrete to continuous.

In Fig 1, with a multi-bit ADC, each delayed parallel data output is converted into analog form using a DAC. The simple, traditional R/2R network in Fig 2 can take the place of a dedicated DAC. The figure shows that the data at every stage of the delay line in the conventional FIR is applied to the input data lines d0, d1, ..., d7 of an R/2R network and converted to analog. For demonstration, 5 kΩ is used for R and 10 kΩ for 2R.



Fig 2: R/2R network for 8-bits DAC
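The conversion performed by the R/2R network can be sketched numerically. The example below is an idealised model under stated assumptions, since Fig 2 is not reproduced here: a voltage-mode ladder with a 2R termination to ground at the LSB end, ideal switches, an unloaded output and a 5 V reference (`v_ref`, `r2r_ladder_output` and `code_to_bits` are hypothetical names). Each stage is reduced to its Thevenin equivalent in turn.

```python
def r2r_ladder_output(bits_lsb_first, v_ref=5.0, r=5_000.0):
    """Open-circuit output voltage of an ideal voltage-mode R-2R ladder.

    bits_lsb_first: iterable of 0/1 values, d0 first.
    r: ladder resistance R in ohms (2R branches are derived from it).
    """
    two_r = 2.0 * r
    v_th, r_th = 0.0, two_r  # 2R termination to ground at the LSB end
    for i, bit in enumerate(bits_lsb_first):
        if i > 0:
            r_th += r  # series R between adjacent ladder nodes
        v_bit = v_ref if bit else 0.0
        # Merge the running Thevenin source with this bit's 2R branch.
        v_th = (v_th * two_r + v_bit * r_th) / (r_th + two_r)
        r_th = (r_th * two_r) / (r_th + two_r)
    return v_th


def code_to_bits(code, n_bits=8):
    """Split an integer code into bits d0..d(n_bits-1), LSB first."""
    return [(code >> i) & 1 for i in range(n_bits)]


mid_scale = r2r_ladder_output(code_to_bits(128))   # MSB only
full_scale = r2r_ladder_output(code_to_bits(255))  # all bits set
```

Under these assumptions the ladder output follows `v_ref * code / 2**n_bits`, so the absolute value of R only sets the output impedance, not the conversion result.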

Fig 3 and Table 1 show the weighted FIR taps and the coefficient values rounded to four decimal places, respectively. The figure shows the summed filter output for a 1 kHz sinusoidal input, for a filter of order N = 8 with Fs = 10 kHz, passband maximum frequency = 100 Hz, stopband minimum frequency = 3 kHz with 80 dB attenuation, and a ripple level of 1 dB.



Fig 3: Integrated delayed & weighted taps and filter output

Table 1. Resistance values corresponding to the Coefficients

| Coefficients    | Resistors (for RF = 1k) |
|-----------------|----------------------|
| 0.0078          | 128.21k              |
| 0.0393          | 25.45k               |
| 0.1009          | 9.9k                 |
| 0.1687          | 5.93k                |
| 0.1989(mid tap) | 5.03k                |
| 0.1687          | 5.93k                |
| 0.1009          | 9.9k                 |
| 0.0393          | 25.45k               |
| 0.0078          | 128.21k              |

Resistor values corresponding to the coefficients for an FIR filter of N = 8, sampling frequency fs = 10 kHz, passband maximum f = 100 Hz, stopband minimum f = 3 kHz, attenuation 80 dB and ripple 1 dB.

The integrated samples emanating from each delay block are the same as the input signal, with the corresponding scaling and time delay. When the input frequency $(f_{in})$ changes, the time delay $(t_d)$ remains constant, whereas the phase delay $(\alpha)$ varies linearly with frequency. The following equation represents the relationship between phase delay and time delay:

$$\alpha = 2 \pi f_{in} t_d$$
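As a small numerical illustration of this relationship, the sketch below evaluates the phase lag at each tap. It assumes that every delay block holds its data for exactly one sampling period, so that tap k has $t_d = k / f_s$; the function name `tap_phase_delays_deg` is hypothetical.

```python
import math


def tap_phase_delays_deg(f_in, fs, n_taps):
    """Phase lag of each tap in degrees: alpha = 2*pi*f_in*t_d, with t_d = k/fs."""
    return [math.degrees(2.0 * math.pi * f_in * (k / fs)) for k in range(n_taps)]


# Nine taps (N = 8) at fs = 10 kHz, for two input frequencies.
phases_1k = tap_phase_delays_deg(f_in=1_000.0, fs=10_000.0, n_taps=9)
phases_2k = tap_phase_delays_deg(f_in=2_000.0, fs=10_000.0, n_taps=9)
```

With a 1 kHz input sampled at 10 kHz, adjacent taps are 36 degrees apart; doubling the input frequency doubles every phase value while the time delays stay the same.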

Generally, in a digital filter system, memory is an essential requirement to store the coefficients. The delayed tap samples are multiplied by the corresponding coefficients and added together, using either discrete dedicated hardware multipliers and adders or multiplier-accumulator (MAC) processors. The complete multiplication and addition process can be replaced with a summing amplifier, since coefficient multiplication is nothing but scaling the amplitude of each delayed (phase-shifted) signal. This can be achieved in a summing amplifier in which the gain of each input equals the corresponding coefficient value. If a coefficient is negative, the inverted tap (IQ) is multiplied by the positive coefficient, or the tap is applied to the other input of the summer. Another way of inverting the tap is to invert the MSB of the signed-magnitude formatted ADC data. Thus, the coefficients can be represented with attenuation resistors as shown in Fig 4.



Fig 4: Analog Summing Amplifier with scaling resistors

In the proposed design, the coefficients are stored as resistive gain-setting networks. The resistance values $R_0, \dots, R_n$ are calculated using the following equation, where $R_F$ is the feedback resistor of the summing amplifier:

$$R_n = R_F / b_n$$
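The resistor calculation can be checked with a short script. This is only a sketch: the helper name `coefficient_resistors` is hypothetical, and taking the magnitude of each coefficient is an assumption that matches the tables, where negative coefficients are also listed with positive resistances.

```python
def coefficient_resistors(coeffs, r_f=1_000.0):
    """Input resistor for each tap in ohms, R_n = R_F / |b_n|."""
    return [r_f / abs(b) for b in coeffs]


# First half of the symmetric Table 1 coefficients, ending at the middle tap.
TABLE1_HALF = [0.0078, 0.0393, 0.1009, 0.1687, 0.1989]

# Convert to kilo-ohms and round to two decimals for comparison with Table 1.
resistors_kohm = [round(r / 1_000.0, 2)
                  for r in coefficient_resistors(TABLE1_HALF)]
```

The rounded results can be compared directly with the Table 1 column for RF = 1k; the third entry comes out as 9.91k, which the table lists as 9.9k.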

The equation of the inverting summing amplifier can be used to calculate the resistors corresponding to the coefficients. Table 2 lists both negative and positive coefficients. The resistance values are calculated with the above equation; to realise the negative coefficients, the corresponding resistors are connected to the inverting terminal of the summing amplifier, as shown in Figure 5.

Table 2. Resistance values corresponding to the coefficients

| Coefficients    | Resistors (for RF = 1k)  |
|-----------------|--------------------------|
| -0.0247         | 40.49k                   |
| 0.0684          | 14.62k                   |
| -0.1274         | 7.85k                    |
| 0.1792          | 5.59k                    |
| 0.8002(mid tap) | 1.25k                    |
| 0.1792          | 5.59k                    |
| -0.1274         | 7.85k                    |
| 0.0684          | 14.62k                   |
| -0.0247         | 40.49k                   |

Resistor values corresponding to the coefficients for an FIR filter of N = 8, fs = 10 kHz, passband maximum f = 3 kHz, stopband minimum f = 5 kHz, attenuation 60 dB, ripple 1 dB.
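The way signed coefficients map onto a resistor-weighted summing stage can be illustrated with an idealised model. The sketch below assumes an ideal amplifier in which each tap contributes a gain of magnitude $R_F / R_n$, with the sign set by which input the tap is routed to; any overall inversion of a practical inverting stage is ignored, and `summing_amplifier` is a hypothetical name.

```python
def summing_amplifier(tap_voltages, coeffs, r_f=1_000.0):
    """Ideal weighted sum: each tap is scaled by R_F / R_n and signed by its input."""
    out = 0.0
    for v, b in zip(tap_voltages, coeffs):
        r_n = r_f / abs(b)   # resistor value as tabulated
        gain = r_f / r_n     # magnitude of the per-tap gain
        # Negative coefficients are modelled by routing the tap to the opposite input.
        out += gain * v if b >= 0 else -gain * v
    return out


# Coefficients from Table 2 (middle tap at index 4).
TABLE2_COEFFS = [-0.0247, 0.0684, -0.1274, 0.1792, 0.8002,
                 0.1792, -0.1274, 0.0684, -0.0247]

# With 1 V on every tap, the output equals the sum of the coefficients (DC gain).
dc_output = summing_amplifier([1.0] * 9, TABLE2_COEFFS)
```

Only the resistor ratios enter the result, which is why the coefficient set can be stored purely as a resistor network.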

### 4.1. Hardware Design of the Proposed System

Based on the proposed methodology, the complete hardware-reduced FIR filter for parallel data processing is now explained. It consists of delay lines, an R-2R network, an analog summing amplifier with scaling resistors and an optional integrator/lowpass filter. As is evident from the methodology, the proposed system can work for all types of filters. As an experimental case, the working of the proposed design is explained for a lowpass filter (LPF) (Figure 5). For an 8-bit ADC, sampling is done at 10 kHz (sampling frequency 'fs'). For an input with a maximum amplitude of 5 V, delta is chosen to be 0.02 V, which is equivalent to 8-bit quantization; that is, the quantization level is $5\,\mathrm{V}/(2^n - 1) = 5\,\mathrm{V}/(2^8 - 1) \approx 0.02\,\mathrm{V}$.
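For reference, the quantization step quoted above can be worked out explicitly. The symbols $\Delta$ (step size) and $V_{max}$ (maximum input amplitude) are introduced here only for this illustration:

$$\Delta = \frac{V_{max}}{2^n - 1} = \frac{5\,\mathrm{V}}{2^8 - 1} = \frac{5\,\mathrm{V}}{255} \approx 0.0196\,\mathrm{V} \approx 0.02\,\mathrm{V}$$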



Fig 5: Block Diagram of the proposed system.

In the proposed system, if the clock frequency of the delay block is divided by an oversampling factor, a decimation filter can be designed as illustrated in Figure 6. The decimation filtering method can be used for LPFs when the cut-off frequency is much lower than fs/2. In such a case, fs can be regarded as an oversampled rate, which eliminates aliasing. For a decimation factor 'M', down-sampling is applied by dividing the clock of the delay lines by the same factor 'M'. Figure 6 thus shows the proposed system with the modified decimation algorithm.



Fig 6: Lowpass FIR Decimation filter block diagram
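One possible reading of the clock-divided delay line is sketched below. It is an assumption-laden illustration, since Figure 6 is not reproduced here: the delay line is modelled as latching every M-th input sample, so adjacent taps sit M input-sample periods apart and one output is produced per latched sample. The analog integrator is not modelled, and `clock_divided_fir` is a hypothetical name.

```python
def clock_divided_fir(samples, coeffs, m):
    """FIR whose delay line is clocked at fs / m.

    The delay line only sees every m-th input sample, so the filter runs
    at the reduced rate and returns one output per latched sample.
    """
    latched = samples[::m]  # samples captured by the slower delay-line clock
    out = []
    for n in range(len(latched)):
        acc = 0.0
        for k, b in enumerate(coeffs):
            if n - k >= 0:  # taps that are not yet filled contribute zero
                acc += b * latched[n - k]
        out.append(acc)
    return out


# Example: a 4-tap averaging filter with the delay-line clock divided by 4.
ramp = [float(i) for i in range(40)]
decimated = clock_divided_fir(ramp, [0.25] * 4, m=4)
```

With `m=1` the function reduces to the ordinary FIR at the full sampling rate, which gives a simple way to compare the two modes on the same input.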

## 5. RESULTS AND DISCUSSION

The results show that the output frequency response of the designed system matches that of a normal FIR filter of the same order in the digital domain. The waveforms shown in Figure 3 are obtained at each tap point and at the output point for the LPF FIR with the specification given in Section 4. The frequency response is shown in Figure 7 for a passband of 100 Hz and a stopband frequency of 3000 Hz, with an attenuation of 80 dB and a ripple of 1 dB. Figure 8 shows the output response for filter order N = 8, Fs = 10 kHz, passband maximum frequency = 3000 Hz and stopband minimum frequency = 5 kHz, with 60 dB attenuation and a ripple level of 1 dB. An analogue integrator at the output can smooth the waveform at the end stage. An additional ADC is required if further processing is to continue in the digital domain.



Fig 7: Frequency response curve of FIR Lowpass filter (PB-100Hz-SB-3000Hz, Attenuation 80dB-ripple 1dB).



Fig 8: Frequency response curve of FIR Lowpass filter (PB-3000Hz-SB-5KHz,Attenuation 60dB-ripple 1dB).
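The digital-domain reference response for the two coefficient sets can be evaluated directly from the tabulated values. The sketch below is an illustration only: it uses the four-decimal coefficients from Tables 1 and 2, so the computed attenuation will differ somewhat from the design targets, and `fir_gain_db` is a hypothetical helper.

```python
import cmath
import math


def fir_gain_db(coeffs, f, fs):
    """Magnitude of H(f) = sum_k b_k * exp(-j*2*pi*f*k/fs), in dB."""
    h = sum(b * cmath.exp(-2j * math.pi * f * k / fs)
            for k, b in enumerate(coeffs))
    return 20.0 * math.log10(max(abs(h), 1e-12))  # floor avoids log10(0)


TABLE1_COEFFS = [0.0078, 0.0393, 0.1009, 0.1687, 0.1989,
                 0.1687, 0.1009, 0.0393, 0.0078]
TABLE2_COEFFS = [-0.0247, 0.0684, -0.1274, 0.1792, 0.8002,
                 0.1792, -0.1274, 0.0684, -0.0247]
FS = 10_000.0

# Print the gain of both filters at a few frequencies of interest.
for f in (0.0, 100.0, 1_000.0, 3_000.0, 5_000.0):
    print(f, round(fir_gain_db(TABLE1_COEFFS, f, FS), 1),
          round(fir_gain_db(TABLE2_COEFFS, f, FS), 1))
```

Sweeping `f` from 0 to fs/2 gives curves that can be set against Figures 7 and 8 as a sanity check on the resistor-derived coefficients.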

## 6. CONCLUSION AND FUTURE WORK

As is evident from the results, the FIR filter has been developed to work in bipolar mode to overcome saturation due to DC offset. The total gain of the summing amplifier is adjusted to obtain the output without clipping. The system is mainly recommended for front-end lowpass or bandpass filtering of biomedical signals and for speech applications with the decimation algorithm. The method can be further modified and made applicable to IIR filters by using an additional ADC at the output of the summing amplifier along with further feedback delay lines.

As the proposed methodology does not require digital multipliers and adders, it can be implemented economically in a VLSI system with further reduction in hardware size and complexity. The memory elements can be replaced with simple resistors or programmable resistors, which can be easily and precisely integrated with modern VLSI technology. This makes the proposed system, which uses only discrete components such as resistors, more economical and efficient than existing systems, which require a large number of ICs to perform the multiplier and adder operations.

Apart from the experimental design explained in this paper, the authors are studying the process further as part of ongoing research, with the objective of making the methodology more efficient and extending it to more applications.

## 7. ACKNOWLEDGMENTS

During the course of developing this approach, the authors had extensive interactions with subject experts. The authors gratefully acknowledge their valuable comments and views. The authors thank the senior management of Caledonian College of Engineering, Muscat, for their encouragement and constant support.

## 8. REFERENCES

- [1] Resel,P., Jayavrinda, V., 2011; Efficient hardware reduction of FIR Filter in mixed signal processing; proceedings of IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2011),IEEE Xplore, China.
- [2] Vinod, A.P., Lai, E.M.-K., Premkumar, A.B., Lau, C.T. 2003. FIR filter implementation by efficient sharing of horizontal and vertical common subexpressions; Electronics Letters, Issue No. 2, pages 251-253.
- [3] Shahramian, S., Carusone, T.C. 2004. Hardware reduction by combining pipelined A/D conversion and FIR filtering for channel equalization; Proceedings of the 2004 International Symposium on Circuits and Systems.
- [4] Chao Cheng, Parhi, K.K. 2004. Hardware efficient fast parallel FIR filter structures based on iterated short convolution; IEEE Transactions on Circuits and Systems I, Issue No. 8, pages 1492-1500.
- [5] Chao Cheng, Parhi, K.K. 2005. Further complexity reduction of parallel FIR filters; ISCAS IEEE International Symposium on Circuits and Systems.
- [6] Chao Cheng, Parhi, K.K. 2007. Low-Cost Parallel FIR Filter Structures With 2-Stage Parallelism; IEEE Transactions on Circuits and Systems, February, pages 280-290.
- [7] Hai Huyen Dam, Sven Nordholm and Antonio Cantoni 2005; Uniform FIR filterbank Optimisation with Group Delay Specifications, IEEE Transactions on Signal Processing, Vol. 53, No 11, November.
- [8] Dusan M. Kodek, 2005. Performance Limit of Finite Wordlength FIR Digital Filters, IEEE Transactions on Signal Processing, Vol. 53, No 7, July.
- [9] Mehboob, R., Khan, S.A., Qamar, R. 2009. FIR filter design methodology for hardware optimised implementation; IEEE Transactions on Consumer Electronics, Issue No. 3, IEEE Consumer Electronics Society, August.
- [10] Tayab D. Memon, Paul Beckett, Amin Z. Sadik 2009, "Performance-Area Tradeoffs in the Design of a Short Word Length FIR Filter," MEMS, NANO, and Smart Systems.

- [11] Tsao, Yu-chi, Choi, Ken, 2011, Hardware-efficient parallel FIR digital filter structures for symmetric convolutions, IEEE International Symposium on Circuits and Systems, May, Rio de Janeiro, Brazil, pages 2301-2304.
- [12] P. Gentili, F. Piazza, A. Uncini 1995; Efficient genetic algorithm design for power-of-two FIR filters, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol 2.
- [13] Fifth International Conference on MEMS, NANO, and Smart Systems, 2009; pages 67-71.