# Hardware Design of 2-D High Speed DWT by using Multiplierless 5/3 Wavelet Filters

Husain.K.Bhaldar, Department of Electronics & Telecommunication, College of Engineering Pandharpur

V.K.Bairagi, Department of Electronics & Telecommunication, Sinhgad Academy of Engineering, Kondwa, Pune

R.B.Kakkeri, Department of Electronics & Telecommunication, Sinhgad Academy of Engineering, Kondwa, Pune

#### **ABSTRACT**

This paper presents a hardware implementation of a high-speed DWT using 5/3 wavelet filters for image compression applications. Wavelets are also used in speech compression, where they reduce transmission time in mobile applications. The main aim of this work was to show that a multiplierless FPGA implementation of the DWT using 5/3 wavelet filters achieves a large reduction in complexity while maintaining excellent performance. The DWT performs multiresolution analysis, which gives a scale-invariant interpretation of an image. To optimize speed and memory requirements, we propose a novel VLSI architecture for the 2D DWT using a Conditional Carry Adder.

## **Key words**

Discrete wavelet transform (DWT), 5/3 filter, Conditional Carry Adder, Xilinx & FPGA Board.

#### 1. INTRODUCTION

Digital images are widely used in many fields, such as computer applications. Uncompressed digital images require considerable storage capacity and transmission bandwidth. Efficient image compression has therefore become more critical with the recent growth of data-intensive, multimedia-based web applications. Data can be represented in a different basis, chosen so that the new representation reveals the correlation in the data. Compression is then achieved by computing the transform associated with that basis, setting to zero the coefficients that fall below a threshold, and finally applying lossless encoding, such as entropy coding, to the non-zero coefficients. If the correlation present in the data is known a priori, the optimal basis can be found: it consists of the eigenvectors of the correlation matrix.
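A toy version of this transform-threshold pipeline can be sketched in Python. It is an illustration under our own assumptions, not part of the proposed design: samples are grouped in pairs, the helper names (`pair_transform`, `code_pairs`, `decode_pairs`) are hypothetical, and the entropy-coding stage is only represented by counting the surviving coefficients. For any stationary pair, the 2×2 correlation matrix [[r0, r1], [r1, r0]] has eigenvectors (1, 1)/√2 and (1, −1)/√2, so here the optimal basis is known without an eigen-solver; it is the same scaled sum/difference step as the Haar wavelet.

```python
import math

def pair_transform(a, b):
    """Project a sample pair onto the eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2)."""
    r = math.sqrt(2)
    return (a + b) / r, (a - b) / r

def inverse_pair(c0, c1):
    """Inverse of pair_transform (the basis matrix is symmetric and orthonormal)."""
    r = math.sqrt(2)
    return (c0 + c1) / r, (c0 - c1) / r

def code_pairs(x, threshold):
    """Transform pairs, zero coefficients below the threshold, count the survivors."""
    assert len(x) % 2 == 0, "this toy assumes an even number of samples"
    coeffs = []
    for i in range(0, len(x), 2):
        for c in pair_transform(x[i], x[i + 1]):
            coeffs.append(c if abs(c) >= threshold else 0.0)
    nonzero = sum(1 for c in coeffs if c != 0.0)  # what an entropy coder would still see
    return coeffs, nonzero

def decode_pairs(coeffs):
    """Invert the transform pair by pair (zeroed coefficients stay zero)."""
    out = []
    for i in range(0, len(coeffs), 2):
        out.extend(inverse_pair(coeffs[i], coeffs[i + 1]))
    return out

# Smooth, strongly correlated samples: sums are large, differences are small.
x = [100, 101, 99, 100, 102, 101, 98, 99]
coeffs, nonzero = code_pairs(x, threshold=1.0)
y = decode_pairs(coeffs)
print(nonzero, max(abs(a - b) for a, b in zip(x, y)))
```

With this input, all four difference coefficients fall below the threshold, so half of the coefficients are discarded at the cost of a reconstruction error of at most 0.5 per sample.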
However, this basis is strongly data dependent, and computing the eigenvectors has, in general, cubic complexity. Even when the basis is known, computing the transform has quadratic complexity. This approach is therefore unacceptable in practice. Instead, a transform such as the DWT is chosen: it is independent of the data, it is fast, and it is still capable of removing correlation.

This paper presents a new approach to the VLSI implementation of the 2-dimensional Discrete Wavelet Transform (DWT). The DWT performs multi-resolution analysis (MRA) of a signal with localization in both time and frequency. Multi-resolution decomposition gives a scale-invariant interpretation of an image [1]. Because of its superior energy compaction and its correspondence with the human visual system, the two-dimensional DWT (2D-DWT) has proven to be a key operation in image processing [1]. Most importantly, the DWT can be computed with O(n) complexity. The 5/3 wavelet filters are well suited to image compression, and their coefficients can easily be split into powers of 2, so the filters can be implemented without multipliers.

Since an ASIC offers higher processing speed and better throughput while consuming less silicon area, it is the obvious choice for implementing architectures for computationally intensive signal processing applications. However, it is appropriate to confirm the functionality and robustness of a proposed architecture by prototyping it on reprogrammable devices (such as FPGAs) before taping out GDS files for ASIC processing [2]. Because the Xilinx programmable logic design suite was available, it was further proposed to implement the 2D DWT on a suitable FPGA and to test it extensively. The main objectives of this paper were:

1. To decide on a suitable architecture of the DWT for VLSI implementation.
2. To implement and test the 1D DWT and 2D DWT on an FPGA using 5/3 filters.
3. To propose and test a suitable architecture for the 2D DWT from the memory and speed point of view.

#### 2. PREVIOUS WORK

A broad literature survey was carried out to understand the basics of the wavelet transform and its extension to the DWT. Maurizio Martina and Guido Masera proposed a multiplierless architecture that exploits biorthogonality [6]. Their architecture is derived from the vanishing-moment condition: the filter impulse responses are arranged so that fewer multipliers are needed, and those multipliers are made powers of two. They also designed the 9/7 biorthogonal wavelet filters using the 5/3 biorthogonal wavelet filters. Hazem H. Ali et al. recommended a mixed parallel and sequential architecture, thereby reducing the overall number of multipliers compared with a purely parallel structure [11]. Sang Yoon Park presented a fully multiplierless lattice structure that reduces hardware complexity and improves the stop band [10]. A distributed arithmetic approach to DWT implementation, presented by Amit Acharyya, resulted in reduced memory requirements and power.

#### 3. WAVELET THEORY

Over the past several years, the wavelet transform has gained widespread acceptance as an indispensable tool in many applications, such as computer graphics, numerical analysis, telecommunication and signal processing, where it is especially valued for audio/video signal processing. In applications such as image compression, Discrete Wavelet Transform (DWT) based schemes outperform schemes based on the widely used Discrete Cosine Transform (DCT). Wavelets provide a time-scale representation of signals as an alternative to the traditional time-frequency representation. Wavelets are a class of functions used to localize a given function in both time and frequency (space and scale). They belong to the space of square-integrable functions and have zero mean. A wavelet family is constructed from a mother wavelet and its daughter wavelets; the daughter wavelets are formed by translation and dilation of the mother wavelet.
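For reference, the standard definitions behind this paragraph (not written out in the original text) are the daughter wavelets obtained from a mother wavelet $\psi \in L^2(\mathbb{R})$ by a dilation $a$ and a translation $b$, the zero-mean condition, and the dyadic sampling $a = 2^{-j}$, $b = k\,2^{-j}$ used by the DWT, which matches the $\phi(2x - n)$ form of the refinement equation:

$$\psi_{a,b}(x) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{x-b}{a}\right), \quad a \neq 0, \qquad \int_{-\infty}^{\infty} \psi(x)\,dx = 0$$

$$\psi_{j,k}(x) = 2^{j/2}\,\psi\!\left(2^{j}x - k\right), \qquad j, k \in \mathbb{Z}$$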
Wavelets, filter banks and multi-resolution signal analysis, which were traditionally developed independently in the fields of applied mathematics, signal processing and computer vision respectively, have now converged into a single theory known as Mallat's herringbone algorithm. According to it, the DWT can be implemented efficiently using a subband coding (SBC) scheme. Because a lower-resolution version of a signal can be computed as a linear combination of the higher-resolution signal, this can be shown using the refinement equation

$$\phi(x) = \sum_{n} h_{\phi}(n)\,\sqrt{2}\,\phi(2x - n) \qquad (1)$$

This gives

$$W_{\phi}(j,k) = h_{\phi}(-n) * W_{\phi}(j+1,n)\,\Big|_{n=2k,\;k \ge 0} \qquad (2)$$

$$W_{\psi}(j,k) = h_{\psi}(-n) * W_{\phi}(j+1,n)\,\Big|_{n=2k,\;k \ge 0} \qquad (3)$$

These equations imply that the wavelet transform can be implemented using filters, as shown in Fig. 1.

Fig. 1: Wavelet Decomposition.

This makes it possible to analyze a wavelet without knowing a closed-form formula for the mother wavelet. The process reduces to repeated averaging and differencing. The 2D DWT can be implemented with the 1D DWT by applying it first row-wise and then column-wise, as shown in Fig. 2 [3]. The LPF $h_{\phi}(-n)$ and HPF $h_{\psi}(-n)$, together with the down-samplers, form the analysis filter bank (FB).

Fig. 2: Analysis FB for 2D DWT

#### 4. FILTER BANK (FB)

When a signal is passed through the analysis FB, it is split into two bands. The LPF extracts the coarse information of the signal, and the HPF extracts its detail information. The output of each filter is then decimated by two, so that the total number of samples is preserved at every level, irrespective of the number of decomposition levels. The analysis bank thus yields two half-length outputs. The synthesis FB consists of interpolators and filters, and the filtered outputs are added to obtain the output, as shown in Fig. 3. Between the analysis and synthesis FBs, the subband signals are compressed or enhanced depending on the requirements of the application [3].

Fig. 3: Filter Bank Representation

The filters used in multi-rate systems are linear time-invariant (LTI), but the up-sampler and down-sampler are time-variant. These multi-rate operations, down-sampling and up-sampling, are responsible for aliasing and imaging respectively, which lead to undesirable, extraneous signals. The filters are designed to remove this distortion and aliasing. All compactly supported orthogonal wavelets except the Haar wavelet are asymmetric. For image processing the filters should have linear phase, but the filters of these wavelets do not, because of their asymmetry. To overcome this, a new kind of wavelet, known as the biorthogonal wavelet, was introduced. Biorthogonal wavelets have symmetric filter coefficients. The biorthogonal functions come from iterating the synthesis bank [4]. Biorthogonal wavelets give invertible matrices and perfect reconstruction [4]. However, they are not orthogonal, so the analysis and synthesis filters differ. They are nevertheless useful for image processing because their symmetric coefficients give linear phase. Biorthogonal wavelets can be constructed from the B-spline wavelet basis [5]. A B-spline of a given order can be expressed as a linear combination of scaled and translated versions of itself, i.e. it satisfies a scaling relation, but its integer translates, except in the Haar case, are not orthogonal. Hence it becomes necessary to compute its dual scaling function. JPEG 2000 specifies two wavelet filter banks, namely the 9/7 biorthogonal filter and the 5/3 biorthogonal filter. The former is used for lossy compression and the latter for lossless compression. The numbers indicate the number of LPF and HPF filter coefficients, respectively. The low-frequency components (smooth variations) constitute the base of an image, and the high-frequency components (the edges, which give the detail) add to them to refine the image, thereby giving a detailed image.
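The analysis/synthesis round trip and the row-then-column 2D decomposition described above can be sketched in Python. As a stand-in for the direct-form filters of Fig. 3, this sketch swaps in the integer lifting form of the 5/3 wavelet (the reversible path JPEG 2000 uses for lossless coding); its predict and update steps reproduce the HPF [-1/2, 1, -1/2] and LPF [-1/8, 1/4, 3/4, 1/4, -1/8] up to integer rounding. The function names are hypothetical, and even-length rows and columns with symmetric boundary extension are assumed.

```python
def fwd53(x):
    """One level of the reversible 5/3 lifting transform on an even-length integer list.

    Returns (s, d): s are the low-pass (average) and d the high-pass (detail) coefficients."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    half = n // 2
    # Predict step (HPF [-1/2, 1, -1/2]): odd sample minus floor-mean of its even neighbours.
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # symmetric extension
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update step: with the predict step it approximates the LPF [-1/8, 1/4, 3/4, 1/4, -1/8].
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension: d[-1] = d[0]
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d

def inv53(s, d):
    """Exact inverse of fwd53: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x

def transpose(m):
    return [list(col) for col in zip(*m)]

def _fwd_line(v):
    s, d = fwd53(v)
    return s + d  # low band in the first half, high band in the second half

def fwd53_2d(img):
    """Rows first, then columns; the top-left quadrant ends up as the LL band."""
    rows = [_fwd_line(r) for r in img]
    cols = [_fwd_line(c) for c in transpose(rows)]
    return transpose(cols)

def inv53_2d(coef):
    """Undo the column transform, then the row transform."""
    h, w = len(coef), len(coef[0])
    cols = [inv53(c[: h // 2], c[h // 2 :]) for c in transpose(coef)]
    return [inv53(r[: w // 2], r[w // 2 :]) for r in transpose(cols)]

s, d = fwd53([10, 12, 14, 16, 18, 20, 22, 24])
print(s, d)  # [10, 14, 18, 23] [0, 0, 0, 2]
img = [[(7 * r + 13 * c) % 256 for c in range(8)] for r in range(8)]
assert inv53_2d(fwd53_2d(img)) == img  # lossless round trip
```

On the linear ramp the interior detail coefficients are zero (the last one is non-zero only because of the mirrored boundary), and the integer arithmetic makes the 2D round trip exactly lossless.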
The biorthogonal 5/3 wavelet, also known as the LeGall 5/3 wavelet and as the Cohen-Daubechies-Feauveau (CDF) 5/3 wavelet, is derived from the B-spline as given below [5]. JPEG 2000 uses the 5/3 filter for lossless compression. The LPF coefficients of the 5/3 analysis filters are $\sqrt{2}\,[-1/8; 2/8; 6/8; 2/8; -1/8]$ and the HPF coefficients are $\frac{1}{\sqrt{2}}\,[-1/2; 1; -1/2] = \sqrt{2}\,[-1/4; 1/2; -1/4]$. The factors $\sqrt{2}$ and $1/\sqrt{2}$ are normalization factors.

#### 5. THEORETICAL DERIVATION

Let the dual (analysis) LPF be symmetric, $\tilde{h}(n) = \tilde{h}(-n)$, with taps $\tilde{h}(0)$, $\tilde{h}(\pm 1)$ and $\tilde{h}(\pm 2)$. Normality of the scaling function gives

$$\tilde{h}(0) + 2\tilde{h}(1) + 2\tilde{h}(2) = \sqrt{2} \qquad (4)$$

Biorthogonality of the primal LPF $h = \sqrt{2}\,[1/4; 1/2; 1/4]$ (the linear B-spline filter) and the even translates of the dual, $\sum_n h(n)\,\tilde{h}(n-2k) = \delta(k)$, gives

$$\tilde{h}(0) + \tilde{h}(1) = \sqrt{2} \qquad (5)$$

$$\tilde{h}(1) + 2\tilde{h}(2) = 0 \qquad (6)$$

The vanishing moment condition gives

$$\tilde{h}(0) - 2\tilde{h}(1) + 2\tilde{h}(2) = 0 \qquad (7)$$

Solving equations (4)-(7) gives the dual LPF coefficients, i.e. the LPF coefficients of the analysis part: $\tilde{h}(0) = \sqrt{2}\cdot\frac{3}{4}$, $\tilde{h}(1) = \sqrt{2}\cdot\frac{1}{4}$ and $\tilde{h}(2) = \sqrt{2}\cdot\frac{-1}{8}$. The other coefficients can be calculated similarly using biorthogonal wavelet properties [5]. The coefficients are tabulated below.

Table 1: Coefficients for the Biorthogonal 5/3 Filters.

| Filter | $\mathbf{Z}^{2}$ | $\mathbf{Z}^{1}$ | $\mathbf{Z}^{0}$ | $\mathbf{Z}^{-1}$ | $\mathbf{Z}^{-2}$ | $\mathbf{Z}^{-3}$ |
|---|---|---|---|---|---|---|
| $\tilde{h}$ | $\sqrt{2}\cdot\frac{-1}{8}$ | $\sqrt{2}\cdot\frac{1}{4}$ | $\sqrt{2}\cdot\frac{3}{4}$ | $\sqrt{2}\cdot\frac{1}{4}$ | $\sqrt{2}\cdot\frac{-1}{8}$ | |
| $\tilde{g}$ | | | $\sqrt{2}\cdot\frac{-1}{4}$ | $\sqrt{2}\cdot\frac{1}{2}$ | $\sqrt{2}\cdot\frac{-1}{4}$ | |
| $h$ | | $\sqrt{2}\cdot\frac{1}{4}$ | $\sqrt{2}\cdot\frac{1}{2}$ | $\sqrt{2}\cdot\frac{1}{4}$ | | |
| $g$ | | $\sqrt{2}\cdot\frac{-1}{8}$ | $\sqrt{2}\cdot\frac{-1}{4}$ | $\sqrt{2}\cdot\frac{3}{4}$ | $\sqrt{2}\cdot\frac{-1}{4}$ | $\sqrt{2}\cdot\frac{-1}{8}$ |

#### 6. PROPOSED ARCHITECTURE

To implement the 5/3 biorthogonal filter, a multiplierless architecture is proposed, which reduces complexity. Here the $\sqrt{2}$ factor in the coefficients comes from normalization.
It can be handled in hardware at either the analysis or the synthesis part by a 1-bit shift, since the data is multiplied by $\sqrt{2}$ in both the analysis and the synthesis parts, which is effectively a multiplication by 2. The remaining coefficients can be converted into powers of 2, so that only bit-shift operations are needed to implement multiplication. The complete system is implemented using registers as delay elements, adders and shifters as the MAC unit, and multiplexers and demultiplexers as up- and down-samplers.

Fig. 4: Direct implementation of 5/3 filter

Symmetry is exploited by first adding input samples ($x_i$) together to obtain $w_0 = x_i$, $w_1 = x_{i+1} + x_{i-1}$ and $w_2 = x_{i+2} + x_{i-2}$. This structure can be built using two approaches. In the first approach, the $w_i$ values are added together first, and the partial results are then shifted and combined to obtain the output values [6]. In the second approach, which is used here, the $w_i$ are first shifted and then added. To obtain the 5/3 wavelet results, a different shift amount has to be applied to each $w_i$. A polyphase structure can be implemented by separating even and odd samples using registers followed by down-samplers; the noble identities justify this interchange of filtering and sampling [2]. In the direct form, half of the samples are discarded after convolution, so the system works faster if they are never computed. In the polyphase form, the input samples are therefore separated into even and odd samples before convolution, which improves processing speed. Complexity can be reduced further with the Lifting Scheme [7].

Fig. 5: Implementation using registers, adders and shifters

The DWT architecture requires basic blocks such as adders, shifters and registers. A conditional carry adder is used here, since it gives better performance than the Ripple Carry Adder (RCA), Conditional Input Adder, Carry Select Adder, Look Ahead Adder, etc. The adders, registers, decimators and interpolators are designed for 16-bit input and output using fixed-point arithmetic (Q-point format). This also makes the design modular.
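The shift-then-add direct form and the polyphase reorganization described above can be sketched in Python. This is a behavioral model, not the authors' VHDL: the helper names (`sample`, `lpf8`, `hpf2`, `reference`, `analysis_direct`, `analysis_polyphase`) are hypothetical, the $\sqrt{2}$ normalization is omitted, boundaries use symmetric extension, and the low-pass output is kept at even positions and the high-pass output at odd positions (the JPEG 2000 convention); all of these are our assumptions. Outputs are kept at 8× (LPF) and 2× (HPF) scale so everything stays in exact integers; a hardware version would apply the final right shift in its chosen Q format.

```python
from fractions import Fraction

# Unnormalized 5/3 analysis taps (the sqrt(2) factors are left out).
LPF = [Fraction(-1, 8), Fraction(1, 4), Fraction(3, 4), Fraction(1, 4), Fraction(-1, 8)]
HPF = [Fraction(-1, 2), Fraction(1), Fraction(-1, 2)]

def sample(x, i):
    """x[i] with whole-sample symmetric extension at both ends (needs len(x) >= 3)."""
    n = len(x)
    if i < 0:
        i = -i
    if i >= n:
        i = 2 * (n - 1) - i
    return x[i]

def lpf8(x, i):
    """8 x LPF output at i, using only adds and shifts on the folded sums w0, w1, w2."""
    w0 = sample(x, i)
    w1 = sample(x, i + 1) + sample(x, i - 1)
    w2 = sample(x, i + 2) + sample(x, i - 2)
    return (w0 << 2) + (w0 << 1) + (w1 << 1) - w2  # 6*w0 + 2*w1 - w2

def hpf2(x, i):
    """2 x HPF output at i: 2*w0 - w1, again shift-and-add only."""
    w0 = sample(x, i)
    w1 = sample(x, i + 1) + sample(x, i - 1)
    return (w0 << 1) - w1

def reference(x, taps, i):
    """Multiply-based baseline: exact convolution with the Fraction taps centred on i."""
    half = len(taps) // 2
    return sum(t * sample(x, i + k - half) for k, t in enumerate(taps))

def analysis_direct(x):
    """Direct form: compute every output, then decimate (half the work is discarded)."""
    low = [lpf8(x, i) for i in range(len(x))]
    high = [hpf2(x, i) for i in range(len(x))]
    return low[0::2], high[1::2]

def analysis_polyphase(x):
    """Polyphase form: only the outputs that survive decimation are computed."""
    low = [lpf8(x, i) for i in range(0, len(x), 2)]
    high = [hpf2(x, i) for i in range(1, len(x), 2)]
    return low, high

# Equations (4) and (7) with the sqrt(2) factor divided out:
assert sum(LPF) == 1
assert sum(t * (-1) ** k for k, t in enumerate(LPF)) == 0

x = [3, 7, 1, 8, 2, 9, 4, 6]
low8, high2 = analysis_polyphase(x)
assert (low8, high2) == analysis_direct(x)
print([Fraction(v, 8) for v in low8], [Fraction(v, 2) for v in high2])
```

The polyphase version calls each filter only for the samples that survive decimation, i.e. half as often as the direct form, and produces identical results, which is the speed argument made above in concrete form.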
Because the design is built from these modular blocks, multi-level decompositions and the 2D DWT can be implemented easily.

#### 7. IMPLEMENTATION DETAILS AND RESULTS

The above architecture was implemented on a Xilinx Spartan-3 device (3s400pq208-4), using the Xilinx EDA tool ISE 10.1 for development. The place and route (PAR) report generated by the tool is given in Table 2. The PAR delay reported was 15.593 ns, which corresponds to a maximum frequency of approximately 64 MHz.

Device Utilization Summary after PAR (Total Delay: 15.593 ns, of which 6.132 ns logic and 9.461 ns routing):

| Resource | Used | Utilization |
|---|---|---|
| Number of BUFGMUXs | 1 out of 8 | 12% |
| Number of External IOBs | 39 out of 141 | 27% |
| Number of LOCed IOBs | 38 out of 39 | 97% |
| Number of Slices | 373 out of 3584 | 10% |
| Number of SLICEMs | 0 out of 1792 | 0% |

**Table 2: Device Utilization Summary.**

The analysis and synthesis FBs were designed in VHDL, for both the direct and the polyphase architectures. The behavioral simulation of these filters was done using a VHDL test bench and the waveform analyzer utility of the ISE tool. These results were cross-checked against MATLAB implementations of the same filters, and the two sets of results closely matched. After satisfactory simulation and synthesis of the direct-form analysis and synthesis FBs, the bit stream generated by the EDA tool was downloaded into the above-mentioned Spartan device using the MXS3FK trainer kit, which has built-in facilities for converting an analog signal into a 12-bit digital signal using the AD7891 ADC and a digital signal back into analog using the AD7541 DAC. A real-time audio signal was applied through the stereo jacks of the kit, as shown in Fig. 6, to test the real-time functionality of the filters for audio applications.

Fig. 6: Testing functionality on Spartan kit

The output was verified by calculating the SNR and PSNR, which were approximately 63 dB and 68 dB respectively. A sample input X[n] is applied, and the detail and average coefficients (HPF and LPF outputs) are obtained.
$X[n]=\{0.80, 592, 1104, 848, 1872, 208, 120, 240, 8, 1008, 1848, 1468, 2040, \ldots\}$

Figure 7: The output of DWT using 5/3 filter coefficients.

#### 8. CONCLUSION

As mentioned in the introduction, the 2D DWT is required for image and video processing applications. Since the results of the 1D DWT for audio processing are quite encouraging, it is proposed to extend this work to implement the 2D DWT as future scope. The architecture can be further modified using the lifting scheme, pipelining and parallel processing. For architectures with multipliers, the distributed arithmetic algorithm can be used. Results from the FPGA show that 5/3 biorthogonal wavelets are better than 9/7 wavelets with respect to memory utilization and speed. The truncation error is negligible for the 5/3 biorthogonal wavelet.

#### 9. REFERENCES

- [1] Stephane G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, July 1989.
- [2] U. Meyer-Baese, "Digital Signal Processing with FPGA", Springer.
- [3] Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing", Second Edition.
- [4] G. Strang and T. Q. Nguyen, "Wavelets and Filter Banks", Wellesley-Cambridge Press, 1996.
- [5] K. P. Soman and K. I. Ramchandran, "Insight into Wavelets: From Theory to Practice", Second Edition, Prentice-Hall India (PHI).
- [6] Maurizio Martina and Guido Masera, "Multiplierless, Folded 9/7-5/3 Wavelet VLSI Architecture", IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 54, No. 9, September 2007.
- [7] Maria E. Angelopoulou and Peter Y. K. Cheung, "Implementation and Comparison of the 5/3 Lifting 2D Discrete Wavelet Transform Computation Schedules on FPGAs", Journal of VLSI Signal Processing, 2007, DOI: 10.1007/s11265-007-0139-5.
- [8] Michel Misiti et al., "Wavelet Toolbox: For Use with MATLAB".
- [9] Jie Guo, Ke-yan Wang, Cheng-ke Wu and Yun-song Li, "Efficient FPGA Implementation of Modified DWT for JPEG2000", 978-1-4244-2186-2/08, 2008 IEEE.
- [10] Sang Yoon Park and Nam Ik Cho, "Design of Multiplierless Lattice QMF: Structure and Algorithm Development", 1549-7747, 2007 IEEE.
- [11] Hazem H. Ali, Hatem M. El-Matbouly, Nader Hamdy and Khaled A. Shehata, "VLSI Architecture of QMF for DWT Integrated System", 0-7803-7150, ©2001 IEEE.