# Image Processor Using 3D-DWT as Part of Health Care Management System

Kyung-Chang PARK, Yun-ki HONG, Sang-Jin LEE, Yeon-Ho KIM, Younggap YOU, Tae Won CHO, Kyoung-Rok CHO and Kamran ESHRAGHIAN

World Class University (WCU) Program

**College of Electronics and Information Engineering** 

Chungbuk National University, Cheongju, Chungbuk, 361-763, Korea

## ABSTRACT

This paper presents a low-power, high-speed 3D-DWT (three-dimensional discrete wavelet transform) architecture that uses stacked silicon dies to compress medical images. The stacked chips are interconnected using TSV (through-silicon via) techniques. Low-power operation results from the short signal paths between layers. The area of the 3D architecture is much smaller than that of a 2D counterpart with the same performance. Each circuit/system layer can be optimized independently, since each can be fabricated using a different technology.

The 3D-DWT architecture consists of two processing element (PE) layers: a PE-odd layer and a PE-even layer. Each layer processes pixel data derived from rows of the y axis, scanning the image data from left to right. The layers operate in parallel, yielding high throughput. The architecture can be used to compress medical images such as X-ray, MRI, NRI, CT and endoscopy by processing images frame by frame.

**Keywords**: 3D-DWT, wavelet, image compression, TSV, stacked chip

## **1. INTRODUCTION**

Digital medical imaging systems are beginning to influence the way practitioners and radiologists interact. Video images place heavy demands on the storage capacity of the memory layer of a processing system because of the large volume of data. Various compression techniques have been implemented to overcome this complexity. Representative compression standards include JPEG, MPEG and H.26x; the DCT (discrete cosine transform) is the basis for JPEG and MPEG. Unfortunately, the DCT-based approach suffers from blocking effects. Wavelet-based image processing such as the DWT has emerged as an alternative to DCT-based schemes. The DWT allows image processing from a human-vision perspective, and as a consequence the technique has been adopted in the JPEG 2000 standard for still image compression [1]. Substantial clinical research has been devoted to applications of image compression for medical equipment. The medical image standard DICOM (Digital Imaging and Communications in Medicine) employs JPEG2000 [2-5]. DWT-based research has extended to two-dimensional algorithms along with the proliferation of JPEG2000. The 2D-DWT is effective for compression of a single image. However, the approach is severely limited in compressing large medical images such as CT (Computed Tomography), CR (Computed Radiography) and MRI (Magnetic Resonance Imaging).

The 3D-DWT provides interesting possibilities and has been studied as a way to resolve the problems associated with compression of large images, since compression of successive images becomes manageable within 3D-DWT algorithms [6][7]. This scheme, however, requires excessively long processing time. A parallel computation algorithm implemented in VLSI (very large scale integration) is a promising candidate for such an implementation and provides a plausible option for technology mapping. Current work on the 3D-DWT is based on single-chip planar VLSI architectures. The problem with the single-chip planar approach is that it requires a large number of I/O circuits. Furthermore, the increased packaging density makes implementation of low-power, high-performance systems rather difficult. Conventional 3D-DWT architectures that use the system-in-package (SiP) approach experience difficulties introduced by the thickness of the stacked chips and the squeezed pitch of the bonding wires. Hence the system performance improvement expected from the 3D approach is somewhat degraded. Substantial development activities are in progress to build stacked chip structures that can exploit TSV technology to obtain higher processing speed with less power consumption [8-13]. A promising solution for a 3D-DWT system is a stacked chip architecture that uses TSVs. The approach has the potential to resolve the difficulties associated with SiP [8]. In this paper we therefore present an image processing architecture that employs the 3D-DWT algorithm, and we compare the effectiveness of the 3D-DWT with that of conventional 2D systems. The 3D image processing approach can be employed in health care and point-of-care management systems. Section 2 reviews the 3D discrete wavelet transform; Section 3 presents the proposed architecture along with a description of subsystems such as the processing element, the processing element-x and the filter; Section 4 analyzes the architecture; and finally Section 5 concludes with an evaluation of the approach.

## 2. 3D DISCRETE WAVELET TRANSFORM

DWT compression is performed on the resultant data using arithmetic coding. A 3D-DWT performs the wavelet transform on the image data in the three directions x, y, and z. A 3D image is an extension of 2D images along the time axis, so processing in the 3D-DWT is also carried out on pixel values at the same location along the time axis. Figure 1 shows one level of the 3D-DWT, where H-pass and L-pass represent the high-pass and low-pass filters respectively. Down-sampling of the filtered results by a factor of two is denoted "$\downarrow 2$"; the results can be decomposed further into smaller data through multi-level image decomposition.
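The filter-then-down-sample structure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it uses the two-tap Haar filter pair in place of the Daubechies 9/7 filter used later in the paper, it assumes the volume is indexed as (z, y, x) with even sizes along every axis, and the function names `haar_split` and `dwt3d_one_level` are hypothetical.

```python
import numpy as np

def haar_split(data, axis):
    """One Haar analysis step along `axis`: filter, then keep every 2nd sample."""
    data = np.moveaxis(data, axis, 0)
    even, odd = data[0::2], data[1::2]
    low = (even + odd) / np.sqrt(2.0)    # L-pass followed by down-sampling by 2
    high = (even - odd) / np.sqrt(2.0)   # H-pass followed by down-sampling by 2
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def dwt3d_one_level(volume, order=(0, 1, 2)):
    """Return the 8 subbands of a one-level 3D DWT as a dict.

    `volume` is assumed to be indexed (z, y, x) with even sizes.
    Each key is three letters (L or H), one per axis, in the order
    given by `order`; e.g. "LLL" is low-pass along all three axes.
    """
    bands = {"": np.asarray(volume, dtype=np.float64)}
    for axis in order:
        next_bands = {}
        for name, block in bands.items():
            low, high = haar_split(block, axis)
            next_bands[name + "L"] = low
            next_bands[name + "H"] = high
        bands = next_bands
    return bands

if __name__ == "__main__":
    frames = np.arange(4 * 4 * 4, dtype=np.float64).reshape(4, 4, 4)
    subbands = dwt3d_one_level(frames)
    for name, block in sorted(subbands.items()):
        print(name, block.shape)
```

Applying `dwt3d_one_level` again to the `"LLL"` subband gives the next decomposition level.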



## **3. PROPOSED 3D-DWT ARCHITECTURE**

The conventional 3D-DWT is inefficient because it must access all the image frames along the same time axis, and therefore requires a significant amount of memory to perform the DWT. The concept of a group of frames (GOF), which is similar to the group of pictures in MPEG, was introduced to overcome this drawback; unfortunately, that approach is limited in compression efficiency.



Figure 2. 3D-DWT architecture

The proposed architecture is based on two layered system elements and addresses this frame access issue. Figure 2 shows the proposed architecture, which comprises the PE-odd (processing element-odd) layer and the PE-even (processing element-even) layer. The approach employs a stacked chip architecture and thereby alleviates the frame access bottleneck. The architecture permits access to all frames on the same time axis, thus providing better data compression efficiency.

#### **Processing element architecture**

The processing element (PE) comprises nine filter bank-1s (FB-1s), four filter bank-2s (FB-2s) and their interconnections. The nine filter bank-1s down-sample in the z direction. The four filter bank-2s decompose images in the y direction. Figure 3 shows the implementation of the PE structure.



Figure 3. PE architecture

#### Processing element-x architecture

The processing element-x (PE-x) includes four filter bank-1s, which down-sample in the x direction. The PE-x, illustrated in Figure 4, receives and processes the outputs of the PEs. This structure performs the 3D-DWT on the entire data once all the data become available.



#### Filter design

The filter architecture is based on the Daubechies 9/7 filter. The PSNR and the size of the circuitry are the key parameters in the design of the filter. Cao et al. [14] used 13-bit coefficients, providing a high level of accuracy in their fixed-point implementation.
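To make the effect of fixed-point coefficients concrete, the sketch below quantizes the low-pass analysis taps of the 9/7 filter and reports the rounding error of each. The tap values are the commonly published ones for the 9/7 analysis low-pass filter, normalized to sum to 1. The split of the 13-bit word into a sign and 12 fractional bits is an assumption made here for illustration, not a detail taken from [14], and the names `FRAC_BITS`, `quantize` and `quantization_report` are hypothetical.

```python
# Assumption: a 13-bit coefficient is stored as sign + 12 fractional bits.
FRAC_BITS = 12

# 9/7 analysis low-pass taps h[0], h[+-1], h[+-2], h[+-3], h[+-4]
# (the filter is symmetric, so only one side is listed).
LOWPASS_97 = [
    0.602949018236,
    0.266864118443,
    -0.078223266529,
    -0.016864118443,
    0.026748757411,
]

def quantize(coeff, frac_bits=FRAC_BITS):
    """Round a real coefficient to the nearest multiple of 2**-frac_bits."""
    return int(round(coeff * (1 << frac_bits)))

def quantization_report(taps, frac_bits=FRAC_BITS):
    """Return (real tap, integer code, rounding error) for every tap."""
    rows = []
    for tap in taps:
        code = quantize(tap, frac_bits)
        rows.append((tap, code, code / (1 << frac_bits) - tap))
    return rows

if __name__ == "__main__":
    for tap, code, err in quantization_report(LOWPASS_97):
        print(f"{tap:+.12f} -> {code:+5d} (error {err:+.2e})")
```

With round-to-nearest, each tap is off by at most half a least-significant bit, which is one reason the coefficient word length trades off directly against PSNR and circuit size.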

Our approach is based on a modified version of Cao's filter bank-1, which maps the adder compressor array into two parallel structures so that the high-pass and low-pass signals are generated simultaneously. Figure 5 shows the modified filter bank-1 design.

The modified filter bank-2 design processes nine pixels at a time without the need for delay lines. Its other features are the same as those of the filter bank-1 design.



Table 1 summarizes the array input convertor. In this table, the symbols ">>" and "-" refer to the right shift and invert operations, and "H1 ~ H13, L1 ~ L13" refer to the inputs of the adder compressor array, following the notation used in Figure 5.

| High pass            | Low pass           |
|----------------------|--------------------|
| $H1 = -W_1$          | $L1 = -W_8$        |
| $H2 = W_{13} >> 1$   | $L2 = W_1 >> 1$    |
| $H3 = W_4 >> 2$      | $L3 = W_8 >> 2$    |
| $H4 = W_{13} >> 3$   | $L4 = W_2 >> 3$    |
| $H5 = W_6 >> 4$      | $L5 = W_1 >> 4$    |
| $H6 = W_{11} >> 5$   | $L6 = W_3 >> 5$    |
| $H7 = W_1 >> 6$      | $L7 = W_5 >> 6$    |
| $H8 = W_{13} >> 7$   | $L8 = W_4 >> 7$    |
| $H9 = W_7 >> 8$      | $L9 = W_7 >> 8$    |
| $H10 = W_7 >> 9$     | $L10 = W_1 >> 9$   |
| $H11 = W_{12} >> 10$ | $L11 = W_9 >> 10$  |
| $H12 = W_{10} >> 11$ | $L12 = W_4 >> 11$  |
| $H13 = W_7 >> 12$    | $L13 = W_1 >> 12$  |

Table 1. Array input convertor
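The shift-and-invert entries of Table 1 reflect the usual way of multiplying by a constant coefficient without a hardware multiplier: the product is built from shifted copies of the operand that are then summed. The sketch below shows only this general idea. It does not reproduce the specific $W_1$ to $W_{13}$ wiring of Figure 5, it accumulates at full precision and shifts once at the end rather than shifting each partial product, and the function name `shift_add_multiply` and the 12-fractional-bit format are assumptions.

```python
def shift_add_multiply(x, q, frac_bits=12):
    """Multiply integer x by the fixed-point constant q / 2**frac_bits.

    Only shifts, adds and a sign inversion are used; the result is
    floor(x * q / 2**frac_bits).
    """
    negative = q < 0
    magnitude = -q if negative else q
    acc = 0
    bit = 0
    while magnitude:
        if magnitude & 1:
            acc += x << bit        # partial product for this coefficient bit
        magnitude >>= 1
        bit += 1
    if negative:
        acc = -acc                 # invert for negative coefficients
    return acc >> frac_bits        # single rescale at the end

if __name__ == "__main__":
    pixel = 200
    coeff_code = 1093              # roughly 0.2669 with 12 fractional bits
    print(shift_add_multiply(pixel, coeff_code))
```

Shifting each partial product individually, as an adder array fed by pre-shifted inputs would, gives the same decomposition but truncates per term, so low-order bits can differ from this full-precision version.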

#### Scanning pattern

The scanning pattern employed here down-samples in the order z, y, x. Conventional schemes down-sample in the order x, y, z. The change in the down-sampling order yields better performance (see Table 2).

#### **Processing direction**

Figure 6 illustrates pixel processing in the PE-odd and PE-even layers. Each layer decomposes images along the z axis (time), the y axis (vertical) and the x axis (horizontal). The PE-odd layer processes pixels in the x direction on the odd-numbered samples in the y direction. The PE-even layer processes pixels in the x direction on the even-numbered samples in the y direction. As a result, 81 pixels are processed across the three dimensions.



Figure 6. Pixels processing in PE-odd and PE-even layer
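The division of work between the two layers can be pictured as a simple row partition of the frame stack. The following is a minimal sketch, assuming the frames are held in a NumPy array indexed (z, y, x) and that rows are counted from 1, so that the first row belongs to the PE-odd layer; the function name `split_rows_for_layers` is hypothetical, and the sketch does not model the 81-pixel processing window.

```python
import numpy as np

def split_rows_for_layers(frames):
    """Partition a (z, y, x) frame stack by row parity.

    Rows are counted from 1 (an assumption): rows 1, 3, 5, ... go to
    the PE-odd layer and rows 2, 4, 6, ... go to the PE-even layer.
    """
    frames = np.asarray(frames)
    pe_odd = frames[:, 0::2, :]
    pe_even = frames[:, 1::2, :]
    return pe_odd, pe_even

if __name__ == "__main__":
    stack = np.arange(2 * 4 * 3).reshape(2, 4, 3)
    odd_rows, even_rows = split_rows_for_layers(stack)
    print(odd_rows.shape, even_rows.shape)
```

Because the two partitions do not overlap, the layers can scan their rows along x independently, which is consistent with the parallel operation described above.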

## 4. RESULTS

Two cases are used as the basis of comparison: a 2D-DWT and a 3D-DWT with a different scanning sequence. The PSNR of each is measured using:

$$PSNR = 10\log_{10}\left(\frac{255^2}{MSE}\right)\ [dB] \tag{1}$$

$$MSE = \frac{1}{L*M*N} \sum_{t=1}^{L} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[f_t(x, y) - \hat{f}_t(x, y)\right]^2 \tag{2}$$

The MSE (Mean Square Error) is the average of the squared differences between the original and the restored images. M\*N is the number of pixels in a frame, and L is the number of frames. The same compression ratio was applied in all cases. The results are summarized in Table 2, in which the new architecture shows the best PSNR of the three cases. The lowest-frequency component (LL in 2D, LLL in 3D) carries the largest amount of image information. The 3D-DWT allows more aggressive compression of the other components while keeping the LLL component intact, and thereby yields a higher PSNR than the 2D-DWT. The change in the scanning sequence also helps to obtain a better PSNR.

Table 2. PSNR results

|          | 2D-DWT | 3D-DWT<br>(z->x->y) | New<br>Architecture |
|----------|--------|---------------------|---------------------|
| PSNR(dB) | 38.18  | 40.69               | 41.23               |
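Equations (1) and (2) can be evaluated over a whole frame sequence as in the sketch below. It assumes 8-bit images (peak value 255, as in Equation (1)) stored as NumPy arrays of shape (L, M, N); the function names `mse_volume` and `psnr_volume` are hypothetical, and the code is not the evaluation script used to produce Table 2.

```python
import numpy as np

def mse_volume(original, restored):
    """Equation (2): mean squared error over all L*M*N pixels."""
    original = np.asarray(original, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    if original.shape != restored.shape:
        raise ValueError("original and restored must have the same shape")
    return float(np.mean((original - restored) ** 2))

def psnr_volume(original, restored, peak=255.0):
    """Equation (1): PSNR in dB; infinite when the volumes are identical."""
    mse = mse_volume(original, restored)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.integers(0, 256, size=(8, 64, 64)).astype(np.float64)
    noisy = np.clip(frames + rng.normal(0.0, 2.0, frames.shape), 0, 255)
    print(f"PSNR = {psnr_volume(frames, noisy):.2f} dB")
```

Averaging the error over all frames before taking the logarithm, as Equation (2) does, gives a single figure for the sequence rather than a mean of per-frame PSNR values.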

## **5. CONCLUSIONS**

In this paper we presented a 3D architecture for image compression based on the 3D-DWT. The 3D-DWT has been considered an efficient compression method, but it requires very large memory space and increased computation time. Our modified architecture resolves these problems of the conventional 3D-DWT by using stacked layers of processing and memory functions that are connected using TSVs. The stacked 3D structure employs modified scanning sequences that yield promising performance with respect to the alternatives.

The new architecture can be adapted to more demanding environments such as medical imaging systems, where high-quality, high-resolution imaging is essential [3][5]. Application of the stacked 3D-DWT using TSVs is likely to open new areas in image compression, including those required for medical diagnosis.

## 6. ACKNOWLEDGEMENT

This work was supported by grant No. R33-2008-000-1040-0 from the World Class University (WCU) project of MEST and KOSEF through Chungbuk National University.

## 7. REFERENCES

- [1] M. Boliek, C. Christopoulos, and Eric Majani, "JPEG 2000 part-I final draft international standard", ISO/IEC JTC1/SC29 WG1, 24 Aug. 2000.
- [2] http://www.dclunie.com/dicom-status/status.html, March 2009.
- [3] Yongjai Lee, "Quality Evaluation of JPEG2000 Compressed Images in PACS Environments", Korea Computer Congress 2005(KCC 2005), vol.32, no1(B), 2005, pp.682-684.
- [4] D. Dhouib, A. Naït-Ali, C. Olivier, M. S. Naceur, "Performance Evaluation of Wavelet Based Coders on Brain MRI Volumetric Medical Datasets for Storage and Wireless Transmission", International Journal of Biological, Biomedical and Medical Sciences, vol.3, no.3, 2008, pp.147-156.
- [5] Soon Joo Cha et al.,"Clinical Evaluation of the JPEG2000 Compression Rate of CT and MR Images for Long Term Archiving in PACS", J Korean Radiol Soc, vol.54, 2006, pp.227-233.
- [6] R. M. Jiang, D. Crookes, "FPGA implementation of 3D discrete wavelet transform for real-time medical imaging", Circuit Theory and Design 2007, 27-30 Aug. 2007.
- [7] M. Jiang, D. Crookes, "Area-efficient high-speed 3D DWT processor architecture", Electronics Letters, Vol. 43, Issue 9, 26 April 2007.
- [8] M. Koyanagi, T. Fukushima, T. Tanaka, "High-Density Through Silicon Vias for 3-D LSIs", Proceedings of the IEEE, Vol. 97, Issue 1, Jan. 2009.
- [9] M. Koyanagi, H. Kurino, K. W. Lee, K. Sakuma, N. Miyakawa, H. Itani, "Future system-on-silicon LSI chips", IEEE Micro, vol. 18, no. 4, 1998, pp. 17-22.

- [10] M. Koyanagi, "Progress of three-dimensional integration technology", Extended Abstr. Int. Conf. Solid State Devices Mater., 2000, pp. 422–423.
- [11] J. A. Davis, R. Venkatesan, A. Kaloyeros, M. Beylansky, S. J. Souri, K. Banerjee, K. C. Saraswat, A. Rahman, R. Reif, J. D. Meindl, "Interconnect limits on gigascale integration (GSI) in the 21st century", Proceedings of the IEEE, vol. 89, no. 3, 2001, pp. 305–324.
- [12] P. Ramm, D. Bonfert, H. Gieser, J. Haufe, F. Iberl, A. Klumpp, A. Kux, and R. Wieland, "Interchip via technology for vertical system integration", Proc. IEEE Int. Interconnect Technol. Conf. (IITC), 2001, pp. 160–162.
- [13] J. Burns, L. McIlrath, C. Keast, C. Lewis, A. Loomis, K. Warner, and P. Wyatt, "Three-dimensional integrated circuits for low-power, high-bandwidth systems on a Chip", Proc. IEEE Int. Solid State Circuits Conf., 2001, pp. 268–269.
- [14] Xixin Cao, Qingqing Xie, Chungan Peng, Qingchun Wang, Dunshan Yu, "An Efficient VLSI Implementation of Distributed Architecture for DWT", Multimedia Signal Processing, 2006 IEEE 8th Workshop, 3-6 Oct. 2006, pp.364 – 367.