# A New 2-D Digital Filter Using a Locally Broadcast Scheme and Its Cascade Form

Lan-Da Van, Shing Tenqchen\*, Chi-Hong Chang\*\*, and Wu-Shiung Feng\*\*

Lab 331, Department of Electrical Engineering, National Taiwan University, Taipei, China

\* Also with Chunghwa Telecom Telecommunication Labs., 12, Lane 551, Sec. 5,

Min-Tsu Rd., Yang-Mei Zien, Tao-Yuan County, China

\*\*Department of Electronics Engineering, Chang Gung University, Taoyuan, China

E-mail: d86029@cad.ce.ntu.edu.tw and steadms.chitl.com.tw

# Abstract

In this paper, we propose a new two-dimensional (2-D) systolic-array digital filter using locally broadcast scheme that is the hybrid of a modified reordering and another systolic transformation. This architecture occupies locally broadcast, lower quantization error and zero latency without sacrificing the number of multipliers as well as delay elements under the accepted critical period. In addition, the widely used 2-D cascaded systolic digital filter can be described and reconstructed by similar methodology.

# I. Introduction

Since the rapidly demand and development of video consumer products, the high performance 2-D digital filters applied to image/video processing such as image restoration and intraframe noise reduction [1] has been recently required. The architectures of digital filter are mainly derived from the discrete-time domain [2], transformation domain [3-4], and etc. However, we can obviously find that the input and output signals globally broadcast in these structures [2-4]. This is our motivation to improve the structures to be more suitable for VLSI design. In this paper, we propose a new 2-D systolic architecture by a modified reordering of the transfer function and by using the different systolic transformation. The paper is organized as follows. A new systolic architecture for 2-D digital filters is presented in Section II. The cascade form applying the novel 2-D systolic digital filter is also proposed in Section III. The quantization effect due to finite precision arithmetic can be discussed in Section IV. In Section V, comparison results of various 2-D IIR and FIR digital filter architectures are tabulated. Section VI concludes this paper.

## II. A New 2-D Systolic Architecture

The transfer function of a 2-D IIR digital filter can be represented as follows:

$$H(z_1, z_2) = \frac{\sum_{i=0}^{N_1} \sum_{j=0}^{N_2} a_{i,j} z_1^{-i} z_2^{-i}}{1 - \sum_{i=0}^{N_1} \sum_{j=0}^{N_2} b_{i,j} z_1^{-i} z_2^{-j}},$$
 (1)

where  $b_{0,0} = 0$ ,  $a_{i,j}$  and  $b_{i,j}$  are the coefficients of the

 This work was supported by National Science Council under NSC: 88-2216-E-002-018.

# digital filter. We deduce the structure under the assumption of $N_1 = N_2 = N$ , and then we pay attention to derive 2-D IIR digital filter architecture. For the sake of reducing the critical period, we apply another systolic transformation as shown in Fig. 1(a), which is different from Fig. 1(b), to replace the block diagram as shown in Fig. 1(c). Applying this systolic transformation as well as tree structure, we can eliminate one of two existing critical paths of [4] and obtain a temporary filter structure. Obviously, it is commented that there exists the problems of input and output global broadcast and the other unreduced critical path of the temporary filter. In fact, the global broadcast results in the critical path of this temporary filter structure, so we take a modified reordering of delay elements and summations of the transfer function to solve the remaining defects. Given the input $X = X(z_1, z_2)$ and output $Y = Y(z_1, z_2)$ in z transform domain, we can rewrite Eq. (1) as follows:

IIR digital filter, and  $N_1 \times N_2$  is the order of the IIR

$$Y = \sum_{i=0}^{N} \sum_{j=0}^{N} a_{i,j} z_{1}^{-i} z_{2}^{-j} X + \sum_{i=0}^{N} \sum_{j=0}^{N} b_{i,j} z_{1}^{-i} z_{2}^{-j} Y$$
  
$$= \sum_{j=0}^{N} a_{0,j} z_{2}^{-j} X + \sum_{j=0}^{N} b_{0,j} z_{2}^{-j} Y$$
  
$$+ \sum_{i=1}^{N} z_{1}^{-i} z_{2}^{i \times l'} \left[ (\sum_{j=0}^{N} a_{i,j} z_{2}^{-j}) (z_{2}^{-i \times l'} X) + (\sum_{j=0}^{N} b_{i,j} z_{2}^{-j}) (z_{2}^{-i \times l'} Y) \right].$$
  
(2)

Furthermore, we define some terms to simplify the representation of Eq. (2) as follows:

$$F(i) \equiv \sum_{j=0}^{N} a_{i,j} z_2^{-j}$$
, for  $i = 0, 1, ..., N$ , (3.a)

$$G(i) = \sum_{i=0}^{N} b_{i,i} z_2^{-i} , \quad \text{for} \quad i = 0, 1, ..., N.$$
 (3.b)

Substituting Eqs. (3.a) and (3.b) into Eq. (2), we can rewrite Eq. (2) as follows:

$$Y = [F(0)X + G(0)Y] + \sum_{i=1}^{N} z_{1}^{i} z_{2}^{i,p} [F(i)(z_{2}^{-i,p}X) + G(i)(z_{2}^{-i,p}Y)]$$
$$= [F(0)X + G(0)Y] + z_{1}^{-1} z_{2}^{p} ([F(1)\hat{X}_{1} + G(1)\hat{Y}_{1}]$$

## 0-7803-6253-5/00/\$10.00 ©2000 IEEE.

$$+ z_1^{-1} z_2'' ([F(2)\hat{X}_2 + G(2)\hat{Y}_2] + \cdots + z_1^{-1} z_2'' ([F(N)\hat{X}_N + G(N)\hat{Y}_N]) \cdots)), \qquad (4)$$

 $+z_1'z_2'([F(N)X_N + G(N)Y_N])\cdots)),$  (4) where  $\hat{X}_k = z_2^{-P'}\hat{X}_{k-1}, \quad \hat{Y}_k = z_2^{-P'}\hat{Y}_{k-1}$  for k = 2, 3, ..., N, and  $\hat{X}_1 = z_2^{-P'}X, \quad \hat{Y}_1 = z_2^{-P'}Y$ . The integer *P* is in the range of 1 and M-1, where *M* is the width of an image. In Eq. (4), the summations in square brackets can be realized by the subblocks as shown in Fig. 1(a) and externally adding two registers whose size are *P*. Since the image is in the raster scan of this paper, the delay  $z_2' = z'$  and the delay  $z_1^{-1} = z^{-M}$  can be realized by a unit delay element and a shift-register (*SR*) with the size of *M*, respectively. Thus, the mapping of Eq. (4) onto a systolic architecture is depicted in Fig. 2 for N = 2. More importantly, this systolic architecture has no global broadcast signals anywhere even though we implement higher order 2-D IIR digital filters. In addition, the users merely set  $b_{i,j}$  to zero such that a new systolic architecture of an FIR digital filter can be obtained.

#### III. A New 2-D Cascade-Form Digital Filter

Among the applications of IIR digital filters, the cascade of the 1-D second-order IIR digital filter is widely used. The realization of digital filters by cascading the second-order IIR digital filter has many desirable features, such as less sensitivity to the coefficient quantization error and better roundoff noise performance than the direct-form realization while in the fixed-point realization [5]. Here, we use the preceding developed 2-D IIR digital filter as shown in Fig. 2 to construct the proposed 2-D IIR digital filter in cascade form. Before proceeding, the block diagram in Fig. 2 can be represented by two types of PE's, i.e.,  $PE_0$ ,  $PE_1$ , as shown in Figs. 3 (a), (b), respectively.

Under the assumption of  $N_1 = N_2 = N$  in Eq. (1), the transfer function of the cascade of the  $2 \times 2$  order IIR digital filter can be written as

$$H(z_1, z_2) = \prod_{l=1}^{N_x} \frac{\sum_{j=0}^{2} \sum_{j=0}^{2} a_{i,j,l} z_1^{-i} z_2^{-j}}{1 - \sum_{i=0}^{2} \sum_{j=0}^{2} b_{i,j,l} z_1^{-i} z_2^{-j}},$$
(5)

and

$$N_{s} = \lfloor (N+1)/2 \rfloor, \tag{6}$$

where  $\lfloor \bullet \rfloor$  denotes the maximum integer less than or equal to  $\bullet$ . Let N = 4 and the resulting systolic architecture of a 2-D cascade-form IIR digital filter is revealed in Fig. 4, where  $N_s = 2$ . Furthermore, this architecture behaves locally broadcast everywhere.

# IV. Quantization Error Analysis of 2-D Digital filters

In this section, we investigate the error analysis due to finite word-length arithmetic in 2-D IIR digital filters. Here, we adopt most notations as defined in [4, 6] to analyze the quantization errors including input, coefficient quantization error, and roundoff as well as storage error. Thus, the total combined errors are written as

$$f(n,m) \approx m_E + c_E + i_E + s_E + \sum_{i=0}^{N} \sum_{j=0}^{N} b_{i,j} e_Y(n-i,m-j),$$
(7)

where

$$m_{i:} = \sum_{i=0}^{N} \sum_{j=0}^{N} (p_{i,j}(n-i,m-j) + r_{i,j}(n-i,m-j)), \quad (8.a)$$

$$c_{j:} = \sum_{i=0}^{N} \sum_{j=0}^{N} (\alpha_{i,j} x(n-i, m-j) + \beta_{i,j} y(n-i, m-j)), (8.b)$$

$$i_E = \sum_{i=0}^{N} \sum_{j=0}^{N} a_{i,j} e_X (n-i, m-j), \qquad (8.c)$$

$$s_{j:} = \sum_{i=0}^{N} \sum_{j'=0}^{\lfloor N/3 \rfloor} s_{i,j'} + \sum_{i=1}^{N} s_{i} , \qquad (8.d)$$

 $m_E$ ,  $c_E$ ,  $i_E$ , and  $s_E$  represent roundoff error. coefficient quantization error, input quantization, and storage error, respectively. Note that the integer variable. P, does not affect the quantization error and, in fact, its main aim is to provide several locally broadcast architectures. The variable,  $s_E$ , consists of  $(N+1) \times (|N/3|+1)$  storage error and N storage error due to this new systolic transformation and addition of each 1-D digital filter output, respectively. In Eq. (7), on the other hand, the roundoff errors due to finite precision multiplication can be significantly reduced while applying a fixed-width multiplier with lower average error and lower variance [7]. Other terms will be identical to those in other proposed architectures. Extending Eq. (7), the quantization error of a 2-D cascade form digital filter, as described in Section III, can be easily to quantify, so we ignore here.

# V. Comparison Results

Comparison results of IIR digital filters are tabulated in Table I. It is in terms of quantization error, critical period, latency, the number of multipliers and delay elements, and, importantly, whether the input and output signals locally broadcast in these structures. In Table I, the hybrid of two schemes eliminates globally broadcast paths. Besides, the latter scheme, systolic transformation, leads to lower storage error than that in [2-4]. In Table I,  $\left[\bullet\right]$  denotes a minimum integer that greater than or equal to  $\bullet$ , and  $T_m$ and  $T_a$  represent the operation time required for the multiplier and adder, respectively. So as to minimize periods in the critical paths as appeared in [2-4] and Fig. 2. we properly apply tree method to those structures and then separately evaluate the throughput. In our work, the resulting period is less than that in [2] for  $N \ge 2$  and [4]. In practical cases, since M is much larger than N, the number of delay elements is dominated by the product-term MN. Hence, the number of delay elements in this work is almost equal to that of [3] as well as [4], and less than that of [2]. Analogously, we compare the hardware characteristics of FIR digital filters as listed in Table II. From Tables I and II, it turns out that the proposed architecture has no global broadcast, and maintain the lower quantization error as well as zero. latency under the accepted critical period.

#### VI. Conclusions

A new systolic architecture for the implementation of 2-D IIR and FIR digital filters has been accomplished by using a locally scheme that is hybrid of a modified reordering and another systolic transformation. The realization yields locally broadcast, lower quantization error, zero latency and the satisfied critical period without sacrificing other hardware characteristics. In addition, we extend the new second-order 2-D architecture to a 2-D cascaded-form systolic digital filter that is widely used to image/video processing applications.

# References

- [1] A. M. Tekalp, Digital Video Processing. Englewood Cliffs, NJ: Prentice-Hall, 1995, ch 14.
- [2] M. A. Sid-Ahmed, "A systolic realization for 2-D digital filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-37, pp. 560-565, Apr. 1989.
- [3] S. Sunder, F. El-Guibaly, and A. Antoniou, "Systolic implementations of two-dimensional recursive digital filters," in Proc. IEEE Int. Symp. Circuits Syst., May 1990, pp. 1034-1037.
- [4] N. R. Shanbhag, "An improved systolic architecture for 2-D digital filters," IEEE Trans. Signal Processing, vol. 39, no. 5, pp. 1195-1202, May 1991.
- [5] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989, ch 6.
- [6] D. Raghuramireddy and R. Unbehauen, "Error analysis of a systolic realization of 2-D Filters," IEEE Trans. Signal Processing, vol. 40, no. 2, pp. 478-482, Feb. 1992.
- [7] L. D. Van, S. S. Wang, S. Tenqchen, W. S. Fegn, and B. S. Jeng, "Design of a lower error fixed-width multiplier for speech processing application," in Proc. IEEE Int. Symp. Circuits Syst., May 1999, vol. 3, pp. 130-133, Orlando, Florida.

| Table I Comparison Results among the IIR Digital Filter Architectures |
|-----------------------------------------------------------------------|
|-----------------------------------------------------------------------|

| Parameters                | Sid-Ahmed [2]                | SCH3 of S-G-A [3]  | Shanbhag [4]             | This Work                                                   |
|---------------------------|------------------------------|--------------------|--------------------------|-------------------------------------------------------------|
| Global Broadcast          | Input and Output             | Input and Output   | Input and Output         | No                                                          |
| Quantization Error        | No Reduction                 | No Reduction       | Reduction in $S_{\nu}$   | More Reduction                                              |
| $(m_E + c_E + i_E + s_E)$ |                              |                    |                          | in <i>s<sub>E</sub></i>                                     |
| Critical Period           | $T_m + (2 + \log_2(N+1))T_a$ | $T_m + 2T_a$       | $2T_m + 2T_a$            | $T_m + 3T_a$                                                |
| Latency                   | 1                            | 1                  | 0                        | 0                                                           |
| No. of Multipliers        | $2(N+1)^2 - 1$               | $2(N+1)^2 - 1$     | $2(N+1)^2 - 1$           | $2(N+1)^2 - 1$                                              |
| No. of<br>Delay Elements  | $(N+1)^2 + 2MN$              | $(N+1)^2 + (M+1)N$ | $\frac{3N(N+1)}{2} + MN$ | $\frac{3N(N+1)}{2} + 2\left\lfloor\frac{N}{3}\right\rfloor$ |
|                           |                              |                    |                          | +(M+P)N                                                     |

# Table II Comparison Results among the FIR Digital Filter Architectures

| Parameters                | Sid-Ahmed [2]                                         | SCH3 of S-G-A [3] | Shanbhag [4]         | This Work                   |  |  |
|---------------------------|-------------------------------------------------------|-------------------|----------------------|-----------------------------|--|--|
| Global Broadcast          | Input                                                 | Input             | Input                | No                          |  |  |
| Quantization Error        | No Reduction                                          | No Reduction      | Reduction in $S_{k}$ | More Reduction              |  |  |
| $(m_E + c_E + i_E + s_E)$ |                                                       |                   |                      | in <i>S<sub>E</sub></i>     |  |  |
| Critical Period           | $\max\{(T_m + T_a), (\lceil \log_2(N+1) \rceil)T_a\}$ | $T_m + T_a$       | $T_m + 2T_a$         | $T_m + 2T_a$                |  |  |
| Latency                   | 1                                                     | 1                 | 0                    | 0                           |  |  |
| No. of Multipliers        | $(N+1)^2$                                             | $(N+1)^2$         | $(N+1)^2$            | $(N+1)^2$                   |  |  |
| No. of<br>Delay Elements  | $(N+1)^2 + MN$                                        | $(N+1)^2 + MN$    | N(N+1) + MN          | N(N+1) + MN                 |  |  |
|                           |                                                       |                   |                      | $+\left[\frac{N}{3}\right]$ |  |  |



Fig. 1. Systolic transformation, (a) applied by this paper, (b) applied by [4], and (c) original second-order relationship.



Fig. 2. A new proposed IIR digital filter with the locally broadcast for N = 2.



Fig. 3. The detailed block diagrams of two types of PE's, (a)  $PE_0$  and (b)  $PE_1$ .



Fig. 4. A cascade-form systolic architectrue of a 2-D IIR digital filter for N = 4.