# A BiCMOS Dynamic Full Adder Circuit for VLSI Implementation of High-Speed Parallel Multipliers Using Wallace Tree Reduction Architecture

J. B. Kuo, H. J. Liao, and H. P. Chen

Rm. 526, Dept. of Electrical Eng., National Taiwan University #1, Roosevelt Rd., Sec. 4, Taipei, Taiwan 106-17 FAX:886-2-363-8247, Telephone:886-2-363-5251 x285

Abstract — This paper presents a BiCMOS dynamic full adder circuit for VLSI implementation of high-speed parallel multipliers using Wallace tree reduction architecture. With the BiCMOS dynamic full adder circuit, an 8x8 multiplier designed based on a  $2\mu m$  BiCMOS technology shows a 6x improvement in speed as compared to the CMOS static one.

### Summary

High-speed multipliers are usually realized by parallel architectures [1], where the Wallace reduction structure [1] and carry look ahead circuits have been used to enhance speed. In a high-speed parallel multiplier using the Wallace tree reduction structure, the most important building cell is the full adder circuit. Although the CMOS dynamic technique [2] can provide a speed advantage over the static one for implementing serial adders, it is not suitable for realizing the full adder circuit for parallel multipliers using the Wallace tree reduction structure because of race problems. BiCMOS static logic circuits have proven helpful for realizing high-speed VLSI systems [3]. BiCMOS dynamic logic circuits can also be very helpful for implementing high-speed digital systems. Recently, a BiCMOS dynamic carry look ahead circuit, built by cascading BiCMOS dynamic logic gates without race problems, has been reported [4]. In this paper, a BiCMOS dynamic full adder circuit suitable for VLSI implementation of parallel multipliers without race problems is described. In the following sections, the BiCMOS dynamic full adder circuit based on the BiCMOS dynamic logic circuit techniques is described first, followed by the 8x8 multiplier circuit.

Fig. 1 shows the BiCMOS dynamic full adder circuit, composed of cascaded N type and P type BiCMOS dynamic logic cells [4]. The N type BiCMOS dynamic logic cell implements the $\overline{carry}$ signal ($carry = AB + BC + AC$), and the P type cell realizes the sum signal ($sum = \overline{carry}(A + B + C) + ABC$). As in a dynamic BiCMOS digital circuit [4], during the precharge/predischarge period the clock signal (CK) is low, and the outputs of the N type and P type cells are charged to high and discharged to low, respectively. During the logic evaluation period, the clock signal (CK) is high. If at least two of the three inputs are high, the $\overline{carry}$ signal is pulled low by the pull-down bipolar transistor. The sum signal is then pulled up only when all three signals $\overline{A}$, $\overline{B}$, $\overline{C}$ are low, or when the $\overline{carry}$ signal is high and at least one of the three signals $\overline{A}$, $\overline{B}$, $\overline{C}$ is low. In the P type cell, the three signals $\overline{A}$, $\overline{B}$, $\overline{C}$ are generated from the inputs A, B, C via static CMOS inverters, which ensure that $\overline{A}$, $\overline{B}$, $\overline{C}$ arrive at the P type cell later than the $\overline{carry}$ signal. Because signals are passed through a series of alternating N type and P type logic cells, the full adder can be used in a high-speed parallel multiplier with the Wallace reduction structure without race problems. During the precharge/predischarge period, the outputs of the N type and P type logic cells are charged to high and discharged to low, respectively. Consequently, during the logic evaluation period, the two prohibited states at the input of any next-stage logic circuit are naturally eliminated. As a result, race problems are avoided.
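The two-phase behavior described above can be checked with a small behavioral model. The following is only a logic-level sketch, not a circuit simulation: the function names `n_cell`, `p_cell`, and `full_adder` are hypothetical, signals are idealized as 0/1, and timing (including the inverter delay on the complemented inputs) is not modeled.

```python
from itertools import product

def n_cell(ck, a, b, c):
    """N type cell output (carry_bar), idealized as 0/1.

    CK low: output is precharged high.
    CK high: output is pulled low if at least two inputs are high.
    """
    if ck == 0:
        return 1
    return 0 if (a + b + c) >= 2 else 1

def p_cell(ck, carry_bar, a, b, c):
    """P type cell output (sum), idealized as 0/1.

    CK low: output is predischarged low.
    CK high: output is pulled up when all complemented inputs are low,
    or when carry_bar is high and at least one complemented input is low.
    """
    if ck == 0:
        return 0
    bars = (1 - a, 1 - b, 1 - c)  # static CMOS inverters (delay not modeled)
    all_low = all(x == 0 for x in bars)
    any_low = any(x == 0 for x in bars)
    return 1 if all_low or (carry_bar == 1 and any_low) else 0

def full_adder(a, b, c):
    """Run one precharge/evaluate cycle and return (sum, carry)."""
    # Precharge/predischarge phase: outputs do not depend on the inputs.
    assert n_cell(0, a, b, c) == 1
    assert p_cell(0, 1, a, b, c) == 0
    # Evaluation phase: the N type cell resolves first, then the P type cell.
    carry_bar = n_cell(1, a, b, c)
    s = p_cell(1, carry_bar, a, b, c)
    return s, 1 - carry_bar

# Exhaustive check against ordinary binary addition of three bits.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
```

The exhaustive loop confirms that the pull-up condition stated for the P type cell is equivalent to the usual three-input sum, and that the N type cell output is the complement of the majority function.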

Fig. 2 shows the $\overline{carry}$ and sum signals at the outputs of the N type and P type dynamic logic cells during the transients for three possible input situations: (a) one input signal switches from low to high, (b) two input signals switch from low to high, and (c) three input signals switch from low to high. As indicated in Fig. 2, in all three situations the $\overline{carry}$ and sum signals settle to their corresponding states. Fig. 3 shows the layout of the BiCMOS dynamic full adder and the CMOS static full adder using a 2µm BiCMOS technology. The BiCMOS dynamic full adder occupies an area of $196\mu m \times 173\mu m$. The CMOS static one has an area of $278\mu m \times 194\mu m$. The BiCMOS dynamic full adder is 30% smaller because the dynamic circuits use fewer transistors. Fig. 4 shows the propagation delay of the BiCMOS dynamic full adder vs. load capacitance, together with that of the CMOS static full adder. The propagation delay of the BiCMOS dynamic full adder is determined by the delay of the path from the P type cell input $\overline{A}$ via two PMOS transistors to the output. As shown in Fig. 4, over a wide range of load, the propagation delay of the BiCMOS dynamic full adder is about 3x shorter than that of the CMOS static one. The consistently higher switching speed of the BiCMOS dynamic full adder is very important for realizing parallel multipliers using Wallace tree reduction architecture.

High-speed parallel multipliers with Wallace tree reduction structure have been realized with CMOS static circuits [5], but they suffer a speed penalty as a result of the complex routing, long wiring, and irregular layout of the architecture [5]. Due to race problems, CMOS dynamic circuits are not suitable for building high-speed parallel multipliers with Wallace tree reduction structure. BiCMOS dynamic circuits, in contrast, are appropriate for implementing Wallace tree reduction architecture with complicated wiring. In order to show the versatility of the BiCMOS dynamic full adder circuit for constructing parallel multipliers with Wallace tree reduction structure, a test chip including two 8x8 parallel multipliers has been designed using a $2\mu m$ BiCMOS technology. As shown in Fig. 5(a), one is the BiCMOS dynamic parallel multiplier using Wallace tree reduction architecture with BiCMOS dynamic full adders and the BiCMOS carry look ahead circuit [4]. The other is the CMOS static one without carry look ahead circuit and without the Wallace tree reduction structure, as shown in Fig. 5(b). Fig. 6 shows the layout of the two multipliers. As shown in Fig. 6, the BiCMOS dynamic multiplier occupies an area of $2908\mu m \times 3160\mu m$ and the CMOS one has an area of $2054\mu m \times 2580\mu m$. The die area of the BiCMOS dynamic multiplier is 73% larger because a large portion of the space is allocated to the complex routing, long wiring, and irregular layout needed to realize the Wallace tree reduction architecture, even though the BiCMOS dynamic full adder occupies a much smaller area than the CMOS one.
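The 73% figure follows directly from the stated die dimensions. The short calculation below is only an arithmetic check; the variable and function names are illustrative and the dimensions are the ones quoted in the text.

```python
def area_um2(width_um, height_um):
    """Rectangular die area in square micrometers."""
    return width_um * height_um

# Die dimensions of the two 8x8 multipliers as quoted in the text.
bicmos_dynamic = area_um2(2908, 3160)  # BiCMOS dynamic, Wallace tree
cmos_static = area_um2(2054, 2580)     # CMOS static, no Wallace tree

ratio = bicmos_dynamic / cmos_static
increase_percent = round((ratio - 1) * 100)

print(f"BiCMOS dynamic: {bicmos_dynamic} um^2")
print(f"CMOS static:    {cmos_static} um^2")
print(f"BiCMOS dynamic die is about {increase_percent}% larger")
```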

Fig. 7 shows the transient waveforms at the internal nodes along the longest path in the 8x8 BiCMOS dynamic multiplier. The overall delay associated with the longest path in the Wallace tree reduction structure is 10ns. With the BiCMOS carry look ahead circuit, the overall propagation delay of the 8x8 BiCMOS dynamic multiplier is 22ns. The BiCMOS dynamic full adder also has a very good expansion capability. Fig. 8 shows the calculated propagation delay vs. bit number of a parallel multiplier using the BiCMOS dynamic and CMOS static circuits. In a CMOS static multiplier without Wallace tree reduction structure and carry look ahead circuit, the propagation delay increases linearly with the bit number. On the other hand, the propagation delay in a BiCMOS dynamic multiplier with Wallace tree reduction structure and carry look ahead circuit is relatively insensitive to its size. For the 8x8 multiplier, the BiCMOS dynamic one has a 5.7x speed advantage over the CMOS static one. For the 32x32 configuration, the speed enhancement is 11.6x. In spite of the 73% larger layout area for complex wiring, the BiCMOS dynamic multiplier still provides a very attractive speed performance as a result of the excellent intrinsic capability of BiCMOS in driving a large load.
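The weak dependence on bit number can be illustrated by counting full-adder levels. The sketch below assumes a textbook Wallace reduction, in which each level compresses every group of three partial-product rows into two. For contrast it models a simple array multiplier as accumulating one row per level (n - 1 levels). Both models, and the function names, are illustrative assumptions. They count adder levels only, not the delays reported in this paper, and they ignore the final carry-propagate addition.

```python
def wallace_levels(rows):
    """Number of 3:2 reduction levels needed to bring `rows`
    partial-product rows down to two rows."""
    levels = 0
    while rows > 2:
        # Each full group of three rows becomes two; leftovers pass through.
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels

def array_levels(rows):
    """Illustrative linear model: one adder row per added partial product."""
    return rows - 1

for n in (8, 16, 32):
    print(f"{n}x{n}: Wallace levels = {wallace_levels(n)}, "
          f"array levels = {array_levels(n)}")
```

Under these assumptions the Wallace tree depth grows from 4 levels at 8 bits to 8 levels at 32 bits, while the array depth grows in proportion to the bit number.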

## Acknowledgments

This work is supported under R.O.C. National Science Council Contract #81-0404-E002-102 & 111

#### References

[1] S. Waser et al., "Introduction to Arithmetic for Digital Systems Designers," HRW 1982

[2] N.F. Goncalves and H.J. DeMan, "NORA: A Racefree Dynamic CMOS Technique for Pipeline Logic Structures," *IEEE J Solid St Ckt*, 6/83

[3] M. Kubo et al., "Perspective on BiCMOS VLSI's," *IEEE J Solid St Ckt*, 2/88

[4] J. B. Kuo, H. J. Liao, and H. P. Chen, "BiCMOS Dynamic Manchester Carry Look Ahead Circuit for High Speed Arithmetic Unit VLSI," IEE Electronics Letters, 1992

[5] D. G. Grawley, "8 x 8 bit Pipelined Dadda Multiplier in CMOS," IEE Proc. vol. 135, Pt. G. No. 6, 12/88

## IEEE 1992 Bipolar Circuits and Technology Meeting 9.2



Fig. 1 The BiCMOS dynamic full adder circuit.





Fig. 2 The sum and  $\overline{carry}$  at the output of the BiCMOS full adder for three input cases – a. one input b. two inputs c. three inputs switch(es) from low to high.



Fig. 3 Layout of the BiCMOS dynamic full adder and the CMOS static full adder. The BiCMOS dynamic one occupies an area of  $196 \mu m \times 173 \mu m$ . The CMOS one has an area of  $278 \mu m \times 194 \mu m$ .



Fig. 4 The propagation delay of the BiCMOS dynamic full adder and the CMOS static full adder vs. the output load capacitance.



Fig. 5 (a) Block diagram of the 8x8 parallel multiplier using Wallace tree reduction architecture with BiCMOS dynamic full adders and carry look ahead circuit. (b) Block diagram of the CMOS 8x8 parallel multiplier without Wallace tree reduction structure and without carry look ahead circuit.



Fig. 6 Layout of the 8x8 parallel multipliers. a. the BiCMOS one using Wallace tree reduction architecture with BiCMOS dynamic full adders and carry look ahead circuit. The die area is  $2908\mu m \times 3160\mu m$ . b. the CMOS static one. The die area is  $2054\mu m \times 2580\mu m$ .



Fig. 7 Transient waveform at the internal nodes of the 8x8 multiplier using BiCMOS dynamic full adder circuits.



Fig. 8 The speed performance of CMOS static and BiCMOS dynamic multipliers vs. bit number.