# ENERGY ANALYSIS OF BIPARTITION ARCHITECTURE FOR PIPELINED CIRCUITS Shanq-Jang Ruan\*, Edwin Naroska, Yen-Ren Chang, Chia-Lin Ho and Feipei Lai Dept. of EE and Dept. of CSIE National Taiwan University Taipei 106, Taiwan Phone:+886-2-23914116, Fax:886-2-23637204, flai@cc.ee.ntu.edu.tw Computer Engineering Institute University of Dortmund Dortmund 44221, Germany Phone:+49-231-7552406, Fax:+49-231-7553251, edwin@ds.e-technik.uni-dortmund.de ### ABSTRACT Energy consumption has recently emerged as one of the most critical design constraints. It directly relates to the operating time of a portable device. Most researches on pipelined circuits address the optimization of logic blocks to achieve low power. Among the power reduction techniques, the bipartition approach is comparatively effective as it partitions a given circuit into two subcircuits such that only a selected subcircuit is activated at a time, hence reducing unnecessary signal transitions. However operation time of a system is correlated to energy dissipation rather than power dissipation. In this paper, we propose a bipartition algorithm which aims to reduce switching probabilities such that the energy dissipation of combinational block as well as pipelined registers are reduced. Transistor-level simulation results show that our proposed algorithm reduce not only the power dissipation but also delay for most of the benchmark circuits. ### 1. INTRODUCTION Pipelined circuits have been widely utilized in designing unpluged digital systems, e.g.cellular phones, laptops or PDAs. For these devices low energy consumption is crucial to meet the design requirements in reliability, performance and battery life. As a matter of course, the lead circuit designers have to cope with excessive power dissipation, which in no doubt urges for synthesis tools specifically designed for energy efficient circuits. Towards minimizing the power dissipation of pipelined circuits, a lot of works have been done including the methods in precomputation [1], gated pipelined registers [2, 3], guarded evaluation [4, 5] and circuit par- The author was sponsored in part by Synopsys Inc. This work was sponsored in part by the National Science Council of Taiwan under Grant NSC 90-2213-E002-108 tition [6, 7, 8]. Precomputation disables the parts of the pipelined registers which do not affect the output value [1]. The authors of [2, 3] successfully reduce the switching activities of the pipelined registers by using control signals to gate the clock. In [4, 5], the authors isolate the operands which result in redundant operations to save unnecessary power dissipation. Chen et al. separate the original circuit into two parts: a small part (with respect to area) that is active most of the time and a bigger part which is infrequently activated. Because of the separation, power is saved as the switching activity is confined to the smaller part most of the time. In [7], Choi and Hwang partition the combinational logic block of a pipelined circuit into multiple subcircuits through a recursive application of Shannon expansion with respect to the selected input variables. Furthermore, Ruan shows that partitioning circuits into more than two subcircuits often fails in saving power due to the overhead introduced by the additional circuitry [8]. In this paper, we propose a bipartition algorithm to reduce the switching probability of two subcircuits by using Shannon expansion to select a partition variable that results in minimal switching probability of the circuit. In addition, bipartitioning also reduces length of the critical path. This implies that we can extend the operation time of an unpluged system due to the reduced power-delay product. To validate the results, we employ an accurate transistor-level power estimator to obtain energy dissipation for bipartition algorithm. ## Comparison to other partition techniques As stated in the introduction, there are several other partition techniques designed for lowering power consumption, but not for energy dissipation<sup>2</sup>. In [6], cir- <sup>&</sup>lt;sup>1</sup>EPIC PowerMill was developed by EPIC Design Technology, INC. <sup>&</sup>lt;sup>2</sup>Energy dissipation is a measure of power-delay product. cuits are bipartitioned based upon profile information. The approach detects the frequent output pattern and synthesizes them into a small subcircuit. However, for selecting the active subcircuit additional control circuitry is required, which increases time delay. [7] and [8] partition circuits using Shannon expansion. As the circuit architectures in [7, 8] do not contain the latches and AND gates shown in Fig. 3 the circuits may suffer from glitches that can cause incorrect behaviors. This unfortunately can not be detected while running gatelevel simulation. The authors of [7] and [8] estimated power/energy dissipation in gate-level is less accurate than that in transistor-level let alone the power dissipation of each component of the partition architecture is still unclear. In contrast to the other approaches, we provide the following viewpoints: - we bipartite circuit in terms of reducing switching probability instead of area; - we adopt accurate transistor-level simulation. This ensures the correctness of circuit behavior and energy dissipation; and - the power dissipation of each component in the bipartition architecture is disclosed and analyzed. The paper is organized as follows. In Section 2, we provide some basic concepts and briefly introduce our bipartition architecture, while Section 3 explains the bipartition algorithm. Section 4 describes the experimental results that confirm the effectiveness of the proposed method. The conclusion of the paper is summarized in Section 5. ## 2. PRELIMINARIES ## 2.1. Power Distribution In this paper, we adopt a single-phase edge-triggered pipelined circuit shown in Fig. 1 that is used in a large number of ASIC applications. Pipelined circuits have no state feedback lines because they do not need information about the past cycles for running the present cycle. In order to compare energy dissipation results obtained by bipartition method, we begin our the power analysis with the original benchmark circuits. We implement the circuits presented in Fig. 2 using the primitive standard cells provided by CCL<sup>3</sup> (0.8µm technology). Power dissipation were estimated using EPIC PowerMill. Fig. 2 shows that the pipelined register Figure 1: A combinational pipelined circuit Figure 2: Power dissipation for several MCNC benchmarks takes a large fraction of total power dissipation in most circuits. On average they account for 57.5% of the total power budget. Obviously, the registers are almost the largest contributor to the power budget of a pipelined circuit. We can say that pipelined registers should be taken into account when optimizing for low power. ## 2.2. Power Estimation Using Entropy Consider a combinational logic circuit with m-bit input X and n-bit output Y. The power consumption can be obtained as follow [?]: $$Power \propto \frac{2^{n+1}}{3n(m+n)}H(Y)[H(X)+H(Y)], \qquad (1)$$ The entropy measures H(X) and H(Y) are typically obtained by monitoring the signals X and Y during high-level simulation of the circuit. In general, the outputs of a logic gate make fewer transitions than its inputs. It can be theoretically shown that $H(Y) \leq H(X)$ . The entropy is defined as follows: $$H(x) = \sum_{i=1}^{2^k} p_i \cdot \log_2 \frac{1}{p_i},$$ (2) where x is a discrete variable which can take k different values, while $p_i$ is the probability that x takes the i-th <sup>&</sup>lt;sup>3</sup>CCL stands for Computer and Communication Research Labs and is one of the members of Industrial Technology Research Institute in Taiwan. Figure 3: Bipartition architecture for low power. value $x_i$ . It should be noted that entropy is intuitively related to switching activity [9]. #### 2.3. Bipartition Architecture In this paper, our bipartition architecture is based on Shannon expansion. In Fig. 3, an input variable SEL has been selected and the original combinational block is transformed into two co-factor subcircuits by applying Shannon expansion. Power reduction is achieved by disabling one of the two subcircuits during operation. Note that the choice of SEL is critical to this approach. **Example:** Considering the three-input, two-output function $\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}$ , where $f_1 = ab + ab'$ and $f_2 = ab + bc + ac$ . The function is represented by the following truth table: | abc | $f_1f_2$ | |-----|----------| | 000 | 00 | | 001 | 10 | | 010 | 00 | | 011 | 11 | | 100 | 00 | | 101 | 01 | | 110 | 11 | | 111 | 11 | | | | Let us assume that the inputs cycle thorough $000 \rightarrow 111 \rightarrow 000 \dots$ most of the time. As a result, all input register bits will flip nearly at each clock cycle. If we select a as SEL signal, the table will be partitioned as follows: | a = 0 | | a = 1 | | | |-------|----------|-------|----------|--| | bc | $f_1f_2$ | bc | $f_1f_2$ | | | 00 | 00 | 00 | 00 | | | 01 | 10 | 01 | 01 | | | 10 | 00 | 10 | 11 | | | 11 | 11 | 11 | 11 | | Where the output function are $f_1 = c$ and $f_2 = bc$ when a = 1, and $f_1 = b$ and $f_2 = b + c$ when a = 0. Because register $R_1$ is activated for SEL=0 and $R_2$ for SEL=1, input pattern 000 will be stored in $R_1$ while pattern 111 is routed to $R_2$ . In case of a input sequence that alternates between 000 and 111 both register will stop toggeling. ### 3. PROPOSED METHOD #### 3.1. Problem Formulation As previously discussed in Section 1, we want to biparition a pipelined circuit such that the switching probabilities of $R_1$ , $R_2$ are low and so are the entropies of $Subcircuit_1$ and $Subcircuit_2$ . In fact, if we reduce the switching probilities of $R_1$ and $R_2$ , we also obtain low entropy of $Subcircuit_1$ and $Subcircuit_2$ . Therefore, the bipartition problem can be simplified down to a task of finding an input variable such that the total input switching probability of both partitions is minimal. As the switching probability is data dependent we need a input pattern sequence which is typically applied to the circuit. E.g., this test sequence can be obtained during simulation. ## 3.2. Energy-Efficient Bipartition Algorithm Consider a circuit with n input pins. If we want to bipartition it by using Shannon expansion, there will be n different combinations. In order to find the best solution we test each of these combinations. As there are only n combinations the runtime for this brute force approach is usually acceptable. In detail, we perform the following operations for each input pin: - Bipartition the original circuit by the input pin. - Calculate the total switching probabilities of the two partitions based on the test input sequence. A solution with minimal switching probability will be selected as the final result. ## 4. EXPERIMENTAL RESULTS The bipartition algorithm has been implemented in C++ on a Pentium III desktop PC. We used SIS<sup>4</sup> to synthesize our partition results and estimate power dissipation by PowerMill. The benchmark circuits with input test patterns taken from LGSynth91 were used to demonstrate our algorithm. In the experiment, 5v supply voltage and a clock frequency of 20MHz were <sup>&</sup>lt;sup>4</sup>SIS is a sequential circuit synthesis systems which was developed by U.C. Berkeley. Table 1: Power dissipations of the registers of original circuit and bipartition architectures | Circuits | Orig_Reg | Bipar_Reg | PR% | |----------|----------|-----------|-------| | sao2 | 376.8 | 311.4 | 17.4% | | 5xp1 | 282.1 | 202.5 | 28.2% | | bw | 198.1 | 133.7 | 32.5% | | clip | 356.1 | 268.7 | 24.5% | | con1 | 242.8 | 151.2 | 37.7% | | misex1 | 305.2 | 179.5 | 41.2% | | rd53 | 181.9 | 134.6 | 26.0% | | rd73 | 266.0 | 198.2 | 25.5% | | xor5 | 184.8 | 125.0 | 32.3% | | 9sym | 355.7 | 297.0 | 16.5% | | Average | | | 28.8% | assumed. The rugged script of SIS was used to optimize the benchmarks. The registers of the output part are unchanged in our architectures so that we do not consider the effect of these registers. The delay, area and power units are nS, $128\mu m^2$ and $\mu W$ in our experiments, respectively. Table 1 shows the average pipelined registers power reduction is 28.8%. These results are consistent with earlier discussion which shows the proposed algorithm reduces switching activity. Table 2 shows the power and delay for the original circuits as well as the bipartitioned circuits. The ER% column represents the reduction of power-delay product. As shown in Table 2, performing bipartition reduces not only power but aslo delay time of the circuit. On average, energy (power-delay product) reduction of 25.6% could be obtained. Note that some circuits are not suitable for bipartition architecture and have no energy reduction (negative reduction rate). In Fig. 4, we demonstrate the average power distribution of bipartition architecture. As shown in Fig. 4, the pipelined registers are still the largest power contributors in a pipeline stage even though their contribution decreased from 57.5% to 45%. "Clock" represents two latches and AND gates, which also consumes a large amount of power in a bipartition architecture due to the frequent switching. As shown in the figure the subcircuits only consume 22% of the total power, which proves that our bipartition algorithm can effectively reduce entropy to obtain low power in combinational blocks. ## 5. CONCLUSION We have proposed a simple bipartition algorithm that selects an input variable to bipartite the original Table 2: Energy consumption of original circuit and bipartition architectures | Circuits | Original | | Bipartition | | | |----------|----------|-------|-------------|-------|--------| | | power | delay | power | delay | ER% | | sao2 | 573.4 | 14.7 | 521.8 | 12.3 | 24.2% | | 5xp1 | 577.0 | 16.1 | 562.0 | 15.1 | 8.8% | | bw | 568.7 | 26.2 | 546.9 | 14.8 | 45.9% | | clip | 647.1 | 14.3 | 588.5 | 12.3 | 21.7% | | con1 | 272.3 | 2.6 | 249.4 | 3.5 | -24.6% | | misex1 | 387.2 | 5.7 | 365.3 | 6.0 | 0.3% | | rd53 | 259.3 | 7.9 | 296.9 | 7.9 | -14.0% | | rd73 | 444.4 | 9.7 | 439.3 | 8.0 | 18.6% | | xor5 | 223.8 | 5.5 | 261.2 | 5.6 | -17.6% | | 9sym | 829.3 | 9.3 | 610.4 | 8.6 | 31.9% | | Average | 454.0 | 11.2 | 419.7 | 9.4 | 25.6% | Figure 4: Average power dissipation of bipartition architecture circuit which results in minimal switching probability in bipartition architecture in order to reduce power consumption. Further, we reduced not only the power of pipelined registers but also the delay of the circuit. This paper also analyzes power distribution, delay and energy before and after applying our partitioning approach. ## REFERENCES - M. Alidina, J. Monterio, S. Devadas, S. Devadas, A. Ghosh, and M. Papaefthymiou, "Precomputation-Based Sequential Logic Optimization for Low Power," *IEEE Trans. VLSI Syst.*, vol. 2, pp. 426-436, Dec. 1994. - [2] W. Ye and M. J. Irwin, "Power Analysis of Gated Pipeline Registers," in Proceeding of Twelfth Annual IEEE International ASIC/SOC Conference, pp. 281-285, 1999. - [3] H. Kapadia, L. Benini, and G. D. Micheli, "Reducing Switching Activity on Datapath Buses with Control-Signal Gating," IEEE J. Solid-State Circuits, vol. 34, pp. 405-414, March 1999. - [4] V. Tiwari, S. Malik, and P. Ashar, "Guarded Evaluation: Pushing Power Management to Logic Synthesis/Design," *IEEE Trans. Computer-Aided De*sign, vol. 17, pp. 1051–1060, Oct. 1998. - [5] M. Munch, B. Wurth, R.Mehra, J. Sproch, and N. Wehn, "Automating RT-Level Operand Isolatoin to Minimize Power Consumption in Datapaths," in Proceeding of Design, Automation and Test in Europe Conference and Exhibition, pp. 624– 631, 2000. - [6] S.-J. Chen, R.-J. Shang, X.-J. Huang, S.-J. Ruan, and F. Lai, "Bipartition and Synthesis in Low Power Pipelined Circuits," *IEICE Trans. Fundamentals of Electronics, Communication and Computer Scientice*, vol. E81-A, pp. 664-671, Apr. 1998. - [7] I.-S. Choi and S.-Y. Hwang, "Circuit Partition Algorithm for Low-Power Design Under Area Constraint Using Simulated Annealing," *IEE Proc. Circuit Devices Syst.*, vol. 146, pp. 8-15, Feb. 1999. - [8] S.-J. Ruan, J.-C. Lin, P.-H. Chen, F. Lai, K.-L. Tsai, and C.-W. Yu, "An Effective Output-Oriented Algorithm for Low Power Multipartition Architecture," in *IEEE Int'l. Conf. on Electronics, Circuits and Systems*, pp. 609-612, 2000. - [9] K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design. New York: John Wiley, 2000.