### National Science Council, Executive Yuan: Research Project Interim Progress Report

Subproject 2: Timing is Everything: On the Effects of Power Supply Fluctuation, Noise, and Temperature on Timing (1/3)

- Project type: Integrated project
- Project number: NSC92-2220-E-002-019-
- Execution period: November 1, 2003 to October 31, 2004
- Executing institution: Graduate Institute of Electronics Engineering, National Taiwan University
- Principal investigator: 陳中平
- Co-principal investigator: 林永隆
- Report type: Full report
- Handling: This project report is open to public inquiry

July 3, 2004

# Timing is Everything: Power Delivery, Signal Integrity, and Temperature (1/3)

Number: 92-2220-E-002-019-

Abstract: Due to coupling noise, process variations, and power delivery fluctuations, design uncertainties of on-chip global interconnect systems rise sharply with deep-sub-micron (DSM) technology. It is increasingly difficult to assume deterministic and error-free signal transmission over global wires. Instead, on-chip global interconnect wires must be analyzed as an error-prone communication channel characterized by a probability of bit error and statistical timing distributions. In this paper, a novel statistical timing analysis approach is developed to analyze the behavior of two practically important pipelined multiple clock-cycle global interconnect architectures, namely, the flip-flop inserted global wire and the latch inserted global wire. We present analytical formulas that are based on parameters obtained using Monte Carlo simulation. These results enable a global interconnect designer to explore design trade-offs between clock frequency and probability of bit error during data transmission, and to evaluate the cost-effectiveness of reliability enhancement measures such as bus coding.

## I. INTRODUCTION

Continuing shrinkage of feature size and ever-increasing SoC design complexity present great challenges to the design of high-reliability, high-throughput on-chip interconnection circuitry. Design uncertainties rise sharply due to process variations, thermal noise, supply voltage fluctuation, cross coupling, and uncontrollable clock jitter. For the current generation of VLSI, significant guard-banding has become common practice to ensure signal integrity and avoid communication failures.
Resource-hungry stopgap measures, such as shielding, buffer insertion, sizing, spacing, and extended timing margins, are also aggressively applied. These existing approaches focus on the worst-case scenario and choose design parameters conservatively so that the resulting circuit behavior can be predicted with certainty. For DSM design, such a conservative approach may become impractical because it may yield no viable solution. Instead, statistical characterization of the performance of global interconnect becomes necessary.

Along this direction, in this paper we perform statistical timing analysis on two types of pipelined, multiple clock-cycle, global interconnect architectures: a flip-flop inserted global interconnect wire [1]–[7], and a latch inserted global interconnect wire. Our goal is to analyze the impact of parameter variations and noise on the successful transmission of data in such a circuit. Specifically, we assume that the statistical variations of individual parameters, such as clock jitter, process parameters, and noise statistics, are known. Using these statistics, we derive analytical formulas that characterize the distribution of the timing delay of these multiple clock-cycle global interconnect wires. These distributions then allow one to estimate the probability of erroneous transmission of a single bit through the wire at a particular clock frequency. By varying the operating clock frequency, a plot of bit error rate versus clock frequency can be drawn for each of the two global interconnect architectures, which allows their features to be compared.

Two preliminary findings are particularly interesting. First, we observe that the latch-based architecture can operate at a higher clock frequency than the flip-flop based architecture with about the same bit error probability.
Second, when the clock frequency is much lower than the desired operating frequency, the latch-based architecture may yield an excessively high bit error rate due to a potential race in signal propagation while the clock signal is held high.

We note that existing statistical timing analysis (STA) results [8]–[14] focus on timing variations of combinational logic. The nonlinear effects of sequential elements such as flip-flops or latches are not taken into account. On the other hand, worst-case deterministic timing analysis for flip-flops is covered in almost every circuit textbook, and a similar analysis for latches is given in [15], [16]. In this paper, closed-form expressions for the probability density function (p.d.f.) of both the flip-flop based and the latch-based pipelined interconnect systems are derived. Numerical analyses are given based on example designs. These statistical analyses are then used to evaluate the performance and reliability of the two interconnect pipelining methods.

The rest of this paper is organized as follows. Sections II and III derive the closed-form p.d.f.s for D-type flip-flop (DFF) and latch-based interconnect pipelining. In Section IV, the STA parameters are extracted from Monte Carlo simulation of the example circuits. Section V performs numerical analysis using the derived p.d.f. equations and the extracted STA parameters; with these results, both interconnect pipelining schemes are evaluated in terms of performance and reliability. Section VI lists the publications resulting from this research, and Section VII concludes.

## II. STATISTICAL TIMING ANALYSIS OF A DFF-PIPELINED INTERCONNECT

A typical DFF-based interconnect pipelining stage is shown in Figure 1, where each DFF is rising-edge-triggered. All pipelining stages are uniform: each stage has a flip-flop of size s and a wire of length l, driven by an inverter of size z and terminated by another inverter of size w.

Fig. 1. DFF Pipelined Interconnect Stage

Each DFF has an input terminal D and an output terminal Q. After the rising edge of the clock, the data of the DFF in stage i appears at its Q terminal after a propagation delay $\tau_{prop}$. The data then reaches the D terminal of the DFF in stage i+1 after a wire delay $\tau_{wire}$. To satisfy the setup time constraint of the DFF in stage i+1, new data has to arrive at the D terminal at least the setup time $\tau_{setup}$ before the next rising clock edge.

For a data transition propagating through the DFF-pipelined interconnect stages, the arrival time of the $i^{th}$ stage, $x_i$, is defined at the D terminal of the DFF in the $(i+1)^{th}$ stage. Clearly, $x_i$ has to lie within the range

$$iT_{CLK} \leq x_i \leq (i+1)T_{CLK} - \tau_{setup}$$

where $T_{CLK}$ is the clock period measured from the rising edge. If this condition is violated, the $(i+1)^{th}$ stage will not be able to register the signal transition and a bit error may occur.

The relative data arrival time in the $i^{th}$ stage is defined as $p_i = x_i - iT_{CLK}$. Since data in the $(i-1)^{th}$ stage propagates into the $i^{th}$ stage only after the $i^{th}$ rising clock edge,

$$p_i = \tau_{prop} + \tau_{wire} \tag{1}$$

according to the definition of $\tau_{prop}$ and $\tau_{wire}$.
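As a rough nominal illustration of the timing window above, one can plug in mean values quoted elsewhere in this report (the DFF column of Table I and the $\mu_w = 110ps$ stage of Section V) and ignore all variation. This is only a deterministic sanity check under that assumption, not part of the statistical analysis:

$$p_i = \tau_{prop} + \tau_{wire} = 120ps + 110ps = 230ps, \qquad p_i \le T_{CLK} - \tau_{setup} \;\Rightarrow\; T_{CLK} \ge 230ps + 94ps = 324ps$$

This corresponds to a clock frequency of at most about $3.09GHz$ when every parameter sits exactly at its mean. Once the delays and the clock period are treated as random variables, the usable frequency for a given error target is lower than this nominal figure.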
Data transmission from the $i^{th}$ stage to the $(i+1)^{th}$ stage is therefore free of error only if

$$0 \le p_i \le T_{CLK} - \tau_{setup}$$

In other words, if a data bit is already correctly registered by the DFF in the $i^{th}$ stage, the probability of correct data transmission between the $i^{th}$ and $(i+1)^{th}$ stages can be expressed as:

$$q = Pr(0 \le p_i \le T_{CLK} - \tau_{setup}) \tag{2}$$

We model the data arrival time $p_i$ as a random variable with probability density function (p.d.f.) $P(p_i)$. Note that from equation (1), $p_i$ is the sum of two delay terms, $\tau_{prop}$ and $\tau_{wire}$, which are also random variables with p.d.f.s $P(\tau_{prop})$ and $P(\tau_{wire})$ respectively. So $P(p_i)$ can be found as the convolution of $P(\tau_{prop})$ and $P(\tau_{wire})$:

$$P(p_i) = P(\tau_{prop}) ** P(\tau_{wire}) \tag{3}$$

where "\*\*" is the convolution operator.

Since the clock period $T_{CLK}$ and the setup time of the DFF, $\tau_{setup}$, are also random variables, the difference between them, $A = T_{CLK} - \tau_{setup}$, is also a random variable. We denote the p.d.f.s of these three random variables as $P(T_{CLK})$, $P(\tau_{setup})$, and $P(A)$ respectively. Hence the probability that a data bit is successfully transmitted from stage i to stage i+1 can be evaluated as:

$$q = \int_{-\infty}^{\infty} \left\{ \int_{0}^{T_{CLK} - \tau_{setup}} P(p_i)\, dp_i \right\} P(A)\, dA \tag{4}$$

where $P(A) = P(T_{CLK}) ** P(-\tau_{setup})$. It is impossible to have $p_i < 0$ since $p_i$ is a sum of delays, so the lower limit of the inner integral can be extended to $-\infty$:

$$q = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{T_{CLK} - \tau_{setup}} P(p_i)\, dp_i \right\} P(A)\, dA \tag{5}$$

One may define yet another random variable

$$\delta = \tau_{prop} + \tau_{wire} + \tau_{setup} - T_{CLK} \tag{6}$$

whose p.d.f.
is:

$$P(\delta) = P(\tau_{prop}) ** P(\tau_{wire}) ** P(\tau_{setup}) ** P(-T_{CLK})$$

The probability of correct data transmission from stage i to i+1 is then:

$$q = \int_{-\infty}^{\infty} P(A)\, dA \int_{-\infty}^{0} P(\delta)\, d\delta = \int_{-\infty}^{0} P(\delta)\, d\delta \tag{7}$$

The probability that a single data bit is corrupted when transmitted through an *N*-stage DFF-pipelined global interconnect wire, denoted by BER, is the bit error rate of this on-chip communication channel. Due to the presence of a DFF, the probability of correct data transmission at each stage is independent of the other stages. Hence,

$$BER = 1 - q^N \tag{8}$$

If we assume that $\tau_{prop}$, $\tau_{wire}$, $\tau_{setup}$, and $T_{CLK}$ are independent normal random variables, the random variable $\delta$ has a normal density function whose mean and variance are:

$$\mu_{\delta} = \mu_{\tau_{prop}} + \mu_{\tau_{wire}} + \mu_{\tau_{setup}} - \mu_{T_{CLK}} \tag{9}$$

$$\sigma_{\delta}^2 = \sigma_{\tau_{prop}}^2 + \sigma_{\tau_{wire}}^2 + \sigma_{\tau_{setup}}^2 + \sigma_{T_{CLK}}^2 \tag{10}$$

The corresponding probability of correct data transmission from stage i to stage i+1 can then be computed as

$$q = Pr(\delta \le 0) = \frac{1}{2} - erf\left(\frac{\mu_{\delta}}{\sigma_{\delta}}\right) \tag{11}$$

where

$$erf(x) = \frac{1}{\sqrt{2\pi}} \int_0^x exp\left(-\frac{t^2}{2}\right) dt$$

Since this $erf$ is an odd function, $q$ exceeds $1/2$ exactly when $\mu_{\delta} < 0$, that is, when the mean clock period is longer than the mean total stage delay.

Fig. 2. Latch Pipelined Interconnect Stage

## III. LATCH-PIPELINED INTERCONNECT

A typical latch-pipelined interconnect stage is shown in Figure 2, where positive-passing latches of size s are used. Again, we assume identical pipelined stages, each having a wire of length l, driven by a driver of size z, and terminated by an acceptor buffer of size w.

There are three kinds of delays associated with a latch: (i) $\tau_{data}$, the delay between data input terminal D and output terminal Q when the clock is high.
(ii) $\tau_{prop}$, the delay between the positive clock edge and Q. And (iii) $\tau_{setup}$, the time the data signal must be held steady before the falling edge of the clock signal so that the latch can capture the correct data value. We also denote by $\tau_{wire}$ the delay incurred over the wire segment within a pipelined stage.

Fig. 3. Latch Timing

### A. Timing Analysis

Figure 3 illustrates the timing constraints of a latch-pipelined stage. The square wave drawn as a solid line represents the ideal clock waveform with a period equal to $T_{CLK}$. The rising clock edge is designated as the origin of each clock period. We assume a 50% duty cycle so that the width of the clock pulse is $0.5T_{CLK}$. We use point B to mark the falling edge of the clock pulse, and point C to mark the point $0.5 \cdot T_{CLK} - \tau_{setup}$ within each clock cycle.

$x_i$ is the arrival time of a data bit at the input D of the latch at stage i. The $i^{th}$ clock period is defined as $\{t \mid (i-0.5)T_{CLK} \le t \le (i+0.5)T_{CLK}\}$. If $x_i$ falls outside the current clock period, the transmission is faulty. For convenience, we also define the relative data arrival time with respect to the rising clock edge of the current clock period at the $i^{th}$ pipelined stage as $p_i = x_i - i \cdot T_{CLK}$. Hence the range of $p_i$ within the $i^{th}$ clock period is $\{t \mid -0.5 \cdot T_{CLK} \leq t \leq 0.5 \cdot T_{CLK}\}$.

If the data arrives within $\tau_{setup}$ time units prior to the falling clock edge B, the latch may not have sufficient time to register the new data, and the data transmission will fail. Hence the interval $F = \{t \mid 0.5T_{CLK} - \tau_{setup} < t < 0.5T_{CLK}\}$, which starts at point C, is called the *faulty region* within each clock period.

As illustrated in Figure 3, $x_{i-1}$ falls inside an *opaque region* $O = \{t \mid -0.5 \cdot T_{CLK} \le t \le 0\}$ prior to the rising clock edge. During this interval, the clock signal is low, and the latch is off.
Thus, the latch appears to be *opaque* to the data signal at its input D. The data remains there until the rising clock edge at the end of this interval. It then propagates through the latch and the wire in pipelined stage i and arrives at the output of the acceptor at time $x_i$. Thus

$$x_i = (i-1)T_{CLK} + \tau_{prop} + \tau_{wire} \tag{12}$$

Equivalently, if $p_{i-1} \in O$,

$$p_i = -T_{CLK} + \tau_{prop} + \tau_{wire} \tag{13}$$

Note that $p_i$ is then independent of $p_{i-1}$.

In Figure 3, $x_i$ arrives after the rising clock edge ($t=0$) and falls within a *transparent region* $T = \{t \mid 0 \le t \le 0.5T_{CLK} - \tau_{setup}\}$. Since the latch is on, it appears transparent to the input signal at D. The incoming data immediately starts passing through the latch, the subsequent buffers, and the wire. Hence

$$x_{i+1} = x_i + \tau_{data} + \tau_{wire} \tag{14}$$

Equivalently,

$$p_{i+1} = p_i + \tau_{data} + \tau_{wire} - T_{CLK} \tag{15}$$

To summarize, we have

$$p_{i+1} = \begin{cases} \tau_{prop} + \tau_{wire} - T_{CLK} & (p_i \in O) \\ p_i + \tau_{data} + \tau_{wire} - T_{CLK} & (p_i \in T) \end{cases} \tag{16}$$

If $p_i \in F$, the data transmission is considered a failure and there is no $p_{i+1}$. Note that when $p_i \in T$, $p_{i+1}$ depends on $p_i$. This is not the case for a DFF-pipelined global interconnect, where the delay within each pipelined stage is independent of the other pipelined stages.

### B. Probability Density Function of Data Arrival Time

In order to calculate the probability of successful transmission of a single data bit through an N-stage latch-pipelined global interconnect, one needs to compute the p.d.f. of the data bit arrival time $P(p_i)$ for each pipelined stage. This task is much more complicated for a latch-pipelined global wire than for a DFF-pipelined global wire, because $p_i$ may depend on $p_{i-1}$ if $p_{i-1} \in T$, or $p_i$ may not be defined if $p_{i-1} \in F$.
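The recursion (16), together with the regions O, T, and F, can also be exercised directly by simulation. The following Python sketch is only an illustration under stated assumptions, not the analytical method of this report: all parameters are drawn as independent Gaussians with the latch-column values of Table I, $\tau_{data}$ is given the same statistics as $\tau_{setup}$, one clock-period sample is used per stage, and the first latch is assumed to start in the opaque region. The function name `latch_ber_monte_carlo` and its arguments are hypothetical.

```python
import random

# Latch-column values from Table I, in picoseconds: (mean, standard deviation).
TAU_SETUP = (119.0, 13.0)
TAU_PROP = (116.0, 13.0)
SIGMA_WIRE = 9.0   # standard deviation of the stage wire delay
SIGMA_CLK = 17.0   # standard deviation of the clock period


def latch_ber_monte_carlo(mu_c, mu_w, n_stages, trials=20000, seed=0):
    """Estimate the fraction of bits that fail to clear n_stages latch stages.

    mu_c: mean clock period (ps); mu_w: mean stage wire delay (ps).
    Assumption: p_0 lies in the opaque region O for every trial.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        in_opaque = True
        p = 0.0
        for _ in range(n_stages):
            t_clk = rng.gauss(mu_c, SIGMA_CLK)
            tau_wire = rng.gauss(mu_w, SIGMA_WIRE)
            if in_opaque:
                # First branch of (16): data waited for the rising edge.
                p = rng.gauss(*TAU_PROP) + tau_wire - t_clk
            else:
                # Second branch of (16): data passed through a transparent latch.
                # Assumption: tau_data shares the statistics of tau_setup.
                tau_data = rng.gauss(*TAU_SETUP)
                p = p + tau_data + tau_wire - t_clk
            tau_setup = rng.gauss(*TAU_SETUP)
            if -0.5 * t_clk <= p <= 0.0:
                in_opaque = True            # region O
            elif 0.0 < p <= 0.5 * t_clk - tau_setup:
                in_opaque = False           # region T
            else:
                failures += 1               # region F or outside the clock period
                break
    return failures / trials


if __name__ == "__main__":
    for period_ps in (150.0, 310.0, 600.0):
        estimate = latch_ber_monte_carlo(period_ps, 110.0, 8, trials=5000)
        print(period_ps, estimate)
```

Under these assumptions the estimate is high both when the clock is very fast and when it is very slow, which mirrors the two-sided timing constraint discussed in the text. The exact frequencies at which this happens depend on the modeling choices above and need not match the figures in this report.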
Because of this dependence, we derive a formula for the conditional p.d.f. $P(p_{i+1}|p_i \in O \cup T)$ instead. Since $\{p_i \in O\}$ and $\{p_i \in T\}$ are disjoint events, one has

$$\begin{aligned} P(p_{i+1}) &= \frac{P(p_{i+1}|p_i \in O \cup T)}{Pr(p_i \in O \cup T)} \\ &= \frac{Pr(p_i \in O)}{Pr(p_i \in O) + Pr(p_i \in T)} P(p_{i+1}|p_i \in O) \\ &\quad + \frac{Pr(p_i \in T)}{Pr(p_i \in O) + Pr(p_i \in T)} P(p_{i+1}|p_i \in T) \end{aligned} \tag{17}$$

The conditional p.d.f. $P(p_{i+1}|p_i \in O)$ is the convolution of the p.d.f.s $P(\tau_{prop})$, $P(\tau_{wire})$, and $P(-T_{CLK})$:

$$P(p_{i+1}|p_i \in O) = P(\tau_{prop}) ** P(\tau_{wire}) ** P(-T_{CLK}) \tag{18}$$

Note that this p.d.f. is the same for each stage index i. The conditional p.d.f. $P(p_{i+1}|p_i \in T)$, on the other hand, is the convolution of the p.d.f.s of four independent random variables:

$$P(p_{i+1}|p_i \in T) = P(p_i|p_i \in T) ** P(\tau_{data}) ** P(\tau_{wire}) ** P(-T_{CLK}) \tag{19}$$

where $P(p_i|p_i \in T)$ is defined as:

$$P(p_i|p_i \in T) = \begin{cases} 0 & (p_i \notin T) \\ \frac{P(p_i)}{Pr(p_i \in T)} & (p_i \in T) \end{cases} \tag{20}$$

Given the initial p.d.f. $P(p_0)$ at the input of the first stage, $P(p_i)$ can be computed recursively once $P(p_{i-1})$ is known.

### C. Bit Error Rate (BER)

Given that all previous stages have transmitted the data correctly, the probability that stage i correctly sends a data bit to stage i+1 is:

$$\begin{aligned} q_{i} &= Pr(p_{i} \in O \cup T) \\ &= \iint_{-\infty}^{+\infty} P(B)P(C)\, dB\, dC \int_{B}^{C} P(p_{i})\, dp_{i} \end{aligned} \tag{21}$$

where the random variables are $B = -0.5T_{CLK}$ and $C = 0.5T_{CLK} - \tau_{setup}$, and their p.d.f.s are $P(B) = P(-0.5T_{CLK})$ and $P(C) = P(0.5T_{CLK}) ** P(-\tau_{setup})$ respectively.

For a data bit to propagate correctly through N latch-pipelined stages, all N stages must transfer it correctly. So the overall bit error rate (BER) equals:

$$BER = 1 - \underbrace{q_1 q_2 \cdots q_N}_{N} = 1 - \prod_{i=1}^{N} q_i \tag{22}$$

## IV. EMPIRICAL PROBABILITY DENSITY FUNCTION ESTIMATION OF TECHNOLOGY-DEPENDENT DELAY PARAMETERS

The p.d.f.s of $\tau_{setup}$, $\tau_{data}$, $\tau_{prop}$, and $\tau_{wire}$ are needed for the statistical analysis. Rather than simply assuming that these are normal random variables, we conducted Monte Carlo simulation using $0.18\mu m$ technology parameters to obtain the mean and standard deviation of these random variables, and we use these numbers for the subsequent statistical timing analysis. For this purpose, we made a few assumptions:

- Temperature fluctuation is set to $100^{\circ}C$ and uniformly distributed from $27^{\circ}C$ to $127^{\circ}C$.
- Supply voltage uncertainty is set to 10% and uniformly distributed.
- Each process parameter variation is assumed to obey a uniform distribution within the limits provided by the technology files.
- The crosstalk effect is modeled by placing parallel aggressor wires on both sides of the victim wire and passing random signals through these aggressor wires.

We estimate the clock uncertainty based on the work reported in [17], which claims that the best-effort clock uncertainty is 51ps in $0.18\mu m$ technology. If the clock cycle follows a Gaussian distribution and the 51ps figure is taken as a $3\sigma$ bound, it is reasonable to assume that the standard deviation of the clock cycle is $\sigma_c=17ps$. The designed clock cycle is the mean value of the clock cycle distribution, $\mu_c$. The expected bit rate sent through the interconnect is $1/\mu_c$ if bits are NRZ-coded.

A common wire segment, with acceptor size $w=24\mu m$ and driver size $z=40\mu m$, is pipelined between DFFs or latches of size $s=16\mu m$. The latches are designed in such a way that $\tau_{setup}\equiv\tau_{data}$. The designed stage wire delay, $\mu_w$, is the mean value of the wire delay and varies with the wire length l of a stage. The variation of the stage wire delay, $\sigma_w$, however, is assumed to be constant.

Fig. 4. P.d.f.s of $\tau_{setup}$, $\tau_{prop}$, and $\tau_{wire}$ in DFF Pipelining

Fig. 5. P.d.f.s of $\tau_{setup}$, $\tau_{prop}$, and $\tau_{wire}$ in Latch Pipelining

The delay parameters extracted from Monte Carlo simulation results using 300 independent trials are shown in Figure 4 for DFF pipelining and Figure 5 for latch pipelining. The p.d.f.s of these delay parameters can be approximated well by Gaussian density functions whose empirical means and standard deviations are summarized in Table I.

TABLE I. DISTRIBUTION PARAMETERS FOR DELAY ELEMENTS

| Parameter | $\mu$ (DFF) | $\mu$ (Latch) | $\sigma$ (DFF) | $\sigma$ (Latch) |
|----------------|---------|---------|------|------|
| $\tau_{setup}$ | 94ps | 119ps | 9ps | 13ps |
| $\tau_{prop}$ | 120ps | 116ps | 13ps | 13ps |
| $\tau_{wire}$ | $\mu_w$ | $\mu_w$ | 9ps | 9ps |
| $T_{CLK}$ | $\mu_c$ | $\mu_c$ | 17ps | 17ps |

## V. THROUGHPUT AND RELIABILITY

It is desirable to transfer data bits through the interconnect as fast as possible. However, due to the statistical uncertainty of the interconnect, bit errors will occur at a certain rate. An appropriate criterion for the throughput of the interconnect is therefore the number of correct data bits transferred per second (bps). If data bits, encoded as NRZ, are transferred through the interconnect at the same rate as the designed operating clock frequency ($1/\mu_c$), the overall throughput *Thput* is:

$$Thput = \frac{1 - BER}{\mu_c} \tag{23}$$

If an error checking or correcting code (ECC) is included, the throughput is further reduced by the fraction of bits used for ECC.

The reliability of the interconnect is measured by its BER: the lower the BER, the better the reliability. Because error correction is usually very expensive, it is necessary not only to have a highly reliable interconnect but also to predict the BER precisely in order to identify the reliable region, since even a very small change in BER can cause severe performance degradation, as indicated by Amdahl's law.

In the following discussion, a system is defined to be reliable if its $BER \leq 1\%$.
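To make the DFF expressions concrete, the closed-form results (8) to (11) and the throughput definition (23) can be combined in a few lines. The Python sketch below assumes independent normal parameters and uses the DFF column of Table I as defaults; the function names are hypothetical. Note that `math.erf` uses a different scaling from the $erf$ defined after (11), so the sketch goes through the standard normal cumulative distribution function, for which $Pr(\delta \le 0) = \Phi(-\mu_{\delta}/\sigma_{\delta})$.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dff_ber(mu_c, mu_w, n_stages,
            mu_prop=120.0, mu_setup=94.0,
            sig_prop=13.0, sig_wire=9.0, sig_setup=9.0, sig_clk=17.0):
    """BER of an N-stage DFF-pipelined wire; all times in picoseconds."""
    # Mean and variance of delta, equations (9) and (10).
    mu_delta = mu_prop + mu_w + mu_setup - mu_c
    sigma_delta = math.sqrt(sig_prop**2 + sig_wire**2 + sig_setup**2 + sig_clk**2)
    # Per-stage success probability q = Pr(delta <= 0), equation (11).
    q = normal_cdf(-mu_delta / sigma_delta)
    # Independent stages, equation (8).
    return 1.0 - q ** n_stages


def throughput_gbps(mu_c, ber):
    """Equation (23) with mu_c in ps; 1/ps corresponds to 1000 Gbps."""
    return (1.0 - ber) / mu_c * 1000.0


if __name__ == "__main__":
    ber = dff_ber(mu_c=400.0, mu_w=110.0, n_stages=8)
    print(ber, throughput_gbps(400.0, ber))
```

This sketch covers only the DFF case, where the stages are independent. It is meant as a way to explore how the BER reacts to the clock period and the number of stages, not as a reproduction of the curves reported in this section.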
The reliable region is defined as the range of the designed clock frequency $1/\mu_c$ over which the system shows $BER \leq 1\%$. The reliable region width, if the region exists, is the difference between the maximum and minimum frequencies inside the reliable region. The 1% critical BER is selected only as an example to illustrate the STA methodology in the following sections. Different applications may have different critical BER values, but the STA methodology remains the same.

### A. Operating Clock Frequency $(1/\mu_c)$

Figure 6 shows the STA analysis of the reliability and throughput of a pipelined interconnect that has 8 pipelining stages, each with a wire length of 1.4mm ($\mu_w=110ps$).

Fig. 6. BER and Throughput vs. Mean Clock Frequency: (a) BER, (b) Throughput

As calculated from Figure 6(a), the reliable region of the DFF-pipelined interconnect is bounded on one side: $1/\mu_c \leq 2.34GHz$. For the latch-pipelined interconnect, the reliable region is bounded on both sides: $3.34GHz \leq 1/\mu_c \leq 3.44GHz$. The reliable region width for latch pipelining is 0.1GHz. As shown in Figure 6(b), DFF and latch pipelining reach maximum throughputs of 2.35Gbps and 3.38Gbps respectively. Latch pipelining thus shows a 44% speed-up over DFF pipelining.

In the case of DFF pipelining, static worst-case timing analysis gives a maximum frequency of 2.2GHz for $BER \leq 1\%$. This is too pessimistic: 7% of the throughput is lost compared with the STA result. The worst-case analysis performs even worse in the case of latch pipelining, since it predicts that there is no reliable region. Furthermore, if the intuitive method is used, the operating frequency is chosen as the center of the feasible region in the average case. This intuitive operating frequency, 3.46GHz, is actually outside the reliable region of $3.34GHz \le 1/\mu_c \le 3.44GHz$ predicted by STA.
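Reading a reliable region off a BER curve amounts to a threshold scan over frequency. The Python sketch below shows one way to do this for any BER function; it assumes the reliable frequencies form a single contiguous interval, as in the cases discussed above. The function `reliable_region` is hypothetical, and `toy_ber` is a made-up curve used purely to exercise the scan, not data from this report.

```python
def reliable_region(ber_fn, f_lo, f_hi, steps=1000, threshold=0.01):
    """Scan [f_lo, f_hi] on a uniform grid and return (f_min, f_max) of the
    grid frequencies whose BER is at most `threshold`, or None if there are none.

    Assumption: the reliable frequencies form one contiguous interval.
    """
    reliable = []
    for k in range(steps + 1):
        f = f_lo + (f_hi - f_lo) * k / steps
        if ber_fn(f) <= threshold:
            reliable.append(f)
    if not reliable:
        return None
    return min(reliable), max(reliable)


def toy_ber(f_ghz):
    """Hypothetical two-sided BER curve with its minimum at 3.0 GHz."""
    return min(1.0, ((f_ghz - 3.0) / 0.5) ** 2)


if __name__ == "__main__":
    region = reliable_region(toy_ber, 2.0, 4.0, steps=2000)
    if region is not None:
        f_min, f_max = region
        print(f_min, f_max, f_max - f_min)  # bounds and reliable region width
```

A one-sided region such as the DFF case simply shows up as `f_min` equal to the lower end of the scanned range. The grid resolution limits how precisely the bounds are located.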
It is thus clear that STA is necessary to decide the operating frequency of high-performance interconnect pipelining.

### B. Interconnect Throughput Optimization

Static worst-case timing analysis cannot capture the change in BER across different designs, so it predicts that the maximum throughput defined in (23), for both DFF and latch pipelining, is independent of the number of pipelining stages N as long as the stage wire delay $\mu_w$ stays the same.

Fig. 7. Interconnect Throughput: (a) Pipelining Stage Number, (b) Stage Wire Delay

However, as Figure 7(a) clearly shows, the maximum throughput of the interconnect for both pipelining schemes decreases slightly as the number of stages increases.

For a fixed overall wire length, the stage wire delay decreases if the number of pipelining stages increases. Static timing analysis concludes that the maximum throughput of latch pipelining increases monotonically with the number of pipelining stages, since the stage wire delay decreases. The STA analysis in Figure 7(b), however, shows that the maximum throughput actually drops when the stage wire delay is smaller than a critical value. So there is an optimum value for the number of latch pipelining stages.

### C. Reliability of Latch Pipelining

Reliability may not be a big issue for DFF pipelining because its reliable region is bounded on one side only. It is, however, an important design consideration for latch pipelining. Both the number of pipelining stages and the stage wire delay greatly affect the reliability of latch pipelining. The width of the reliable region is shown in Figure 8 for different numbers of pipelining stages and different wire delays.

Fig. 8. Reliability of Latch Pipelining: (a) Pipelining Stage Number, (b) Stage Wire Delay

It is clear that a longer $\mu_w$ and a smaller N give a wider reliable region and improve the reliability of latch pipelining.
So for a fixed overall length of interconnect, fewer pipelining stages and a longer wire per stage give better reliability. This contradicts what is preferred for throughput maximization, where a very small stage wire delay is desired. With the STA method, a detailed trade-off between throughput and reliability can therefore be made, which is impossible with static timing analysis.

## VI. RESEARCH PUBLICATIONS

The research supported by this project has resulted in the following publications:

1. Jeng-Liang Tsai, Tsung-Hao Chen, and Charlie Chung-Ping Chen, "Zero-Skew Clock-Tree Optimization with Buffer-Insertion/Sizing and Wire-Sizing," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2004.
2. Lizheng Zhang, Yu Hen Hu, and Charlie Chung-Ping Chen, "Statistical Timing Analysis in Sequential Circuit for On-Chip Global Interconnect Pipelining," IEEE/ACM Design Automation Conference (DAC), 2004.
3. Rong Jiang and Charlie Chung-Ping Chen, "ESPRIT: A Compact Reluctance Based Interconnect Model Considering Lossy Substrate Eddy Current," International Microwave Symposium (IMS), 2004.
4. Ting-Yuan Wang, Jeng-Liang Tsai, and Charlie Chung-Ping Chen, "Power-Delivery Networks Optimization with Thermal Reliability Integrity," ACM International Symposium on Physical Design (ISPD), April 2004.
5. Ting-Yuan Wang and Charlie Chung-Ping Chen, "SPICE-Compatible Thermal Simulation with Lumped Circuit Modeling for Thermal Reliability Analysis based on Model Reduction," 5th International Symposium on Quality Electronic Design (ISQED), March 2004.
6. Ting-Yuan Wang, Jeng-Liang Tsai, and Charlie Chung-Ping Chen, "Thermal and Power Integrity based Power/Ground Networks Optimization," Design, Automation and Test in Europe Conference and Exhibition (DATE), February 2004.
7. Rong Jiang and Charlie Chung-Ping Chen, "SCORE: SPICE Compatible Reluctance Extraction," Design, Automation and Test in Europe Conference and Exhibition (DATE), February 2004.
8. Rong Jiang and Charlie Chung-Ping Chen, "Realizable Reduction for Electromagnetically Coupled RLMC Interconnects," Design, Automation and Test in Europe Conference and Exhibition (DATE), February 2004.
9. Lizheng Zhang, Yu Hen Hu, and Charlie Chung-Ping Chen, "Wave-pipelined On-Chip Global Interconnect," ACM/IEEE TAU Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, 2004.

## VII. CONCLUSION

In this paper, we present a novel, comprehensive design methodology for analytical and empirical statistical timing analysis of two popular pipelined global interconnect architectures: the DFF-pipelined global interconnect and the latch-pipelined global interconnect. This methodology allows a designer to perform efficient yet accurate trade-off analysis to search for an optimal design in terms of the number of pipelined stages, the maximum operating clock frequency, and the bit error rate. We have found that the DFF-pipelined design is relatively easy to analyze, yet its constraints limit the theoretically attainable performance when compared to a latch-pipelined global interconnect architecture. The two-sided timing constraints of a latch-pipelined architecture highlight the extra care that must be taken to design a latch-pipelined interconnect properly.

## REFERENCES

- [1] H. Shah, P. Shin, B. Bell, M. Aldredge, N. Sopory, and J. Davis, "Repeater insertion and wire sizing optimization for throughput-centric VLSI global interconnects," *IEEE/ACM International Conference on Computer Aided Design (ICCAD)*, pp. 280–284, 2002.
- [2] K. Banerjee and A. Mehrotra, "A power-optimal repeater insertion methodology for global interconnects in nanometer designs," *IEEE Transactions on Electron Devices*, vol. 49, no. 11, pp. 2001–2007, Nov. 2002.
- [3] S. Srinivasaraghavan and W. Burleson, "Interconnect effort - a unification of repeater insertion and logical effort," *Proceedings of IEEE Computer Society Annual Symposium on VLSI*, pp. 55–61, Feb. 2003.
- [4] C. Chu and D. F. Wong, "Closed form solutions to simultaneous buffer insertion/sizing and wire sizing," *ACM Transactions on Design Automation of Electronic Systems (TODAES)*, vol. 6, no. 3, pp. 343–371, July 2001.
- [5] L. Scheffer, "Methodologies and tools for pipelined on-chip interconnect," *IEEE International Conference on Computer Design (ICCD)*, 2002.
- [6] P. Cocchini, "Concurrent flip-flop and repeater insertion for high performance integrated circuits," *IEEE/ACM International Conference on Computer Aided Design (ICCAD)*, pp. 268–273, 2002.
- [7] R. Lu, G. Zhong, C. K. Koh, and K. Y. Chao, "Flip-flop and repeater insertion for early interconnect planning," *Proceedings of Design, Automation and Test in Europe Conference and Exhibition*, pp. 690–695, March 2002.
- [8] J. J. Liou, K. T. Cheng, S. Kundu, and A. Krstic, "Fast statistical timing analysis by probabilistic event propagation," *Proceedings of the Design Automation Conference*, pp. 661–666, June 2001.
- [9] J. J. Liou, C. L. Wang, A. Krstic, and K. T. Cheng, "Experience in critical path selection for deep sub-micron delay test and timing validation," *Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC)*, pp. 751–756, Jan. 2003.
- [10] A. Agarwal, V. Zolotov, and D. Blaauw, "Statistical timing analysis using bounds and selective enumeration," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 22, no. 9, pp. 1243–1260, Sept. 2003.
- [11] B. Choi and D. Walker, "Timing analysis of combinational circuits including capacitive coupling and statistical process variation," *Proceedings of the 18th IEEE VLSI Test Symposium*, pp. 49–54, May 2000.
- [12] H. Jyu, S. Malik, S. Devadas, and K. Keutzer, "Statistical timing analysis of combinational logic circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 1, no. 2, pp. 126–137, June 1993.
- [13] R.-B. Lin and M.-C. Wu, "A new statistical approach to timing analysis of VLSI circuits," *Proceedings of the Eleventh International Conference on VLSI Design*, pp. 507–513, Jan. 1998.
- [14] R. Brashear, N. Menezes, C. Oh, L. Pillage, and M. Mercer, "Predicting circuit performance using circuit-level statistical timing analysis," *Proceedings of the European Design and Test Conference (EDAC-ETC-EUROASIC)*, pp. 332–337, Mar. 1994.
- [15] I. Lin, J. A. Ludwig, and K. Eng, "Analyzing cycle stealing on synchronous circuits with level sensitive latches," *29th ACM/IEEE Design Automation Conference*, 1992.
- [16] B. Taskin and I. S. Kourtev, "Performance optimization of single-phase level sensitive circuits using time borrowing and non-zero clock skew," *TAU 2002*, Dec. 2002.
- [17] N. Kurd, J. Barkatullah, R. Dizon, T. Fletcher, and P. Madland, "A multigigahertz clocking scheme for the Pentium(R) 4 microprocessor," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 11, pp. 1647–1653, Nov. 2001.