# Jump Scan: A DFT Technique for Low Power Testing

Min-Hao Chiu and James C.-M. Li

Laboratory of Dependable Systems, GIEE Electrical Engineering Department, National Taiwan University cmli@cc.ee.ntu.edu.tw

## Abstract

This paper presents a jump scan technique (J-scan) for low power testing. J-scan shifts two bits of scan data per clock cycle, so the scan clock frequency is halved without increasing the test time. The experimental data show that the proposed technique reduces the test power by two thirds compared with traditional MUX scan. The technique requires very few changes to the existing MUX-scan design for testability methodology and needs no extra computation. The penalties are area overhead and speed degradation.

# 1. Introduction

Circuit power dissipation in test mode is much higher than the power dissipation in function mode [Zorian 93]. One possible reason is that automatic test pattern generators (ATPG) try to activate as many faults as possible to minimize the test application time [Wang 97]. Low power design for testability (DFT) techniques are gaining more and more importance recently [Girard 02]. The first advantage of low power DFT techniques is to avoid the risk of damaging the Circuits Under Test (CUT). High temperature and high current in test mode not only cause catastrophic damage at the time of testing but also accelerate reliability failures (such as electromigration). Low power DFT techniques save the cost of expensive packages or external cooling devices for heat dissipation. In addition, low power DFT techniques enable parallel testing of multiple cores in the system on a chip (SOC). Power consumption in test mode is one of the major constraints when scheduling tests for multiple cores [Chou 94]. By applying the low power DFT techniques, many cores can be tested at the same time and hence the overall SOC test time is reduced. Last, low power DFT techniques prevent on-chip power integrity problems in test mode. High current in test mode results in excessive Vdd drop or ground bounce, which may cause the CUT to malfunction. Low power DFT techniques ensure correct operations of the CUT in test mode.

This paper presents the Jump-scan (or J-scan) DFT technique for low power testing. As opposed to traditional MUX-scan chains, which shift one bit per clock cycle, the proposed J-scan chains shift two bits per clock cycle. J-scan therefore halves the clock frequency without increasing the test time. This is achieved by modifying the scan cells and adding one extra routing path for scan signals. $J_{QN}$-scan is an enhanced version of J-scan that adds the QN-scan toggle suppression technique. The simulation results show that $J_{QN}$-scan saves 67% of the test power on average compared with traditional MUX scan. The proposed technique has two important applications. The first is parallel testing of multiple cores on an SOC, because the average $J_{QN}$-scan test power is even lower than the power in function mode. Alternatively, J-scan can be applied to double the test data rate and halve the test time.

In addition to test power reduction, J-scan has the following advantages. First, J-scan is compatible with the existing MUX-scan DFT methodology. Neither extra computation nor special ATPG is needed to implement J-scan. Second, the J-scan technique needs no modification to the clock trees, which avoids the risk of clock skew. Third, the $J_{QN}$-scan technique is applicable to delay fault testing as well as stuck-at fault testing. The costs of J-scan are area overhead and speed degradation.

The organization of this paper is as follows. Section two introduces the background knowledge of low power testing and reviews past publications in this area. The third section explains our J-scan DFT technique in detail. Section four shows the experimental data collected on ISCAS'89 benchmark circuits. The fifth section discusses some issues related to the presented idea. And finally the last section concludes the paper.

# 2. Background

## 2.1 Power Dissipation

The dynamic power dissipation of CMOS circuits can be classified into two major components: the short circuit power and the switching power. The former is caused by the temporary short circuits at the moment of signal transition, when both PMOS and NMOS transistors are turned on for a short period of time. The short circuit power can be calculated by eq.1.  $E_i$  is the energy consumed per transition of gate i.  $TR_i$  is the toggle rate of output of gate i.  $E_i$  is usually provided by the ASIC vendor and  $TR_i$  is usually obtained by simulation. The short circuit power is consumed by the combinational logic and the sequential circuits, such as scan flip-flops.

$$P_{SC} = \sum_{\text{for all gate } i} (E_i \cdot TR_i)$$
(eq. 1)

The switching power is consumed by charging and discharging of the capacitors. It can be calculated by eq. 2.  $C_{\text{load i}}$  is the total load capacitance connected to net i.  $C_{\text{load}}$  can be extracted from the physical layout or estimated by the synthesis tool. TR<sub>i</sub> is the toggle rate on net i, which can be a gate internal node or a piece of interconnection wire. Again, the TR can be obtained by simulation. The switching power is consumed by signals as well as clocks. The clock power dissipation is a significant component of the switching power because the clock network is heavily loaded.

$$P_{SW} = \frac{V_{DD}^{2}}{2} \cdot \sum_{\text{for all net } i} (C_{\text{load } i} \cdot TR_{i})$$
(eq. 2)
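As a minimal sketch of how eq. 1 and eq. 2 are evaluated, the following Python snippet sums the per-gate and per-net contributions. The supply voltage, energies, capacitances and toggle rates are hypothetical values chosen only for illustration; they are not taken from the TSMC library or from the experiments in this paper.

```python
# Illustrative evaluation of eq. 1 and eq. 2.
# All numeric values below are assumptions, not data from the paper.

VDD = 2.5  # volts; assumed nominal supply


def short_circuit_power(gates):
    """eq. 1: sum of E_i * TR_i over all gates.

    gates: iterable of (energy per transition in joules, toggle rate in 1/s).
    """
    return sum(energy * toggle_rate for energy, toggle_rate in gates)


def switching_power(nets, vdd=VDD):
    """eq. 2: (VDD^2 / 2) * sum of C_load_i * TR_i over all nets.

    nets: iterable of (load capacitance in farads, toggle rate in 1/s).
    """
    return 0.5 * vdd ** 2 * sum(c_load * toggle_rate for c_load, toggle_rate in nets)


# Hypothetical two-gate, two-net circuit.
gates = [(2e-14, 5e6), (1e-14, 1e7)]
nets = [(1e-14, 5e6), (2e-14, 1e7)]

p_sc = short_circuit_power(gates)
p_sw = switching_power(nets)
print(f"P_SC = {p_sc:.4e} W, P_SW = {p_sw:.4e} W")
```

Both expressions are linear in the toggle rates, so halving every toggle rate (for example by halving the clock frequency) halves both power components in this model.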

## 2.2 Past Research

Past research in low power testing can be summarized as follows. Reordering the sequence of the scan cells to reduce the test power is proposed in [Dabholkar 98]. The problem with scan chain reordering is that the optimal order for one test set (*e.g.* stuck-at fault test set) may not be optimal for another test set (*e.g.* delay fault test set). Disabling or gating the clock of certain scan chains also helps to reduce the power [Sankaralingam 01] [Bonhomme 01] [Whetsel 00]. Disabling the clocks not only increases the risk of skew problems but also imposes some constraints on test pattern generation. Inserting gates (such as inverters, XOR, XNOR) into scan chains can minimize the toggling when scan chains shift [Sinanoglu 02]. This technique requires not only extra gates but also computation of the optimal positions at which to insert these gates. The toggle suppression technique separates the data outputs and scan outputs of scan cells [Hertwig 98] [Gerstendorfer 99]. By suppressing the data outputs, the power consumption in combinational circuits is reduced. However, skew-load delay fault testing cannot be applied because the data outputs are suppressed in scan mode. An improved toggle suppression technique, the quiet-noisy scan (or QN scan), is proposed for low power delay fault testing [Li 04]. The QN scan operation, which is composed of quiet scans followed by a noisy scan in the last cycle, makes skew-load delay fault testing possible.

# 3. J-scan Technique

## 3.1 J-scan DFF and J-scan chain

Figure 1 shows the structure of a J-scan DFF. It contains a negative latch (NL), a positive latch (PL) and two multiplexers. This is a master-slave implementation of a rising edge triggered flip-flop. This scan cell is called J-scan DFF for the 'jumping behavior' when test patterns are shifting in the scan chain. Compared with a Mux-scan (M-scan) DFF, the J-scan DFF has an additional Jump Input (JI) pin and an additional Jump Output (JO) pin. The negative latch (NL) is transparent when the clock is in negative phase; the positive latch (PL) is transparent when the clock is in positive phase. The two multiplexers are controlled by the scan enable (SE) signal. When the SE is de-asserted (function mode), Mux1 and Mux2 select the data input (DI) and the output of NL, respectively. In function mode, a J-scan DFF is the same as a rising edge triggered DFF. When the SE is asserted (scan mode), Mux1 and Mux2 select the SI and the JI, respectively. In the negative phase of the clock, the NL is transparent (from SI to JO) and the PL latches the data from JI. In the positive phase of the clock, the PL is transparent (from JI to SO) and the NL latches the data from SI. By doing so, the SI is shifted to the JO output and the JI is shifted to the SO output. In this figure, the Data Output (DO) and the Scan Output (SO) are shared. These two pins can be separated as long as the QN output of the PL is available.



Figure 1. J-scan DFF





Figure 2 illustrates a J-scan chain with four J-scan DFFs. The multiplexers, SE signals and DI signals are omitted for clarity. The J-scan DFFs are numbered in increasing order, from the scan input to the scan output of the chain. In a J-scan chain, two adjacent J-scan DFFs are connected by two wires. One routing path is called the scan path (from SO to SI); the other is called the jump path (from JO to JI). For example, the SO output of the first J-scan DFF (JSD<sub>1</sub>) is connected to the SI input of JSD<sub>2</sub> as a scan path, and the JO output of JSD<sub>1</sub> is connected to the JI input of JSD<sub>2</sub> as a jump path. Note that the scan input of the scan chain (Scan In) is connected to both the JI and the SI of JSD<sub>1</sub>. Figure 3 illustrates the Scan In and the clock waveforms of M-scan and J-scan. The waveforms are divided into four time periods, marked I to IV. The input data, A to D, are applied at the beginning of each time period. The clock frequency of the M-scan is twice that of the J-scan. For the M-scan, every time period has a negative phase and a positive phase. For the J-scan, every time period has only one negative phase or one positive phase.



Figure 3. Waveforms of M-scan and J-scan

Table 1 shows the scan input data and the contents of all J-scan DFFs in every time period. Only two clock cycles of J-scan (instead of four clock cycles of M-scan) are needed to shift in four bits of test data. The contents of latches that differ from those in the previous time period are underlined. Test data A and C shift via the thin lines in Figure 2; test data B and D shift via the bold lines. In time period IV, test data A to D (highlighted in gray) are settled in the positive latches of JSD<sub>4</sub> down to JSD<sub>1</sub>, respectively, so the first bit shifted in sits closest to Scan Out. The scan out waveforms are the same as the scan in waveforms shown in Figure 3. Note that a multiplexer (Mux3) has to be inserted between the last J-scan DFF and the Scan Out of the scan chain (see Figure 2). In the negative phase of the clock, Mux3 selects SO of JSD<sub>4</sub>. In the positive phase of the clock, Mux3 selects JO of JSD<sub>4</sub>.

Table 1. Contents of J-scan chain

| Time | ScanIn | NL1      | PL1      | NL2      | PL2      | NL3      | PL3      | NL4 | PL4      |
|------|--------|----------|----------|----------|----------|----------|----------|-----|----------|
| I    | A      | A        |          |          |          |          |          |     |          |
| II   | B      | A        | <u>B</u> |          | <u>A</u> |          |          |     |          |
| III  | C      | <u>C</u> | B        | <u>B</u> | A        | <u>A</u> |          |     |          |
| IV   | D      | C        | <u>D</u> | B        | <u>C</u> | A        | <u>B</u> |     | <u>A</u> |
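The shifting behavior can be checked with a small behavioral model. The sketch below is an illustration, not the authors' simulator: it assumes ideal latches, that time period I is a negative clock phase (as Table 1 implies, since NL1 captures A first), and it represents unknown latch contents as `None`. The function name `simulate_jscan` is hypothetical.

```python
def simulate_jscan(bits, n_cells):
    """Shift `bits` into a J-scan chain of `n_cells`, one bit per clock phase.

    nl[k] / pl[k] hold the contents of the negative / positive latch of
    JSD(k+1). Returns a list of (nl, pl) snapshots, one per time period.
    """
    nl = [None] * n_cells
    pl = [None] * n_cells
    history = []
    for period, bit in enumerate(bits):
        if period % 2 == 0:
            # Negative phase: every NL is transparent and follows its SI,
            # i.e. Scan In for the first cell, otherwise the held PL (SO)
            # of the previous cell. PLs keep their contents.
            nl = [bit] + pl[:-1]
        else:
            # Positive phase: every PL is transparent and follows its JI,
            # i.e. Scan In for the first cell, otherwise the held NL (JO)
            # of the previous cell. NLs keep their contents.
            pl = [bit] + nl[:-1]
        history.append((list(nl), list(pl)))
    return history


history = simulate_jscan("ABCD", 4)
for label, (nl, pl) in zip(["I", "II", "III", "IV"], history):
    cells = " ".join(f"{n or '-'}{p or '-'}" for n, p in zip(nl, pl))
    print(f"{label:>3}: {cells}")  # each pair is (NL, PL) of one J-scan DFF
```

Under these assumptions the four snapshots match the four rows of Table 1: after period IV the positive latches hold D, C, B, A from JSD<sub>1</sub> to JSD<sub>4</sub>, and only two full clock cycles were needed.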

The penalties of the J-scan technique are area overhead and speed degradation. These two penalties are analyzed as follows. Compared with the M-scan chain, the J-scan chain requires larger scan cells and one extra routing path, the jump path. According to the TSMC 0.25 µm standard cell library, a J-scan DFF occupies 161.3 µm<sup>2</sup>, which is 40% larger than the M-scan DFF. The cell area of the J-scan DFF is obtained by adding up the areas of its components as individual standard cells. Besides the scan cell area overhead, the J-scan chain needs one more routing path than the M-scan chain to shift the scan data. Although this additional route can be long, the signal is not timing critical. By allowing a long delay for this additional signal, the area overhead can be minimized. As far as the speed degradation is concerned, the J-scan DFF has an additional Mux (*i.e.* Mux2 in Figure 1) inserted between the NL and PL latches. According to the TSMC 0.25 µm library, the delay of a Mux is about 270 ps. In scan mode, this extra delay is not significant since the CUTs are usually operated at a low frequency when scan chains are shifting. In function mode, Mux2 introduces a delay from the NL to the PL. This extra delay makes the propagation delay of the J-scan DFF larger than that of the M-scan DFF. The area and the delay overhead can be reduced if the J-scan DFF is laid out as a single customized cell.

Also note that a J-scan chain requires an even number of scan cells because two bits are shifted per clock cycle. If the number of scan cells in the original chain is odd, one dummy scan cell has to be inserted at the Scan\_In of the scan chain.





## 3.2 J<sub>QN</sub>-scan DFF

The J-scan low power technique can be applied together with the Quiet-Noisy scan technique to further reduce the test power. Figure 4 shows the structure of a $J_{QN}$-scan DFF. Compared to the J-scan DFF, the $J_{QN}$-scan DFF has an extra reset pin. In addition, the scan output (SO) pin and the data output (DO) pin are now separated. The $J_{QN}$-scan DFF is bigger than the J-scan DFF because the former has two additional NOR gates and one extra inverter. The cell area of the $J_{QN}$-scan DFF in the TSMC 0.25 µm library is 201.6 µm<sup>2</sup> (75% larger than an M-scan DFF without reset).



When both the reset and SE are high, the J<sub>QN</sub>-scan DFF operates in the same way as the J-scan DFF except that the output of the DO pin is tied to logic zero. This is called the quiet scan mode because the toggle activity in the combinational logic is suppressed. When the reset is zero and the SE is one, the J<sub>QN</sub>-scan DFF operates in the same way as the M-scan DFF in scan mode. This is called the noisy scan mode because the output of DO is not suppressed to zero. As opposed to the quiet scan mode, which shifts two bits per clock, the noisy scan mode shifts only one bit per clock. Figure 5 shows the waveforms of delay fault testing using the J<sub>QN</sub>-scan technique. After resetting the circuit, the SE and reset signals are both asserted and the test pattern is quietly scanned in. During the quiet scan, the clock frequency is halved and the DO outputs are always zero. After N/2 cycles of quiet scan (N equals the total number of DFFs in the chain), the reset signal is de-asserted and the pattern P<sub>1</sub> appears at DO. In the (N/2+1)th cycle, the circuit is in noisy scan mode and pattern P<sub>2</sub> appears at DO. Then the SE is de-asserted so the scan cells are in function mode. The responses of the circuit are captured in the flip-flops. Finally, the responses are quietly scanned out. By applying the quiet-noisy scan, skew-load two-pattern tests are possible. The J<sub>QN</sub>-scan reduces the test power and, at the same time, preserves the delay fault coverage. Please see [Li 04] for more details about the quiet-noisy scan technique.
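The control sequence described above can be written down as a simple schedule. The sketch below is only an illustration of that description: the names `Step` and `qn_delay_test_schedule` are hypothetical, the initial circuit reset is omitted, and two details are assumptions not stated in the text, namely that the reset stays low during capture and that the quiet scan-out also takes N/2 cycles.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    se: int            # scan enable
    reset: int         # 1 holds DO at logic zero (quiet)
    cycles: int        # number of clock cycles spent in this step
    bits_per_cycle: int  # scan bits shifted per clock cycle (0 = no shifting)


def qn_delay_test_schedule(n_cells: int) -> List[Step]:
    """Control sequence for one skew-load two-pattern test on N scan cells."""
    if n_cells % 2:
        raise ValueError("a J-scan chain needs an even number of scan cells")
    half = n_cells // 2
    return [
        Step("quiet scan-in of P1 (DO held at 0)", 1, 1, half, 2),
        Step("noisy scan, P2 appears at DO", 1, 0, 1, 1),
        Step("capture in function mode", 0, 0, 1, 0),  # reset=0 is an assumption
        Step("quiet scan-out of responses", 1, 1, half, 2),  # length assumed
    ]


schedule = qn_delay_test_schedule(4)
total_cycles = sum(step.cycles for step in schedule)
for step in schedule:
    print(f"SE={step.se} reset={step.reset} cycles={step.cycles}  {step.name}")
```

For a four-cell chain this gives two quiet cycles, one noisy cycle, one capture cycle and two quiet scan-out cycles.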

# 4. Experimental Results

## 4.1 Power Dissipation

Table 2 lists the power dissipation of four different versions of the ISCAS'89 benchmark circuits. The non-scan version is obtained by mapping the benchmark circuits to the TSMC 0.25 µm standard cell library. The M-scan version is generated by changing non-scan flip-flops to traditional M-scan DFFs, which are chained into one single scan chain. The M-scan DFFs are then replaced by either J-scan DFFs or J<sub>QN</sub>-scan DFFs. The frequency of the scan clock is 10 MHz in the simulation. The system clock runs at the same rate as the scan clock. The simulations are performed in a bit-by-bit shifting way so that the circuit activity is accurately modeled. The absolute power dissipation is shown in microwatts. On average, the power reductions of J-scan and J<sub>QN</sub>-scan are 39% and 67% with respect to the M-scan. The power of the non-scan versions is also shown for reference. Because the ISCAS benchmark circuits have no functional test patterns, the power of the non-scan versions is regarded as the power in function mode. On average, and for each of the four larger circuits, the test power of J<sub>QN</sub>-scan is even lower than the power in function mode. The gate count (G) and the flip-flop count (FF) are shown for reference.

| CUT     | G      | FF    | M-scan   | J-scan   | J<sub>QN</sub>-scan | Non-scan |
|---------|--------|-------|----------|----------|---------------------|----------|
| s526    | 193    | 21    | 229.0    | 149.5    | 114.9               | 89.5     |
| s1494   | 647    | 6     | 582.7    | 355.0    | 275.0                 | 220.0    |
| s5378   | 2,779  | 179   | 1,731.5  | 1,080.1  | 700.0                 | 979.9    |
| s9234   | 5,597  | 211   | 2,822.6  | 1,766.2  | 833.4                 | 1,303.3  |
| s15850  | 9,772  | 534   | 5,447.5  | 3,522.2  | 2,046.7               | 2,950.6  |
| s38417  | 22,179 | 1,636 | 19,110.8 | 11,378.2 | 5,909.8               | 7,142.7  |
| Average |        |       | 4,987.4  | 3,041.9  | 1,646.6               | 2,114.3  |

Table 2. Power Dissipation (µW)
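The averages quoted for Table 2 can be reproduced with a few lines of arithmetic. The sketch below copies the power values from Table 2 and assumes that the quoted reductions are computed from the column averages (the ratio of average powers) rather than from the mean of the per-circuit reductions; that reading is an inference from the numbers, not something the text states.

```python
# Power in microwatts, copied from Table 2:
# (M-scan, J-scan, JQN-scan, non-scan)
TABLE2 = {
    "s526": (229.0, 149.5, 114.9, 89.5),
    "s1494": (582.7, 355.0, 275.0, 220.0),
    "s5378": (1731.5, 1080.1, 700.0, 979.9),
    "s9234": (2822.6, 1766.2, 833.4, 1303.3),
    "s15850": (5447.5, 3522.2, 2046.7, 2950.6),
    "s38417": (19110.8, 11378.2, 5909.8, 7142.7),
}


def reduction_pct(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (1.0 - value / baseline)


columns = list(zip(*TABLE2.values()))
avg_m, avg_j, avg_qn, avg_non = (sum(col) / len(col) for col in columns)

j_reduction = reduction_pct(avg_m, avg_j)
qn_reduction = reduction_pct(avg_m, avg_qn)

# Circuits whose JQN-scan test power is below the non-scan (functional) power.
below_functional = [
    name for name, (_, _, qn, non) in TABLE2.items() if qn < non
]

print(f"J-scan reduction:   {j_reduction:.1f}%")
print(f"JQN-scan reduction: {qn_reduction:.1f}%")
print("JQN-scan below function-mode power:", below_functional)
```

With this reading, the column averages give roughly 39% and 67%, and the J<sub>QN</sub>-scan power falls below the non-scan power for the four larger circuits but not for s526 and s1494.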

To further analyze the power consumption, Figure 6 shows the breakdown of the power dissipation for s9234. The power dissipation is comprised of three major components. The components marked as "SC COMB" are the short circuit power dissipated within the combinational logic cells. The components marked as "SC SEQ" are the short circuit power dissipated within the sequential cells (*i.e.*, flip-flops or latches). The "SW" components represent the switching power dissipated when charging and discharging the capacitors connected to interconnect wires. The J-scan effectively reduces all three types of power because of the halved clock rate. The J<sub>QN</sub> scan further reduces the SC power in the combinational logic and the switching power.



## 4.2 Area Overhead

Table 3 shows the area overhead of the benchmark circuits. The first two columns show the area overhead of the J-scan and $J_{QN}$-scan versions with respect to their M-scan versions. Over all benchmark circuits, the average area overheads of the J-scan and $J_{QN}$-scan are 12.8% and 23.9%, respectively. Because the J-scan and $J_{QN}$-scan DFFs are not available in the library, their areas are estimated by multiplying the area of M-scan DFFs by 1.4 and 1.75, respectively. The third column shows the area overhead of $J_{QN}$-scan with respect to the resettable M-scan versions. This is a fair comparison because the $J_{QN}$-scan versions support a reset mode. The area overhead of the $J_{QN}$-scan versions is only 9.5% compared to the resettable M-scan versions.

| CUT     | J-scan | J<sub>QN</sub>-scan | J<sub>QN</sub>-scan<br>(w.r.t. resettable M-scan) |
|---------|--------|---------------------|---------------------------------------------------|
| s526    | 14.4%  | 27.0%                 | 10.7%                                              |
| s1494   | 1.7%   | 3.3%                  | 1.3%                                               |
| s5378   | 13.2%  | 24.7%                 | 9.7%                                               |
| s9234   | 8.8%   | 16.4%                 | 6.5%                                               |
| s15850  | 11.7%  | 21.9%                 | 8.7%                                               |
| s38417  | 14.3%  | 26.8%                 | 10.6%                                              |
| Average | 12.8%  | 23.9%                 | 9.5%                                               |

Table 3. Area overhead of J-scan and $J_{QN}$-scan
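The first two columns of Table 3 can be related through the cell area factors 1.4 and 1.75. The sketch below assumes a simple model in which only the flip-flop cell area changes, so the circuit-level overhead equals the flip-flop share of the M-scan area times (factor - 1). The function names and the model itself are illustrative assumptions; the percentages are copied from Table 3.

```python
J_FACTOR = 1.40    # J-scan DFF area relative to the M-scan DFF
JQN_FACTOR = 1.75  # JQN-scan DFF area relative to the M-scan DFF

# (J-scan overhead %, JQN-scan overhead %) with respect to M-scan, from Table 3.
TABLE3 = {
    "s526": (14.4, 27.0),
    "s1494": (1.7, 3.3),
    "s5378": (13.2, 24.7),
    "s9234": (8.8, 16.4),
    "s15850": (11.7, 21.9),
    "s38417": (14.3, 26.8),
}


def implied_ff_share(overhead_pct, cell_factor):
    """Fraction of the M-scan circuit area occupied by scan flip-flops."""
    return overhead_pct / (100.0 * (cell_factor - 1.0))


def overhead_pct(ff_share, cell_factor):
    """Circuit-level area overhead (%) when only the flip-flops grow."""
    return 100.0 * ff_share * (cell_factor - 1.0)


# Predict the JQN-scan column from the J-scan column.
predicted = {
    name: overhead_pct(implied_ff_share(j_pct, J_FACTOR), JQN_FACTOR)
    for name, (j_pct, _) in TABLE3.items()
}

for name, (j_pct, jqn_pct) in TABLE3.items():
    print(f"{name:>7}: reported {jqn_pct:5.1f}%  predicted {predicted[name]:5.1f}%")
```

Under this model the predicted J<sub>QN</sub>-scan overheads agree with the reported ones to within rounding. It also shows why s1494, with only 6 flip-flops among 647 gates, has a much smaller overhead than the other circuits.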

## 4.3 Comparison against Other Techniques

Table 4 shows the comparison of J-scan and J<sub>QN</sub>-scan against four other low power DFT techniques. The power reduction percentages are obtained by averaging over all the cases reported in the original papers. J<sub>QN</sub>-scan has the highest power reduction percentage of all the techniques. The second column shows whether the DFT technique supports delay fault testing. The first four techniques do not consider delay fault testing in their original papers. These techniques may be able to support delay fault testing but would probably need extra work and modifications. The J-scan and J<sub>QN</sub>-scan support delay fault testing without problems. The third column shows the hardware overhead. Overall, the J-scan and J<sub>QN</sub>-scan are effective low power DFT techniques compared with the previous techniques.

| Technique                    | Power reduction | Delay test? | HW overhead       |
|------------------------------|-----------------|-------------|-------------------|
| Reorder [Dabholkar 98]       | 18%             | NA          | 0                 |
| Disable [Sankaralingam 01]   | 23%             | NA          | disable circuitry |
| Gated Clock [Bonhomme 01]    | 40%             | NA          | extra clock       |
| Insert Gate [Sinanoglu 02]   | 12%             | NA          | 3.3%              |
| J-scan                       | 39%             | Yes         | 12.8%             |
| J<sub>QN</sub>-scan          | 67%             | Yes         | 9.5-23.9%         |

Table 4. Comparison of low power techniques

# 5. Discussions

## 5.1 Double Edge Trigger Scan FF

Double edge trigger (DET) FFs are often used in low power circuit design [Afghahi 91]. DET FFs change their outputs at both positive and negative clock edges. As far as the test power is concerned, a DET design consumes more power than J-scan because the outputs of DET FFs toggle twice per clock cycle, whereas the outputs of J-scan FFs toggle only once per clock cycle. In terms of the speed degradation in function mode, the conventional DET FF has longer setup time and propagation time than a single edge trigger FF [Llopis 96]. Compared to M-scan FFs, the J-scan FFs degrade only the propagation time, not the setup time, in function mode. Furthermore, DET designs are vulnerable to clock skew problems in function mode because of the double active edges. Timing checks have to be performed carefully to avoid timing violations at both clock edges. The J-scan, on the contrary, does not require extra timing checks in function mode for the inactive clock edge. The cell area of a DET FF is approximately the same as that of a J-scan FF, but the latter requires an extra route for the jump scan path.

## 5.2 Double Data Rate Scan

If the test time, instead of the test power, is the bottleneck of the testing, the proposed techniques can also be used to reduce the test time. The double data rate scan (or DDR scan) is achieved by testing the circuits with $J_{QN}$-scan chains at the same clock frequency as the M-scan. Since two scan data bits are shifted in one clock period, the DDR scan doubles the scan data rate and hence saves 50% of the test time. There are, however, three issues that need to be addressed before applying the DDR scan. First, the scan data now have only half a clock cycle to propagate, so scan paths and jump paths have to be routed carefully. Second, the ATE has to support double data rate scan input and scan output test channels. The last concern is the power dissipation of the DDR scan. For the ISCAS benchmark circuits, the average DDR scan power consumption is about 65.7% of that of the M-scan (same clock frequency).
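The test time trade-off can be illustrated with a back-of-the-envelope calculation of the shift time for one pattern. The sketch below uses the 1,636-cell single chain of s38417 and the 10 MHz scan clock from Section 4.1; the helper name `shift_time_us` is hypothetical, and capture cycles and ATE overhead are ignored.

```python
import math


def shift_time_us(n_cells, clock_mhz, bits_per_cycle):
    """Time in microseconds to shift one pattern into a single scan chain."""
    cycles = math.ceil(n_cells / bits_per_cycle)  # odd chains need one padded cell
    return cycles / clock_mhz


N_CELLS = 1636  # flip-flop count of s38417 (Table 2), one single scan chain

m_scan = shift_time_us(N_CELLS, clock_mhz=10.0, bits_per_cycle=1)
j_scan = shift_time_us(N_CELLS, clock_mhz=5.0, bits_per_cycle=2)
ddr_scan = shift_time_us(N_CELLS, clock_mhz=10.0, bits_per_cycle=2)

print(f"M-scan @ 10 MHz: {m_scan:.1f} us")
print(f"J-scan @  5 MHz: {j_scan:.1f} us")
print(f"DDR    @ 10 MHz: {ddr_scan:.1f} us")
```

In this simplified model the J-scan at half the clock frequency takes the same shift time as the M-scan, while the DDR scan at the full clock frequency takes half of it.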

# 6. Summary

The presented J-scan and $J_{QN}$-scan techniques reduce the test power of a traditional M-scan by 39% and 67%, respectively. The advantages of the proposed low power testing technique include (1) minimal impact on the existing DFT/ATPG flow, (2) no increase in test time, and (3) no change in clock tree design. The proposed technique is also applicable to delay fault testing. The penalties of the J-scan include the area overhead of the scan DFF, one extra routing path and speed degradation.

# Acknowledgement

This research is supported by the National Science Council of Taiwan under contract number NSC93-2220-E-002-012.

# References

- [Afghahi 91] M. Afghahi and J. Yuan, "Double Edge Trigger D-Flip-Flop for High-Speed CMOS Circuits," *IEEE J. of Solid-State Circuits*, August 1991, pp. 1168-1170.
- [Bonhomme 01] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A Gated Clock Scheme for Low Power Scan Testing of Logic ICs or Embedded Cores," *Proc. 10<sup>th</sup> Asian Test Symp.*, pp. 253-258, 2001.
- [Chou 94] R. M. Chou, K. K. Saluja, and V. D. Agrawal, "Power Constraint Scheduling of Tests," *IEEE Int. Conf. on VLSI Design*, pp. 271-274, 1994.
- [Dabholkar 98] V. Dabholkar, S. Chakravarty, I. Pomeranz, and S. Reddy, "Techniques for Reducing Power Dissipation During Test Application in Full Scan Circuits," *IEEE Trans. Computer-Aided Design*, vol. 17, no. 12, Dec. 1998, pp. 1325-1333.
- [Gerstendorfer 99] S. Gerstendorfer and H. J. Wunderlich, "Minimized Power Consumption for Scan-Based BIST," *Proc. IEEE Int'l Test Conf.*, pp. 77-84, 1999.
- [Girard 02] P. Girard, "Survey of Low-Power Testing of VLSI Circuits," *IEEE Design and Test of Computers*, pp. 82-92, May-June 2002.
- [Hertwig 98] A. Hertwig and H. J. Wunderlich, "Low Power Serial Built-in Self Test," *Proc. 3<sup>rd</sup> European Test Workshop*, pp. 49-53, 1998.
- [Li 04] J. C.-M. Li, "A Design for Testability Technique for Low Power Delay Fault Testing," *IEICE Transactions on Electronics*, vol. E87-C, no. 4, April 2004, pp. 621-628.
- [Llopis 96] R. P. Llopis and M. Sachdev, "Low Power, Testable Dual Edge Triggered Flip-Flops," *Int'l Symp. on Low Power Electronics and Design*, pp. 341-345, 1996.
- [Sankaralingam 01] R. Sankaralingam, B. Pouya, and N. A. Touba, "Reducing Power Dissipation During Test Using Scan Chain Disable," *Proc. IEEE 19<sup>th</sup> VLSI Test Symp.*, pp. 319-324, 2001.
- [Sinanoglu 02] O. Sinanoglu, I. Bayraktaroglu, and A. Orailoglu, "Test Power Reduction through Minimization of Scan Chain Transitions," *Proc. IEEE 20<sup>th</sup> VLSI Test Symp.*, 2002.
- [Wang 97] S. Wang and S. K. Gupta, "DS-LFSR: A New BIST TPG for Low Heat Dissipation," *Proc. Int'l Test Conf.*, pp. 848-857, 1997.
- [Whetsel 00] L. Whetsel, "Adapting Scan Architectures for Low Power Operation," *IEEE International Test Conference*, 2000, pp. 863-872.
- [Zorian 93] Y. Zorian, "A Distributed BIST Control Scheme for Complex VLSI Design," *Proc. 11<sup>th</sup> IEEE VLSI Test Symp.*, pp. 4-9, 1993.

