# Multiple Channel Programmable Timing Generators With Single Cyclic Delay Line

Ting-Yuan Wang, Member, IEEE, Shih-Min Lin, and Hen-Wai Tsao, Member, IEEE


Abstract-In this paper, we present the design and measurement results of multiple channel programmable timing generators (TGs) that use a single cyclic delay line for high-speed automatic test equipment (ATE), with 37.5 ps resolution and a 5 ms programmable delay range. There are three TGs, and each one consists of a 19-bit 360-MHz count-up counter, a 19-bit cycle comparator, a zero cycle detector, a control word splitter, and an output selector with an 8X-interpolator. A 32-stage cyclic delay line is constructed from pulsewidth self-controlled delay cells (PWSCDCs). The proposed timing generator uses the TSMC 0.35-$\mu$m 1P4M process with a die size of 2.33 mm $\times$ 2.17 mm. The differential nonlinearity (DNL) is less than $\pm 0.6$ LSB (1 LSB = 37.5 ps). The integral nonlinearity (INL) is between -1 LSB and 7 LSB before calibration and within $\pm 0.4$ LSB after root-mean-square (rms) value calibration. The multichannel phase mismatch (MCPM) is 19 ps (rms), and the jitter is 13.7 ps (rms).

Index Terms-Cyclic delay line, multichannel-phase mismatch (MCPM), pulsewidth self-controlled delay cell (PWSCDC), timing generator (TG).

## I. INTRODUCTION

As continual improvements are made in CMOS technology, there is increased demand for digital testing systems, or automatic test equipment (ATE), with higher timing edge accuracy, higher data rates, higher levels of integration, and multitrigger capability in a single test cycle. To implement multiple testing channels in an ATE, the integrated design of timing generators (TGs) and multichannel-phase mismatch (MCPM) compensation are equally important.

Several existing architectures have been used to implement a programmable timing generator. For example, the AD9501 (Analog Devices) achieves 10 ps delay resolution with a ramp-DAC architecture [1]. Others use a counter as the coarse delay generator and voltage-controlled delay lines or different path loadings as the fine delay generator, with a resolution of tens of picoseconds [3]. There are three common problems with the existing architectures. First, because each TG uses its own delay circuit, skew compensation among multiple channels is difficult. Second, because of the large number of test pins required, multiple TG chips are needed in today's ATEs. Each chip has its own delay variation, and when the chips are distributed over a large area on the system board, cross-chip compensation for MCPM is even more difficult. Third, owing to the nature of the architecture, ATEs are difficult to implement in a single chip.

Fig. 1. System block diagram of the TGs.

Manuscript received June 15, 2003; revised April 2, 2004.

The authors are with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan, 106, R.O.C. (e-mail: dbt@ms12.hinet.net, tsaohw@cc.ee.ntu.edu.tw).

Digital Object Identifier 10.1109/TIM.2004.830592

We propose a programmable multiple channel timing generator circuit that uses a single cyclic delay line to meet these demands. As shown in Fig. 1, the TG circuit has seven major component blocks: a delay-locked loop (DLL), a cyclic delay line, a buffer stage, a short-pulse generator (SPG), a control codec, an output selector, and an 8X-interpolator. The circuit uses the test cycle clock (t0\_clk) to trigger the SPG, which generates a pulse that propagates around the delay ring cyclically as the timing vernier. According to its control signal, each TG uses its output selector to pick the correctly delayed vernier signal copied from the delay line, and an interpolator generates the fine delay with 37.5 ps resolution. Since only one cyclic delay line is used in this architecture, a highly integrated multiple-channel ATE with low MCPM is more easily achievable.

In this paper, we describe the methodology of using the pulsewidth self-controlled delay cell (PWSCDC) to construct a cyclic delay line that operates like an infinite delay line with 37.5-ps resolution [2] and thereby solves the multichannel timing problems. The circuit operates over a wide clock range between 100 MHz and 400 Hz (periods of 10 ns to 2.5 ms). Because each TG on a single chip is triggered by the same vernier signal in the proposed architecture, the differential nonlinearity (DNL) and integral nonlinearity (INL) become periodically predictable and can be calibrated easily.

The remainder of this paper is organized as follows. Section II describes the proposed TG architecture, design, and design constraints. Section III presents the measurement results. Section IV concludes the paper.

0018-9456/04\$20.00 © 2004 IEEE




Fig. 2. The relationship between six timing verniers.



Fig. 3. Using cyclic delay line as infinite delay line.

## II. DESIGN OF THE TG

In a typical ATE, the timing generator produces a time trigger at a programmed time point, with a specific function defined for use by the test-waveform format generator. The format generator produces the required waveform, nonreturn-to-zero (NRZ) for example, which is started by the starting trigger and stopped by the end trigger, and so on, as defined by the format generator. These functionally defined time triggers are denoted d0, d1, and d2 in Fig. 2 [3]. Fig. 2 shows that a traditional TG, say TG0, generates a timing vernier by delaying t0\_clk by a time period $t_{d0}$. In our design, d1 is generated by delaying d0 by a time period $t_{12}$. We can implement this architecture using a single trigger source and a "recirculating" cyclic delay line that acts like an infinite delay chain, as shown in Fig. 3. The delayed signal can reuse the delay cells of the cyclic delay line in different time slices until the system stops it. The trigger signal therefore propagates around the cyclic delay line over and over, as it would along an infinite delay line, and each TG can generate its specific vernier pulse at any time. Since the TGs generate vernier pulses from a single signal source, the channel skew can be reduced easily.

The cyclic delay line works as follows. Assume the delay ring is constructed with 32 delay cells; the signal delayed by a time period $t_{101}$ is then equivalent to the signal delayed by a time period $t_{d3,5}$ at the fifth delay output of the third cycle on the time axis in Fig. 3.

$$t_N = \left\lfloor \frac{N}{n} \right\rfloor \cdot t_{d,\text{cycle}} + t_{d(N \bmod n)} = t_{d\lfloor N/n \rfloor,(N \bmod n)} \tag{1}$$

where $t_d$ is the propagation time of a delay cell; $t_N$ is the total delay time at the Nth delay output in the infinite delay line; $t_{d,cycle}$ is the total propagation time of the dth delay output of the specific cycle in the cyclic delay line; and n is the number of delay stages of the cyclic delay line.

Fig. 4. (a) Schematic of the PWSCDC. (b) Time chart of the PWSCDC.
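The index mapping in (1) can be illustrated with a short sketch. It assumes ideal, identical delay cells, so that one full cycle takes $n \times t_d$; the 32 stages and the 300 ps cell delay are the values quoted elsewhere in this paper, while the function names are hypothetical.

```python
# Sketch of the tap-index mapping in (1), assuming identical delay cells.
N_STAGES = 32   # n, number of stages in the cyclic delay line (from the text)
T_D_PS = 300    # t_d, per-stage delay in picoseconds (from the text)


def locate_tap(n_infinite, n_stages=N_STAGES):
    """Map the N-th output of the equivalent infinite line to (cycle, stage)."""
    if n_infinite < 0:
        raise ValueError("tap index must be non-negative")
    return divmod(n_infinite, n_stages)


def total_delay_ps(n_infinite, n_stages=N_STAGES, t_d_ps=T_D_PS):
    """Total delay t_N: whole cycles plus the remaining stages of the last pass."""
    cycle, stage = locate_tap(n_infinite, n_stages)
    t_cycle_ps = n_stages * t_d_ps  # assumption: one cycle is n ideal stage delays
    return cycle * t_cycle_ps + stage * t_d_ps


print(locate_tap(101))      # the t_101 example from the text
print(total_delay_ps(101))  # delay in picoseconds under the ideal-cell assumption
```

For the $t_{101}$ example in the text, the mapping gives cycle 3 and stage 5, which matches $t_{d3,5}$.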

If a voltage-controlled delay line (VCDL) is simply closed into a delay ring, two problems arise. The ring may oscillate at its natural frequency like a voltage-controlled oscillator (VCO), and the process mismatch between N-MOS and P-MOS devices broadens or narrows the pulsewidth after the pulse has propagated in the delay line for a long time. We propose a current-starved pulsewidth self-controlled delay cell (PWSCDC) to solve these problems. It operates as a current-starved delay cell does, with the output triggered by the positive input edge and cleared by the feedback signal, which is independent of the input signal.

After the cyclic delay line has been triggered and the signal is propagating around the delay ring, each TG generates a control signal from external control words and control circuits, and the output selector uses it to choose the corresponding delayed signal as the time vernier.

### A. PWSCDC

Fig. 4(a) shows the schematic of the PWSCDC. The input signal and the cyclic feedback signal that feed into the first delay cell can each trigger the cell, as in a general current-starved delay cell, through two symmetric inputs, IN1 and IN2, respectively. The pulsewidth control mechanism of the PWSCDC is realized by a feedback loop. The inverter chain is constructed of three components: INV2, INV3, and INV4. Each time, the rising edge of the output pulse is triggered by the input signal, and the falling edge is triggered by transistor M5, which is turned on by node SW3 and pulls node OUT\_ up to VDD.

Unfortunately, IN1 is driven by the previous delay cell, and it is impossible to turn off transistor M1 before transistor M5 is turned on. The charging current from M5 therefore leaks through M1 at the same time. OUT\_ would remain in a metastable state, there would be a large dc power loss, and the propagation delay would not be well defined. In contrast to the traditional current-starved delay cell, an additional switch transistor M3 is therefore necessary. The feedback loop turns this transistor off through node SW2 before M5 is turned on, which breaks the leakage current path formed by transistors M5, M1, M3, and M4. After OUT\_ is charged to VDD, the feedback loop turns the switch transistor on again, and the delay cell is ready to be retriggered.

Although an inserted switch transistor is necessary to implement the pulsewidth self-control mechanism, its on-off switching noise is coupled into the control voltage VCN through $C_{gd,M4}$ (the gate-drain overlap capacitance of the MOS transistor) [4]. Therefore, we use transistor M7 as a MOS capacitor to reduce the ripple noise on the control voltage.

Fig. 4(b) shows the timing characteristics of the PWSCDC. The propagation delay $t_d$ is determined by the time needed for M1 to be turned on by the current of transistor M4 ($I_{M4}$). Four factors determine $t_{\text{PHL,OUT}}$, as shown in (2) [4]: $C_{gs,INV1}$ (the gate-source overlap capacitance at the input node of INV1), the parasitic layout capacitance ($C_{\text{layout}}$, about 20 fF), the channel turn-on charge of M1, and $I_{M4}$. In (2),

| Symbol               | Value or meaning                                  |
|----------------------|---------------------------------------------------|
| $I_{M4}$             | 640 $\mu$A                                        |
| $t_{\rm PHL,OUT}$    | 150 ps                                            |
| $t_d$                | $t_{\rm PHL,OUT\_} + t_{d,INV1} = 300 \text{ ps}$ |
| N                    | dopant density of CMOS                            |
| H                    | depth of CMOS diffusion region under the gate     |
| $\Delta V_{\rm OUT}$ | 2.64 V (10%–90% of 3.3 V step)                    |

When node SW3 is discharged, transistor M5 pulls up OUT\_ gradually. Thus, $t_{PLH,OUT}$ (the rise time of the waveform at node OUT\_) and the pulsewidth $t_{W,OUT}$ can be expressed as (3) and (4) [4].

$$\begin{aligned} t_{PLH,OUT} &= \frac{\left(C_{\text{layout}} + \frac{1}{2}C_{ox}WL_{M1} + \frac{1}{2}C_{ox}WL_{M2}\right) \times \Delta V}{I_{M5}} \\ &= \frac{\left(C_{\text{layout}} + C_{ox}WL_{M1}\right) \times \Delta V}{I_{M5}} = 750 \text{ ps} \end{aligned} \tag{3}$$

$$\begin{aligned} t_{W,OUT} &= t_{d,INV2} + t_{d,INV3} + t_{d,INV4} + t_{PLH,OUT} + t_{PHL,OUT} + t_{d,INV1} \\ &= 50 \text{ ps} + 270 \text{ ps} + 80 \text{ ps} + 100 \text{ ps} + 750 \text{ ps} + 100 \text{ ps} \\ &= 1350 \text{ ps}. \end{aligned} \tag{4}$$

To ensure correct function of the pulsewidth control mechanism, switch transistor M3 must be totally turned off before transistor M5 is turned on by node SW3. Therefore, the time required to turn off M3 must be shorter than $t_{d,INV3} + t_{d,INV4}$ when SW1 switches to low, as shown in (5)

$$\begin{aligned} t_{d,M3} &= t_{d,AND} + t_{PHL,SW2} \\ &= t_{d,AND} + \frac{\frac{2}{3}C_{ox}WL_{M3} \times \Delta V}{I_{AND}} \\ &< t_{d,INV3} + t_{d,INV4}. \end{aligned} \tag{5}$$

Because the output of each delay cell is the input of the next delay cell, the output voltage must be lower than 0.3 V (10% of 3.3 V) when M3 of the next stage is turned on again. Otherwise, the next stage will be retriggered, causing the cyclic delay ring to start to oscillate. Therefore, the timing constraints of pulsewidth  $t_{W,OUT}$  between two adjacent delay cells can be expressed as in (6) and (7)

$$\Delta t = t_{PHL,OUT\_} + t_{d,INV1} + t_{PHL,SW1} + t_{d,AND} + t_{W,SW1} + 10\% \times t_{PLH,SW1} > t_{W,OUT} \tag{6}$$

$$t_{W,OUT} < t_{PHL,OUT\_} + t_{d,INV1} + t_{d,INV2} + t_{d,INV3} + t_{d,INV4} + t_{PLH,OUT\_} + t_{PLH,OUT} + t_{PHL,SW1} + t_{PHL,SW2} \tag{7}$$

where  $\Delta t$  is the propagation time delay for the input trigger to reach the 10% voltage level of VDD at SW2.

$$\begin{aligned} t_{\text{PHL,OUT}} &= \frac{C_{\text{total}} \times \Delta V_{\text{OUT}}}{I_{M4}} + \frac{Q_{\text{channel},M1}}{I_{M4}} \\ &= \frac{\left(\frac{2}{3}C_{ox}WL_{M1} + \frac{1}{2}C_{ox}WL_{M2} + \frac{1}{2}C_{ox}WL_{M5} + \frac{1}{2}C_{ox}WL_{M6} + C_{gs,INV1} + C_{\text{layout}}\right) \times \Delta V_{\text{OUT}}}{I_{M4}} + \frac{NHWL_{M1} \times Q}{I_{M4}} \end{aligned} \tag{2}$$

### B. SPG

The test cycle clock starts the delay ring. However, the pulsewidth of the test cycle clock (5 ns–1.25 ms at a 50% duty cycle) is too wide to be used as an input trigger for the delay ring; it would cause the delay ring to oscillate. It is unwise to tune the duty cycle of the test cycle clock over a wide operating frequency range. Instead, an SPG is employed to solve this problem by generating short pulses with a fixed width ($t_{W,SPG}$). Fig. 5(a) and (b) show the schematic and the time chart of the SPG.

Fig. 5. (a) Schematic of the SPG. (b) Time chart of the SPG.

After the SPG has been triggered by t0\_clk, M4 is gradually switched off by t0\_clk\_ before the feedback signal FB, which is produced by INV2, INV3, INV4, and INV5, turns on M2. Although OUT is triggered by t0\_clk, the output pulsewidth ($t_{W,SPG}$) of the SPG is controlled by the feedback delay loop and is independent of the duty cycle and operating frequency of t0\_clk. The design constraint of the SPG is given in (8) as

$$\begin{aligned} t_{W,SPG} &= t_{d,INV3} + t_{d,INV4} + t_{d,INV5} + t_{PLH,OUT\_} - t_{d,INV2} > t_{W,OUT} \\ t_{d,INV1} &< t_{PHL,OUT\_}. \end{aligned} \tag{8}$$

### C. Cyclic Delay Line

The proposed cyclic delay line is very different from a ring VCO. A TG with a wide operating frequency range and on-the-fly frequency switching capability can be realized only with a nonoscillating cyclic delay line, which cannot be achieved with a traditional ring VCO.

When the PWSCDCs are operated as a cyclic delay line, harmonic oscillation may occur if the total propagation delay of the delay line is too short. In a cyclic delay line with N stages, for example, there are two ways in which the cyclic delay line can oscillate.

First, the switch transistor M3 in Fig. 4(a) is turned off when the first stage is triggered, and it must be turned on again before the feedback signal triggers input IN2 of the first delay cell after a time period of $N \times t_d$. Otherwise, the first PWSCDC cannot operate correctly because M3 has not yet been completely turned on.

Second, transistor M5 of the first delay cell must be turned off before the next feedback trigger arrives after a time of $N \times t_d$. Otherwise, the PWSCDC will stay in a metastable state, since M6, M3, M4, and M2 are on simultaneously. The number of stages in the delay ring must therefore satisfy

$$t_d + t_{PHL,INV2} + t_{PLH,INV3} + t_{PLH,OUT\_} + t_{PLH,INV1} - t_{PHL,INV2} + t_{d,AND} < N \times t_d. \tag{9}$$

Another constraint is that the total propagation delay must be longer than the input pulsewidth of the PWSCDC, $t_{W,OUT}$; otherwise, the feedback trigger and the input trigger will activate the delay cell at the same time. The rising edge of the cyclic feedback signal will then overlap with the input trigger and vanish in the delay line.
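As a rough numerical check of this last constraint, the sketch below sums the six terms of (4) and compares the result with the round-trip delay $N \times t_d$ for the 32-stage, 300 ps design. It covers only the pulsewidth condition, not (9), whose individual terms are not all quoted in the text; the helper names are hypothetical.

```python
# Sketch: check that the ring round-trip delay exceeds the cell pulsewidth.
T_D_PS = 300                                        # per-stage delay (from the text)
N_STAGES = 32                                       # stages in the ring (from the text)
PULSEWIDTH_TERMS_PS = [50, 270, 80, 100, 750, 100]  # the six terms listed in (4)


def ring_delay_ps(n_stages=N_STAGES, t_d_ps=T_D_PS):
    """Round-trip delay N * t_d of the cyclic delay line."""
    return n_stages * t_d_ps


def min_stages_for_pulsewidth(t_w_out_ps, t_d_ps=T_D_PS):
    """Smallest stage count N for which N * t_d is strictly greater than t_W,OUT."""
    return t_w_out_ps // t_d_ps + 1


t_w_out_ps = sum(PULSEWIDTH_TERMS_PS)
print("t_W,OUT =", t_w_out_ps, "ps")
print("N * t_d =", ring_delay_ps(), "ps")
print("constraint met:", ring_delay_ps() > t_w_out_ps)
print("minimum stages for this constraint alone:", min_stages_for_pulsewidth(t_w_out_ps))
```

With the quoted values, the 32-stage ring satisfies the pulsewidth condition with a wide margin, so the stage count is set by other considerations, such as the control codec timing discussed in the next subsection.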

### D. Control Codec

The control codec consists of a cycle counter, a splitter, a comparator, a zero cycle detector, a control generator, and two index flip-flops (ID1 and ID2). Once triggered, the input propagates around the delay line cyclically, and each buffer stage continuously copies the delayed signal. Only the delayed signal at the particular cycle and stage output chosen by the control codec triggers the output selector circuit. In our design, we use a 32-stage cyclic delay line with 300 ps resolution.

A 19-bit, 360-MHz up-counter [5] is designed for each control codec as a cycle counter to count how many cycles the signal has propagated. It is triggered by the dummy output of the first stage buffer.
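The programmable range quoted in the abstract can be cross-checked from the counter width and the delay-line parameters. The sketch below is only a back-of-the-envelope calculation; it assumes that the counter increments once per trip around the ring and that every stage contributes exactly 300 ps.

```python
# Sketch: delay range implied by a 19-bit cycle counter and a 32 x 300 ps ring.
COUNTER_BITS = 19   # cycle counter width (from the text)
N_STAGES = 32       # stages in the cyclic delay line (from the text)
T_D_PS = 300        # coarse resolution per stage, in picoseconds (from the text)
INTERP_STEPS = 8    # 8X-interpolator (from the text)

cycle_ps = N_STAGES * T_D_PS                    # one trip around the ring
lsb_ps = T_D_PS / INTERP_STEPS                  # fine resolution
max_range_ps = (2 ** COUNTER_BITS) * cycle_ps   # all counter states used
max_range_ms = max_range_ps / 1e9
count_rate_mhz = 1e6 / cycle_ps                 # assumed: one count per cycle

print(f"cycle time      : {cycle_ps / 1000:.1f} ns")
print(f"fine resolution : {lsb_ps} ps")
print(f"delay range     : {max_range_ms:.2f} ms")
print(f"count rate      : {count_rate_mhz:.1f} MHz")
```

Under these assumptions the range comes out at roughly 5 ms, consistent with the abstract, and the counter would toggle at about 104 MHz, well within its 360-MHz rating.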

Considering the counter settling time and comparator propagation delay, the control signal is generated 3.6 ns after the cycle counter has been triggered. In other words, it is impossible for the codec controlling the output selector to choose the third delayed signal as the output in time if the counter is triggered by the first delayed signal with a 600-ps time interval. Therefore, we partition the delay line into three sections with different output control mechanisms, as shown in Fig. 6.

S1: starts from the first stage and ends at the eleventh stage;

S2: starts from the twelfth stage and ends at the twenty-third stage;

S3: starts from the twenty-fourth stage and ends at the thirty-second stage.

All the delayed signals in section S3 are generated at least 7.2 ns after the counter is toggled to start counting. As a result, the control codec has an adequate time margin to choose an output trigger that belongs to section S3.

For the same reason, if the signal belongs to section S1 at the Nth cycle, it is impossible for the control codec to generate a control word with the cycle counter being toggled at the same cycle. To solve this issue, cycle index flip-flops (ID1 and ID2) are added to the outputs of the twenty-third and thirty-first buffers. The comparator output is enabled only at the (N - 1)th cycle, when ID1 and ID2 are toggled by the twenty-third and thirty-first stage-delayed signals, respectively. The control codec then uses ID1 to choose the delayed signal belonging to section S1 at the next cycle, with a timing margin of about 1.5 ns. In the same manner, ID2 provides a timing margin of about 3.6 ns for generating control words for a delayed signal that belongs to section S2.

The partitioning of the delay line into three sections is done for two other reasons. First, if the delay line is divided into two sections with ID1 at the twenty-third stage only, an undesirable output may occur at the twenty-third stage. Assume the twenty-second delayed signal is chosen at the Nth cycle. If the twenty-second delayed signal has not propagated through the delay cell when the twenty-third delayed signal has been generated and has triggered ID1 to generate an output signal at the (N - 1)th cycle, an undesirable output of the twenty-second stage will be generated at the (N - 1)th cycle at an unpredictable time vernier. No matter where ID1 is placed, this problem exists at the section boundary if the delay line is partitioned into only two sections.

Fig. 6. The block diagram of the control codec.

The counter and index flip-flops can be used without difficulty to fetch delayed signals from the PWSCDCs in sections S1 and S2 at the *N*th cycle, except at the first cycle, during which the cycle counter, ID1, and ID2 are inactive. There are two ways to generate delayed signals at the first cycle. First, a signal belonging to section S3 can be chosen by the output selector. Second, for delayed signals generated in sections S1 and S2, a zero cycle detector can generate the control signal without considering the comparator result. The control signal is enabled at the beginning, when both ID1 and ID2 are low. The control signal that chooses the output of section S1 is disabled when ID1 goes high, and the control signal for section S2 is disabled in the same manner.

The splitter divides the input control word into two parts. One is the data that the comparator compares with the cycle counter output. The other is the control word for the output selector. After the splitter fetches the input control word, it generates two data values (N and N - 1) at the same time. The splitting MUX then chooses the proper value from N and N - 1 according to the section in which the delayed signal is generated. For example, if the delayed signal is from section S3, N is selected; otherwise, N - 1 is selected.

The zero-cycle detector is used to detect whether the signal is generated at the first cycle.

Fig. 7. Architecture of the output selector.
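The decision logic of the splitter, the section boundaries, and the zero-cycle path can be summarized in a behavioral sketch. The bit layout of the control word is not given in the paper, so the packing used here (3 fine bits, 5 stage bits, 19 cycle bits, with the first pass counted as cycle 0) is purely an assumption for illustration, as are all function and field names.

```python
# Behavioral sketch of the control codec decisions (hypothetical word layout).
from typing import NamedTuple, Optional

N_STAGES = 32       # from the text
COUNTER_BITS = 19   # from the text
STAGE_BITS = 5      # assumed: enough bits to address 32 stages
FINE_BITS = 3       # assumed: enough bits for the 8 interpolator steps


class CodecDecision(NamedTuple):
    cycle: int                     # requested cycle N (assumed 0 for the first pass)
    stage: int                     # stage number, 1..32 as in the text
    fine: int                      # interpolator step, 0..7
    section: str                   # "S1", "S2", or "S3"
    compare_value: Optional[int]   # value sent to the comparator; None = zero-cycle path


def section_of(stage: int) -> str:
    """Section boundaries as listed in the text."""
    if not 1 <= stage <= N_STAGES:
        raise ValueError("stage must be in 1..32")
    if stage <= 11:
        return "S1"
    if stage <= 23:
        return "S2"
    return "S3"


def decode(word: int) -> CodecDecision:
    """Split a control word and pick N or N - 1 for the cycle comparator."""
    if not 0 <= word < 1 << (COUNTER_BITS + STAGE_BITS + FINE_BITS):
        raise ValueError("control word out of range")
    fine = word & ((1 << FINE_BITS) - 1)
    stage = ((word >> FINE_BITS) & (N_STAGES - 1)) + 1
    cycle = word >> (FINE_BITS + STAGE_BITS)
    section = section_of(stage)
    if section == "S3":
        compare_value = cycle          # S3: compare against N
    elif cycle == 0:
        compare_value = None           # S1/S2 in the first cycle: zero-cycle detector
    else:
        compare_value = cycle - 1      # S1/S2 otherwise: compare against N - 1
    return CodecDecision(cycle, stage, fine, section, compare_value)


print(decode((3 << 8) | (4 << 3) | 2))   # cycle 3, stage 5 (S1), fine step 2
print(decode((0 << 8) | (11 << 3) | 0))  # cycle 0, stage 12 (S2): zero-cycle path
print(decode((7 << 8) | (31 << 3) | 7))  # cycle 7, stage 32 (S3)
```

The sketch models only which value is compared; the timing margins provided by ID1 and ID2 are a circuit-level matter and are not represented.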

### E. Coarse Output Selector

In the output stage, two 16-to-1 multiplexers collect the even and odd delay stage outputs, respectively, to generate two adjacent delay outputs, as shown in Fig. 7. The permute block orders the two timing signals (*even* and *odd*) from the two 16-to-1 multiplexer outputs and generates two adjacent timing outputs (*out1* and *out2*) for the interpolator. When the control word S0 = 0, *out1* and *out2* are the even and odd multiplexer outputs, respectively. When S0 = 1, the permute block swaps them, so that *out1* and *out2* are the odd and even outputs, respectively.
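The even/odd multiplexing and the permute step can be sketched behaviorally as follows. The sketch numbers the taps 0 to 31 and assumes that the two multiplexers may receive different select values, so that an odd tap is paired with the next even tap; neither detail is stated in the text, and the function name is hypothetical.

```python
# Sketch of the coarse selector: two 16-to-1 muxes plus a permute block.
N_TAPS = 32  # taps numbered 0..31 here (an assumption; the text numbers stages from 1)


def coarse_select(k):
    """Return (S0, tap driving out1, tap driving out2) for the requested tap k."""
    if not 0 <= k < N_TAPS:
        raise ValueError("tap index must be in 0..31")
    s0 = k & 1                       # permute control: 0 = even first, 1 = odd first
    odd_sel = k // 2                 # odd mux picks tap 2 * odd_sel + 1
    even_sel = (k + 1) // 2 % 16     # even mux picks tap 2 * even_sel
    even_tap = 2 * even_sel
    odd_tap = 2 * odd_sel + 1
    if s0 == 0:
        return s0, even_tap, odd_tap   # out1 = even, out2 = odd
    return s0, odd_tap, even_tap       # out1 = odd, out2 = even (swapped)


for k in (4, 5, 31):
    print(k, coarse_select(k))
```

For tap 31 the second output wraps to tap 0, which in the cyclic line corresponds to the first tap of the following cycle.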



Fig. 8. (a) Architecture of the 8X-interpolator. (b) Time chart of the 8X-interpolator. (c) Schematic of the tuneable inverter.


### F. Fine Output Selector (8X-Interpolator)

We propose a two-stage  $8 \times$  phase interpolator composed of one  $2 \times$  and one  $4 \times$  phase interpolator, as shown in Fig. 8. In the first stage, a conventional architecture is used to achieve a  $2 \times$  timing resolution improvement [6]. The second stage also uses the conventional architecture, except a tunable inverter is used to achieve another  $4 \times$  timing resolution improvement.

The tunable inverter can be regarded as a current-controlled inverter, as shown in Fig. 8(c). The one-hot control signal (c2, c1, and c0) turns on one current source, so the tunable inverter can generate three different delays. In this way, the interpolated timing signal can lie at one of three points equally spaced between two adjacent timing signals. Fig. 8(b) shows the timing chart of the interpolating signals (*out21*, *out22*, *out23*, and *out24*). The interpolating timing signals (*out22* and *out24*) can be tuned to lie in three positions.
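An idealized view of the two-stage interpolation is sketched below: a 2X stage selects one half of the interval between two adjacent coarse edges, and a 4X stage places the edge within that half. This is a linear timing model only, under the assumption that the eight positions are uniformly spaced; it does not model the tunable inverter itself, and the function name and code split are hypothetical.

```python
# Idealized sketch of 8X interpolation between two adjacent coarse edges.
def interpolate_ps(t_out1_ps, t_out2_ps, fine_code):
    """Return the interpolated edge time for a 3-bit fine code (0..7)."""
    if not 0 <= fine_code < 8:
        raise ValueError("fine code must be in 0..7")
    half_sel, quarter_sel = divmod(fine_code, 4)   # assumed split: 2X stage, then 4X stage
    mid_ps = (t_out1_ps + t_out2_ps) / 2           # output of the 2X stage
    lo, hi = (t_out1_ps, mid_ps) if half_sel == 0 else (mid_ps, t_out2_ps)
    return lo + quarter_sel * (hi - lo) / 4        # 4X stage within the chosen half


# Two adjacent taps 300 ps apart give eight positions 37.5 ps apart.
print([interpolate_ps(1500, 1800, code) for code in range(8)])
```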

## III. MEASUREMENT RESULTS

The prototype circuits, containing one cyclic delay line circuit and three TGs, have been fabricated in the TSMC 0.35-$\mu$m 1P4M process with a die size of 2.33 mm × 2.17 mm using full custom design. A die area of 1.057 mm × 0.31 mm is used for the cyclic delay line, and a die area of 0.9 mm × 0.18 mm is used for the two TG circuits. The programmable delay time resolution is 37.5 ps and is stabilized by a DLL with an operation range (in terms of the test cycle clock) from 100 MHz to 400 Hz. Fig. 9(a) and (b) show the floor plan and the die photo, respectively.

We applied 100-, 50-, and 10-MHz clock signals as test cycle clocks and increased the control word in 1-LSB steps for each measurement. The DNL of the TGs is within $\pm 0.6$ LSB, as shown in Fig. 10.

The INL at testing rates of 10 MHz or lower is shown in Fig. 11. Under a 50-MHz testing clock, the INL lies between -1 LSB and 4 LSB. Under a 10-MHz testing clock, the INL lies between 0 and 7 LSB with predictable periodic variations. For a 10-MHz system clock application, a programmable delay skew of about 260 ps (7 LSB) is small enough to be neglected when compared with the testing rate. The periodic variation occurs because all the signals propagate cyclically. It is therefore easy to fit the optimum programmable time vernier with a minimum mean-square error line and then to calibrate out the errors by shifting the control word, as calculated by software and shown by the heavy line in Fig. 11. The DNL and INL can be reduced to within $\pm 0.4$ LSB after calibration.
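A software calibration of this kind might look like the following sketch. It fits a least-squares line to a delay-versus-code curve and, for each nominal code, looks up the code whose delay lies closest to the fitted line. The delay curve is synthetic (an ideal ramp plus a made-up 3-LSB sawtooth that repeats every cycle), not measured data, and the procedure is one plausible reading of the control word shift described above rather than the authors' exact algorithm.

```python
# Sketch: MMSE line fit and control-word shift table on synthetic data.
LSB_PS = 37.5
CODES_PER_CYCLE = 256   # 32 stages x 8 interpolator steps


def synthetic_delay_ps(code):
    """Made-up delay curve: ideal ramp plus a 3-LSB sawtooth per cycle."""
    inl_lsb = 3.0 * (code % CODES_PER_CYCLE) / CODES_PER_CYCLE
    return (code + inl_lsb) * LSB_PS


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_shift_table(delays_ps):
    """For each nominal code, pick the code whose delay is closest to the fitted line."""
    codes = list(range(len(delays_ps)))
    slope, intercept = fit_line(codes, delays_ps)
    table = [
        min(codes, key=lambda c: abs(delays_ps[c] - (slope * code + intercept)))
        for code in codes
    ]
    return slope, intercept, table


def max_error_ps(delays_ps, slope, intercept, table):
    """Worst-case deviation from the fitted line when code i is mapped to table[i]."""
    return max(
        abs(delays_ps[table[code]] - (slope * code + intercept))
        for code in range(len(delays_ps))
    )


delays = [synthetic_delay_ps(c) for c in range(2 * CODES_PER_CYCLE)]
slope, intercept, table = build_shift_table(delays)
identity = list(range(len(delays)))
err_before = max_error_ps(delays, slope, intercept, identity)
err_after = max_error_ps(delays, slope, intercept, table)
print(f"max error before shift: {err_before / LSB_PS:.2f} LSB")
print(f"max error after shift : {err_after / LSB_PS:.2f} LSB")
```

Because the error pattern repeats every cycle, a table covering a single cycle would suffice in principle; the sketch builds the full table only to keep the code short.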

The MCPM is less than 80 ps (P-P) and 19 ps (rms) with a 50-MHz testing clock, as shown in Fig. 12. The MCPM of all channels is monotonic, since the delayed signals are generated by the same delay line, symmetric control codec blocks, and output selectors. The MCPM stems mainly from the length of the on-die connection between the output selectors of the TGs and the cyclic delay line.

The jitter of the TG output is less than 13.7 ps (rms) with a 100-MHz testing clock, as shown in Fig. 13.



Fig. 9. (a) Floor plan of the TGs. (b) Die photo of the TGs.

Fig. 10. The DNL with 10-MHz testing frequency.

Fig. 11. The INL with 10-MHz operation frequency.

Fig. 12. The MCPM with 50-MHz operation frequency.

Fig. 13. Jitter estimation with 100-MHz operation frequency.

TABLE I THE SPECIFICATION OF THE TG

| Parameter                           | Value             |
|-------------------------------------|-------------------|
| Operating voltage                   | 3.3 V ± 0.3 V     |
| Process                             | TSMC 0.35 µm 1P4M |
| Test cycle frequency range          | 400 Hz to 100 MHz |
| Coarse timing resolution            | < 300 ps          |
| Fine timing resolution              | < 37.5 ps         |
| Min. programmable delay             | < 600 ps          |
| Intrinsic delay                     | < 4.5 ns          |
| (DNL, INL)                          | < ±0.6 LSB        |
| Power consumption                   | 237.6 mW @ 100 MHz |
| Jitter                              | 13.69 ps (rms)    |
| TG output (100 MHz, the first cycle) | 80 ps (pk-pk)    |

## IV. CONCLUSION

The proposed TG with a cyclic delay line allows for a highly integrated design, predictable DNL compensation, and MCPM cancellation in an ATE. In this paper, we present a design with a 32-stage cyclic delay line using a PWSCDC and an 8X-interpolator to achieve a 37.5 ps resolution. The DNL is less than $\pm 0.6$ LSB, and the INL is less than $\pm 0.4$ LSB after rms calibration. The MCPM between TGs is kept under 20 ps with optimal control word shift calibration. The cyclic delay line structure not only allows multiple TGs to be densely integrated on a single chip to provide several testing channels but also simplifies application designs that need a long equivalent delay line or a predictable and calibrated skew, such as TDCs, clock distribution, and de-skew circuits.

As shown in the measurement results, the channel phase mismatch is small and can be easily calibrated without external compensation circuits. The proposed design can be easily enhanced to provide an improved resolution of tens of picoseconds using some extra interpolator circuits.

## REFERENCES

- [1] A. Hill, "Using digitally programmable delay generators," Analog Devices Application Note AN-260.
- [2] P. Chen and S.-I. Liu, "A high resolution digital CMOS time-to-digital converter based on nested delay locked loops," in *Proc. IEEE Int. Symp. Circuits and Systems*, 1999, pp. II537–II540.
- [3] "SC212 System Description," Credence System Corp., 1991.
- [4] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, *Analysis and Design of Analog Integrated Circuits*, 4th ed. New York: Wiley, 2001.
- [5] B. Hoppe, C. Kroh, H. Meuth, and M. Stohr, "A 440 MHz 16 bit counter in CMOS standard cells," in 11th Annual IEEE Int. Application Specific Integrated Circuits (ASIC) Conf., 1998, pp. 241–244.
- [6] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE J. Solid-State Circuits*, vol. 34, pp. 632–644, May 1999.
- [7] J.-M. Wang, S.-C. Fang, and W.-S. Feng, "New efficient designs for XOR and XNOR functions on the transistor level," *IEEE J. Solid State Circuits*, vol. 29, pp. 780–786, July 1994.



**Ting-Yuan Wang** (M'03) received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1997 and 1999, respectively.

Since 1999, he has been a Ph.D. degree candidate with the Department of Electrical Engineering, National Taiwan University. His main research interests are high-speed digital circuit, timing generation, microprocessor architecture, and electronic instrumentation.



**Hen-Wai Tsao** (M'90) received the B.S., M.S., and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1971, 1975, and 1985, respectively.

Since 1978, he has been with the Department of Electrical Engineering, National Taiwan University, where he is currently a Professor. His main research interests are optical fiber communication system, communication electronics, and electronic instrumentation.



Shih-Min Lin was born in Taichung, Taiwan, R.O.C., in 1978. He received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 2000 and 2002, respectively.

Since 2002, he has been a circuit designer with VIA Technologies, Inc., Taiwan. His research interests are in high-speed IO circuit design.