# A Wide-Range and Fast-Locking All-Digital Cycle-Controlled Delay-Locked Loop

Hsiang-Hui Chang, Student Member, IEEE, and Shen-Iuan Liu, Senior Member, IEEE

Abstract-An all-digital cycle-controlled delay-locked loop (DLL) is presented to achieve wide-range operation, fast lock, and process immunity. Utilizing the cycle-controlled delay unit, the proposed DLL reuses the delay units to enlarge the operating frequency range rather than cascading a huge number of delay units. Adopting a binary search scheme, the two-step successive-approximation-register (SAR) controller ensures that the proposed DLL locks to the input clock within 32 clock cycles regardless of the input frequency. The DLL operates in open-loop fashion once lock occurs in order to achieve low-jitter operation with small area and low power dissipation. Since the DLL will not track temperature or supply variations once it is in lock, it is best suited for burst-mode operation. Given a reference input with 50% duty cycle, the DLL generates an output clock with a duty cycle of nearly 50% over the entire operating frequency range. Fabricated in a 0.18-$\mu$m CMOS one-poly six-metal (1P6M) technology, the experimental prototype exhibits a wide locking range from 2 to 700 MHz while consuming a maximum power of 23 mW. At an operating frequency of 700 MHz, the measured peak-to-peak and rms jitter are 17.6 ps and 2.0 ps, respectively.

*Index Terms*—All digital, cycle-controlled delay unit (CCDU), delay-locked loop (DLL), phase-locked loop (PLL), successive-approximation-register (SAR).

### I. INTRODUCTION

With advances in deep-submicron technologies, the demand for high-performance integrated circuits with short time-to-market has grown dramatically. Scalable microprocessor and graphics-processor systems can be cost-effectively ported to advanced technologies to increase the clock rate, lower the power dissipation, and reduce design turnaround time. Synchronization among IC modules thus becomes an important issue, and considerable effort has been focused on high-performance digital interface circuits to communicate with these digital systems.

Phase-locked loops (PLLs) [1]–[3] and delay-locked loops (DLLs) [4]–[8] have been widely used in many high-speed microprocessors and memories. If frequency synthesis is not needed, DLLs are preferred for their unconditional stability, faster locking time, and better jitter performance [9]. The traditional analog DLL [4], [6]–[8] generally has better jitter and skew performance, but it is process-dependent and needs a long design time. Conversely, the digital DLL can be migrated over different processes. Moreover, with benefits from scaling CMOS technologies, the digital DLL has a lower supply voltage and the potential for good power management [10]–[14].

The authors are with the Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C. (e-mail: lsi@cc.ee.ntu.edu.tw).

Digital Object Identifier 10.1109/JSSC.2005.843596

To facilitate the use of the digital DLL [10]–[14] in various clock-generation or phase-alignment circuits, the operating frequency range should be as large as possible to meet the specifications of different products. Furthermore, the wide-range DLL should tolerate wide variations in clock frequency, process, and temperature. The highest operating frequency of a DLL is limited by the bandwidth of a single delay unit, while the lowest operating frequency is restricted by the length of the delay line. In order to meet the maximum and the minimum speed requirements at the same time, the conventional digital DLL [10]–[14] demands a delay line composed of high-bandwidth delay units. However, to realize such a DLL in a reasonable chip area, the tradeoff between the bandwidth of a single delay unit and the length of the delay line will substantially limit the ratio of maximum to minimum operating frequency. In this paper, the proposed DLL, incorporating the cycle-controlled delay unit (CCDU) and the two-step successive-approximation-register (SAR) controller, achieves both wide-range and fast-lock operation. This DLL exhibits a wide locking range from 2 to 700 MHz without the limitations mentioned above and achieves an acceptable jitter performance as compared to the conventional analog DLL [8]. The DLL locks to the input clock within 32 clock cycles regardless of the input frequency. The DLL operates in open-loop fashion once lock occurs in order to achieve low-jitter operation with small area and low power dissipation. Since the DLL will not track temperature or supply variations once it is in lock, it is best suited for burst-mode operation. The simulated sensitivities to supply and temperature variations are a 1% change in delay value per 1% change in supply voltage ($V_{dd}$) and a 0.16% change in delay value per 1 °C change in temperature, as $V_{dd}$ changes from 90% to 110% and the temperature changes from -40 °C to 125 °C, respectively.

Compared to the conventional digital DLL, the overall hardware complexity can be reduced at a given operating frequency range and timing resolution. This paper is organized as follows. Comparisons between the conventional and proposed digital DLL are discussed in Section II. The architecture of the proposed DLL is addressed in Section III. Circuit implementation is described in Section IV. Measurement results are given in Section V. Finally, the conclusions are given in Section VI.

### II. COMPARISONS BETWEEN THE CONVENTIONAL AND PROPOSED DIGITAL DLL

Manuscript received April 1, 2004; revised October 9, 2004. This work was supported in part by MediaTek Inc. and the MediaTek Fellowship.

Fig. 1. Block diagram of the conventional digital DLL.

Fig. 1 shows a conventional digital DLL. It consists of three major blocks: a phase detector, a controller, and a digitally controlled delay unit, called the hierarchical delay unit (HDU). Generally, the HDU can be divided into two parts: the coarse delay unit (CDU) and the fine delay unit (FDU). The CDU and the FDU have N and M control bits, respectively. Since the FDU usually interpolates 1 LSB (= td) delay time of the CDU to generate a finer delay, the minimum timing resolution (= $\Delta t$) of the FDU could be expressed as

$$\Delta t = \frac{td}{2^M}.\tag{1}$$

When the DLL locks and the delay time equals one clock period, the maximum operating frequency range of the conventional digital DLL could be expressed as

$$T_{l\_c} \le T_{clk} \le (2^N 2^M - 1) * \Delta t + T_{l\_c} \tag{2}$$

where $T_{clk}$ is the period of the input clock and $T_{l\_c}$ is the intrinsic delay of the conventional digital DLL when all control bits are low. The intrinsic delay is mainly contributed by the multiplexers, phase selector, and phase interpolators. As indicated by (1) and (2), the operating frequency range trades off against the hardware complexity and the timing resolution. One may increase either N or td to extend the operating frequency range. However, the former will increase the hardware complexity and the latter will degrade the timing resolution.
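The tradeoff stated by (1) and (2) can be checked numerically. The following is a minimal sketch, not part of the original design flow. The function names are hypothetical, and the values td = 224 ps, N = 3, M = 3, and an intrinsic delay of 1 ns are assumed purely for illustration.

```python
# Sketch of Eqs. (1) and (2) for a conventional digital DLL.
# All numeric inputs are illustrative assumptions, in picoseconds.

def fine_resolution(td, m_bits):
    """Eq. (1): FDU timing resolution when it interpolates one CDU step td."""
    return td / (2 ** m_bits)


def conventional_period_range(td, n_bits, m_bits, t_intrinsic):
    """Eq. (2): (shortest, longest) clock period the delay line can match."""
    dt = fine_resolution(td, m_bits)
    longest = (2 ** n_bits * 2 ** m_bits - 1) * dt + t_intrinsic
    return t_intrinsic, longest


# Baseline, one more coarse bit (more hardware), and a doubled td (coarser step).
baseline = conventional_period_range(224.0, 3, 3, 1000.0)
more_bits = conventional_period_range(224.0, 4, 3, 1000.0)
coarser_td = conventional_period_range(448.0, 3, 3, 1000.0)
print("baseline  :", baseline, "resolution", fine_resolution(224.0, 3))
print("N + 1     :", more_bits, "resolution", fine_resolution(224.0, 3))
print("td doubled:", coarser_td, "resolution", fine_resolution(448.0, 3))
```

Under these assumed numbers, either option roughly doubles the variable part of the range: the extra bit leaves the resolution unchanged at the cost of more delay stages, while the larger td doubles the resolution step.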

Fig. 2. Simplified architecture of the proposed DLL.

The simplified architecture of the proposed DLL is shown in Fig. 2. It is composed of the CCDU, the two-step SAR controller, and the delay line with a CDU and an FDU. The CDU and the FDU have L and M control bits, respectively. The delay units used in the CCDU and the CDU are similar except that the gate length is slightly different. If the initial phase error between Ref and $V_{cdl}$ is large, the input clock can reuse delay units in the CCDU to increase the delay time. The maximum number of cycles that the input clock can circulate in the CCDU is $2^C - 1$, where C is the number of control bits of the programmable counter used in the two-step SAR controller. After the optimum number of cycles of the CCDU is determined, the phase error between Ref and $V_{cdl}$ is reduced to within one cycle delay of the CCDU. If the total variable delay of the HDU is larger than one cycle delay of the CCDU, the residual phase error can be compensated to within 1 LSB delay time of the FDU. Thus, the proposed DLL could achieve the same timing resolution, $\Delta t$, as the conventional digital DLL. Similarly, the maximum operating frequency range of the proposed one could be given as

$$T_{l\_p} \le T_{clk} \le 2^C * (2^L 2^M - 1) * \Delta t + T_{l\_p} \tag{3}$$

where $T_{l\_p}$ is the intrinsic delay of the proposed DLL. If $T_{l\_c}$ and $T_{l\_p}$ are slightly different and relatively small, the ratio of the maximum operating frequency range of the proposed DLL to that of the conventional digital one could be approximated as

$$\frac{T_{l\_c}}{T_{l\_p}} * \frac{2^C * (2^L 2^M - 1) * \Delta t + T_{l\_p}}{(2^N 2^M - 1) * \Delta t + T_{l\_c}} \approx \frac{2^C 2^L}{2^N}. \tag{4}$$

From (4), the conventional digital DLL should cascade  $2^{N} (= 2^{C}2^{L})$  delay units in the CDU in order to accomplish the same operating frequency range of the proposed one at a given timing resolution. For example, when C = 8 and L = 3, the conventional digital DLL requires 2048 delay units in the CDU as shown in Fig. 1. However, the total number of the delay units in the proposed DLL is only 32, which will be explained in Section IV. Since the hardware complexity of the controller is proportional to the operating frequency range, the overall hardware complexity of the proposed DLL could be significantly reduced compared to the conventional digital one at a given operating frequency range and timing resolution. From another view of (4), if L = N, the proposed DLL could improve the operating frequency range by a factor of  $2^{C}$  without decreasing the timing resolution.
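The hardware comparison drawn from (4) is simple arithmetic, sketched below. The function names are hypothetical; the values C = 8 and L = 3 and the count of 32 delay units come from the text.

```python
# Sketch of the comparison implied by Eq. (4). Function names are illustrative.

def approximate_range_gain(c_bits, l_bits, n_bits):
    """Right-hand side of Eq. (4): 2^C * 2^L / 2^N."""
    return 2 ** (c_bits + l_bits) / 2 ** n_bits


def conventional_cdu_units_needed(c_bits, l_bits):
    """CDU stages (2^N with N = C + L) needed to match the proposed range."""
    return 2 ** (c_bits + l_bits)


PROPOSED_DELAY_UNITS = 32  # (6 + 10) * 2, as stated in Section IV

units = conventional_cdu_units_needed(8, 3)
print("conventional CDU units:", units)
print("unit-count ratio      :", units // PROPOSED_DELAY_UNITS)
print("range gain when L = N :", approximate_range_gain(8, 3, 3))
```

The gain figure is only the approximation on the right-hand side of (4), which holds when the intrinsic delays are small compared with the variable delay.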

Since conventional digital DLLs utilize a sequential search scheme to find the optimal value of the control bits, the lock time is usually longer than 100 clock cycles [10], [13], [14]. Furthermore, the sequential search scheme is not suitable for the wide-range DLL because, in the worst case, the initial delay time between the input and output clocks will be significantly larger or smaller than the period of the input clock, resulting in a long lock time. Conversely, the proposed DLL, adopting a binary search scheme, locks to the input clock within 32 clock cycles regardless of the input frequency.

### III. ARCHITECTURE OF THE PROPOSED DLL

Fig. 3. All-digital cycle-controlled DLL.

Fig. 4. Locking procedure of the proposed all-digital DLL.

Fig. 3 shows the architecture of the proposed DLL, composed of two CCDUs, an edge combiner, two HDUs, a two-step SAR controller, and a phase blender [15]. In this design, the number of control bits for the programmable counter in the two-step SAR controller and for the HDU is set to 8 and 6, respectively. In other words

$$C = 8 \quad \text{and} \quad F = L + M = 3 + 3 = 6. \tag{5}$$

In order to generate an output with 50% duty cycle, a dual structure, as reported in [13]–[15], is adopted and a pair of complementary inputs, Ref+ and Ref-, with a phase shift of 180 degrees is required. If the inputs are differential rather than complementary, the DLL merely preserves the input duty cycle. The locking procedure is illustrated in Fig. 4. After Start goes high, the initialization takes two cycles. In the cycle-tuning mode, the two-step SAR controller blocks Ref- from entering the DLL for power saving and keeps F[5:0] constant. The controller adjusts the value of C[7:0] according to the phase relationship between Ref+ and the feedback clock, $V_{cdl\_b}$+, once every two clock cycles [16], [17]. If the initial delay time between Ref+ and $V_{cdl\_b}$+ is smaller than the period of the input clock, the proposed DLL can reuse delay units in the CCDU to increase the delay time. After 16 cycles, the phase error between Ref+ and $V_{cdl\_b}$+ is reduced to within one cycle delay of the CCDU and Stop goes high. The locking procedure then moves to the switching mode, which takes two clock cycles. Meanwhile, the controller lets Ref- enter the DLL to generate an output with 50% duty cycle. In the phase-adjusting mode, the DLL aligns the phase of $V_{cdl}$ to Ref+ within 1 LSB delay time of the FDU by adapting the value of F[5:0] [16], [17]. Since the phase adjustment requires 12 cycles, the total lock time of the proposed DLL is equal to 32 (= 2 + 16 + 2 + 12) cycles regardless of the input frequency. After Lock goes high, C[7:0] and F[5:0] are fixed. The assertion of Lock thus freezes the delay setting, so that the DLL becomes sensitive to supply and temperature variations. However, such an open-loop design obviates the need for a dither or lock detector, and hence lowers the power dissipation. $V_{cdl\_b}$+ is generated by blending $V_{cdl\_f}$+ with itself to match the delay time from $V_{cdl\_f}$+ to $V_{cdl}$ [10], [15].
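The 32-cycle budget follows directly from the bit widths and the two-cycle decision interval. The sketch below reproduces that arithmetic. The function names are hypothetical, and the conversion to seconds is added only for illustration.

```python
# Sketch of the lock-time budget described in the text: 2 + 16 + 2 + 12 cycles.

def lock_time_cycles(c_bits=8, f_bits=6, cycles_per_decision=2,
                     init_cycles=2, switch_cycles=2):
    """Total lock time in reference clock cycles."""
    cycle_tuning = c_bits * cycles_per_decision    # one decision per bit of C[7:0]
    phase_adjust = f_bits * cycles_per_decision    # one decision per bit of F[5:0]
    return init_cycles + cycle_tuning + switch_cycles + phase_adjust


def lock_time_seconds(f_clk_hz, **kwargs):
    """Lock time in seconds; the cycle count is fixed, so it scales with 1/f."""
    return lock_time_cycles(**kwargs) / f_clk_hz


print(lock_time_cycles())            # cycles, independent of input frequency
print(lock_time_seconds(2e6))        # at the lowest measured frequency
print(lock_time_seconds(700e6))      # at the highest measured frequency
```

Because the cycle count is fixed, the absolute lock time is longest at the low end of the frequency range.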

If the total variable delay of the HDU is larger than one cycle delay of the CCDU, the proposed DLL can, in theory, improve the operating frequency range by a factor of $2^{C}$ without degrading the timing resolution. Moreover, the proposed DLL can provide an output, $V_{cdl}$, with a duty cycle of 50%. At the slow–slow process corner with a temperature of 100 °C, the simulated results indicate that $T_{l\_p}$ and $\Delta t$ are around 1 ns and 28 ps, respectively. According to (3), the operating frequency range of the proposed DLL can be estimated to be from 2.21 MHz to 1 GHz.
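The 2.21 MHz to 1 GHz estimate can be reproduced by substituting the simulated values into (3). The following is a small sketch with hypothetical function names, using C = 8, L = M = 3, a resolution of 28 ps, and an intrinsic delay of 1 ns from the text.

```python
# Sketch of the range estimate from Eq. (3); times are in picoseconds.

def proposed_period_range_ps(c_bits, l_bits, m_bits, dt_ps, t_intrinsic_ps):
    """Eq. (3): (shortest, longest) lockable clock period of the proposed DLL."""
    longest = (2 ** c_bits * (2 ** l_bits * 2 ** m_bits - 1) * dt_ps
               + t_intrinsic_ps)
    return t_intrinsic_ps, longest


def period_ps_to_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in megahertz."""
    return 1e6 / period_ps


shortest, longest = proposed_period_range_ps(8, 3, 3, 28, 1000)
print("lowest frequency  (MHz):", round(period_ps_to_mhz(longest), 2))
print("highest frequency (MHz):", period_ps_to_mhz(shortest))
```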

### IV. CIRCUIT IMPLEMENTATION

#### A. Cycle-Controlled Delay Unit (CCDU)

Fig. 5. (a) Block diagram of the CCDUs. (b) Schematic of the CCDU.

Fig. 6. Timing diagram of the CCDU.

Fig. 5 shows the schematic of the CCDU. Each CCDU is composed of two multiplexers, 13 inverters, and one DFF. The basic operation principle is similar to [18] and [19]. The timing diagram is illustrated in Fig. 6. Control±, generated by the two-step SAR controller, controls the multiplexers to select either Ref± or the delayed signals, Out±, as the input of the delay line. Once the signals have propagated through the CCDU, trig± is triggered to make the programmable counter count upward. Control± stays high until the output of the programmable counter is equal to that of the 8-bit SAR register, which will be explained in the next subsection. As shown in Fig. 6, since Td2 is larger than Td1, the delay time between Ref+ and $V_{cdl\_c}$+ can be digitally adjusted. This allows the input clock to circulate in the CCDU according to different input frequencies. The delay time between Ref+ and $V_{cdl\_c}$+ can be expressed as

$$t = k * Td2 \quad (k = 0 \sim 2^C - 1) \tag{6}$$

where k is the number of times that the CCDU is reused. The delay time of the CCDU can be increased by reusing the delay units rather than cascading extra delay units.
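Equation (6) can be written as a one-line model with a range check on k. The sketch below is illustrative only: the function name is hypothetical and the loop delay Td2 = 1700 ps is an assumed value, since the text does not state it.

```python
# Sketch of Eq. (6): CCDU delay as a function of the reuse count k.
# td2 = 1700 ps is an assumed illustrative value, not a figure from the paper.

def ccdu_delay(k, td2, c_bits=8):
    """Delay contributed by k circulations of the CCDU, per Eq. (6)."""
    if not 0 <= k <= 2 ** c_bits - 1:
        raise ValueError("k must lie in 0 .. 2^C - 1")
    return k * td2


print(ccdu_delay(0, 1700))      # k = 0: no circulation
print(ccdu_delay(5, 1700))      # five circulations of the same delay units
print(ccdu_delay(255, 1700))    # largest reuse count for an 8-bit counter
```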

The undesired duty-cycle distortion of the delay line may cause Out± to disappear in low-frequency operation due to mismatches between the driving capabilities of the NMOS and PMOS devices in the CCDU. To resolve the problem, an edge-triggered DFF is placed in front of the delay line so that it responds only to input edges. When the input frequency is high enough to make pass go high, Ref+ will directly bypass the CCDU. In this case, the optimal number of cycles of the CCDU is 0. The operation of the other CCDU is the same except that Stop blocks Ref- from entering the CCDU for power saving in the cycle-tuning mode.

#### B. Hierarchical Delay Unit (HDU)

Assuming the total variable delay of the HDU is larger than one cycle delay of the CCDU, the HDU can further compensate the residual phase error between Ref+ and $V_{cdl}$ to within 1 LSB delay time of the FDU. A fully digital HDU has been implemented by adopting the inverter-based structure of the digital phase selector, the digital interpolator, and the 10-stage delay line, as shown in Fig. 7. Dummy delay units are inserted in the first and last stages of the delay line to match the loading of each delay stage. Thus, the total number of delay units in the proposed DLL is only $32 (= (6 + 10) \times 2)$.

The digital phase selector will choose two adjacent phases from p1-p9 according to the value of F[5:3]. Using the structure of the two-level inverter stage and the 3-bit binary-to-1-of-8 decoder reduces the latency of the digital phase selector as compared to the tree structure [20]. The first-level inverter stage is composed of eight equal-sized parallel sets. Similarly, dummy devices are added to compensate the unbalanced parasitics in the digital phase selector.

The digital phase interpolator will adjust the output phase of $V_{cdl\_f}$+ by mixing s1 and s2, with a weight factor of $w$ $(0 \le w < 1)$ determined from the value of F[2:0]. Normally-on and normally-off devices could be added in the upper and lower parts of the digital phase interpolator, respectively, to improve the linearity of the characteristic of the HDU. At the slow–slow process corner with a temperature of 100 °C, the simulated characteristic of the HDU is shown in Fig. 8. Without the extra devices, for example, when the value of F[5:0] is switched from 000111 to 001000, the increased delay time of the HDU is relatively small, resulting in a larger jump in the next phase adjustment.

Fig. 7. (a) Block diagram of the HDU. (b) Timing diagram of the HDU. (c) Schematic of the digital phase selector. (d) Schematic of the digital phase interpolator.



Fig. 8. Simulated characteristic of the HDU at the slow–slow process corner with a temperature of 100 °C.

The timing resolution of the HDU is less than 28 ps and the total variable delay of the HDU is larger than one cycle delay of the CCDU.
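An idealized behavioral model helps show how F[5:0] maps to delay. The sketch below is an assumption-laden illustration rather than the implemented circuit: it assumes perfectly linear interpolation with w = F[2:0]/8, an assumed coarse step td = 224 ps (so that one LSB is 28 ps), and a hypothetical function name.

```python
# Idealized HDU model: F[5:3] picks the coarse phase pair, F[2:0] sets the
# interpolation weight. Linear mixing and td = 224 ps are assumptions.

def hdu_variable_delay(code, td):
    """Variable delay for a 6-bit code, assuming ideal linear interpolation."""
    if not 0 <= code <= 0b111111:
        raise ValueError("code must be a 6-bit value")
    coarse = (code >> 3) & 0b111   # F[5:3]: selects adjacent phases s1 and s2
    fine = code & 0b111            # F[2:0]: interpolation setting
    w = fine / 8                   # assumed weight, 0 <= w < 1
    return (coarse + w) * td


TD = 224.0
before = hdu_variable_delay(0b000111, TD)
after = hdu_variable_delay(0b001000, TD)
print(before, after, after - before)   # the ideal step at the coarse boundary
```

In this ideal model the step across the 000111 to 001000 boundary equals one LSB; the text notes that the real circuit needs the extra devices to approach this behavior.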

#### C. Two-Step SAR Controller

Adopting a binary search scheme, the two-step SAR controller can provide a fixed lock time of 32 clock cycles over the entire operating frequency range. It is composed of two initialization circuits [8], two phase detectors, 8-bit and 6-bit SAR registers [16], [17], two 8-bit programmable counters, and two control logic blocks, as shown in Fig. 9. The output of the 8-bit programmable counter, P1[7:0], triggered by trig+, is compared with that of the 8-bit SAR register, C[7:0], to decide the state of Control+. The optimal number of cycles of the CCDU is determined sequentially, one bit per adjustment, starting from the MSB of C[7:0] in the cycle-tuning mode. For example, in the first adjustment, C[7:0] and F[5:0] are set to their initial values, as shown in Fig. 9. After the input clock circulates 128 times in the CCDU, the counter reaches its final value and Control+ selects Out+ as the output of the CCDU, $V_{cdl\_c}$+. If the period of Ref+ is larger than the delay of the DLL, the rising edge of $V_{cdl\_b}$+ will lead that of Ref+ and the MSB of C[7:0] will remain 1, and vice versa. Since the duty cycle of $V_{cdl\_b}$+ is not 50% and the initial delay time between Ref+ and $V_{cdl\_b}$+ may be outside the normal operating range [7] of a simple phase detector, the initialization circuit [8] and the edge-triggered phase detector are required. Fig. 10 shows the schematic and the timing diagram of the edge-triggered phase detector, PD\_C. In order to allow enough response time for the CCDU, the input frequency is divided by two. A DFF is placed in the signal path of $V_{cdl\_b}$+ to match the clock-to-Q delay. Some dummy devices, which are not shown in Fig. 9, are added to balance the loading of each node in the controller.

Fig. 9. (a) Block diagram of the two-step SAR controller. (b) Example of the timing diagram of the two-step SAR controller.

Fig. 10. (a) Schematic of the phase detector (PD\_C). (b) Timing diagram of the phase detector (PD\_C).

After Stop goes high, the controller will enter the switching mode and the phase-adjusting mode sequentially. In the phase-adjusting mode, the DLL will align the phase of $V_{cdl}$ to Ref+ within 1 LSB delay time of the FDU by changing the value of F[5:0] once every two clock cycles. Thus, the two-step SAR controller ensures that the proposed DLL locks to the input clock within 32 clock cycles regardless of the input frequency.
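The two-step binary search can be illustrated with a behavioral sketch. It is a simplified model under stated assumptions, not the implemented controller: the delay is taken as an ideal sum of intrinsic, CCDU, and HDU terms; F[5:0] is assumed to be held at zero during cycle tuning; the phase detector is reduced to a comparison of delay against clock period; and the names `sar_search` and `two_step_lock` and the loop delay of 1700 ps are hypothetical.

```python
# Behavioral sketch of the two-step SAR lock. All times in picoseconds.
# Assumptions: ideal additive delays, F held at 0 during cycle tuning,
# td2 = 1700 ps chosen so that the HDU range (63 * 28 ps) exceeds it.

def sar_search(n_bits, fits):
    """Binary search from the MSB: keep a trial bit only if the code still fits."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if fits(trial):        # models "feedback edge leads the reference edge"
            code = trial
    return code


def two_step_lock(t_clk, t_intrinsic, td2, dt, c_bits=8, f_bits=6):
    """Return (C code, F code, residual phase error, lock cycles)."""
    def delay(k, f):
        return t_intrinsic + k * td2 + f * dt

    k = sar_search(c_bits, lambda c: delay(c, 0) <= t_clk)    # cycle-tuning mode
    f = sar_search(f_bits, lambda fc: delay(k, fc) <= t_clk)  # phase-adjusting mode
    cycles = 2 + 2 * c_bits + 2 + 2 * f_bits                  # init + tune + switch + adjust
    return k, f, t_clk - delay(k, f), cycles


print(two_step_lock(t_clk=10000, t_intrinsic=1000, td2=1700, dt=28))
```

With these assumed values the residual error stays below one LSB because the HDU range covers one full CCDU cycle, which is the condition stated in the text.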



Fig. 11. (a) Block diagram of the edge combiner. (b) Schematic of the edge combiner cell. (c) Timing diagram of the edge combiner cell.

#### D. Edge Combiner

For the convenience of subsequent signal processing, the duty cycle of $V_{cdl\_c}$+ and $V_{cdl\_c}$- should be recovered to nearly 50%. The schematic and timing diagram of the edge combiner are depicted in Fig. 11. In the cycle-tuning mode, only the upper counterpart is activated to save power. Fig. 12 illustrates how the proposed DLL can generate an output with 50% duty cycle. After Stop goes high, Ref- is allowed to enter the DLL. The edge combiner will start to recover the duty cycle of $V_{cdl}$ to nearly 50%. The rising edge of $V_{cdl\_e}$+ is set by that of $V_{cdl\_c}$+, while its falling edge is triggered by the rising edge of $V_{cdl\_c}$-. The operation of the other counterpart is similar but opposite. Assume that both paths consisting of CCDUs, the edge-combiner cells, CDUs, and FDUs are matched. Then $V_{cdl\_f}$+ and $V_{cdl\_f}$- will be complementary clocks. Together with the phase blender [15], an output with 50% duty cycle is thus generated. If the signal pass is high, $V_{cdl\_c}$+ and $V_{cdl\_c}$- will directly bypass the edge combiner. The duty cycle of $V_{cdl\_c}$+ and $V_{cdl\_c}$- deviates slightly from 50% because of the unmatched propagation time between the set and reset paths, as shown in Fig. 11. However, this error could be corrected by using the phase blender [15].
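The effect of a set/reset path mismatch on the recovered duty cycle can be sketched with simple edge arithmetic. The model below is an illustrative assumption: the complementary inputs are taken to be exactly half a period apart, each path adds a fixed propagation delay, and the function name and numeric values are hypothetical.

```python
# Sketch of the edge-combiner duty cycle. Times share one unit (e.g., ps).
# Assumes the two rising edges arrive exactly half a period apart.

def combined_duty_cycle(t_clk, set_delay, reset_delay):
    """Duty cycle of the combined clock given set-path and reset-path delays."""
    rise = set_delay                    # output rises after the set path
    fall = t_clk / 2 + reset_delay      # output falls after the reset path
    high_time = (fall - rise) % t_clk
    return high_time / t_clk


print(combined_duty_cycle(10000, 300, 300))   # matched paths
print(combined_duty_cycle(10000, 300, 400))   # reset path 100 units slower
```

For a fixed mismatch the deviation from 50% grows as the clock period shrinks, which is why the residual error is left to the phase blender.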



Fig. 12. Timing diagram of generating a 50% duty cycle output.



Fig. 13. Microphotograph of the chip.

### V. EXPERIMENTAL RESULTS

Fig. 13 shows the die microphotograph. The experimental prototype has been fabricated in a 0.18-$\mu$m one-poly six-metal (1P6M) CMOS technology and occupies a chip area of $1.5 \times 1.5$ mm<sup>2</sup> including I/O interfaces. The active area is $0.8 \times 1.1$ mm<sup>2</sup>. As indicated by the measured results, the proposed DLL can operate from 2 to 700 MHz at a supply voltage of 1.8 V. Figs. 14 and 15 show the DLL in the locked state when the input clock is 2 and 700 MHz, respectively. The highest operating frequency is limited by the circuit operating speed, while the lowest operating frequency is restricted by the maximum number of cycles that the input clock can circulate in the CCDU. Fig. 16 shows the locking procedure at 100 MHz, measured by the Agilent Logic Analysis System 16702B. After Start goes high, the DLL begins to lock to the input clock by sequentially adjusting the values of C[7:0] and F[5:0] and achieves the locked state within 32 clock cycles. Figs. 17 and 18 show the measured jitter histograms when the DLL operates at 2 MHz and 700 MHz, respectively. When the input frequency is 700 MHz, the measured rms and peak-to-peak jitter are 2.02 ps and 17.6 ps, respectively. Fig. 19 shows the measured jitter and the duty cycle of the output clock over different operating frequencies. From Fig. 19, the proposed DLL exhibits jitter performance comparable to that of the conventional analog DLL [8]. The duty cycle of the output clock varies by less than 2%. Since the number of cycles in the CCDU is proportional to the period of the input clock, the clock propagation path will be longer in low-frequency operation. Thus, the absolute jitter at low frequencies is worse than that at high frequencies.

Fig. 14. Locked state of the DLL at 2 MHz.

Fig. 15. Locked state of the DLL at 700 MHz.

Fig. 16. Locking process of the DLL at 100 MHz.




Fig. 17. Measured jitter histogram when the DLL operates at 2 MHz.



Fig. 18. Measured jitter histogram when the DLL operates at 700 MHz.



Fig. 19. Measured jitters and the duty cycle of output clock.

However, the jitter normalized to the period of the input clock is relatively small. Table I gives the performance summary of the proposed DLL.
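Normalizing jitter to the clock period is a one-line calculation, sketched here for the 700 MHz measurement quoted above. The function name is hypothetical, and only the 700 MHz figures are used because the text does not give numeric jitter values at other frequencies.

```python
# Sketch: jitter expressed as a fraction of the input clock period.

def normalized_jitter(jitter_ps, f_clk_mhz):
    """Jitter divided by the clock period (period in ps = 1e6 / f in MHz)."""
    period_ps = 1e6 / f_clk_mhz
    return jitter_ps / period_ps


pk_pk = normalized_jitter(17.6, 700.0)   # measured peak-to-peak jitter at 700 MHz
rms = normalized_jitter(2.02, 700.0)     # measured rms jitter at 700 MHz
print(f"peak-to-peak: {100 * pk_pk:.3f}% of the period")
print(f"rms         : {100 * rms:.3f}% of the period")
```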

TABLE I Performance Summary

|                          | JSSC99 [10]           | ISSCC00 [11]             | ISSCC01 [12]                   | VLSI02 [13]                     | VLSI03 [14]      | This work                       |
|--------------------------|-----------------------|--------------------------|--------------------------------|---------------------------------|------------------|---------------------------------|
| Process                  | 0.4-µm<br>CMOS        | 0.15-µm<br>CMOS          | 0.17-µm<br>CMOS                | 0.13-µm<br>CMOS                 | 0.13-µm<br>CMOS  | 0.18-µm<br>CMOS                 |
| Timing resolution        | 40 ps<br>(simulated)  | 10 ps                    | X                              | 14 ps                           | X                | <28 ps<br>(simulated)           |
| Max. operating frequency | >667 MHz              | 1 GHz                    | 250 MHz                        | 500 MHz                         | 500 MHz          | 700 MHz<br>@ 1.8 V              |
| Min. operating frequency | 250 MHz               | X                        | 100 MHz                        | 66 MHz                          | 66 MHz           | 2 MHz<br>@ 1.8 V                |
| Peak-to-peak jitter      | <250 ps               | 128 ps /<br>29 ps (quiet) | 640 ps /<br>200 ps (quiet)     | X                               | <25 ps (quiet)   | 17.6 ps (quiet)<br>@ 700 MHz    |
| Lock time                | 2.9 µs<br>(simulated) | X                        | X                              | >100 cycles                     | <150 cycles      | 32 cycles                       |
| VDD                      | 1.7~3.3 V             | X                        | 2.1 V                          | >0.8 V                          | >1.8 V           | 1.4~2.5 V                       |
| 50% duty cycle output    | yes                   | X                        | yes                            | yes                             | yes              | yes<br>(50±2%)                  |
| Power                    | 340 mW<br>@ 400 MHz   | X                        | 27 mW /<br>0.84 mW (standby)   | 29 mW /<br><0.1 mW (standby)    | 24 mW            | 23 mW                           |

X: not mentioned.

### VI. CONCLUSION

A wide-range and fast-locking all-digital DLL is presented in this paper. The CCDU enlarges the operating frequency range of the proposed DLL by a factor of $2^C$ without degrading the timing resolution. The two-step SAR controller ensures that the DLL locks to the input clock within 32 clock cycles regardless of the input frequency. The DLL operates in open-loop fashion once lock occurs in order to achieve low-jitter operation with small area and low power dissipation. Since the DLL will not track temperature or supply variations once it is in lock, it is best suited for burst-mode operation. Given a reference input with 50% duty cycle, the DLL generates an output clock with a duty cycle of nearly 50% over the entire operating frequency range and achieves an acceptable jitter performance as compared to a conventional analog DLL [8]. The proposed all-digital DLL is suitable for advanced deep-submicron technologies. If more advanced technologies were used, the performance of the DLL, such as the operating frequency range and jitter, could be improved with little design effort. The power consumption and the total die area would be reduced as well.

### ACKNOWLEDGMENT

The authors would like to thank the National Chip Implementation Center, Taiwan, for chip implementation.

### REFERENCES

- [1] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. New York: IEEE Press, 1996.
- [2] F. M. Gardner, "Charge-pump phase-lock loops," *IEEE Trans. Commun.*, vol. COM-28, no. 11, pp. 1849–1858, Nov. 1980.
- [3] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York: McGraw-Hill, 1998.
- [4] R. L. Aguitar and D. M. Santos, "Multiple target clock distribution with arbitrary delay interconnects," *Electron. Lett.*, vol. 34, pp. 2119–2120, Oct. 1998.
- [5] R. B. Watson Jr. and R. B. Iknaian, "Clock buffer chip with multiple target automatic skew compensation," *IEEE J. Solid-State Circuits*, vol. 30, no. 11, pp. 1267–1276, Nov. 1995.

- [6] C. H. Kim, J. H. Lee, J. B. Lee, B. S. Kim, C. S. Park, S. B. Lee, S. Y. Lee, C. W. Park, J. G. Roh, H. S. Nam, D. Y. Kim, D. Y. Lee, T. S. Jung, H. Yoon, and S. I. Cho, "A 64-Mbit, 640-Mbyte/s bidirectional data strobed, double-data-rate SDRAM with a 40-mW DLL for a 256-Mbyte memory system," *IEEE J. Solid-State Circuits*, vol. 33, no. 11, pp. 1703–1710, Nov. 1998.
- [7] Y. Moon, J. Choi, K. Lee, D. K. Jeong, and M. K. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 377–384, Mar. 2000.
- [8] H.-H. Chang, J.-W. Lin, C.-Y. Yang, and S.-I. Liu, "A wide-range delay-locked loop with a fixed latency of one clock cycle," *IEEE J. Solid-State Circuits*, vol. 37, no. 8, pp. 1021–1027, Aug. 2002.
- [9] M.-J. E. Lee, W. J. Dally, T. Greer, H.-T. Ng, R. Farjad-Rad, J. Poulton, and R. Senthinathan, "Jitter transfer characteristics of delay-locked loops-theories and design techniques," *IEEE J. Solid-State Circuits*, vol. 38, no. 4, pp. 614–621, Apr. 2003.
- [10] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 632–644, May 1999.
- [11] K. Minami, M. Mizuno, H. Yamaguchi, T. Nakano, Y. Matsushima, Y. Sumi, T. Sato, H. Yamashida, and M. Yamashina, "A 1 GHz portable digital delay-locked loop with infinite phase capture ranges," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2000, pp. 350–351.
- [12] J.-B. Lee, K.-H. Kim, C. Yoo, S. Lee, O.-G. Na, C.-Y. Lee, H.-Y. Song, J.-S. Lee, Z.-H. Lee, K.-W. Yeom, H.-J. Chung, I.-W. Seo, M.-S. Chae, Y.-H. Choi, and S.-I. Cho, "Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin × 16 DDR SDRAM," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, vol. 431, Feb. 2001, pp. 68–69.
- [13] T. Matano, Y. Takai, T. Takahashi, Y. Sakito, I. Fujii, Y. Takaishi, H. Fujisawa, S. Kubouchi, S. Narui, K. Arai, M. Morino, M. Nakamura, S. Miyatake, T. Sekiguchi, and K. Koyama, "A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer," *IEEE J. Solid-State Circuits*, vol. 38, no. 5, pp. 762–768, May 2003.
- [14] J.-T. Kwak, C.-K. Kwon, K.-W. Kim, S.-H. Lee, and J.-S. Kih, "Low cost high performance register-controlled digital DLL for 1 Gbps × 32 DDR SDRAM," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2003, pp. 283–284.
- [15] K. Nakamura, M. Fukaishi, Y. Hirota, Y. Nakazawa, and M. Yotsuyanagi, "A CMOS 50% duty cycle repeater using complementary phase blending," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2000, pp. 48–49.

- [16] A. Rossi and G. Fucili, "Nonredundant successive approximation register for A/D converters," *Electron. Lett.*, vol. 32, pp. 1055–1057, Jun. 1996.
- [17] G.-K. Dehng, J.-M. Hsu, C.-Y. Yang, and S.-I. Liu, "Clock-deskew buffer using a SAR-controlled delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 35, no. 8, pp. 1128–1136, Aug. 2000.
- [18] R. Farjad-Rad, W. Dally, N. Hiok-Tiaq, R. Senthinathan, M.-J. E. Lee, R. Rathi, and J. Poulton, "A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1804–1812, Dec. 2002.
- [19] A. Waizman, "A delay line loop for frequency synthesis of de-skewed clock," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 1994, pp. 298–299.
- [20] A. Momtaz, C. Jun, M. Caresosa, A. Hairapetian, D. Chung, K. Vakilian, M. Green, T. Wee-Guan, J. Keh-Chee, I. Fujimori, and C. Yijun, "A fully integrated SONET OC-48 transceiver in standard CMOS," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1964–1973, Dec. 2001.



**Hsiang-Hui Chang** (S'01) was born in Taipei, Taiwan, R.O.C., on February 4, 1975. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, in 1999, 2001, and 2004, respectively.

His research interests are PLL, DLL, and high-speed interfaces for gigabit transceivers.



**Shen-Iuan Liu** (S'88–M'93–SM'03) was born in Keelung, Taiwan, R.O.C., in 1965. He received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1987 and 1991, respectively.

During 1991–1993 he served as a Second Lieutenant in the Chinese Air Force. During 1991–1994, he was an Associate Professor in the Department of Electronic Engineering of National Taiwan Institute of Technology. He joined the Department of Electrical Engineering, NTU, in 1994, and he has been a Professor since 1998. His research interests are in analog and digital integrated circuits and systems.

Dr. Liu received the Engineering Paper Award from the Chinese Institute of Engineers in 2003, the Young Professor Teaching Award from MXIC Inc., the Research Achievement Award from NTU, and the Outstanding Research Award from National Science Council in 2004. He has served as Chair of the IEEE SSCS Taipei Chapter since 2004. He has served as General Chair of the 15th VLSI Design/CAD Symposium, Taiwan, 2004, and as Program Co-Chair of the Fourth IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, Japan, 2004.