The main features of the two circuits are summarised in Table 1. The offset error is reduced by a factor of 70, the linear gain error by a factor of 20, and the total harmonic distortion by 13 dB.

Conclusion: The comparative Table of results shows that a significant improvement in accuracy and linearity is obtained when the proposed circuit is used. These improvements come with drawbacks: the occupied die area and the power dissipation are both multiplied by three. Nevertheless, the proposed circuit is worthwhile for applications requiring a high-accuracy memory cell.

## © IEE 1997

25 July 1997
Electronics Letters Online No: 19971172
P. Riffaud, G. Tourneur, E. Garnier and P. Roux (Laboratoire d'Etudes de l'Integration des Composants et Systemes Electroniques, Université de Bordeaux I, 351 cours de la Libération, F-33405 Talence Cedex, France)
Corresponding author: E. Garnier

## References

1 TOUMAZOU, C., HUGHES, J.B., and PATTULO, D.M.: 'Regulated cascode switched-current memory cell', Electron. Lett., 1990, 26, (5), pp. 303-305
2 SONG, M., LEE, Y., and KIM, W.: 'A clock-feedthrough reduction circuit for switched-current systems', IEEE J. Solid-State Circuits, 1993, 28, (2), pp. 133-137
3 CHA, H.W., and WATANABE, K.: 'A clock-feedthrough and offset compensated fully-differential switched-current circuit', IEICE Trans. Fundam., 1995, E78-A, (11), pp. 1531-1533

## High-speed divide-by-4/5 counter for a dual-modulus prescaler

Ching-Yuan Yang, Guang-Kaai Dehng and Shen-Iuan Liu

Indexing terms: Dividing circuits, CMOS integrated circuits
A new high-speed divide-by-4/5 counter is developed. Based on this divide-by-4/5 counter, a 3 V, 2 MHz to 1.1 GHz dual-modulus divide-by-128/129 prescaler fabricated with $0.6\ \mu\mathrm{m}$ CMOS technology is presented. A maximum operating frequency of 1.11 GHz with a power consumption of 19.2 mW has been measured at a 3 V supply voltage. In addition, for a power supply of 1.5 V, the circuit consumed 2.67 mW at a maximum input frequency of 520 MHz.

Introduction: In modern communication systems, a frequency synthesiser with a high-frequency prescaler is an important building block. New techniques offering higher integration density, lower power consumption, and high-speed capability have been developed to achieve a high-speed CMOS prescaler. Some circuits, using advanced processes and/or special circuit techniques, have been proposed to realise high-speed dual-modulus prescalers [1]. Among them, the true single phase circuit technique [2] has resulted in many complex CMOS circuits operating at clock frequencies of several hundred MHz [3], and in some CMOS circuits operating at $>1$ GHz [4, 5] in the last few years. In this Letter, a new architecture of a dual-modulus prescaler is presented and fabricated with $0.6\ \mu\mathrm{m}$ CMOS technology. Experimental results are given to demonstrate its performance.

Circuit description: Most divide-by-128/129 dual-modulus prescalers [4, 5] consist of a synchronous divide-by-4/5 counter as the first (high-frequency) stage, followed by a chain of toggle flip-flops (TFFs), which forms an asynchronous divide-by-32 counter as the second (low-frequency) stage. The operating speed of such prescalers is mainly limited by that of the divide-by-4/5 counter. Unlike the conventional divide-by-4/5 architecture, our approach is to preprocess the clock signal and to cascade the divide-by-two stages, as shown in Fig. 1. The proposed circuit contains a clock preprocessor (CP) and two TFFs. The clock preprocessor consists of a 'half transparent' (HT) register [6] at the front and a domino CMOS logic gate [7] at the rear. The HT register in its register mode (with a '0' input) is extremely fast; only about one inverter delay is required. In its transparent mode (with a '1' input), the inverted data returns directly to the input of the precharged stage (becoming '0'), so that the output signal is delayed by one period of the input signal. If $MC$ is set to '0', then $MCx$ is always '1', and the domino gate acts as the buffer stage of a two-stage inverter, passing the signal directly to the next stage (TFF). The state of the HT register is not affected, since its input $CKx$ is the inverse of the clock signal 'in'. In this situation, the output frequency equals $f_{in}/4$, where $f_{in}$ is the frequency of the input signal 'in'.
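The division ratios of the complete prescaler follow from simple counting. The sketch below assumes the usual pulse-swallowing arrangement, which the text above does not spell out: in divide-by-129 mode the 4/5 counter divides by 5 exactly once per output cycle and by 4 for the remaining 31 cycles of the divide-by-32 chain. The function name and its parameters are illustrative only.

```python
def prescaler_ratio(swallow: bool, low_stage: int = 32) -> int:
    """Input cycles per output cycle for a 4/5 counter followed by a
    divide-by-`low_stage` chain.

    Assumption (not stated in the Letter): when `swallow` is True, the
    4/5 counter divides by 5 once per output cycle and by 4 otherwise.
    """
    if not swallow:
        return 4 * low_stage            # 32 cycles of divide-by-4
    return 4 * (low_stage - 1) + 5      # 31 cycles of divide-by-4, one of divide-by-5


if __name__ == "__main__":
    print(prescaler_ratio(False))  # 128
    print(prescaler_ratio(True))   # 129
```

Under this assumption, swallowing a single input pulse per output cycle is enough to move the overall ratio from 128 to 129.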


Fig. 1 Schematic diagram of circuit of divide-by-4/5 and corresponding timing diagram
(a) Circuit
(b) Timing diagram
(i) waveforms of divide-by-4 circuit ($MC = 0$)
(ii) waveforms of divide-by-5 circuit ($MC = 1$)


Fig. 2 Measured maximum operating frequency and power consumption of divide-by-128/129 circuit against power supply voltage
solid line: maximum operating frequency
dashed line: power consumption

If $MC$ is set to '1', the NAND gate forces $MCx$ to be '1' or '0', depending on the nodes $OUT^{*}2$ and $OUT$, and divide-by-5 operation is obtained. When the control signal $MCx$ is '1', the CP acts simply as a buffer and the output frequency equals one-fourth of the input frequency. However, when $MCx$ is '0' (i.e. the outputs of both TFFs are '1'), the $N$-logic block in the domino gate is forced to turn off. This causes the domino gate to hold the precharge state (i.e. $CKx$ is '1'), even though the signal 'in' changes to '1'. Meanwhile, node 'A' is changed to '0' through the $N$-$\mathrm{C}^{2}$MOS stage of the HT register. In the next half period of 'in' (which becomes '0'), node 'B' in the $P$-$\mathrm{C}^{2}$MOS stage of the HT register is precharged to '1'. At this time (node 'B' becoming '1'), the 'locked' state of the domino logic gate is not yet discharged; this happens only in the next half period of 'in' (which becomes '1'). This corresponds to an extra delay for $CKx$. Once $CKx$ changes state, after both node 'B' and 'in' have become '1', $OUT^{*}2$ changes state and $MCx$ forces the domino logic gate to act as a buffer, as before. This extra delay of one input period turns the divide-by-four cycle into a divide-by-five operation. The timing diagram of the whole function is shown in Fig. 1b.
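The behaviour described above can be mimicked with a purely behavioural model. This is a minimal sketch, not the transistor-level circuit: it assumes two ripple-connected TFFs in which the second toggles on the falling edge of the first, and it represents the clock preprocessor as a block that swallows exactly one input pulse whenever $MC$ is '1' and both TFF outputs are '1'. The function and variable names are hypothetical.

```python
def divide_4_5(n_input_cycles: int, mc: int) -> list[int]:
    """Return the counter output sampled after each input clock cycle."""
    q1 = q2 = 0      # outputs of the first and second TFF (assumed ripple-connected)
    held = False     # True once the CP has swallowed a pulse in the current '11' state
    samples = []
    for _ in range(n_input_cycles):
        mcx = 0 if (mc and q1 and q2) else 1   # NAND of MC and the two TFF outputs
        if mcx == 0 and not held:
            held = True                        # CP holds precharge: this pulse is swallowed
        else:
            held = False
            previous_q1 = q1
            q1 ^= 1                            # first TFF toggles on every passed pulse
            if previous_q1 == 1 and q1 == 0:
                q2 ^= 1                        # second TFF toggles on falling edge of q1
        samples.append(q2)
    return samples


def output_period(samples: list[int]) -> int:
    """Number of input cycles between the first two rising edges of the output."""
    rises = [i for i in range(1, len(samples))
             if samples[i - 1] == 0 and samples[i] == 1]
    return rises[1] - rises[0]


if __name__ == "__main__":
    print(output_period(divide_4_5(20, mc=0)))  # 4
    print(output_period(divide_4_5(20, mc=1)))  # 5
```

The model reproduces only the counting behaviour; the node-level timing of 'A', 'B' and $CKx$ is not represented.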

In the divide-by-4/5 counter, the operating frequency is limited by the CP stage and the first TFF, since they must operate at full speed. To date, the fastest standard CMOS DFFs are the dynamic circuits in [6-8]. They are based on the nine-transistor TSPC DFF of Yuan and Svensson [2].


Fig. 3 Measured waveforms of prescaler divided by 128 and 129
$f_{in} = 520$ MHz, $V_{DD} = 1.5$ V

Experimental results: To verify the performance of the high-speed prescaler described above, a divide-by-128/129 prescaler using the proposed divide-by-4/5 counter has been implemented in a $0.6\ \mu\mathrm{m}$ single-poly double-metal $N$-well CMOS process. The DFFs were carefully designed with the help of SPICE simulation. Fig. 2 shows the measured maximum clock frequency against the supply voltage for the divide-by-128/129 prescaler, as well as the corresponding power consumption. When the supply voltage is 3 V, the chip can operate at a clock rate from $\sim 2$ MHz to 1.11 GHz with a maximum power dissipation of 19.2 mW. The chip continues to operate down to a supply voltage of 1.5 V. This demonstrates the robustness of the TSPC circuits. Fig. 3 shows the input waveform at 520 MHz and the output waveform of this prescaler for a 1.5 V supply, with a power dissipation of 2.67 mW.
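The two measured operating points can be compared through the average energy drawn per input clock cycle, $E = P / f_{in}$. The short calculation below uses only the measured values quoted above; the helper name is illustrative, and the comparison with the ratio of squared supply voltages is a rough check against ideal $CV^2$ scaling, not a claim made in the measurements.

```python
def energy_per_cycle_pj(power_mw: float, freq_mhz: float) -> float:
    """Average supply energy per input clock cycle, in picojoules."""
    return (power_mw * 1e-3) / (freq_mhz * 1e6) * 1e12


if __name__ == "__main__":
    e_3v = energy_per_cycle_pj(19.2, 1110.0)    # 3 V, 1.11 GHz operating point
    e_1v5 = energy_per_cycle_pj(2.67, 520.0)    # 1.5 V, 520 MHz operating point
    print(f"{e_3v:.1f} pJ per cycle at 3 V")      # 17.3 pJ per cycle at 3 V
    print(f"{e_1v5:.2f} pJ per cycle at 1.5 V")   # 5.13 pJ per cycle at 1.5 V
    print(f"energy ratio {e_3v / e_1v5:.2f}, (3/1.5)^2 = {(3.0 / 1.5) ** 2:.0f}")
```

The energy per cycle falls by a factor of about 3.4 when the supply is halved, somewhat less than the factor of 4 that ideal $CV^2$ scaling would give.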

Conclusions: A new divide-by-4/5 circuit without pass gates is adopted for the high-speed prescaler. The experimental results of the prescaler have demonstrated its ability to operate up to 1.11 GHz with low power consumption.

Acknowledgment: The authors would like to thank the National Science Council for financial support and the Chip Implementation Center (CIC), National Science Council, Taiwan, Republic of China for fabrication of the test chip. This work was sponsored by NSC85-2622-E002-019.
© IEE 1997
11 August 1997
Electronics Letters Online No: 19971175
Ching-Yuan Yang, Guang-Kaai Dehng and Shen-Iuan Liu (Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10664, Republic of China)

## References

1 LARSSON, P.: 'High-speed architecture for a programmable frequency divider and a dual-modulus prescaler', IEEE J. Solid-State Circuits, 1996, 31, (5), pp. 744-748
2 YUAN, J., and SVENSSON, C.: 'High-speed CMOS circuit technique', IEEE J. Solid-State Circuits, 1989, 24, (1), pp. 62-70
3 LU, F., SAMUELI, H., YUAN, J., and SVENSSON, C.: 'A 700 MHz 24-b pipelined accumulator in $1.2\ \mu\mathrm{m}$ CMOS for application as numerically controlled oscillator', IEEE J. Solid-State Circuits, 1993, 28, (8), pp. 878-886
4 HUANG, Q., and ROGENMOSER, R.: 'Speed optimisation of edge-triggered CMOS circuits for gigahertz single-phase clocks', IEEE J. Solid-State Circuits, 1996, 31, (3), pp. 456-465
5 CHANG, B., PARK, J., and KIM, W.: 'A 1.2 GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flop', IEEE J. Solid-State Circuits, 1996, 31, (5), pp. 749-752
6 YUAN, J., and SVENSSON, C.: 'Fast CMOS nonbinary divider and counter', Electron. Lett., 1993, 29, pp. 1222-1223
7 KRAMBECK, R.H.: 'High-speed compact circuits with CMOS', IEEE J. Solid-State Circuits, 1982, SC-17, pp. 614-619

## Importance of including power supply noise in digital circuit simulations

K.A. Jenkins and P.-F. Lu


Indexing terms: Digital circuits, Power supply circuits
Rapid fluctuations of power supply values, or switching noise, can have a significant effect on VLSI circuit speed. This is shown by comparing circuit simulations with measurements of the critical path delay of a self-resetting SRAM. It is shown that including the measured high-frequency noise in the circuit simulation leads to very accurate prediction of circuit speed.


The effect of power supply voltage on the speed of digital CMOS circuits is well understood. Simulations of circuits carried out for the purpose of predicting speed must, of course, take the power supply values into account. It is also understood that the switching activity of the circuit results in power supply noise on-chip; that is, the levels are different from the external values, due to resistive and inductive losses in the package and on-chip wiring. While it is possible, in principle, to model the effect of switching noise, it is a largely intractable problem. Instead, significant effort is spent in designing packaging to reduce the noise. Although package design cannot eliminate noise, the aim is to reduce it to such a level that the circuit designer can assume an average value of the power supply with a negligible amount of noise.

With good device models, and careful package design to minimise switching noise, it can be expected that simulations of circuit performance should accurately predict the circuit speed. However, it is shown in this Letter that short-term power supply noise can cause significant departures from predictions which assume a constant, steady-state value of the power supply. If the on-chip voltages vary on a time scale comparable to the circuit cycle time, the switching speed will be affected. Although circuit designers have long been aware of the importance of power supply changes, there have been no previous reports of measurements of power supply and device performance on the same short time scale. Because of the difficulty of simulating power supplies, measurement after circuit fabrication can greatly assist designers in assessing the accuracy of their device models and in explaining performance discrepancies.
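As a rough illustration of why noise on the time scale of the cycle matters, the sketch below uses the alpha-power-law gate delay model, $t_d \propto V_{DD} / (V_{DD} - V_T)^{\alpha}$. The choice of this model, and every numerical value in the code (threshold voltage, exponent, supply levels, number of stages), are assumptions made for illustration only; none of them is taken from the measurements reported here.

```python
def gate_delay(vdd: float, vt: float = 0.5, alpha: float = 1.3, k: float = 1.0) -> float:
    """Alpha-power-law gate delay in arbitrary units (all parameters hypothetical)."""
    return k * vdd / (vdd - vt) ** alpha


def path_delay(supply_per_stage: list[float]) -> float:
    """Delay of a path whose stages each see their own instantaneous supply voltage."""
    return sum(gate_delay(v) for v in supply_per_stage)


if __name__ == "__main__":
    stages = 10
    nominal = [2.5] * stages    # simulation assuming a constant, steady-state supply
    drooped = [2.25] * stages   # same path switching during a hypothetical 10% droop
    t_nominal = path_delay(nominal)
    t_drooped = path_delay(drooped)
    print(f"path is slowed by {100 * (t_drooped / t_nominal - 1):.1f}% during the droop")
```

In this toy model, a path that switches during a brief droop is slower than the constant-supply prediction, even if the long-term average of the supply is unchanged.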

The effect of rapid power supply variations is illustrated here for the case of a 500 MHz self-resetting SRAM [1], for which the switching noise is potentially large, due to the large simultaneous switching activity in the circuit. The SRAM was built in a CMOS technology which was very well characterised [2], and for which highly accurate transistor models were developed. Furthermore, distinct parameters of the model were extracted for each chip of the wafers processed. By using the same die for measurement and simulation, an accurate simulation of the SRAM speed was expected.

Measurements of the SRAM speed were made by measuring waveforms of the internal nodes with a high-bandwidth electron-beam prober [3]. To minimise power supply noise, the duty cycle of the chip was very low during the measurements. Examples of some of the measured waveforms are shown in Fig. 1, in which a series of signals along the critical path of a 'read' of the SRAM,

