# XNOR-Based Double-Edge-Triggered Flip-Flop for Two-Phase Pipelines

Ying-Haw Shu, Member, IEEE, Shing Tenqchen, Senior Member, IEEE, Ming-Chang Sun, and Wu-Shiung Feng, Senior Member, IEEE

Abstract—The conventional approach to double-edge-triggered flip-flops (DET-FFs) uses two similar edge-triggered latches, so the higher speed comes at the cost of double the chip area and a more complex logic structure. By contrast, XNOR-based approaches have difficulty meeting the speed demand because of the delay of the XNOR-based clock generator. This paper proposes a new DET-FF based on an alternative XNOR gate. By exploiting its sensitivity to the driving capacity of the previous stage, we use this simplified XNOR gate as a pulse generator. A modified transparent latch following the pulse generator acts as an XNOR-based DET-FF, which in HSPICE simulation achieves almost the same speed as two conventional DET-FFs with less power dissipation. We also implemented the XNOR-based DET-FF in a two-phase pipeline system; HSPICE simulation in the TSMC 0.25-$\mu$m CMOS process shows that the proposed DET-FF is much faster than the two conventional DET-FFs in that system.

*Index Terms*—Double-edge triggered (DET), pipeline, two phase, XNOR.

# I. INTRODUCTION

IT is well known [1]–[3] that, over the past two decades, research into asynchronous circuits has shown that they can achieve better average-case performance than other circuit types by removing the unnecessary overhead of a globally synchronizing clock signal. The concept of globally asynchronous locally synchronous (GALS) circuits extends the application of asynchronous circuits to VLSI systems [1]. Furthermore, a wrapped asynchronous module with its own built-in power system and operating clock makes IP reuse easier and more reliable [2].

The better performance achieved by two-phase control systems in simple asynchronous modules is due to their two data captures per cycle of control-signal transitions. The direct practical application is to use a newly defined interface that makes a conventional transparent latch active on both the rising and falling edges of the control signal. Such an interface is needed because conventional transparent latches are level-sensitive: their status, opaque (blocking) or transparent (transmissive), is determined by the logic state of the control signal. Therefore, the AMULET group [merged into the advanced processor technology (APT) group] proposed a two-to-four-phase interface to enable level-sensitive latches to work in two-phase control schemes [3]. Another direct approach for two-phase control systems is to design a truly new double-edge-triggered D flip-flop (DETDFF) [4]–[7], in which the DFF (signal latch) used in a two-phase micro-pipeline scheme is active during both low-to-high and high-to-low transitions on the control wire. One good demonstration, described by Yun, is the pseudostatic DETDFF [6], constructed from a pair of complementary edge-triggered flip-flops. Each edge-triggered flip-flop is made up of three NAND or NOR gates in series; the functional diagram is shown in Fig. 1. Its capture mechanism stores the logic state of the transferred data during the previous control state, then captures and passes the stored data to the output through the last NAND gates at the next control-signal transition. To keep the stored data from being influenced by clock transitions, a pair of weak inverters maintains the data until the feedback signal from the new output state arrives. The resulting tradeoff is a longer propagation delay, which is a serious drawback in an asynchronous control system.

Digital Object Identifier 10.1109/TCSII.2005.855734

Fig. 1. Functional diagram of the pseudostatic DET-FF.

Another study, also based on storing input data, uses the parasitic capacitance of MOSFETs. Its input selection pairs, controlled by the control (clock) signals, are constructed from transmission paths, unlike [6], which uses inverters with enable switches. The inverters used in this double-edge-triggered flip-flop (DET-FF) are N-type structures with active p-MOSFETs (MOS-style clocked inverters) [7]; the functional diagram is shown in Fig. 2. There are two obvious drawbacks to this simpler and faster structure. First, the width-to-length ratios of the n-MOSFETs must be much larger than those of the p-MOSFETs in order to obtain a low enough logic level and a large enough parasitic capacitance to store data reliably. This implies that the transistor area of the MOS-style DET-FF cannot be smaller than that of other solutions, despite its being constructed from only 16 transistors. Second, the control signals (clock and inverted clock) must be perfectly matched. Otherwise, the next input data and the slower-responding weak inverter may affect the stored data, and a sudden spike caused by the previous data may appear in the output.

Manuscript received May 15, 2005; revised June 13, 2005. This paper was recommended by Associate Editor S. Tsukiyama.

Y.-H. Shu, S. Tenqchen, and M.-C. Sun are with the Department of Electrical Engineering, National Taiwan University, Taipei 115, Taiwan, R.O.C. (e-mail: paulus@mail.apol.com.tw).

W.-S. Feng is with the Department of Electrical Engineering, National Taiwan University, Taipei 115, Taiwan, R.O.C., and also with the Department of Electronic Engineering, Chang Gung University, Taoyuan 333, Taiwan, R.O.C. (e-mail: fengws@mail.cgu.edu.tw).

Fig. 2. Functional diagram of the MOS-style DET-FF.

Fig. 3. Functional diagram of the XNOR-based DET-FF.

# II. XNOR GATE AND DETDFF

We chose to use a pulse generator to trigger a transparent latch twice per control-signal cycle. The pulse generator contains only two kinds of common logic gates: an XNOR gate and an inverter. A functional diagram of the proposed design is shown in Fig. 3. In principle, the inverter provides a sufficiently delayed complementary control signal, and the XNOR logic produces a pulse on each rising and falling transition of the control signal. There are two practical problems with such a design. The first is that an XNOR gate is usually constructed either from transmission paths or from several conventional gates, which means that it intrinsically has either greater power dissipation or a longer propagation delay, since the signals pass through three logic stages. The second problem stems from the inverter cell shown in Fig. 3. It is usually built from an odd number of inverter stages, since those inverters must provide a delay long enough to give pulses wide enough for the following transparent latch to capture data. Therefore, although the XNOR approach is a direct way of fitting conventional level-sensitive latches to two-phase pipelined controllers, it has seldom been discussed in the past.
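The edge-to-pulse behavior described above can be illustrated with a minimal sampled-time sketch. It is a behavioral model only, not the transistor-level circuit: the function names, the unit-step delay model, and the two-sample `delay` are assumptions made for illustration.

```python
from typing import List


def xnor(a: int, b: int) -> int:
    """Return 1 when both bits are equal, otherwise 0."""
    return 1 if a == b else 0


def pulse_generator(clk: List[int], delay: int) -> List[int]:
    """Sampled-time model of an XNOR gate fed by clk and a delayed inverter.

    `delay` is a hypothetical inverter delay in samples. The output is 1
    only while clk and the delayed complement agree, that is, for `delay`
    samples after each clk transition.
    """
    pulses = []
    for t, level in enumerate(clk):
        # Before `delay` samples have elapsed, assume the inverter has
        # already settled to the complement of the initial clock level.
        past_level = clk[t - delay] if t >= delay else clk[0]
        delayed_complement = 1 - past_level
        pulses.append(xnor(level, delayed_complement))
    return pulses


# One clock cycle: rising edge at sample 4, falling edge at sample 10.
clk = [0] * 4 + [1] * 6 + [0] * 6
pulses = pulse_generator(clk, delay=2)
print(pulses)  # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

In this model the pulse width equals the inverter delay, which is why the delay element must be slow enough for the following latch to capture data.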

Transmission paths, or so-called "capture-pass logics," perform the XNOR logic used in our design [8], as shown in Fig. 4. This cell works well, with greater speed and lower power dissipation than the conventional capture-pass adders described in [9], [10]. But there are two potential problems with this XOR/XNOR cell. The first is its poor driving capacity. Input signals simply pass through the transmission paths to the output ports, which means that they must directly drive the input capacitance of the next stage via the transmission paths. To compensate, the Mp4 and Mn4 transistors provide additional current paths for the XOR port when both inputs are logical-high (A = B = "1"), and for the XNOR port when both inputs are logical-low (A = B = "0"). However, this transistor pair also causes the second problem: the logic-state patch may provide an undesired current path from Vdd to the XNOR port, or from the XOR port to ground, if an internal latching problem occurs (XOR = "1", XOR = "0"). In our application, we use the sensitivity to driving capacity to exploit differences within the input clock pair. We give the input inverter of the postponed clock a longer channel length, which allows the XNOR gate to generate wide enough pulses at the control-signal transitions, as shown in Fig. 4. In fact, the width-to-length ratio of this input inverter determines whether our XNOR-based DET-FF works correctly and how low its supply voltage can be.

Fig. 4. Pulse generator used in XNOR-based DET-FF.

Fig. 5. Transparent latch in the transistor-level.

In deciding whether to obtain complementary pulses from a single XNOR output with an added inverter or from a pair of differential XOR and XNOR outputs, we chose the single output and removed Mn4. This cuts off an undesired current path and lets the width-to-length ratio of the input inverter meet the design criterion mentioned above. We also added inverters as output buffers to improve the driving capacity.

The latch used in our proposed DET-FF is a common transparent latch made up of a latching inverter pair and an input selector (multiplexer), as shown in Fig. 5. Normally, the time required to break the latched state of the inverter pair dominates the total transition time of a transparent latch. To avoid contention with the weak inverter, we also control this locking inverter with the complementary pulses. Thus, there is no latch-breaking delay in our proposed transparent latch. To compensate for the mismatch between "clk" and "Iclk" caused by load capacitance and channel resistance, as discussed in [11], we connected the control signals to Mp1 and Mn2 of the selector shown in Fig. 5.
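As a rough behavioral illustration of how a level-sensitive latch becomes double-edge-triggered when its enable input is a train of short pulses, consider the sketch below. The waveforms and the function name `transparent_latch` are hypothetical, and the model ignores all analog effects such as latch-breaking delay.

```python
from typing import List


def transparent_latch(data: List[int], enable: List[int], initial: int = 0) -> List[int]:
    """Level-sensitive latch: follow `data` while `enable` is 1, else hold."""
    q = initial
    out = []
    for d, en in zip(data, enable):
        if en:
            q = d  # transparent phase: the input passes through
        out.append(q)  # with enable low, q still holds the last captured value
    return out


# Enable pulses on both clock edges (rising edge at sample 4, falling at 10).
enable = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
# Data changes between the pulses, so each pulse captures a stable value.
data = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]

q = transparent_latch(data, enable)
print(q)  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```

The output changes once per clock edge, giving two captures per clock cycle, while data changes that fall between pulses are ignored until the next edge.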



Fig. 6. Two-phase pipelined control system used in the simulation.

# III. TWO-PHASE PIPELINE SYSTEM

## A. Simplified System

The two-phase asynchronous control system used in our simulation was modified from Sutherland's micro-pipeline, which is discussed in detail in [6], and is shown in Fig. 6. Only five logic components are used in this system: C-element, delay buffer, inverter, XNOR gate, and transparent latch. The XNOR gate and transparent latch together work as a DET-FF, as discussed above. In our simulation, we treated the logic cells between stages as simple signal paths, so the entire system operated as a simple first-in-first-out (FIFO) buffer. In practical applications, those logic cells may be asynchronous modules or computing cells.

## B. Delay Buffer and C-Element

The delay buffer used in our simulation was a timing controller that determined not only how long a single computing step took, but also whether the logic cells in the asynchronous modules had enough time to finish their computation before the next latch became active. Thus, the timing of this delay buffer had to satisfy the equations presented in [6], which in simplified form read

$$t_d \ge t_{ck \to Q} + t_{\text{logic}} - t_{R \to A'} \tag{1}$$

where $t_d$, $t_{ck \to Q}$, $t_{\text{logic}}$, and $t_{R \to A'}$ are the response times of the delay buffer, transparent latch, logic cell, and C-element, respectively.
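The constraint in (1) can be evaluated directly once the three response times are known. The sketch below is only a worked example: the helper name and the picosecond values are hypothetical and are not taken from the simulation results.

```python
def min_buffer_delay(t_ck_to_q: float, t_logic: float, t_r_to_a: float) -> float:
    """Smallest delay-buffer time t_d satisfying (1); never negative."""
    return max(0.0, t_ck_to_q + t_logic - t_r_to_a)


# Hypothetical response times in picoseconds.
t_d = min_buffer_delay(t_ck_to_q=600.0, t_logic=400.0, t_r_to_a=250.0)
print(t_d)  # 750.0

# With the logic cell reduced to a plain signal path (FIFO case), t_logic = 0.
t_d_fifo = min_buffer_delay(t_ck_to_q=600.0, t_logic=0.0, t_r_to_a=250.0)
print(t_d_fifo)  # 350.0
```

A faster latch (smaller clock-to-output time) therefore lowers the required buffer delay directly, which is how the flip-flop delay enters the pipeline cycle time.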

The C-element used in this simulation is the symmetric design discussed in [11]. Its keeper provides additional signal paths for the inputs only when the logic states of the inputs do not match. When newly matched signals appear at the inputs, those signal paths play no role. Thus, this symmetric C-element provides the best power-delay performance.
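For readers unfamiliar with the C-element, its logical behavior (as opposed to the symmetric transistor implementation of [11]) can be sketched as follows. The function names and the input sequence are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: follow the inputs when they match, else hold."""
    return a if a == b else prev


def run_c_element(inputs: Iterable[Tuple[int, int]], initial: int = 0) -> List[int]:
    """Apply a sequence of (a, b) input pairs and record the output."""
    q = initial
    trace = []
    for a, b in inputs:
        q = c_element(a, b, q)
        trace.append(q)
    return trace


trace = run_c_element([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
print(trace)  # [0, 0, 1, 1, 0]
```

The held output in the unmatched cases, (1, 0) and (0, 1), corresponds to the interval in which the keeper is active.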

# IV. SIMULATION RESULTS

We performed HSPICE simulations using the fast-mode model of the TSMC 0.25-$\mu$m CMOS process. In general, all the n-MOSFETs used in this simulation followed the originally designed ratios with the minimum channel width or length (0.4 $\mu$m), except for those used in the output inverters and the critical weak inverters discussed in Section II. Because of the quasi-symmetric CMOS structure used in our proposed DET-FF and in the pseudostatic DET-FF, we set the pMOS width-to-length ratio in those two designs to 2.5 times the nMOS ratio, based on simulation results for the rise and fall times of a simple inverter. For the MOS-style DET-FF [7], however, choosing the pMOS ratio is a tradeoff between response speed and workable voltage range. The pMOS ratio used in this simulation is 1.25 times that of the original design, which gives a reasonable fall time and operating voltage range. This adjustment is needed because the ratio of the conductance parameters differs between technology processes.

Our simulation revealed some interesting phenomena. First, we found that the response speed of the pseudostatic DET-FF [6] varies with the input data and control signal, because the current data may be pre-stored at either the first stage or the second stage, and timing differences may result from the signal propagation path length and the cut-in voltage of the clock signal. Furthermore, the pseudostatic DET-FF takes more time to transfer the correct signal if the first or second stage needs to break the locked state of its weak inverters. A similar weak-inverter delay problem also occurs in the MOS-style DET-FF [7]. The only difference between the two phenomena is that channel charge release plays a major part in the MOS-style DET-FF, since the pre-stored data relies on the input capacitance of the main inverter.

Because we take the worst-case control overhead as the timing assessment in an asynchronous control system, the propagation delay shown in Fig. 7(a) is the worst case for each DET-FF. All the DET-FF solutions functioned properly when the supply voltage was varied from 2.2 to 3.5 V, although this 0.25-$\mu$m CMOS model is optimized for 2.5 V. Generally speaking, the MOS-style DET-FF had the shortest delay, even though its power dissipation became very high as the supply voltage was increased. But because of the tradeoff between fall time and operating voltage range mentioned above, the MOS-style DET-FF could not capture the correct data when the supply voltage was below 2.2 V. Our XNOR-based DET-FF showed a significant reduction in propagation delay and power dissipation compared with the pseudostatic DET-FF. The propagation delay in our simulation results is the time from the control-clock transition to the output data transition, and "power dissipation" refers to the power consumption of all control circuits, including the input and output buffers, because the driving current for the transmission paths comes from the output of the previous stage.

The timing relationship between data and clock arrivals is an important factor when a flip-flop is applied in a pipeline control system. In general, the set-up and hold times of all three types of DET-FF show no distinct variation as the operating voltage varies; they are (0.474 ns, -0.324 ns), (-0.05 ns, 0.11 ns), and (0.04 ns, 0.25 ns) for the XNOR-based, pseudostatic, and MOS-style DET-FFs, respectively. Although the MOS-style DET-FF has the smallest propagation delay in Fig. 7(a), its untimely zone (set-up time plus hold time) is larger than those of the others, because of the transmission paths used in data selection. By contrast, our proposed XNOR-based DET-FF needs an early clock to obtain a faster response, because of the delay of the pulse generator.
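The untimely zone of each design follows directly from the set-up and hold times quoted above. The short sketch below simply adds them; the dictionary layout and variable names are illustrative.

```python
# (set-up time, hold time) in nanoseconds, as quoted in the text.
setup_hold_ns = {
    "XNOR-based": (0.474, -0.324),
    "pseudostatic": (-0.05, 0.11),
    "MOS-style": (0.04, 0.25),
}

# Untimely zone = set-up time + hold time, rounded to 1 ps.
untimely_zone_ns = {
    name: round(setup + hold, 3) for name, (setup, hold) in setup_hold_ns.items()
}
widest = max(untimely_zone_ns, key=untimely_zone_ns.get)

for name, width in untimely_zone_ns.items():
    print(f"{name}: {width:.3f} ns")
print("widest:", widest)
```

The sums are 0.150 ns, 0.060 ns, and 0.290 ns, respectively, so the MOS-style DET-FF has the widest zone. The large positive set-up time paired with a negative hold time for the XNOR-based design is consistent with its need for an early clock.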



Fig. 7. Simulation results for various supply voltages. (a) Propagation delay. (b) Power dissipation.

In high-speed circuits, the current variation, di/dt, is another important issue, especially with regard to noise. According to the simulation analysis, the arrival of the data or the clock causes the largest current variation. Since all input buffers used in this simulation are the same size, the input conductance of the data and clock buffers is similar. Consequently, the simulation shows no significant difference among the three types of DET-FF, as shown in Fig. 8.

In the simulation of the maximum operating speed, the dual clocks of our simulated MOS-style DET-FF are a well-matched clock pair, although this is almost impossible in real VLSI implementations. The timing of the clock arrival is optimized to obtain the fastest result at each operating voltage. As expected, the DET-FFs with a data pre-charge scheme, which can latch the data even after the next data has arrived, perform better. With its unlocked weak inverter, our proposed DET-FF still reaches a similar response speed. By contrast, the clock arrival timing must be adjusted for the pseudostatic DET-FF because of its longer propagation delay at lower voltages. The MOS-style DET-FF achieves almost the same response speed from 2.2 V to 3.3 V after the proper pMOS ratio is applied, as mentioned above.

In the pipelined simulation, we did not insert any asynchronous modules or computing cells in the "logic cell" between the two DET-FFs. Instead, we directly connected the DET-FFs with C-elements and triggered the first C-element with a simple clock. Thus, the whole pipelined system functioned just like an asynchronous FIFO. The delay times of the delay buffers were determined by the actual propagation delay between the stages and the optimal timing of the clock arrival. The response speeds delivered by all the DET-FF solutions are shown in Fig. 9. In a pipelined control system, the propagation delay of any internal cell is included in the overall control timing, as discussed in [6]. Thus, the real timing consideration, as shown in Fig. 10, is the propagation delay of the output waveform, which limits the enable timing of the next stage. Simulating the pipelined system yielded some significant results. First, only our XNOR-based DET-FF had symmetrical "high" and "low" periods; the others did not, because of their unequal rise and fall times. Second, obvious spikes were found in the output waveform of the MOS-style DET-FF, because the newly arriving data and clock pair could affect the pre-stored data in the parasitic capacitance.

Fig. 8. Current variation under 2.5-V supply voltage.

Fig. 9. Maximum reliable response speed in individual simulation.

As the simulation results in Fig. 11 show, the response speed of the proposed XNOR-based DET-FF in the pipelined FIFO is almost the same as in the individual simulation. By contrast, the pseudostatic and MOS-style DET-FFs degrade significantly in the pipelined system, even with careful timing adjustment, because their asymmetric rise/fall times and pre-charging requirement restrict the overall performance of the pipelined system.

Fig. 10. Output waveforms of the third stage from the DET-FFs in pipeline simulation.

Fig. 11. Maximum reliable operating speed in the pipeline simulation.

TABLE I Performance Comparison of the Three Types of DET-FF in the Pipelined System at Vdd = 2.5 V

|                            | XNOR-based             | Pseudostatic           | MOS-style              |
|----------------------------|------------------------|------------------------|------------------------|
| Transistor count           | 21                     | 24                     | 16                     |
| Occupied chip area         | 145                    | 172                    | 186                    |
| Maximum data rate          | 1.84 Gb/s              | 1.08 Gb/s              | 1.54 Gb/s              |
| Max. propagation delay     | 653.3 ps               | 1117.6 ps              | 474.1 ps               |
| Average power at 660 Mb/s  | 298.3 $\mu$W           | 429.0 $\mu$W           | 345.1 $\mu$W           |
| Maximum di/dt              | $5.19 \times 10^6$ A/s | $5.46 \times 10^6$ A/s | $5.55 \times 10^6$ A/s |
| Well-matched CLK           | Not needed             | Not needed             | Needed                 |
| Input affected by CLK      | No                     | No                     | Yes                    |
| Static power dissipation   | No                     | No                     | Yes                    |

Finally, Table I summarizes the comparison of all simulation results in the pipelined system. Although the MOS-style DET-FF uses fewer transistors than the others, its occupied chip area is the largest, because of the unusual width-to-length ratios used in the original design.
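The relative figures implied by Table I can be cross-checked with a few lines of arithmetic. The sketch below uses only values from the table; the dictionary keys and the helper `percent_change` are illustrative, the power values are those measured at 660 Mb/s, and the area is left in the table's own (unstated) units.

```python
# Values transcribed from Table I (pipelined system, Vdd = 2.5 V).
table = {
    "XNOR-based": {"transistors": 21, "area": 145, "rate_gbps": 1.84, "power_uw": 298.3},
    "pseudostatic": {"transistors": 24, "area": 172, "rate_gbps": 1.08, "power_uw": 429.0},
    "MOS-style": {"transistors": 16, "area": 186, "rate_gbps": 1.54, "power_uw": 345.1},
}


def percent_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


proposed = table["XNOR-based"]
comparison = {}
for name, row in table.items():
    if name == "XNOR-based":
        continue
    comparison[name] = {
        "speed_gain_pct": percent_change(proposed["rate_gbps"], row["rate_gbps"]),
        "power_saving_pct": -percent_change(proposed["power_uw"], row["power_uw"]),
    }

area_per_transistor = {
    name: row["area"] / row["transistors"] for name, row in table.items()
}

for name, result in comparison.items():
    print(f"vs {name}: +{result['speed_gain_pct']:.1f}% data rate, "
          f"-{result['power_saving_pct']:.1f}% power")
for name, value in area_per_transistor.items():
    print(f"{name}: {value:.2f} area units per transistor")
```

Against the MOS-style design the data rate is about 19.5% higher and the power about 13.6% lower; against the pseudostatic design the margins are about 70.4% and 30.5%. The MOS-style design also occupies the most area per transistor (about 11.6 units, against roughly 7 for the other two), in line with the remark on its width-to-length ratios.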

# V. CONCLUSION

A conventional XNOR-based DET-FF is assumed to have a slower response and larger power dissipation than the complementary approach because it needs a postponed clock. We have presented a different approach to implementing an XNOR-based DET-FF. We reused an alternative capture-pass XOR/XNOR gate and took advantage of its potential weakness, sensitivity to the driving capacity of the previous stage, to generate a pair of stable clock pulses for a simple transparent latch. Our approach allows the good properties of level-sensitive latches established in past studies to be reproduced in a two-phase pipelined control system. Simulation results show that the proposed XNOR-based DET-FF achieves a power reduction of over 13% and a pipeline-speed improvement of more than 19% compared with the other two approaches. Although the MOS-style DET-FF demonstrated strong speed performance in the individual simulation for supply voltages from 2.2 V to 3.0 V, the tradeoff for its active load and simple latching mechanism is higher power dissipation and undesirable output spikes. The pseudostatic DET-FF had better speed performance in the individual simulation, but it suffered from higher power dissipation and a longer propagation delay than the others. Furthermore, with its symmetric rise/fall times and faster latching scheme, our proposed XNOR-based DET-FF is the better choice for pipelined systems. The HSPICE simulation shows that the proposed DET-FF can provide a maximum reliable data rate of up to 1.84 Gb/s at Vdd = 2.5 V in the pipelined system. In general, the proposed XNOR-based DET-FF performed stably even when the supply voltage was reduced to 1.8 V, 72% of the rated supply voltage. This reliable performance was also evident in the two-phase pipelined system simulation.

# REFERENCES

- [1] D. M. Chapiro, "Globally-asynchronous locally-synchronous systems," Ph.D. dissertation, Stanford Univ., Stanford, CA, Oct. 1984.
- [2] W. J. Carlsson, W. Li, T. Njolstad, K. Palmkvist, L. Wanhammar, and S. Zhuang, "A modular asynchronous wrapper," in *Proc. Nat. Conf. Radio Sci.*, Stockholm, Sweden, Jan. 10–13, 2002.
- [3] P. Day and J. V. Woods, "Investigation into micropipeline latch design styles," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 3, no. 2, pp. 264–272, Jun. 1995.
- [4] M. Afghahi and J. Yuan, "Double edge-triggered D-flip-flops for high-speed CMOS circuits," *IEEE J. Solid-State Circuits*, vol. 26, no. 8, pp. 1168–1170, Aug. 1991.
- [5] R. Hossain, L. Wronski, and A. Albicki, "Low power design using double edge-triggered flip-flops," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 2, no. 2, pp. 261–265, Jun. 1994.
- [6] K. Y. Yun, P. A. Beerel, and J. Arceo, "High-performance two-phase micropipeline building blocks: Double edge-triggered latches and burst-mode select and toggle circuits," *Proc. IEE Circuits, Devices, Syst.*, vol. 143, no. 5, pp. 282–288, Oct. 1996.
- [7] P. Varma, B. S. Panwar, A. Chakraborty, and D. Kapoor, "A MOS approach to CMOS DET flip-flops design," *IEEE Trans. Circuits Syst. I, Fundam. Theory Appl.*, vol. 49, no. 7, pp. 1013–1016, Jul. 2002.
- [8] J. M. Wang, S. C. Fang, and W. S. Feng, "New efficient designs for XOR and XNOR functions on the transistor level," *IEEE J. Solid-State Circuits*, vol. 29, no. 7, pp. 780–786, Jul. 1994.
- [9] N. Zhuang and H. Wu, "A new design of the CMOS full adder," *IEEE J. Solid-State Circuits*, vol. 27, no. 5, pp. 840–844, May 1992.
- [10] A. Bellaouar and M. I. Elmasry, *Low-Power Digital VLSI Design: Circuits and Systems*. Norwell, MA: Kluwer, 1995.
- [11] M. Shames, J. C. Ebergen, and M. I. Elmasry, "Modeling and comparing CMOS implementations of the C-element," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 6, no. 4, pp. 537–563, Dec. 1998.