# A WIDE PULL-IN RANGE FAST ACQUISITION HARDWARE-SHARING TWO-FOLD CARRIER RECOVERY LOOP

Ching-Chi Chang, Chien-Chih Lin, Muh-Tian Shiue\*, and Chorng-Kuang Wang

Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Taiwan University, Taipei, 106, Taiwan \*Topic Semiconductor Corp., 300 Hsin-Chu, Taiwan, R.O.C. E-mail: ckwang@cc.ee.ntu.edu.tw

### ABSTRACT

This paper proposes a two-fold carrier recovery loop that achieves a $\pm 25000$-ppm pull-in range and a 7-ms acquisition time for a 64-QAM blind adaptive system. The carrier recovery system contains a prior wide-band loop that acquires a coarse carrier frequency and a posterior narrow-band loop that suppresses jitter to -82 dBc. It can be applied to a 4.035-MHz low-IF cable modem system with a $\pm 100$-kHz frequency offset tolerance requirement. The two-fold carrier recovery loop operates in three consecutive stages: Costas carrier phase estimation, DDML carrier phase estimation, and DD-MMSE carrier phase estimation. The proposed architecture is hardware efficient because the three stages share most of the circuit functions.

### 1. INTRODUCTION

After down-conversion in the analog front-end, the received signal may carry a residual carrier frequency offset and phase error. The carrier phase error reduces the information signal power and therefore degrades the signal-to-noise ratio (SNR). In a coherent broadband communication system, a carrier of accurate frequency and phase is essential for demodulation, so the frequency and phase of the carrier must be extracted from the incoming signal. A carrier recovery (CR) circuit is employed for this purpose. A blind 64-QAM cable modem receiver requires fast frequency and phase acquisition from the incoming data stream and low jitter in the steady-state tracking mode.

In a QAM transmission system, suppressed-carrier modulation is employed to avoid the additional transmit power of a pilot-added system such as the ATSC (Advanced Television Systems Committee) VSB transmission system. In such a scenario, data-aided CR loops are often used [1]. However, in the architecture shown in Fig.1, a long FIR filter may be required for QAM signal equalization. The loop becomes very long if the aiding data are taken from the equalizer output and the retrieved carrier is fed all the way back to the demodulator. To avoid destabilizing the equalizer, the CR bandwidth is usually reduced. Hence, the frequency offset tolerance is only a few kilohertz in such a long-loop PLL architecture [2], and the acquisition time exceeds hundreds of milliseconds [1]. This paper proposes a two-fold carrier recovery loop that offers a wide locking range and low jitter. Three different carrier recovery algorithms, namely non-decision-directed maximum-likelihood, decision-directed maximum-likelihood, and decision-directed minimum-mean-square-error estimation, can



Figure 1: The conventional carrier recovery loop with adaptive equalizer situated in the forward signal path

be implemented in this loop, as shown in Fig.7. Section 2 presents the two-fold carrier recovery loop architecture, which contains a prior wide-band CR loop before the equalizer and a posterior narrow-band CR loop with data-aided estimation. The two loops can further be arranged to share the same hardware, as presented in Section 3. Section 4 shows the simulation results, and conclusions are drawn in Section 5.

### 2. TWO-FOLD CARRIER RECOVERY LOOPS

Designing the carrier recovery system requires practical trade-offs: a wide lock-in range and fast acquisition in the acquisition state, and good jitter performance in the steady state. A PLL with narrow bandwidth is often used to avoid fluctuations and to keep the adaptive equalizer in the forward signal path stable, as shown in Fig.1. To widen the lock-in range and accelerate acquisition, several auxiliary techniques have been proposed to work with the PLL, such as the sweeping pull-in technique, the frequency-locked loop (FLL) [2], nonlinear circuit assistance [3], and dual-loop carrier recovery (DCR) [4]. Nevertheless, these methods are often either complicated or inefficient in hardware.

This paper proposes a two-fold carrier recovery loop that achieves a wide pull-in range, good jitter performance, and high hardware efficiency. The prior loop contains a modified Costas loop, and the posterior loop contains both decision-directed maximum-likelihood (DDML) carrier phase estimation and decision-directed minimum-mean-square-error (DD-MMSE) carrier phase estimation. The architecture of the proposed CR loop is shown in Fig.2 and discussed below.



Figure 2: The proposed two-fold carrier recovery loop cooperating with channel equalization

#### 2.1. Prior CR Loop

The prior CR loop retrieves a coarse carrier frequency and sets the center frequency for the posterior CR loop. Because the adaptive equalizer is excluded from the loop to speed up coarse carrier acquisition, the carrier phase estimation is not data-aided. In such a case, a Costas loop is usually used to estimate the carrier phase of a double-sideband (DSB) signal [5]. The estimated phase error can be derived as

$$\hat{\phi}_{ML} = -Im\{I_n^*y(t)\}\tag{1}$$

where  $y(t) = \left[\sum_{n} I_n h(t - nT) + N(t)\right] e^{-j\phi}$  is the received equivalent lowpass signal,  $I_n = I_i(n) + jI_q(n)$  is the symbol sequence bearing the information data, and $h(t)$ and $N(t)$ are the equivalent channel response and the additive noise, respectively. For high-order modulation systems such as cable modem transmission, the multiplication in the term  $I_n^*y(t)$  may raise the hardware cost of a digital implementation. To avoid this cost and improve hardware efficiency, the modified phase error estimate in (2) is used [6].

$$\phi_{ML} = -Im\{sgn(I_n^*) \cdot y(t)\} \tag{2}$$

The error signal can then be expressed as

$$\begin{aligned}
e(t) &= y_l s(t) \cdot sgn(y_l c(t)) - y_l c(t) \cdot sgn(y_l s(t)) \\
&\cong y_l s(t) \cdot sgn(I_i) - y_l c(t) \cdot sgn(I_q) \\
&= \frac{1}{2} (|I_i| + |I_q|) \sin \Delta \phi + \cdots
\end{aligned} \tag{3}$$

However, the modified Costas loop, shown in Fig.3, may introduce more noise. An IIR pre-filter is therefore placed ahead of the proportional-integral (PI) loop filter to suppress the noise. The bandwidth of this pre-filter is 5 to 10 times wider than that of the loop filter, and the closed-loop architecture forms a second-order PLL. Because it is non-decision-directed,



Figure 3: The modified Costas prior carrier recovery loop

this CR loop is independent of the adaptive equalizer and can be optimized independently.
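The prior loop described above can be sketched numerically. The following Python fragment is a minimal illustration under stated assumptions, not the implementation used in this work: it applies the sign-based error from the first line of (3) to a noise-free QPSK stream with a constant frequency offset, and passes the error through a one-pole IIR pre-filter, a PI loop filter, and an NCO. The gains `kp`, `ki`, and `alpha`, the test offset, and all function names are hypothetical values chosen for illustration, and the complex-baseband form of the detector may differ from (3) by a constant scale factor.

```python
import numpy as np

def sign_costas_error(y):
    """Sign-based phase error Im{conj(sgn(y)) * y}, cf. the first line of (3)."""
    return y.imag * np.sign(y.real) - y.real * np.sign(y.imag)

def run_prior_loop(rx, kp=0.05, ki=0.001, alpha=0.5):
    """Sign detector -> one-pole IIR pre-filter -> PI loop filter -> NCO.

    kp, ki and alpha are illustrative assumptions, not values from the paper.
    Returns the integrator state, i.e. the frequency estimate in rad/sample.
    """
    theta = 0.0   # NCO phase
    integ = 0.0   # integral branch of the PI filter (frequency estimate)
    pre = 0.0     # pre-filter state
    for r in rx:
        y = r * np.exp(-1j * theta)     # derotate with the NCO output
        e = sign_costas_error(y)
        pre += alpha * (e - pre)        # wide-band IIR pre-filter
        integ += ki * pre               # integral path
        theta += integ + kp * pre       # NCO phase update
    return integ

# Noise-free QPSK test signal with an assumed offset of 0.01 rad/sample.
rng = np.random.default_rng(0)
n = 3000
symbols = rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)
offset = 0.01
rx = symbols * np.exp(1j * (offset * np.arange(n) + 0.3))
freq_est = run_prior_loop(rx)
print(f"injected {offset:.5f} rad/sample, estimated {freq_est:.5f} rad/sample")
```

With these illustrative settings the integrator state should settle close to the injected offset. A 64-QAM input would add pattern-dependent noise to the detector output, which is the disturbance the pre-filter is meant to attenuate.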

#### 2.2. Posterior CR Loop

Due to the wide bandwidth of the prior CR loop, both Gaussian noise and pattern-dependent noise may cause serious problems. Moreover, without the adaptive equalizer lodged in the loop, the ISI can further deteriorate the acquired carrier. Accordingly, another posterior CR loop is employed to refine the carrier signal with the aid of the retrieved DC value from the prior CR loop as a central frequency.

The posterior CR loop extracts the carrier phase error from the equalized output and performs decision-directed estimation. This CR loop is further partitioned into two stages. It first assumes an AWGN channel without ISI and performs decision-directed maximum-likelihood estimation to widen the pull-in range. It then switches to decision-directed minimum-mean-square-error estimation to remove the biased estimate caused by ISI.

Assuming the information sequence  $\{I_n\}$  over the observation interval has been decided correctly, the posterior CR loop performs decision-directed parameter estimation for carrier phase retrieval. Based on the ML estimation criterion, the estimated carrier phase can be written as

$$\begin{aligned}
\phi_{ML}^{*} &= -\tan^{-1} \left[ Im\{I_{n}^{*}y_{eq}(t)\} / Re\{I_{n}^{*}y_{eq}(t)\} \right] \\
&= -Im\{I_{n}^{*} \cdot y_{eq}(t)\}
\end{aligned} \tag{4}$$

where  $y_{eq}$  is the equalized received data. As in (2), the digital multiplier can be avoided by using only the sign bit of the slicer output. This again degrades the SNR, so an IIR pre-filter is used to improve the carrier jitter performance. Fig.4 illustrates the DDML estimation algorithm.



Figure 4: The DDML posterior carrier recovery loop
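The sign-only form of (4) can be checked on an ideal constellation. The sketch below is illustrative only: it assumes 64-QAM symbols on the odd-integer grid $\{\pm 1, \pm 3, \pm 5, \pm 7\}$, correct slicer decisions, and an equalized sample $y_{eq} = I_n e^{-j\phi}$ with no noise or ISI. The helper names `slice_64qam` and `ddml_sign_error` are hypothetical.

```python
import numpy as np

LEVELS = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)

def slice_64qam(y):
    """Nearest point on the odd-integer 64-QAM grid (assumed scaling)."""
    def nearest(x):
        return np.clip(2.0 * np.floor(x / 2.0) + 1.0, -7.0, 7.0)
    return nearest(np.real(y)) + 1j * nearest(np.imag(y))

def ddml_sign_error(y_eq):
    """Sign-only version of (4): -Im{conj(sgn(I_hat)) * y_eq}."""
    d = slice_64qam(y_eq)
    s = np.sign(d.real) + 1j * np.sign(d.imag)   # sign bits of the slicer output
    return -np.imag(np.conj(s) * y_eq)

grid = np.array([a + 1j * b for a in LEVELS for b in LEVELS])

phi = 0.02                                       # small residual phase error, rad
errors = ddml_sign_error(grid * np.exp(-1j * phi))
mean_error = errors.mean()                       # average over the constellation

# Per-symbol detector output with zero phase error (pattern-dependent term).
pattern_peak = np.abs(ddml_sign_error(grid)).max()

print(mean_error, 8 * np.sin(phi), pattern_peak)
```

Averaged over the constellation, the detector output is proportional to $\sin\phi$ (here with slope equal to the mean of $|I_i| + |I_q|$), but individual symbols with $|I_i| \neq |I_q|$ produce a nonzero output even at $\phi = 0$. This is consistent with the need for the pre-filter mentioned above.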

With the existence of ISI in the HFC channel, the estimate may contain some bias component even if it is the optimal estimate value in ML sense. Following the DDML-CPE, a carrier phase estimation based on minimum-mean-square-error (MMSE) criterion is adopted [6]. The cost function of MMSE is

$$P(\phi) = E\left\{ \left| \left[ \sum_{n=-\infty}^{\infty} I_n h(t - nT) + N(t) \right] e^{j\phi} - I_n \right|^2 \right\} \tag{5}$$

and the carrier phase can be extracted as

$$\phi_{MMSE} = E\{[y_{eq}(t) - I_n]^* y_{eq}(t)\} \tag{6}$$

To simplify the hardware structure, only the sign bit of $y_{eq}(t)$ is used, and the MMSE estimate is modified as

$$\hat{\phi}_{MMSE} = E\{[y_{eq}(t) - I_n]^* \cdot sgn[y_{eq}(t)]\} \tag{7}$$

An IIR pre-filter is still used to overcome the SNR reduction. Fig.5 shows the block diagram of this estimation process.



Figure 5: The DD-MMSE posterior carrier recovery loop
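One property of the error term in (7) can be verified numerically. The sketch below is a hedged illustration, not the circuit of Fig.5: it assumes that the imaginary part of the per-symbol product in (7) drives the loop, that the symbols lie on the odd-integer 64-QAM grid, and that $y_{eq} = I_n e^{-j\phi}$ with no noise or ISI. The function names are hypothetical.

```python
import numpy as np

LEVELS = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)

def slice_64qam(y):
    """Nearest point on the odd-integer 64-QAM grid (assumed scaling)."""
    def nearest(x):
        return np.clip(2.0 * np.floor(x / 2.0) + 1.0, -7.0, 7.0)
    return nearest(np.real(y)) + 1j * nearest(np.imag(y))

def ddmmse_sign_error(y_eq):
    """Per-symbol term of (7); using the imaginary part is an assumption."""
    d = slice_64qam(y_eq)                                   # decision I_hat
    s = np.sign(np.real(y_eq)) + 1j * np.sign(np.imag(y_eq))  # sign bits of y_eq
    return np.imag(np.conj(y_eq - d) * s)

grid = np.array([a + 1j * b for a in LEVELS for b in LEVELS])

phi = 0.02
at_lock = ddmmse_sign_error(grid)                       # zero phase error
rotated = ddmmse_sign_error(grid * np.exp(-1j * phi))   # small phase error

print(np.abs(at_lock).max(), rotated.mean(), 8 * np.sin(phi))
```

In this idealized case the detector output vanishes for every symbol when the phase error is zero, while its average for a small phase error remains proportional to $\sin\phi$. The absence of a pattern-dependent term at lock fits the low steady-state jitter attributed to this stage.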

#### 2.3. The Co-operation of the Two-Fold Carrier Recovery Loop

Working together with the adaptive equalizer, the proposed two-fold carrier recovery loop operates as shown in Fig.6. To equalize a high-order QAM signal, a multi-stage LMS-based blind equalizer is adopted [6]. In Fig.6(a), the prior CR loop starts at time A with the center frequency  $\omega_0$, and the high-order QAM constellation is treated as QPSK accompanied by large additive noise. Using the non-decision-directed ML algorithm, the modified Costas loop coarsely corrects the frequency offset between the transmitter and the receiver. At time B, when the adaptive equalizer has roughly converged, the two-fold CR loop switches to the posterior CR state. Meanwhile, the blind adaptive equalizer is initialized, and the DC component of the prior CR loop output,  $\omega_{dc}$, is extracted and added to the center frequency of the posterior CR NCO. The posterior CR loop begins with its center frequency at  $\omega_0 + \omega_{dc}$, and the tracking procedure is shown in Fig.6(b). In the tracking state, the equalizer operates in an ISI-affected converged mode and the CR loop performs decision-directed ML phase estimation. At time C, the equalizer fully converges and the CR loop is switched to decision-directed MMSE phase estimation, which takes advantage of the equalized signal to provide low jitter in the steady state.



Figure 6: The frequency acquiring procedure of the two-fold carrier recovery loop (a) the operation of the prior CR loop (b) the operation of the posterior CR loop
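The stage sequence of Fig.6 can be summarized as a small state machine. The sketch below is an assumption-laden illustration: the switch instants B and C are modeled as fixed symbol indices (`time_b`, `time_c`), whereas the text ties them to equalizer convergence, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    COSTAS = 1    # prior loop, non-decision-directed ML
    DDML = 2      # posterior loop, decision-directed ML
    DDMMSE = 3    # posterior loop, decision-directed MMSE

@dataclass
class TwoFoldController:
    omega_0: float              # nominal NCO center frequency
    time_b: int                 # symbol index of switch B (assumed known)
    time_c: int                 # symbol index of switch C (assumed known)
    stage: Stage = Stage.COSTAS
    center: float = 0.0

    def __post_init__(self):
        self.center = self.omega_0

    def step(self, n: int, loop_dc: float) -> Stage:
        """Advance to symbol n; loop_dc is the DC part of the loop output."""
        if self.stage is Stage.COSTAS and n >= self.time_b:
            # Hand omega_dc over to the posterior loop as its center offset.
            self.center = self.omega_0 + loop_dc
            self.stage = Stage.DDML
        elif self.stage is Stage.DDML and n >= self.time_c:
            self.stage = Stage.DDMMSE
        return self.stage

# Normalized example values, chosen only for illustration.
ctrl = TwoFoldController(omega_0=1.0, time_b=10, time_c=20)
stages = [ctrl.step(n, loop_dc=0.25) for n in range(30)]
print(stages[9], stages[10], stages[20], ctrl.center)
```

The center frequency is updated exactly once, at the transition from the prior to the posterior loop, which mirrors the $\omega_0 + \omega_{dc}$ hand-off described above.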

### 3. VLSI IMPLEMENTATION

As shown in Fig.3, Fig.4, and Fig.5, each of these loops is composed of a phase detector, a pre-filter, a loop filter, and an NCO. Because the three stages of the two-fold CR loop are intentionally similar, the hardware can be shared among the different algorithms. Fig.7 shows the complete two-fold CR loop architecture with the shaping filter and equalizer located in the signal path. The main blocks, including the phase detector, the IIR pre-filter, the PI loop filter, and the NCO, are shared by the prior CR loop and the posterior CR loop. The hardware-sharing carrier recovery signal processing



Figure 7: The complete two-fold CR loop with shaping-filter and equalizer located in the signal path

is further illustrated at the circuit level in Fig.8. Corresponding to (2), (4), and (7), the phase detector receives its input from one of three sources: the IIR LPF output, the equalizer output, or the difference between the equalizer output and the slicer output. The phase detector then performs non-decision-directed ML detection, DDML detection, or DD-MMSE detection, depending on the stage.

Because the sampling frequency differs between stages, the phase detector output is either stored in a register or fed directly to the next stage. In the prior CR stage, the circuit operates at four times the symbol rate, i.e., 21.52 MHz, to speed up carrier acquisition. At this high frequency, a storage element,  $Z^{-1}$, is needed to latch the phase detector output so that the following combinational logic does not receive wrong data. Since the pulse shaping filter down-samples the signal by a factor of two and the fractionally spaced feed-forward equalizer (FFE) further down-samples it to the symbol rate, the data-aided CR loops operate at the 5.38-MHz symbol rate, and the storage element is not required in these stages. Therefore, a MUX selects the pre-filter input either from the storage element or directly from the phase detector.

The critical step in hardware sharing is to multiplex the filter coefficients and the input signals across stages. Because the sampling frequencies differ, the pre-filter coefficients differ between the prior CR loop and the posterior CR loop. The IIR pre-filters share hardware by switching the multiplier coefficients with a state control signal. The cost of the following PI loop filter is reduced by approximating its coefficients by powers of 2 and replacing multiplication with bit shifting. Parallel shifters together with a MUX perform the gear-shifting operation that moves the filter bandwidth toward its optimal value. The NCO control signal produced by the loop filter is either stored or fed directly to the NCO, depending on the current sampling frequency. In the first, non-decision-directed ML period, the signal is latched at the end of the modified Costas loop cycle and provides  $\omega_{dc}$  as the free-running frequency for the following loop cycles. Afterwards, the accumulated control signal maps to a recovered carrier



Figure 8: Open loop of the hardware-sharing CR loop

for demodulation in NCO ROM.
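The shift-based PI loop filter with gear shifting can be modeled in integer arithmetic. The sketch below is only a behavioral illustration of the idea: the shift amounts, the word lengths, and the class name `ShiftPILoopFilter` are assumptions, and Python's `>>` on integers stands in for the arithmetic right shift of a hardware shifter.

```python
class ShiftPILoopFilter:
    """PI loop filter whose gains are powers of two, applied as right shifts."""

    def __init__(self, gears):
        # gears: list of (kp_shift, ki_shift) pairs, widest bandwidth first.
        self.gears = list(gears)
        self.gear = 0
        self.integ = 0                      # integer accumulator

    def shift_gear(self):
        """Select the next, narrower setting; stay on the last one."""
        self.gear = min(self.gear + 1, len(self.gears) - 1)

    def update(self, err: int) -> int:
        kp_shift, ki_shift = self.gears[self.gear]
        self.integ += err >> ki_shift       # integral path: err * 2**-ki_shift
        return self.integ + (err >> kp_shift)   # add the proportional path

# Illustrative shift values: gains 1/4 and 1/64, then 1/16 and 1/256.
lf = ShiftPILoopFilter([(2, 6), (4, 8)])
out_wide = lf.update(1024)      # 1024/64 + 1024/4 = 16 + 256
lf.shift_gear()
out_narrow = lf.update(1024)    # (16 + 1024/256) + 1024/16 = 20 + 64
print(out_wide, out_narrow)     # 272 84
```

Selecting a gear here plays the role of the MUX that picks one of the parallel shifter outputs, so changing the loop bandwidth needs no multiplier.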

### 4. SIMULATION RESULT

The required carrier offset tolerance in a cable modem system is  $\pm 100$ kHz; i.e., carrier recovery in the 4.035-MHz low-IF receiver demands a locking range of approximately  $\pm 2.5\%$. A Verilog RTL simulation over 250,000 symbols shows that the loop meets this frequency tolerance requirement, as shown in Fig.9. As



Figure 9: Simulation result of the two-fold CR loop frequency acquiring behavior

soon as the two-fold CR loop enters the steady state, the carrier jitter is reduced to below -82 dBc, as shown in Fig.10. The acquisition time is 7 ms, which is quite fast for a blind adaptive system [1].



Figure 10: Jitter performance of the two-fold CR loop
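The figures quoted in this section can be cross-checked with simple arithmetic. The short script below uses only numbers stated in the text; the conversion of the 250,000-symbol run into a duration assumes the 5.38-MHz symbol rate and is a derived value, not a reported one.

```python
IF_HZ = 4.035e6            # low-IF carrier frequency
OFFSET_HZ = 100e3          # required frequency offset tolerance
SYMBOL_RATE_HZ = 5.38e6    # symbol rate of the data-aided stages
ACQ_TIME_S = 7e-3          # reported acquisition time
SIM_SYMBOLS = 250_000      # length of the RTL simulation

required_ppm = OFFSET_HZ / IF_HZ * 1e6       # offset relative to the carrier
prior_rate_hz = 4 * SYMBOL_RATE_HZ           # four-times oversampled prior loop
acq_symbols = ACQ_TIME_S * SYMBOL_RATE_HZ    # symbols elapsed during acquisition
sim_time_s = SIM_SYMBOLS / SYMBOL_RATE_HZ    # duration covered by the simulation

print(f"required range : {required_ppm:.0f} ppm")
print(f"prior loop rate: {prior_rate_hz / 1e6:.2f} MHz")
print(f"acquisition    : {acq_symbols:.0f} symbols")
print(f"simulated time : {sim_time_s * 1e3:.1f} ms")
```

The required tolerance works out to roughly 24,800 ppm, which lies inside the $\pm 25000$-ppm pull-in range, and the 7-ms acquisition time corresponds to about 37,660 symbols, well within the simulated span.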

### 5. CONCLUSION

To compensate for the frequency offset in a broadband cable modem system, the carrier recovery must tolerate a large frequency offset while maintaining good jitter performance and affordable hardware cost. When blind equalization is used, the conventional data-aided carrier recovery loop contains a long-latency signal path. The proposed two-fold carrier recovery loop, which is composed of a prior Costas loop and a posterior DD-ML/DD-MMSE loop, operates in a three-stage sequence and shares the same hardware across stages. The pull-in range can be as large as  $\pm 25000$ ppm, and the jitter can be suppressed below -82 dBc. The acquisition time is 7 ms, which is faster than required by the system [1].

### 6. REFERENCES

- [1] K.J. Kerpez, "A comparison of QAM and VSB for hybrid fiber/coax digital transmission", *IEEE Trans. Broadcasting*, Vol.41, No.1, pp.9-16, March 1995.
- [2] Gary Sgrignoli, Wayne Bretl, and Richard Citta, "VSB Modulation Used for Terrestrial and Cable Broadcasts", *IEEE Trans. on Consumer Electronics*, Vol.41, No.3, pp.367-382, Aug. 1995.
- [3] Shigeki Saito and Hiroshi Suzuki, "Performance of QPSK Coherent Detection with Dual-mode Carrier Recovery Circuit for Fast and Stable Carrier Tracking", *IEEE Intl. Conf. on Comm.*, 1988, pp.735-741.
- [4] Beomsup Kim, "Optimal MMSE Gear-shifting Algorithm for the Fast Synchronization of DPLL", *IEEE Intl. Conf. on Comm.*, 1993, pp.172-175.
- [5] J.P. Costas, "Synchronous Communications", *Proc. IRE*, Vol.44, pp.1713-1718, Dec. 1956.
- [6] M.T. Shiue, "Transceiver VLSI Design for High Speed Local Access Modems", Ph.D. dissertation, National Central University, 1999.