# Design of a Novel Architecture for Shift Registers Using Pulsed Latches

M. Kiran Kumar<sup>#1</sup>, Smt. S. Santa Kumari<sup>#2</sup>

#1 Student, M-Tech (VLSI), Department of ECE, AU College of Engineering, Andhra University, Visakhapatnam-530003. Mobile Number: 9494572599

#2 Associate Professor, Department of ECE, AU College of Engineering, Andhra University, Visakhapatnam-530003. Mobile Number: 9290876448

#### **ABSTRACT:**

This paper proposes a low-power and area-efficient shift register using digital pulsed latches. Power dissipation and area are reduced by replacing flip-flops with static differential sense-amp shared pulse latches (SSASPL). The timing problem between pulsed latches is solved by using multiple non-overlap delayed pulsed clock signals instead of the conventional single pulsed clock signal. The shift register needs only a small number of pulsed clock signals because the latches are grouped into several sub shift registers, each with an additional temporary storage latch. A 32-bit shift register using SSASP latches was designed in a 120 nm CMOS process with $V_{DD} = 1.2$ V. The proposed shift register reduces power dissipation and area compared to the conventional shift register built from flip-flops.

**Keywords:** SSASPL, Area-efficient, Flip-flop, Pulsed clock, Pulsed latch, Shift register.

## 1. INTRODUCTION:

Flip-flops are the basic storage elements used extensively in all kinds of digital designs. As the feature size of CMOS technology has scaled down according to Moore's Law, designers have been able to integrate ever more transistors onto the same die. More transistors mean more switching and more power dissipated in the form of heat or radiation. Heat is one of the major packaging challenges today, and it is one of the main concerns of low-power design methodologies and practices. Another driver of low-power research is the reliability of the integrated circuit.
More switching implies that a higher average current is drawn, which increases the probability of reliability problems. Computing is moving from laptops to tablets and even smaller digital systems. As this trend continues without a matching improvement in battery life, more low-power issues will have to be addressed. Current trends will eventually mandate low-power design automation on a very large scale to keep up with the power consumption of today's and future integrated circuits.

The dynamic power consumption of a Very Large Scale Integrated design is given by the general relation $P = CfV^2$, where $C$ is the switched capacitance, $f$ is the switching frequency, and $V$ is the supply voltage. Since power is proportional to the square of the voltage, voltage scaling is the most prominent way to reduce power dissipation. However, voltage scaling requires threshold voltage scaling, which leads to an exponential increase in leakage power. Although several contributions have been made to the art of single-edge-triggered flip-flops, there is still a need for a design that further improves their performance.

## 2. SHIFT REGISTERS:

A shift register is a sequential logic circuit that can be used for the storage or transfer of data. It has many applications in fields such as digital filters [1], communication receivers [2], and image processing ICs. Nowadays, as the size of image data grows with the demand for high-quality images, the word length of the shift registers in image processing ICs increases as well.

The design of a shift register is quite simple. A K-bit shift register is composed of K data flip-flops connected in series. The speed of the flip-flops is less important than their area and power consumption because there is no circuit between the flip-flops in the shift register. The smallest flip-flop is therefore the most suitable one for reducing the area and power dissipation of the shift register.
Nowadays, pulsed latches replace flip-flops in many applications because a pulsed latch is much smaller than a flip-flop [6]. In this paper we implement a shift register using the static differential sense-amp shared pulse latch (SSASPL) and compare it with a shift register implemented with the Power-PC-style flip-flop (PPCFF) [10]. The SSASP latch uses 7 transistors, which is the smallest number of transistors among the pulsed latches. The PPCFF uses 16 transistors, which is the smallest number of transistors among the flip-flops.

## 3. DESIGN OF SSASP LATCH:

The SSASP latch in Fig. 1, which is the smallest latch, is selected. The original SSASPL with 9 transistors is reduced to the 7-transistor SSASPL in Fig. 1 by removing the inverter that generates the complementary data input ($D_b$) from the data input ($D$). In the proposed shift register, the differential data inputs ($D$ and $D_b$) of the latch come from the differential data outputs ($Q$ and $Q_b$) of the previous latch. The SSASPL uses the smallest number of transistors (7), and it consumes the lowest clock power because only a single transistor is driven by the pulsed clock signal.

The SSASPL updates the data with three NMOS transistors ($M_1$-$M_3$) and holds the data with four transistors in two cross-coupled inverters. It requires two differential data inputs ($D$ and $D_b$) and a pulsed clock signal. When the pulsed clock signal is high, the data is updated: the node $Q$ or $Q_b$ is pulled down to ground according to the input data ($D$ and $D_b$). The pull-down current of the NMOS transistors ($M_1$-$M_3$) must be larger than the pull-up current of the PMOS transistors in the inverters.

The SSASPL was implemented and simulated with a $0.12\mu m$ CMOS process at $V_{DD} = 1.2V$. The sizes (W/L) of the three NMOS transistors ($M_1$-$M_3$) are $1\mu m/0.18\mu m$. The sizes of the NMOS and PMOS transistors in the two inverters are all $0.5\mu m/0.18\mu m$.

Fig. 1. Schematic of the SSASPL.

## 4. PROPOSED SHIFT REGISTER:

A master-slave flip-flop using two latches, as in Fig. 2(a), can be replaced by a pulsed latch consisting of a latch and a pulsed clock signal, as in Fig. 2(b) [6]. All pulsed latches share the pulse generation circuit for the pulsed clock signal, so the area and power consumption of the pulsed latch are almost half of those of the master-slave flip-flop. The pulsed latch is therefore an attractive solution for small area and low power dissipation.

Fig. 2. (a) Master-slave flip-flop. (b) Pulsed latch.

However, pulsed latches cannot be used directly in a shift register because of the timing problem shown in Fig. 3. The shift register in Fig. 3(a) consists of several latches and a pulsed clock signal (CLK\_pulse). The operation waveforms in Fig. 3(b) show the timing problem in the shift register. The output signal of the first latch (Q1) changes correctly because the input signal of the first latch (IN) is constant during the clock pulse width ($T_{pulse}$). But the second latch has an uncertain output signal (Q2) because its input signal (Q1) changes during the clock pulse width.

Fig. 3. Shift register with latches and a pulsed clock signal. (a) Schematic. (b) Waveforms.

One solution to the timing problem is to add delay circuits between the latches, as shown in Fig. 4(a). The output signal of each latch is delayed by $T_{delay}$ and reaches the next latch after the clock pulse. As shown in Fig. 4(b), the output signals of the first and second latches (Q1 and Q2) change during the clock pulse width ($T_{pulse}$), but the input signals of the second and third latches (D2 and D3) take on the values of Q1 and Q2 only after the clock pulse. As a result, all latches have constant input signals during the clock pulse and no timing problem occurs between the latches. However, the delay circuits between the latches cause large area and power overheads.

Fig. 4.
Shift register with latches, delay circuits, and a pulsed clock signal. (a) Schematic. (b) Waveforms.

Another solution to the timing problem is to use multiple non-overlap delayed pulsed clock signals, as shown in Fig. 5(a). The delayed pulsed clock signals are generated by passing a pulsed clock signal through delay circuits. Each latch uses a pulsed clock signal that is delayed relative to the pulsed clock signal used by its next latch, so each latch updates its data after its next latch has updated. As a result, each latch has a constant input during its clock pulse and no timing problem occurs between latches. However, this solution also requires many delay circuits.

Fig. 5. Shift register with latches and delayed pulsed clock signals. (a) Schematic. (b) Waveforms.

Fig. 6 shows an example of the proposed shift register. The proposed shift register is divided into M sub shift registers to reduce the number of delayed pulsed clock signals. A 4-bit sub shift register consists of five latches and performs shift operations with five non-overlap delayed pulsed clock signals ($CLK\_pulse(1{:}4)$ and $CLK\_pulse(T)$). In the 4-bit sub shift register #1, four latches store the 4-bit data (Q1-Q4) and the last latch stores 1-bit temporary data (T1), which will be stored in the first latch (Q5) of the 4-bit sub shift register #2.

The five non-overlap delayed pulsed clock signals are generated by the delayed pulsed clock generator in Fig. 7. The sequence of the pulsed clock signals is in the opposite order of the five latches. First, the pulsed clock signal $CLK\_pulse(T)$ updates the latch data T1 from Q4. Then the pulsed clock signals $CLK\_pulse(1{:}4)$ update the four latch data from Q4 to Q1 sequentially. The latches Q2-Q4 receive data from their previous latches Q1-Q3, but the first latch Q1 receives data from the input of the shift register (IN).
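The pulse ordering described above can be checked with a small behavioural model. The following is a minimal sketch in Python, not the authors' circuit or simulation setup: it ignores all analog timing and simply applies the pulses in the stated order (temporary latch first, then the last data latch down to the first). The function names `shift_once` and `simulate` and the 8-bit example size are assumptions made for illustration.

```python
from typing import List


def shift_once(q: List[List[int]], t: List[int], data_in: int) -> None:
    """Apply one clock cycle of non-overlap pulses, in place.

    q[m][i] is data latch i of sub shift register m; t[m] is its temporary latch.
    """
    k = len(q[0])
    # Pulse CLK_pulse(T): each temporary latch copies the last data latch
    # of its own sub shift register.
    for m in range(len(q)):
        t[m] = q[m][k - 1]
    # Pulses for the data latches, last latch first and first latch last.
    for i in range(k - 1, -1, -1):
        for m in range(len(q)):
            if i > 0:
                q[m][i] = q[m][i - 1]  # previous latch, not yet updated in this cycle
            elif m == 0:
                q[m][i] = data_in      # first latch of the whole register reads IN
            else:
                q[m][i] = t[m - 1]     # first latch reads the previous temporary latch


def simulate(bits: List[int], n: int = 8, k: int = 4) -> List[List[int]]:
    """Return the register contents (Q1..QN) after each input bit is shifted in."""
    q = [[0] * k for _ in range(n // k)]
    t = [0] * (n // k)
    history = []
    for b in bits:
        shift_once(q, t, b)
        history.append([x for sub in q for x in sub])
    return history


if __name__ == "__main__":
    print(simulate([1, 0, 1, 1])[-1])  # [1, 1, 0, 1, 0, 0, 0, 0]
```

Because every latch reads a neighbour that is not being updated during the same pulse, the model behaves like an ordinary N-bit shift register, which is the property the reverse pulse order is meant to guarantee.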
The operations of the other sub shift registers are the same as that of sub shift register #1, except that the first latch receives data from the temporary storage latch of the previous sub shift register.

The proposed shift register reduces the number of delayed pulsed clock signals significantly, but it increases the number of latches because of the additional temporary storage latches. As shown in Fig. 7, each pulsed clock signal is generated in a clock-pulse circuit consisting of a delay circuit and an AND gate. When an N-bit shift register is divided into K-bit sub shift registers, the number of clock-pulse circuits is K + 1 and the number of latches is N + N/K. A K-bit sub shift register consisting of K + 1 latches requires K + 1 pulsed clock signals. The number of sub shift registers (M) is N/K, and each sub shift register has one temporary storage latch. Therefore, N/K latches are added as temporary storage latches.

Fig. 6. Schematic of the proposed shift register.

Fig. 7. Delayed pulsed clock generator.

Table 1 shows the transistor comparison of pulsed latches and flip-flops. The transmission gate pulsed latch (TGPL) [7], hybrid latch flip-flop (HLFF) [8], conditional push-pull pulsed latch (CP3L) [9], Power-PC-style flip-flop (PPCFF) [10], Strong-ARM flip-flop (SAFF) [11], data mapping flip-flop (DMFF) [12], conditional precharge sense-amplifier flip-flop (CPSAFF) [13], conditional capture flip-flop (CCFF) [14], and adaptive-coupling flip-flop (ACFF) [15] are compared with the SSASPL [6] used in the proposed shift register. When counting the total number of transistors in the pulsed latches and flip-flops, the transistors that generate the differential clock signals and pulsed clock signals are not included because they are shared by all latches and flip-flops. The SSASPL uses 7 transistors, which is the smallest number of transistors among the pulsed latches [6]–[9].
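As a rough check of the latch and clock-pulse counts given earlier in this section (N + N/K latches and K + 1 clock-pulse circuits), the arithmetic can be sketched as follows. The helper name `shift_register_counts` is hypothetical, the sketch assumes N is a multiple of K, and the transistor figure counts only the latches themselves, in line with the convention above that shared clock circuitry is not included.

```python
from typing import Dict


def shift_register_counts(n: int, k: int, transistors_per_latch: int = 7) -> Dict[str, int]:
    """Count latches and clock-pulse circuits for an N-bit register with K-bit sub registers."""
    if n % k != 0:
        raise ValueError("this sketch assumes N is a multiple of K")
    m = n // k                      # number of sub shift registers (M = N/K)
    latches = n + m                 # N data latches plus one temporary latch per sub register
    return {
        "sub_shift_registers": m,
        "clock_pulse_circuits": k + 1,
        "latches": latches,
        "latch_transistors": latches * transistors_per_latch,
    }


if __name__ == "__main__":
    print(shift_register_counts(32, 4))
```

For N = 32 and K = 4 this gives 8 sub shift registers, 5 clock-pulse circuits, and 40 latches, that is 280 latch transistors with the 7-transistor SSASPL, against 32 × 16 = 512 transistors for 32 PPCFFs when the shared clock circuitry is left out of both counts.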
The PPCFF uses 16 transistors, which is the smallest number of transistors among the flip-flops [10]–[15]. Two 32-bit area-efficient shift registers using the SSASPL and the PPCFF were implemented to show the effectiveness of the proposed shift register. Fig. 8 shows the schematic of the PPCFF, which is a typical master-slave flip-flop composed of two latches.

Fig. 8. Schematic of the PPCFF.

The PPCFF consists of 16 transistors, 8 of which are driven by clock signals. For a fair comparison, it uses minimum-size transistors. The sizes of the NMOS and PMOS transistors are $0.5\mu m/0.18\mu m$ and $1\mu m/0.18\mu m$ respectively. All circuits were implemented with a $0.12\mu m$ CMOS process.

Table 1: Transistor comparison of pulsed latches and flip-flops

| Type | Circuit | Total number of transistors | Number of transistors connected to clock |
|--------------|---------|-----------------------------|------------------------------------------|
| Pulsed latch | SSASPL | 7 | 1 |
| Pulsed latch | TGPL | 10 | 4 |
| Pulsed latch | HLFF | 14 | 2 |
| Pulsed latch | CP3L | 26 | 6 |
| Flip-flop | PPCFF | 16 | 8 |
| Flip-flop | SAFF | 18 | 3 |
| Flip-flop | DMFF | 22 | 5 |
| Flip-flop | CPSAFF | 28 | 5 |
| Flip-flop | CCFF | 28 | 5 |
| Flip-flop | ACFF | 22 | 4 |

## 5. SIMULATION RESULTS:

### 1) SSASP LATCH:

### 2) PPC FLIP FLOP:

### 3) PROPOSED 32 BIT SHIFT REGISTER USING SSASP LATCH FOR K=4:

### 4) CONVENTIONAL 32 BIT SHIFT REGISTER USING PPC FLIP-FLOP FOR K=4:

### Performance Comparison of Shift Registers:

As the following table shows, the proposed 32-bit shift register using the SSASP latch dissipates less power and occupies less area than the conventional 32-bit shift register using the PPC flip-flop.
Table 2: Comparison of 32-bit shift registers using the PPC flip-flop and the SSASP latch

| | 32-bit shift register using PPC flip-flop | 32-bit shift register using SSASP latch |
|------------------------|-------------------------------------------|-----------------------------------------|
| Type | Flip-flop | Pulsed latch |
| Power dissipation [mW] | 58.833 | 0.962 |
| Area [μm²] | 15353.7 | 11933.5 |

## 6. CONCLUSION:

This paper proposed a low-power and area-efficient shift register using pulsed latches. The shift register reduces area and power dissipation by replacing flip-flops with pulsed latches. The timing problem between pulsed latches is solved by using multiple non-overlap delayed pulsed clock signals instead of a single pulsed clock signal. Only a small number of pulsed clock signals is needed because the latches are grouped into several sub shift registers with additional temporary storage latches. A 32-bit shift register was implemented using the SSASP latch for K=4 and compared with a 32-bit shift register implemented using the PPC flip-flop. According to Table 2, the proposed shift register for K=4 dissipates 0.962 mW instead of 58.833 mW and occupies 11933.5 μm² instead of 15353.7 μm² (about 22% less area), so it saves both area and power compared to the conventional shift register with flip-flops.

## **References:**

- [1] P. Reyes, P. Reviriego, J. A. Maestro, and O. Ruano, "New protection techniques against SEUs for moving average filters in a radiation environment," *IEEE Trans. Nucl. Sci.*, vol. 54, no. 4, pp. 957–964, Aug. 2007.
- [2] M. Hatamian *et al.*, "Design considerations for gigabit Ethernet 1000 base-T twisted pair transceivers," *Proc. IEEE Custom Integr. Circuits Conf.*, pp. 335–342, 1998.
- [3] H. Yamasaki and T. Shibata, "A real-time image-feature-extraction and vector-generation VLSI employing arrayed-shift-register architecture," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 2046–2053, Sep. 2007.
- [4] H.-S. Kim, J.-H. Yang, S.-H. Park, S.-T. Ryu, and G.-H. Cho, "A 10-bit column-driver IC with parasitic-insensitive iterative charge-sharing based capacitor-string interpolation for mobile active-matrix LCDs," *IEEE J. Solid-State Circuits*, vol. 49, no. 3, pp. 766–782, Mar. 2014.
- [5] S.-H. W. Chiang and S. Kleinfelder, "Scaling and design of a 16-mega-pixel CMOS image sensor for electron microscopy," in *Proc. IEEE Nucl. Sci. Symp. Conf. Record (NSS/MIC)*, 2009, pp. 1249–1256.
- [6] S. Heo, R. Krashinsky, and K. Asanovic, "Activity-sensitive flip-flop and latch selection for reduced energy," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 9, pp. 1060–1064, Sep. 2007.
- [7] S. Naffziger and G. Hammond, "The implementation of the next generation 64 bit microprocessor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2002, pp. 276–504.
- [8] H. Partovi *et al.*, "Flow-through latch and edge-triggered flip-flop hybrid elements," *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pp. 138–139, Feb. 1996.
- [9] E. Consoli, M. Alioto, G. Palumbo, and J. Rabaey, "Conditional push-pull pulsed latch with 726 fJ·ps energy delay product in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 482–483.
- [10] V. Stojanovic and V. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," *IEEE J. Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, Apr. 1999.
- [11] J. Montanaro *et al.*, "A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
- [12] S. Nomura *et al.*, "A 9.7 mW AAC-decoding, 620 mW H.264 720p 60fps decoding, 8-core media processor with embedded forward-body-biasing and power-gating circuit in 65 nm CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2008, pp. 262–264.
- [13] Y. Ueda *et al.*, "6.33 mW MPEG audio decoding on a multimedia processor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2006, pp. 1636–1637.
- [14] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," *IEEE J. Solid-State Circuits*, vol. 36, pp. 1263–1271, Aug. 2001.
- [15] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, "A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, pp. 338–339.

## **BIO-DATA OF AUTHORS:**

<sup>1</sup>M. Kiran Kumar received his B.E. in Electronics and Communication Engineering from ANITS College, Sangivalasa, in 2013 and is pursuing a post-graduate degree with VLSI specialization at AU College of Engineering (A). His research interest is in VLSI, in particular the design of efficient ICs with reduced leakage power. He is currently doing his Master of Technology thesis work at Andhra University, Visakhapatnam, under the guidance of Smt. S. Santa Kumari, Associate Professor at AU College of Engineering (A).