# A comparative study of CMOS circuit design styles for low-power high-speed VLSI circuits

L. BISDOUNIS†, D. GOUVETAS†, and O. KOUFOPAVLOU†

<sup>†</sup> VLSI Design Laboratory, Department of Electrical & Computer Engineering, University of Patras, 26500 Patras, Greece.

Received 30 June 1997; accepted 28 August 1997.

An important issue in the design of VLSI circuits is the choice of the basic circuit approach and topology for implementing various logic and arithmetic functions. In this paper, several static and dynamic CMOS circuit design styles are evaluated in terms of area, propagation delay and power dissipation. The different design styles are compared by performing detailed transistor-level simulations on a benchmark circuit using HSPICE, and analysing the results in a statistical way. Based on the results of our analysis, some of the trade-offs that are possible during the design phase in order to improve the circuit power–delay product are identified.

## 1. Introduction

Much of the research effort of the past years in the area of digital electronics has been directed towards increasing the speed of digital systems. Recently, the requirement of portability and the moderate improvement in battery performance indicate that power dissipation is one of the most critical design parameters (Chandrakasan and Brodersen 1995). The three most widely accepted metrics to measure the quality of a circuit or to compare various circuit styles are area, delay and power dissipation. Portability imposes a strict limitation on power dissipation while still demanding high computational speeds. Hence, in recent VLSI systems the power–delay product becomes the most essential metric of performance.

The reduction of the power dissipation and the improvement of the speed require optimizations at all levels of the design procedure. In this paper, the proper circuit style and methodology is considered. Since most digital circuitry is composed of simple and/or complex gates, we study the best way to implement logic and arithmetic functions in order to achieve low power dissipation and high speed.

Several circuit design techniques are compared in order to find their efficiency in terms of speed and power dissipation. A review of the existing CMOS circuit design styles is given, describing their advantages and their limitations. Furthermore, a four-bit ripple carry adder for use as a benchmark circuit was designed in a full-custom manner by using the different design styles, and detailed transistor-level simulations using HSPICE (Meta-Software 1996) were performed. Then, a statistical approach based on the produced power and delay measurements is followed, in order to compare the designs.

Conventional static CMOS has been a technique of choice in most processor designs. Alternatively, static pass transistor circuits have also been suggested for low-power applications (Yano *et al.* 1996). Dynamic circuits, when clocked carefully, can also be used in low-power, high speed systems (MIPS Technologies 1994). However, several other design techniques need to be applied and evaluated along with these circuit styles in order to improve the speed and reduce the power dissipation of VLSI systems. In this paper we study eight different CMOS logic styles:

- conventional static CMOS, *CSL*;
- complementary pass-transistor, *CPL* (Yano *et al.* 1990);
- double pass-transistor, *DPL* (Suzuki *et al.* 1993);
- static and dynamic differential cascode voltage switch, *DCVSL* (Heller *et al.* 1984, Chu and Pulfrey 1986);
- static differential split-level, *SDSL* (Pfennings *et al.* 1985);
- dual-rail domino, *DRDL* (Krambeck *et al.* 1982, Oklobdzija and Montoye 1986); and
- enable/disabled CMOS differential, *ECDL* (Lu 1988).

The rest of the paper is structured as follows. In section 2, an overview of the CMOS logic styles, describing their characteristics, is given.
The different styles are compared in terms of speed, power dissipation and silicon area, in section 3. Also, the power–delay product of the designs is considered, due to the importance of this metric in modern VLSI applications. The comparison is based on power dissipation and delay measurements, which are analysed by using a statistical method. Finally, we conclude in section 4.

## 2. CMOS circuit design styles

Since the objective is to investigate the trade-offs that are possible at the circuit level in order to reduce power dissipation while maintaining the overall system throughput, we must first study the parameters that affect the power dissipation and the speed of a circuit.

It is well known that one of the major advantages of CMOS circuits over single-polarity MOS circuits is that the static power dissipation is very small and limited to leakage. However, in some cases, such as bias circuitry and pseudo-nMOS logic, static power is dissipated. Considering that in CMOS circuits the leakage current between the diffusion regions and the substrate is negligible, the two major sources of power dissipation are the switching and the short-circuit power dissipation (Chandrakasan and Brodersen 1995)

$$P = p_{\rm f} C_{\rm L} V_{\rm DD}^2 f + I_{\rm sc} V_{\rm DD} \tag{1}$$

where $p_{\rm f}$ is the node transition activity factor, $C_{\rm L}$ is the load capacitance, $V_{\rm DD}$ is the supply voltage, and $f$ is the switching frequency. $I_{\rm sc}$ is the short-circuit current, which arises when a direct path from power supply to ground is formed for a short period of time during low-to-high or high-to-low node transitions (Bisdounis *et al.* 1996). The switching component of power arises when energy is drawn from the power supply to charge parasitic capacitors. It is the dominant power component in a well designed circuit and it can be lowered by reducing one or more of $p_{\rm f}$, $C_{\rm L}$, $V_{\rm DD}$ and $f$, while retaining the required speed and functionality.
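As an illustration of equation (1), the following minimal sketch evaluates the switching and short-circuit terms for a single node. The numerical values (activity factor, load capacitance, short-circuit current) are hypothetical and are not taken from the paper; only the 5 V supply and the 50 MHz frequency correspond to values used later in the experiments.

```python
def switching_power(p_f, c_load, v_dd, freq):
    """Switching term of equation (1): p_f * C_L * V_DD^2 * f, in watts."""
    return p_f * c_load * v_dd ** 2 * freq


def total_power(p_f, c_load, v_dd, freq, i_sc):
    """Equation (1): switching power plus short-circuit power I_sc * V_DD."""
    return switching_power(p_f, c_load, v_dd, freq) + i_sc * v_dd


# Hypothetical node parameters, chosen only for illustration.
P_F = 0.5          # node transition activity factor (assumed)
C_LOAD = 100e-15   # load capacitance in farads (assumed)
FREQ = 50e6        # switching frequency in hertz

p_5v = switching_power(P_F, C_LOAD, 5.0, FREQ)
p_3v3 = switching_power(P_F, C_LOAD, 3.3, FREQ)

# The switching term scales with the square of the supply voltage.
ratio = p_3v3 / p_5v

print(f"switching power at 5.0 V: {p_5v * 1e6:.2f} uW")
print(f"switching power at 3.3 V: {p_3v3 * 1e6:.2f} uW")
print(f"ratio: {ratio:.4f}")
```

The ratio depends only on the two supply voltages, which is the quadratic dependence that motivates supply-voltage scaling.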
Even though the exact analysis of circuit delay is quite complex, a simple first-order derivation can be used (Bellaouar and Elmasry 1995, Sakurai and Newton 1990) in order to show its dependence on the circuit parameters

$$T_{\rm d} \propto \frac{C_{\rm L} V_{\rm DD}}{K (V_{\rm DD} - V_{\rm TH})^{\alpha}} \tag{2}$$

where $K$ depends on the transistors' aspect ratio ($W/L$) and other device parameters, $V_{\rm TH}$ is the transistor threshold voltage, and $\alpha$ is the velocity saturation index, which varies between 1 and 2 ($\alpha$ is equal to 1.4 for the 1.5 $\mu$m process technology that is used in the experiments of the next section).

Since a quadratic improvement in power dissipation may be obtained by lowering the supply voltage (equation (1)), many researchers have investigated the effects of lowering the supply voltage in VLSI circuits. Unfortunately, although reducing the supply voltage reduces power, the delay increases (equation (2)), with the effect being more drastic at voltages close to the threshold voltage (Sun and Tsui 1995). Equations (1) and (2) indicate that by reducing the node parasitic capacitance in a CMOS circuit, the power dissipation is reduced and the circuit speed is increased.

In the following, the circuit design styles are described using the full adder circuit, which is the most commonly used cell in arithmetic units. Also, their characteristics in terms of power dissipation and delay are investigated.

### 2.1. Conventional static CMOS logic (CSL)

Conventional static CMOS logic is used in most chip designs in recent VLSI applications. The schematic diagram of a conventional static CMOS full adder cell is illustrated in figure 1. The signals marked with an overbar are the complementary signals. The pMOSFET network of each stage is the dual network of the nMOSFET one. In order to obtain a reasonable conducting current to drive capacitive loads, the width of the transistors must be increased.
This results in increased input capacitance and therefore high power dissipation and propagation delay.

Figure 1. Conventional static CMOS full adder.

### 2.2. Complementary pass-transistor logic (CPL)

The main concept behind CPL (Yano *et al.* 1990) is the use of only an nMOSFET network for the implementation of logic functions. This results in low input capacitance and high speed operation. The schematic diagram of the CPL full adder circuit is shown in figure 2. Because the high voltage level of the pass-transistor outputs is lower than the supply voltage level by the threshold voltage of the pass transistors, the signals have to be amplified by using CMOS inverters at the outputs. CPL circuits consume less power than conventional static circuits because the logic swing of the pass transistor outputs is smaller than the supply voltage level. The switching power dissipated from charging or discharging the pass transistor outputs is given by

$$P_{\rm D} = V_{\rm DD} V_{\rm swing} C_{\rm node} f \tag{3}$$

where $V_{\rm swing} = V_{\rm DD} - V_{\rm THn}$. In the case of conventional static CMOS circuits the voltage swing at the output nodes is equal to the supply voltage, resulting in higher power dissipation. To minimize the static current due to the incomplete turn-off of the pMOSFET in the output inverters, a weak pMOSFET feedback device can also be added in the CPL circuits of figure 2, in order to pull the pass-transistor outputs to full supply voltage level. However, this will increase the output node capacitance, leading to higher switching power dissipation and higher propagation delay.

Figure 2. Complementary pass-transistor full adder.

### 2.3. Double pass-transistor logic (DPL)

DPL (Suzuki *et al.* 1993) is a modified version of CPL. The circuit diagram of the DPL full adder is given in figure 3. In DPL circuits full-swing operation is achieved by simply adding pMOSFET transistors in parallel with the nMOSFET transistors.
Hence, the problems of noise margin and speed degradation at reduced supply voltages, which are caused in CPL circuits due to the reduced high voltage level, are avoided. However, the addition of pMOSFETs results in increased input capacitances.

Figure 3. Double pass-transistor full adder.

### 2.4. Static differential cascode voltage switch logic (SDCVSL)

Static DCVSL (Heller *et al.* 1984) is a differential style of logic requiring both true and complementary signals to be routed to gates. Figure 4 shows the circuit diagram of the static DCVSL full adder. Two complementary nMOSFET switching trees are connected to cross-coupled pMOSFET transistors. Depending on the differential inputs, one of the outputs is pulled down by the corresponding nMOSFET network. The differential output is then latched by the cross-coupled pMOSFET transistors. Since the inputs drive only the nMOSFET transistors of the switching trees, the input capacitance is typically two or three times smaller than that of the conventional static CMOS logic.

Figure 4. Static differential cascode voltage switch full adder.

### 2.5. Static differential split-level logic (SDSL)

A variation of the differential logic described above is the static DSL (Pfennings *et al.* 1985). The SDSL full adder circuit diagram is illustrated in figure 5. Two nMOSFET transistors with their gates connected to a reference voltage ($V_{\rm ref} = (V_{\rm DD}/2) + V_{\rm THn}$, where $V_{\rm THn}$ is the nMOSFET threshold voltage) are added to reduce the logic swing at the output nodes. The output nodes are clamped at half of the supply voltage level. Thus, the circuit operation becomes faster than in standard DCVSL circuits. However, due to the incomplete turn-off of the cross-coupled pMOSFET transistors, SDSL circuits dissipate high static power. Also, the addition of two extra nMOSFET transistors per gate results in area overhead.

Figure 5. Static differential split-level full adder.

### 2.6. Dual-rail domino logic (DRDL)

Dual-rail domino logic (Krambeck *et al.* 1982, Oklobdzija and Montoye 1986) is a precharged circuit technique which is used to improve the speed of CMOS circuits. Figure 6 shows a dual-rail domino full adder cell. A domino gate consists of a dynamic CMOS circuit followed by a static CMOS buffer. The dynamic circuit consists of a pMOSFET precharge transistor and an nMOSFET evaluation transistor with the clock signal (CLK) applied to their gate nodes, and an nMOSFET logic block which implements the required logic function. During the precharge phase (CLK = 0) the output node of the dynamic circuit is charged through the pMOSFET precharge transistor to the supply voltage level. The output of the static buffer is discharged to ground. During the evaluation phase (CLK = 1) the evaluation nMOSFET transistor is *ON*, and depending on the logic performed by the nMOSFET logic block, the output of the dynamic circuit is either discharged or stays precharged.

Figure 6. Dual-rail domino full adder.

Since in dynamic logic every output node must be precharged every clock cycle, some nodes are precharged only to be immediately discharged again as the node is evaluated, leading to higher switching power dissipation (Chandrakasan and Brodersen 1995). One major advantage of the dynamic, precharged design styles over the static styles is that they eliminate the spurious transitions and the corresponding power dissipation. Also, dynamic logic does not suffer from the short-circuit currents which flow in static circuits when a direct path from power supply to ground is formed. However, in dynamic circuits, additional power is dissipated by the distribution network and the drivers of the clock signal.

### 2.7. Dynamic differential cascode voltage switch logic (DDCVSL)

Dynamic DCVSL (Chu and Pulfrey 1986) is a combination of the domino logic and the static DCVSL. The circuit diagram of the dynamic DCVSL full adder is given in figure 7.
The advantage of this style over domino logic is the ability to generate any logic function. Domino logic can only generate non-inverted forms of logic. For example, in the design of a ripple carry adder, two cells must be designed for the carry propagation, one for the true carry signal and another for the complementary one (in figure 6 only the cell for the true carry signal is shown, but the one for the complementary signal is also required). Using DCVSL to design dynamic circuits will eliminate p-logic gates because of the inherent availability of complementary signals. The p-logic gates usually cause long delay times and consume large areas.

Figure 7. Dynamic differential cascode voltage switch full adder.

### 2.8. Enable/disabled CMOS differential logic (ECDL)

ECDL (Lu 1988) is a self-timed differential logic which is used in the case of implementing logic functions using iterative networks. It uses extra signals to indicate the beginning and ending of a function evaluation, in order to improve the circuit speed. The structure of the ECDL full adder is illustrated in figure 8. The signals $Done_{i-1}$ and $Done_i$ are the input and output self-timing control signals. During the disabled state, $Done_{i-1}$ has a value of logic one, which discharges both the true and the complementary outputs to logic zero. During the enabled state, $Done_{i-1}$ changes to logic zero and the topmost pMOSFET transistor (figure 8) is ON to provide power to the inverters below. Then, depending on the logic of the differential nMOSFET network, a path exists from one of the output nodes to ground, holding that node to ground while leaving the other output node to be driven to logic one.

Figure 8. Enable/disable CMOS differential full adder.

One major advantage of the ECDL circuits is that there is no minimum clocking frequency requirement. However, ECDL circuits suffer from extra power dissipation due to the inverters which are needed to change the polarity of the output nodes.
Also, their complex pull-up circuitry occupies extra silicon area.

## 3. Power, delay and area comparisons of design styles

The experimental results described in this section were obtained using a four-bit ripple carry adder. A general block diagram of the adder is illustrated in figure 9. The circuit was designed in a full custom manner for all the design styles described in the previous section, using a 1.5 µm CMOS process technology. The channel width of the transistors was 4.8 µm for the nMOSFETs, and 9.6 µm for the pMOSFETs. The design was based on the full adder cells presented in figures 1 to 8. Figure 10 shows the layout of the conventional static four-bit ripple carry adder, as an example of the designed circuits.

Figure 9. Block diagram of the four-bit ripple carry adder.

Figure 10. Layout of the conventional static four-bit ripple carry adder.

In table 1 the adder silicon area and the number of the transistors for each design style are given. Although no extensive attempts were made to minimize area, the numbers presented are a good indication of the relative areas of the eight adder implementations, which account not only for the transistors, but for the interconnections as well. For example, even though the DPL adder has fewer transistors than the CSL one, it has longer interconnections, which is reflected by its large area. Dynamic design styles and styles which use control signals (such as ECDL) occupy extra area for the routeing of the clock and the control signals.

| Design style | Adder area ($\times 10^4\ \mu{\rm m}^2$) | No. of transistors |
|--------------|------------------------------------------|--------------------|
| CSL          | 5.42                                     | 144                |
| CPL          | 4.46                                     | 88                 |
| DPL          | 6.52                                     | 136                |
| SDCVSL       | 5.19                                     | 114                |
| SDSL         | 6.39                                     | 130                |
| DRDL         | 6.48                                     | 146                |
| DDCVSL       | 7.22                                     | 154                |
| ECDL         | 7.65                                     | 166                |

Table 1. Area and number of transistors of the four-bit ripple carry adder implementations.

The smallest area is occupied by the CPL circuit, which has fewer transistors and shorter interconnections than the other adder implementations.

After the design of the layouts, circuit equivalents were extracted for a detailed circuit simulation using HSPICE (Meta-Software 1996) to obtain the power and delay measurements. In our experiments, a supply voltage of 5 V is used. All measurements were obtained with each input supplied through a driver consisting of two minimum-sized inverters in series, and each output node driving a minimum-sized inverter load.

The estimation of power dissipation is a difficult problem because of its data dependency, and has received a lot of attention (Najm 1994). Some direct simulative power estimation methods have been proposed (Kang 1986, Yacoub and Ku 1989), which are expensive in terms of time. Also, several power estimation methods have been proposed in which probabilities are used to solve the pattern-dependence problem. However, in order to achieve good accuracy, the spatial and temporal correlations between internal nodes should be modelled (Devadas *et al.* 1992, Schneider *et al.* 1996). An alternative technique is the use of statistical methods (Burch *et al.* 1993, Xakellis and Najm 1994), which combine the accuracy of simulation-based techniques with the speed of probabilistic approaches.

In this paper, the statistical approach proposed by Burch *et al.* (1993) is used in order to estimate the power dissipation of our designs. Using the powermeter subcircuit proposed by Kang (1986), HSPICE can measure the average power consumed by a circuit given a set of input transitions and a time interval. In this method, the inputs are randomly generated and statistical mean estimation techniques are used to determine the final result. In our case, for each adder design we use 200 independent, pseudorandom input transition samples, and the power consumed for each sample is monitored by HSPICE.
All simulations were carried out at 27°C, with an input frequency of 50 MHz in order to accommodate the slowest adder. The power dissipation measurements do not include the power consumed by the drivers and the loads. In figure 11, the probability distributions of the power dissipation per addition derived from the measurements, for the eight adder implementations, are shown.

Figure 11. Power dissipation histograms.

Since the data inputs are independent, power can be approximated to be normally distributed (Burch *et al.* 1993). This conclusion can also be drawn from the curves of figure 11. Hence, the mean power dissipation is given by

$$\bar{P} \pm t_{\alpha/2} \frac{s}{\sqrt{N}} \tag{4}$$

where $\bar{P}$ is the sample average, $s$ is the sample standard deviation, $N$ is the number of samples, and $t_{\alpha/2}$ is obtained from the t-distribution for a $100(1 - \alpha)\%$ confidence interval (Miller *et al.* 1990). The mean power dissipation of the eight adder implementations using the simulation results and equation (4) is given in table 2.
The number of the required samples is extracted using the stopping criterion (Burch *et al.* 1993) of the above method

$$\frac{t_{\alpha/2}\, s}{\bar{P} \sqrt{N}} < e \tag{5}$$

where $e$ is the desired percentage error in the power estimate. The error in our statistical power analysis for $N = 200$ and a 95% confidence interval ($t_{\alpha/2} = 1.96$) is less than 7%. In table 2, the percentage error for each adder design is also given. For the last four designs the error is quite small because of the high normality of their distributions, which leads to a small standard deviation.

| Adder design style | Mean power dissipation per addition (mW) | Statistical error (%) | Worst case delay (ns) | Mean power–delay product per addition (pJ) |
|--------------------|------------------------------------------|-----------------------|-----------------------|--------------------------------------------|
| CSL                | $0.422 \pm 0.0302$                       | 6.1                   | 6.125                 | $2.585 \pm 0.1850$                         |
| CPL                | $0.238 \pm 0.0208$                       | 4.8                   | 4.042                 | $0.962 \pm 0.0841$                         |
| DPL                | $0.305 \pm 0.0263$                       | 6.9                   | 3.345                 | $1.020 \pm 0.0879$                         |
| SDCVSL             | $0.432 \pm 0.0362$                       | 6.5                   | 7.986                 | $3.450 \pm 0.2891$                         |
| SDSL               | $2.383 \pm 0.0129$                       | 0.6                   | 4.606                 | $10.976 \pm 0.0594$                        |
| DRDL               | $0.641 \pm 0.0091$                       | 1.4                   | 2.909                 | $1.865 \pm 0.0265$                         |
| DDCVSL             | $0.957 \pm 0.0074$                       | 0.8                   | 3.453                 | $3.304 \pm 0.0255$                         |
| ECDL               | $1.721 \pm 0.0096$                       | 0.6                   | 2.892                 | $4.977 \pm 0.0278$                         |

Table 2. Power dissipation, delay and power–delay product of the four-bit ripple carry adder implementations.

The delay of each design was measured directly from the output waveforms generated by simulating the adder using HSPICE for the worst case inputs, that is, inputs which cause the carry to ripple from the least significant bit position to the most significant bit position. The worst case delays of the eight adder designs are listed in the fourth column of table 2.
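The mean estimate of equation (4) and the stopping criterion of equation (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce table 2: the function names are hypothetical, and the Gaussian generator `fake_power_sample` is only a stand-in for one HSPICE power measurement. The value 1.96 is the one quoted in the text for a 95% confidence interval.

```python
import math
import random
import statistics

T_CRITICAL = 1.96  # value quoted in the text for a 95% confidence interval


def mean_with_interval(samples, t_crit=T_CRITICAL):
    """Return (sample mean, interval half-width) as in equation (4)."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation
    return mean, t_crit * s / math.sqrt(n)


def relative_error(samples, t_crit=T_CRITICAL):
    """Left-hand side of equation (5): half-width divided by the mean."""
    mean, half_width = mean_with_interval(samples, t_crit)
    return half_width / mean


def sample_until_converged(draw_sample, max_error,
                           min_samples=30, max_samples=10000):
    """Keep drawing samples until criterion (5) holds or the cap is reached."""
    samples = [draw_sample() for _ in range(min_samples)]
    while relative_error(samples) >= max_error and len(samples) < max_samples:
        samples.append(draw_sample())
    return samples


# Hypothetical stand-in for a per-sample power measurement in mW.
rng = random.Random(0)


def fake_power_sample():
    return rng.gauss(0.4, 0.1)


samples = sample_until_converged(fake_power_sample, max_error=0.05)
mean, half_width = mean_with_interval(samples)
print(f"N = {len(samples)}, mean = {mean:.4f} +/- {half_width:.4f} mW")
```

In the paper the sample count is fixed at N = 200 and the resulting error is reported; the loop above shows the alternative use of the same criterion, where a target error is fixed and N is allowed to grow.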
As mentioned in section 1, the most essential metric of performance in modern VLSI applications is the power–delay product. By multiplying each power measurement by the worst case delay, we found the mean power–delay product of the designs using a method similar to that used for the mean power dissipation. Hence, the mean power–delay product is given by

$$\overline{P \times D} \pm t_{\alpha/2} \frac{s}{\sqrt{N}} \tag{6}$$

where $\overline{P \times D}$ is the sample average power–delay product, and $s$ is here the standard deviation of the power–delay product samples. The mean power–delay product values of the eight adder designs are listed in table 2, and the probability distributions of the power–delay product are shown in figure 12.

Figure 12. Power–delay product histograms.

As we can see in the probability distributions of figure 11, the curves of the dynamic designs (DRDL and DDCVSL) are shifted to the right, because of the power dissipated due to the precharge cycles. The same phenomenon occurs in the ECDL adder due to the power dissipation of its disabled state. The shift to the right of the SDSL adder curve is caused by the high static power which is dissipated due to the incomplete turn-off of the cross-coupled pMOSFET transistors. The other static design styles are more power efficient than the dynamic circuits. The static DCVSL circuit consumes more power than the conventional static circuit due to the difference between the charging and discharging times of its output nodes. The asymmetry in the rise and fall times of the potential at these output nodes prolongs the period of current flow through the latch during the transient state, thus increasing the power dissipation.

It can be seen from the results of table 2 that the dynamic circuits exhibit an increase in speed compared with the conventional static circuit. Comparing the dynamic logic styles, domino logic has better power–delay product characteristics (figure 12).
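As a small cross-check, the mean power–delay products in table 2 can be recomputed from the mean power and worst case delay columns, since 1 mW × 1 ns = 1 pJ. The sketch below uses only the values printed in table 2; the variable names are illustrative, and agreement is expected only to within the rounding of the tabulated inputs.

```python
# (mean power in mW, worst case delay in ns), copied from table 2.
TABLE_2 = {
    "CSL": (0.422, 6.125),
    "CPL": (0.238, 4.042),
    "DPL": (0.305, 3.345),
    "SDCVSL": (0.432, 7.986),
    "SDSL": (2.383, 4.606),
    "DRDL": (0.641, 2.909),
    "DDCVSL": (0.957, 3.453),
    "ECDL": (1.721, 2.892),
}

# Power-delay product in pJ: mW * ns = 1e-3 W * 1e-9 s = 1e-12 J.
pdp = {style: power * delay for style, (power, delay) in TABLE_2.items()}

# Styles ordered from the lowest to the highest power-delay product.
ranking = sorted(pdp, key=pdp.get)

for style in ranking:
    print(f"{style:7s} {pdp[style]:7.3f} pJ")
```

With these inputs the two pass-transistor styles come first in the ordering and SDSL comes last, in line with the last column of table 2.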
The SDSL circuit is faster than the standard SDCVSL circuit, due to the reduced logic swing at the output nodes, but at the cost of high static power dissipation. The ECDL circuit is the fastest one, but consumes high switching power due to the inverters which are needed to change the polarity of the outputs.

The design styles which use pass-transistor logic (CPL and DPL) are the best in terms of power dissipation. The CPL circuit consumes less power than the DPL one, because of its lower parasitic capacitance. On the other hand, the DPL circuit is faster than the CPL one, because the addition of pMOSFET transistors in parallel with the nMOSFET transistors results in higher circuit drivability. Also, DPL avoids the problems of noise margin and speed degradation at reduced supply voltages which are caused in CPL circuits. As shown in figure 12 and in table 2, the two styles exhibit similar power–delay product characteristics, and they are the most efficient for low-power and high-speed applications.

The mean power dissipation and the propagation delay values of the eight adder implementations are summarized in figure 13. The fast adder circuits lie to the left of the figure, and those with low power consumption lie towards the bottom of the figure.

Figure 13. Power dissipation versus delay of the adder implementations.

## 4. Conclusions

In this paper, we have compared several CMOS circuit design styles based on various performance criteria such as area, delay, power dissipation and power–delay product. A four-bit ripple carry adder was used as the benchmark circuit. All the circuits have been designed in a full-custom manner, and simulated using HSPICE. A statistical approach was used in order to analyse the simulation results. It has been shown that the circuits which use pass-transistor logic (CPL and DPL) exhibit better power and power–delay product characteristics compared with other design styles.
## References

- Bellaouar, A., and Elmasry, M., 1995, *Low-Power Digital VLSI Design: Circuits and Systems* (Boston, Massachusetts: Kluwer Academic Publishers).
- Bisdounis, L., Koufopavlou, O., and Nikolaidis, S., 1996, Accurate evaluation of CMOS short-circuit power dissipation for short-channel devices. *Proceedings of IEEE International Symposium on Low Power Electronics and Design*, pp. 189–192.
- Burch, R., Najm, F., Yang, P., and Trick, T., 1993, A Monte Carlo approach for power estimation. *IEEE Transactions on VLSI Systems*, **1**, 63–71.
- Chandrakasan, A., and Brodersen, R., 1995, *Low Power Digital Design* (Boston, Massachusetts: Kluwer Academic Publishers).
- Chu, K., and Pulfrey, D., 1986, Design procedures for differential cascode voltage switch circuits. *IEEE Journal of Solid-State Circuits*, **21**, 1082–1087.
- Devadas, S., Keutzer, K., and White, J., 1992, Estimation of power dissipation in CMOS combinational circuits using Boolean function manipulation. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, **11**, 373–383.
- Heller, L., Griffin, W., Davis, J., and Thoma, N., 1984, Cascode voltage switch logic: a differential CMOS logic family. *Proceedings of IEEE International Solid-State Circuits Conference*, pp. 16–17.
- Kang, S., 1986, Accurate simulation of power dissipation in VLSI circuits. *IEEE Journal of Solid-State Circuits*, **21**, 889–891.
- Krambeck, R., Lee, C., and Law, H., 1982, High-speed compact circuits with CMOS. *IEEE Journal of Solid-State Circuits*, **17**, 614–619.
- Lu, S., 1988, Implementation of iterative networks with CMOS differential logic. *IEEE Journal of Solid-State Circuits*, **23**, 1013–1017.
- Meta-Software, 1996, *HSPICE User's Manual, Version 96.1* (Campbell, California: Meta-Software).
- Miller, I., Freund, J., and Johnson, R., 1990, *Probability and Statistics for Engineers* (Englewood Cliffs, New Jersey: Prentice Hall).
- MIPS Technologies, 1994, *R4200 Microprocessor Product Information* (Mountain View, California: MIPS Technologies Inc).
- Najm, F., 1994, A survey of power estimation techniques in VLSI circuits. *IEEE Transactions on VLSI Systems*, **2**, 446–455.
- Oklobdzija, V., and Montoye, R., 1986, Design-performance trade-offs in CMOS-domino logic. *IEEE Journal of Solid-State Circuits*, **21**, 304–309.
- Pfennings, L., Mol, W., Bastiaens, J., and van Dijk, J., 1985, Differential split-level CMOS logic for subnanosecond speeds. *IEEE Journal of Solid-State Circuits*, **20**, 1050–1055.
- Sakurai, T., and Newton, A., 1990, Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas. *IEEE Journal of Solid-State Circuits*, **25**, 585–594.
- Schneider, P., Schlichtmann, U., and Wurth, B., 1996, Fast power estimation of large circuits. *IEEE Design and Test of Computers Magazine*, **13**, 70–78.
- Sun, S., and Tsui, P., 1995, Limitation of CMOS supply-voltage scaling by MOSFET threshold voltage. *IEEE Journal of Solid-State Circuits*, **30**, 947–949.
- Suzuki, M., Ohkubo, N., Shinbo, T., Yamanaka, T., Shimizu, A., Sasaki, K., and Nakagome, Y., 1993, A 1.5-ns 32-b CMOS ALU in double pass-transistor logic. *IEEE Journal of Solid-State Circuits*, **28**, 1145–1151.
- Xakellis, M., and Najm, F., 1994, Statistical estimation of the switching activity in digital circuits. *Proceedings of ACM/IEEE Design Automation Conference*, pp. 728–733.
- Yacoub, G., and Ku, W., 1989, An enhanced technique for simulating short-circuit power dissipation. *IEEE Journal of Solid-State Circuits*, **24**, 844–847.
- Yano, K., Sasaki, Y., Rikino, K., and Seki, K., 1996, Top-down pass-transistor logic design. *IEEE Journal of Solid-State Circuits*, **31**, 792–803.
- Yano, K., Yamanaka, T., Nishida, T., Saito, M., Shimohigashi, K., and Shimizu, A., 1990, A 3.8-ns CMOS 16 × 16-b multiplier using complementary pass-transistor logic. *IEEE Journal of Solid-State Circuits*, **25**, 388–395.