Nanoscale cryptography: opportunities and challenges

While most of the electronics industry is dependent on the ever-decreasing size of lithographic transistors, this scaling cannot continue indefinitely. To improve the performance of the integrated circuits, new emerging and paradigms are needed. In recent years, nanoelectronics has become one of the most important and exciting forefront in science and engineering. It shows a great promise for providing us in the near future with many breakthroughs that change the direction of technological advances in a wide range of applications. In this paper, we discuss the contribution that nanotechnology may offer to the evolution of cryptographic hardware and embedded systems and demonstrate how nanoscale devices can be used for constructing security primitives. Using a custom set of design automation tools, it is demonstrated that relative to a conventional 45-nm CMOS system, performance gains can be obtained up to two orders of magnitude reduction in area and up to 50 % improvement in speed.


Background
Since the beginning of the seventies, microelectronics industry has followed Moore's law, doubling processing power every 18 months. This performance increase has been obtained mainly by decreasing the size of circuit features obtained by optimization and improvement of existing technology. Based on the SIA (Semiconductor Industry Association) roadmap, it seems likely that CMOS will remain the mainstream of IC technology even after 2014 and has many years to go (http://www.semiconductors.org/, http://www.itrs.net/). However, it is clear that the Moore's Law exponential increases in density and performance cannot be maintained for ever and with ongoing shrinking dimensions, a MOS transistor will ultimately cease to operate as a proper field-effect-transistor. The main reason is that at gate length around or below 10 nm, the sensitivity of transistor parameters, (most importantly, the gate voltage threshold) of silicon field-effect transistors (MOSFETs) to inevitable fabrication spreads grows exponentially. As a result, the gate length should be controlled with a few-angstrom accuracy, far beyond even the longterm projections of the semiconductor industry. On the other hand, the technical limit to interconnect complexity is much harder to define [1]. There is a great preference in the semiconductor industry for the system-on-a-chip with as many different functional silicon-based (and perhaps other) devices on one silicon chip. Nanotechnologies which may be integrated onto a CMOS chip would be the preferred route [2,3]. If the nanoelectronic field wants to mature to this stage, there is a necessity to bring novel devices more on a par with CMOS by developing the necessary fabrication processes, simulation tools and design rules that are required for any industrial electronic manufacturing process. As a result, researchers in any one particular area need to reach beyond their expertise in order to appreciate the broader implications of nanotechnology, and learn how to contribute to this exciting new field. One of the most important and valuable potential application areas may be information security and integrated implementation of cryptographic systems [4][5][6][7][8][9][10]. Indeed, the main threat to a cryptographic token in the real world is not the cryptanalysis of the actual algorithm, but rather the exploration of weaknesses of the implementation of the algorithm in the real world [11]. These are mainly due to the inherent weaknesses of CMOS technology in which CMOS transistors leak information related to sensitive information when they are switched "ON" or "OFF" [12]. However, nanoelectronic may change this situation. The new hybrid technology paradigm will certainly require rethinking of the current circuit architectures. This work describes a radically different yet efficient approach based on the combination of CMOS circuits along with nanoelements for the implementation of the security circuits which may provide a significantly improved performance [13]. Instead of insisting on existing solutions, we focus on some new solutions that may have the highest potential for conceptual breakthroughs. Based on this fact, we are proposing that digital hybrid CMOS-Nanoelectronic reconfigurable architecture [14] is a potential optimum platform to realize encryption algorithms. It will be demonstrated that such a design result in a circuit which is faster and strikingly denser than its CMOS counterpart. To better demonstrate the capabilities of the proposed approach, we have implemented the basic modules of the Secure Hash Algorithm [15] on cell-FPGA-like hybrid semiconductor-nanowirenanodevice platform which combines a CMOS transistor stack and two levels of parallel nanowires using some already available CAD-tools. We believe that this work will lead to a paradigm shift incorporating security fully into the design and development of future generations of nanoscale computing hardware. The contribution of this paper is as follows: 1. This paper presents the first crossbar based nanoscale computing platform (nano-architecture) for the implementation of encryption algorithms. This uniform array of nanowires in a multi-layer CMOS-Nano crossbar structure provides manufacturability by regularity, reliability (fault tolerance) by reconfigurability, and performance by logic density. Although, some works have addressed the use of nanowire crossbar architecture for logic implementation [2] but their performance cannot be easily evaluated and compared to MOSFET FPGAs. 2. So far, mainly 128-bit key encryption has been designed and implemented by CMOS technology, primarily due to area, speed and power consumption problems associated with the implementation of the encryption algorithm with longer keys. The results we have obtained demonstrate that longer keys can be easily realized by hybrid CMOS-Nano FPGA architecture, making the implementation much more robust against unauthorized deciphering and cryptanalytic attacks.
The reminder of paper is organized as follows: Section II briefly explains reconfigurable hybrid CMOS/Nano technology. Section III illustrates the Secure Hash Algorithm very briefly. Section IV presents the performance results of the implementation of secure hash algorithm basic modules on hybrid CMOS/Nano platform. Finally, in the conclusion we briefly summarize the results of our discussions.

Reconfigurable hybrid CMOS/nanodevice circuits
Traditional existing microelectronic-based approaches might not able to meet all the performance requirements because of the long term costs and the inherent limitations of CMOS technology [16]. So far, mainly the meaning of nanoscale circuits has been the same CMOS circuits that have been smaller. However, to improve the performance of integrated circuits other emerging and paradigms are needed. A feasible yet efficient scenario is the integration of silicon with nanoelectronics, i.e., a mixed CMOS/nano system. This approach would allow a smooth transition and permits leveraging the beneficial aspects of both technologies. Currently, it is not possible Fig. 1 General schematics of a mixed CMOS-Nano system Fig. 2 Left: crossing nanowires separated by a configurable nanojunction layer form 'junctions' that may be electrically configured as electronic devices. Right: nanowire crossbars connected to a CMOS chip via metallic 'pins' on the CMOS surface [28] to make an electronic circuit by using only nanoscale devices, but rather combining it with CMOS circuits may be a better idea [17]. The possibility of mixed CMOS/Nano circuits permits using the best aspects of both technologies simultaneously, while the undesired aspects of a technology can be compensated by the partner technology. Hence, Instead of completely replacing the CMOS technology, the common belief is that the future chips for nanotechnology should be built as a hybrid using both CMOS and nanomaterials (such as CarbonNanoTube bundle interconnects and nanotube/nanowire crossbar memories), thus taking advantages of both mature CMOS technology and novel advances in nanotechnology. The basic idea for such circuits is to combine the advantages of current CMOS technology including flexibility and reasonable fabrication yield with nanoscale devices, assembled on a pre-fabricated nanowire fabric, enabling very high function density at modest fabrication cost. Such architectures allow for significant design versatility. For example, while nano portion is restricted to regular structures, the CMOS portion can be used for the implementation of any arbitrary logic circuit. Perhaps one of the most promising structure for such circuits is an FPGA-like architecture combining a CMOS stack and two-level nanowire crossbar with nanodevices formed at each nanowire cross point together with the ability to reconfigure the circuits around nanodevices defects. Such reconfiguration is essential for any mixed CMOS-Nano system because the lack of enough alignment accuracy and also due to the fact that such a fabrication can hardly provide 100 % yield. It has been shown that such architecture is defect tolerant and even with a high degree of defect rate can provide much better performance in terms of area and speed at acceptable power consumption when compared to circuits which use CMOS alone [18,19]. These circuits work with two-terminal nanodevices whose are electrically activated or deactivated at the cross-points of the mesh and their fabrication is substantially less challenging than their three-terminal counterparts. Of course, the limited functionality of two terminal devices is compensated by transistors of the CMOS subsystem.   [29] This is accomplished by CMOS cells which are accessible through column and row lines. These nanodevices are generally resistive junctions with hysteretic switching behaviors. They are reprogrammable and can be reconfigured to be either turned-off or turned-on. Depending on the materials, some devices will also have hysteretic diode-like switching behavior similar to one resistor in series with one diode. The original structure considers junction nanodevices with diode behaviors, which can be used as a memory element or part of the logic gate for field-programmable gate array applications. The general schematic for this architecture is shown Fig. 1. Fig. 2 represents an improved version of this figure in which each crossbar 'junction' is generally hypothesized to be an electrically configurable or reconfigurable device. The simplest being may an anti-fuse that is currently being widely used in the state-of-theart chip manufacturing industries [20]. A positive voltage drop across an antifuse junction might drive it into a low-impedance state, while a negative voltage drop might return it to a high-impedance state. Metallic pins on the surface of the chip connect down into CMOS gates and provide contact points for electrically attaching nanowires in the crossbar. It is necessary to mention that the 'crossbar' nanowire structure does not need 100 % alignment with respect to the CMOS subsection and a shift of the nanowire/nanodevice subsystem by one nanowiring pitch with respect to the CMOS base does not affect the circuit properties. This problem is solved using area-distributed interfaces. The idea of achieving CMOSto-nano interface without any 'overlay' alignment using precisely angled cuts, is suggested in [21]. Figure 3 shows the general schematics of a hybrid CMOS/Nano circuit with area distributed interfaces. A hybrid CMOS-Nano basic cell consists of an inverter and two pass transistors connected to two pins (with different heights) serving as the cell input and output. Each square houses a CMOS inverter (or a NAND gate) connected to one input pin (for reading a signal driven from a nanowire) and one output pin (for driving a signal from a gate to a nanowire). The input and output of each gate are connected to the nanowires via connection pads shown by blue and red dots in Fig. 4. The bottom wire mesh which makes connections to the inputs is shown by green while the top wires are shown by yellow and provide connections to the outputs. There is a nanodevice placed at each crosspoint between the bottom and top nanowire meshes. The interesting proposed alignment of nanowires with respect to underlying CMOS cells, which is rotated by a certain angle, makes it possible to address each and every nanodevice. The CMOS row signals are used to program the nanodevices through pass transistors that are controlled by the columns signals. In other words, we are able to access each element and turn them "ON" and create a connection between the top and bottom wire at that point or we can choose to leave it "OFF", which then acts as an open circuit. It is easy to observe that each and every nanodevice can be addressed by proper choice of two CMOS cells and by using this technique, we are able to implement combinational and sequential logic gates such as NAND, NOT, XOR, Flip-Flops.
One of the key feature of crossbar hybrid CMOS-Nano circuits is a tilt of the nanowire nanoscale crossbar relative to the CMOS circuitry, which allows to match the pitch 2F CMOS of the CMOS system and that 2F nano of the nanowire crossbar. Figure 5 clarifies the subject. Figure 5(a) shows the schematics of hybrid CMOS-Nano system. As it is seen in Fig. 5(c), the interface pins of each type (reaching to either the lower or the upper Fig. 5 Low level structure of hybrid CMOS/Nano system (a) schematic side view; (b) the idea of addressing a particular nanodevice, and (c) zoom-in on several adjacent interface pins to show that any nanodevice may be addressed via the appropriate pin pair (e.g., pins 1 and 2 for the leftmost of the two shown devices, and pins 1 and for the rightmost device) [29] nanowire level) are arranged into a square array with side 2βF CMOS , where β is a dimensionless factor of the order of one that depends on the CMOS cell complexity. The nanowire crossbar is turned by angle α = arcsin (F CMOS /βF nano ) relative to the CMOS pin array. Hence, by activating two pairs of perpendicular CMOS wires, we can select two individual nanowires and program a single nanodevice at their crosspoint ( Fig. 5(b)) to connect or disconnect these two nanowires This not only makes each nanodevice individually accessible from CMOS subsection (even if F nano ≪ F CMOS ) but also makes the system robust against small shifts of nanowire section. Indeed, studies of [22] shows that at the optimal choice of the pin tip diameter (equal to F nano ), there is only one specific mutual position of the pins and crossbar (in each of two perpendicular directions), at which the connection between these two subsystems is imperfect, while even a small shift from that position restores the proper connectivity. This structure also allows us to use the high drive strength CMOS transistors to buffer and restore each nanowire output signal. The hybrid CMOS-Nano wired logic depends on the voltage divider between the junction switch (modeled as a resistor R on ) and the pass transistor (modeled as a resistor R PASS ) in order to provide a suitable voltage level to the input of the inverter. Figure 6 shows the NAND/buffers/flip-flop cells in the hybrid CMOS-Nano architecture. For clarity, Fig. 7 shows the implementation of a 7-input NOR gate  Logic and routing primitives of hybrid CMOS-Nano FPGA, example of the implementation of a 7-input NOR gate. Only active nanodevices are shown [29] from another perspective in which active nanodevices are shown in green while unused nanodevices are not shown. Please notice that this multi inputs NOR gate is implemented by only one minimum size inverter and several nanoelements while for the implementation of the same function in CMOS technology several NMOS and PMOS transistors are needed. This is the main reason that hybrid CMOS-Nano circuits are far smaller than their CMOS counterparts.
It should be mentioned that the enormous density of two-terminal nanodevices can hardly be used without reliable individual contacts to each of them. This is why the fabrication of wires with nanometer-scale cross-section is another fundamental problem of nanoelectronics. The currently available photolithography and patterning methods, and even their rationally envisioned extensions, will hardly be able to provide a few nanometer resolution. In addition, the scaling of the pitch (F nano ) below 3 nm value would not be practical because of the quantum mechanical tunneling between nanowires. Hence, scaling down of these circuits in nano section will be limited by some fundamental problems. However, as will be demonstrated, hybrid CMOS-Nano circuits provide higher degree of integratability compared to their CMOS counterpart with the same feature size and design rules.

SHA-512 Logic
In order to better demonstrate the effectiveness of the proposed approach and to compare the performance of hybrid CMOS-nanowire-nanoelement crossbar with conventional CMOS circuits, we have implemented the building modules of secure hash algorithm using both regular CMOS and hybrid CMOS-Nano circuits. In this section we briefly describe the secure hash algorithm. The algorithm takes as input a   message with a maximum length of 2 128 and produces as output a 512-bit message digest. The input is processed in 1024-bit blocks. Each round takes as input the 512-bit buffer value, abcdefgh, and updates the contents of the buffer. At input to the first round; the buffer has the value of the intermediate hash value, H i -1 . Each round t makes use of a 64bit value W t , derived from the current 1024 bit block being processed (M i ). Each round also makes use of an additive constant K t , where 0 ≤ t ≤ 79 indicates one of the 80 rounds. The SHA-512 algorithm has the property that every bit of the hash code is a function of every bit of the input. The complex repetition of the basic function produces results that are well mixed; that is, it is unlikely that two messages chosen at random, even if they exhibit similar regularities, will have the same hash code. Unless there is some hidden weakness in SHA-512, which has not so far been published, the difficulty of coming up with two messages having the same message digest is on the order of operations, while the difficulty of finding a message with a given digest is on the order of operations. The algorithm has four basic modules: Round Function, Round Operation, Final Round Addition and   Round Word Computation. Figure 8 represents the overall processing of a message to produce a message digest [23].

Methods
Using a completely custom design automation flow, we first mapped the circuits in .BLIF format on the two-cell hybrid CMOS-Nano FPGA fabric tile array without defect and then with 10 % and 30 % defect rate (https://www.ece.ucsb.edu/~strukov/papers/2007/CMOLCAD2007.pdf, http://www.eecg.utoronto.ca/vpr/) [28,29]. To study how those defects impact the yield and critical path timing, we compiled each of the four mentioned building blocks, varying the 'stuck at open' nanojunction defect probability from 0-0.3. In order to have a fair comparison with CMOS FPGA, the results for different circuits are obtained for the CMOS-Nano FPGA fabric with exactly the same operating conditions and physical structure for all the circuits, thus enabling a fair comparison with CMOS FPGA. For this comparison, the same benchmark circuits have been synthesized into cluster-based island-type logic block architecture. This was done with T-VPack and VPR tools using the architecture designed for the optimal area-delay product, specifically the cluster size of 4, 4-input LUTs, and the VPR's default architecture file (4x4 LUT-sanitized arch.) with technology parameters corresponding to the 45 nm CMOS process (http://www.eecg.utoronto.ca/vpr/). We first found the worst case segment width required to route every circuit successfully. Then, using architecture with such segment width we mapped and routed all circuits, and then extracted their delay and area (for the optimistic case of buffer sharing). The full delay of the considered circuits was calculated from the critical path, which was found after circuit placement and global routing. The delay of 1-input NOT gate which is the basic logic primitive of hybrid CMOS-Nanoelement circuit turns out to be about 40 ps based on Eq. (1).

Results and Discussions
With V DD = 0.3 V and given a 15 nm-wide metallic nanowire interconnects with 3 nm thick switching layer separating two nanowire layers, and an insulator between and around all nanowires with a dielectric constant of 3.9 (that of SiO 2 ), the wire capacitance per unit length to be close to 0.2 fF/μm, capacitance C wire of the full nanowire fragment will be about 3 fF. It is known that R ON = 400 kΩ and the "ON" resistance of a crosspoint nanodevice is R ON D ¼5 kΩ ; D¼80 is the number of parallel nanodevices connected in series with the ohmic resistance R wire , driven by voltage V DD and I is the Fig. 12 Final routing of Final Round Addition in .BLIF file with presence of 30 % defective cells after succesful reconfiguration maximum gate fanin (http://www.eecg.utoronto.ca/vpr/). The wire resistivity is almost at 8.88 μΩcm and the substrate and coupling capacitances are about 2 pF/cm and 1 pF/cm, respectively. The estimated resistance between the center and the end of a nanowire fragment, of the length β FCMOS ð Þ 2 Fnano ¼ 7:2 μm is estimated less than 2.5 kΩ. Assuming the area of the minimum-width transistor to be 25(F CMOS ) 2 (http://www.eecg.utoronto.ca/vpr/), where F CMOS is the half pitch of the CMOS subsystem, the results for CMOS implementation is shown in the Table 1. Table 2 summarizes the performance estimation for the same circuits on hybrid CMOS-Nano FPGA fabric without defect. Table 3 shows the same with 10 % and 30 % defect rates. As it is seen, the hybrid-CMOS implementation is almost two orders of magnitude denser than its CMOS counterpart. Figure 9 shows the implementation of one columns of Final Round Addition on a (9 + 2) × (9 + 2) CMOS FPGA block in which the critical path is represented by green color. Figure 10 shows the initial placement for that circuit in .BLIF format mapped on the (9 + 2) × (9 + 2) tile array with 30 % defects. (Here the additional layer of tiles at the array periphery is used exclusively for I/O functions). Figure 11 show initial placement and routing of the same circuit on the same platform. Figure 12 shows the same circuit after final placement and routing. Figure 13 shows the placement of Round Operation with 10 % defective cells on a (27 + 2) × (27 + 2) CMOS-Nano FPGA. Figure 14 shows the initial placement of Round Operation with 10 % defective cells. Figure 15 shows final routing and placement of the same circuit after final successful reconfiguration. Figure 16 shows the same implementation in which active nanoelements are shown with green dots. Figure 17 shows a zoomed view of Round Operation mapped on a hybrid CMOS-Nano fabric. Active nanoelements are identified with green dots. Bad or unused nanoelements are not shown. Figure 18 shows a global view of Round Word Computation mapped on a hybrid CMOS-Nano fabric with 30 % defect rate. Bad nanoelements are shown in black while good used are shown in green.
Static power consumption of two cell hybrid CMOS-Nano circuits can be estimated as the sum of static power consumption P ON due to currents I ON , , leakage power consumption P leak due to current leakage through nanodevices in their "OFF" state [24,25]. Figure 19 shows the equivalent circuit for hybrid CMOS-Nano logic stage.
where R ser is the series resistance equivalent of R ON /D and R wire , D is the total number of nanoscale switches in one nanowire crosspoint, M is the number of closed crosspoint switches. With F CMOS = 45nm, F nano = 4.5nm, R wire = 14Ω, R OFF = 4 GΩ, and D = 80 [24], the leakage and static power consumption of each module can be estimated. It is clear that P leak can be neglected. Dynamic power consumption P d is mainly due to recharging of nanowire capacitances and depends on the number of nanowires allocated by the synthesis tool to implement a circuit onto the target platform and can be calculated using Eq. (3).
Where α is the average 'switching activity' of the circuit, N is the number of nanowires participating in the implementation of logic function, C is the capacitance of single nanowire, V dd is the voltage supply of CMOS transistors, and f is the maximum clock speed determined by timing analysis of critical path. We chose an activity of 0.2, twice of the value predicted by Davis [26] to estimate the power consumption pessimistically not optimistically. Hence, the power consumption of each module (without defect) can be estimated as Table 4.
We have not computed the total power consumption of the whole algorithm; however, evidences show that the power consumption of the proposed design is close and comparable to the power consumption of the actual implementation of the algorithm on FPGA [27].

Conclusions
The invention of the transistor is one of the most important inventions of the 20 th century. Since its inception, the transistor size has been reduced so that now modern devices are orders of magnitude smaller than their earliest counterparts. Unfortunately, the scaling down will eventually end. Increasing power, capital costs, and ultimately theoretical size limitations, are poised to halt the process of continually shrinking the transistor. The results presented in this paper clearly demonstrate that nanoelectronic-based digital circuits may continue the performance scaling of     microelectronics well beyond the limits of the currently dominating CMOS technology. However, whether nanoelectronics will be a replacement for conventional ICs, or as a complimentary technology, is yet to be investigated. We believe that this situation may justify large-scale research and development efforts in this area. In this paper, we discussed the contribution that nanotechnology may offer to the evolution of cryptographic hardware and embedded systems and demonstrated how nanoelectronics can be used for constructing security primitives. There are still some problems but the prospect of cheaply integrating 10 12 devices per chip is a powerful incentive to overcome the existing challenges. In order for this prediction to become true, several challenges still have to be overcome. Without a doubt, the most important of them is the development of a highly reproducible technology for VLSI fabrication of crosspoint resistive switches. Finally, the preliminary research indicates that while existing parts of the CAD tools will be useful for nano-electronics, there will need to be some additions and changes made. Improved device models and 3-D CAD and design tools will certainly accelerate research in this area.