RESEARCH PROJECTS

Fields and Waves
Signals, Control and Signal Processing
Telecommunications and Wireless
VLSI, Electronics and Power


Mitigation of Voltage Disturbances Caused by Nonlinear Electrical Massive Loads

Power utilities around the world have recently become increasingly concerned about maintaining the quality of the power they deliver to customers. This is due both to the growing number of nonlinear devices, which pollute the supply voltage, and to the increasing sensitivity of other equipment to this pollution. Voltage variations can appear as annoying flicker of fluorescent lights in homes, but they can also disrupt the operation of modern industrial machines or even damage very sensitive appliances. Heavy and nonlinear loads are the main sources of voltage distortion in electrical power grids. The example below shows a large induction motor driving a car shredder and causing voltage variation at its terminals. Transmitted through the utility lines, this variation may cause machines in a nearby factory to operate improperly, or appear in homes as flickering lights.
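As a rough illustration of how a large motor's load swings turn into voltage variation at nearby customers, the sketch below uses the standard short-line approximation $\Delta V/V \approx (R\,\Delta P + X\,\Delta Q)/V^2$ for the relative voltage change at the point of common coupling. The impedance and load-step values are hypothetical per-unit numbers chosen for illustration, not measurements from the shredder site.

```python
def relative_voltage_change(r_pu: float, x_pu: float, dp_pu: float, dq_pu: float,
                            v_pu: float = 1.0) -> float:
    """Approximate relative voltage change dV/V caused by a load step.

    Uses dV/V ~ (R*dP + X*dQ) / V**2 with all quantities in per unit.
    Valid only for small changes on a short, mostly inductive line.
    """
    return (r_pu * dp_pu + x_pu * dq_pu) / v_pu ** 2


if __name__ == "__main__":
    # Hypothetical supply impedance at the point of common coupling (per unit).
    R, X = 0.01, 0.10
    # Hypothetical motor load swing, mostly reactive as during motor starting.
    dP, dQ = 0.05, 0.30
    dv = relative_voltage_change(R, X, dP, dQ)
    print(f"Approximate voltage dip: {dv * 100:.2f} %")
```

Because X is usually much larger than R, the reactive term dominates, which is why reactive compensators such as SVCs and STATCOMs are common flicker-mitigation tools.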

Participating Faculty: Zivan Zabar (zzabar@poly.edu) and Dariusz Czarkowski (dcz@pl.poly.edu)

Collaborators: Tomasz Sulawa

The project is being conducted by the power group of the ECE Department, Polytechnic Institute of NYU.

The Long Island Power Authority and KeySpan Energy sponsor this research project to investigate and mitigate disturbances in the Long Island power network.

Development of a Unit Substation Demand Estimator

Today, electric utility companies use a wide range of computerized applications for energy management, and these applications play an important role in many aspects of the power industry. Using real-time measurements and various data analysis methods, they build a reasonable and accurate representation of the network. They are also used for short-term and long-term load forecasting during significantly degraded operations.

A Unit Substation Demand Estimator (USDE) is needed to estimate missing data from substations in various networks across New York City and Westchester County. As a starting point for this study and the USDE development, the Flatbush network in Brooklyn was chosen as the first network to be tested.

This project covers the design and implementation of a USDE. Several methods for estimating missing measurement data are presented; each method is tested in detail to validate the accuracy of the estimated data, and an estimation strategy is proposed.

Using these successive estimation methods and Visual Basic for Applications (VBA) code, a USDE application is developed. The USDE application is then tested, and special tuning functions are developed to improve the estimation process and its results.
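The page does not spell out the estimation methods, so the following is only a hypothetical sketch of one simple approach: the unaccounted load (measured network total minus the available unit-substation readings) is distributed among the substations with missing data in proportion to their historical average loads. The function, parameter and substation names are illustrative and are not taken from the USDE application.

```python
from typing import Dict, Optional


def estimate_missing(total_load: float,
                     readings: Dict[str, Optional[float]],
                     historical_avg: Dict[str, float]) -> Dict[str, float]:
    """Fill missing unit-substation readings (None) by proportional allocation.

    Hypothetical method: the residual load (total minus known readings) is
    split among the missing substations in proportion to their historical
    average loads; with no usable history it is split evenly.
    """
    known = sum(v for v in readings.values() if v is not None)
    missing = [name for name, v in readings.items() if v is None]
    estimates = {name: v for name, v in readings.items() if v is not None}
    if not missing:
        return estimates
    residual = total_load - known
    weight_sum = sum(historical_avg[name] for name in missing)
    for name in missing:
        if weight_sum > 0:
            share = historical_avg[name] / weight_sum
        else:
            share = 1.0 / len(missing)  # no history available: split evenly
        estimates[name] = residual * share
    return estimates


if __name__ == "__main__":
    readings = {"US-1": 300.0, "US-2": None, "US-3": None}  # kW, hypothetical
    history = {"US-1": 250.0, "US-2": 300.0, "US-3": 100.0}
    print(estimate_missing(1000.0, readings, history))
```

A practical estimator would also sanity-check the residual, for example flagging a negative value as a probable measurement error rather than allocating it.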

Participating Faculty: Dariusz Czarkowski (dcz@pl.poly.edu) and Zivan Zabar (zzabar@poly.edu)

Collaborator: Yariv Ten-Ami

Sponsor: Consolidated Edison Company of New York, NY

High Speed Cryptographic Architectures

Encryption/decryption and Message Authentication Codes (MACs) are widely used in network applications. Secure Sockets Layer (SSL) and Transport Layer Security (TLS) use encryption and authentication to support secure browsing, secure file transfer and secure remote login between end users and servers. High-performance commercial servers and routers require dedicated cryptographic hardware to keep up with network wire speeds above 10 Gbps. We are designing high-speed architectures for encryption, authentication, and combined encryption with authentication.

Encryption/Decryption

We designed architectures for the symmetric encryption/decryption algorithm Advanced Encryption Standard (AES) that operate at 10 Gbps or higher. Software implementations of cryptographic ciphers cannot reach a 10 Gbps line rate. Hardware implementations of other ciphers such as DES, IDEA and Twofish have also been investigated, but none of them reaches 10 Gbps throughput. We investigated hardware architectures for high-speed AES block and stream ciphers that achieve throughputs of 10 to 100 Gbps. The ten-round, 128-bit AES iterative block cipher offers a variety of architectural options, each trading circuit area and complexity against throughput. For example, a fully loop-unrolled architecture implements all rounds as a single combinational logic block, which reduces the round-key multiplexing hardware and the number of clock cycles per block. However, this approach has a high area overhead and yields the longest register-to-register delay.
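The trade-off between these options follows from the usual steady-state relation: throughput = block size × clock frequency / clock cycles per block. The sketch compares an iterative design (one round per clock, so 10 cycles per AES-128 block) with a fully pipelined one (one block per clock once filled). The 100 MHz clock is a hypothetical value for illustration; real unrolled and pipelined designs reach different clock rates.

```python
# Steady-state throughput of a block-cipher datapath:
#   throughput = block_bits * f_clk / cycles_per_block
BLOCK_BITS = 128  # AES block size in bits


def throughput_gbps(f_clk_hz: float, cycles_per_block: float,
                    block_bits: int = BLOCK_BITS) -> float:
    """Return steady-state throughput in Gbps."""
    return block_bits * f_clk_hz / cycles_per_block / 1e9


def clock_needed_hz(target_gbps: float, cycles_per_block: float,
                    block_bits: int = BLOCK_BITS) -> float:
    """Clock frequency required to reach a target throughput."""
    return target_gbps * 1e9 * cycles_per_block / block_bits


if __name__ == "__main__":
    f_clk = 100e6  # hypothetical 100 MHz clock, for illustration only
    for name, cpb in [("iterative, one round per cycle", 10),
                      ("fully pipelined, one stage per round", 1)]:
        print(f"{name}: {throughput_gbps(f_clk, cpb):.2f} Gbps")
```

For example, reaching 10 Gbps with 128-bit blocks needs about 78 MHz at one block per cycle, but about 781 MHz for the iterative design, which is why the high-throughput designs spend area to cut cycles per block.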

A pipelined architecture increases the number of data blocks processed simultaneously. In a full pipeline implementation where each stage implements one AES round, the system outputs an N-bit block every clock cycle once the pipeline latency has elapsed, where N is the encryption/decryption block length. We extended this pipelined architecture further by partitioning each AES round into two stages (sub-pipelining). This is based on the observation that in the 128-bit AES data path, the 8-bit-input, 8-bit-output S-box is the bottleneck. We partitioned the round function into an S-box stage and a stage that implements the ShiftRows, MixColumns and round-key XOR operations. Even though this second stage contains a multiplexer (to bypass MixColumns in the last round of encryption), the S-box operation remains the critical path. We implemented an optimized 10-stage AES pipeline that implements 5 rounds with two pipeline stages per round. The first four rounds are optimized by removing the bypass multiplexer from the second sub-pipeline stage, since MixColumns is always performed in these rounds. Round keys are stored in registers. This 10-stage pipeline architecture needs 2.1 cycles (2 + 1/10) per block. Our FPGA prototype of AES in block cipher mode has demonstrated that this architecture offers the best trade-off between performance and circuit area, achieving a throughput of more than 4.6 Gbps.
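To show what the S-box stage computes, here is a behavioral Python model of the 8-bit-in, 8-bit-out AES S-box as specified in FIPS-197: a multiplicative inverse in GF(2^8) modulo $x^8 + x^4 + x^3 + x + 1$, followed by an affine transform. It is a reference model for checking results, not the hardware architecture described above; hardware typically realizes it as a lookup table or as composite-field logic.

```python
# Behavioral reference model of the AES S-box (FIPS-197):
# multiplicative inverse in GF(2^8) followed by an affine transform.

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:          # reduce when the degree reaches 8
            a ^= AES_POLY
        b >>= 1
    return product


def gf_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x: int) -> int:
    """AES SubBytes for one byte: 8-bit input, 8-bit output."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX_TABLE = [sbox(x) for x in range(256)]  # contents a LUT-based design would store
```

Each of the 16 bytes of the 128-bit state passes through this function in every round. As combinational logic, the field inversion is considerably deeper than the XOR-based MixColumns and round-key logic (ShiftRows is only wiring), which is consistent with the S-box being the critical path.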

Message Authentication Codes

We implemented 100 Gbps hardware architectures for MACs based on universal hash functions. A universal hash function is defined as follows. Let $A$ and $B$ be two sets and $H$ be a family of functions from $A$ to $B$. $H$ is a universal family of hash functions if, for every pair $x_1, x_2 \in A$ with $x_1 \neq x_2$, the probability over $h \in H$ that $h(x_1) = h(x_2)$ is $1/|B|$, where $|B|$ is the size of set $B$; $1/|B|$ is the smallest possible value of this probability. We focus on the Linear Congruential Hash (LCH), a widely used universal hash function family. LCH is defined as:

$$h_x(m) = \left( \sum_{i=1}^{t} m_i x_i \right) \bmod p$$

where $m_i$ is the $i$-th word of a message block $m$, $x_i$ is the $i$-th word of the key $x \in \mathbb{Z}_p^t$, and $p$ is a prime number, typically close to $2^w$. Modular reduction of the accumulated result by $p$ generates either a $w$-bit or a $(w+1)$-bit hash value. The straightforward LCH architecture shown in Figure 1 (a) uses four 32-bit registers (R1, R2, H and L), two 64-bit registers (R3, R4), two 32-bit 2-to-1 multiplexers (mux1 and mux2), and two 64-bit 2-to-1 multiplexers (mux3 and mux4).
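A minimal Python model of the LCH definition, $h_x(m) = (\sum_{i=1}^{t} m_i x_i) \bmod p$, follows. The exhaustive check counts, for two fixed distinct messages, the fraction of keys that produce a collision. The toy parameters (p = 13, t = 2) are assumptions chosen only so that every key can be enumerated; for any two messages that differ modulo p the fraction is exactly 1/p.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence


def lch(message: Sequence[int], key: Sequence[int], p: int) -> int:
    """Linear Congruential Hash: (sum of m_i * x_i) mod p."""
    if len(message) != len(key):
        raise ValueError("message and key must have the same number of words")
    acc = 0
    for m_i, x_i in zip(message, key):
        acc += m_i * x_i        # multiply-accumulate, as in the datapath
    return acc % p              # final modular reduction


def collision_probability(m1: Sequence[int], m2: Sequence[int], p: int) -> Fraction:
    """Fraction of keys in Z_p^t for which lch(m1) == lch(m2)."""
    keys = list(product(range(p), repeat=len(m1)))
    hits = sum(lch(m1, x, p) == lch(m2, x, p) for x in keys)
    return Fraction(hits, len(keys))


if __name__ == "__main__":
    p = 13  # toy prime so all 13**2 keys can be enumerated
    print(collision_probability([3, 7], [5, 1], p))  # prints 1/13
```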

We propose to divide a 2w-bit data path into two w-bit data paths and concatenate their results to construct an equivalent 2w-bit data path. The concept of equivalence is crucial. A straightforward data path and its corresponding divide-and-concatenate data path obviously cannot be equivalent in terms of the values they output. Instead, we define two data paths to be equivalent if their outputs satisfy a predefined property. For universal hash functions and the associated message authentication codes, the actual output value is not important; what matters is its collision probability. Hence, two data paths implementing a hash function are considered equivalent if they have the same collision probability. Specifically, we define data paths and architectures for universal hashing and the associated MACs to be equivalent when 1) they process the same input size every clock cycle and 2) they have the same collision probability.

Using the divide-and-concatenate technique, a 32-bit LCH hash data path with a collision probability of $2^{-32}$ can be constructed from two 16-bit LCH hash data paths, each with a collision probability of $2^{-16}$, by concatenating their 16-bit results into a 32-bit hash value. The equivalent 16-bit divide-and-concatenate architecture consumes 11506 gates, compared to 9071 gates for the straightforward 32-bit architecture, and achieves a 45% throughput improvement with a 26% area overhead. Applying divide-and-concatenate once more, to construct each 16-bit LCH data path from four 8-bit LCH hash data paths, yields an equivalent 32-bit LCH hash data path built from sixteen 8-bit LCH hash data paths. Compared to the straightforward 32-bit LCH architecture, this preliminary 8-bit divide-and-concatenate architecture, shown in Figure 1 (b), achieves a 101% throughput improvement with a 52% area overhead. Using sixteen 8-bit LCH hash data paths, we implemented MMH, TMMH or UMAC with a collision probability of $2^{-32}$. Synthesis experiments show that the FPGA implementation achieves a throughput of more than 70 Gbps.
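The collision-probability argument behind divide-and-concatenate can be checked on toy parameters: two narrow LCH hashes with independent keys are computed over the same message and their outputs are concatenated. For distinct messages the concatenated value collides only if both halves collide, so its collision probability is the product (1/p1)(1/p2). The primes 5 and 7 and the 3-bit field width are assumptions standing in for the real 16-bit and 8-bit parameters, chosen so every key pair can be enumerated; this sketches the probability argument, not the hardware data paths.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence


def lch(message: Sequence[int], key: Sequence[int], p: int) -> int:
    """Linear Congruential Hash: (sum of m_i * x_i) mod p."""
    return sum(m * x for m, x in zip(message, key)) % p


def concat_hash(message, key1, key2, p1: int, p2: int, width2: int) -> int:
    """Concatenate two independent narrow LCH results into one wider value."""
    return (lch(message, key1, p1) << width2) | lch(message, key2, p2)


def concat_collision_probability(m1, m2, p1: int, p2: int, width2: int) -> Fraction:
    """Exhaustive collision probability of the concatenated hash over all key pairs."""
    t = len(m1)
    hits = total = 0
    for k1 in product(range(p1), repeat=t):
        for k2 in product(range(p2), repeat=t):
            total += 1
            hits += (concat_hash(m1, k1, k2, p1, p2, width2)
                     == concat_hash(m2, k1, k2, p1, p2, width2))
    return Fraction(hits, total)


if __name__ == "__main__":
    # width2 = 3 bits is enough to hold any value mod 7.
    print(concat_collision_probability([1, 2], [3, 2], 5, 7, 3))  # prints 1/35
```

In the same way, two 16-bit hashes with collision probability $2^{-16}$ each combine into a 32-bit result with collision probability $2^{-32}$, which satisfies the equivalence criterion defined above.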

Participating Faculty: Ramesh Karri (ramesh@india.poly.edu)

Website: http://cad.poly.edu/encryption/

Research supported by Cisco Systems

[1] B. Yang, R. Karri, and D. A. McGrew, "A High Speed Hardware Architecture for Universal Message Authentication Code," submitted to IEEE Transactions on Computers.

[2] B. Yang, R. Karri, and D. A. McGrew, "Divide-and-concatenate: an architecture level optimization technique for universal hash functions," IEEE/ACM Design Automation Conference (DAC), pp. 614-617, June 2004.

[3] K. Alexander, R. Karri, I. Minkin, K. Wu, P. Mishra, and X. Li, "Towards 10-100 Gbps Cryptographic Architectures," Proceedings of the International Symposium on Computer and Information Science (ISCIS), Orlando, Florida, October 2002.

[4] B. Yang, R. Karri, and D. A. McGrew, "An 80Gbps FPGA Implementation of a Universal Hash Function based Message Authentication Code," Third Place Winner, 2004 DAC/ISSCC Student Design Contest, June 2004. http://www.dac.com/42nd/studcon.html

Secure Built-In Self-Test (BIST) Architecture

Crypto algorithms are being implemented in hardware to meet high throughput requirements and are widely integrated as crypto accelerators in System-on-Chip (SOC) devices for secure applications ranging from tiny smart cards to high-performance routers. In a secure SOC, crypto coprocessors offload intensive arithmetic computations from the host processor. A straightforward way to apply BIST to symmetric block cipher circuits is to add Test Pattern Generator (TPG) and Output Response Analyzer (ORA) circuits. In test mode, the inputs to the crypto data path come from the TPG instead of the plaintext, and the outputs of the crypto data path are compressed by the ORA into a signature.

In a BIST architecture, the TPG provides random inputs to the Circuit Under Test (CUT). Since exhaustive testing is practically impossible (the AES data path, for example, would need $2^{128}$ test patterns), the probability distribution of the test patterns determines how many patterns are needed to ensure an acceptable level of fault coverage. A Linear Feedback Shift Register (LFSR) tends to produce test patterns with equal numbers of 0s and 1s on each output, which results in very long test sequences for some circuits. Weighted random pattern generators bias the distribution of 0s and 1s, thereby achieving higher fault coverage with fewer test patterns. Strong randomness, on the other hand, is an inherent feature of crypto algorithms. A block cipher can be considered an instance of a random permutation over a message block under the control of a key block. In fact, the security of a block cipher can be formalized in terms of pseudorandomness: if there is no way to distinguish the block cipher from an ideal random permutation, then the block cipher cannot be attacked. Symmetric block ciphers also contain one or more non-linear round operations; for example, both DES and AES use non-linear substitution. The randomness of several symmetric block cipher algorithms has been evaluated by the National Institute of Standards and Technology (NIST).
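For comparison with Secure BIST, here is a minimal sketch of a conventional LFSR-based TPG: a 16-bit Fibonacci LFSR with the maximal-length polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$, which steps through all $2^{16} - 1$ nonzero states before repeating. The width, polynomial and seed are illustrative choices, not the ones used in the DES/AES experiments.

```python
from typing import Iterator


def lfsr16(seed: int = 0xACE1) -> Iterator[int]:
    """Yield successive states of a 16-bit Fibonacci LFSR.

    Taps correspond to x^16 + x^14 + x^13 + x^11 + 1 (maximal length), so any
    nonzero seed visits all 65535 nonzero states before repeating.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be nonzero; the all-zero state locks up")
    while True:
        yield state
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)


def period(seed: int = 0xACE1) -> int:
    """Number of steps before the LFSR returns to its seed state."""
    gen = lfsr16(seed)
    first = next(gen)
    for steps, state in enumerate(gen, start=1):
        if state == first:
            return steps
    raise RuntimeError("unreachable: the generator is infinite")


if __name__ == "__main__":
    gen = lfsr16()
    print([hex(next(gen)) for _ in range(4)])  # first few test patterns
    print(period())
```

Each state would be applied as one test pattern (or a slice of one) to the cipher data path; the patterns are close to uniformly distributed, which is the property the text notes can require long test sequences.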

In a BIST scheme, the ORA operates as a hash function: it compresses all the test results into a signature. A Multiple Input Signature Register (MISR) is a simple hash function that is widely used as an ORA. Collision probability, defined as the probability that two different messages have the same hash result, is the most important parameter of a hash function: the smaller the collision probability, the better the hash function. If an output sequence containing faulty vectors compresses into the correct signature, those faults cannot be detected. The efficiency of a BIST technique is therefore determined by the quality of both the TPG and the ORA. A block cipher in CBC mode is one of the most powerful hash functions and is widely used in message authentication codes. For such hash functions it is computationally infeasible to find messages x and x' such that x' ≠ x and hash(x') = hash(x). A block cipher can therefore be used either as a TPG with more random output patterns or as an ORA with very low collision probability. Based on this key observation, we developed a BIST technique called Secure BIST to test block cipher modules. In the proposed Secure BIST technique, the output of the crypto core (ciphertext) is fed back to its input (plaintext) in test mode, and the signature is compressed into the output ciphertext register. Because the crypto module itself serves as both the TPG and the ORA, Secure BIST incurs almost no area overhead.
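The sketch below contrasts the two ORA options discussed above. `misr_signature` is a conventional MISR (Galois-style shift with the response vector XORed in each cycle), and the example shows aliasing: two different response sequences that compress to the same signature. `secure_bist_signature` models the Secure BIST loop, feeding each ciphertext back as the next plaintext so that the final ciphertext register holds the signature. The 16-bit Feistel function `toy_cipher` is a hypothetical stand-in for the DES/AES module and is not secure.

```python
from typing import Callable, Iterable


def misr_signature(vectors: Iterable[int], width: int, poly: int, seed: int = 0) -> int:
    """Compress response vectors into a width-bit signature (Galois-style MISR)."""
    mask = (1 << width) - 1
    state = seed & mask
    for v in vectors:
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) & mask) ^ (poly if msb else 0)  # shift with feedback
        state ^= v & mask                                      # fold in the response
    return state


def toy_cipher(key: int, block: int, rounds: int = 4) -> int:
    """Hypothetical 16-bit Feistel permutation standing in for a real cipher core."""
    left, right = (block >> 8) & 0xFF, block & 0xFF
    for r in range(rounds):
        round_key = (key >> (8 * (r % 2))) & 0xFF
        f = ((right * 167 + round_key) ^ (right >> 3)) & 0xFF
        left, right = right, left ^ f
    return (left << 8) | right


def secure_bist_signature(encrypt: Callable[[int, int], int], key: int,
                          seed: int, cycles: int) -> int:
    """Secure BIST loop: each ciphertext is fed back as the next plaintext."""
    block = seed
    for _ in range(cycles):
        block = encrypt(key, block)
    return block  # contents of the ciphertext register = signature


if __name__ == "__main__":
    # Aliasing in a 4-bit MISR with feedback polynomial x^4 + x + 1.
    good = [0b0001, 0, 0, 0, 0]
    bad = [0, 0, 0, 0, 0b0011]
    print(misr_signature(good, 4, 0b0011), misr_signature(bad, 4, 0b0011))  # both 3
    print(hex(secure_bist_signature(toy_cipher, key=0x1234, seed=0x0001, cycles=1000)))
```

In hardware the loop reuses the cipher's own data path and ciphertext register, which is where the near-zero area overhead comes from.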

We validated Secure BIST on hardware implementations of the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES). The experimental results show that Secure BIST is superior to LFSR-based BIST in area overhead, fault coverage and test sequence length.

Participating Faculty: Ramesh Karri (ramesh@india.poly.edu)

[1] B. Yang and R. Karri, "A Secure Built-In Self Test Technique for Crypto Modules in Secure Systems-On-Chip (SOC)," submitted to IEEE Transactions on Computers.

Fault Attack Resistant High Speed Crypto Architectures

Motivation: As VLSI dimensions shrink rapidly, transient and permanent faults occur and will continue to occur in increasing numbers. Faults can broadly be classified into two categories: transient faults, which die away after some time, and permanent faults, which do not die away but remain until they are repaired or the faulty component is replaced. These faults may originate from internal phenomena in the system, such as threshold changes, shorts and opens, or from external influences such as electromagnetic radiation. Faults can also be deliberately injected by attackers to extract sensitive information stored in the system. Such faults affect the memory as well as the combinational parts of a circuit and can only be detected using Concurrent Error Detection (CED). This is especially true for sensitive devices such as cryptographic chips, so CED for cryptographic chips is growing in importance. Since cryptographic chips are consumer products produced in large quantities, cheap solutions for concurrent checking are needed. CED for cryptographic chips also has great potential for detecting fault-injection attacks, in which faults are injected into a cryptographic chip to recover the key.

Objectives:

      Investigate low-cost, low-latency CED schemes for Symmetric Block Ciphers

      Obtain close to 100% coverage without significant throughput degradation

      Develop CED techniques to resist all known and possible future attacks

Progress: We developed a fault-resistant architecture for involutional ciphers that detects all possible faults in the data path, with less than 20% area overhead and less than 10% throughput degradation [1] [2]. We also extended this technique to Feistel network ciphers [4]. We developed a novel CED scheme for the Advanced Encryption Standard (AES), which was chosen as the U.S. Government (FIPS) standard: a royalty-free encryption algorithm for worldwide use, intended to offer security sufficient to protect data for the next 20 to 30 years. This scheme exploits invariance properties exhibited by AES [3]. Such invariance properties are being used to investigate weaknesses of AES; in contrast, we use them to strengthen hardware implementations of AES and protect them against fault attacks. Fault-injection-based simulation of this scheme shows 100% fault coverage, and the ASIC implementations incur very low area and throughput overhead.
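The involution property that such CED schemes exploit can be illustrated in a few lines: if a function f is an involution (f(f(x)) = x), its output y can be checked by recomputing f(y) and comparing the result with the input x. The sketch uses a byte bit-reversal as a hypothetical stand-in for an involutional cipher component and models a transient fault that flips bits of the forward result only. It illustrates the principle under these assumptions and is not the published architecture.

```python
from typing import Callable, Tuple


def reverse_bits8(x: int) -> int:
    """Reverse the bit order of a byte; applying it twice returns the input."""
    r = 0
    for i in range(8):
        r = (r << 1) | ((x >> i) & 1)
    return r


def compute_with_ced(f: Callable[[int], int], x: int,
                     fault_mask: int = 0) -> Tuple[int, bool]:
    """Compute y = f(x) and flag an error if f(y) != x.

    fault_mask models a transient fault flipping bits of the forward result;
    the checking pass is assumed to be fault-free.
    """
    y = f(x) ^ fault_mask
    error = f(y) != x
    return y, error


if __name__ == "__main__":
    print(compute_with_ced(reverse_bits8, 0x53))          # fault-free
    print(compute_with_ced(reverse_bits8, 0x53, 0b0100))  # one bit flipped
```

Because an involution is a bijection, any nonzero error in y makes f(y) differ from x, so every such fault in the forward pass is detected under these assumptions.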

Participating Faculty: Ramesh Karri (ramesh@india.poly.edu)

Website: http://cad.poly.edu/encryption

[1] N. Joshi, K. Wu, and R. Karri, "Concurrent Error Detection Schemes for involution ciphers," Cryptographic Hardware and Embedded Systems (CHES) 2004, Springer-Verlag LNCS vol. 3156, August 2004.

[2] N. Joshi, J. Sundararajan, K. Wu, and R. Karri, "CED schemes for involutional SPN networks," IEEE Transactions on Computer-Aided Design, under review.

[3] N. Joshi, S. Iyer, and R. Karri, "Invariance based CED for the Advanced Encryption Standard," submitted to IEEE Design Automation Conference (DAC) 2005.

[4] N. Joshi, J. Sundararajan, K. Wu, and R. Karri, "Generalized involution based CED for Substitution Permutation and Feistel Networks," IEEE Transactions on Computers, under review.

Fault Tolerant Nanoscale Systems

New technologies based on nanoscale physical characteristics, such as Resonant Tunneling Diodes, Quantum-dot Cellular Automata and molecular electronics, have been researched and are being proposed as candidates for next-generation device technologies. However, physical limitations at the nanoscale result in highly unreliable fabrication processes, which in turn translate into highly unreliable nano devices. Consequently, device failure rates in these emerging nanotechnologies are projected to be on the order of $10^{-3}$ to $10^{-1}$. Furthermore, the faulty behavior is time-varying and hard to model. Fault tolerance is therefore an important system-level design objective in these emerging nanotechnologies. In current CMOS technologies, fault rates are static and in the range of $10^{-9}$ to $10^{-7}$. The typical techniques for addressing reliability in CMOS, namely extensive testing at manufacturing time and a limited amount of redundant hardware added to the circuit for reliability during operation, cannot be applied successfully to emerging nanotechnologies with much higher and time-varying failure rates. Fundamentally, the manufacturing processes and hence the failure mechanisms are different, and device densities are several orders of magnitude higher (~$10^{7}$ devices/cm$^2$ in CMOS vs. ~$10^{12}$ devices/cm$^2$ in emerging nanotechnologies).
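Putting the quoted densities and failure rates side by side makes the gap concrete. Assuming each rate is a per-device failure probability and failures are independent, the expected number of faulty devices per square centimeter is simply their product:

$$
\begin{aligned}
\text{CMOS:}\quad & 10^{7}\ \tfrac{\text{devices}}{\text{cm}^2} \times \left(10^{-9} \text{ to } 10^{-7}\right) = 10^{-2} \text{ to } 1\ \tfrac{\text{faulty devices}}{\text{cm}^2} \\
\text{Nanoscale:}\quad & 10^{12}\ \tfrac{\text{devices}}{\text{cm}^2} \times \left(10^{-3} \text{ to } 10^{-1}\right) = 10^{9} \text{ to } 10^{11}\ \tfrac{\text{faulty devices}}{\text{cm}^2}
\end{aligned}
$$

At most about one faulty device per square centimeter can be handled by manufacturing test and a little spare hardware, whereas billions cannot, so fault tolerance has to be built into the design itself.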

This research investigates design principles for building reliable systems from unreliable nano device technologies of Quantum-dot Cellular Automata (QCA) and Negative Differential Resistance (NDR).

Fault tolerant QCA building block design

Triple Modular Redundancy (TMR) is a straightforward way to provide fault tolerance. However, TMR is not a good choice for fault-tolerant QCA designs, since wires, faults in wires and wire delays dominate in this nanotechnology. We propose TMR using Shifted Operands (TMRSO) as a new approach to designing fault-tolerant QCA circuits with lower area overhead and better performance than straightforward TMR [1]. This method exploits the self-latching and adiabatic pipelining properties of QCA devices to maximize system throughput, since more than one calculation can be in the pipeline at a given time. We have validated this concept on a two-bit adder, as shown in Figure 1.
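For reference, the baseline that TMRSO improves on is plain TMR: three copies of a module feed a bitwise majority voter. With independent module reliability $R$ and a perfect voter, the system reliability is $R_{TMR} = R^3 + 3R^2(1-R) = 3R^2 - 2R^3$. The sketch below computes both the vote and this reliability; it does not model TMRSO, QCA wire faults or voter failures.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote, as performed by a TMR voter."""
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r: float) -> float:
    """Reliability of TMR with a perfect voter and independent modules."""
    return 3 * r ** 2 - 2 * r ** 3


if __name__ == "__main__":
    # A single faulty copy (bit 0 flipped in the second input) is outvoted.
    print(bin(majority3(0b1010, 0b1011, 0b1010)))
    for r in (0.999, 0.9, 0.5, 0.3):
        print(f"R = {r}: R_TMR = {tmr_reliability(r):.6f}")
```

TMR improves reliability only when R > 0.5 (at R = 0.5 it breaks even). With device failure rates as high as those quoted above, even modest modules can fall below that threshold: a 100-device module at a $10^{-2}$ device failure rate has $R = 0.99^{100} \approx 0.37$.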

Fault Tolerant NDR building block design

Information redundancy based on error-checking codes is regarded as a powerful fault tolerance scheme in communication and storage systems. Preliminary work in this direction has shown that, by exploiting the characteristics of certain nanotechnology devices, information redundancy based on linear block codes can be applied to carry-save arithmetic subsystems, pointing toward a low-overhead, unified fault tolerance scheme for nanotechnology systems [2].
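As a generic illustration of information redundancy with a linear block code (not the carry-save arithmetic scheme itself), the sketch encodes 4 data bits into a 7-bit Hamming codeword and corrects any single-bit error using the syndrome. Hamming(7,4) is chosen only because it is the smallest standard single-error-correcting code.

```python
from typing import List, Sequence, Tuple


def hamming74_encode(d: int) -> List[int]:
    """Encode 4 data bits (d1..d4 = bits 0..3 of d) into codeword positions 1..7."""
    d1, d2, d3, d4 = [(d >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(bits: Sequence[int]) -> Tuple[int, int]:
    """Return (corrected data, syndrome); a nonzero syndrome is the error position."""
    word = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos - 1]:
            syndrome ^= pos  # XOR of positions of set bits is 0 for a valid codeword
    if syndrome:
        word[syndrome - 1] ^= 1  # flip the single erroneous bit back
    data = word[2] | (word[4] << 1) | (word[5] << 2) | (word[6] << 3)
    return data, syndrome


if __name__ == "__main__":
    cw = hamming74_encode(0b1011)
    cw[5] ^= 1  # inject a single-bit error at position 6
    print(hamming74_decode(cw))
```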

Fault tolerant nanotechnology processor design

We propose to investigate a new decentralized architecture that incorporates powerful and flexible fault tolerance strategies in the nanotechnology environment [3]. As preliminary work, we have developed a fault tolerance strategy with a degree of decentralization in the computation units that dynamically selects between hardware and time redundancy in response to the time-varying fault rates in the system. Figure 3 shows a high-level view of the instruction issue process and the interaction between the voters and the C-units.
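The selection rule is not given here, so the following is a hypothetical sketch of how a controller might switch between time and hardware redundancy: it estimates the current fault rate from recent voter disagreements over a sliding window and selects hardware redundancy when the estimate exceeds a threshold. The class name, window size, threshold and mode labels are illustrative assumptions.

```python
from collections import deque
from typing import Iterable


class RedundancySelector:
    """Hypothetical policy: choose time or hardware redundancy from observed faults."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        self.history = deque(maxlen=window)  # 1 = voter saw a disagreement
        self.threshold = threshold

    def fault_rate(self) -> float:
        """Fraction of recent voting events that showed a disagreement."""
        return sum(self.history) / len(self.history) if self.history else 0.0

    def mode(self) -> str:
        # Low observed fault rate: re-execute on one unit (time redundancy).
        # High observed fault rate: run copies on separate units and vote.
        return "hardware" if self.fault_rate() > self.threshold else "time"

    def update(self, disagreements: Iterable[bool]) -> str:
        """Record a batch of voter outcomes and return the selected mode."""
        for d in disagreements:
            self.history.append(1 if d else 0)
        return self.mode()


if __name__ == "__main__":
    sel = RedundancySelector(window=10, threshold=0.2)
    print(sel.update([False] * 10))  # quiet period
    print(sel.update([True] * 5))    # burst of faults
```

The sliding window lets old faults age out, so the controller can return to the cheaper time redundancy once the fault rate drops.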

Participating Faculty: Ramesh Karri (ramesh@india.poly.edu)

Collaborators: Alex Orailoglu and Kaijie Wu

[1] T. Wei, K. Wu, R. Karri, and A. Orailoglu, "Fault Tolerant Quantum Cellular Array (QCA) Design using Triple Modular Redundancy with Shifted Operands," ASP-DAC 2005, to appear.

[2] W. Rao, A. Orailoglu, and R. Karri, "Fault Tolerant Arithmetic with Applications in Nanotechnology based System," International Test Conference, pp. 472-478, October 2004.

[3] W. Rao, A. Orailoglu, and R. Karri, "Fault Tolerant Nanoelectronic Processor Architectures," ASP-DAC 2005, to appear.
