This course introduces a variety of hardware and software techniques for designing and modeling fault-tolerant computers. Topics include coding techniques (Hamming, SECSED, SECDED, etc.); majority voting schemes (TMR); software redundancy (N-version programming); software-recovery schemes; and network reliability design and estimation. The course also introduces probabilistic methods for reliability modeling. Other topics include examples from fault-tolerant space systems, networks, and commercial nonstop systems (TANDEM and STRATUS); RAID memory systems; and fault-tolerant modeling tools such as HARP, SURE and SHARPE.
Prerequisites: CS 6133 and graduate standing.