Safety Assessment Lab


Contact: Eric Cutright
Head Faculty: Dr. Ted C. Giras


Mission:
The mission of the Center of Rail Safety-Critical Excellence is to provide a unique collaboration between the University of Virginia's (UVA) School of Engineering and Applied Science and the Association of American Railroads (AAR) to enhance the risk assessment capabilities for railroads, high-speed, freight, and transit railways, commuter lines, and maglev systems. All of these sectors plan to deploy new and innovative train control systems to replace existing technologies that have been in place for decades. The complexity and potential safety impacts of these processor-based systems create a unique need to provide suppliers with verified and validated web-based risk assessment toolsets that help ensure rigorous standards are met without large industry investments. This partnership increases the ability of the Federal Railroad Administration (FRA) to evaluate, approve, and enforce performance-based safety-critical standards.


Overview:
A key shared resource of the Center for Safety-Critical Systems (CSCS) and the Center of Railroad Safety-Critical Excellence (CRSCE) is the Safety Assessment Lab (SAL), located in Thornton C314. The primary SAL research focuses are system-level risk assessment and numerical safety quantification of safety-critical systems. System-level risk assessment is performed using a simulation environment developed by the CRSCE called the Axiomatic Safety-Critical Assessment Process (ASCAP), which uses a Monte Carlo simulation approach to derive system-level risk metrics and to allocate numerical safety targets to safety-critical processor-based subsystems. These numerical safety targets are expressed in terms of the Mean Time To Hazardous Event (MTTHE) metric, which represents the average time to an unsafe failure of the system.
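The Monte Carlo idea can be illustrated with a minimal sketch. This is not ASCAP itself; it is a toy model of a single subsystem under stated assumptions: failures arrive at a constant rate, each failure is detected and handled safely with a fixed probability (the fault coverage), and a safely handled failure returns the subsystem to service immediately. The function names and parameters are hypothetical.

```python
import random


def simulate_time_to_hazard(failure_rate, coverage, rng):
    """Simulate one history; return hours until the first unsafe failure.

    Assumes exponentially distributed times between failures and an
    independent detection outcome for every failure (toy model).
    """
    elapsed = 0.0
    while True:
        # Time until the next failure of any kind.
        elapsed += rng.expovariate(failure_rate)
        # With probability (1 - coverage) the failure goes undetected
        # and is counted as a hazardous event.
        if rng.random() >= coverage:
            return elapsed


def estimate_mtthe(failure_rate, coverage, trials, seed=0):
    """Average the simulated time to hazard over many independent trials."""
    rng = random.Random(seed)
    total = sum(
        simulate_time_to_hazard(failure_rate, coverage, rng)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Hypothetical inputs: 1e-4 failures per hour, 90% fault coverage.
    print(estimate_mtthe(failure_rate=1e-4, coverage=0.9, trials=20000))
```

Under these assumptions the estimate should approach the analytical value 1 / (failure_rate * (1 - coverage)) as the number of trials grows, which gives a simple check on the simulation.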


The second SAL research focus is the experimental demonstration of MTTHE compliance based upon fault injection into simulated or physical hardware/software systems. The MTTHE metric can be expressed as a function of the system failure rate and the fault coverage, a measure of the ability of the system to detect the presence of faults and to react in a safe manner. Typically, the system failure rate can be estimated using common reliability analysis techniques; the challenging part of that estimate is accounting for hardware and software design faults that may be present in the system. Fault coverage, however, is typically far more difficult to estimate, and it becomes the focus of the MTTHE compliance process. This process involves the development of analytical, statistical, and fault/error models of the system and the creation of a set of faults to be injected into the system. Fault injection can be performed by augmenting either the system hardware or software to support physical fault injection, or by developing simulation models of the system hardware which can execute the actual system software and support fault injection. The results of the fault injection experiments are used to develop a statistical estimate of the system fault coverage and the associated MTTHE.
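As a rough illustration of the last step, the sketch below turns fault injection counts into a conservative coverage estimate and an associated MTTHE. It is an assumption-laden example, not the lab's actual procedure: it treats each injected fault as an independent trial, uses a Wilson score one-sided lower bound for the coverage, and uses the simple constant-failure-rate relation MTTHE = 1 / (failure_rate * (1 - coverage)). All function names are hypothetical.

```python
import math


def coverage_lower_bound(detected, injected, z=1.645):
    """Wilson score one-sided lower bound on fault coverage.

    detected: injected faults that were detected and handled safely
    injected: total number of faults injected
    z: standard normal quantile (1.645 is roughly 95% one-sided)
    """
    if injected <= 0 or not 0 <= detected <= injected:
        raise ValueError("need 0 <= detected <= injected and injected > 0")
    p = detected / injected
    denom = 1.0 + z * z / injected
    centre = p + z * z / (2.0 * injected)
    margin = z * math.sqrt(
        p * (1.0 - p) / injected + z * z / (4.0 * injected * injected)
    )
    return (centre - margin) / denom


def mtthe(failure_rate, coverage):
    """MTTHE in hours, assuming a constant failure rate (per hour) and
    that only uncovered failures are hazardous."""
    unsafe_rate = failure_rate * (1.0 - coverage)
    return math.inf if unsafe_rate <= 0 else 1.0 / unsafe_rate


if __name__ == "__main__":
    # Hypothetical campaign: 10,000 injections, 9,990 handled safely.
    c_low = coverage_lower_bound(9990, 10000)
    print(c_low, mtthe(1e-4, c_low))
```

Using the lower bound rather than the raw detected/injected ratio keeps the MTTHE claim conservative: even a campaign in which every injected fault is detected yields a bound below 1, so a finite number of experiments never supports a claim of perfect coverage.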
Current CSCS/CRSCE sponsors include Électricité de France (EDF), the Federal Railroad Administration (FRA), Lockheed-Martin Inc., Maglev Inc., the New York City Transit Authority, the Nuclear Regulatory Commission (NRC), and Union Switch & Signal, Inc.
1.1. BEOWULF Computing Cluster
A key component of the Safety Assessment Lab is a mini-supercomputer based on the Beowulf cluster computing architecture, shown in Figure 1. The Beowulf cluster is used to execute ASCAP risk assessment simulations, to interface to external physical systems for testing and simulation activities, and to coordinate fault injection campaigns. The cluster consists of sixteen Computation Nodes (each containing two 1 GHz Pentium III processors), one Controller Node to coordinate cluster task scheduling, one Interface Node to provide network and internet access to the cluster, and multiple external I/O Nodes to provide I/O connections to physical systems under MTTHE evaluation. The cluster nodes run the Linux operating system and communicate with each other over 1 Gb/s (Gigabit) Ethernet links.
1.2. Fault Injection Approaches
Fault injection is performed using one of four techniques: (1) hardware-based, (2) software-based, (3) simulation-based, or (4) a hybrid approach. Hardware-based fault injection involves augmenting the system under analysis with specially designed test hardware to allow for the injection of faults into the system. Typically, these faults are injected at the Integrated Circuit (IC) pin level, though processors can sometimes be subjected to internal faults depending upon the test facilities built into the processor itself.
Traditionally, software-based fault injection involves modifying the software executing on the system under analysis so that the system state (both processor registers and memory) can be altered according to the programmer's model of the system.
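The core operation of software-based fault injection, corrupting a register or memory word, can be sketched as a single bit flip. The example below is purely illustrative: the system state is modeled as a hypothetical dictionary of named 32-bit words rather than real processor state, and the single-bit-flip fault model is an assumption, not necessarily the fault model used in the lab.

```python
import random


def flip_bit(value, bit, width=32):
    """Return value with one bit inverted, truncated to the word width."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside word width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject_fault(state, rng, width=32):
    """Flip one randomly chosen bit in one randomly chosen location.

    state: dict mapping a location name (register or memory word) to
           its integer contents; modified in place.
    Returns (location, bit) so the experiment can be logged and replayed.
    """
    location = rng.choice(sorted(state))
    bit = rng.randrange(width)
    state[location] = flip_bit(state[location], bit, width)
    return location, bit


if __name__ == "__main__":
    # Hypothetical programmer's-model state: two registers, one memory word.
    state = {"r0": 0x0000000A, "r1": 0xFFFFFFFF, "mem[0x1000]": 0x12345678}
    rng = random.Random(42)
    print(inject_fault(state, rng), state)
```

Recording the location and bit of every injected fault, together with the random seed, makes each experiment repeatable, which matters when the outcomes feed a statistical coverage estimate.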
Simulation-based fault injection involves the construction of a simulation model of the system under analysis, including a detailed simulation model of the processor in use. The simulation models are developed using a hardware description language such as the Very High Speed Integrated Circuit Hardware Description Language (VHDL).
A hybrid approach combines two or more of the other fault injection techniques to more fully exercise the system under analysis. For instance, hardware-based or software-based fault injection can significantly shorten both the time needed to perform the experiments and the initial setup time before the experiments begin. A simulation-based approach, however, offers a significant gain in controllability and observability, so it can be useful to combine it with one of the other techniques in order to exercise the system under analysis more fully.
1.3. Fault Injection Environment
Figure 2 depicts a typical fault injection experimental setup. The system under evaluation in this figure is a Digital Feedwater Control System (DFWCS) for the Calvert Cliffs nuclear power plant. The system consists of a main and a backup processor to control the feedwater levels, as well as three Proportional-Integral-Derivative (PID) controllers which monitor and control various pumps and valves in the system. Both software-based and simulation-based fault injection techniques are being applied to this system. The software-based approach uses software interrupts in the operating system to modify register and memory contents during execution of the DFWCS software in the main controller. The simulation-based approach uses a commercial simulation tool (Simics) to seamlessly replace the main controller hardware with a simulation model which executes the unmodified DFWCS system software. The Simics tool allows complete access to the programmer's model of the processor, such that register and memory contents can be corrupted as desired during the execution of the software.