SymCure - Fault Management Software
SymCure is a development and deployment environment for building and implementing fault management applications that automate real-time fault isolation, testing, repair, and availability management tasks of large-scale operations.
Fault management plays a vital role across a broad spectrum of commercial and industrial applications, ranging from service level management and telecommunications network management in the Information Technology (IT) world, to abnormal condition management in the manufacturing, chemical, and oil and gas industries. The size and complexity of these applications often necessitate automated expert-system support for fault management. A small number of root cause problems in IT communication networks often result in a large number of messages and alarms that human operators cannot handle in real time. Failure to identify and repair the root cause problems results in increased system downtime and poor service levels. Abnormal conditions in manufacturing and processing plants may result in unplanned shutdowns, equipment damage, safety hazards, reduced productivity, and poor product quality.
Fault management across these industries shares some common goals, such as improving application availability and utilization, reducing operator overload, and minimizing operation costs. To achieve these goals, it is necessary to develop fault management tools with the following capabilities:
- Symptom monitoring. Symptoms are manifestations of underlying root causes and must be monitored to detect the occurrence of problems as soon as they happen.
- Diagnosis. Diagnosis identifies the root causes of known symptoms. Diagnosis is also often referred to as fault isolation. Some studies have shown that 80% of the fault management effort is spent in identifying root causes after the manifestation of symptoms.
- Correlation. Correlation is the process of recognizing and organizing groups of events that are causally related to each other for diagnostic inference and presentation to system operators. Usually such events share one or more root causes.
- Prediction. Early prediction of the impacts of underlying root causes before the effects are manifested is critical for proactive maintenance, safety, and optimal system utilization.
- Testing. In large systems, it is impractical and sometimes impossible to monitor every variable. Instead, key observable variables are monitored to generate symptom events. Diagnostic inference typically identifies a set of suspected root causes. Additional variables can then be examined by running associated tests to complete the diagnosis process.
- Automated recovery. Identifying and automating recovery procedures allow for growth in equipment, processes, and services, without increasing the supervisory burden on system operators.
- Notification. Operators must be notified of the presence of root causes and their potential impacts. Raw alarms, which can overload an operator with redundant information, must be replaced with concise diagnostic summaries of root causes and their impacts.
- Postmortem. Information from the diagnostic problem solving is fed back to the fault management system for historic record keeping and proactive fault management in the future.
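As a rough illustration of how correlation, diagnosis, and testing fit together, the following Python sketch groups observed symptoms under the root causes that could explain them, ranks the suspects, and lists the unobserved symptoms worth testing next. The causal table, the event names, and the functions `correlate`, `rank_suspects`, and `next_tests` are all hypothetical; this is a minimal sketch of the general idea, not SymCure's algorithm or API.

```python
# Hypothetical causal model: root cause -> symptoms it can produce.
CAUSES = {
    "router_down": {"link_a_alarm", "link_b_alarm", "ping_timeout"},
    "disk_full": {"write_error", "ping_timeout"},
}

def correlate(observed):
    """Group observed symptoms under each root cause that could explain them."""
    groups = {}
    for cause, symptoms in CAUSES.items():
        explained = symptoms & observed
        if explained:
            groups[cause] = explained
    return groups

def rank_suspects(observed):
    """Rank suspects by the fraction of their expected symptoms already seen."""
    groups = correlate(observed)
    scored = [(len(seen) / len(CAUSES[cause]), cause) for cause, seen in groups.items()]
    return [cause for _, cause in sorted(scored, reverse=True)]

def next_tests(observed):
    """For each suspect, list the expected symptoms not yet observed."""
    return {cause: CAUSES[cause] - observed for cause in correlate(observed)}
```

Given the alarms `{"link_a_alarm", "ping_timeout"}`, both causes are suspects because they share `ping_timeout`; checking for `link_b_alarm` or `write_error` would then separate them.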
SymCure addresses a number of fault management functions, including diagnosis, correlation, prediction, testing, automated recovery, and notification. It provides a powerful object-oriented, model-based framework to specify diagnosis knowledge in the form of a persistent, generic (class-based), graphical fault model library. It performs diagnosis and prediction by combining the fault models with specific domain information and incoming events at run time. It detects and resolves multiple system failures, and reports the results of its diagnostic reasoning to external systems by using messages and other suitable means. SymCure’s methodology is domain independent. It has been used for fault management in diverse applications across different industries, including abnormal condition management for heaters and service management for enterprise-wide software systems.
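The idea of a generic, class-based fault model that is combined with specific domain information at run time can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `DomainObject` class, the `GENERIC_MODEL` rule table, the `Pump` and `Heater` classes, and the event names are hypothetical and do not reflect SymCure's actual model format, which is graphical.

```python
from dataclasses import dataclass, field

@dataclass
class DomainObject:
    name: str
    cls: str
    connected_to: list = field(default_factory=list)  # downstream objects

# Generic, class-level rules (hypothetical):
# (class, event) -> [(relation, target class, effect on target)]
GENERIC_MODEL = {
    ("Pump", "failed"): [("connected_to", "Heater", "low_flow")],
    ("Heater", "low_flow"): [("self", "Heater", "high_temperature")],
}

def instantiate(obj, event):
    """Expand a class-level rule into effects on specific domain objects."""
    effects = []
    for relation, target_cls, effect in GENERIC_MODEL.get((obj.cls, event), []):
        targets = [obj] if relation == "self" else getattr(obj, relation)
        effects += [(t.name, effect) for t in targets if t.cls == target_cls]
    return effects

# Specific domain information: one pump feeding two heaters.
h1 = DomainObject("H1", "Heater")
h2 = DomainObject("H2", "Heater")
p1 = DomainObject("P1", "Pump", [h1, h2])
print(instantiate(p1, "failed"))
```

Because the rules refer only to classes and relations, adding a third heater to `p1.connected_to` changes the predicted effects without any change to the model itself.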
SymCure provides the following capabilities:
- Automates fault and availability management of operations in domains as diverse as communications networks, enterprise-wide software applications, and manufacturing process plants.
- Performs online event correlation and interactive diagnosis to address the full problem life cycle, from symptom-based problem identification through root-cause analysis, diagnostic testing, fault isolation, and recovery.
- Provides a powerful model-based framework consisting of generic, class-based fault models, which are tied to an object-oriented domain representation and scalable algorithms.
- Understands the complex relationships among objects, processes, and events.
- Anticipates and diagnoses problems, based on this understanding.
- Requires minimal on-site customization for practical deployment. SymCure automatically accounts for configuration changes in equipment, topology, or operating modes in managed operations. Minimal customization eliminates the need for expensive reconfigurations of the fault management applications, enabling application developers to build reusable solutions that implement fault and availability management capabilities quickly and reliably.
- Accepts events and data from external sources.
- Provides a graphical language for automating labor- or reasoning-intensive tasks, such as root cause analysis, testing, fault mitigation, response, and recovery.
- Performs impact analysis to predict the effects of problems, and to measure the potential business impacts, for example, the impact on service level agreements in the networking industry or the impact of shutdowns in the manufacturing industry. SymCure can perform offline “what-if” simulation of failures to rapidly identify any potentially harmful effects of suspected root causes in a system.
- Guides operators through testing and recovery.
- Provides built-in configurable message browsers for system operators and developers.
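The impact analysis and "what-if" capability in the list above can be pictured as forward propagation through a causal graph. The Python sketch below is one simple way to do this with a breadth-first traversal; the `EFFECTS` graph, the event names, and the function `predict_impacts` are hypothetical and are not taken from SymCure.

```python
from collections import deque

# Hypothetical causal graph: event -> events it directly causes.
EFFECTS = {
    "power_supply_fault": ["switch_down"],
    "switch_down": ["server_unreachable", "backup_link_overload"],
    "server_unreachable": ["sla_violation"],
}

def predict_impacts(root_cause):
    """Return every downstream event reachable from a suspected root cause."""
    seen = {root_cause}  # guards against cycles in the graph
    order = []
    queue = deque([root_cause])
    while queue:
        event = queue.popleft()
        for effect in EFFECTS.get(event, []):
            if effect not in seen:
                seen.add(effect)
                order.append(effect)
                queue.append(effect)
    return order
```

Calling `predict_impacts("power_supply_fault")` walks the graph outward from the assumed failure, so business-level effects such as `sla_violation` appear alongside the technical ones.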
The depth and breadth of SymCure’s causal reasoning capability; its ability to automate time-consuming, labor-intensive, and reasoning-intensive fault management tasks; and its ability to factor in the business impact of each event make SymCure a powerful solution for complex fault management applications.
SymCure’s major benefits include:
- Rapid diagnosis and response to problems.
- Increased system availability.
- Optimization of personnel and system resources.
- Comprehensive impact analyses for more accurate contingency planning and other system-related business modeling.
- Improved service and equipment availability.