Top methodologies to ensure system alarms are properly acknowledged and responded to in a timely manner.

By: Allan Evora & Adam Baker


More and more of today’s control systems are moving towards a model that does not involve operators watching control system screens 24x7x365. Automation and intelligent systems allow industries to do more with less, and operating and maintaining facilities and equipment is no exception. Smart alarm management has allowed owners and operators to interact with their assets on an exception basis. By this, I mean operators are notified when a fault or abnormal condition occurs (or is about to occur) and then interrogate their automation systems to understand how to respond.


The proper design, configuration, and installation of an alarm management system is no longer optional; it is a must. The following are best practices for smart alarm management.

 

1.    Process industries: Follow ISA-18.2-2009

The issuance of the ISA-18.2 standard was a significant event for the chemical, petrochemical, refining, power generation, pipeline, mining and metals, pharmaceutical, and similar industries using modern control systems with alarm functionality. It sets forth the work processes for designing, implementing, operating, and maintaining a modern alarm system in a life cycle format.

The basic intent of ISA-18.2 is to improve safety. The standard focuses on both work process requirements (“shall”) and recommendations (“should”) for effective alarm management.

 

2.    Stop alerting on events

There’s a big difference between an alarm and an event in your SCADA system. There are many different actions and data points you need to keep track of, but not all require operator corrective actions to keep things under control and running. Whether it’s a human or software making the decision to generate an alert, alarms should only be for actionable conditions.

Don’t cry wolf. The more events you blast out, the more apathetic maintenance personnel will become. Because hundreds of events (which require no action) can occur each day for every one critical alarm, there’s too much noise in the system for alarms to be meaningful. Not to mention the likelihood that a mission-critical alarm could easily get buried.

 

3.    Stop blindly using the manufacturer’s points list

Most OEMs provide a list of points that relate to their equipment, and owners and engineers use these lists in their control system specifications. This list is a great starting point for systems integrators, but it’s not the be-all and end-all for SCADA alarming.

We recommend that detailed design discussions be held between the systems integrator and the owner’s support personnel to identify the specific alarm conditions that apply to the owner’s equipment and facilities. During these discussions, the systems integrator can also become familiar with the owner’s standard naming conventions and can translate alarm conditions into language the owner’s operators will be familiar with. We also recommend that systems integrators work with the OEM to understand what alarm conditions may be triggered during equipment maintenance and testing.

 

4.    Stop sending alarms to everyone

Operators should only be notified of alarms on a “need to know” basis. Receiving alarms on equipment or systems that you have no responsibility for is a sure-fire way to create apathy towards your alarm management system. People only care about alarms that pertain to them. Chiller technicians don’t need to be bothered about generator issues. Network technicians don’t need to know if the boiler is offline. Blasting out alarm broadcasts to all maintenance personnel is distracting, inefficient, and dangerous: when people are flooded with nuisance alarms, the alarms that matter get missed.

Understanding how owners organize their workforce and how their process works is key to defining alarm groups. Are there conditions that need to be sent to outside support providers? Outside of normal business hours, should all alarms go to an on-call pager, but during normal business hours, to pre-defined groups? Maybe your process involves an escalation routine: if the designated alarm group does not acknowledge an alarm within a given timeframe, the alarm gets escalated to a higher level in the organization to ensure adequate response times are adhered to.
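The routing and escalation questions above can be sketched in a few lines of Python. This is only an illustration, not how any particular SCADA package does it: the group names, the 8:00 to 17:00 business hours, the 15-minute escalation interval, and the `recipients` function are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List

# Hypothetical routing table: equipment class -> responsible alarm group.
ALARM_GROUPS = {
    "chiller": "hvac_techs",
    "generator": "electrical_techs",
    "network": "network_techs",
}
ON_CALL = "on_call_pager"
# Hypothetical escalation chain, lowest level first.
ESCALATION_CHAIN = ["shift_supervisor", "facility_manager"]
ESCALATION_INTERVAL = timedelta(minutes=15)  # assumed acknowledgement window
BUSINESS_START, BUSINESS_END = time(8, 0), time(17, 0)  # assumed hours


@dataclass
class Alarm:
    equipment_class: str
    raised_at: datetime
    acknowledged: bool = False


def is_business_hours(moment: datetime) -> bool:
    """Monday to Friday, between the assumed start and end times."""
    return moment.weekday() < 5 and BUSINESS_START <= moment.time() < BUSINESS_END


def recipients(alarm: Alarm, now: datetime) -> List[str]:
    """Return who should be notified about this alarm at time `now`."""
    if alarm.acknowledged:
        return []
    if is_business_hours(alarm.raised_at):
        # Unknown equipment classes fall back to the on-call pager.
        primary = ALARM_GROUPS.get(alarm.equipment_class, ON_CALL)
    else:
        primary = ON_CALL
    # Add one escalation level per full interval without acknowledgement.
    levels = max(0, int((now - alarm.raised_at) / ESCALATION_INTERVAL))
    return [primary] + ESCALATION_CHAIN[:levels]
```

With these assumed settings, a chiller alarm raised at 10:00 on a weekday goes only to the HVAC group at first, picks up the shift supervisor after 15 unacknowledged minutes, and the facility manager after 30. The same alarm raised at 23:00 starts at the on-call pager instead.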

 

5.    Stop alarming on obvious issues

Sometimes one high-priority alarm will start a cascade of other not-as-critical alarms. If a circuit breaker at a PV solar generation site trips, an alarm will generate indicating that the breaker tripped. But you’ll probably also get an alarm that indicates loss of communications to each inverter. If you have 8 inverters, that’s 9 total alarms for only 1 actionable event.

Ask your SCADA integrator to build functionality into your system that intelligently recognizes situations that could cause cascading alarms. Most of today’s automation systems support alarm logic or conditional alarms. Using this capability can help eliminate cascading alarms where a single condition triggers several other conditions. In the example above, the system could be configured to generate a loss-of-communications alarm only if control power is present.
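As a rough sketch of that conditional logic, here is the PV example in Python. In a real system this would live in the alarm configuration of your SCADA or PLC platform; the function name, the inputs, and the alarm text below are hypothetical and exist only to show the idea.

```python
from typing import Dict, List


def active_alarms(
    breaker_tripped: bool,
    control_power_present: bool,
    inverter_comms_ok: Dict[str, bool],
) -> List[str]:
    """Return the alarms to raise, suppressing the cascade behind a lost supply."""
    alarms = []
    if breaker_tripped:
        alarms.append("Breaker tripped")
    # Only alarm on lost communications when control power is present.
    # Without control power, the comms loss is an expected consequence,
    # not a separate actionable condition.
    if control_power_present:
        for name, comms_ok in inverter_comms_ok.items():
            if not comms_ok:
                alarms.append(f"{name}: loss of communications")
    return alarms


# Breaker trips, control power is lost, and all 8 inverters go silent.
inverters = {f"INV-{i}": False for i in range(1, 9)}
print(active_alarms(True, False, inverters))
```

Instead of 9 alarms, the operator sees the single breaker alarm. An inverter that stops communicating while control power is healthy still alarms on its own.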

 

6.    Stop alarming when you’re doing maintenance

The testing and maintenance of equipment or systems can often result in nuisance alarms or events. We’ve found that using the alarm suppression capabilities or alarm logic/conditional alarming is a great way to eliminate these types of alarms.
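One possible shape for maintenance suppression is sketched below in Python. The class and method names are hypothetical, and the time limit on each suppression is our own assumption for the example (so that a suppression forgotten after the work is done cannot silently hide real alarms); your platform may handle this differently.

```python
from datetime import datetime, timedelta
from typing import Dict


class MaintenanceSuppression:
    """Tracks which equipment is under maintenance and until when."""

    def __init__(self) -> None:
        self._until: Dict[str, datetime] = {}  # equipment id -> expiry time

    def start(self, equipment_id: str, now: datetime, duration: timedelta) -> None:
        """Suppress alarms for this equipment for a limited time."""
        self._until[equipment_id] = now + duration

    def end(self, equipment_id: str) -> None:
        """Lift the suppression early, for example when work finishes."""
        self._until.pop(equipment_id, None)

    def is_suppressed(self, equipment_id: str, now: datetime) -> bool:
        """True only while an unexpired suppression exists."""
        expiry = self._until.get(equipment_id)
        return expiry is not None and now < expiry
```

The notification path would check `is_suppressed` before paging anyone. The condition itself should still be logged as an event so there is a record of what happened during the maintenance window.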

 

7.    Stop using “Acknowledge All”

There is a common but concerning practice among operators inundated with pages and pages of alarms. Because they haven’t looked at the alarm list in some time, the pages keep filling up. Instead of looking through each alarm individually, they click the “acknowledge all” button, and all alarms get acknowledged.

The danger of acknowledging all alarms is that you might lose visibility of a high-priority alarm that is buried within a long list of other alarms. One way to handle this is to design alarm management systems that keep alarms manageable. Maybe you have different alarm summary screens dedicated to each of the alarm priority groups, and for critical alarms you do not allow the ability to acknowledge all. Another way to discourage this behavior is to require that each operator log in to the system with their unique user ID, and that the user ID is recorded with each acknowledged alarm (see #8).

If you stop alerting on events (see #2) and obvious issues (see #5), you should see your overall list reduce drastically.

 

8.    Enforce user logins for each operator

Operators must take responsibility for the actions they do (or don’t do) on their shift. Too many times I’ve heard horror stories where a critical alarm was missed, and because the system wasn’t set up with individual user logins, the operator responsible was never identified.

The best way to ensure operator responsibility is through individual, unique user logins. Do not use one generic user ID for all operators of the automation system. If you’re not already logging alarms in your historian, start today. In addition, you should be logging all actions taken by your operators. If an operator hits the “acknowledge all” button, that operator alone should take responsibility for understanding what’s going on with every alarm.
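To make the ideas in #7 and #8 concrete, here is a minimal Python sketch of acknowledgement with an audit trail. Everything in it is an assumption for illustration: the `Alarm` fields, the list of generic IDs, and the in-memory `audit_log` stand in for whatever your SCADA platform and historian actually provide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

GENERIC_IDS = {"operator", "admin", "shared"}  # hypothetical shared accounts


@dataclass
class Alarm:
    tag: str
    priority: str  # assumed levels: "critical", "high", "low"
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None


audit_log: List[dict] = []  # stand-in for the historian


def acknowledge(alarm: Alarm, user_id: str, action: str = "acknowledge") -> None:
    """Acknowledge one alarm and record who did it and when."""
    if not user_id or user_id.lower() in GENERIC_IDS:
        raise PermissionError("an individual user login is required")
    alarm.acknowledged_by = user_id
    alarm.acknowledged_at = datetime.now(timezone.utc)
    audit_log.append(
        {"tag": alarm.tag, "user": user_id, "action": action, "at": alarm.acknowledged_at}
    )


def acknowledge_all(alarms: List[Alarm], user_id: str) -> List[Alarm]:
    """Bulk-acknowledge non-critical alarms; return the critical ones left over."""
    remaining = []
    for alarm in alarms:
        if alarm.acknowledged_by is not None:
            continue  # already acknowledged by someone
        if alarm.priority == "critical":
            remaining.append(alarm)  # must be acknowledged individually
        else:
            acknowledge(alarm, user_id, action="acknowledge_all")
    return remaining
```

The bulk action never touches critical alarms, and every entry in the log names both the operator and whether the alarm was acknowledged individually or in bulk.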

 

9.    Stop treating alarms the same

All actionable conditions warrant attention, but some do more than others. A common but dangerous situation occurs when a critical alarm is pushed to the bottom of the HMI by lower priority alarms. If an already bogged-down operator works from top to bottom, the critical alarm might persist for hours, representing a safety or financial risk.

To efficiently manage alarms, your SCADA system needs an alarm priority scheme. Alarm priorities should be based first on safety, and then on potential economic impact. Depending on how many critical alarms are active at a given time, one high-level alarm may need to rank above another. Design your system with not only a priority numbering scheme, but also distinct color codes and indications that bring the operator’s attention to the most pressing alarm conditions in the system.
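A priority scheme along these lines might look like the Python sketch below. The three priority levels, the 1 to 5 economic impact scale, the threshold of 4, and the colors are all assumptions chosen for the example; the real values should come out of the design discussions with the owner.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical priority number -> display color.
PRIORITY_COLORS = {1: "red", 2: "orange", 3: "yellow"}


@dataclass
class Alarm:
    tag: str
    safety_related: bool
    economic_impact: int  # assumed scale: 1 (minor) to 5 (severe)


def priority(alarm: Alarm) -> int:
    """Safety first, then potential economic impact. 1 is most urgent."""
    if alarm.safety_related:
        return 1
    if alarm.economic_impact >= 4:
        return 2
    return 3


def display_order(alarms: List[Alarm]) -> List[Alarm]:
    """Sort so the most pressing alarm is at the top of the summary."""
    return sorted(alarms, key=lambda a: (priority(a), -a.economic_impact))


alarms = [
    Alarm("FILTER-DP", safety_related=False, economic_impact=1),
    Alarm("CHILLER-TRIP", safety_related=False, economic_impact=5),
    Alarm("GAS-LEAK", safety_related=True, economic_impact=2),
]
for alarm in display_order(alarms):
    print(priority(alarm), PRIORITY_COLORS[priority(alarm)], alarm.tag)
```

Sorting the summary this way keeps a critical alarm at the top of the HMI no matter how many lower-priority alarms arrive after it.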