Wide Area Measurement Systems, NASPInet, and Security
Deregulation, market transactions, congestion management, and the separation of functions have created increasing complexity that is making it difficult to maintain situational awareness and supervision of power system performance over large areas. Past reliability events (such as blackouts) have highlighted the need for better situational awareness and advanced applications to improve planning, operations, and maintenance. The deployment of a continent-wide wide area measurement system (WAMS) is an important part of the solution to these complex problems, but it faces challenges with respect to communications and security.
Wide Area Measurement System (WAMS)
In its recent book A Century of Innovation, the National Academy of Engineering listed widespread electrification first on its list of the top 20 engineering achievements of the 20th century. Although the highly interconnected North American electrical power grid is rightly hailed as a great engineering feat, managing and operating it in a reliable and safe way remains a challenge that involves many complex technical tasks that must be accomplished at different time and geographic scales. Such tasks include continuous feedback control, protection and control mechanisms that operate every few milliseconds at substations, state estimators and contingency analysis processes that operate every few minutes, and generation dispatch decisions to bring power plants online or take them off-line based on load or expected demand. In earlier years, control areas were vertically integrated in all respects and acted as quasi islands responsible for flow control. The interconnections among control areas enabled emergency flow paths and occasional economic benefits. Knowledge beyond control area boundaries was limited and often depended on slow point-to-point communications. Modern operations are far more complex, as reliability constraints require extensive congestion management with significant economic consequences. Further, given that various parts of the system are owned and operated by many independent entities, reliable operation of the grid depends on those tasks being accomplished at a range of geographic granularities and with a high level of coordination among the various entities that manage and operate the grid. With the passage of the Energy Policy Act of 2005 in the United States, the Federal Energy Regulatory Commission (FERC) and North American Electric Reliability Corporation (NERC) have been given additional authority to regulate electric power entities to ensure reliable operation of the grid.
Historically, the grid has been very reliable. While minor outages have been fairly common, large-scale and widespread outages have been rare, and most customer interruptions occur within relatively localized distribution infrastructure. An increasing demand for electricity has not been accompanied by increases in transmission capacity, however, putting growing pressure on the reliability and safety of the grid. Recent large blackouts and outages, such as the 14 August 2003 blackout in the Northeast and the 26 February 2008 outage in Florida, stand as evidence. The final report by the U.S.-Canada Power System Outage Task Force on the August 2003 blackout pointed out that the job of maintaining the system reliably had become harder because of reduced transmission margins. The report recommended the development and adoption of technologies, such as WAMS, that could improve system reliability by providing better wide area situational awareness.
Traditionally, sensor readings from substations in utilities are sent via a communication network to the supervisory control and data acquisition (SCADA) systems in the local utility and exchanged regionally with other utilities and reliability coordinators using the Inter-Control Center Communications Protocol (ICCP). Typically, SCADA systems acquire sensor data every 2–4 s. Since the data are not time-stamped at the point of measurement or acquired synchronously, they do not capture the state of the system at a given moment in time. Rather, the data can provide a good estimate of the system state, assuming that the system is in quasi-steady state. While the grid operates in quasi-steady state most of the time, increased stress on the system means that operators’ views of it must be more fine-grained and cover a wider area, moving across multiple organizations in order to improve the reliability and stability of the grid.
A WAMS can be defined as a system that takes measurements in the power grid at a high granularity, over a wide area, and across traditional control boundaries and then uses those measurements to improve grid stability through wide area situational awareness and advanced analysis. Certain power system measurements cannot be meaningfully combined unless they are captured at the same time. An important requirement of a WAMS, therefore, is that the measurements be synchronized. A high sampling rate (typically 30 or more samples per second) is particularly important for measuring system dynamics and is another important requirement of a WAMS. Certain elements of a WAMS have existed in rudimentary forms in the Western Interconnection since the early 1990s, and the cascading outage of 1996 provided the impetus for further WAMS development.
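To illustrate why synchronization matters, the following minimal sketch pairs samples from two measurement points by timestamp and computes a phase angle difference only for samples captured at the same instant. The record layout, the sample values, and the helper names `angle_difference_deg` and `align_and_diff` are assumptions made for illustration and are not taken from any PMU standard.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (timestamp in microseconds, phase angle in degrees)
Sample = Tuple[int, float]


def angle_difference_deg(a: float, b: float) -> float:
    """Return a - b wrapped into the interval [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def align_and_diff(bus_a: List[Sample], bus_b: List[Sample]) -> Dict[int, float]:
    """Pair samples that share a timestamp and return their angle differences."""
    b_by_time = dict(bus_b)
    return {
        t: angle_difference_deg(angle_a, b_by_time[t])
        for t, angle_a in bus_a
        if t in b_by_time
    }


# At 30 samples per second, samples are roughly 33,333 microseconds apart.
bus_a = [(0, 10.0), (33_333, 12.0), (66_667, 14.0)]
bus_b = [(0, -5.0), (33_333, -4.0), (100_000, -3.0)]  # last sample has no partner

diffs = align_and_diff(bus_a, bus_b)
print(diffs)
```

Samples without a partner at the same timestamp are simply dropped here; a real system would more likely buffer or interpolate, which this sketch does not attempt.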
Many advanced applications can take advantage of the measurement capability provided by a WAMS, including:
- Wide area monitoring: High-speed, real-time measurement data and analysis are essential to achieve wide area visibility across the bulk power system for entire interconnections. Time-synchronized measurements from geographically dispersed locations throughout a large region enable better operational awareness of the real-time condition of the grid and allow operators to make better-informed decisions.
- Real-time operations: Applications for real-time operations help operators take advantage of the newfound visibility of grid dynamics, including interarea oscillatory modes, and support methods for damping and stabilizing frequency oscillations.
- Improved accuracy of models: Time-synchronized wide area measurements continue to be very valuable for improving the accuracy of planning models by precisely correlating simulation output with observed system behavior under a variety of conditions. Improved planning models enable better assessment of system behavior and will permit a more complete assessment of dynamic performance issues, such as disturbance response, voltage and frequency response, and stability performance.
- Forensic analysis: Synchronized measurement data collected at high sampling rates are also helpful for forensic analysis of blackouts and other grid disturbances. Because the data are collected at high speed and are time-synchronized, their analysis can lead to faster and better understanding of precise sequences of events.
Phasor measurement units (PMUs), developed in the early 1990s, were among the first devices that could monitor the grid in a synchronized way and produce coordinated phasor measurements, also known as synchrophasors. A GPS clock signal is the most commonly used mechanism for providing the time reference needed for synchronizing PMU measurements. Another distinguishing feature of PMUs, in addition to synchronized measurements, is their sampling rate, which ranges from 30 samples per second up to 120 samples per second in current implementations. Even the low end of that spectrum, 30 samples per second, is roughly 60 to 120 times the rate of SCADA systems that acquire data every 2–4 s, meaning that PMU devices are capable of measuring system dynamic performance in a manner that is not possible with traditional SCADA systems. Their synchronized monitoring and high sampling rate make PMUs the ideal class of monitoring device for a WAMS. Traditionally, PMUs were stand-alone devices, but today many devices such as relays and digital fault recorders (DFRs) also have the ability to produce synchrophasors at high sampling rates.
Realizing the need for wide area measurement, monitoring, and control across the continent and the potential of synchrophasor technology to enable these functions, the U.S. Department of Energy (DOE), the North American Electric Reliability Corporation (NERC), and a range of electric utilities and other organizations formed the North American SynchroPhasor Initiative (NASPI) in 2007. NASPI’s vision is “to improve power system reliability through wide area measurement, monitoring and control,” and its mission is “to create a robust, widely available and secure synchronized data measurement infrastructure for the interconnected North American electric power system with associated analysis and monitoring tools for better planning and operation and improved reliability.”
Realizing a continent-wide WAMS requires not only synchronized measurement but also a high-speed communication infrastructure that enables secure sharing of synchronized monitoring data among control centers. There is an effort under way at NASPI to develop such an infrastructure, known as the NASPI Network (or NASPInet); it is being designed to be secure, standardized, distributed, and capable of supporting future needs. One of the key requirements for this communication infrastructure is that it must be able to support different classes of applications with varying levels of latency, accuracy, availability, message rate, and time-alignment requirements. For example, one class of applications, such as feedback control, places strict requirements on the latency, availability, and accuracy of data, while another class of applications, such as post-event analysis, values accuracy, availability, and sampling or message rate more than latency. The communication infrastructure should therefore be able to support different quality-of-service (QoS) classes for traffic and should be able to prioritize one class over another. Conceptually, as shown in Figure 1, NASPInet is made up of two components: the phasor gateway (PGW) and the data bus (DB). The PGW is envisioned as a utility’s or control center’s sole point of access to the DB. It will let the utility or control center share its synchrophasor data and obtain synchrophasor data from other utilities or control centers. The idea is that the data sharing will follow a publish-subscribe pattern, according to which a gateway that wishes to share data will publish them so that authorized gateways may subscribe to the published stream and receive the data. Each PGW will need to manage QoS and administer cybersecurity and access rights for the data it is sharing. 
The DB is envisioned as a wide area network that connects all the PGWs and provides the associated services for basic connectivity, QoS management, performance monitoring, and cybersecurity.
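The publish-subscribe pattern described above can be sketched in a few lines. This is a toy model under stated assumptions, not the NASPInet design: the class names `DataBus` and `PhasorGateway`, their methods, and the stream naming are all hypothetical, and QoS handling, networking, and cryptography are left out entirely.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


class DataBus:
    """Toy stand-in for the DB: routes published samples to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str, dict], None]]] = defaultdict(list)

    def subscribe(self, stream: str, callback: Callable[[str, dict], None]) -> None:
        self._subscribers[stream].append(callback)

    def publish(self, stream: str, sample: dict) -> int:
        """Deliver a sample to every subscriber; return the delivery count."""
        for callback in self._subscribers[stream]:
            callback(stream, sample)
        return len(self._subscribers[stream])


class PhasorGateway:
    """Toy stand-in for a PGW: the owner's single point of access to the bus."""

    def __init__(self, name: str, bus: DataBus) -> None:
        self.name = name
        self.bus = bus
        # Access rights administered by the publishing gateway: stream -> names
        self.access_rights: Dict[str, Set[str]] = defaultdict(set)
        self.received: List[Tuple[str, dict]] = []

    def authorize(self, stream: str, subscriber_name: str) -> None:
        self.access_rights[stream].add(subscriber_name)

    def handle_subscription(self, stream: str, subscriber: "PhasorGateway") -> bool:
        """Accept a subscription only from an authorized gateway."""
        if subscriber.name not in self.access_rights[stream]:
            return False
        self.bus.subscribe(stream, subscriber._on_data)
        return True

    def publish(self, stream: str, sample: dict) -> int:
        return self.bus.publish(stream, sample)

    def _on_data(self, stream: str, sample: dict) -> None:
        self.received.append((stream, sample))


bus = DataBus()
utility_a = PhasorGateway("utility_a", bus)
utility_b = PhasorGateway("utility_b", bus)
utility_c = PhasorGateway("utility_c", bus)

stream = "utility_a/bus12"
utility_a.authorize(stream, "utility_b")
accepted_b = utility_a.handle_subscription(stream, utility_b)  # authorized
accepted_c = utility_a.handle_subscription(stream, utility_c)  # not authorized
delivered = utility_a.publish(stream, {"t": 1000, "angle_deg": 12.5})
```

Only the authorized gateway receives the published sample. In this sketch the publishing gateway makes the access decision, mirroring the statement that each PGW administers access rights for the data it shares.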
NASPInet’s Cybersecurity Requirements and Challenges
It is crucial to secure a WAMS in order to ensure the availability and integrity of the data it carries, which in turn affect the reliability of the power grid, since monitoring and control applications may rely on those data. The core security goals of a WAMS are to ensure the availability, integrity, and confidentiality of the data and the underlying computing and communication infrastructure. Furthermore, the data security should be ensured end to end, that is, from the time of data origination at the sensor to the time of use by a control or monitoring application. Achieving these security objectives is easier within a single organization (that is, from the measurement sensor to the control center owning or managing the sensor) than it is for an infrastructure distributed over a wide area like NASPInet, which is envisioned as enabling data sharing across organizational boundaries and helping to realize a continent-wide WAMS. Here we highlight the security requirements of NASPInet, the many security functions and mechanisms needed to meet them, and the challenges of realizing them.
Authentication, Authorization, and Access Control
Owners of sensor data would not want anyone other than authorized data-sharing partners to gain access to their data. Toward that end, they need to be able to ensure that an entity with which they are communicating is what it claims to be and that it is an authorized data-sharing partner. In other words, they need to be able to authenticate the entity with which they are communicating and verify that it is an authorized entity before they share their data. Similarly, a data receiver may want to authenticate the entity from which it is receiving data to make sure that the incoming data are legitimate.
A naive strategy for authentication is to create an out-of-band security and communication context and use it to establish communications and perform authentication. A more dynamic and scalable approach is desirable, however, and could include leveraging a trusted third-party service to establish trust and long-term cryptographic keys among the WAMS entities, such as their PGWs. The third-party service could be a Kerberos-like service that helps establish long-term symmetric cryptographic keys among entities, a certificate authority that issues digital certificates that are trusted by members of NASPInet, or just a simple secure and authenticated directory service in which entities like gateways can post their public keys or digital certificates. Once an entity is authenticated, its authorization to access the data needs to be verified. Access control lists (ACLs) associated with data are often used to specify the list of entities authorized to access the data. In such a case, authorization checking involves ensuring that the authenticated entity is listed in the ACL associated with the data. In addition to access control for data, which is enforced by the data owner’s gateway, there must be an access control mechanism at the network level to limit access only to authorized entities. In the case of NASPInet, the network-level access control is to be administered and enforced by the DB function.
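The two checks described above, authentication followed by ACL-based authorization, can be sketched as follows. The sketch assumes that long-term symmetric keys have already been established (for example with the help of a trusted third party) and uses a simple HMAC challenge-response as the authentication step; that choice, the function names, and the gateway and stream names are assumptions for illustration rather than a NASPInet specification.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Set


def respond(shared_key: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the shared key by computing a MAC over the challenge."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def authenticate(shared_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Check the response against the key on file for the claimed entity."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


def authorized(acl: Dict[str, Set[str]], stream: str, entity: str) -> bool:
    """An entity is authorized only if it appears in the ACL for the stream."""
    return entity in acl.get(stream, set())


def grant_access(entity: str, stream: str, keys: Dict[str, bytes],
                 acl: Dict[str, Set[str]], challenge: bytes, response: bytes) -> bool:
    """Authenticate first, then verify authorization against the ACL."""
    key = keys.get(entity)
    if key is None:
        return False
    return authenticate(key, challenge, response) and authorized(acl, stream, entity)


# Hypothetical long-term keys and ACL held by the data owner's gateway
key_b, key_c = secrets.token_bytes(32), secrets.token_bytes(32)
keys = {"pgw_b": key_b, "pgw_c": key_c}
acl = {"utility_a/bus12": {"pgw_b"}}

challenge = secrets.token_bytes(16)  # fresh per attempt, to resist replay
ok_b = grant_access("pgw_b", "utility_a/bus12", keys, acl, challenge, respond(key_b, challenge))
ok_c = grant_access("pgw_c", "utility_a/bus12", keys, acl, challenge, respond(key_c, challenge))
```

Here `pgw_c` authenticates successfully but is still refused because it is not listed in the ACL, which shows why the two checks are separate.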
Integrity and Confidentiality of Measurement Data
When sending data to an authenticated and authorized entity, it is necessary to protect the data’s confidentiality and integrity. It is important to protect measurement data confidentiality from malicious eavesdroppers because such data may contain information sensitive for the market or reveal sensitive information about the grid that could be exploited to disrupt grid operation. Encryption primitives are commonly used to protect data confidentiality. Similarly, it is important to protect measurement data integrity as inadvertent or malicious modification of measurement data could lead operators or applications to make catastrophic decisions. A typical approach is to use symmetric-key-based cryptographic message authentication (or integrity) codes to detect data tampering and to ensure that only legitimate data are accepted for use. Another notion, closely related to data integrity protection, is that of data origin or source authentication, which assures a receiver that data indeed originated at the entity from which the receiver was expecting data. As the symmetric key used to compute the cryptographic message authentication code is shared only between the sender and receiver, in a two-party setting (one sender to one receiver), verification of a message authentication code assures the receiver that the data were not tampered with in transit and that the data originated at the expected sender. Thus, in a two-party setting symmetric-key-based cryptographic message authentication codes provide both data integrity protection and data origin authentication.
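A minimal sketch of the two-party integrity check, using HMAC-SHA256 from the Python standard library as one example of a symmetric-key message authentication code. The message layout (payload followed by tag) and the helper names `protect` and `verify` are assumptions for illustration; encryption for confidentiality is deliberately omitted.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def protect(key: bytes, payload: bytes) -> bytes:
    """Append a message authentication code to the payload."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, message: bytes) -> Optional[bytes]:
    """Return the payload if the tag is valid, otherwise None."""
    if len(message) < TAG_LEN:
        return None
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


key = secrets.token_bytes(32)  # shared only by the sender and the receiver
message = protect(key, b"t=1000;freq=59.98")

# Flip one bit of the payload to simulate tampering in transit.
tampered = bytes([message[0] ^ 0x01]) + message[1:]

print(verify(key, message))   # payload is returned
print(verify(key, tampered))  # None: tampering is detected
```

Because only the two parties hold `key`, a valid tag in this setting also tells the receiver that the data came from the expected sender.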
Since a data owner might share the data with multiple entities at the same time, for efficiency reasons, a WAMS should support not just unicast or two-party data sharing (that is, one sender to one receiver), but also multicast or multiparty data sharing (that is, one sender to multiple receivers). So multicast integrity and confidentiality issues must also be addressed. Whereas the cryptographic key is shared between two parties in two-party or unicast data sharing, in a multicast setting, the cryptographic keys used for encryption and message integrity protection are shared among a group of entities. Support for multicast data sharing can make data sharing efficient, as shown in Figure 2. With support for multicast data sharing, the sender only needs to encrypt the data and compute the message authentication code once using the group keys; the sender then transmits the data only once, using the underlying multicast primitive. In contrast, without support for multicast data sharing, a sender will have to encrypt the data and compute the message authentication code separately for each receiving entity, with different keys for each receiver; then the sender must transmit the data as many times as there are receivers, increasing both communication and computation costs.
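The cost difference can be made concrete with a small sketch that counts the messages a sender has to prepare in each case. The receiver names, payload, and helper functions are hypothetical, and encryption is omitted so that only the message authentication code is computed.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Tuple


def mac(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def unicast_messages(payload: bytes, pairwise_keys: Dict[str, bytes]) -> List[Tuple[str, bytes]]:
    """One MAC computation and one transmission per receiver."""
    return [(receiver, payload + mac(key, payload))
            for receiver, key in pairwise_keys.items()]


def multicast_message(payload: bytes, group_key: bytes) -> bytes:
    """One MAC computation and one transmission for the whole group."""
    return payload + mac(group_key, payload)


receivers = ["pgw_b", "pgw_c", "pgw_d"]
pairwise_keys = {r: secrets.token_bytes(32) for r in receivers}
group_key = secrets.token_bytes(32)
payload = b"t=1000;angle=12.5"

unicast = unicast_messages(payload, pairwise_keys)
multicast = [multicast_message(payload, group_key)]

print(len(unicast), "messages without multicast support")
print(len(multicast), "message with multicast support")
```

The unicast cost grows linearly with the number of receivers, while the multicast cost stays constant from the sender's point of view.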
While multicast data sharing reduces communication and computation costs, it adds additional complexity for key management and for data origin authentication. Specifically, in a multicast setting, when a symmetric-key-based message authentication code is verified as valid by the receiver, the receiver is assured that the data haven’t been tampered with by anyone outside the multicast group. The receiver cannot be sure, however, that the data originated at any particular member of the group, as the symmetric key used to compute the cryptographic message authentication code is shared among all the multicast group members and any one of the members is technically capable of generating a valid message authentication code. As a result, the receiver may have to rely on other means of data origin authentication. In a secure, well-configured, and well-monitored network, the receiver may be able to rely on the network layer to provide assurances about the origin of data packets.
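The origin ambiguity can be demonstrated directly. In this sketch, which uses hypothetical gateway names and an assumed tag format, any holder of the group key can produce a tag that verifies for a message claiming to come from another member.

```python
import hashlib
import hmac
import secrets

group_key = secrets.token_bytes(32)  # held by gateway_A, gateway_B, and all receivers


def tag(key: bytes, claimed_sender: bytes, payload: bytes) -> bytes:
    """MAC over the claimed sender identity and the payload."""
    return hmac.new(key, claimed_sender + b"|" + payload, hashlib.sha256).digest()


def verify(key: bytes, claimed_sender: bytes, payload: bytes, t: bytes) -> bool:
    return hmac.compare_digest(tag(key, claimed_sender, payload), t)


# Computed by gateway_A for its own measurement.
genuine = tag(group_key, b"gateway_A", b"freq=59.98")

# Computed by gateway_B, which also holds group_key, while claiming to be gateway_A.
forged = tag(group_key, b"gateway_A", b"freq=60.50")

# Computed by an outsider who does not hold group_key.
outsider = tag(secrets.token_bytes(32), b"gateway_A", b"freq=60.50")

genuine_ok = verify(group_key, b"gateway_A", b"freq=59.98", genuine)
forged_ok = verify(group_key, b"gateway_A", b"freq=60.50", forged)
outsider_ok = verify(group_key, b"gateway_A", b"freq=60.50", outsider)
```

Both the genuine and the forged tag verify, so the receiver learns only that the data came from inside the group. The outsider's tag fails, which is the integrity guarantee that does survive in the multicast setting.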
One straightforward way to achieve data origin authentication in a multicast setting without relying on network-layer guarantees is to use digital signatures, which use asymmetric keys (public-private key pairs), instead of symmetric-key-based message authentication codes. The data sender would digitally sign the data using a private key. When the signature is verified as valid using a public key, which corresponds to the private key and is distributed to all group members, the receivers can be sure that the data originated at that sender, as only that sender had access to the private key used to generate the signature. Unfortunately, digital signatures are expensive in terms of both computation and communication, and it is a challenge to meet real-time requirements when every measurement is digitally signed.
Schemes to amortize the signature cost over multiple measurements exist and could reduce the overhead associated with digital signatures. By definition, however, those schemes provide data source authentication for a group of measurements, and the group size must be picked carefully to reduce the costs per measurement while providing data source authentication at a meaningful granularity. Furthermore, loss of one or more measurements in the group might mean that the signature cannot be verified. To pursue this approach, it would be necessary to design mechanisms to prevent or deal with loss of measurements.
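One simple way to picture amortization is to hash a block of measurements into a single digest and sign only that digest. The sketch below shows the hashing step and the effect of a lost measurement; the signing step itself is omitted, and the block layout and the function name `block_digest` are assumptions for illustration rather than a description of any specific published scheme.

```python
import hashlib
from typing import List


def block_digest(measurements: List[bytes]) -> bytes:
    """Hash a block of measurements into one digest to be signed once.

    Each measurement is length-prefixed so that block boundaries are unambiguous.
    """
    h = hashlib.sha256()
    for m in measurements:
        h.update(len(m).to_bytes(4, "big"))
        h.update(m)
    return h.digest()


block = [b"t=1000;freq=59.98", b"t=1033;freq=59.97", b"t=1066;freq=59.99"]

# The sender would sign this digest once instead of signing three measurements.
digest_to_sign = block_digest(block)

# A receiver that got every measurement recomputes the same digest.
complete_matches = block_digest(block) == digest_to_sign

# A receiver that lost the second measurement cannot reproduce the digest,
# so the single signature cannot be checked for the remaining measurements.
lossy_matches = block_digest([block[0], block[2]]) == digest_to_sign
```

The block size sets the trade-off described above: a larger block spreads the signature cost over more measurements but coarsens the granularity of source authentication and widens the impact of a single lost measurement.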
An alternative to schemes that rely on asymmetric-key-based (or public-key-based) cryptographic primitives would be schemes that use symmetric-key-based cryptographic primitives but use time synchronization between entities to create the asymmetry necessary for data origin authentication. But such schemes often introduce a great deal of key management complexity and, like the amortized signature schemes, result in verification delays.
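A minimal sketch of this idea, in the style of delayed key disclosure schemes such as TESLA: the sender computes a MAC with a key that it reveals only in a later time interval, and a one-way hash chain lets receivers check the revealed key against an authentic commitment. The interval numbering, function names, and parameters are assumptions for illustration, and the sketch simplifies real schemes by using the chain key directly as the MAC key.

```python
import hashlib
import hmac
import secrets
from typing import List


def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()


def make_key_chain(n: int) -> List[bytes]:
    """Return [K_0, K_1, ..., K_n] where K_(i-1) = H(K_i)."""
    chain = [secrets.token_bytes(32)]  # K_n, generated first
    for _ in range(n):
        chain.append(H(chain[-1]))
    chain.reverse()
    return chain


def mac(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def key_is_authentic(commitment: bytes, key: bytes, interval: int) -> bool:
    """Hashing the disclosed key `interval` times must yield the commitment K_0."""
    for _ in range(interval):
        key = H(key)
    return hmac.compare_digest(key, commitment)


def verify_later(commitment: bytes, interval: int, payload: bytes, tag: bytes,
                 disclosed_key: bytes, arrival_interval: int,
                 disclosure_delay: int) -> bool:
    """Verify a buffered packet once its key has been disclosed."""
    # Safety condition: the packet must have arrived before its key could have
    # been disclosed; this relies on loosely synchronized clocks.
    if arrival_interval >= interval + disclosure_delay:
        return False
    return (key_is_authentic(commitment, disclosed_key, interval)
            and hmac.compare_digest(mac(disclosed_key, payload), tag))


chain = make_key_chain(5)
commitment = chain[0]  # K_0, distributed authentically ahead of time

# In interval 3 the sender authenticates a packet with K_3 and discloses K_3
# two intervals later.
payload = b"t=1000;freq=59.98"
tag = mac(chain[3], payload)

on_time = verify_later(commitment, 3, payload, tag, chain[3],
                       arrival_interval=3, disclosure_delay=2)
too_late = verify_later(commitment, 3, payload, tag, chain[3],
                        arrival_interval=5, disclosure_delay=2)
```

The receiver has to buffer each packet until the matching key is disclosed, which is the verification delay noted above.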
Reliability coordinators and other regional entities may need to make decisions based on data from a WAMS that will have economic consequences for their members. They may be held accountable for those decisions, and they might have to defend them. They may therefore need to use an approach that not only protects data integrity but also prevents the data source or sender from denying having sent the data. In other words, they need a nonrepudiation property. Digital signatures are commonly employed to provide nonrepudiation. But as mentioned earlier, digital signatures are expensive in terms of both computation and communication, and it is difficult to meet real-time requirements when every measurement is digitally signed. While signature amortization schemes could perhaps be applied here as well, it is in general harder to provide nonrepudiation via alternative schemes that use symmetric-key-based cryptographic primitives but rely on time synchronization to create asymmetry.
An important aspect of NASPI’s network security solutions will be key management: the ability to generate, distribute, revoke, and update cryptographic keying material among NASPInet entities. The cryptographic keying material might be used to provide various security properties such as entity authentication, data confidentiality or integrity protection, and nonrepudiation. Long-term cryptographic keys established between entities, either with the help of a trusted third party or using an out-of-band mechanism, are often used for secure distribution of keys for confidentiality and data integrity protection.
Key management is more complex in multicast settings than in unicast settings, as the cryptographic keys are shared among a group of entities. When the group composition changes (that is, when a member of the group leaves or a new member joins), group keys need to be updated in a timely manner. Existing group keys need to be revoked and new group keys distributed rapidly, without disrupting the real-time measurement streams. Furthermore, in a multicast setting, a sending gateway may have to maintain a group key for every data stream that it is sharing; that would not be necessary in a unicast or pairwise setting, in which a pairwise key between the sending and receiving gateways might be sufficient to protect all data shared between them.
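The bookkeeping involved can be sketched as follows: one group key per shared stream, replaced whenever membership changes. The class `GroupKeyManager` and its attributes are hypothetical, and the secure distribution of each new key to the remaining members (for example, protected under long-term pairwise keys) is not shown.

```python
import secrets
from typing import Dict, Set


class GroupKeyManager:
    """Tracks one group key per shared data stream at a sending gateway."""

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = {}
        self.keys: Dict[str, bytes] = {}
        self.epoch: Dict[str, int] = {}

    def _rekey(self, stream: str) -> None:
        """Replace the group key; the previous key is no longer accepted."""
        self.keys[stream] = secrets.token_bytes(32)
        self.epoch[stream] = self.epoch.get(stream, 0) + 1

    def join(self, stream: str, member: str) -> None:
        self.members.setdefault(stream, set()).add(member)
        self._rekey(stream)  # a new member should not read earlier traffic

    def leave(self, stream: str, member: str) -> None:
        self.members.get(stream, set()).discard(member)
        self._rekey(stream)  # a departed member should not read later traffic


manager = GroupKeyManager()
manager.join("utility_a/bus12", "pgw_b")
key_after_first_join = manager.keys["utility_a/bus12"]

manager.join("utility_a/bus12", "pgw_c")
key_after_second_join = manager.keys["utility_a/bus12"]

manager.leave("utility_a/bus12", "pgw_b")
key_after_leave = manager.keys["utility_a/bus12"]

manager.join("utility_a/bus47", "pgw_c")  # a second stream needs its own key
```

Every membership change produces a fresh key, and the number of keys held by the sending gateway grows with the number of streams rather than with the number of partner gateways.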
While multicast networks and their associated security challenges are common to several problem domains (such as audio and video conferencing, mobile ad hoc networks, and wireless sensor networks), NASPInet presents more stringent real-time requirements on data delivery. For certain control applications, latency requirements can range from ten to a few hundred milliseconds for continent-scale applications. Such real-time delivery requirements have a significant impact on security solutions. As discussed above, it is not feasible to digitally sign every data packet for data source authentication and integrity protection, as the computation and communication overhead could lead to unacceptable delays in data delivery. That further complicates multicast security solutions, especially for message source authentication and integrity protection, and could suggest a need to deploy sophisticated key management solutions that allow timely source authentication of each data packet, utilizing symmetric-key solutions that are significantly more efficient than asymmetric-key-based digital signatures.
Data and Infrastructure Availability
To ensure data availability, it is necessary to ensure the integrity and availability of the underlying computing and communication infrastructure. While a carefully thought-out fault-tolerant design will help, such a design by itself will not be sufficient. As part of a critical infrastructure, NASPInet will be an attractive target and must be resilient against cyberattacks and intrusions by adversaries ranging from novices to nation-states. There should be mechanisms in place to protect against cyberattacks, to monitor for and detect cyberattacks and intrusions, and to respond to and recover from cyberattacks and intrusions in a timely manner. Network access control (NAC) is an example of a network-layer protective mechanism that prevents anyone other than authenticated and authorized devices and entities from accessing the measurement communication infrastructure. Secure logging, along with the associated auditing or monitoring functions, is an example of a mechanism that can help with investigation and recovery from intrusions.
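As one illustration of secure logging, the sketch below chains log entries together with a hash so that later modification of an entry can be detected during auditing. The hash chain is an assumed technique chosen for illustration, the entry format and event strings are hypothetical, and the approach is tamper-evident only if the most recent hash is also stored somewhere an intruder cannot rewrite.

```python
import hashlib
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev: str, event: str) -> str:
    return hashlib.sha256((prev + event).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, str]], event: str) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify_log(log: List[Dict[str, str]]) -> bool:
    """Recompute the chain and report whether every link is intact."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log: List[Dict[str, str]] = []
append_entry(log, "pgw_b subscribed to utility_a/bus12")
append_entry(log, "pgw_c subscription refused for utility_a/bus12")
append_entry(log, "group key updated for utility_a/bus12")

intact = verify_log(log)

# Simulate an intruder rewriting the second entry after the fact.
log[1]["event"] = "pgw_c subscribed to utility_a/bus12"
after_tampering = verify_log(log)
```

Detection of this kind supports the investigation and recovery functions mentioned above; it does not by itself prevent the intrusion.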
While it is clear that data and infrastructure need to be protected, the level and kind of protection depend on each situation’s relevant threat model and risk assessment. For instance, must data be kept confidential from everyone other than the intended recipients, or is it sufficient to keep the data confidential from anyone outside the measurement network? The latter scenario might require simpler multicast security solutions than the former. Furthermore, even if data must be kept confidential from everyone besides the intended recipients, does the trust model assume that NASPInet organizations are honest, or does it assume that they are potentially malicious? The former scenario might lead to simpler security solutions than the latter. Likewise, depending on the kind of security services available from the underlying network layer or the level of trust in the underlying network layer, security solutions at higher layers such as the application layer may end up being simpler. For example, if the network layer is able to provide data origin authentication, then symmetric-key-based schemes may be sufficient to provide data confidentiality and integrity protection, thereby reducing complexity at the application layer. The requirements, threat model, and risk must therefore be carefully analyzed, as they have major implications for the security design of the system, including the policies, components, and tools needed for an appropriate solution. That said, once a security solution has been deployed, it is far easier to relax its security requirements than to make them more rigorous.
NASPInet: Current Status and Future Directions
The NASPI community is making steady progress toward achieving the vision of a continent-wide WAMS. Through its Smart Grid Investment Grant (SGIG) awards, the DOE is investing significantly in the deployment of hundreds of PMUs, along with the associated communications infrastructure, across the United States.
The realization of the vision of a continent-wide NASPInet will not be trivial, however. It faces many challenges, both technical and business-related. Potential options for creating such a network range from leveraging the public Internet to leased multiprotocol label switching (MPLS) circuits to utility-controlled fiber networks to completely isolated high-speed optical networks. Using the public Internet or other shared media poses QoS and security challenges. On the other end of the spectrum, one can provision and manage a completely isolated, private, high-speed network, but doing so could be prohibitively expensive and would still leave security and QoS issues unresolved, such as the identification of a trustworthy entity that would own or manage that network. In recognition of the challenges, many of the SGIG awardees with PMU projects are focusing on increasing PMU deployment and utilizing the data from those deployments at a regional level. The idea is to grow these regional systems into a continent-wide WAMS enabled by NASPInet in the future.
As part of the ongoing PMU data infrastructure development and deployment efforts, several key cybersecurity requirements will be addressed. In the preceding section there was a progression from basic security and functional requirements to more advanced ones, e.g., from unicast security to multicast security and from data origin authentication provided or supported by the network layer to application-level data origin authentication. Correspondingly, solutions that address the basic requirements are less expensive and better understood than those that meet more advanced requirements. Since the infrastructure is at an early stage of development, there is an opportunity to carefully consider a wide range of threats and security requirements so that security solutions can be built in from the ground up. This will let the WAMS be realized as a resilient critical infrastructure that can withstand sophisticated, targeted cyberattacks.
For Further Reading
G. Constable and B. Somerville, A Century of Innovation: Twenty Engineering Achievements That Transformed Our Lives. Washington, DC: National Academy Press, 2003.
North American SynchroPhasor Initiative. (2009, May). Data bus technical specifications for North American SynchroPhasor Initiative network. [Online].
North American SynchroPhasor Initiative. (2009, May). Phasor gateway technical specifications for North American SynchroPhasor Initiative network. [Online].
D. Novosel, V. Madani, B. Bhargava, K. Vu, and J. Cole. (2008, Jan.–Feb.). Dawn of the grid synchronization. IEEE Power Energy Mag. [Online]. 6(1), 49–60.
A. G. Phadke and R. M. de Moraes. (2008, Sept.–Oct.). The wide world of wide-area measurement. IEEE Power Energy Mag. [Online]. 6(5), 52–65.
R. Bobba, E. Heine, H. Khurana, and T. Yardley. (2010, Jan. 19–21). Exploring a tiered architecture for NASPInet. Presented at the Innovative Smart Grid Technologies Conf. (ISGT) [Online]. pp. 1–8.
D. E. Bakken, A. Bose, C. H. Hauser, D. E. Whitehead, and G. C. Zweigle. (2011, June). Smart generation and transmission with coherent, real-time data. Proc. IEEE [Online]. 99(6), 928–951.
Rakesh B. Bobba is with the University of Illinois at Urbana-Champaign.
Jeff Dagle is with the Pacific Northwest National Laboratory.
Erich Heine is with the University of Illinois at Urbana-Champaign.
Himanshu Khurana is with Honeywell Automation and Control Systems Labs.
William H. Sanders is with the University of Illinois at Urbana-Champaign.
Peter Sauer is with the University of Illinois at Urbana-Champaign.
Tim Yardley is with the University of Illinois at Urbana-Champaign.