This article has been adapted from a paper accepted and presented at CyberCon23, titled ‘Safety versus Security: What’s the difference and why it matters’, authored by Samuel Clements, Mike StJohn-Green and myself.
1. Abstract
In the “Safety-First” culture of the nuclear industry, the nuanced differences between security and safety can be lost. The use of powerful software-programmable digital technology provides the means to transform instrumentation and control systems, but also provides unparalleled opportunity for malicious action by criminals and others.
Computer security is therefore increasingly vital to support safety, and safety analysis is important to inform computer security. Superficially, the concepts are so similar that the words are often used interchangeably. This is especially true in languages such as Spanish and Russian, where a single word (seguridad, безопасность) denotes both safety and security. However, below the surface, the two disciplines are based on fundamentally different assumptions, so incompatibilities exist. These differences can lead to serious deficiencies in the response of safety systems to failures and of security systems to malicious attack.
In this paper the authors provide examples of these deficiencies, highlight important differences between the two concepts, demonstrate why they matter, and provide the reader with recommendations for how to address this issue.
2. Differences in concepts
Lack of precision in language leads to misunderstanding. This fallibility is especially true with “safety” and “security”, which are related but distinct concepts with important differences. This section provides examples of possible misunderstandings and their potential consequences.

Seguridad vs Seguridad

“Safety” and “security” are essential concepts of a nuclear programme, for which much thought and effort are invested in securing nuclear sites and assuring the safety of nuclear processes. Unfortunately, that lack of precision in language leads to many misunderstandings, as illustrated in the following story.
A presentation about nuclear security was scheduled to be delivered to senior leaders of nuclear facilities. The presentation’s intent was to inform new leadership of the importance and impact of nuclear security. The leaders had backgrounds in nuclear engineering and safety but were not well versed in the various aspects of a nuclear security programme. The presentation was developed in English and translated into Spanish, the native language of the intended recipients. Thankfully, during the dry run leading up to the presentation, someone noticed context that had been lost in translation: Spanish has only one word for both safety and security, so additional adjectives or qualifiers must be used to distinguish between them. Lacking this context, the whole purpose of the presentation might have been undermined and intended meanings lost. Unfortunately, Spanish is not alone in the linguistic trap of using a single word for both safety and security. Portuguese (segurança), Russian (безопасность), Swedish (säkerhet), and Danish (sikkerhed) all use a single word to describe both.
A decision-maker might conclude that, because the word is the same, the single word covers all the activities required for safety and security. This could lead to vital activities not happening as needed, such as new security funding lines not being approved because “Seguridad is already funded” elsewhere. Similarly, decision-makers might conclude that, because “seguridad” is in someone’s job title, the entire range of safety and security tasks has been adequately assigned. However, even with two different words, as in English, there is further potential for misunderstanding, as noted by Piètre-Cambacédès: “Dozens of explicit, but distinct, definitions can be found ranging from slightly different to completely incompatible definitions” 1.
Security and safety are not two perfect mirror images that fit together.
In some organisations, the differences between safety and security are well accepted and different teams are tasked with each. It has been elegantly stated, “Safety: the system must not harm the world. Security: the world must not harm the system” 2. These two definitions may appear to be mirror images of one another, suggesting that current safety and current security activities are perfectly complementary; that together the two disciplines address all relevant risks to “the system” and from “the system”; and that the two disciplines will not conflict. However, none of this is necessarily true unless active coordination, collaboration, and change occur.
Safety is defined as reducing the risks of harm arising from dangerous situations, often referred to as “hazards”, to an acceptable level 3. In general, safety analysis relies on modelling the failure of the components of the system to provide an acceptable level of confidence in the calculation of the residual safety risks.
Within the nuclear sector, Probabilistic Safety Analysis (PSA) and Deterministic Safety Analysis (DSA) are methods used to evaluate the safety of a system or process. PSA uses statistical methods and models to estimate the likelihood of potential accidents and their consequences, considering the nature of known failures and human errors, as well as uncertainties in the behaviour of the system.
DSA, on the other hand, uses deterministic methods and models to analyse the behaviour of the system under normal and abnormal conditions. It involves determining the sequences of events that can lead to an accident and the corrective actions that may be undertaken to prevent such an event. The results of a DSA are usually expressed in terms of failure scenarios and the likelihood of their occurrence. However, it is vital to note that malicious action is typically out of scope of such analysis.
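For illustration, the minimal sketch below shows the kind of calculation that underpins a PSA fault tree: basic-event probabilities, assumed independent, are combined through AND/OR gates. This is a simplified illustration rather than any standard PSA tool, and all probabilities are hypothetical.

```python
# Minimal fault-tree sketch: combining independent basic-event
# probabilities through AND/OR gates. All values are hypothetical.

def and_gate(*probs: float) -> float:
    """All inputs must fail; independence assumed, so multiply."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(*probs: float) -> float:
    """Any input failing suffices: complement of none failing."""
    none_fail = 1.0
    for p in probs:
        none_fail *= (1.0 - p)
    return 1.0 - none_fail

# Hypothetical per-demand failure probabilities.
P_PUMP_A = 1e-3   # primary cooling pump fails to start
P_PUMP_B = 1e-3   # redundant pump fails to start
P_VALVE  = 1e-4   # relief valve stuck closed

# Loss of cooling requires both pumps to fail; the top event occurs
# on loss of cooling OR a stuck relief valve.
p_loss_of_cooling = and_gate(P_PUMP_A, P_PUMP_B)   # 1e-6
p_top_event = or_gate(p_loss_of_cooling, P_VALVE)  # ~1.01e-4

print(f"Top-event probability: {p_top_event:.2e}")
```

Note how the independence assumption is built into the AND gate; it is precisely this assumption that malicious action invalidates, as discussed below.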
Security is defined as reducing the risks arising from malicious action, stemming ultimately from human motivation and behaviour, which is intrinsically difficult to model and calibrate with any confidence. Any calculation of likelihood or probability related to malicious human behaviour is particularly difficult to justify 4 5. Risks arising from malicious action cannot in general be eliminated, only reduced, leaving an irreducible but hard-to-calibrate residual risk that is relevant to the safety analysis. Further, the malicious actor is very likely to change the system as part of any attack, and those changes will affect the safety analysis. This became evident with the exposure of the Log4j vulnerability 6: the affected library is used in the logging functions of safety- and security-related systems.
This argument demonstrates that analyses of safety and security can make potentially incompatible assumptions and approximations. The general basis of the safety analysis is that the system under consideration is fully defined by the design and not subject to uncontrolled change, while security analysis generally makes the assumption that the adversary will change the system and that security measures cannot entirely prevent this, leaving an uncalibrated risk of change to the system.
Single or multiple simultaneous failures
As introduced above, in both methods of safety analysis (PSA and DSA), the initiating events for the accidents under consideration typically focus on accidental events, such as equipment failure, natural disasters, or human error. Such events are typically analysed on an individual system basis or are bounded by common groupings of systems, structures, and components that fall under well understood criteria (e.g., systems susceptible to seismic risk).
In contrast, a security analysis will typically focus on intentional events, such as malicious acts undertaken by an intelligent human adversary. Such a malicious adversary will be able to intelligently target systems, structures, and components to cause effects considered improbable (e.g., all redundant pumps fail at a coordinated time) or not considered within the design of the system (e.g., an actuation system, after experiencing a malicious modification to its setpoints, performs an action that propagates rather than contains the accident condition).
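The arithmetic behind this contrast is simple, as the hypothetical sketch below shows (all numbers are illustrative): under the independence assumption, simultaneous failure of redundant pumps is vanishingly unlikely, whereas a single compromise of a shared control system fails them all at once.

```python
# Redundancy under random failure versus coordinated attack.
# All numbers are hypothetical and for illustration only.

p_pump = 1e-3    # independent random failure of one pump, per demand
n_pumps = 3      # three redundant pumps

# Safety view (independence assumed): all pumps fail together only
# by coincidence, so redundancy drives the probability down sharply.
p_all_fail_random = p_pump ** n_pumps        # 1e-9

# Security view: one successful compromise of the shared control
# network can stop every pump at the same moment, so the failures
# are perfectly correlated and redundancy buys nothing.
p_compromise = 1e-2                          # hypothetical, hard to calibrate
p_all_fail_attack = p_compromise             # not p_compromise ** n_pumps

print(f"random coincidence: {p_all_fail_random:.1e}")
print(f"coordinated attack: {p_all_fail_attack:.1e}")
```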
These examples highlight security events that have the potential to trigger systemic risks, where a localised failure has the possibility to spread and cause a widespread and significant impact on the safety and security of the larger system. Such a systemic risk potentially leads to the failure of multiple interconnected systems. Such risks must be considered and addressed within the context of the overall facility and its operating environment. Today’s safety and security analysis may be deficient in identifying such multi-domain risks.
3. Things that are often said but are not general truths
This section examines some statements that the authors have heard from well-intentioned engineers, who shall remain anonymous. These statements help reveal some of the prevailing misunderstandings across many industries, including the nuclear sector.
My system is designed to be fail-safe so additional security is not necessary.
When discussing nuclear security, you may hear “My system is designed to be fail-safe so additional security is not necessary” or “My system is designed for safety, so all this security is not necessary.” While it is true that nuclear systems are designed to fail safe, and those designs do provide some security attributes, they are not enough. A variety of methodologies exist to evaluate system safety; Failure Mode and Effects Analysis and Fault Tree Analysis are two of many. Generally, these methodologies identify random failures in processes or components that will cause the system to fail. Safety instrumented systems (SISs) are designed and integrated into nuclear systems to detect when the system is moving into a dangerous state. When unsafe conditions are detected, the SIS actuates to return the system to a safe state. However, an adversary will seek to cause single or multiple targeted simultaneous failures so that the system performs its function(s) improperly. These standard methodologies are therefore incomplete in this regard.
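As an illustration, consider 2-out-of-3 (2oo3) voting, a pattern commonly used in safety instrumented systems. The simplified sketch below (hypothetical setpoint and readings) tolerates a random single-channel fault, yet is defeated by an adversary who can falsify two channels at once.

```python
# Simplified 2-out-of-3 (2oo3) voting trip, a pattern used in many
# safety instrumented systems. Setpoint and readings are hypothetical.

TRIP_SETPOINT = 350.0  # illustrative process limit, e.g. degrees C

def trip_2oo3(ch_a: float, ch_b: float, ch_c: float) -> bool:
    """Trip (drive to the safe state) when >= 2 of 3 channels exceed the setpoint."""
    votes = sum(reading > TRIP_SETPOINT for reading in (ch_a, ch_b, ch_c))
    return votes >= 2

# Random single-channel failure: one sensor stuck at zero is
# out-voted, and the SIS still trips on a genuine excursion.
print(trip_2oo3(360.0, 358.0, 0.0))  # True  -> safe state reached

# Coordinated manipulation: an adversary who falsifies two channels
# suppresses the trip during the same genuine excursion.
print(trip_2oo3(360.0, 0.0, 0.0))    # False -> unsafe state undetected
```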
SISs themselves can be the target of the attack to circumvent a layer of safety protection, as illustrated by the Trisis malware 7. As explained above, the security protection for a system and its function(s) can never be absolute; there will always be residual risk, in this case affecting the corresponding safety analysis. Thus, security needs to be an integral part of assuring the safety of a system.
OK, so I need security measures, but they can be added afterwards.
Once it is accepted that security measures are required, it is sometimes claimed that they can be added to the system afterwards, in such a way that they protect the system from unauthorised change while themselves making no changes to it. The assumption that any safety assurance certification would remain valid, regardless of which security controls were applied to protect the system, is highly attractive. Given that attraction, it is worth exploring in more detail why this cannot, in general, be achieved.
First, as already explained, security controls cannot be demonstrated to provide perfect protection because the activities of the adversary cannot be perfectly modelled. Further, with software-based digital technology, there may be vulnerabilities that have not yet been discovered by the designer and operator of the system, but they may have been actively researched and discovered by the adversary—a so-called “Zero Day Vulnerability”.
Additionally, for systems of any complexity, especially those using software-based digital technology, security measures will change the system itself. The simplest example is any change to remedy a vulnerability, such as in an operating system, with a software patch. Similarly, a new method of attack may require a change to the system in order to defeat the attack scenario. In general, to achieve true defence-in-depth with security, the system itself will have to be changed, e.g., to harden it.
OK, but as a designer, I can still pass this off to the security team.
The belief that security and safety can be independently designed and executed is a flawed approach that results in suboptimal outcomes. This viewpoint, often referred to as “stove-piped thinking”, results in a narrow and incomplete analysis of both security and safety needs. Safety and security each bring vital and distinctive strengths, but each also has limitations in its current approach.
In the security discipline, it is not uncommon for the focus to be solely on the security of information (e.g., with the pre-eminence of ISO-27000 8), thereby neglecting the important role that safety plays in the definition and attainment of security objectives. This can lead to an incomplete assessment of the functions that must be protected by security controls. For example, when using this incomplete approach, one may conclude that only physical protection measures are relevant for the protection of safety-critical functions, systems, structures, and components. This could leave critical safety functions and supporting systems more vulnerable to attack, while other less-consequential functions and systems are afforded a greater share of resources. Such incomplete analysis could lead to a misapplication of the graded approach 9 under which resources should be applied in proportion to the severity of the adverse consequences to the function.
On the other hand, safety analysis often lacks adequate consideration of security assumptions. For example, the scenarios may not feature multiple, coordinated failures caused by malicious actors who have a high-level knowledge of the operation and vulnerabilities of multiple systems in the facility. A narrow focus on safety can result in a failure to account for the effects of human adversaries, leaving the system vulnerable to attack.
Security assessments should be scheduled on the calendar, with safety reviews.
Some want to accommodate security assessments within the existing safety lifecycle and timetable, asserting that security assessments are only necessary periodically. This approach fails to fully recognise the dynamic and constantly evolving nature of security risks.
Security is a response to the external environment, which is subject to constant change and intelligent, often malicious, human influences. A security review should be triggered by material changes to the underlying assumptions and changes to the risk landscape, e.g., new threat intelligence or new technologies. These changes often occur far beyond the boundaries of nuclear facilities. For example, a novel attack on other industries should be analysed for lessons to be acted upon. The Triton/Hatman/Trisis attack mentioned earlier is one such example that is relevant to most facilities, even if different hardware and software are used. The continuous re-assessment of the security posture requires an ongoing organisational commitment to maintaining awareness of the external environment and its implications for the security of the system.
In contrast, the safety assessment is often viewed as a periodic activity, scheduled as part of the safety lifecycle. While this approach may provide a baseline level of safety, it fails to account for the ever-changing security environment and its potential impact on safety if a failure can be exploited to propagate an incident.
A comprehensive approach to security and safety therefore requires a holistic understanding of both domains, one that acknowledges the interdependence of security and safety needs and informs the definition of security objectives. Security and safety are not separate, independent activities, but interrelated and interdependent. Failing to adopt this approach may leave deficiencies in the results and a heightened susceptibility to systemic risk.
Perfect bricks make a perfect house.
A faulty line of reasoning sometimes seen in both the safety and security communities is that components that have been tested and assured to a given standard will somehow convey that same standard to the parent system. This can be summarised as “Perfect bricks make a perfect house”. It is self-evidently not true: a bad craftsman can make a very poor building from perfect bricks; similarly, perfect bricks do not compensate for a poor architecture or hostile environmental factors.
Translating this to safety and security, we can reasonably conclude that a perfectly manufactured SIS or network firewall (even if such objects were feasible with software-based digital technology) cannot compensate for errors in their implementation and operation. This observation is especially relevant to software-based digital technology because the flexibility that makes the technology so valuable also provides multiple ways to make mistakes in its implementation.
Consequently, except for the simplest system, the safety or security functionality against which the components are assured is not guaranteed to deliver the desired safety or security objectives of the system as a whole. The same argument applies to systems that are combined to make a system-of-systems, which is typical of most facilities.
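A toy example illustrates the point. In the hypothetical sketch below, each component is correct against its own specification, yet the assembled system fails because the integration mixes units.

```python
# Toy composition failure: each component is correct against its own
# specification, yet the assembled system is unsafe because the
# integration mixes units. Values and functions are hypothetical.

def read_pressure_psi() -> float:
    """Assured component: reports vessel pressure in psi."""
    return 100.0  # 100 psi is roughly 689 kPa

def overpressure_trip(pressure_kpa: float) -> bool:
    """Assured component: trips above a 500 kPa limit."""
    return pressure_kpa > 500.0

# Integration error: a psi value is passed where kPa is expected.
# 100 psi (~689 kPa) should trip, but the check sees 100 < 500.
print(overpressure_trip(read_pressure_psi()))  # False -> no trip
```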
4. Challenges and recommendations
In this final section, the authors offer some practical recommendations for how to avoid the misunderstandings and pitfalls described above.
Ensure decision-makers understand that safety and security are not the same thing.
Extra effort must be applied to clearly define terms, especially in languages in which safety and security share a single word. Use of more specific terms, e.g., physical protection, radiological protection, or data confidentiality, will help. When communicating about safety or security, confirm that the reader has understood and is not interpreting the content from a different frame of reference. Providing examples also helps clarify and illustrate the differences between the concepts. For example, a security information and event monitoring system is a security measure, but a backup power supply for emergency pumps is a safety measure.
Be wary of calls for safety to absorb security.
Some say that security is part of safety because, when viewed from an enormous distance, they appear to be doing the same thing—protecting the world from undesirable outcomes. However, taking this approach can introduce peril if either safety or security loses its current strengths and distinctive aspects. Key skills in security work include the ability to think like an adversary and to maintain a close eye on current computer security research and hacking trends across the globe. Security must be able to respond to changes in the external environment, which may require modifying the system within days, hours, or even seconds in the event of an attack in progress.
The concepts and best practices from each discipline can inform the other. For example, security behaviours would improve if the safety-first culture also applied to security. Reciprocally, safety might find that the maturity-based approach favoured by some security standards 10 provides a route to dealing better with systemic risk. The recommendation is therefore to make sure both are funded and staffed: they are both parts of a greater whole, which some call the One House approach.
Adopt a One House systems-engineering approach.
A systems-engineering approach to designing security and safety together recognises the interdependence of the two. It ensures that both are effectively integrated into the design of the system and that the integrated approach continues throughout the entire system lifecycle. Such an approach resolves cases where security and safety requirements are in tension and recognises when they can be implemented in a complementary way 11 12.
Security and safety should be considered from the outset, starting with the functional design, to ensure that the functions of the facility will be performed in a safe and secure manner. As a result, safety and security controls are integrated into the system’s architecture, processes, and components. This approach considers the potential for security risks, the consequences of security failures, and the need for safety-critical functions and their supporting systems to be protected from attack within a graded approach.
A systems-engineering approach can accommodate the early integration of safety and security requirements into the design and maintain that integration throughout the operation of the system. The close and collaborative work by systems engineers, safety experts, and security experts has been called the One House approach. It can address the risks identified in this paper that might otherwise be unrecognised and neglected.
Beware of false confidence.
The engineering world is at varying stages in understanding and implementing this systems-engineering approach, and emerging guidance on nuclear safety and security is being developed by the International Electrotechnical Commission. Some are still unaware of the need for such an approach, while others may see its value but be unaware of the importance of integrating security and safety into the design process. Other engineers may lack the skills and knowledge to integrate security and safety effectively, having no suitable support, while still others are consciously competent, recognising the importance of this approach and working to implement it. Much of this can be addressed by providing further guidance, updating education paradigms, and offering targeted training opportunities.
It is important to note that this systems-engineering approach is not easily applied retroactively. Once a system has been designed and deployed, incorporating security and safety into the design becomes much more difficult.
Final points – what should you do about this?
It depends on who you are, but you could ask yourself questions to prompt informed action. For example, does your staff understand the difference between safety and security? Are the terms clearly defined and built into policy, procedures, and training programmes? Does your documentation clearly differentiate between the two terms, especially with qualifiers and examples for languages for which only one word is used for both terms? Do your risk assessments and other analyses include both safety and security?
Finally, every reader should at least think about how security and safety work with each other within their own organisations and relative to their own responsibilities. If the answer is “perfectly”, the authors respectfully suggest that you may not have thought about this question long enough.
5. References
1. The SEMA referential framework: Avoiding ambiguities in the terms “security” and “safety”, Ludovic Piètre-Cambacédès and Claude Chaudet, International Journal of Critical Infrastructure Protection.
2. Engineering Safe and Secure Software Systems, C. W. Axelrod (foreword by Barry Boehm), Artech House, Massachusetts, 2013, p. 61.
3. ALARP (“as low as reasonably practicable”), United Kingdom Health and Safety Executive, https://www.hse.gov.uk/managing/theory/alarpglance.htm
4. Why process safety risk and cyber security risk differ, Sinclair Koelemij, 23 July 2021, https://otcybersecurity.blog/2021/07/23/why-process-safety-risk-and-cyber-security-risk-differ/
5. ICS cyber security risk criteria, Sinclair Koelemij, 13 August 2021, https://otcybersecurity.blog/2021/08/13/ics-cyber-security-risk-criteria/
6. Review of the December 2021 Log4j Event, US Department of Homeland Security, https://www.cisa.gov/sites/default/files/publications/CSRB-Report-on-Log4-July-11-2022_508.pdf
7. TRISIS Malware: Analysis of Safety System Targeted Malware, Dragos, https://www.dragos.com/wp-content/uploads/TRISIS-01.pdf
8. ISO 27000 – ISO 27001 and ISO 27002 Standards, 27000.org, 2019, https://www.27000.org
9. Nuclear Security Recommendations on Physical Protection of Nuclear Material and Nuclear Facilities (INFCIRC/225/Revision 5), IAEA Nuclear Security Series No. 13, IAEA, Vienna, 2011.
10. Cybersecurity Capability Maturity Model Version 2.1, US Department of Energy, https://c2m2.doe.gov/
11. “A rule-based approach for safety analysis using STAMP/STPA”, D. L. Gurgel, C. M. Hirata and J. De M. Bezerra, 2015 IEEE/AIAA 34th Digital Avionics Systems Conference (DASC), Prague, Czech Republic, 2015, pp. 7B2-1–7B2-8, doi: 10.1109/DASC.2015.7311464.
12. “Cyber-Informed Engineering: Managing Cyber Risk from Concept to Operation”, Idaho National Laboratory, https://inl.gov/cie/