Is Complexity the Enemy of Security?
Following the last post on leverage points in managing complex systems, I thought it would be good to revisit and update a post from a few years ago on the seemingly accepted wisdom that complexity is the enemy of security.
This remains an interesting question because the consensus answer would seem to be yes. To a degree I would agree, at least in the sense that you certainly should not deliberately introduce complexity in the name of security.
The problem with casually throwing around the phrase “complexity is the enemy of security” is that it sets up the naive answer that we should fix all the woes of security by eliminating complexity. But this cannot be the answer, for the basic reason that all systems are necessarily complex to be useful. It is just a question of where the complexity is and at what level of abstraction. Tesler’s so-called Law of Conservation of Complexity sums this up nicely. It was directed specifically at UX matters, but I think it applies elsewhere.
In reasoning about complexity and security we also have to deal with the fact that security is not something you can design into complex systems. Rather, security is an emergent property of a complex system. You don’t do security. You do things to get security. Hence our prior discussion of the leverage points for doing this.
Even a system we might describe as simple can still be quite hard to secure. Simple systems can be just as confusing and hard to manage if not designed well. This becomes more of an issue with scale since, unless we are extraordinarily careful, issues in complex systems grow as (Simple)^N rather than (Simple) x N.
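To make the contrast between (Simple)^N and (Simple) x N concrete, here is a minimal Python sketch. It assumes, purely for illustration, that every component has the same small number of states, and that components are either fully isolated (their states add up) or free to interact (their states multiply). The function names and the figure of 4 states per component are hypothetical.

```python
def isolated_states(states_per_component: int, n: int) -> int:
    """States to reason about when components cannot affect each other."""
    return states_per_component * n


def interacting_states(states_per_component: int, n: int) -> int:
    """Combined states when any component's state can matter to any other."""
    return states_per_component ** n


# Compare linear and exponential growth for a hypothetical 4-state component.
for n in (1, 5, 10, 20):
    print(n, isolated_states(4, n), interacting_states(4, n))
```

With ten such components, the isolated case has 40 states to reason about while the interacting case has 1,048,576, which is why the boundaries between components matter more than the simplicity of each one.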
We are being lazy if, in the face of security issues, we blame the complexity of the system, environment, or ecosystem. In my experience, when you apply some rigor and intellectual honesty you often find that the issue is not the complexity itself but rather faulty design that has failed to deal with that, often inherent, complexity.
So, complexity is not the enemy of security. Bad design is.
Now, what are some examples of good design principles? Here are some of the ones I think are most useful. The list is nowhere near exhaustive, but it at least represents a starting toolkit in addition to the leverage points:
1. Levels of Abstraction
Abstract complexity away by hiding it behind APIs and other interfaces. Then think about your security policy objectives and see if they require you to consider and enforce controls at multiple levels of abstraction. If they do, then the abstraction is not designed well enough. For example, network or application segmentation is a vital security principle, but it is hard work because the tools most often used to achieve it work at one level of abstraction while the policy models and operational reality (making it work without breaking your business) exist at other levels. For example, look at this highly stylized map of the layers of the financial sector, think about a business segmentation objective, and then think about where and how you would actually implement it technically.
2. Linked Behaviors
Establish linking conditions across elements of the system or across layers of abstraction. Setting one control objective shouldn't imply a need to set additional controls in multiple places, whether through other elements of the control plane or through additional human toil. For example, if your goal is to have one protected entry point into a system, then the control plane should universally and automatically enforce that goal across all elements of the stack, whether ingress controllers, load balancers, perimeter front-end gateways, service meshes, or distributed firewalls. This is hard, very hard, and I don’t think any environment does this well (or well enough for claims of effectiveness to be fully substantiated). However, some cloud providers are getting progressively closer to this, and some so-called serverless models are now pretty much there.
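As a rough illustration of linked behaviors, the sketch below states one objective once and derives the rule for every layer from it, so that no layer has to be configured by hand. Everything here is hypothetical: the `EntryPointObjective` type, the layer names, the rule format, and the service names are assumptions made for the example, not any real control plane's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryPointObjective:
    """A single control objective: one sanctioned way into a service."""
    service: str
    allowed_entry: str


# Hypothetical layers of the stack that must all honor the objective.
LAYERS = (
    "ingress_controller",
    "load_balancer",
    "perimeter_gateway",
    "service_mesh",
    "distributed_firewall",
)


def derive_layer_rules(objective: EntryPointObjective) -> dict[str, dict[str, str]]:
    """Derive one rule per layer from the single stated objective."""
    return {
        layer: {
            "service": objective.service,
            "allow_from": objective.allowed_entry,
            "default": "deny",
        }
        for layer in LAYERS
    }


rules = derive_layer_rules(EntryPointObjective("payments", "edge-gateway"))
for layer, rule in rules.items():
    print(layer, rule)
```

The design point is that the objective is the only thing a human edits. Adding a new layer means extending `LAYERS`, not remembering to set another control somewhere else.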
3. Opinionated Defaults - Shipping with Full Safeties On
Improving the handling of system-wide control objectives to cause better security properties to emerge through linked behaviors can be made easier by the comprehensive setting of control defaults. This could be everything from specific control measures, like having encryption on everywhere by default, through to ensuring systems and components are closed when instantiated and then have to be deliberately loosened as needed. This is where the canon of wider security principles can be applied: least privilege, fail closed, default deny, protocol least privilege / allow-listing, and so on.
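A minimal sketch of what shipping with full safeties on might look like in code: a configuration object whose defaults are already closed, so that any loosening has to be an explicit act. The `StorageBucketConfig` type, its fields, and the `can_read` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StorageBucketConfig:
    """Hypothetical resource config whose defaults are the safe choices."""
    name: str
    encrypt_at_rest: bool = True                   # encryption on by default
    public_access: bool = False                    # closed when instantiated
    allowed_principals: frozenset = frozenset()    # default deny: nobody listed


def can_read(config: StorageBucketConfig, principal: str) -> bool:
    """Allow-list check: access is granted only if explicitly listed."""
    return config.public_access or principal in config.allowed_principals


bucket = StorageBucketConfig("audit-logs")
print(can_read(bucket, "alice"))   # nobody has access until it is granted

# Loosening is a deliberate, visible change rather than the starting state.
opened = replace(bucket, allowed_principals=frozenset({"alice"}))
print(can_read(opened, "alice"), can_read(opened, "bob"))
```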
4. Declarative Configuration
Define the configuration of the system in a declarative manner so that the actual configuration of the system can be generated and continuously compared to the specification. Treat policy / controls as a lifecycle-managed part of the configuration. If you have controls as code then you need to inspect the ability of that control configuration to match intended policy, not just inspect the instantiated environment that results from the declarative configuration. Cloud or on-premises cloud-like modern IT environments have this at their core and as a result deal with complexity much better.
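The continuous comparison of actual configuration against the specification can be sketched as a simple drift check. This is an illustration under stated assumptions: configurations are modeled as flat dictionaries, and the `find_drift` function, the `MISSING` marker, and the setting names are all hypothetical.

```python
MISSING = "<missing>"  # hypothetical marker for a key absent on one side


def find_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key in desired.keys() | actual.keys():
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


desired = {"tls": "required", "public_access": False, "log_retention_days": 365}
actual = {"tls": "required", "public_access": True, "debug_port": 9000}

for key, (want, have) in sorted(find_drift(desired, actual).items()):
    print(f"{key}: desired={want!r} actual={have!r}")
```

Note that the check flags settings that exist in the environment but not in the specification (here `debug_port`), which is often where unreviewed exposure creeps in.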
5. Idempotence
A by-product of a declarative approach to configuration, although by no means guaranteed, is the property of idempotence. In other words, if you push something to happen, such as enforcing a particular control, then no matter how many times you push, it won’t have any effect other than sustaining the control. This is an important design property for applying declarative configurations in complex environments, as you simply assert the outcome and let the control plane worry about state. But designing controls to support idempotence can be challenging, especially in legacy environments.
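To illustrate idempotence, the sketch below contrasts an imperative "add this rule" operation with a declarative "ensure this rule exists" operation. The rule list, the rule string, and both function names are hypothetical and stand in for whatever a real control plane would manage.

```python
def append_rule(rules: list[str], rule: str) -> None:
    """Imperative and NOT idempotent: every push adds another copy."""
    rules.append(rule)


def ensure_rule(rules: list[str], rule: str) -> bool:
    """Idempotent: assert the outcome; return True only if a change was made."""
    if rule in rules:
        return False
    rules.append(rule)
    return True


imperative: list[str] = []
declarative: list[str] = []
for _ in range(3):
    append_rule(imperative, "deny all inbound")
    ensure_rule(declarative, "deny all inbound")

print(len(imperative), len(declarative))  # 3 1
```

Because `ensure_rule` can be pushed any number of times safely, a control plane can simply reassert it on a schedule rather than tracking whether it has already run.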
6. Visualization and Great UX
People think visually. No matter how comfortable we become that our goals have been expressed in the configuration specification, we can still find flaws in that specification by representing it diagrammatically. Overlay visual design cues to highlight where there might be trouble, for example single points of failure or compositions of services that fail to achieve an overall SLO. The tooling we provide engineers and end users should be designed to be intuitive and to provide immediate feedback as to whether the intent of an action was in fact performed.
8. Observability and Feedback Loops
Observability of the behavior of the system overall is critical, but it is only useful if the data from observation is put into feedback loops - either positive or negative - to correctly amplify or dampen behaviors. To reiterate what we started with, security is an emergent property of a complex system. One of the ways to drive that emergence is by taking action as a result of feedback loops along with other leverage points.
9. Reduce Error Messages and Guidance
Work to eliminate error messages, user guides, and other configuration or set-up guidance for systems. Yes, this is an extreme statement, but it is one to aim for anyway. Instead of creating better error messages, work to reduce the scope of errors that need messaging. Similarly, if your user and configuration guides keep getting bigger and more numerous then ask yourself: are you making the right design choices on defaults, linked behaviors, and levels of abstraction?
10. System-wide Invariants : People, Process and Technology
As with defaults and other good elements of design, it is valuable to set system-wide invariants. The properties you want to hold true need supporting processes to enforce them. For example, if you want no single points of failure then build the processes to find them, the feedback loops to eliminate them, and the design review practices to discourage the design patterns that lead to them. Additionally, work hard to not experience broken processes.
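The "find them" part of the single-point-of-failure example can be sketched as an automated invariant check whose output feeds the elimination loop. The inventory format, the service names, and the two-replica threshold are assumptions made for illustration only.

```python
def single_points_of_failure(replicas: dict[str, int], minimum: int = 2) -> list[str]:
    """Return services that violate the 'no single point of failure' invariant."""
    return sorted(name for name, count in replicas.items() if count < minimum)


# Hypothetical inventory mapping each service to its running replica count.
inventory = {"auth": 3, "payments": 2, "ledger": 1, "notifications": 1}

violations = single_points_of_failure(inventory)
print(violations)  # ['ledger', 'notifications']
```

Running a check like this continuously, and in design review against proposed changes, turns the invariant from a stated intention into something the process actually enforces.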
11. Desire Lines : Principle of Least Action
One unifying theme here, borrowed from physics, is that where we see return on investment, our work aligns with the principle of least action. If our controls and approach can align with the natural "happy path" for our employees and our customers then that will likely provide the maximum returns over time. Another concept is desire lines: if you consistently see an actual or attempted behavior (by users or engineers) then you should call that out and make it a secure path. This is like the, perhaps apocryphal, stories of only paving public walkways after the patterns of walking (and wearing away grass) have been observed in practice.
12. Chaos Engineering
Creating random failures in a complex environment teaches engineers and other parts of the system to build for assumed failure. In an environment of large enough scale you likely don’t need to introduce such failures because they’ll happen anyway. The best design is not to try to avoid them (although you should not let per-unit failure exceed efficiency expectations) but, rather, to make resilience a system goal, not a component goal. Kelly Shortridge and Aaron Rinehart have written an amazing book that casts chaos engineering into security.
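A toy simulation can show why resilience belongs at the system level rather than the component level. This is only a sketch: it assumes replicas fail independently at a fixed injected rate, which real systems rarely do, and the function names, failure rate, and trial count are all hypothetical.

```python
import random


def serve_request(replica_count: int, failure_rate: float, rng: random.Random) -> bool:
    """A request succeeds if at least one replica survives the injected failure."""
    return any(rng.random() >= failure_rate for _ in range(replica_count))


def success_rate(replica_count: int, failure_rate: float,
                 trials: int = 10_000, seed: int = 7) -> float:
    """Fraction of requests served while failures are injected at random."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    served = sum(serve_request(replica_count, failure_rate, rng) for _ in range(trials))
    return served / trials


# Same unreliable component, different system design.
print("1 replica :", success_rate(1, failure_rate=0.3))
print("3 replicas:", success_rate(3, failure_rate=0.3))
```

The component is equally unreliable in both runs. Only the system design changes, and that is where the improvement in served requests comes from.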
13. 5Ys and Blame-free Post-mortems
The final aspect of good design is the ultimate feedback loop of looking for the root cause of the root cause (the system cause) when an event or close call has occurred. Follow the 5Ys approach: asking "why?" until you can get no further is crucial. Doing this well needs the psychological safety of blame-free post-mortems. It is also important to remember that human error is not an explanation but rather something to be explained. Most of the time when I've seen claims of human error as an incident cause I have, when digging deeper, been amazed at how well the humans have actually been performing in the face of bad design, and at how their performance has kept the number of incidents down despite it.
Bottom line: Complexity is not the enemy of security. Bad design is. All useful things are complex at some level of abstraction. To wish the complexity away is to fail to apply good design to deal with it. We need a better body of knowledge of good design principles, not more policies and control objectives. Other engineering disciplines are much better at this.