Is Complexity the Enemy of Security?
One of the many pieces of accepted wisdom in information/cybersecurity is that complexity is the enemy of security. But is it? You certainly should not go about deliberately trying to introduce complexity in the name of security, although I’m sure we all have our catalog of recommendations and products that seem to take that approach. Even if we do assert that complexity is the enemy of security, simplicity cannot be the answer, for the basic reason that all systems are necessarily complex to be useful. It is just a question of where the complexity is.
Tesler’s so-called Law of Conservation of Complexity sums this up nicely: every system has an irreducible amount of complexity, and the only question is who has to deal with it. The law was originally directed at UX matters, but I think it applies elsewhere.
In reasoning about complexity and security we also have to deal with the fact that security is not something you can design into complex systems. Rather, security is an emergent property of a complex system. You don’t do security. You do things to get security.
Even if we have what we might describe as a simple system, it can still be quite hard to secure. Simple systems can be equally confusing and hard to manage if not designed well. This becomes more of an issue with scale since, unless we are extraordinarily careful, issues in complex systems grow as (Simple)^N rather than (Simple) x N: the interactions between components multiply rather than add. Most of the time when we encounter security issues and then blame the complexity of the system, environment, or ecosystem we are being lazy. In my experience, when you examine such issues with some rigor and intellectual honesty you often find not an issue with the complexity itself but rather with faulty design that has failed to deal with that complexity or has created operator confusion.
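One way to read the (Simple)^N vs. (Simple) x N contrast is as a count of the states someone has to reason about. The sketch below is only an illustration under an assumption of mine: every component has the same small number of states, and components are either fully isolated or free to interact. The function name and the numbers are made up for the example.

```python
def states_to_reason_about(states_per_component: int, n: int, isolated: bool) -> int:
    """Count the states an operator must consider for n components.

    Isolated components can be reasoned about one at a time (additive).
    Freely interacting components must be reasoned about in combination
    (multiplicative, i.e. exponential in n).
    """
    if isolated:
        return states_per_component * n
    return states_per_component ** n


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(
            n,
            states_to_reason_about(4, n, isolated=True),
            states_to_reason_about(4, n, isolated=False),
        )
```

Good design is what keeps you on the additive side of that comparison as the system grows.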
Complexity is not the enemy of security. Bad design is. So, what are some examples of good design principles? Here is a list of some of the ones I think are most useful, and I know this is nowhere near exhaustive:
1. Abstraction
Abstract complexity away by hiding it behind APIs and other interfaces. Then, think about security policy objectives and see if they require you to consider and enforce controls at multiple levels of abstraction. If they do, then the abstraction is not designed well enough. For example, using network or application segmentation is a vital security principle but it is hard work, as the tools most often used today work at one level of abstraction but the policy models and operational reality to make it work without breaking your business exist at other levels of abstraction. For example, look at this highly stylized map of the layers of the financial sector and think about a business segmentation objective and then where and how you would actually implement that.
2. Linked Behaviors
Establish linking conditions across elements of the system or across layers of abstraction. Setting one control objective shouldn't imply that additional controls must then be set in multiple places, whether by other elements of the control plane or by additional human toil. For example, if you have a goal to have one protected entry point into a system then the control plane should universally and autonomically enforce that goal across all elements of the stack whether it is ingress controllers, load balancers, perimeter front end gateways, service meshes or distributed firewalls. This is hard, very hard, and I don’t think any environment does this well (or well enough for claims of effectiveness to be fully substantiated). However, the major cloud providers are getting ever closer to this.
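As a rough sketch of what a linked behavior might look like, the example below states the single-entry-point objective once and derives the per-layer controls from it. Everything here is hypothetical: the `Objective` type, the layer names and the rule format are my own illustration, not the API of any real control plane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """A single control objective, stated once."""
    name: str
    allowed_entry_point: str


# Hypothetical layers of the stack that must all honor the objective.
LAYERS = (
    "ingress_controller",
    "load_balancer",
    "perimeter_gateway",
    "service_mesh",
    "distributed_firewall",
)


def derive_controls(objective: Objective) -> dict[str, dict[str, str]]:
    """Fan one objective out into a consistent rule for every layer.

    Nobody configures the layers individually; each rule is derived, so
    the layers cannot disagree about what the entry point is.
    """
    return {
        layer: {"default": "deny", "allow_only_via": objective.allowed_entry_point}
        for layer in LAYERS
    }


if __name__ == "__main__":
    controls = derive_controls(Objective("single-entry", "edge-gateway"))
    for layer, rule in controls.items():
        print(layer, rule)
```

The point of the design is that the human states intent in one place and the toil of keeping the layers aligned belongs to the control plane.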
3. Opinionated Defaults
Improving the handling of system wide control objectives to cause better security properties to emerge through linked behaviors can be made easier through comprehensive setting of control defaults. This could be everything from specific control measures like having encryption on, everywhere, by default through to ensuring systems and components are closed when instantiated and then have to be more loosely configured as needed. This is where the canon of wider security principles can be applied, from least privilege, fail closed, default deny, protocol least privilege / allow-listing and so on. Again, it’s much easier to get to this stance in the cloud.
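To make the idea of opinionated defaults concrete, here is a minimal sketch of a component that is closed when instantiated and has to be deliberately opened. The `StorageConfig` class and its fields are hypothetical and only stand in for whatever resource you are configuring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical storage resource with secure, opinionated defaults."""
    name: str
    encrypted: bool = True            # encryption on by default
    public_access: bool = False       # closed when instantiated
    allowed_principals: frozenset = frozenset()  # default deny: nobody listed

    def can_read(self, principal: str) -> bool:
        """Allow access only if it was explicitly granted."""
        return self.public_access or principal in self.allowed_principals


if __name__ == "__main__":
    closed = StorageConfig("logs")
    print(closed.encrypted, closed.can_read("svc-a"))

    # Loosening is an explicit, reviewable act rather than the starting state.
    opened = StorageConfig("logs", allowed_principals=frozenset({"svc-a"}))
    print(opened.can_read("svc-a"), opened.can_read("svc-b"))
```

With defaults like these, the person who does nothing ends up with the secure outcome, and every exception is visible in the configuration.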
4. Declarative Configuration
Define a configuration of the system in a declarative manner so that the actual configuration of the system can be generated and continuously compared to the specification. Treat policy/controls as a lifecycle managed part of the configuration. If you have controls as code then you need to inspect the ability of that control configuration to match intended policy, not just inspect the instantiated environment that results from the declarative configuration. At the risk of getting too predictable, cloud or on-premise cloud-like modern IT environments have this at their core and as a result deal with complexity much better.
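The continuous comparison of actual configuration against the specification can be sketched as a simple drift check. This is a toy, assuming both the desired and the observed configuration can be flattened into key/value dictionaries; the function name and the example keys are hypothetical.

```python
_ABSENT = "<absent>"  # marker for a key missing on one side


def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired_value, actual_value)} for every mismatch.

    Keys present on only one side are reported too, since an unexpected
    extra setting is as much a deviation as a wrong value.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, _ABSENT)
        have = actual.get(key, _ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    desired = {"tls": "on", "port": 443}
    actual = {"tls": "off", "port": 443, "debug": True}
    for key, (want, have) in detect_drift(desired, actual).items():
        print(f"{key}: desired={want!r} actual={have!r}")
```

A check like this only tells you that the environment matches the specification. Whether the specification matches the intended policy is a separate inspection.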
5. Idempotence
A by-product of a declarative approach to configuration, although by no means guaranteed, is the property of idempotence. In other words, if you push something to happen, such as enforcing a particular control, then no matter how many times you push, it won’t have any other effect than sustaining the control. This is an important design property for applying declarative configurations in complex environments as you simply assert the outcome and let the control plane worry about state. But designing controls to support idempotence can be challenging.
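A small sketch of the difference, using a hypothetical rule set held in memory: the idempotent version asserts an outcome ("this rule is present"), while the non-idempotent version performs an action ("add this rule") and so changes state every time it is pushed.

```python
def ensure_rule(rules: set, rule: str) -> bool:
    """Idempotent: make sure `rule` is present. Returns True if it changed state."""
    if rule in rules:
        return False
    rules.add(rule)
    return True


def append_rule(rules: list, rule: str) -> None:
    """Not idempotent: every push adds another copy of the rule."""
    rules.append(rule)


if __name__ == "__main__":
    rule_set = set()
    for _ in range(3):
        ensure_rule(rule_set, "deny-all-inbound")
    print(len(rule_set))   # stays at one entry however often we push

    rule_list = []
    for _ in range(3):
        append_rule(rule_list, "deny-all-inbound")
    print(len(rule_list))  # grows with every push
```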
6. Visualization and Great UX
People think visually, and no matter how comfortable we get that our goals have been expressed in the configuration specification, we can still find flaws in that specification by being able to represent it diagrammatically. Overlay visual design cues to highlight where there might be trouble, for example, single points of failure or composition of services in ways that fail to achieve an overall SLO. The tooling we provide engineers and end users should be designed to be intuitive and provide immediate feedback as to whether the intent of an action was in fact performed.
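The SLO composition problem is a good candidate for this kind of tooling because the arithmetic is simple but easy to overlook. The sketch below assumes a request must pass through every service in a chain and that the services fail independently, in which case the chain's availability is the product of the individual availabilities. The function names and figures are illustrative only.

```python
from math import prod


def serial_availability(availabilities: list) -> float:
    """Availability of a chain where every service must be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(availabilities)


def meets_slo(availabilities: list, slo: float) -> bool:
    """True if the composed chain still achieves the overall SLO."""
    return serial_availability(availabilities) >= slo


if __name__ == "__main__":
    chain = [0.999, 0.999, 0.999]  # three services, each at "three nines"
    print(f"composed availability: {serial_availability(chain):.6f}")
    print("meets a 0.999 SLO:", meets_slo(chain, 0.999))
```

Three services that each meet the target on their own do not meet it in series, which is exactly the sort of flaw a visual cue on the design diagram could flag.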
8. Observability and Feedback Loops
Observability of the behavior of the system overall is critical, but it is only useful if the data from observation is put into feedback loops - either positive or negative - to correctly amplify or dampen behaviors. To reiterate what we started with, security is an emergent property of a complex system. One of the ways to drive that emergence is by taking action as a result of feedback loops.
9. Reduce Error Messages and Guidance
Work to eliminate error messages, user guides and other configuration or set-up guidance for systems. Yes, this is an extreme statement but is one that should be aimed for anyway. Instead of creating better error messages work to reduce the scope of errors needing messaging. Similarly, if your user and configuration guides keep getting bigger and more numerous then ask yourself: are you making the right design choices on defaults, linked behaviors, and levels of abstraction?
10. System-wide Invariants : People, Process and Technology
Like defaults, and other good elements of design, it is valuable to set system wide invariants - properties you want to be held true - and then build processes to enforce them. For example, if you want no single points of failure then build processes to find them and the feedback loops to eliminate them, and the design review practices to discourage the design patterns that lead to them. Additionally, work hard to not experience broken processes.
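As a very small example of the "process to find them" part, the check below treats "no single points of failure" as an invariant over an inventory of services and their replica counts. The inventory format is an assumption of mine, and a real check would also need to look at shared dependencies, which this sketch ignores.

```python
def single_points_of_failure(replicas: dict) -> list:
    """Return the services that violate the invariant 'at least two replicas'."""
    return sorted(name for name, count in replicas.items() if count < 2)


if __name__ == "__main__":
    # Hypothetical inventory: service name -> number of running replicas.
    inventory = {"api": 3, "db": 1, "cache": 2}
    violations = single_points_of_failure(inventory)
    print("invariant holds" if not violations else f"violations: {violations}")
```

Run continuously, the output of a check like this is what feeds the feedback loop and the design review.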
11. Desire Lines : Principle of Least Action
One unifying theme here, borrowed from physics, is that where we see return on investment, our work aligns with the principle of least action. If our controls and approach can align with the natural "happy path" for our employees and our customers then that will likely provide the maximum returns over time. Another concept is desire lines: if you consistently see an actual or attempted engineer or end user behavior (particularly in a complex environment) then you should find that path and make it a secure one. This is like the, perhaps apocryphal, stories of only paving public walkways after the patterns of walking (and wearing away grass) have been observed in practice.
12. Chaos Engineering
Creating random failures in a complex environment teaches engineers and other parts of the system to build for assumed failure. In an environment of large enough scale you likely don’t need to introduce such failures because they’ll happen anyway. The best design is not to try to avoid them (although you should not let per unit failure exceed efficiency expectations) but, rather, to make resilience a system goal, not a component goal.
13. 5Ys and Blame-free Post-mortems
The final aspect of good design is the ultimate feedback loop of looking for the root cause of the root cause (the system cause) when an event or close-call has occurred. Follow the 5Ys approach: asking "why?" until you can get no further is crucial. Doing this well needs the psychological safety of blame-free post-mortems. It is also important to remember that human error is not an explanation but rather something to be explained. Most of the time when I've seen claims of human error as an incident cause I have, when digging deeper, been amazed at how well the humans have actually been performing in the face of bad design to keep incidents less numerous.
Bottom line: Complexity is not the enemy of security. Bad design is. All useful things are complex at some level of abstraction. To wish the complexity away is to fail to apply good design to deal with it. We need a better body of knowledge of good design principles not more policies and control objectives. Other engineering disciplines are much better at this.