Many well-known security incidents appear to share a common pattern. They are not the result of some awesome attacker capability to exploit a hitherto unknown vulnerability, or to realize a risk from some combination of control weaknesses nobody had contemplated. Rather, a remarkably common pattern is that the control or controls that would have stopped the attack (or otherwise detected or contained it) were thought to be present and operational but, for some reason, were not, just when they were most needed.
There are many reasons for these failures to sustain the correct implementation of important controls: a bad change, an operational error, an incomplete new implementation, other faults or breakage, or some wider change that nullified the in-situ control. In fact, any issue that can cause a system error can also negate or disable a control. The incident and after-action reviews no doubt share the same moment of people exclaiming: “But didn’t we have a [ process | tool | component ] to stop that happening?”
Incidentally, we talk about attacks, but a similar pattern exists for control failures that lead to realized risk in other types of incidents across the full spectrum of enterprise risk domains.
So, what to do?
Treat controls as first-class objects, like other parts of system function.
Build a catalog of key controls using a well-formed ontology (I’ve not totally drunk the overall FAIR Kool-Aid but their controls ontology is very good).
Conduct independent assurance / design reviews for key controls. This doesn't have to be fully independent, but it should at least be a peer review in whatever development methodology / style you operate.
Treat controls (especially security controls) as automation / code. Build tests / coverage for control correctness as you would with other code.
Test for the presence and integration of controls at build time in integration / deploy suites. Different styles of test will be needed depending on the nature of the control (component, software, hardware, etc.).
Perform continuous control monitoring to assure the continued correct operation of controls at run time and ensure completion of deployment. Minimize the time between a potential control failure and the detection/correction of that failure.
[Big point] Declare any control that doesn’t prove amenable to such assurance or doesn’t emit the data needed for continuous monitoring to be a control in need of improvement / replacement.
When a control (or an instance of a control) is detected as having failed, declare a "control incident" and handle it as if a security incident has occurred (as it might well become if not attended to quickly enough).
Treat control incidents as first-class objects alongside security incidents (reporting, escalation, after-action review, thematic analysis), whether or not the control incident actually resulted in a security incident. [Consider close calls as well as actual incidents.]
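To make the monitoring and control-incident steps above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Control` and `ControlIncident` types, the `monitor` function, the probe functions and the control names are all hypothetical, not taken from any particular tool. The idea is simply that each cataloged control carries a probe, a failed or erroring probe produces a control incident, and a control with no probe at all gets flagged as needing improvement.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass
class Control:
    """One catalog entry for a key control (fields are illustrative)."""
    name: str
    # Hypothetical probe: returns True when the control is present and working.
    # None means the control emits nothing we can monitor.
    probe: Optional[Callable[[], bool]] = None


@dataclass
class ControlIncident:
    """A detected control failure, to be handled like a security incident."""
    control: str
    reason: str
    detected_at: datetime


def monitor(catalog: List[Control], now: datetime) -> Tuple[List[ControlIncident], List[str]]:
    """Run one monitoring pass; return (incidents, needs_improvement)."""
    incidents: List[ControlIncident] = []
    needs_improvement: List[str] = []
    for control in catalog:
        if control.probe is None:
            # No monitoring data available: flag the control itself.
            needs_improvement.append(control.name)
            continue
        try:
            healthy = control.probe()
            reason = "probe reported the control as not operating"
        except Exception as exc:
            # A probe that cannot run is treated as a failed control.
            healthy = False
            reason = f"probe raised {type(exc).__name__}: {exc}"
        if not healthy:
            incidents.append(ControlIncident(control.name, reason, now))
    return incidents, needs_improvement


def broken_probe() -> bool:
    """Stand-in for a probe that cannot reach what it is checking."""
    raise RuntimeError("agent unreachable")


# Hypothetical catalog with one healthy, two failed and one unmonitorable control.
catalog = [
    Control("edr-agent-running", probe=lambda: True),
    Control("egress-proxy-enforced", probe=lambda: False),
    Control("log-forwarding", probe=broken_probe),
    Control("manual-access-review"),
]

incidents, needs_improvement = monitor(catalog, datetime.now(timezone.utc))
for incident in incidents:
    print(f"CONTROL INCIDENT {incident.control}: {incident.reason}")
print("Needs improvement:", needs_improvement)
```

In practice a pass like this would run on a schedule that is short relative to how long you can tolerate a control being silently down, and the incidents would feed the same reporting and escalation path as security incidents rather than a print statement.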
There are emerging tools to help (e.g. from some vendors, plus many cloud-native tools), along with what some orgs have built themselves. Btw, many so-called GRC tools don’t do much here unless you count intermittent human self-assessment as continuous monitoring, though some are adjusting.
Bottom line: many incidents are not due to a lack of conception of controls but due to failures of expected controls. Hence the need to conduct continuous control monitoring and to treat control incidents as first-class events like security incidents.