Maintaining software with the Swiss Cheese approach

From Nick Jenkins

When investigating airplane disasters, crash investigators sometimes use a Swiss Cheese model - known as "Reason's Swiss Cheese theory of organisational vulnerability" - in which any overall failure is seen as the result of a whole series of failures (rather than just one failure), and in which as many of those failures as possible should be fixed.

You can use the same approach when fixing defects in software. A problem encountered by a user can sometimes be caused by a whole series of independent failures at different points in the software. The easiest and quickest solution is to add a single check at the point where the problem was first caused, and so prevent it in just that one place. The better answer, though, is to work backwards. Find the last point where something went wrong, and fix that. Test that your fix works. Then find the second-last point, and fix that. Test that this fix works too. Keep going until all the failure points have been fixed, including the original failure point. For example, using this approach, I've previously investigated problems that turned out to be caused by faults in four separate places. I fixed all four, although the easy alternative would have been to fix only the originating one.
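To make the idea concrete, here is a minimal sketch in Python. The scenario is invented for illustration: a blank quantity typed into an order form ends up breaking an invoice page. Every function name, the dictionary used as a store, and the choice of four layers are assumptions, not code from any real system. Each function is one slice of the cheese and gets its own fix, listed here from the originating point to the last point of failure.

```python
# Hypothetical example: all names and the scenario are illustrative assumptions.

def parse_quantity(raw):
    """Originating point: reject blank, non-numeric or non-positive input."""
    text = (raw or "").strip()
    try:
        quantity = int(text)
    except ValueError:
        raise ValueError(f"quantity must be a whole number, got {raw!r}") from None
    if quantity <= 0:
        raise ValueError(f"quantity must be positive, got {quantity}")
    return quantity


def save_order(store, order_id, quantity):
    """Second point: never store an invalid quantity, whoever the caller is."""
    if not isinstance(quantity, int) or quantity <= 0:
        raise ValueError(f"refusing to store quantity {quantity!r}")
    store[order_id] = quantity


def line_total(store, order_id, unit_price):
    """Third point: tolerate missing or bad rows that are already stored."""
    quantity = store.get(order_id)
    if not isinstance(quantity, int) or quantity <= 0:
        return None  # signal "no usable total" instead of raising here
    return quantity * unit_price


def render_total(total):
    """Last point: the display code copes with a missing total."""
    if total is None:
        return "Total unavailable"
    return f"Total: {total:.2f}"
```

Working backwards, you would fix and test `render_total` first, then `line_total`, then `save_order`, and finally `parse_quantity`. The check in `parse_quantity` alone would stop new bad input, but a bad row that is already in the store (or one written by some other caller) would still break the page without the later fixes.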

This can be thought of as a form of defensive programming - solving a problem at every point where it shows up, instead of fixing it only at the easiest point. It's true that, most of the time, nobody will ever see the deep fixes to such problems. But those fixes will prevent this problem, and probably anything similar to it, from happening again, and you'll know that the problem has been comprehensively fixed, which is what you want if you're maintaining the software over the long haul. After all, you didn't find the problem before, so how can you be sure you've solved all the related problems this time, unless you really do fix every single component of the current failure?