You Can't Solve What's Undefined

It's all over the news, whether we're talking about the TSA and "security theater" or Wikileaks and the sensitive data spewing out of government, business, and academia (there's a certain irony here, btw, inasmuch as much of this data has likely been captured previously). There are "security problems" and they must be solved! Unfortunately, these "solutions" tend to have nothing to do with the actual core problems. Instead, we end up with half-baked ideas that do no real good, or draconian laws that do more harm than good.

At the heart of the matter is one simple challenge: more often than not, leaders "solve" problems that are at best ill-defined. How many billions of dollars are being wasted each year on "solutions" that end up costing organizations more money, whether in maintaining the solution, in having to revamp business processes to fit the solution (instead of the other way around), or simply in going through the heartbreak of investing in a technology (*cough*naked scanners*cough*) that wasn't needed in the first place?

Define the Problem, THEN Solve It

As much as I'd love to harp on TSA, I think the Wikileaks situation provides a really good example of the knee-jerk reactions we're starting to see in both the public and private sectors. Vendors are rushing in to fill the demand for solutions, but the clamoring is from uninformed politicians and executives. Who is defining the problem that needs to be solved? Why is there even so much focus on Wikileaks, which is itself no more than a conduit, rather than on the perpetrator of the crime here (one "Private Bradley Manning")?

Bruce Schneier makes a good point on his blog today in noting that the real issue here is how access to data is controlled and monitored. Access control, in particular, is a very hard problem to solve. More precisely, it's a very easy problem to solve for a small number of people, but an extremely difficult problem to solve at scale. The promise of RBAC has become the stuff of legend, and no really good alternative models have become readily accepted and available. Stop and consider: If you're trying to control data for 500,000 people, how do you define that access control model? Define the model needed, and then let's talk about solutions (or the lack thereof).
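To make "controlled and monitored" a little more concrete, here's a minimal sketch of a role-based check that also records every access attempt. The role names, classifications, users, and the review threshold are all hypothetical, invented purely for illustration; a real model for 500,000 people would be far messier, which is rather the point.

```python
from collections import Counter

# Hypothetical roles and data classifications, for illustration only.
ROLE_PERMISSIONS = {
    "analyst": {"unclassified", "confidential"},
    "clerk": {"unclassified"},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"clerk"},
}

# Every decision is recorded here, whether access was allowed or denied.
audit_log = []


def can_read(user, classification):
    """Return True if any of the user's roles grants the classification."""
    roles = USER_ROLES.get(user, set())
    return any(classification in ROLE_PERMISSIONS.get(r, set()) for r in roles)


def read_document(user, doc_id, classification):
    """Check access control first, then log the attempt either way."""
    allowed = can_read(user, classification)
    audit_log.append((user, doc_id, allowed))
    return allowed


def heavy_readers(threshold):
    """List users whose successful reads exceed a (hypothetical) threshold."""
    counts = Counter(user for user, _, allowed in audit_log if allowed)
    return sorted(u for u, n in counts.items() if n > threshold)
```

The check is the access control piece and the log is the monitoring piece. Something like heavy_readers only tells you who to look at; deciding what counts as "too much" access is part of defining the problem.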

Before running off to line up "solutions" for this "problem" we need to make sure we even know what the problem is. If you listen to media reports on the Wikileaks incident, you'd think the problem was "data leak" or being able to control or quash leaked data in the wild. Let's be honest with ourselves for a moment and understand something: a) you can't quash the data once it's out, and b) "data leak" is first and foremost an access control issue, and secondarily a monitoring issue. Yet every DLP vendor on the planet has likely descended upon the State Dept and Pentagon in the past couple weeks to hawk their wares.

Putting this into a different context, let's talk about the TSA. I've already covered them at length (see here and here), but it's important to highlight, in this context, one important point: they are not solving the right problem. The TSA, with their mandate on perimeter security, is focusing on trying to stop objects rather than people. That is, rather than trying to identify people who are behaving suspiciously, they're instead trying to stop an infinite number of attack tools that an attacker might use. The problem, of course, is that this leads them to focus on the last successful method, which is unlikely to have any commonality with the next attack, rather than looking at the actual common factor: the attacker. This situation, hands-down, reflects an improper problem definition, and results in the inanity that is the TSA screening program.

Be Wary of Vendors

To make a bad situation worse, not only do we fail to define the problems that need to be solved, but we then listen to the vendors, as if they have a clue what problem needs to be solved. We need to be exceedingly careful when dealing with vendors. They're motivated to sell you a product, whether you need it or not. Note that they don't come in and perform a free, full-scope assessment and analysis before they pitch you on their wares. That should tell you something. Unfortunately, as Mike Smith notes in his blog, this is exactly the situation that is arising around the Wikileaks debacle.

If a vendor comes to you to sell you a solution, then you need to ask a few quick questions:
1) Were they invited in to discuss solving a known, reasonably-well-defined problem?
2) Has the problem even been defined yet?
3) Are they trying to tell you what the problem is?
4) Do they have much, if any, context specific to your organization around their solution pitch?

If the vendor wasn't invited in to solve a problem, or if they're trying to tell you what the problem really is, or if you don't even really know what the problem is yet, or if they have no reasonable context around how their solution addresses your specific problem, then these should be red flags and you should probably kick them out. If you don't know what the problem is yet, then why are you looking at "solutions"? If you're not confident in defining the problem internally, then find someone who can help (e.g., a consultant).

Engineer Solutions

We have a fatal flaw in the security industry. Despite knowing better, we more often than not seem to consider ourselves to be neither Science nor Engineering. This is problematic when it comes to designing solutions, because it means we lack a solid foundation based in reason, rational thought, and common understanding. More now than ever, we need to come to grips with basic engineering principles around defining a problem space and then developing alternative solutions to address that space.

Following good engineering practices will allow for the development or identification of reasonable solutions to reasonably-well-defined problems. A reasonable engineering process will look something like this:
1) Define the problem. What is the problem? Has a root cause analysis been performed? Are you looking at correlated or causal results in the analysis?

2) Define resources and constraints. How many people are available to help solve the problem? What sort of budget limitations will you face? How much cost is associated with the problem as it is (i.e., if you did nothing, what would the cost be?)?

3) Develop alternative solutions. There should never be a single solution proposed initially. There should be a set of solutions (ideally a minimum of 3). The set may include "do nothing" as an option. This is not as time-consuming a process as one might think.

4) Develop a cost model for each alternative. Once the alternatives are defined, then model their costs over time.

5) Evaluate the potential impact of each alternative. In addition to modeling cost, you should also model likely impact. How effective will the solution be? If we're talking risk management, then how much will risk be reduced or transferred?

6) Make an informed decision. Given a slate of choices, a model of costs for each, and an analysis of their likely impact, pick the best solution for your organization. Document this process and decision for posterity (this will help with legal defensibility).

7) Pilot, test, measure, and refine the solution. Pilot your chosen solution. Test it out and measure the impact. Refine the solution as necessary. Are cost and impact bearing out as modeled, or do we need to revise those models and revisit the decision?

8) Implement the solution. Once everything is on-spec, then it's time to do your full deployment.
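
As a rough illustration of steps 3 through 6, here's a minimal sketch that models cost over time for a few alternatives (including "do nothing") and picks the cheapest once residual risk is counted. Every name and number is a made-up assumption, and treating impact as a single "fraction of expected annual loss avoided" is a deliberate oversimplification.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    upfront_cost: float    # one-time cost to deploy (hypothetical)
    annual_cost: float     # recurring cost to maintain (hypothetical)
    risk_reduction: float  # fraction of expected annual loss avoided, 0..1


def total_cost(alt, annual_loss, years):
    """Cost of the solution plus the expected loss that remains, over the period."""
    residual_loss = annual_loss * (1.0 - alt.risk_reduction)
    return alt.upfront_cost + years * (alt.annual_cost + residual_loss)


def best_alternative(alternatives, annual_loss, years):
    """Pick the alternative with the lowest modeled total cost."""
    return min(alternatives, key=lambda a: total_cost(a, annual_loss, years))


# A slate of at least three options, "do nothing" included.
alternatives = [
    Alternative("do nothing", 0, 0, 0.0),
    Alternative("tighten access controls", 50_000, 10_000, 0.6),
    Alternative("buy a product", 200_000, 60_000, 0.7),
]

for alt in alternatives:
    print(alt.name, round(total_cost(alt, annual_loss=100_000, years=3)))
```

With these invented figures the product reduces the most risk and still costs the most overall. Shrink the expected annual loss enough and "do nothing" wins, which is why the problem (and its cost) has to be defined before anyone shops for solutions.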

To me, the above steps are fundamental and logical. Unfortunately, it's exceedingly rare to see these processes truly followed, especially when the media is fanning the flames in hopes of generating a knee-jerk reaction. It's imperative that we not allow ourselves to be bullied into making bad decisions. Doing "something" for the sake of being seen as taking action is a lousy justification for doing the wrong thing or making the problem worse.

Focus on Process, Not Policy

In practical terms, one area where we often see a rapid response that ultimately makes little sense is in security policies (or the law). It's not uncommon for the first response to a security incident to be the handing down of a new policy that (even more) explicitly forbids whatever was done that led to the incident. Unfortunately, these snap judgments rarely address the root causes, but instead are emotional responses meant to demonstrate "leadership" in the face of adversity. Worse, these new policies tend to persist and create difficulties down the road. Just as it's rare to see a new "temporary" tax or spending increase backed out, so it's seemingly inevitable that new knee-jerk policies will become permanently codified in the policy framework, for better or for worse.

This reminds me of a story... A few years ago I was called to a meeting with a client to discuss how to securely get VoIP extended via the Internet to a remote call center. Leased lines were too expensive, as was call routing over traditional lines. Unfortunately, due to a worm infestation a few years prior (likely SQL Slammer), the security department had banned "all UDP" from crossing the firewall. This was a curious situation, because as far as I knew, the company was still able to perform DNS queries. When I asked about this, the firewall engineer first explained that their DNS servers were inside the firewalls. Ooookay... I then pointed out that recursive queries still had to pass through the firewall to the Internet, to which he told me that, no, it was ok, they didn't do TCP-based zone transfers. Ummm... yeah. At this point I just rolled my eyes and let it go. The best part was that we were advocating for VoIPSec (VoIP-over-IPSec), and we were still told that we couldn't use that as it was still UDP, even if it was encapsulated within IPSec. Yes, this really did happen.
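
For anyone wondering why that exchange was so odd, here's a toy sketch of what a blanket "no UDP" rule does to a few kinds of traffic. It relies on well-known defaults: ordinary DNS queries typically use UDP port 53, zone transfers use TCP port 53, IKE uses UDP port 500, and IPsec ESP is its own IP protocol (50) rather than UDP. It also assumes no NAT traversal, which would wrap ESP in UDP. The rule function is a stand-in, not anyone's real firewall config.

```python
# (protocol, destination port); port is None for protocols without ports.
TRAFFIC = {
    "DNS query": ("udp", 53),
    "DNS zone transfer": ("tcp", 53),
    "IKE key exchange": ("udp", 500),
    "IPsec ESP payload": ("esp", None),
}


def blocked_by_udp_ban(protocol, port):
    """A blanket ban looks only at the outer protocol, never at the port."""
    return protocol == "udp"


def blocked_flows():
    """Names of the flows that a 'deny all UDP' policy would stop."""
    return sorted(
        name
        for name, (proto, port) in TRAFFIC.items()
        if blocked_by_udp_ban(proto, port)
    )


print(blocked_flows())
```

Zone transfers being TCP says nothing about whether everyday lookups can get out, and the encrypted VoIP payload isn't UDP as far as the firewall can see.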

We put too much emphasis on mindless policies instead of investing time and energy into securing functional processes (or even just defining reasonable processes). Policies don't generally tell us how to do a better job in our daily tasks. Instead, we oftentimes follow processes and procedures (formal and informal) to get our work done. I submit that investing energy in understanding and better-defining those processes and procedures is a far better starting point than writing arbitrary policies that don't appear to have any application to the task at hand of the average employee. Secure the processes, and then worry about having a policy framework, in the form of a knowledge base, to help back that up.

Ask Not What You Won't Yourself Do

And, btw, how's the example we're setting? Are we so busy telling people how poor a job they're doing that we are forgetting to raise up successes? At some point, excessive negativity will simply create a self-perpetuating downward spiral that results in enabling bad behavior. As part of this point are three considerations:

1) Create opportunities for success (and failure). If you want people to be successful, then you need to give them the space to find that success. Part of this process means relaxing less-critical rules to allow for experimentation. Oh, and by the way, this also means letting people fail, with a caveat. The caveat is that failures should be reasonably controlled. Optimize detection and recovery capabilities so that failures are rapidly captured, contained, and cleaned-up. Then it ain't so bad, plus you might learn something from it. ;)

2) Exalt success, temper reactions to (controlled) failure. If you want people to improve security, then you have to give them the freedom to achieve that goal. Set objectives/goals and then allow people to work toward them. When people or teams find success, celebrate them and hold them up as model students. At the same time, don't overreact to failures, especially controlled failures. If people are behaving within normal ranges and happen to trigger a failure, and if you're able to detect it and recover from it rapidly, then where is the true harm? It's the edge-case, uncontainable failures you want to hammer hard, and those will hopefully become very rare as your overall program becomes more resilient and demonstrates better survivability traits.

3) Lead by example. Now is not a good time for the cobbler's kids to have no shoes. If your security program doesn't have reasonably-well-defined processes that are also reasonably secure, then why should anybody else's? Security is in your job description and it's (probably) not in theirs. See the problem? I thought you might... ;)

Putting It All Together

This post has gotten a wee bit far-flung from beginning to end, so it's worthwhile to tie it all back together here. The root issue is this: solutions are being flogged for problems that are undefined. We need to stop letting that happen. No solutions should be discussed until the problems are reasonably well understood. Once you understand the problem, you should then follow reasonably sound engineering practices in considering alternative solutions. We must actively resist knee-jerk reactions, such as issuing mindless policy after mindless policy that in the long-term may end up exacerbating the problem, creating other problems, or just overall undermining your security program.

Instead, as part of the problem definition, we should look at the affected processes (formal or informal). If we can develop solutions that enhance the processes tied to problems, then we can start talking about how to improve those processes over time. If we need tools or policies to help bolster those processes, then great, not a problem. At the same time, we need to make sure that we give people room to innovate and find success, even though that also means giving people enough slack to fail.

Failure is not, in and of itself, a bad thing. In fact, we tend to learn some very interesting things from failure. However, allowing for failure (which is inevitable) puts the onus on our security programs to have strong failure detection and recovery capabilities. Looking at situations like Wikileaks, this means that we should be able to a) define reasonable access controls, b) monitor those access controls and associate accesses to data, and c) react to and recover from detected failure modes in a reasonable period of time. In the TSA context, this means accepting that bad things will get through checkpoints no matter how invasive those screening procedures might become, and instead focusing on analyzing people throughout the entire environment, readily detecting suspicious behavior, and rapidly responding to those anomalies. It's the same reason Fire/EMS teams train so diligently, and it makes good sense.

Lastly, all of this needs to be done within an overall survivability and legal defensibility context. 'Nuff said. ;)

About this Entry

This page contains a single entry by Ben Tomhave published on December 9, 2010 2:09 PM.
