The Marginal Utility of Breach Data

The age-old quant challenge of "imperfect data" seems to be making the rounds again. Part of this is playing out with questions about the utility of data breach reports, such as the Verizon DBIR. The same questions apply equally to various vulnerability-oriented descriptive reports, such as those from White Hat Security and Veracode. One of the most common objections is that there is no way to know how representative any of these reports are of "reality." That is, the data is not collected through some form of population sampling, but rather is "self-selected" by virtue of an incident occurring. We must thus be very careful in how we use these reports. That said, we certainly shouldn't throw them out altogether, either.

The vast majority of vendor-produced reports are descriptive, not the result of a statistically sound, rigorous population sampling technique. As such, we have to assume that there is going to be inherent bias and potentially misleading information. In practical terms, this means that we have to be extremely careful in drawing inferences and conclusions. For example, the Verizon DBIR talks about the need to "Achieve essential, and then worry about excellent" security measures/controls. Unfortunately, this finding has been misconstrued by many to suggest that use or maintenance of basic controls would have prevented a significant majority of breaches. Such a statement isn't correct. Instead, the conclusion merely reflects how low the bar is typically set; it cannot say anything about whether these orgs would still have been compromised had they exercised a bit more due care.

The best way to use these reports is as a slightly more rigorous "wet finger in the air" test of the current general threat landscape. DBIR, for instance, reflects that there's a lot of low-hanging fruit out there, which needs to be addressed. The Veracode and White Hat Security reports reflect that there is still a high prevalence of "trivial" vulnerabilities (of course, remediation of those vulns may or may not be trivial).

Another good use for these reports is in making educated estimates around factors like "vulnerability" and "resistance strength (difficulty)" in a FAIR analysis. Based on DBIR, for example, you can be reasonably confident that failing to implement basic controls will leave your resistance strength low, and that the average attacker can probably overcome such weak defenses. Where the utility of these reports begins to flag is when your organization implements stronger controls to protect an asset, since we don't generally see as much (if any) useful breach data on those situations.
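To make that concrete, here is a minimal sketch of how such an estimate might feed a FAIR-style calculation, where vulnerability is treated as the probability that a threat agent's capability exceeds the asset's resistance strength. Everything in it is an assumption for illustration: the function name, the 0-100 scale, the triangular distributions, and above all the numbers, none of which come from DBIR or any other report.

```python
import random


def estimate_vulnerability(threat_capability, resistance_strength,
                           trials=100_000, seed=0):
    """Estimate the chance that threat capability exceeds resistance strength.

    Both arguments are hypothetical (low, most_likely, high) estimates on a
    0-100 scale, sampled here from triangular distributions.
    """
    rng = random.Random(seed)
    t_low, t_mode, t_high = threat_capability
    r_low, r_mode, r_high = resistance_strength
    successes = 0
    for _ in range(trials):
        capability = rng.triangular(t_low, t_high, t_mode)
        resistance = rng.triangular(r_low, r_high, r_mode)
        if capability > resistance:
            successes += 1
    return successes / trials


# Illustrative estimates only; these are not figures from any breach report.
average_attacker = (30, 50, 70)
weak_controls = (10, 20, 35)     # the "low bar" case breach reports describe
strong_controls = (60, 80, 95)   # the case breach reports say little about

vuln_weak = estimate_vulnerability(average_attacker, weak_controls)
vuln_strong = estimate_vulnerability(average_attacker, strong_controls)

print(f"vulnerability with weak controls:   {vuln_weak:.3f}")
print(f"vulnerability with strong controls: {vuln_strong:.3f}")
```

The weak-controls range is the one a breach report can reasonably inform. The strong-controls range is a guess, which is exactly where the reports stop helping.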

On the flip side, there are a number of ways to misuse and abuse these reports. For starters, you cannot extrapolate much. The reports tell you something about the self-selected sample, but we cannot know how representative that sample is of the whole population, or even of key subsets (e.g. industry verticals). As noted above, we can see that a low bar seemed a fairly common attribute in breaches, but we cannot turn that around to imply that a high(er) bar will automatically reduce the frequency of breaches (we'd like to think that's the case, but it would be very difficult to demonstrate under ideal study conditions, and is impossible given the data available).
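A small worked example shows why the inference doesn't run backwards. The sketch below builds two entirely hypothetical populations (the rates are invented, not drawn from any report) and computes what a breach-only report would show for each. The function names and parameters are assumptions for illustration.

```python
def weak_share_of_breaches(frac_weak, p_breach_weak, p_breach_strong):
    """Share of breached orgs that had weak controls, via Bayes' rule."""
    breached_weak = frac_weak * p_breach_weak
    breached_strong = (1 - frac_weak) * p_breach_strong
    return breached_weak / (breached_weak + breached_strong)


def relative_risk(p_breach_weak, p_breach_strong):
    """How many times more likely a weak-control org is to be breached."""
    return p_breach_weak / p_breach_strong


# World A (hypothetical): 90% of orgs have weak controls, and controls make
# no difference to the breach rate at all.
world_a = dict(frac_weak=0.9, p_breach_weak=0.10, p_breach_strong=0.10)

# World B (hypothetical): half of orgs have weak controls, and those orgs
# are breached nine times as often.
world_b = dict(frac_weak=0.5, p_breach_weak=0.18, p_breach_strong=0.02)

share_a = weak_share_of_breaches(**world_a)
share_b = weak_share_of_breaches(**world_b)
risk_a = relative_risk(world_a["p_breach_weak"], world_a["p_breach_strong"])
risk_b = relative_risk(world_b["p_breach_weak"], world_b["p_breach_strong"])

print(f"World A: weak share of breaches {share_a:.2f}, relative risk {risk_a:.1f}")
print(f"World B: weak share of breaches {share_b:.2f}, relative risk {risk_b:.1f}")
```

In both worlds roughly 90% of breached organizations had weak controls, so a report built only from breaches looks the same. Yet in one world raising the bar changes nothing and in the other it cuts breach likelihood ninefold. Telling them apart requires knowing about the organizations that were not breached, which is precisely the data these reports lack.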

Similarly, with various threat and vulnerability reports, we can point out that these "trivial" weaknesses are easily exploited, but we can't be quite so definitive about the cost or utility of remediation (Denim Group has a great resource site on the cost of remediation, which I highly recommend checking out). As such, we again must be very careful in how we position the "results" of these descriptive reports. Drawing firm conclusions from them puts us on thin ice from a logical and statistical perspective.

Final point... these reports are informative... they do tell us interesting things... for example, we can potentially see patterns in detected (and reported) breaches. These patterns could allow us to alter behavior, prioritize remediation, or redirect investments. However, it must all be taken with a grain of salt and properly contextualized. Many breaches aren't included in the Verizon DBIR. For example, we know there have been incidents with Stuxnet in Iran, and yet that data is not captured in the report (as far as we know anyway). It's unclear how well large-scale threat agents are represented in the available data. Moreover, the data is incomplete in terms of the overall breaches that happen each year. Will the 2012 DBIR include data for RSA, Barracuda, Comodo, Sony PSN, or Sony Ericsson? There's really no way to know, and the answer we've heard is "maybe yes, maybe no" with a caveat that, even if they are included, we probably won't be told.

Hence... marginal utility. Interesting. Informative. Not conclusive. Not comprehensive. Not complete. But, potentially useful, if used carefully.
