Friday, December 30, 2016

thoughts on "In Search of Evidence-Based IT-Security"

Christopher Soghoian brought to my attention a video of a talk by Hanno Böck at the 33rd Chaos Communication Congress. in it Hanno puts forward the claim that IT security is largely science-free, so let's follow a staple of the scientific process - peer review.

Hanno introduces himself as a journalist and hacker and says that he prefers to avoid the term "security researcher" and that he hopes the audience will see why.  for those who are relatively well versed in the field of anti-malware it should definitely become obvious why he prefers to avoid that term and i'll return to this near the end.

Hanno is a skeptic, and far from the only one. his talk ultimately expresses the same sentiments that are now commonplace in the perennially misinformed information security community. the difference is that Hanno has found a novel way of expressing them, couched in scientific jargon and easily mistaken for insight. he spends altogether too long and dives too deeply into the medical analogy after which computer viruses and by extension anti-virus software are named. the analogy has long been recognized as deeply imperfect and limited. that's why, in reality, there are relatively few references to this analogy in the anti-malware field other than "computer virus", "anti-virus", and "infection" (all three of which date back virtually to the beginning of the field). his call towards the end of his talk for blinded or even double-blinded studies, aside from such studies being prohibitively expensive to perform, seems to cling to this medical paradigm in spite of the fact that the subject of such experimentation (ie. the computer, since we're interested in whether AV can prevent computers from becoming compromised) cannot be psychologically influenced by knowledge of which (if any) anti-virus is being used.

when he FINALLY leaves the topic of medical science to return to security products (about 14 minutes into his half hour talk) he harps on the absence of one very particular kind of experiment being performed on security products - what he calls a randomized controlled trial. it turns out this is a hold-over from his preoccupation with medical science. when Hanno says that IT security is largely science-free it is the absence of this particular kind of scientific experiment that he is referring to, but that doesn't actually make it science-free because science has a variety of different ways to study and experiment on things that aren't people.

there is in fact good scientific evidence for the efficacy of anti-virus software and it's provided by none other than Microsoft:

now it's true that this data is from an observational study and that it only shows correlation rather than causation, but that's not the end of the world. observational studies are still science. showing correlation may not be definitive evidence but it's still strong evidence, especially considering the scope of the study (hundreds of millions of computers around the world out of a total estimated population of 1.25 billion windows PCs). in this particular case A may not be causing B but B definitely can't cause A and if anyone can think of a confounding variable that might be present on hundreds of millions of systems then maybe let Microsoft know so that they can try to account for it in the future.

another source of scientific evidence (oft derided in information security circles because the results don't match experts' anecdata) is the work of independent testing labs like av-test.org or av-comparatives.org. they eliminate the influence of confounding variables and so are capable of showing causation rather than just correlation. unfortunately Hanno believes their methodology is "extremely flawed". let's look at his complaints:
  • "If a software detects a malware it does not mean it would've caused harm if undetected."
    • this is trivially false. anyone who actually reads the testing methodology at av-comparatives (for example) can find right at the beginning a statement about first testing the malware without the AV present and eliminating any that don't work in that scenario. therefore every sample that is detected by AV in their tests would have caused harm if it had gone undetected.
  • "Alternatives to Antivirus software are not considered." (the talk gives "regular updates" and "application whitelisting" as examples)
    • the example of "regular updates" is frankly a little bit bizarre given Hanno's earlier references to confounders. not controlling for this scenario would actually introduce a confounding variable and make it more difficult to show a causal relationship between the use of a particular AV and the prevention of malware incidents.
    • the example of "application whitelisting" underscores a serious problem in Hanno's understanding of what he's critiquing. application whitelisting isn't an alternative to AV, it's a part of AV. many products include this as a feature. Symantec's product, for example, has what they call a reputation engine which alerts when it encounters anything that doesn't have a known good reputation (which means new/unknown malware, traditionally the bane of known-malware scanning, will get alerted on because it hasn't been seen before and thus has no reputation, good or bad).
  • "Antivirus software as a security risk is not considered."
    • when malware exploiting vulnerabilities in anti-virus software is found in the wild then perhaps the test methodologies should be updated to include this possibility. until then, changing the methodology to account for malware that doesn't seem to exist outside a lab has no real benefit.
  • "None of these tests are with real users."
    • again, this would introduce a confounding variable. maybe the lack of malware incidents is because of something the user did rather than because of the AV. alternatively maybe the failure to stop malware incidents is because of something the user did rather than because of a failure of the AV. if you want to establish causation you have to control your variables (something our scientifically-minded speaker Hanno should know all too well). does the anti-virus prevent malware incidents? the tests say yes. can a user preempt or compromise that prevention? also yes. is there any prevention a user can't preempt or compromise? sadly (or perhaps thankfully) no. if you want a study that includes users and thus eliminates the ability to establish a causal link between AV use and prevention of malware incidents, see the study by Microsoft, but even with the inclusion of the users it still suggests AV prevents malware incidents.
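
for anyone who wants the control-your-variables idea spelled out, here is a minimal sketch in Python of the two-phase design described above: first throw out samples that don't work on an unprotected machine, then count how many of the remaining ones the product stops. everything in it is an assumption made for illustration (the `Sample` type, its two fields, the sample names and the made-up outcomes); it is not any lab's actual tooling or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    name: str
    runs_without_av: bool  # hypothetical: did it work on an unprotected machine?
    blocked_by_av: bool    # hypothetical: did the product under test stop it?


def protection_rate(samples):
    """Return (fraction of working samples blocked, number of working samples)."""
    # phase 1: keep only samples that demonstrably work with no AV present
    working = [s for s in samples if s.runs_without_av]
    if not working:
        raise ValueError("no working samples to test against")
    # phase 2: of those, count the ones the product stopped
    blocked = sum(1 for s in working if s.blocked_by_av)
    return blocked / len(working), len(working)


# made-up outcomes, purely to exercise the function
samples = [
    Sample("sample-a", True, True),
    Sample("sample-b", True, False),
    Sample("sample-c", False, True),  # a dud: excluded in phase 1
    Sample("sample-d", True, True),
]
rate, n = protection_rate(samples)
print(f"{rate:.2f} of {n} working samples blocked")
```

because duds are removed before scoring, a detection of something harmless can't inflate the result, and because no user is in the loop, the only thing that differs between the two phases is the presence of the AV.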

when Hanno addressed the paucity of scientific papers dealing with security i found myself confused. using Google Scholar to find the most cited scientific papers? surely he doesn't think the realm of security is so narrowly focused that he'll find what he's looking for that way. security is in fact incredibly broad, covering many different quasi-related domains, and looking at a handful of the most popular scientific papers across all of security is in no way representative of the corpus of available works related to any one particular field (like security software). perhaps i'm biased, having previously (in the very distant past) maintained a reference library of papers related specifically to anti-virus, but it doesn't seem like Hanno showed much evidence that he knew how to find evidence-based security. is it really that hard to add the term "malware" to his search query? could he not find a few and then use them as a seed in an algorithm that crawls backwards and forwards through scientific papers by citation? did he even bother to look at Virus Bulletin? does he even know what that is?
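
the citation crawl i have in mind is nothing exotic. a minimal sketch in Python, assuming you already have a mapping from each paper to the papers it cites (where that mapping comes from is left open; the paper ids and the `crawl_citations` name are hypothetical):

```python
from collections import deque


def crawl_citations(seeds, cites, max_hops=2):
    """Breadth-first walk over a citation graph in both directions.

    cites maps a paper id to the ids it references (the backward direction);
    the forward direction (who cites this paper) is derived from it.
    Returns a dict of paper id -> number of hops from the nearest seed.
    """
    # reverse index: paper -> set of papers that cite it
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    found = {seed: 0 for seed in seeds}
    queue = deque(found)
    while queue:
        paper = queue.popleft()
        hops = found[paper]
        if hops == max_hops:
            continue  # don't expand past the hop limit
        neighbours = set(cites.get(paper, ())) | cited_by.get(paper, set())
        for other in sorted(neighbours):
            if other not in found:
                found[other] = hops + 1
                queue.append(other)
    return found


# toy graph with made-up ids: P2 cites P1, P1 cites P0, and so on
toy_cites = {"P1": ["P0"], "P2": ["P1"], "P3": ["P2"], "P4": ["P9"]}
result = crawl_citations(["P1"], toy_cites, max_hops=2)
print(result)
```

starting from a single seed, two hops is enough to pull in both what the seed builds on and what later built on it, while unrelated papers (P4 and P9 in the toy graph) are never touched.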

security isn't the only thing that is incredibly broad - so too is the practice and discipline of science itself. there are many different fields and each one does things in their own particular way. we do not perform randomized controlled trials on the cosmos. as a general rule we do not intervene in volcano formation. the work being done at the large hadron collider does not follow exactly the same methodologies that are used in medical science. are we to judge cosmology, volcanology, or particle physics poorly because of this? no of course not. a question you might well ask is what kind of science should logically be used when it comes to studying computer security and, while i suspect multiple scientific disciplines could be useful, the one that springs immediately to mind is computer science. does computer science look anything like medical science? as someone with a degree in computer science i can tell you the answer is emphatically no. we do many things in computer science but randomized controlled trials are not among them (because computers are not people). while Hanno may style himself as "scientifically minded" he doesn't seem to demonstrate an appreciation for the breadth of valid scientific research methodologies and one is left to wonder if he's familiar with any kind of science outside of medicine.

when it comes right down to it, it's this apparent lack of familiarity with the subject matter he's talking about that i found most troubling about Hanno's talk. what is anti-virus software really? what is av testing methodology really? what does science really look like? where do you look for scientific research into malware and anti-malware? these all seem to be questions Hanno struggles with, which brings us back to the subject of why he likes to avoid the term "security researcher". if i had to venture a guess i'd say it's because he doesn't do research, even the basic research necessary to understand the subject matter. as such i would say avoiding the term "security researcher" is probably appropriate (for now).

i'm not sure what one can say in a talk about a subject one hasn't done one's homework on, but hopefully that can improve in the future. Hanno referenced Tavis Ormandy during his talk (as people who criticize AV like to do). Tavis' work on AV also suffered from a lack of understanding in the beginning, but he improved over time and, while he still has room for more improvement, now has arguably done some good work in finding vulnerabilities in AV and holding vendors accountable for the quality of their software. i'm certain Hanno can also improve. i know there are real criticisms to be made of AV software and the industry behind it, but they have to be informed, they have to come from a place of real knowledge and understanding. i look forward to Hanno reaching that place.