Friday, February 24, 2006

how to spot snake oil using the halting problem

i blogged previously about how the halting problem is good for separating the possible from the impossible in the malware field, but i suspect it's still a little over most people's heads... art kopp came up with the term "snake oil spotter guidelines" and it inspired me to try and distill what i said earlier into a (hopefully) simpler heuristic that anyone can use...

let's say you have a sequence of one or more bits or bytes... let's further say that it's of finite length (in practice we never deal with anything that goes on for ever) and that it is well defined (we know the exact sequence)... this might sound familiar, it might even sound like a virus signature, but i'm talking in a much more general sense here so we'll just call it a pattern...

  1. any system that claims to find all instances of something that is equal to or contains such a pattern (ie. finding all instances of the character 'a' or the string "dog") does not run into difficulties with the halting problem and is not snake oil (at least not as far as that particular claim is concerned)...

  2. any system that claims to find all instances of something that is not equal to or does not contain such a pattern (ie. finding all instances of a characters that aren't 'a' or all words that don't contain the substring "dog") does not run into difficulties with the halting problem and is not snake oil (as far as that claim is concerned)...

  3. any system that claims to find all instances of something using the equal/not equal or contains/not contains comparison to any one of a finite list of such patterns (ie. finding all instances of the characters 'a' and 'b' or all words that don't contain the substring "dog" or "shoe") does not run into difficulties with the halting problem and is therefore not snake oil as far as that claim is concerned...

  4. any system that claims to find all instances of something less trivial than what has been described so far, something that can't be represented as a well defined finite set of such patterns (viruses, for example, cannot be represented that way as there are an infinite number of possible viruses - so too with any function, since there are an infinite number of ways to implement any given function - also any class of bug or vulnerability has an infinite number of ways in which it can occur in practice), or by using a less trivial method of comparison (certain types of preprocessing, like decryption or decompression notwithstanding) generally does run into difficulties with the halting problem and therefore probably is snake oil...

  5. any system that claims to prevent or avoid all instances of something must use a technique that is equally capable of finding all instances of that something and so the previous rules apply...

  6. any system that only claims to find/prevent some instances of something doesn't run into difficulties with the halting problem and is therefore probably not snake oil as far as that claim is concerned...


wow, that's actually pretty big... let's condense it...

any system that claims to find all instances of something and that something is non-trivial and/or is to be found using a non-trivial comparison method will generally run into difficulties with the halting problem and therefore be snake oil...

as a corollary to the above, all real world threats and vulnerabilities wouldn't be problems if they were trivial... therefore any system that claims to find or prevent all instances of a particular kind of threat or vulnerability is snake oil unless the particular kind in question is specified narrowly enough that it applies only to a specific, well defined, finite set of items that can themselves be represented by one or more of a finite number of the above mentioned patterns... for example, all known viruses represents just such a set of items and can therefore be detected and/or prevented without running into problems with the halting problem...

0 comments: