avwhy: reversing anti-virus detection signatures

Release date: 20-06-2012


This post is related to one of my older ones, in which I showed that a lot of antivirus products seem to use a variant of simple file hashing as a detection signature. This still holds for many of them. I tested the sample I write about later in this post, a file detected by 20 AV products, and appending just one byte to the end of the file (which changes only the file's hash, not its functionality or semantics) made 8 of those products stop detecting it. Here are the VirusTotal analyses of the two files, respectively:

6ea6487c68dbfcca7ab8b0c2a406295b
1d32a9c05a4324ad107a6cf0a3703d26
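The reason this trick works is easy to demonstrate. The following minimal Python sketch (hypothetical helpers, not part of avwhy) appends a single byte to a file's contents and compares the digests before and after. MD5 is only an assumption for illustration, suggested by the 32-hex-digit identifiers above; any cryptographic hash behaves the same way, so a signature that is nothing more than a file hash stops matching.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def append_one_byte(data: bytes, pad: int = 0x00) -> bytes:
    """Return a copy of data with a single extra byte at the end.

    In a PE file, bytes after the last section (the overlay) are typically
    not loaded, so the program behaves the same, yet the digest changes.
    """
    return data + bytes([pad])


def compare_hashes(path: str) -> None:
    """Print the MD5 of a file before and after appending one byte."""
    with open(path, "rb") as f:
        original = f.read()
    print("original:", md5_hex(original))
    print("padded:  ", md5_hex(append_one_byte(original)))
```

Calling `compare_hashes("sample.exe")` (a hypothetical path) prints two unrelated digests, even though the file only grew by one byte.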

This time, though, I decided to go a bit further. But first, how did it start? I was dealing with some very irritating AV detections at a customer site, and the customer kept asking why they occurred and wanted me to confirm that they were false positives. The file was part of a wireless keyboard driver, and although the code was really badly written and used some old third-party libraries, I could not find anything malicious in it per se. Yes, it hooked the keyboard, but it neither saved keystrokes to a file nor sent them to a remote server. Yes, it contained code to contact remote servers, but only benign sites like iTunes. And for the particular anti-virus product I was targeting (Microsoft Security Essentials), simply changing the file's hash did not make the detection go away.

So, I decided to figure out which signatures were responsible for the detection, hoping they would point me to some place in the code that I could confirm as malicious. At first, I considered hardcore reverse engineering with a debugger attached to the scanner, but I would have had to reverse and understand at least part of the scanning engine before I could even start looking at why it flags the sample. As this seemed like loads of work, a simpler idea came to mind. Why not do a "behavioral" analysis: change a single byte of the sample at a time and, after each change, run the anti-virus scanner to check whether the sample is still detected as malicious?
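The mutation step itself can be sketched in a few lines of Python. This is a hypothetical illustration, not avwhy's actual code: `mutate_at` inverts one byte (XOR with 0xFF, an arbitrary choice that guarantees the value really changes), and `single_byte_variants` yields one mutated copy per offset.

```python
from typing import Iterator, Tuple


def mutate_at(data: bytes, offset: int) -> bytes:
    """Return a copy of data with the byte at offset inverted.

    XOR with 0xFF guarantees the new value differs from the old one,
    whatever the original byte was.
    """
    buf = bytearray(data)
    buf[offset] ^= 0xFF
    return bytes(buf)


def single_byte_variants(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, mutated_copy) for every byte position in data.

    Offsets whose mutation makes the scanner stop reporting the sample
    are the ones covered by a detection signature.
    """
    for offset in range(len(data)):
        yield offset, mutate_at(data, offset)
```

For a file of n bytes this produces n variants, which is one reason to scan them in batches rather than launching the scanner once per variant.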

This is what the script does: it splits the fuzzed files into batches and scans them. Currently it supports McAfee uvscan on Linux and Microsoft Security Essentials on Windows, but adding other scanners is trivial. For more info, please look at the script's code and its command-line help.
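One way to structure the batching and keep scanners pluggable is sketched below. It is a hypothetical design, not avwhy's real interface: a "scanner" is any callable that takes a directory of sample files and returns the set of file names it flagged, so supporting another product means writing one such function. Each variant is written as `<offset>.bin`, so a clean result maps straight back to a byte offset.

```python
import os
import tempfile
from typing import Callable, List, Set

# Hypothetical scanner interface: given a directory, return the names of the
# files inside it that the AV product reports as malicious.
Scanner = Callable[[str], Set[str]]


def is_detected(data: bytes, scanner: Scanner) -> bool:
    """Scan an unmodified copy of the sample on its own."""
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "original.bin"), "wb") as f:
            f.write(data)
        return "original.bin" in scanner(workdir)


def find_significant_offsets(data: bytes, scanner: Scanner,
                             batch_size: int = 256) -> List[int]:
    """Return the offsets whose single-byte mutation removes the detection."""
    if not is_detected(data, scanner):
        raise ValueError("the unmodified sample is not detected")
    significant: List[int] = []
    for start in range(0, len(data), batch_size):
        offsets = range(start, min(start + batch_size, len(data)))
        # One temporary directory per batch keeps disk usage bounded and
        # lets the scanner process many variants in a single run.
        with tempfile.TemporaryDirectory() as workdir:
            for offset in offsets:
                buf = bytearray(data)
                buf[offset] ^= 0xFF  # invert the byte so it is sure to change
                with open(os.path.join(workdir, f"{offset}.bin"), "wb") as f:
                    f.write(buf)
            flagged = scanner(workdir)
        # A variant that is no longer flagged had a signature byte changed.
        significant.extend(o for o in offsets if f"{o}.bin" not in flagged)
    return significant
```

A real scanner function would run the vendor's command-line scanner on `workdir` and parse its report; the flags and output format for that depend on the product and are not shown here. Variants are generated per batch rather than all at once, because holding one full copy of the file per byte offset in memory quickly becomes impractical.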

Here is an example of the script's output. For the file I investigated, I obtained the following results:

$ python avwhy.py ms suspicious.exe 2>/dev/null
len: 17
00B047   61 70 70 6C 65 2E 63 6F 6D 2F 69 74 75 6E 65 73    apple.com/itunes
00B057   2F                                                 /

len: 37
00B05C   43 61 6E 27 74 20 66 6F 75 6E 64 20 74 68 65 20    Can't found the
00B06C   69 54 75 6E 65 73 20 6F 6E 20 79 6F 75 72 20 73    iTunes on your s
00B07C   79 73 74 65 6D                                     ystem

len: 37
00B084   41 72 65 20 79 6F 75 20 77 61 6E 74 20 74 6F 20    Are you want to
00B094   64 6F 77 6E 6C 6F 61 64 20 61 20 69 54 75 6E 65    download a iTune
00B0A4   73 20 6E 6F 77                                     s now

len: 14
00B0C3   6D 75 73 69 63 6D 61 74 63 68 2E 63 6F 6D          musicmatch.com

len: 49
00B0DC   43 61 6E 27 74 20 66 6F 75 6E 64 20 74 68 65 20    Can't found the
00B0EC   4D 75 73 69 63 4D 61 74 63 68 20 4A 75 6B 65 62    MusicMatch Jukeb
00B0FC   6F 78 20 6F 6E 20 79 6F 75 72 20 73 79 73 74 65    ox on your syste
00B10C   6D                                                 m

len: 49
00B110   41 72 65 20 79 6F 75 20 77 61 6E 74 20 74 6F 20    Are you want to
00B120   64 6F 77 6E 6C 6F 61 64 20 61 20 4D 55 53 49 43    download a MUSIC
00B130   4D 41 54 43 48 20 4A 75 6B 65 62 6F 78 20 6E 6F    MATCH Jukebox no
00B140   77                                                 w

Funny, isn't it? :-) So, I could conclude: it detects certain strings in the binary, but these are not malicious code fragments. And this was all I wanted to know. If you also want to know why your antivirus is yelling about something, I hope you will find this tool useful.
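The report format shown above, a `len:` header for each contiguous run of significant bytes followed by a hex dump of that run, can be approximated with a small Python sketch. It assumes the list of significant offsets has already been collected by mutating bytes one at a time and rescanning; the function names are hypothetical and the layout only mimics avwhy's output.

```python
from typing import Iterable, List, Tuple


def group_runs(offsets: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse byte offsets into (start, length) runs of consecutive offsets."""
    runs: List[Tuple[int, int]] = []
    for off in sorted(set(offsets)):
        if runs and off == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run
        else:
            runs.append((off, 1))  # gap found: start a new run
    return runs


def hexdump(data: bytes, base: int, width: int = 16) -> List[str]:
    """Format data as offset / hex bytes / printable-ASCII lines."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + i:06X}   {hex_part.ljust(width * 3 - 1)}    {text}")
    return lines


def report(data: bytes, offsets: Iterable[int]) -> str:
    """Render every run of significant bytes as a len: header plus hex dump."""
    blocks = []
    for start, length in group_runs(offsets):
        body = "\n".join(hexdump(data[start:start + length], start))
        blocks.append(f"len: {length}\n{body}")
    return "\n\n".join(blocks)
```

Grouping offsets into runs is what turns thousands of individual "this byte matters" results into a handful of readable regions, which in this case turned out to be plain strings.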

UPDATE: The first release of the multi-engine scanner is out! It can be downloaded here (GitHub).