Reversing Java and other types of VM bytecode - do we have tools?

Release date: 23-06-2012


There are many tools for debugging native CPU machine code, especially 32-bit x86 (it is enough to mention the excellent IDA Pro with the HexRays Decompiler, and OllyDbg with its tons of plugins and clones). If you take a malware analysis training, the instructor will most likely teach you how to debug 32-bit x86 code only. There is less available for 64-bit x86 code: we can still use IDA Pro, but the HexRays decompiler does not support it yet and no 64-bit version of OllyDbg has been released so far (in this case I have used IDA Pro with the WinDbg stub for debugging, which is still doable). For ARM, IDA can help with both disassembling and debugging (and the HexRays decompiler is able to decompile ARM machine code!). There are also some other free or commercial tools for disassembling and debugging (metasm, BinNavi etc.), although nothing can replace IDA and Olly (or a clone of Olly such as Immunity Debugger).

How about debugging virtual machine bytecode? There is (and there will be more) malware written in bytecode! Well, I'm afraid we are really missing good reversing, debugging, disassembling and decompiling tools, especially for the Java VM. For .NET executables, for example, there is a tool called .NET Reflector Pro ($195) which will decompile and debug the program, single-stepping included. However, it does not help with obfuscated programs that cannot be decompiled, where the only choice is to debug the IL bytecode itself. In this case, WinDbg can debug on a per-instruction level.

Thanks to the popularity of Android-based devices, there is a free tool to debug Dalvik VM code: android-apktool. However, the last time I tried it, it required an older, archived version of NetBeans to run (Eclipse and newer NetBeans versions do not work because you can't put breakpoints on lines containing only comments). I ended up successfully debugging Dalvik code with NetBeans 6.8, and it performed reasonably well: I could put breakpoints anywhere within the Dalvik bytecode (represented in the NetBeans IDE as comments). This was really cool, but there should be a way to use a more recent NetBeans or Eclipse. Still, not bad...

Some time ago I got a piece of Java (JVM) malware to analyze: a .class file containing further embedded and obfuscated class files that were extracted at runtime. First, I tried to get Eclipse to debug the decompiled code. Unfortunately, most of the obfuscated code was decompiled badly (or failed to decompile at all). Second, there was no way to place a breakpoint on a given bytecode instruction unless the class file contained a line number attribute pointing to that instruction, and my malware did not contain line number attributes at all. After trying some other tools and throwing them away for one reason or another, I found Java Bytecode Visualizer, a cool program that can visualize and debug Java bytecode.
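Whether a class file carries any line numbers at all can be checked quickly with `javap -l`, which prints each method's line number table. The sketch below does a similar check programmatically, relying on one property of the class file format: every attribute name is a reference to a `CONSTANT_Utf8` constant pool entry, so a class whose constant pool lacks the string `LineNumberTable` cannot have that attribute on any method. The class name `LineNumberTableCheck` is hypothetical, and the check is one-sided: finding the string does not prove that every method has line numbers.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: can any method in this class carry a LineNumberTable? */
public class LineNumberTableCheck {

    /**
     * Walks the constant pool looking for the CONSTANT_Utf8 "LineNumberTable".
     * Returns false only if no attribute in the class can be a LineNumberTable.
     */
    public static boolean mayHaveLineNumbers(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file (bad magic)");
            }
            in.readUnsignedShort();                 // minor_version
            in.readUnsignedShort();                 // major_version
            int count = in.readUnsignedShort();     // constant_pool_count; valid indices are 1..count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:                         // CONSTANT_Utf8: u2 length + modified UTF-8
                        if (in.readUTF().equals("LineNumberTable")) {
                            return true;
                        }
                        break;
                    case 7: case 8: case 16: case 19: case 20:  // Class, String, MethodType, Module, Package
                        in.readFully(new byte[2]);
                        break;
                    case 15:                        // MethodHandle
                        in.readFully(new byte[3]);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readFully(new byte[4]);  // Integer, Float, member refs, NameAndType, (Invoke)Dynamic
                        break;
                    case 5: case 6:                 // Long and Double occupy two constant pool slots
                        in.readFully(new byte[8]);
                        i++;
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            return false;
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated or malformed class file", e);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            boolean may = mayHaveLineNumbers(Files.readAllBytes(Paths.get(path)));
            System.out.println(path + (may ? ": LineNumberTable name present in constant pool"
                                           : ": no method has a LineNumberTable"));
        }
    }
}
```

Run it as `java LineNumberTableCheck Sample.class`; a class stripped the way my malware sample was would fall into the second branch.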

The site provides a plugin for Eclipse, which has recently become open source (at the time I was checking it, the Pro version was still paid). It can do almost everything I wanted, including single-stepping when debugging. Unfortunately, it is missing one very important feature: you can't place breakpoints on instructions when the class file includes no line number attribute. [UPDATE: Actually, even when there are line numbers, Bytecode Visualizer places the breakpoint at the beginning of the method for some reason.] In my case, the .class file had no line number attribute table at all, which meant I could only place breakpoints on method entry and exit. That was a major disadvantage and I was really upset about it. Of the three (.NET, Dalvik VM and Java VM), I actually think the Java VM is the one really missing a good tool or technique to debug bytecode. Then I started to think about how to improve this situation, and an idea came to my mind: why not modify the class file before loading it into the debugger, so that it contains line number attributes pointing to every bytecode instruction in each method? This would make per-instruction breakpoints perfectly possible!
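Pointing a line number at every bytecode instruction requires knowing where each instruction starts, and JVM instructions have variable lengths: most opcodes have a fixed number of operand bytes, while `tableswitch` and `lookupswitch` are padded to a 4-byte boundary and `wide` changes the size of the instruction it prefixes. The following Java sketch (class and method names are hypothetical) walks a method's raw `Code` array and returns the start offset of each instruction; a rewriting tool could then map instruction k to line k + 1. It assumes the bytes come from a well-formed `Code` attribute and does no verification beyond rejecting unknown opcodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds the start offset of every instruction in a JVM Code array. */
public class InstructionOffsets {

    // Operand bytes following each opcode 0x00..0xC9; -1 marks variable-length instructions.
    private static final int[] OPERAND_BYTES = new int[0xCA];

    static {
        set(1, 0x10, 0x12, 0x15, 0x16, 0x17, 0x18, 0x19,   // bipush, ldc, <x>load
               0x36, 0x37, 0x38, 0x39, 0x3A, 0xA9, 0xBC);   // <x>store, ret, newarray
        set(2, 0x11, 0x13, 0x14, 0x84, 0xA7, 0xA8,          // sipush, ldc_w, ldc2_w, iinc, goto, jsr
               0xB2, 0xB3, 0xB4, 0xB5, 0xB6, 0xB7, 0xB8,    // field access, invokevirtual/special/static
               0xBB, 0xBD, 0xC0, 0xC1, 0xC6, 0xC7);         // new, anewarray, checkcast, instanceof, ifnull/ifnonnull
        for (int op = 0x99; op <= 0xA6; op++) {
            OPERAND_BYTES[op] = 2;                           // if<cond>, if_icmp<cond>, if_acmp<cond>
        }
        set(3, 0xC5);                                        // multianewarray
        set(4, 0xB9, 0xBA, 0xC8, 0xC9);                      // invokeinterface, invokedynamic, goto_w, jsr_w
        set(-1, 0xAA, 0xAB, 0xC4);                           // tableswitch, lookupswitch, wide
    }

    private static void set(int operandBytes, int... opcodes) {
        for (int op : opcodes) {
            OPERAND_BYTES[op] = operandBytes;
        }
    }

    private static int readInt(byte[] code, int i) {
        return ((code[i] & 0xFF) << 24) | ((code[i + 1] & 0xFF) << 16)
             | ((code[i + 2] & 0xFF) << 8) | (code[i + 3] & 0xFF);
    }

    /** Total length in bytes of the instruction that starts at pc. */
    static int instructionLength(byte[] code, int pc) {
        int op = code[pc] & 0xFF;
        if (op >= OPERAND_BYTES.length) {
            throw new IllegalArgumentException("invalid opcode 0x" + Integer.toHexString(op) + " at " + pc);
        }
        switch (op) {
            case 0xAA: {  // tableswitch: padding to a 4-byte boundary, default, low, high, jump table
                int base = (pc + 4) & ~3;
                int low = readInt(code, base + 4);
                int high = readInt(code, base + 8);
                return base - pc + 12 + 4 * (high - low + 1);
            }
            case 0xAB: {  // lookupswitch: padding, default, npairs, then (match, offset) pairs
                int base = (pc + 4) & ~3;
                int npairs = readInt(code, base + 4);
                return base - pc + 8 + 8 * npairs;
            }
            case 0xC4:    // wide: "wide iinc" has a 2-byte index and 2-byte constant, others a 2-byte index
                return (code[pc + 1] & 0xFF) == 0x84 ? 6 : 4;
            default:
                return 1 + OPERAND_BYTES[op];
        }
    }

    /** Offsets of all instructions, in order; instruction k would get "line" k + 1. */
    public static List<Integer> instructionOffsets(byte[] code) {
        List<Integer> offsets = new ArrayList<>();
        for (int pc = 0; pc < code.length; pc += instructionLength(code, pc)) {
            offsets.add(pc);
        }
        return offsets;
    }
}
```

Note that the switch padding is computed relative to the start of the `Code` array, not the class file, so the method must be given exactly the bytes of one method's code.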

Well, this is what I'm planning to do if time permits. I would like to create a tool that modifies a class file (there are free frameworks for class file modification) to add a line number table to every method, filled with entries pointing to each bytecode instruction. I have mentioned this to the author of Bytecode Visualizer and he liked the idea. So, unless someone knows that a tool like this already exists (please let me know - deresz signal11 . eu), I will try to mobilize myself one day and create this little tool. I will keep you posted.
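For reference, the `LineNumberTable` attribute such a tool would add has a simple layout in the class file format: a 2-byte name index, a 4-byte length, a 2-byte entry count, and then one (`start_pc`, `line_number`) pair of 2-byte values per entry. The sketch below serializes one such attribute from a list of instruction offsets, giving instruction i the line number i + 1. The class name `LineNumberTableWriter` and the assumption that the constant pool already contains a `CONSTANT_Utf8` entry `LineNumberTable` at `nameIndex` are illustrative assumptions; splicing the result into a method's `Code` attribute (and updating that attribute's length and attribute count) is left to the surrounding tool or a class-file framework.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: builds a LineNumberTable with one "line" per instruction. */
public class LineNumberTableWriter {

    /**
     * @param nameIndex constant pool index of a CONSTANT_Utf8 "LineNumberTable"
     *                  (assumed to exist already)
     * @param startPcs  bytecode offset of each instruction in the method, in order
     * @return the complete attribute, including its 6-byte header
     */
    public static byte[] build(int nameIndex, int[] startPcs) {
        if (startPcs.length > 0xFFFF) {
            throw new IllegalArgumentException("too many entries for a u2 count");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(nameIndex);              // u2 attribute_name_index
            out.writeInt(2 + 4 * startPcs.length);  // u4 attribute_length (excludes the 6 header bytes)
            out.writeShort(startPcs.length);        // u2 line_number_table_length
            for (int i = 0; i < startPcs.length; i++) {
                out.writeShort(startPcs[i]);        // u2 start_pc: where the instruction begins
                out.writeShort(i + 1);              // u2 line_number: instruction i becomes line i + 1
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // not expected with an in-memory stream
        }
        return bytes.toByteArray();
    }
}
```

Numbering instructions sequentially keeps the line numbers small and dense; an alternative would be to derive each line number from the instruction's offset, so that debugger lines map directly to bytecode offsets.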

2014 UPDATE: Of course I didn't have time to do it. But good news: the idea has finally been implemented in dirtyJOE by ReWolf. An excellent write-up about solving a Java crackme using similar techniques is available here.