
Code Virtualization and Tamper-proofing

Photo by Jay / Unsplash

Code Virtualization

Code virtualization is a powerful obfuscation technique. The main idea is to transform a function or application into an interpreter (a virtual machine) and corresponding bytecode that is interpreted at runtime. The goal is to force the reverse engineer to understand how the interpreter works before they can understand how the function or application is implemented.

Code virtualization allows for a great deal of diversity in the protected application: in the bytecode representation, the bytecode capabilities, the interpreter architecture, the interpreter's internal implementation, etc.

The Tigress obfuscator can virtualize programs written in C. Concretely, Tigress can generate an interpreter with the following components:

  • a virtual instruction set;
  • a bytecode array;
  • a virtual program counter;
  • a virtual stack pointer;
  • a dispatch unit;
  • and instruction handlers for each virtual instruction.
Code Virtualization. Picture source: Tigress website
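
To make the components listed above concrete, here is a minimal sketch of a stack-based interpreter in C. It is not what Tigress generates: the opcode values, the function f(x) = (x + 2) * 3, and the names run_vm and f_virtualized are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Virtual instruction set. The opcode values are arbitrary (hypothetical). */
enum { OP_ARG = 0x3a, OP_PUSH = 0x91, OP_ADD = 0x17, OP_MUL = 0x5e, OP_RET = 0xc4 };

#define STACK_SIZE 16

/* Bytecode array encoding the original function f(x) = (x + 2) * 3. */
const uint8_t bytecode[] = {
    OP_ARG,       /* push x                    */
    OP_PUSH, 2,   /* push the immediate 2      */
    OP_ADD,       /* x + 2                     */
    OP_PUSH, 3,   /* push the immediate 3      */
    OP_MUL,       /* (x + 2) * 3               */
    OP_RET        /* return the top of stack   */
};

int32_t run_vm(const uint8_t *code, int32_t arg)
{
    int32_t stack[STACK_SIZE];
    size_t vpc = 0;   /* virtual program counter                  */
    size_t vsp = 0;   /* virtual stack pointer (next free slot)   */

    for (;;) {
        switch (code[vpc++]) {          /* dispatch unit */
        case OP_ARG:                    /* instruction handlers follow */
            assert(vsp < STACK_SIZE);
            stack[vsp++] = arg;
            break;
        case OP_PUSH:
            assert(vsp < STACK_SIZE);
            stack[vsp++] = code[vpc++];
            break;
        case OP_ADD:
            assert(vsp >= 2);
            vsp--;
            stack[vsp - 1] += stack[vsp];
            break;
        case OP_MUL:
            assert(vsp >= 2);
            vsp--;
            stack[vsp - 1] *= stack[vsp];
            break;
        case OP_RET:
            assert(vsp >= 1);
            return stack[vsp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* The protected function no longer contains its own logic, only a call
   into the interpreter. */
int32_t f_virtualized(int32_t x)
{
    return run_vm(bytecode, x);
}
```

Calling f_virtualized(4) walks the bytecode and computes (4 + 2) * 3. Someone reading the compiled binary sees only the dispatch loop and a byte array, so they have to recover the meaning of each opcode before the arithmetic becomes visible.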

Code virtualization allows for static diversity:

  • opcodes can be randomized;
  • multiple instructions can have the same semantics;
  • instructions can pass arguments in any combination of stack, diversified locations and registers;
  • instructions can have an arbitrary level of semantic complexity;
  • etc.

It also allows for dynamic diversity: randomized bogus functions can be merged with real ones; instruction traces can be made artificially long, etc. Duplicate operators give the same semantic meaning to multiple virtual instructions. Superoperators merge multiple virtual instructions into a new virtual instruction. Finally, you can apply code transformations to the bytecode and to the interpreter, too.
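
Duplicate operators and superoperators can be illustrated with a small sketch. The interpreter below is a hypothetical toy, not Tigress output: the opcode names and values are made up, and the three bytecode variants all encode the same assumed function f(x) = x + 5 + 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, for illustration only. */
enum {
    OP_PUSH    = 0x10,  /* push a one-byte immediate                 */
    OP_ADD_A   = 0x22,  /* add the two top stack values              */
    OP_ADD_B   = 0x7d,  /* duplicate operator: same meaning as ADD_A */
    OP_PUSHADD = 0x41,  /* superoperator: PUSH imm fused with ADD    */
    OP_RET     = 0xee   /* return the top of stack                   */
};

#define STACK_SIZE 8

int32_t run(const uint8_t *code, int32_t arg)
{
    int32_t stack[STACK_SIZE];
    size_t vpc = 0, vsp = 0;

    stack[vsp++] = arg;                 /* the argument starts on the stack */
    for (;;) {
        switch (code[vpc++]) {
        case OP_PUSH:
            assert(vsp < STACK_SIZE);
            stack[vsp++] = code[vpc++];
            break;
        case OP_ADD_A:                  /* a + b */
            assert(vsp >= 2);
            vsp--;
            stack[vsp - 1] += stack[vsp];
            break;
        case OP_ADD_B:                  /* a - (-b): same result, different handler */
            assert(vsp >= 2);
            vsp--;
            stack[vsp - 1] -= -stack[vsp];
            break;
        case OP_PUSHADD:                /* adds the immediate without touching vsp */
            assert(vsp >= 1);
            stack[vsp - 1] += code[vpc++];
            break;
        case OP_RET:
            assert(vsp >= 1);
            return stack[vsp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Three different encodings of f(x) = x + 5 + 7. */
const uint8_t variant_plain[] = { OP_PUSH, 5, OP_ADD_A, OP_PUSH, 7, OP_ADD_A, OP_RET };
const uint8_t variant_dup[]   = { OP_PUSH, 5, OP_ADD_B, OP_PUSH, 7, OP_ADD_A, OP_RET };
const uint8_t variant_super[] = { OP_PUSHADD, 5, OP_PUSHADD, 7, OP_RET };
```

All three variants return the same value for the same input, yet their bytecode and instruction traces differ. Randomizing the enum values per build would add opcode diversity on top.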

However, it is important to know that a program protected by code virtualization is not unbreakable! Jonathan Salwan has solved several code virtualization challenges published by the Tigress team. All scripts and methods have been published on GitHub.

The deobfuscation stages were the following:

  • parsing the protected binary;
  • computing a symbolic representation of the binary;
  • converting the symbolic representation to LLVM-IR;
  • applying LLVM optimizations;
  • and rebuilding a simplified version.
Deobfuscation stages of Tigress virtualized code challenges. Picture source: GitHub

Code Tamper-Proofing

Software that is tamper-resistant is software that is difficult to modify. This could be because interesting places are hard to find, because of code obfuscation, or thanks to special tamper-proofing techniques. This is not to be confused with copy protection or anti-cloning mechanisms and trusted hardware execution (TPM, SGX, secure enclaves, custom HW, etc.). There is currently no known provably secure software-only tamper-proofing technique.

Dedicated tamper-proofing techniques often involve two distinct stages: detecting a modification and responding to it. Several response strategies can be implemented:

  • the software might have its functionality lowered;
  • it might start repeating the same actions over and over (infinite loop);
  • it might close;
  • it might crash;
  • it might notify a remote backend;
  • etc.
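
The two stages can be sketched as follows. This is a minimal illustration under stated assumptions: a plain byte buffer stands in for the protected code, the checksum is FNV-1a (chosen only because it is short, not because any product uses it), and the names is_modified, respond, and response_t are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RESPOND_NONE, RESPOND_DEGRADE, RESPOND_EXIT } response_t;

/* 32-bit FNV-1a over a byte range. */
uint32_t checksum(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Detection stage: compare the region's checksum with the expected value. */
int is_modified(const uint8_t *region, size_t len, uint32_t expected)
{
    assert(region != NULL);
    return checksum(region, len) != expected;
}

/* Response stage: the policy is a design choice. Here, critical regions
   cause an exit and other regions only degrade functionality. */
response_t respond(int modified, int is_critical)
{
    if (!modified)
        return RESPOND_NONE;
    return is_critical ? RESPOND_EXIT : RESPOND_DEGRADE;
}
```

Keeping detection and response in separate functions mirrors the two stages, but it also gives an attacker one obvious comparison to patch out. Real schemes tend to spread both stages out and delay the response.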

General ideas of software tamper-proofing techniques are:

  • Packing: some functions, or the whole application, are stored encrypted and authenticated. Unpacking is performed just before execution.
  • Software guards: this idea was published by Chang and Atallah in 2001 in their paper Protecting Software Code by Guards. Guards are tiny pieces of code that are inserted into the binary and check that the code has not been modified. Guards can protect other guards. This technique was commercialized by Arxan Technologies, now part of Digital.ai.
  • Self-hashing code coupled with control-flow flattening: this idea was published by Wehrli et al. in 2015 in their paper Obfuscator-LLVM – Software Protection for the Masses. The software continuously hashes itself and the hash values are used to compute the control flow; strong.codes SA commercialized the technology and then sold it to Snap Inc.
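
The last idea, using hash values to compute the control flow, can be sketched in a few lines. This is a toy under stated assumptions and not the scheme from the paper: a byte buffer stands in for the program's own code, FNV-1a is an arbitrary choice of hash, the flattened function computes an assumed f(x) = (x + 1) * 2, and mask_transitions plays the role of a build-time step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a byte range. */
uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* States of the flattened function f(x) = (x + 1) * 2. */
enum { S_INC = 0, S_DBL = 1, S_END = 2, N_TRANSITIONS = 3 };

/* Build-time step (hypothetical): store each successor state XORed with
   the hash of the untampered region. */
void mask_transitions(uint32_t masked[N_TRANSITIONS], uint32_t reference_hash)
{
    masked[0] = (uint32_t)S_INC ^ reference_hash;   /* entry -> INC */
    masked[1] = (uint32_t)S_DBL ^ reference_hash;   /* INC   -> DBL */
    masked[2] = (uint32_t)S_END ^ reference_hash;   /* DBL   -> END */
}

/* The next state is only correct if the region still hashes to the
   reference value. There is no explicit "if (hash != expected)" test. */
int32_t flattened(int32_t x, const uint32_t masked[N_TRANSITIONS],
                  const uint8_t *region, size_t len)
{
    assert(region != NULL);
    uint32_t state = masked[0] ^ fnv1a(region, len);

    for (int steps = 0; steps < 16; steps++) {   /* bounded for the demo */
        switch (state) {
        case S_INC:
            x += 1;
            state = masked[1] ^ fnv1a(region, len);
            break;
        case S_DBL:
            x *= 2;
            state = masked[2] ^ fnv1a(region, len);
            break;
        case S_END:
            return x;
        default:
            return -1;   /* unknown state: the demo bails out here */
        }
    }
    return -1;
}
```

With an intact region the dispatcher visits INC, DBL, END in order. After a single modified byte the decoded states are wrong, so the function takes a different path and returns a different value, without any single comparison for an attacker to patch.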

Code tamper-proofing techniques often run into a chicken-and-egg problem: the expected values of the checks depend on the final binary, but those values must themselves be embedded in that binary. Therefore, a post-processing phase is necessary to adjust the binary after linking.
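
One common way out, sketched below under stated assumptions, is to reserve a placeholder slot in the image, exclude that slot from the hash, and let a post-link tool fill it in. The byte buffer standing in for the linked binary, the 4-byte slot, the FNV-1a hash, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 4   /* bytes reserved in the image for the expected hash */

/* FNV-1a over the image, skipping the slot that will hold the hash itself. */
uint32_t hash_excluding_slot(const uint8_t *img, size_t len, size_t slot_off)
{
    assert(slot_off + SLOT_SIZE <= len);
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        if (i >= slot_off && i < slot_off + SLOT_SIZE)
            continue;
        h ^= img[i];
        h *= 16777619u;
    }
    return h;
}

/* Post-link step: compute the hash of the final image and write it into
   the placeholder slot. */
void patch_slot(uint8_t *img, size_t len, size_t slot_off)
{
    uint32_t h = hash_excluding_slot(img, len, slot_off);
    memcpy(img + slot_off, &h, SLOT_SIZE);
}

/* Runtime check: recompute the hash and compare it with the stored value. */
int image_is_intact(const uint8_t *img, size_t len, size_t slot_off)
{
    uint32_t stored;
    assert(slot_off + SLOT_SIZE <= len);
    memcpy(&stored, img + slot_off, SLOT_SIZE);
    return stored == hash_excluding_slot(img, len, slot_off);
}
```

Because the slot is excluded from the hashed range, writing the hash does not change the hash. When guards protect other guards, the post-processing tool has to patch the slots in dependency order.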

Hardware-Based Code Tamper-Proofing

Trusted computing and trusted execution environments make sure that only approved (i.e., unmodified) code is run. These technologies let you run critical code and safely store sensitive data in a tamper-resistant environment.

In the next episode, I’ll cover anti-debug tricks and jailbreak/root detection. Stay tuned!

