As an interpreted language with highly abstract bytecode, Java is easy to decompile, and naturally there are also measures to prevent decompilation. I read a related article today and benefited a lot: know your enemy and know yourself! I am interested in Java decompilation because, while learning, I often need to learn from other people's work (you know...). Admittedly, decompiling other people's code may not be very ethical...
Without further ado, the text is as follows:
Common protection techniques
Java bytecode is easy to decompile because of its high level of abstraction. This section describes several common methods for protecting Java bytecode from decompilation. In general, these methods cannot absolutely prevent a program from being decompiled; they only make decompilation more difficult, because each method has its own applicable environment and weaknesses.
Isolating Java programs
The simplest method is to make the Java Class files inaccessible to users. This is the most fundamental method, and it can be implemented in a variety of ways.
For example, developers can put the critical Java classes on the server side. The client then obtains the service by calling the server's interfaces rather than by accessing the Class files directly, so hackers have no Class files to decompile. More and more standards and protocols exist for providing services through interfaces, such as HTTP, Web services, and RPC. However, many applications are not suited to this kind of protection. For example, programs that run on a single stand-alone machine cannot be isolated this way. This type of protection is shown in Figure 1.
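Here is a minimal sketch of the isolation idea, assuming a plain HTTP interface. The serial-number check exists only on the server; `checkSerial` is a hypothetical rule standing in for real private logic. The client sends a request and reads back the answer. The sketch uses the JDK's built-in `com.sun.net.httpserver` server and the `java.net.http` client (Java 11+). Both halves sit in one class only so the sketch runs on its own. In a real deployment only `askServer` would ship to users.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class IsolationSketch {
    // SERVER SIDE ONLY: hypothetical validation rule standing in for private logic
    // that never leaves the server, so there is no Class file for it on the client.
    static String checkSerial(String serial) {
        return serial.hashCode() % 7 == 0 ? "valid" : "invalid";
    }

    // Starts a local HTTP server on an ephemeral port exposing GET /check?serial=...
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/check", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "serial=ABC"
            String serial = query == null ? "" : query.replaceFirst("^serial=", "");
            byte[] body = checkSerial(serial).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // CLIENT SIDE: knows only the endpoint, not the validation algorithm.
    public static String askServer(int port, String serial) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/check?serial=" + serial)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```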
Encrypting Class files
To prevent Class files from being decompiled directly, many developers encrypt some critical Class files, such as the classes related to registration codes and serial number management. Before using these encrypted classes, the program must first decrypt them and then load them into the JVM. Decryption can be done in hardware or in software.
In practice, developers usually load the encrypted classes through a custom ClassLoader. Note that, for security reasons, applets cannot use custom ClassLoaders. The custom ClassLoader first finds the encrypted class, decrypts it, and finally loads the decrypted class into the JVM. In this scheme the custom ClassLoader is a critical class. Because it is not itself encrypted, it can be the hacker's first target. Once the decryption key and algorithm are recovered, the encrypted classes can easily be decrypted as well. A schematic of this protection method is shown in Figure 2.
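The loading step can be sketched as a ClassLoader subclass that overrides `findClass`. This is an illustration under assumptions, not a standard scheme:

- The encrypted classes arrive as a hypothetical map from class name to IV plus ciphertext.
- They are encrypted with AES-GCM.
- The key is simply handed to the constructor.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;

public class DecryptingClassLoader extends ClassLoader {
    private static final int IV_LEN = 12;    // GCM nonce length in bytes
    private static final int TAG_BITS = 128; // GCM authentication tag length in bits

    private final Map<String, byte[]> encryptedClasses; // hypothetical store: name -> IV || ciphertext
    private final SecretKey key;                        // how the key is obtained is out of scope here

    public DecryptingClassLoader(ClassLoader parent, Map<String, byte[]> encryptedClasses, SecretKey key) {
        super(parent);
        this.encryptedClasses = encryptedClasses;
        this.key = key;
    }

    // Invoked by loadClass only after the parent loader fails to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] blob = encryptedClasses.get(name);
        if (blob == null) {
            throw new ClassNotFoundException(name);
        }
        try {
            byte[] plain = decrypt(blob, key);                  // 1. decrypt the class bytes
            return defineClass(name, plain, 0, plain.length);   // 2. hand them to the JVM
        } catch (GeneralSecurityException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    // Build-time helper: encrypts class bytes into IV || ciphertext.
    public static byte[] encrypt(byte[] plain, SecretKey key) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = Arrays.copyOf(iv, IV_LEN + ct.length);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    public static byte[] decrypt(byte[] blob, SecretKey key) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_LEN));
        return c.doFinal(blob, IV_LEN, blob.length - IV_LEN);
    }
}
```

Because `findClass` is only consulted after the parent loader fails, the encrypted classes must not also be present in plain form on the classpath. The key and this loader remain in clear bytecode, which is exactly the weakness described above.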
Converting to native code
Converting a program to native code is also an effective way to prevent decompilation, because native code is usually difficult to decompile. Developers can convert the entire application to native code, or only the key modules. If only the critical modules are converted, the Java program calls them through JNI.
Of course, using this technique to protect Java programs sacrifices Java's cross-platform nature. A separate version of the native code must be maintained for each platform, which increases the work of software support and maintenance. Still, for some critical modules this solution is sometimes necessary.
To ensure that the native code cannot be modified or replaced, it is usually digitally signed. Before the native code is used, its signature is verified to ensure that hackers have not altered it. The relevant JNI methods are called only if the check passes. A schematic of this type of protection is shown in Figure 3.
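Verifying a signed native library before loading it could look like the sketch below. It rests on these assumptions:

- The library was signed with an RSA key using SHA256withRSA.
- The publisher's public key is available to the Java side.
- The signature ships as a separate byte array.

`loadVerified` is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;

public class VerifiedNativeLoader {
    // Returns true only if 'sig' is a valid SHA256withRSA signature over 'libBytes'.
    public static boolean isAuthentic(byte[] libBytes, byte[] sig, PublicKey publisherKey)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publisherKey);
        verifier.update(libBytes);
        return verifier.verify(sig);
    }

    // Hypothetical helper: check the library file first, then let the JVM load it.
    public static void loadVerified(Path lib, byte[] sig, PublicKey publisherKey)
            throws IOException, GeneralSecurityException {
        byte[] bytes = Files.readAllBytes(lib);
        if (!isAuthentic(bytes, sig, publisherKey)) {
            throw new SecurityException("native library signature check failed: " + lib);
        }
        // After this call the class's native (JNI) methods can be invoked.
        System.load(lib.toAbsolutePath().toString());
    }
}
```

This sketch has two caveats. First, the file is read twice (once for the check, once by `System.load`), so an attacker who can swap it in between defeats the check. Second, the verification code is itself plain Java and would need obfuscation too.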
Code obfuscation
Code obfuscation reorganizes and processes Class files so that the processed code performs the same function (has the same semantics) as the original. However, obfuscated code is difficult to decompile meaningfully. The decompiled code is obscure and very hard to understand, so it is difficult to recover the program's true semantics from it. In theory, a hacker with enough time can still crack obfuscated code, and some people are even working on de-obfuscation tools. In practice, however, obfuscated Java code still resists decompilation well. Obfuscation techniques have become diverse and obfuscation theory has matured. Because obfuscation is an important technique for protecting Java programs, it is explained in detail below. Figure 4 is a diagram of code obfuscation.
Figure 4 Code obfuscation diagram
Summary of the techniques
The techniques above each have different application environments and their own weaknesses. Table 1 compares their characteristics.
Table 1 Comparison of different protection technologies
Introduction to obfuscation techniques
So far, obfuscation is the most basic method for protecting Java programs. There are many Java obfuscation tools, including commercial, free, and open-source ones, and Sun also offers its own. Most of them obfuscate Class files. A few first process the source code and then the Class files, which adds another stage of obfuscation. Commercially successful obfuscation tools include JProof's 1stBarrier series, Eastridge's JShrink, and 4thpass.com's SourceGuard. By obfuscation target, the main obfuscation techniques fall into four classes: Lexical Obfuscation, Data Obfuscation, Control Obfuscation, and Preventive Transformation.
Symbol (lexical) obfuscation
A Class file contains a lot of information unrelated to program execution itself, such as method and variable names, and these names often carry meaning. For example, a method named getKeyLength() very likely returns the length of a key. Symbol obfuscation scrambles this information into meaningless identifiers. For example, it might number all variables starting from variable_001 and all methods starting from method_001. This makes decompiled code harder to understand. Private functions and local variables can usually be renamed without affecting the program. However, external modules may reference interface names, public functions, and member variables. These names must often be kept, otherwise those modules cannot find the methods and variables by name. Most obfuscation tools therefore provide a rich set of options that let users choose whether and how symbols are obfuscated.
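To illustrate, here is one class before and after symbol obfuscation, written as source for readability. Real obfuscators rename symbols in the Class files rather than in source. The names `a`, `variable_001`, and `method_001` simply follow the numbering scheme described above.

```java
public class SymbolObfuscationDemo {
    // Before: the names tell a reader of decompiled code exactly what this does.
    public static class LicenseKey {
        private final String keyText;
        public LicenseKey(String keyText) { this.keyText = keyText; }
        public int getKeyLength() { return keyText.length(); }
    }

    // After: identical behavior, but every symbol is a meaningless numbered name.
    public static class a {
        private final String variable_001;
        public a(String variable_002) { this.variable_001 = variable_002; }
        public int method_001() { return variable_001.length(); }
    }
}
```

If other modules called `getKeyLength()` by name, that method would be one of the symbols an obfuscator is configured to keep.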
Data obfuscation
Data obfuscation obfuscates the data a program uses. There are many methods, which fall mainly into two groups: changing data storage and encoding (Store and Encode Transform) and changing data access (Access Transform).
Changing data storage and encoding disrupts the way the program stores its data. For example, an array with 10 members can be split into 10 variables with scrambled names, or a two-dimensional array can be converted into a one-dimensional array. Complex data structures can be scrambled as well, for example by replacing one complex class with several classes.
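Here is a small sketch of two storage transforms. It assumes a 3x4 table and a three-element array of limits; all names are hypothetical.

- The two-dimensional array is flattened so that element (r, c) lives at `r * cols + c`.
- The array of limits is split into separate fields with meaningless names.

```java
public class StorageTransformDemo {
    // Before: int[][] grid. After: one flat array, element (r, c) stored at r * cols + c.
    public static int[] flatten(int[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        int[] flat = new int[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                flat[r * cols + c] = grid[r][c];
            }
        }
        return flat;
    }

    // Every former grid[r][c] access is rewritten to this index computation.
    public static int get(int[] flat, int cols, int r, int c) {
        return flat[r * cols + c];
    }

    // Before: int[] limits = {10, 20, 30}. After: three unrelated-looking fields (hypothetical names).
    static int f_q = 10, f_x = 20, f_b = 30;

    // Every former limits[i] access becomes a dispatch over the split fields.
    public static int limit(int i) {
        switch (i) {
            case 0: return f_q;
            case 1: return f_x;
            case 2: return f_b;
            default: throw new ArrayIndexOutOfBoundsException(i);
        }
    }
}
```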
The other approach is to change data access. For example, when accessing an array element, the subscript can be computed through an extra calculation; Figure 5 is an example.
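Figure 5 is not reproduced here. The following is one possible access transform of this kind, not necessarily the one in the figure. Logical element i of a 10-element array is stored at physical slot (3i + 7) mod 10. Because 3 and 10 are coprime, this mapping is a permutation of 0..9, so every element still has exactly one slot.

```java
public class AccessTransformDemo {
    private static final int N = 10; // array length
    private static final int K = 3;  // multiplier; must be coprime with N for a one-to-one mapping
    private static final int C = 7;  // offset

    // Physical slot that holds logical element i (i in 0..N-1).
    static int slot(int i) {
        return (i * K + C) % N;
    }

    // Store the array in scrambled order.
    public static int[] scramble(int[] plain) {
        int[] stored = new int[N];
        for (int i = 0; i < N; i++) {
            stored[slot(i)] = plain[i];
        }
        return stored;
    }

    // Obfuscated access: what used to be a[i] becomes stored[(i * 3 + 7) % 10].
    public static int read(int[] stored, int i) {
        return stored[slot(i)];
    }
}
```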
In practical obfuscation, the two approaches are often used in combination, disrupting data storage as well as the way data is accessed. After obfuscating the data, the semantics of the program become complex, which increases the difficulty of decompilation.
Control obfuscation
Control obfuscation scrambles the program's control flow, making it harder to decompile. Changing the control flow usually requires adding extra computation and control flow, which has some negative impact on performance. There is therefore sometimes a trade-off between a program's performance and the degree of obfuscation. Control obfuscation techniques are the most complex and skillful of all. They can be divided into the following categories:
Adding obfuscating control flow
Extra, complex control flow is added to hide the program's original semantics. For example, for two statements A and B that are executed in sequence, a control condition can be added that decides whether B executes. This makes disassembly more difficult. However, none of the added control may change the actual execution of B. Figure 6 shows three ways to add obfuscating control to this example.
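One common way to add such a condition is an opaque predicate: a test whose outcome the obfuscator knows in advance but a decompiler cannot easily see. The sketch below uses hypothetical stand-in methods `a` and `b` for statements A and B. The source does not say whether this matches any of the three variants in Figure 6.

```java
public class OpaquePredicateDemo {
    static int a(int x) { return x + 1; } // stand-in for statement A
    static int b(int y) { return y * 2; } // stand-in for statement B

    // Original: A then B.
    public static int original(int x) {
        int y = a(x);
        return b(y);
    }

    // Obfuscated: B is guarded by an opaque predicate. x * x + x = x * (x + 1) is always
    // even, and int arithmetic wraps modulo 2^32, which preserves parity, so the test is
    // always true even on overflow. A decompiler just sees a data-dependent branch.
    public static int obfuscated(int x) {
        int y = a(x);
        if (((x * x + x) & 1) == 0) {
            return b(y);
        } else {
            // Never taken; it still computes 2 * (y - 1) + 2 == b(y), so it looks plausible.
            return b(y - 1) + 2;
        }
    }
}
```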
Figure 6 Three ways to add obfuscating control
Restructuring control flow
Restructuring the control flow is another important obfuscation method. For example, where a program calls a method, the obfuscator can embed the method's code in the caller. Conversely, a piece of code can be turned into a function call. In addition, a single loop can be split into several loops, or converted into recursion. This category is the most complex and has attracted a large number of researchers.
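The loop transformations mentioned above can be illustrated on a hypothetical summation. The same computation is written as one loop, as two split loops, and as recursion. Inlining and outlining work analogously: code moves into or out of a method without changing what it computes.

```java
public class ControlFlowRestructuringDemo {
    // Original: a single loop.
    public static long sumLoop(int[] a) {
        long s = 0;
        for (int v : a) s += v;
        return s;
    }

    // Loop splitting: the same iteration space handled by two consecutive loops.
    public static long sumSplit(int[] a) {
        long s = 0;
        int mid = a.length / 2;
        for (int i = 0; i < mid; i++) s += a[i];
        for (int i = mid; i < a.length; i++) s += a[i];
        return s;
    }

    // Loop converted to recursion (call with i = 0). Fine for short arrays;
    // very long inputs would overflow the call stack.
    public static long sumRec(int[] a, int i) {
        return i == a.length ? 0 : a[i] + sumRec(a, i + 1);
    }
}
```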
Preventive transformation
Preventive transformations are usually designed against specific decompilers. They generally exploit a decompiler's weaknesses or bugs. For example, some decompilers do not decompile instructions that follow a return, so some obfuscation schemes place code after the return statement. The effectiveness of this kind of obfuscation varies between decompilers. A good obfuscation tool usually combines all of these obfuscation techniques.
In practice, protecting a large Java program often requires a combination of these methods rather than a single method. This is because each method has its weaknesses and application environment. The combination of these methods makes the protection of Java programs more effective. In addition, we often need to use other related security technologies, such as security authentication, digital signatures, PKI, etc.
The example in this article is a Java application: SCJP (Sun Certified Java Programmer) mock exam software. The application ships with a large number of mock questions, all encrypted and stored in files. The bundled question bank is the core of the software, so the classes that access it are core classes as well. Once these classes are decompiled, the entire question bank can be cracked. Let's consider how to protect the question bank and its related classes.
In this example we use a combination of protection techniques: native code and obfuscation. Because the software is released mainly for Windows, only one version of the native code needs to be maintained after conversion. In addition, obfuscation is very effective for Java programs and suits independently distributed applications like this one.
In the concrete solution, the program is divided into two parts. The question bank access module is written in native code, and the rest of the program is developed in Java. This gives the question bank management module stronger protection against decompilation. The Java modules are still protected with obfuscation. A schematic of this scheme is shown in Figure 7.
Figure 7 SCJP protection technology scheme diagram
Because the program runs mainly on Windows, the question bank access module is developed in C++ and exposes an access interface. To protect this interface, we also added an initialization interface that must be called before each use of the access interface. The interfaces therefore fall into two categories:
Initialization interface
Before using the question bank module, the client must first call the initialization interface, passing a random number as a parameter. From this random number, the question bank management module and the client each generate the same SessionKey using a shared algorithm. The SessionKey then encrypts all subsequent input and output data. In this way, only authorized (legitimate) clients can establish a valid connection and generate the correct SessionKey for accessing the question bank. An unauthorized client has difficulty generating the correct SessionKey and therefore cannot obtain the information in the question bank. If a higher level of secrecy is required, mutual authentication can also be employed.
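The source does not specify the key-generation algorithm, so the following is only a sketch under assumptions. Client and module share a secret (the hypothetical `SHARED_SECRET`). Each side computes HMAC-SHA256(secret, random number) and keeps the first 16 bytes as an AES key. The native module would run the same computation in C++; only the Java client side is shown.

```java
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

public class SessionKeyDemo {
    // HYPOTHETICAL secret known only to authorized clients and the question bank module.
    // Embedded in a Java client, it would itself need protecting (e.g. by obfuscation).
    static final byte[] SHARED_SECRET = "hypothetical-shared-secret".getBytes(StandardCharsets.UTF_8);

    // The random number the client passes to the initialization interface.
    public static byte[] newNonce() {
        byte[] n = new byte[16];
        new SecureRandom().nextBytes(n);
        return n;
    }

    // Assumed derivation: HMAC-SHA256(secret, nonce), truncated to a 128-bit AES key.
    // Both sides run this on the same nonce and obtain the same SessionKey.
    public static SecretKey deriveSessionKey(byte[] secret, byte[] nonce) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] digest = mac.doFinal(nonce);
        return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
    }
}
```

A client that does not know the secret derives a different key from the same random number. On its own this is one-way: it does not authenticate the module to the client.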
Data access interface
Once authentication is complete, the client can access the question bank data. However, all input and output data are encrypted with the SessionKey. Only a legitimate client holding the correct SessionKey can therefore make use of the question bank management module. The sequence diagram in Figure 8 shows how the question bank management module interacts with the other parts.
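Encrypting each request and response with the SessionKey might look like the sketch below. It assumes AES-GCM with a fresh 12-byte IV per message and a hypothetical plain-text message format. GCM also authenticates the data, so a peer holding a different key gets an exception rather than readable data.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

public class SessionChannel {
    private static final int IV_LEN = 12;    // fresh GCM nonce per message
    private static final int TAG_BITS = 128; // authentication tag length
    private final SecretKey sessionKey;
    private final SecureRandom rng = new SecureRandom();

    public SessionChannel(SecretKey sessionKey) {
        this.sessionKey = sessionKey;
    }

    // Encrypts one message (request or response); output is IV || ciphertext+tag.
    public byte[] seal(String message) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        rng.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(message.getBytes(StandardCharsets.UTF_8));
        byte[] out = Arrays.copyOf(iv, IV_LEN + ct.length);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    // Decrypts and verifies; with the wrong SessionKey this throws AEADBadTagException.
    public String open(byte[] sealed) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, sessionKey, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
        byte[] pt = c.doFinal(sealed, IV_LEN, sealed.length - IV_LEN);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```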