
Living in the Matrix with Bytecode Manipulation (repost)

Published: 2020-12-14 06:22:28 | Category: Java | Source: compiled from the web

Original article: https://www.infoq.com/articles/Living-Matrix-Bytecode-Manipulation

You are probably all too familiar with the following sequence: you feed a .java file into a Java compiler (likely using javac or a build tool such as Ant, Maven, or Gradle), the compiler grinds away, and it finally emits one or more .class files.

Figure 1: What is Java bytecode?

If you run the build from the command line with verbose output enabled, you can watch the compiler parse your file until it finally writes your .class file.


The generated .class file contains the bytecode, essentially the instruction set for the Java virtual machine (JVM), and is what gets loaded by the Java runtime class loader when a program executes.
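To make the idea of "bytes in a .class file" concrete, the following minimal sketch reads the first eight bytes of a compiled class: the magic number 0xCAFEBABE followed by the minor and major version. The class name `ClassFileHeader` is hypothetical and only serves this illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads the fixed header at the start of a .class file.
class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int magic, int minorVersion, int majorVersion) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Locates the class file of the given type as a resource and parses its header.
    static ClassFileHeader read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();            // u4 magic
            int minor = in.readUnsignedShort();  // u2 minor_version
            int major = in.readUnsignedShort();  // u2 major_version
            return new ClassFileHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ClassFileHeader.read(Object.class)` should report the magic number that every valid class file starts with; everything after this header (constant pool, fields, methods, and their bytecode) is what the frameworks discussed here parse and rewrite.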

In this article we will investigate Java bytecode, how to manipulate it, and why anyone would ever want to do so.

Bytecode-manipulation frameworks

Several frameworks for manipulating bytecode are in popular use. This article focuses on two of them: Javassist and ASM.

Why should you care about manipulating bytecode?

Many common Java libraries such as Spring and Hibernate, as well as most JVM languages and even your IDEs, use bytecode-manipulation frameworks. For that reason, and because it's really quite fun, you might find bytecode manipulation a valuable skill to have. You can use bytecode manipulation to perform many tasks that would be difficult or impossible to do otherwise, and once you learn it, the sky's the limit.

One important use case is program analysis. For example, the popular FindBugs bug-locator tool uses ASM under the hood to analyze your bytecode and locate bug patterns. Some software shops have code-complexity rules such as a maximum number of if/else statements in a method or a maximum method size. Static analysis tools analyze your bytecode to determine the code complexity.

Another common use is class generation. For example, ORM frameworks typically use proxies based on your class definitions. Or consider security applications that provide syntax for adding authorization annotations. Such use cases lend themselves nicely to bytecode manipulation.

JVM languages such as Scala and Groovy, and frameworks built on them such as Grails, all use a bytecode-manipulation framework.

Consider a situation where you need to transform library classes without having the source code, a task routinely performed by Java profilers. For example, at New Relic, bytecode instrumentation is used to time method executions.

With bytecode manipulation, you can optimize or obfuscate your code, or you can introduce functionality such as adding strategic logging to an application. This article will focus on a logging example, which will provide the basic tools for using these bytecode-manipulation frameworks.

Our example

Sue is in charge of ATM coding for a bank. She has a new requirement: add key data to the logs for some designated important actions.

Consider a simplified bank-transactions class. It allows a user to log in with a username and password, does some processing, withdraws a sum of money, and then prints out "transactions completed." The important actions are the login and withdrawal.
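A class of the kind described might look like the sketch below. The method names, parameter order, and sample values are assumptions chosen to match the description (password first for login, account ID first for withdraw); the class is kept package-private so the sketch fits in a single file.

```java
// Hypothetical stand-in for Sue's bank-transactions class.
class BankTransactions {

    // Important action: logs a user in (no real authentication in this sketch).
    public void login(String password, String accountId, String userName) {
        // authentication would happen here
    }

    // Unimportant action: some processing that does not need audit logging.
    public void unimportantProcessing(String accountId) {
        // intermediate work would happen here
    }

    // Important action: withdraws money from the account.
    public void withdraw(String accountId, double moneyToRemove) {
        // balance update would happen here
    }

    // Runs the whole flow once and returns the completion message.
    static String runTransactions() {
        BankTransactions bank = new BankTransactions();
        String accountId = "account1"; // sample value, not from the article
        bank.login("password", accountId, "sue");
        bank.unimportantProcessing(accountId);
        bank.withdraw(accountId, 50.0);
        return "transactions completed";
    }

    public static void main(String[] args) {
        System.out.println(runTransactions());
    }
}
```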


To simplify the coding, Sue would like to create an @ImportantLog annotation for those method calls, containing input parameters that represent the indexes of the method arguments she wants to record. With that, she can annotate her login and withdraw methods.
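One way such an annotation and its use could be declared is sketched here. The member name `fields`, the `CLASS` retention, and the holder class `AnnotatedBankTransactions` are assumptions made for illustration; the index values are zero-based positions in each method's argument list.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: lists the argument indexes that should be logged.
@Retention(RetentionPolicy.CLASS) // kept in the class file, not visible to reflection
@Target(ElementType.METHOD)
@interface ImportantLog {
    String[] fields();
}

// Hypothetical holder class showing the annotation on the two important methods.
class AnnotatedBankTransactions {

    // Log argument 1 (account ID) and argument 2 (user name), never the password.
    @ImportantLog(fields = {"1", "2"})
    public void login(String password, String accountId, String userName) {
    }

    // Log argument 0 (account ID) and argument 1 (amount).
    @ImportantLog(fields = {"0", "1"})
    public void withdraw(String accountId, double moneyToRemove) {
    }
}
```

Keeping the indexes as strings mirrors the way they are later read back from the annotation as a String array.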


For login, Sue wants to record the account ID and the username, so her fields will be set to "1" and "2" (she doesn't want to display the password!). For the withdraw method, her fields are "0" and "1" because she wants to output the first two arguments: the account ID and the amount of money to remove. Ideally, her audit log will then contain an entry with these values for each important call.

To hook this up, Sue is going to use a Java agent. Introduced in JDK 1.5, Java agents allow you to modify the bytes that comprise the classes in a running JVM, without requiring any source code.

Without an agent, the normal execution flow of Sue's program is:

  1. Run Java on a main class, which is then loaded by a class loader.
  2. Call the class's main method, which executes the defined process.
  3. Print “transactions completed.”

When you introduce a Java agent, a few more things happen, but let's first see what's required to create an agent. An agent must contain a class with a premain method. It must be packaged as a JAR file with a properly constructed manifest that contains a Premain-Class entry. A switch (-javaagent) must be set on launch to point to the JAR path, which makes the JVM aware of the agent.
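As a rough illustration of the manifest requirement, the sketch below builds the manifest entries in code with the JDK's `java.util.jar` API. The helper name `AgentManifest` and the agent class name passed to it are hypothetical; in practice the manifest is usually produced by the build tool.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: builds the manifest an agent JAR needs.
class AgentManifest {

    // Returns a manifest whose main section names the class that holds premain.
    static Manifest forAgent(String premainClassName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // A manifest without a version entry is written out empty, so set it first.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Premain-Class", premainClassName);
        return manifest;
    }
}

// Launch sketch (the JAR path is a placeholder):
//   java -javaagent:/path/to/agent.jar BankTransactions
```

The resulting manifest can be passed to a `JarOutputStream` when packaging the agent by hand.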


Inside premain, register a Transformer that captures the bytes of every class as it is loaded, makes any desired modifications, and returns the modified bytes. In Sue's example, the Transformer captures BankTransaction, makes her modifications, and returns the modified bytes. Those are the bytes that the class loader loads and that the main method executes, performing its original functionality in addition to Sue's augmented logging.

When the agent class is loaded, its premain method is invoked before the application's main method.

Figure 2: Process with Java agent.

It’s best to look at an example.

The Agent class doesn't implement any interface, but it must contain a premain method.
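A minimal agent of this shape might look like the following sketch. `PassThroughTransformer` is a hypothetical placeholder for the real transformer, and both classes are package-private only so that the sketch fits in one file; a deployable agent class would normally be public and named in the Premain-Class manifest entry.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Sketch of an agent entry point: the JVM calls premain before the application's main.
class Agent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // Register a transformer; it will be offered every class loaded from now on.
        inst.addTransformer(new PassThroughTransformer());
    }
}

// Hypothetical placeholder transformer that never changes anything.
class PassThroughTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) {
        return null; // null tells the JVM that the class is unchanged
    }
}
```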

The Transformer class contains a transform method, whose signature accepts a ClassLoader, the class name, the Class object of the class being redefined, the ProtectionDomain defining permissions, and the original bytes of the class. Returning null from the transform method tells the runtime that no changes have been made to that class.

To modify the class bytes, supply your bytecode-manipulation logic in transform and return the modified bytes.
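The contract described above can be sketched as a transformer that leaves every class alone except one target. The target name `com/example/BankTransactions` is a hypothetical example; note that the JVM passes class names in internal form, with slashes rather than dots.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

// Sketch: only the target class is touched, everything else is reported as unchanged.
class BankOnlyTransformer implements ClassFileTransformer {

    // Hypothetical target, written in the internal (slash-separated) form.
    static final String TARGET = "com/example/BankTransactions";

    @Override
    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) {
        if (!TARGET.equals(className)) {
            return null; // no changes for classes we are not interested in
        }
        // Work on a copy so the incoming buffer is never modified in place.
        byte[] modified = classfileBuffer.clone();
        // bytecode manipulation (Javassist or ASM) would go here
        return modified;
    }
}
```

Filtering early matters in practice, because the transformer is invoked for every class the JVM loads, including JDK and library classes.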

Javassist

A subproject of JBoss, Javassist (short for "Java Programming Assistant") consists of a high-level object-based API and a lower-level one that is closer to the bytecode. The more object-based one enjoys more community activity and is the focus of this article. For a complete tutorial, refer to the Javassist documentation.

In Javassist, the basic unit of class representation is the CtClass ("compile time class"). The classes that comprise your program are stored in a ClassPool, essentially a container for CtClass instances.

The ClassPool implementation uses a HashMap, in which the key is the name of the class and the value is the corresponding CtClass object.

A normal Java class contains fields, constructors, and methods. The corresponding CtClass represents those as CtField, CtConstructor, and CtMethod. To locate a CtClass, you can grab it by name from the ClassPool, then grab any method from the CtClass and apply your modifications.

Figure 3.

CtMethod contains the code for the associated method. We can insert code at the beginning of the method using the insertBefore method. The great thing about Javassist is that you write pure Java, albeit with one caveat: the Java must be supplied as quoted strings. But most people would agree that's much better than having to deal with bytecode! (Although, if you happen to like coding directly in bytecode, stay tuned for the ASM section.) The JVM includes a bytecode verifier to guard against invalid bytecode. If your Javassist-coded Java is not valid, the bytecode verifier will reject it at runtime.

Similar to insertBefore, there's an insertAfter to insert code at the end of a method. You can also insert code in the middle of a method by using insertAt, or add a catch clause with addCatch.

Let's fire up your IDE and code the logging feature. We start with an Agent (containing premain) and our ClassTransformer.

    package com.example.spring2gx.agent;

    import java.lang.instrument.ClassFileTransformer;
    import java.lang.instrument.IllegalClassFormatException;
    import java.security.ProtectionDomain;

    public class ImportantLogClassTransformer implements ClassFileTransformer {

        @Override
        public byte[] transform(ClassLoader loader, String className,
                Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
                byte[] classfileBuffer) throws IllegalClassFormatException {
            // manipulate the bytes here; this skeleton returns them unchanged
            byte[] modifiedBytes = classfileBuffer;
            return modifiedBytes;
        }
    }

To add audit logging, first implement transform to convert the bytes of the class to a CtClass object. Then, you can iterate over its methods and capture the ones with the @ImportantLog annotation on them, grab the input parameter indexes to log, and insert that code at the beginning of the method.

                    // get important method parameter indexes
                    List<String> parameterIndexes = getParamIndexes(annotation);
                    // add logging statement to beginning of the method
                    currentMethod.insertBefore(
                            createJavaString(currentMethod, className, parameterIndexes));
                }
            }
            return cclass.toBytecode();
        }
        return null;
    }

In Javassist, annotations are treated as either "invisible" or "visible". Invisible annotations, which are only visible at class-loading time and compile time, are declared by passing RetentionPolicy.CLASS to the annotation's @Retention. Visible annotations (RetentionPolicy.RUNTIME) are loaded and visible at run time. For this example, you don't need the attributes at run time, so make them invisible.
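The difference between the two retention policies can be observed with plain reflection, as in this small sketch. The annotation and class names are hypothetical; the point is only that reflection sees the `RUNTIME` annotation and not the `CLASS` one, even though both are stored in the class file where a bytecode tool can read them.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// "Invisible": stored in the class file, but not exposed through reflection.
@Retention(RetentionPolicy.CLASS)
@interface InvisibleMark {
}

// "Visible": stored in the class file and exposed through reflection.
@Retention(RetentionPolicy.RUNTIME)
@interface VisibleMark {
}

class RetentionDemo {

    @InvisibleMark
    @VisibleMark
    static void audited() {
    }

    // Reports whether reflection can see the given annotation on audited().
    static boolean isVisibleAtRuntime(Class<? extends Annotation> annotationType) {
        for (Method method : RetentionDemo.class.getDeclaredMethods()) {
            if (method.getName().equals("audited")) {
                return method.isAnnotationPresent(annotationType);
            }
        }
        return false;
    }
}
```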

The getAnnotation method scans for your @ImportantLog annotation and returns null if it doesn't find the annotation.


With the annotation in hand, you can retrieve the parameter indexes. Using Javassist's ArrayMemberValue, the member value fields are returned as a String array, which you can iterate over to obtain the field indexes you had embedded in the annotation.


You are finally in a position to build your log statement in createJavaString.


Your implementation creates a StringBuilder, appending some preamble followed by the required method name and class name. One thing to note is that if you're inserting multiple Java statements, you need to surround them with curly braces. (Braces are not required for a single statement.)
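A string-building helper of this kind might look like the sketch below, which uses only the JDK and takes the method name as a plain string instead of a CtMethod. The wording of the log message and the class name `LogStatementBuilder` are assumptions. It relies on Javassist's convention that `$1`, `$2`, and so on refer to the method parameters (with `$0` being `this`), so a zero-based index from the annotation is shifted by one.

```java
import java.util.List;

// Sketch: builds the Java source text that would be handed to insertBefore.
class LogStatementBuilder {

    static String createJavaString(String methodName, String className,
            List<String> parameterIndexes) {
        StringBuilder sb = new StringBuilder();
        // Several statements are inserted, so wrap them in curly braces.
        sb.append("{");
        sb.append("StringBuilder log = new StringBuilder();");
        sb.append("log.append(\"A call was made to method '").append(methodName)
          .append("' on class '").append(className).append("'.\");");
        for (String index : parameterIndexes) {
            // Annotation indexes are zero-based; Javassist parameters start at $1.
            int javassistParam = Integer.parseInt(index) + 1;
            sb.append("log.append(\" Index ").append(index).append(" value: \");");
            sb.append("log.append($").append(javassistParam).append(");");
        }
        sb.append("System.out.println(log.toString());");
        sb.append("}");
        return sb.toString();
    }
}
```

For the withdraw method with indexes "0" and "1", the generated block refers to `$1` and `$2`, which Javassist would resolve to the account ID and the amount when compiling the string.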

That pretty much covers the code for adding audit logging using Javassist. In retrospect, the positives are:

  • Because it uses familiar Java syntax, there's no bytecode to learn.
  • There wasn't too much programming to do.
  • Good documentation on Javassist exists.

The negatives are:

  • Not using bytecode limits capabilities.
  • Javassist is slower than other bytecode-manipulation frameworks.

ASM

ASM began life as a Ph.D. project and was open-sourced in 2002. It is actively updated and has supported Java 8 since version 5.x. ASM consists of an event-based library and an object-based one, similar in behavior to SAX and DOM XML parsers, respectively. This article will focus on the event-based library. Complete documentation can be found on the ASM website.

A Java class contains many components, including a superclass, interfaces, attributes, fields, and methods. With ASM, you can think of each of these as events; you parse the class by providing a ClassVisitor implementation, and as the parser encounters each of those components, a corresponding "visitor" event-handler method is called on the ClassVisitor (always in the same predefined sequence).
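The event-based style can be illustrated with a toy analogue that uses only reflection. This is not ASM's API: `ToyClassVisitor`, `ToyClassReader`, `RecordingVisitor`, and `Sample` are hypothetical names invented for this sketch, and a real ClassReader parses class-file bytes rather than a loaded Class object. What carries over is the shape: a reader walks the class and calls one handler per component in a fixed order.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Toy visitor interface: one callback per kind of class component.
interface ToyClassVisitor {
    void visit(String className, String superName);
    void visitField(String name);
    void visitMethod(String name);
    void visitEnd();
}

// Toy reader: fires events in a fixed order (header, fields, methods, end).
class ToyClassReader {
    private final Class<?> type;

    ToyClassReader(Class<?> type) {
        this.type = type;
    }

    void accept(ToyClassVisitor visitor) {
        Class<?> parent = type.getSuperclass();
        visitor.visit(type.getName(), parent == null ? null : parent.getName());
        for (Field field : type.getDeclaredFields()) {
            visitor.visitField(field.getName());
        }
        for (Method method : type.getDeclaredMethods()) {
            visitor.visitMethod(method.getName());
        }
        visitor.visitEnd();
    }
}

// A visitor that simply records the events it receives.
class RecordingVisitor implements ToyClassVisitor {
    final List<String> events = new ArrayList<>();

    public void visit(String className, String superName) {
        events.add("visit " + className + " extends " + superName);
    }

    public void visitField(String name) {
        events.add("field " + name);
    }

    public void visitMethod(String name) {
        events.add("method " + name);
    }

    public void visitEnd() {
        events.add("end");
    }
}

// Small class to walk over.
class Sample {
    int balance;

    void withdraw() {
    }
}
```

Because the reader guarantees the order of events, a visitor can rely on having seen the class header before any field or method callback arrives, which is the same property the article relies on for ASM's visitors.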


Let's get a feel for the process by passing Sue's BankTransaction (defined at the beginning of the article) into a ClassReader for parsing.

Again, start with the Agent premain.


Then pass the output bytes to a no-op ClassWriter to put the parsed bytes back together in the byte array, producing a rehydrated BankTransaction that, as expected, is virtually identical to our original class.

Figure 4.

    import java.lang.instrument.ClassFileTransformer;
    import java.lang.instrument.IllegalClassFormatException;
    import java.security.ProtectionDomain;

    import org.objectweb.asm.ClassReader;
    import org.objectweb.asm.ClassWriter;

    public class ImportantLogClassTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
                byte[] classfileBuffer) throws IllegalClassFormatException {
            // parse the incoming class bytes
            ClassReader cr = new ClassReader(classfileBuffer);
            // the writer reassembles whatever events the reader emits
            ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
            cr.accept(cw, 0);
            return cw.toByteArray();
        }
    }

Now let's do something a little more useful by placing a ClassVisitor (named LogMethodClassVisitor) in front of our ClassWriter, so that our event-handler methods, such as visitField or visitMethod, are called as the corresponding components are encountered during parsing.

Figure 5.


For your logging requirement, you want to check each method for the indicative annotation and add any specified logging. You only need to override ClassVisitor's visitMethod to return a MethodVisitor that supplies your implementation. Just as there are several components of a class, there are several components of a method, corresponding to the method attributes, annotations, and compiled code. ASM's MethodVisitor provides hooks for visiting every opcode of the method, so you can get pretty granular in your modifications.


Again, the event handlers are always called in the same predefined sequence, so you always know all of the attributes and annotations on the method before you have to actually visit the code. (Incidentally, you can chain together multiple instances of MethodVisitor, just like you can chain multiple instances of ClassVisitor.) So in your visitMethod, you're going to hook in the PrintMessageMethodVisitor, overriding visitAnnotation to capture your annotations and visitCode to insert any required logging code.

Your PrintMessageMethodVisitor overrides two methods. First comes visitAnnotation, so you can check the method for your @ImportantLog annotation. If it is present, you need to extract the field indexes from the annotation's fields property. When visitCode executes, the presence of the annotation has already been determined, so it can add the specified logging. The visitAnnotation code hooks in an AnnotationVisitor that exposes the field arguments on the @ImportantLog annotation.


Now, let's look at the visitCode method. First, it must check whether the AnnotationVisitor flagged the annotation as present. If so, it adds your bytecode.


This is the scary part of ASM: you actually have to write bytecode, so that's something new to learn. You have to know about the stack, local variables, etc. It's a fairly simple language, but if you just want to hack around, you can get the existing bytecode pretty easily with javap.


I recommend writing the code you need in a Java test class, compiling that, and running it through javap -c to see the exact bytecode. Each bytecode instruction consists of a one-byte opcode followed by zero or more arguments. You will need to determine those arguments for the target code, and they can usually be extracted by running javap -c -v on the original class (-v for verbose, which displays the constant pool).
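As a small sketch of that workflow, a throwaway class such as the one below can hold the logging code you intend to generate. The class and method names are hypothetical; the commands in the comments are the standard JDK tools.

```java
// Throwaway test class: write the Java you want, then inspect what javac produced.
//
//   javac LogSnippet.java
//   javap -c LogSnippet        (disassembled instructions)
//   javap -c -v LogSnippet     (adds the constant pool and other details)
class LogSnippet {

    // The statements whose bytecode we want to copy into a MethodVisitor.
    static String describe(String accountId, double amount) {
        StringBuilder sb = new StringBuilder();
        sb.append("account=").append(accountId);
        sb.append(" amount=").append(amount);
        return sb.toString();
    }
}
```

Using an explicit StringBuilder keeps the disassembly close to what you would emit by hand, since every call shows up as an ordinary method invocation.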

I encourage you to look at the JVM instruction set reference, which defines every opcode. There are operations like load and store (which move data between your operand stack and your local variables), with a variant for each type. For example, ILOAD pushes an integer value from a local variable onto the operand stack, whereas LLOAD does the same for a long value.
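To build some intuition for the operand stack and local variables, here is a toy model of three integer instructions. It is a deliberately simplified illustration, not a JVM: `ToyFrame` and its `Op` enum are hypothetical names, and only the data movement of iload (local variable to stack), istore (stack to local variable), and iadd (pop two, push the sum) is modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one method frame: a local-variable array plus an operand stack.
class ToyFrame {

    enum Op { ILOAD, ISTORE, IADD }

    final int[] locals;
    final Deque<Integer> stack = new ArrayDeque<>();

    ToyFrame(int... locals) {
        this.locals = locals.clone();
    }

    // Executes one instruction; the index is ignored by IADD.
    void execute(Op op, int index) {
        switch (op) {
            case ILOAD:
                stack.push(locals[index]);          // local variable -> operand stack
                break;
            case ISTORE:
                locals[index] = stack.pop();        // operand stack -> local variable
                break;
            case IADD:
                stack.push(stack.pop() + stack.pop()); // pop two ints, push their sum
                break;
        }
    }
}
```

A frame created with `new ToyFrame(2, 3, 0)` and fed ILOAD 0, ILOAD 1, IADD, ISTORE 2 ends with the sum in local variable 2 and an empty stack, which is the kind of bookkeeping you do in your head when writing bytecode one line at a time.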

There are also operations like invokevirtual, invokespecial, invokestatic, and the more recently added invokedynamic, for invoking standard instance methods, constructors, static methods, and dynamically linked methods (as used by dynamically typed JVM languages), respectively. There are also operations for creating new objects with the new operator, and for duplicating the top operand on the stack.
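The mapping from everyday Java to these instructions can be seen by compiling a few lines and disassembling them with javap -c. The sketch below annotates each statement with the invoke instruction javac typically emits for it; `InvokeKinds` is a hypothetical class name, and the lambda line assumes Java 8 or later.

```java
import java.util.function.Supplier;

// Each statement in demo() is compiled to a different kind of invoke instruction.
class InvokeKinds {

    static String greet() {
        return "hi";
    }

    String name() {
        return "bank";
    }

    static String demo() {
        InvokeKinds target = new InvokeKinds();  // new, dup, invokespecial <init>
        String a = target.name();                // invokevirtual
        String b = greet();                      // invokestatic
        Supplier<String> lazy = () -> "lambda";  // invokedynamic (lambda creation)
        String c = lazy.get();                   // invokeinterface
        return String.join(",", a, b, c);        // invokestatic
    }
}
```

The first line also shows the new and dup instructions mentioned above: the object is allocated, its reference is duplicated on the stack, and one copy is consumed by the constructor call.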

In sum, the positives of ASM are:

  • It has a small memory footprint.
  • It’s typically pretty quick.
  • It’s well documented on the web.
  • All of the opcodes are available, so you can really do a lot with it.
  • There’s lots of community support.

There is really only one negative, but it's a big one: you're writing bytecode, so you have to understand what's going on under the hood, and as a result developers tend to take some time to ramp up.

Lessons learned

  • When you're dealing with bytecode manipulation, it's important to take small steps. Don't write lots of bytecode and expect it to immediately pass verification and work. Write one line at a time, think about what's on your stack, think about your local variables, and then write another line. If it's not passing the verifier, change one thing at a time; otherwise you'll never get it to work. Also keep in mind that besides the JVM verifier, ASM maintains a separate bytecode verifier, so it's good to run both and verify that your bytecode passes both of them.
  • It's important to think about class loading when you're modifying classes. When you use a Java agent, its transformer will touch every class as it is loaded into the JVM, no matter which class loader is loading it. So you need to make sure that any class your injected code refers to is also visible to the class loader of the class being transformed. Otherwise, you're going to run into trouble.
  • If you're using Javassist and an application server that has multiple class loaders, you have to be concerned about your class pool being able to see your class objects. You might have to register a new classpath with your class pool to get it to see your class objects. You can chain your class pools like Java chains class loaders, so if a pool doesn't find the CtClass object itself, it can go look in its parents.
  • Finally, it's important to note that the JDK has its own capability to transform classes, and some limitations will apply to any class that the JDK has already transformed: you can modify the implementation of methods but, unlike original transformations, re-transformations are not permitted to change the class structure, for example by adding new fields or methods, or by modifying signatures.

Bytecode manipulation can make life easier. You can find bugs, add logging (as discussed), obfuscate source code, perform preprocessing like Spring or Hibernate, or even write your own language compiler. You can restrict your API calls, analyze code to see if multiple threads are accessing a collection, lazy-load data from the database, and find differences between JARs by inspecting them.

So I encourage you to make a bytecode-manipulation framework your friend. Someday, one might save your job.

(Editor: 李大同)
