How should one understand the example used to explain the String.intern() method in the second edition of 《深入理解Java虚拟机》?

The code is as follows:
public class RuntimeConstantPoolOOM {
    public static void main(String[] args) {
        String str1 = new StringBuilder("计算机").append("软件").toString();
        System.out.println(str1.intern() == str1);
        String str2 = new StringBuilder("ja").append("va").toString();
        System.out.println(str2.intern() == str2);
    }
}
The output is:
true
false
I don't understand why the string "java" would already exist before StringBuilder.toString() is executed…
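I didn't expect so many people to invite me to answer this question… you are all such curious folks >_<
Zhou Zhiming (周志明) really dug a pit with this example when he wrote the book and then left it unfilled, so everyone keeps jumping straight in, hahaha.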

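Anyone with some familiarity with the Java class library and the JVM can quickly guess the gist: among the parts of the Java standard library that are loaded during JVM startup, some class probably references the string literal "java". The first time that literal is referenced it gets interned, that is, added to the string constant pool.
By the time the asker's main() runs, that "java" string has long been sitting in the string constant pool. So str2.intern() returns the instance that was already pooled rather than str2 itself, and the comparison prints false.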
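The "whoever interns first wins" effect can be reproduced without relying on any JDK internals. The following is a minimal sketch, assuming JDK 7 or later (where intern() puts a reference to the existing heap String into the pool instead of a copy). The class name InternDemo, its helpers, and the string content are made up for illustration.

```java
public class InternDemo {
    // Builds a new heap String at run time, so the result is not a compile-time literal.
    static String build(String head, String tail) {
        return new StringBuilder(head).append(tail).toString();
    }

    // True only if s itself is the instance the pool hands back,
    // i.e. no equal string had been interned before this call.
    static boolean isCanonical(String s) {
        return s.intern() == s;
    }

    public static void main(String[] args) {
        // Nobody has interned this content yet, so `first` becomes the pooled instance.
        String first = build("intern-demo-", "payload");
        System.out.println(isCanonical(first));   // true on JDK 7+

        // Same content, different object: the pool already holds `first`.
        String second = build("intern-demo-", "payload");
        System.out.println(isCanonical(second));  // false
    }
}
```

In the asker's program, str1 plays the role of `first`, while str2 plays the role of `second`, except that the earlier intern was done by the JDK itself during startup.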

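But none of the existing answers pins down exactly which class causes this "java" string to be interned. Guessing is a good start, but in the end seeing is believing: you have to confirm it with an experiment.
Let me satisfy everyone's curiosity~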

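It's actually quite simple. First, this behavior has to be discussed against one specific JDK / JRE implementation. None of the Java Language Specification, the JVM Specification, or the Java SE standard library JavaDoc (which is also part of the Java SE platform specification) mandates, or ever will mandate, which class must reference the string constant "java", let alone that it must be the first class to get "java" interned. Specifying that would be far too pointless.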

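So we can only discuss this question against a concrete implementation.
That part is easy. Pick a specific JDK / JRE implementation, debug the asker's program on it, and you can point out which class is the source on that JDK / JRE.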

My experiment on Oracle JDK7u45 shows that, on this version of the JDK / JRE, the "java" string in the string constant pool comes from:
...
...
...
the sun.misc.Version class!

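The sun.misc.Version class is loaded and initialized during the initialization of the JDK class library. As part of that, its static constant fields have to be set up from their specified constant values (ConstantValue), and at that point the "java" string literal referenced by the static constant field sun.misc.Version.launcher_name gets interned into StringTable, which is the HotSpot VM's string constant pool.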

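You can look at the corresponding source in OpenJDK 7u45:
jdk7u/jdk7u/jdk: c5ca4daec23b src/share/classes/sun/misc/Version.java.template
This is a template file, because many of the constant values in it are filled in during the JDK build according to the build configuration. In an actual JDK7u build, the Java source file generated from this template, with the constants filled in, begins like this: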
package sun.misc;
import java.io.PrintStream;

public class Version {


    private static final String launcher_name =
        "java";
    // ...
}

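More specifically, during its initialization the HotSpot VM actively triggers the loading and initialization of the java.lang.System class, and in the process it calls the static method java.lang.System.initializeSystemClass():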
jdk7u/jdk7u/jdk: c5ca4daec23b src/share/classes/java/lang/System.java
    private static void initializeSystemClass() {
        // ...
        sun.misc.Version.init();
        // ...
    }
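This method in turn calls the static method sun.misc.Version.init(), which triggers the loading and initialization of the sun.misc.Version class. When the launcher_name static constant field mentioned above is initialized, the "java" string constant it references is put into StringTable.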
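The language-level half of this chain (invoking a static method on a class that hasn't been used yet triggers its initialization) can be observed with a small sketch. VersionLike below is a hypothetical stand-in, not the real sun.misc.Version, and the sketch only shows when class initialization happens. It does not show HotSpot's internal handling of ConstantValue.

```java
public class StaticInitDemo {
    static final StringBuilder LOG = new StringBuilder();

    // Hypothetical stand-in for sun.misc.Version; not the real class.
    static class VersionLike {
        // A static final String with a literal initializer is a compile-time
        // constant, which javac records as a ConstantValue attribute on the field.
        static final String launcher_name = "java";

        static {
            LOG.append("VersionLike initialized;");
        }

        static void init() {
            LOG.append("init() called;");
        }
    }

    public static void main(String[] args) {
        LOG.append("before call;");
        // First use of VersionLike: the static method call forces the class
        // to be initialized before init() itself runs.
        VersionLike.init();
        System.out.println(LOG);
    }
}
```

On a fresh run the log shows "before call;", then "VersionLike initialized;", then "init() called;", which mirrors how the call from initializeSystemClass() is the first thing to touch the class.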

The code used for the experiment is taken from the snippet the asker quoted, with slight modifications:
public class YY {
  public static void main(String[] args) {
    String str1 = new StringBuilder("计算机").append("软件").toString();
    System.out.println(str1);
    System.out.println(str1.intern() == str1);

    String str2 = new StringBuilder("ja").append("va").toString();
    System.out.println(str2);
    System.out.println(str2.intern() == str2);
  }
}

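I used LLDB on Mac OS X to debug Oracle JDK7u45-b18 running the YY class above.
This is the stock Oracle JDK7u45, without any extra debug symbols that I prepared myself, so anyone who wants to try this experiment can do it too.
Here is the call stack at the moment "java" is about to be put into StringTable: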
(lldb) bt
* thread #4: tid = 0x120e3cd, 0x00000001024bc2da libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int), stop reason = breakpoint 1.1
  * frame #0: 0x00000001024bc2da libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int)
    frame #1: 0x00000001024bc498 libjvm.dylib`StringTable::basic_add(int, Handle, unsigned short*, int, unsigned int, Thread*) + 92
    frame #2: 0x00000001024bc594 libjvm.dylib`StringTable::intern(Handle, unsigned short*, int, Thread*) + 170
    frame #3: 0x00000001024bc75c libjvm.dylib`StringTable::intern(Symbol*, Thread*) + 90
    frame #4: 0x00000001021e7b7f libjvm.dylib`constantPoolOopDesc::string_at_impl(constantPoolHandle, int, Thread*) + 111
    frame #5: 0x00000001022dce12 libjvm.dylib`initialize_static_field(fieldDescriptor*, Thread*) + 751
    frame #6: 0x00000001022abc46 libjvm.dylib`instanceKlass::do_local_static_fields_impl(instanceKlassHandle, void (*)(fieldDescriptor*, Thread*), Thread*) + 122
    frame #7: 0x00000001022dcb07 libjvm.dylib`java_lang_Class::create_mirror(KlassHandle, Thread*) + 437
    frame #8: 0x0000000102197a1e libjvm.dylib`ClassFileParser::parseClassFile(Symbol*, Handle, Handle, KlassHandle, GrowableArray<Handle>*, TempNewSymbol&, bool, Thread*) + 10518
    frame #9: 0x0000000102198dab libjvm.dylib`ClassLoader::load_classfile(Symbol*, Thread*) + 489
    frame #10: 0x00000001024c38f9 libjvm.dylib`SystemDictionary::load_instance_class(Symbol*, Handle, Thread*) + 243
    frame #11: 0x00000001024c3497 libjvm.dylib`SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*) + 1139
    frame #12: 0x00000001024c3d4a libjvm.dylib`SystemDictionary::resolve_or_null(Symbol*, Handle, Handle, Thread*) + 260
    frame #13: 0x00000001024c417b libjvm.dylib`SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*) + 49
    frame #14: 0x00000001021e87ba libjvm.dylib`constantPoolOopDesc::klass_at_impl(constantPoolHandle, int, Thread*) + 506
    frame #15: 0x000000010239d0c4 libjvm.dylib`LinkResolver::resolve_klass(KlassHandle&, constantPoolHandle, int, Thread*) + 32
    frame #16: 0x000000010239d12e libjvm.dylib`LinkResolver::resolve_pool(KlassHandle&, Symbol*&, Symbol*&, KlassHandle&, constantPoolHandle, int, Thread*) + 50
    frame #17: 0x000000010239f660 libjvm.dylib`LinkResolver::resolve_invokestatic(CallInfo&, constantPoolHandle, int, Thread*) + 80
    frame #18: 0x00000001022d1f22 libjvm.dylib`InterpreterRuntime::resolve_invoke(JavaThread*, Bytecodes::Code) + 550
    frame #19: 0x000000010401d77a java/lang/System.initializeSystemClass() @ bci 34
    frame #20: 0x00000001040004e7 call_stub
    frame #21: 0x00000001022d6d90 libjvm.dylib`JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) + 554
    frame #22: 0x00000001022d6ec1 libjvm.dylib`JavaCalls::call_static(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*) + 145
    frame #23: 0x00000001022d6fc3 libjvm.dylib`JavaCalls::call_static(JavaValue*, KlassHandle, Symbol*, Symbol*, Thread*) + 57
    frame #24: 0x00000001024f260c libjvm.dylib`Threads::create_vm(JavaVMInitArgs*, bool*) + 2740
    frame #25: 0x0000000102309be1 libjvm.dylib`JNI_CreateJavaVM + 98
    frame #26: 0x0000000100002915 java`JavaMain + 308
    frame #27: 0x00007fff8c0e8899 libsystem_pthread.dylib`_pthread_body + 138
    frame #28: 0x00007fff8c0e872a libsystem_pthread.dylib`_pthread_start + 137
    frame #29: 0x00007fff8c0ecfc9 libsystem_pthread.dylib`thread_start + 13
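The names for frame #19 and #20 were extracted with another tool >_<
That is probably the trickiest part of this experiment, but it's a hassle to explain, so I won't go into it here.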

Frame #19 is the java.lang.System.initializeSystemClass() method being executed by the HotSpot VM's interpreter. The bytecode it is currently executing is:
     34: b8 31 00       invokestatic        cpc#49    // Method: sun/misc/Version.init:()V
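For readers unfamiliar with the dump format, here is a small sketch of how those three bytes read. The opcode value 0xb8 for invokestatic comes from the JVM specification. Reading the operand as little-endian is my assumption, based on the dump showing the bytes "31 00" next to index #49 (in a class file on disk the operand would be big-endian). The class and method names are made up.

```java
public class InvokestaticDecode {
    static final int INVOKESTATIC = 0xb8; // opcode value from the JVM specification

    // Reads the two operand bytes after the opcode as a little-endian u2.
    // Assumption: the dump shows an in-memory, VM-rewritten index in native
    // x86-64 byte order, not the big-endian index stored in a class file.
    static int operandLittleEndian(byte[] code, int bci) {
        return (code[bci + 1] & 0xff) | ((code[bci + 2] & 0xff) << 8);
    }

    public static void main(String[] args) {
        byte[] insn = { (byte) 0xb8, 0x31, 0x00 };
        System.out.println((insn[0] & 0xff) == INVOKESTATIC); // true
        System.out.println(operandLittleEndian(insn, 0));     // 49
    }
}
```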

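The debugging process is quite simple and intuitive, as long as you know enough of the HotSpot VM's implementation details.
Below are the commands I used and the output of some of them: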
$ lldb /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/bin/java
(lldb) b StringTable::lookup
(lldb) run -XX:+TraceClassLoading YY
Process 9392 launched: '/Users/krismo/sdk/jdk1.7.0_45/Contents/Home/bin/java' (x86_64)
2 locations added to breakpoint 1
[Opened /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.Object from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.io.Serializable from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.Comparable from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.CharSequence from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.String from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.reflect.GenericDeclaration from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.reflect.Type from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.reflect.AnnotatedElement from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.Class from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.Cloneable from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.ClassLoader from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
[Loaded java.lang.System from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
Process 9392 stopped
* thread #4: tid = 0x1213756, 0x0000000102cbc2da libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int), stop reason = breakpoint 1.1
    frame #0: 0x0000000102cbc2da libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int)
libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int):
-> 0x102cbc2da:  pushq  %rbp
   0x102cbc2db:  movq   %rsp, %rbp
   0x102cbc2de:  pushq  %r15
   0x102cbc2e0:  pushq  %r14
(lldb) br l
Current breakpoints:
1: name = 'StringTable::lookup', locations = 2, resolved = 2, hit count = 1
  1.1: where = libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int), address = 0x0000000102cbc2da, resolved, hit count = 1 
  1.2: where = libjvm.dylib`StringTable::lookup(Symbol*), address = 0x0000000102cbc3a8, resolved, hit count = 0 

(lldb) br mod -c "((char16_t*)$arg3)[0] == 'j' && ((char16_t*)$arg3)[1] == 'a' && ((char16_t*)$arg3)[2] == 'v' && ((char16_t*)$arg3)[3] == 'a' && ((int)$arg4) == 4" 1.1
(lldb) cont
Process 9392 resuming
(lldb) [Loaded java.lang.Throwable from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
...
[Loaded java.lang.Runtime from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
Process 9392 stopped
* thread #4: tid = 0x1213756, 0x0000000102cbc2da libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int), stop reason = breakpoint 1.1
    frame #0: 0x0000000102cbc2da libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int)
libjvm.dylib`StringTable::lookup(int, unsigned short*, int, unsigned int):
-> 0x102cbc2da:  pushq  %rbp
   0x102cbc2db:  movq   %rsp, %rbp
   0x102cbc2de:  pushq  %r15
   0x102cbc2e0:  pushq  %r14
After that, running the bt (backtrace) command shows the call stack given earlier.
(Except for the symbols of frame #19 and #20 >_<)

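The idea behind the experiment is:
  • Set a breakpoint at the entry of the StringTable::lookup() function and make it a conditional breakpoint that stops only when the string passed in has the content "java";
  • When that breakpoint is hit, look at the call stack (backtrace).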

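StringTable is really just a simple hash table, a global data structure that the HotSpot VM uses to implement string interning. In Java terms, this StringTable is essentially a HashSet<String>: it does not hold the interned String objects themselves, only references to them.
String interning triggered at the VM level (for example, when a CONSTANT_String constant in a class file is turned into a runtime object) and string interning triggered explicitly by Java code (java.lang.String.intern()) are both handled by StringTable.
During interning, the StringTable::lookup() function is on the mandatory path; it probes whether a given string is already interned in StringTable. So with a breakpoint here, checking who the caller is the first time a given string passes through tells you what triggered the interning of that string.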
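The description above can be modeled in plain Java. This is only a toy sketch under my own assumptions, not HotSpot's implementation: ToyStringTable, its methods, and libraryStartup() are all hypothetical names, and a captured stack trace stands in for the conditional breakpoint plus bt.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyStringTable {
    // content -> canonical reference; the table only stores references.
    private final Map<String, String> table = new HashMap<String, String>();
    private final String watched;
    private StackTraceElement[] firstLookupTrace;

    public ToyStringTable(String watched) {
        this.watched = watched;
    }

    // Probe: every intern request passes through here first.
    public String lookup(String s) {
        if (firstLookupTrace == null && s.equals(watched)) {
            // Plays the role of the conditional breakpoint followed by bt.
            firstLookupTrace = new Throwable().getStackTrace();
        }
        return table.get(s);
    }

    public String intern(String s) {
        String existing = lookup(s);
        if (existing != null) {
            return existing;   // someone interned an equal string earlier
        }
        table.put(s, s);       // s itself becomes the canonical instance
        return s;
    }

    public StackTraceElement[] firstLookupTrace() {
        return firstLookupTrace;
    }

    // Hypothetical stand-in for library start-up code that references "java".
    static void libraryStartup(ToyStringTable t) {
        t.intern(new String("java"));
    }

    public static void main(String[] args) {
        ToyStringTable t = new ToyStringTable("java");
        libraryStartup(t);

        String str2 = new StringBuilder("ja").append("va").toString();
        System.out.println(t.intern(str2) == str2); // false: an equal string was already there

        // Frame 0 is lookup, frame 1 is intern, frame 2 is whoever asked first.
        System.out.println(t.firstLookupTrace()[2].getMethodName());
    }
}
```

The second line printed names the method that first caused "java" to be looked up, which is the same question the lldb session answers for the real VM.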

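If you don't know how to extract the symbol information for frame #19 and #20, don't be discouraged. After hitting the breakpoint in this experiment, run the finish command a few times to return up the stack. You will see that, on returning from the ClassFileParser::parseClassFile() function to the ClassLoader::load_classfile() function, the -XX:+TraceClassLoading feature prints this log line: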
[Loaded sun.misc.Version from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
Mixed in with the lldb output, it looks like this:
(lldb) finish
Process 9392 stopped
* thread #4: tid = 0x1213756, 0x0000000102997a1e libjvm.dylib`ClassFileParser::parseClassFile(Symbol*, Handle, Handle, KlassHandle, GrowableArray<Handle>*, TempNewSymbol&, bool, Thread*) + 10518, stop reason = step out
    frame #0: 0x0000000102997a1e libjvm.dylib`ClassFileParser::parseClassFile(Symbol*, Handle, Handle, KlassHandle, GrowableArray<Handle>*, TempNewSymbol&, bool, Thread*) + 10518
libjvm.dylib`ClassFileParser::parseClassFile(Symbol*, Handle, Handle, KlassHandle, GrowableArray<Handle>*, TempNewSymbol&, bool, Thread*) + 10518:
-> 0x102997a1e:  cmpq   $0x0, 0x8(%rbx)
   0x102997a23:  jne    0x102997d8a               ; ClassFileParser::parseClassFile(Symbol*, Handle, Handle, KlassHandle, GrowableArray<Handle>*, TempNewSymbol&, bool, Thread*) + 11394
   0x102997a29:  cmpq   $0x0, -0x200(%rbp)
   0x102997a31:  jne    0x102997a37               ; ClassFileParser::parseClassFile(Symbol*, Handle, Handle, KlassHandle, GrowableArray<Handle>*, TempNewSymbol&, bool, Thread*) + 10543
(lldb) finish
[Loaded sun.misc.Version from /Users/krismo/sdk/jdk1.7.0_45/Contents/Home/jre/lib/rt.jar]
Process 9392 stopped
* thread #4: tid = 0x1213756, 0x0000000102998dab libjvm.dylib`ClassLoader::load_classfile(Symbol*, Thread*) + 489, stop reason = step out
    frame #0: 0x0000000102998dab libjvm.dylib`ClassLoader::load_classfile(Symbol*, Thread*) + 489
libjvm.dylib`ClassLoader::load_classfile(Symbol*, Thread*) + 489:
-> 0x102998dab:  cmpq   $0x0, 0x8(%rbx)
   0x102998db0:  jne    0x102998dd9               ; ClassLoader::load_classfile(Symbol*, Thread*) + 535
   0x102998db2:  movq   %rax, %r14
   0x102998db5:  movq   %r15, %rdi
(lldb)  
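This is the log line that the HotSpot VM prints at the end of the ClassFileParser::parseClassFile() function, after it has created and initialized the Java mirror (the java.lang.Class object) for the class it just loaded.
That log line tells you which class had a static constant field referencing the "java" string.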

Fun, isn't it?

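I also ran the same experiment on Oracle JDK 8 and Zing JDK7 on Linux, and reached the same conclusion for both: the "java" string constant is put into the string constant pool, StringTable, when the sun.misc.Version class is initialized.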
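Checking whether another runtime shows the same symptom (without finding the culprit class) doesn't need a debugger. Here is a minimal sketch, assuming JDK 7 or later; InternProbe and its helpers are made-up names, and what it prints for "java" depends on the JDK / JRE it runs on, so no particular result is promised.

```java
public class InternProbe {
    // True if an equal string had already been interned before this call.
    // Assumes JDK 7+ semantics, where intern() can return the argument itself.
    static boolean alreadyInterned(String head, String tail) {
        String s = new StringBuilder(head).append(tail).toString();
        return s.intern() != s;
    }

    // Identifies the runtime using standard system properties.
    static String runtimeLabel() {
        return System.getProperty("java.vendor") + " "
            + System.getProperty("java.version") + " / "
            + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(runtimeLabel());
        System.out.println("\"java\" already interned: " + alreadyInterned("ja", "va"));
    }
}
```

This only reports the symptom. Identifying the class responsible still requires a debugger session like the one above.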

==============================================

Speaking of the sun.misc.Version class…

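In Oracle JDK7u / OpenJDK7u, the HotSpot VM uses the sun.misc.Version class to obtain the JDK's name (for example, whether it is Oracle JDK, OpenJDK, IcedTea, and so on) and its exact version information, so that it can print friendly diagnostic messages when needed, for example at the top of the hs_err file when the JVM crashes.
Um, well, sorry, I'm the one who implemented this feature: jdk7u/jdk7u/hotspot: src/share/vm/runtime/thread.cpp diff. The following line in it looks up the sun.misc.Version class, although it is not what triggers the initial load: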
  klassOop k = SystemDictionary::find(vmSymbols::sun_misc_Version(),
                                      Handle(), Handle(), CHECK_AND_CLEAR_NULL);
How time flies. When I built this feature I was still working on the JVM team at Taobao…

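I built this feature mainly because, back then, Oracle JDK and the Oracle HotSpot VM had a small amount of private implementation that made them very slightly incompatible with the OpenJDK HotSpot VM. If you ran the OpenJDK HotSpot VM together with the Oracle JDK class library, it could crash in very strange places.
So that such a mismatch could be spotted at a glance when a crash happened, I added this feature to the HotSpot VM. Just a small, humble contribution.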