Java String, StringBuilder & StringBuffer
Strings are an important part of almost every application we write: we constantly manipulate string data such as text files and structured or unstructured user input. If you have ever worked with C/C++, you know that handling strings there is complex work.
For example, in C/C++,
- the programmer has to deal with encodings: C/C++ provide char and wchar_t, which are used for different kinds of encoding
- C only supports strings as char arrays or wchar_t arrays; in C++, the standard library provides the string and wstring classes
Even though C++ has significantly improved string handling compared with C, it is still not as easy to use as Java.
What Is a Java String
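Java String is an immutable final class that provides many string operations. Because String is immutable:
- operations such as trim, split and concat return a new String instead of modifying the original, which has a performance cost when done repeatedly
- String is thread safe, since its content can never change
- strings can be cached and shared safely (see String Cache below)
- encoding handling is simpler, because the content and its encoding are fixed once the string is created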
The String class provides operations such as:
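- get the length (length)
- trim whitespace (trim)
- check if it is empty (isEmpty)
- find (indexOf, contains)
- check a prefix or suffix (startsWith, endsWith)
- extract a substring (substring)
- regex (matches, split, replaceAll)
- replace (replace)
- case conversion (toUpperCase, toLowerCase)
- conversion from other data types (valueOf)
- etc.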
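A short sketch exercising several of the operations listed above on one sample string. It also shows the immutability point: trim() hands back a new String while the original keeps its surrounding spaces. The class name StringOperationsDemo and the sample text are made up for illustration.

```java
import java.util.Locale;

public class StringOperationsDemo {
    public static void main(String[] args) {
        String original = "  Hello, Java String  ";

        // trim() returns a new String; 'original' itself is unchanged (immutability)
        String trimmed = original.trim();
        System.out.println("[" + original + "]");            // [  Hello, Java String  ]
        System.out.println("[" + trimmed + "]");             // [Hello, Java String]

        System.out.println(trimmed.length());                // 18
        System.out.println(trimmed.isEmpty());               // false
        System.out.println(trimmed.indexOf("Java"));         // 7
        System.out.println(trimmed.startsWith("Hello"));     // true
        System.out.println(trimmed.endsWith("String"));      // true
        System.out.println(trimmed.substring(7, 11));        // Java
        System.out.println(trimmed.matches("Hello, \\w+ \\w+")); // true
        System.out.println(trimmed.replace("Java", "JVM"));  // Hello, JVM String
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotted I)
        System.out.println(trimmed.toUpperCase(Locale.ROOT)); // HELLO, JAVA STRING
        System.out.println(String.valueOf(42) + String.valueOf(true)); // 42true

        // split() takes a regex and returns a new array of new Strings
        String[] parts = trimmed.split(",\\s*");
        System.out.println(parts.length);                    // 2
    }
}
```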
There are some differences in the String class between Java 8 and Java 9 onwards.
Java 8 and Before
- it uses a char array to hold the String data
- a char is 16 bits, so the content is stored as UTF-16
- getting the String length is easy: just return the length of the char array
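Because length() reports UTF-16 code units (the chars in the array), a character outside the Basic Multilingual Plane counts as 2. A minimal illustration using only the standard String and Character APIs; the class name is hypothetical. This behaviour of length() is the same on later JDKs, even though the internal storage changed.

```java
public class Utf16LengthDemo {
    public static void main(String[] args) {
        String ascii = "Java";
        // U+1F600 (grinning face) lies outside the BMP, so UTF-16 needs a surrogate pair
        String emoji = "\uD83D\uDE00";

        // length() returns the number of UTF-16 code units (chars), not characters
        System.out.println(ascii.length());                             // 4
        System.out.println(emoji.length());                             // 2
        // codePointCount counts Unicode code points instead
        System.out.println(emoji.codePointCount(0, emoji.length()));    // 1
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
    }
}
```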
Java 9 and Afterwards
- it uses a byte array to hold the String data
- it supports compact strings
- the encoding depends on whether COMPACT_STRINGS is enabled (it is by default; HotSpot can disable it with -XX:-CompactStrings) and whether the string can be encoded as Latin-1:
  - if COMPACT_STRINGS is not enabled, the string is always encoded as UTF-16
  - if COMPACT_STRINGS is enabled and the string can be encoded as Latin-1, the bytes are stored as Latin-1, one byte per character
  - otherwise, the string is encoded as UTF-16
- new methods such as isBlank, strip and repeat were added later, in Java 11
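The encoding rules above can be modelled in a few lines. This is a simplified sketch, not the JDK's implementation: the Coder enum, chooseCoder and storageBytes are hypothetical names, and the compactStrings parameter stands in for the JVM-wide COMPACT_STRINGS setting. The last lines call the Java 11 methods, so the block needs JDK 11 or newer.

```java
public class CompactStringsSketch {
    // Hypothetical model of the two encodings; not the JDK's internal fields
    enum Coder { LATIN1, UTF16 }

    // Picks an encoding following the rules described above (simplified model)
    static Coder chooseCoder(String s, boolean compactStrings) {
        if (!compactStrings) {
            return Coder.UTF16;
        }
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) { // not representable in Latin-1
                return Coder.UTF16;
            }
        }
        return Coder.LATIN1;
    }

    // Bytes needed for the content: 1 per char for Latin-1, 2 per char for UTF-16
    static int storageBytes(String s, boolean compactStrings) {
        return chooseCoder(s, compactStrings) == Coder.LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(chooseCoder("hello", true));        // LATIN1
        System.out.println(chooseCoder("caf\u00E9", true));    // LATIN1 (U+00E9 fits in one byte)
        System.out.println(chooseCoder("\u4F60\u597D", true)); // UTF16
        System.out.println(chooseCoder("hello", false));       // UTF16
        System.out.println(storageBytes("hello", true));       // 5
        System.out.println(storageBytes("hello", false));      // 10

        // Methods added in Java 11
        System.out.println("   ".isBlank());                   // true
        // strip() also removes Unicode whitespace such as U+2003 (EM SPACE); trim() would not
        System.out.println("[" + " hi\u2003".strip() + "]");    // [hi]
        System.out.println("ab".repeat(3));                    // ababab
    }
}
```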
String Cache
Measurements cited in JEP 192 show that in many large Java applications about 25% of the live heap data consists of String objects, and roughly half of those strings are duplicates. Based on this, the JVM tries to cache strings to avoid keeping duplicate copies in memory.
For example, assume the string "I am a String" is in memory and two variables reference it. Both variables share one copy instead of each holding its own, which saves memory. Caching is the mechanism that lets the JVM reduce duplicate strings: variables that refer to equal strings can share a single copy of the actual data.
Java provides a native method, intern(), that lets you add a string to the string pool manually. When intern() is called on a string:
- if the pool already contains an equal string (as determined by equals), the string from the pool is returned
- otherwise, the string is added to the pool and a reference to it is returned
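A minimal sketch of these intern() rules, using only standard String and StringBuilder methods (the class name is hypothetical):

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "aa";              // literals are placed in the string pool
        String created = new String("aa");  // 'new' always allocates a distinct object

        System.out.println(created == literal);      // false: different objects
        System.out.println(created.equals(literal)); // true: same content

        // intern() returns the pooled instance with the same content
        String pooled = created.intern();
        System.out.println(pooled == literal);       // true

        // A string built at run time is not pooled until intern() is called
        String built = new StringBuilder("b").append("b").toString();
        System.out.println(built == "bb");           // false
        System.out.println(built.intern() == "bb");  // true
    }
}
```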
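Calling intern() manually is not always a good approach, since you have to decide in code which strings to intern and every call costs a pool lookup. Since Java 8u20, the G1 garbage collector supports string deduplication (enabled with -XX:+UseStringDeduplication), which makes multiple String objects with the same content point to one copy of the underlying character data. The String objects themselves stay distinct, so == still treats them as different.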
For example, guess the output of the following code:
public class StringPoolExample {
    public static void main(String[] args) {
        var s = new String("aa");
        var q = new String("aa");
        System.out.println(s == q);
        var p = "aa";
        var t = "aa";
        System.out.println(p == t);
    }
}
The answer is:
false
true
From the example, we can see that:
- using new to create a string always creates a new String object
- string literals use the cache: an existing pooled string is returned if there is one
StringBuffer & StringBuilder
StringBuffer and StringBuilder are two classes that provide similar functionality for modifying strings. Unlike using the + operator on strings, StringBuffer and StringBuilder aim to reduce the number of String objects created while the string is being modified.
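To see the difference in practice, here is a hedged sketch that builds the same string in a loop both ways; the class and method names are illustrative. Both return the same text, but the + version creates a new String on every iteration, while the StringBuilder version keeps appending to one internal array.

```java
public class LoopConcatDemo {
    // Each += creates a new String and copies all previous characters into it
    static String joinWithPlus(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i;
        }
        return result;
    }

    // One StringBuilder is reused; append() writes into its internal array
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithPlus(5));    // 01234
        System.out.println(joinWithBuilder(5)); // 01234
        System.out.println(joinWithPlus(1000).equals(joinWithBuilder(1000))); // true
    }
}
```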
StringBuffer and StringBuilder have an internal array, like String, to hold the data:
- String is an immutable class, so any modification creates a new String object
- unlike String, any modification of a StringBuffer or StringBuilder updates the internal array instead of creating a new object every time
- if the current array is not large enough to hold the modified data, the array is resized
- the default capacity of StringBuffer and StringBuilder is 16; to improve performance, specify an initial capacity to reduce the number of resizes
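The capacity rules can be checked with StringBuilder.capacity(). The exact growth amount after a resize is an implementation detail, so this sketch (hypothetical class name) only checks that the capacity grew:

```java
public class CapacityDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        System.out.println(sb.capacity()); // 16: the default capacity

        sb.append("12345678901234567");    // 17 chars, more than the current capacity
        System.out.println(sb.capacity() >= 17); // true: the internal array was resized

        // Pre-sizing avoids intermediate resizes when the final size is roughly known
        StringBuilder sized = new StringBuilder(1024);
        System.out.println(sized.capacity()); // 1024

        // new StringBuilder(String) starts with the string's length plus 16
        System.out.println(new StringBuilder("abc").capacity()); // 19
    }
}
```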
The difference between StringBuffer and StringBuilder is that:
- StringBuffer is thread safe: its public methods are synchronized
- StringBuilder is not thread safe: if an instance is shared between threads, you must synchronize access yourself
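A sketch of what that synchronization buys: several threads append to one shared StringBuffer, and because each append() is synchronized, no appends are lost. The class name and the appendConcurrently helper are hypothetical.

```java
public class StringBufferThreadDemo {
    static int appendConcurrently(int threads, int appendsPerThread) throws InterruptedException {
        StringBuffer buffer = new StringBuffer(); // every append() is synchronized
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    buffer.append('x');
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // wait for all writers before reading the result
        }
        return buffer.length();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(appendConcurrently(4, 10_000)); // 40000
    }
}
```

Replacing StringBuffer with StringBuilder in this sketch introduces a data race: the final length can come out smaller than expected, or an append can throw ArrayIndexOutOfBoundsException, and the outcome varies from run to run.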
Consider the following code, compiled with javac and disassembled with javap under different JDKs:
public class StringConcatExample {
    public static void main(String[] args) {
        System.out.println(concat1());
        System.out.println(concat2());
    }

    public static String concat1() {
        String a = "aa";
        String b = "bb";
        String str = a + b + "cc";
        return str;
    }

    public static String concat2() {
        String str = "aa" + "bb" + "cc";
        return str;
    }
}
JDK 8 output (javap excerpt):
...
  public static java.lang.String concat1();
    descriptor: ()Ljava/lang/String;
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=3, args_size=0
         0: ldc           #6                  // String aa
         2: astore_0
         3: ldc           #7                  // String bb
         5: astore_1
         6: new           #8                  // class java/lang/StringBuilder
         9: dup
        10: invokespecial #9                  // Method java/lang/StringBuilder."<init>":()V
        13: aload_0
        14: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        17: aload_1
        18: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        21: ldc           #11                 // String cc
        23: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        26: invokevirtual #12                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
        29: astore_2
        30: aload_2
        31: areturn
      LineNumberTable:
        line 8: 0
        line 9: 3
        line 10: 6
        line 11: 30

  public static java.lang.String concat2();
    descriptor: ()Ljava/lang/String;
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=1, args_size=0
         0: ldc           #13                 // String aabbcc
         2: astore_0
         3: aload_0
         4: areturn
      LineNumberTable:
        line 15: 0
        line 16: 3
}
...
JDK 19 output (javap excerpt):
...
  public static java.lang.String concat1();
    descriptor: ()Ljava/lang/String;
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=3, args_size=0
         0: ldc           #28                 // String aa
         2: astore_0
         3: ldc           #30                 // String bb
         5: astore_1
         6: aload_0
         7: aload_1
         8: invokedynamic #32,  0             // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
        13: astore_2
        14: aload_2
        15: areturn
      LineNumberTable:
        line 10: 0
        line 11: 3
        line 12: 6
        line 13: 14
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            3      13     0     a   Ljava/lang/String;
            6      10     1     b   Ljava/lang/String;
           14       2     2   str   Ljava/lang/String;

  public static java.lang.String concat2();
    descriptor: ()Ljava/lang/String;
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=1, args_size=0
         0: ldc           #36                 // String aabbcc
         2: astore_0
         3: aload_0
         4: areturn
      LineNumberTable:
        line 17: 0
        line 18: 3
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            3       2     0   str   Ljava/lang/String;
}
...
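We can see that:
- both JDK 8 and JDK 19 fold the concatenation in method concat2 into a single constant String, "aabbcc", at compile time
- JDK 8 compiles the concatenation in method concat1 into a chain of StringBuilder append calls
- JDK 19 compiles the concatenation in method concat1 into an invokedynamic instruction whose bootstrap method is makeConcatWithConstants (in java.lang.invoke.StringConcatFactory); the concatenation strategy is chosen at run time, so it is decoupled from the generated bytecode and newer JDKs can improve it without recompiling (this has been javac's default since JDK 9)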
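The two compiled forms can also be compared from source. This sketch, with hypothetical class and method names, writes out by hand roughly what javac 8 generates for concat1 and then checks reference identity: the run-time result of concat1 is a newly created String, while the folded constant returned by concat2 is the interned literal.

```java
public class ConcatEquivalence {
    // Roughly what javac 8 emits for a + b + "cc" (compare the JDK 8 javap output)
    static String concatLikeJdk8(String a, String b) {
        return new StringBuilder().append(a).append(b).append("cc").toString();
    }

    static String concat1() {
        String a = "aa";
        String b = "bb";
        return a + b + "cc"; // evaluated at run time: a and b are not final, so not constants
    }

    static String concat2() {
        return "aa" + "bb" + "cc"; // constant expression, folded by javac into "aabbcc"
    }

    public static void main(String[] args) {
        System.out.println(concat1().equals(concatLikeJdk8("aa", "bb"))); // true
        // A run-time concatenation yields a newly created String
        System.out.println(concat1() == "aabbcc"); // false
        // A folded constant is a literal, so it is interned and shared
        System.out.println(concat2() == "aabbcc"); // true
    }
}
```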