Improve Memory Consumption Using Java’s String Pool

Rotem Benishti on Feb 6, 2021

We all learned not to use the equality operator (==) to check whether two strings are equal in Java. Why? Because == compares the references (pointers) to the two objects, not the values they hold.

For example:

Example 1

public static void main(String[] args) {
    String firstStr = new String("example");
    String secondStr = new String("example");
    if (firstStr == secondStr) {
        System.out.println("The strings are equal!");
    } else {
        System.out.println("The strings are NOT equal!");
    }
}
// It will print: The strings are NOT equal!
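
To compare the contents of two strings rather than their references, the usual tool is equals(). Here is a minimal sketch of the difference; the class name StringComparison and its two helper methods are made up for this illustration:

```java
final class StringComparison {

    // Compares references: true only if both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Compares contents: true if both strings hold the same characters.
    static boolean sameValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String firstStr = new String("example");
        String secondStr = new String("example");

        System.out.println(sameReference(firstStr, secondStr)); // false
        System.out.println(sameValue(firstStr, secondStr));     // true
    }
}
```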

Let's Mess Up Your Mind A Little Bit

Here is another simple piece of Java code:

Example 2

public static void main(String[] args) {
    String firstStr = "example";
    String secondStr = "example";
    if (firstStr == secondStr) {
        System.out.println("The strings are equal!");
    } else {
        System.out.println("The strings are NOT equal!");
    }
}
// It will print: The strings are equal!

How can these two strings be equal when == only compares pointers?

Java's String Pool

A Java String Pool is a special memory region where strings are stored in the JVM.

Before Java 7, the JVM placed this special memory region in the PermGen space, which has a fixed size that cannot be expanded at runtime. That means we could get an OutOfMemoryError from the JVM if there were too many strings in the string pool.

From Java 7 onwards, the Java string pool is stored in the heap space, which is where all objects are stored on the JVM (and garbage collected from there). The advantage of this change is that it reduces the risk of getting an OutOfMemoryError, because strings in the string pool are garbage collected once nothing points to them anymore.

Which strings are stored in Java’s String Pool?

The strings that will be stored are:

  1. Literal strings, for example:
String myString = "test string pool";

The string above will be stored in the string pool automatically because it is a literal string.

  2. Interned strings. For example:
String myString = new String("test string pool").intern();

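The intern method of the String class explicitly asks the JVM to put the string in the string pool. If an equal string is already in the pool, intern returns that pooled instance instead of adding a new one.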
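
The two cases above can be checked with a short sketch. The class name InternDemo and the helper sameObject are invented for this illustration; the behavior shown relies only on how string literals and String.intern() are specified:

```java
final class InternDemo {

    // Returns true when both variables refer to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "test string pool";          // pooled automatically
        String copy = new String("test string pool"); // a separate object outside the pool
        String interned = copy.intern();              // the pooled instance

        System.out.println(sameObject(literal, copy));     // false
        System.out.println(sameObject(literal, interned)); // true
        System.out.println(copy.equals(interned));         // true
    }
}
```

Note that intern() does not move the original object into the pool. It returns the pooled instance, so the result has to be stored and used in place of the original reference.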

When and how should I use it?

The string pool provides you with a special memory region in the JVM which should be used for reducing the memory usage of a program.

How?

As we saw in example 2, firstStr and secondStr point to the same object in memory, which means that we have a single String object in memory (“example”) instead of two strings with the same value.

**We saved memory!**

[Image: Java's string pool in the heap]

Going Over A Real Example

Suppose you are building a server written in Java that has to store the state of each student in memory. A student’s state holds the names of all the courses that student is taking.

For example, David is participating in “Programming Fundamentals”, “Programming”, “Computer Systems” and “Databases 1”.

Great! Here is our database (stored in the file studentsAndCourses.json):

STUDENTS COURSES DATABASE

{
    "David": [
        "Programming Fundamentals",
        "Programming",
        "Computer Systems",
        "Databases 1"
    ],
    "Isabella": ["Accounting", "Math", "Programming Fundamentals", "Databases 1"],
    "James": ["Accounting", "Math", "Programming Fundamentals", "Databases 1"],
    "Olivia": [
        "Accounting",
        "Computer Systems",
        "Programming Fundamentals",
        "Databases 1"
    ]
}

Let’s read it and create our cache from it:

// Assumption: the JSON classes come from the json-simple library (org.json.simple).
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class StudentsCache {
    private static final String STUDENTS_DB_FILE_NAME = "studentsAndCourses.json";
    private final Map<String, List<String>> studentsToCourses;

    public StudentsCache() throws IOException, ParseException {
        this.studentsToCourses = createCacheFromStudentsDB();
        System.out.println(studentsToCourses);
    }

    private Map<String, List<String>> createCacheFromStudentsDB() throws IOException, ParseException {
        InputStream is = StudentsCache.class.getClassLoader().getResourceAsStream(STUDENTS_DB_FILE_NAME);
        assert is != null;
        Reader dbReader = new InputStreamReader(is);
        JSONParser parser = new JSONParser();
        JSONObject dbData = (JSONObject) parser.parse(dbReader);
        // Put our JSON data into the cache
        // Each key in our Map is a student's name
        // Each value is the list of courses the student attends.
        Map<String, List<String>> studentsToCourses = new HashMap<>();
        for (Object studentName : dbData.keySet()) {
            List<String> coursesNames = new ArrayList<>();
            JSONArray coursesList = (JSONArray) dbData.get(studentName);
            for (Object courseName : coursesList) {
                coursesNames.add((String) courseName);
            }
            studentsToCourses.put((String) studentName, coursesNames);
        }
        return studentsToCourses;
    }

    public static void main(String[] args) throws IOException, ParseException {
        // Initialize the cache.
        StudentsCache s = new StudentsCache();
        System.out.println();
    }
}

Now, let’s use the debugger to see the objects in memory:

[Image: debugger view of the cache before interning]

Note: (char[24]@534): the number after the @ sign is an identifier the debugger assigns to each object instance. Different numbers mean different objects, so the debugger view above shows a separate object for each occurrence of the same “Programming Fundamentals” string.

As we can see from the debugger, the course “Programming Fundamentals” appears in every student's list. As a result, we have 4 different instances of the same string stored in memory, which is a big waste of memory!

Let’s use what we’ve learned about the string pool. We call intern to put the course names in the string pool, which makes sure every list uses the same string stored in memory.

Here is the same code with minimal change:

...
...
        // Put our JSON data into the cache
        // Each key in our Map is a student's name
        // Each value is the list of courses the student attends.
        Map<String, List<String>> studentsToCourses = new HashMap<>();
        for (Object studentName : dbData.keySet()) {
            List<String> coursesNames = new ArrayList<>();
            JSONArray coursesList = (JSONArray) dbData.get(studentName);
            for (Object courseName : coursesList) {
                // -- Change: ------------------------------------
                // We are now interning the string!
                // Cast to String first, because Object has no intern() method.
                coursesNames.add(((String) courseName).intern());
                // -----------------------------------------------
            }
            studentsToCourses.put((String) studentName, coursesNames);
        }
        return studentsToCourses;
    }
...
...

We have added a call to the String method intern() which explicitly tells the JVM to add this String into the string pool. Why should we do that? Because we know that there will be multiple instances of the same course name and we would like to save memory!
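
The effect can also be checked without a debugger. The sketch below is a simplified stand-in for the cache, not the code above: the class InternedCoursesDemo and its methods are hypothetical, and parsed() merely simulates a parser that allocates a fresh String for every value it reads. It counts distinct String objects by identity rather than by value:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class InternedCoursesDemo {

    // Simulates parsed input: every call returns a brand-new String object.
    static String parsed(String value) {
        return new String(value);
    }

    // Counts how many distinct String objects (by identity, not by value) the list holds.
    static int distinctInstances(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(strings);
        return identities.size();
    }

    // Builds a list with one course name per student, optionally interning each name.
    static List<String> loadCourses(int students, boolean intern) {
        List<String> courses = new ArrayList<>();
        for (int i = 0; i < students; i++) {
            String name = parsed("Programming Fundamentals");
            courses.add(intern ? name.intern() : name);
        }
        return courses;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(loadCourses(4, false))); // 4
        System.out.println(distinctInstances(loadCourses(4, true)));  // 1
    }
}
```

Without interning, the four students hold four separate objects. With interning, they all share one.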

Let’s run the debugger once again:

[Image: debugger view of the cache after interning]

As we can see, the course “Programming Fundamentals” (and every other course) is now stored in memory just once instead of 4 times. In addition, we can see that all of the “Programming Fundamentals” strings refer to the same object in memory, @516.

Now consider a production application like the well-known website Udemy. Udemy has a huge number of courses to display. If it held all of their names in memory for every user without something like the string pool, the duplicated strings would waste a large amount of memory and the application could eventually run out of it. By using something like the string pool, they can make sure that only one instance of each course name is kept in the memory cache instead of many copies.

This was a simple solution to the kind of memory consumption issues you will encounter in real-life production applications. I’m sure you are now better equipped to tackle them.

© 2020-present Sagi Liba. All Rights Reserved