1. Why We Don’t Recommend Enabling HeapDumpOnOutOfMemoryError
1.1. When HeapDumpOnOutOfMemoryError is Enabled, Which OutOfMemoryErrors Actually Trigger It?
Here’s something interesting: once you enable HeapDumpOnOutOfMemoryError, not every OutOfMemoryError will actually trigger a heap dump! Let’s break down the different kinds of OutOfMemoryError and see which ones play along:
- OutOfMemoryError: Java heap space and OutOfMemoryError: GC overhead limit exceeded: Both of these indicate insufficient Java heap memory. One occurs during allocation when there’s not enough space left, while the other hits a specific threshold. Both of these WILL trigger HeapDumpOnOutOfMemoryError.
- OutOfMemoryError: unable to create native thread: This happens when the system can’t create new platform threads. This one WON’T trigger HeapDumpOnOutOfMemoryError.
- OutOfMemoryError: Requested array size exceeds VM limit: Thrown when the requested array size exceeds heap memory limits. This WILL trigger HeapDumpOnOutOfMemoryError.
- OutOfMemoryError: Compressed class space and OutOfMemoryError: Metaspace: Both relate to metaspace issues. Both WILL trigger HeapDumpOnOutOfMemoryError.
- OutOfMemoryError: Cannot reserve xxx bytes of direct buffer memory (allocated: xxx, limit: xxx): In DirectByteBuffer, the system first requests quota from the Bits class, which maintains a global totalCapacity variable tracking all DirectByteBuffer sizes. You can limit this with -XX:MaxDirectMemorySize. This WON’T trigger HeapDumpOnOutOfMemoryError.
- OutOfMemoryError: map failed: This occurs during file memory mapping (MMAP) when system memory is insufficient. This WON’T trigger HeapDumpOnOutOfMemoryError.
There are also some additional cases:
- Shenandoah allocation region bitmap memory issues that trigger OutOfMemoryError WILL trigger HeapDumpOnOutOfMemoryError.
- OutOfMemoryError: Native heap allocation failed: The message might vary across operating systems, but typically includes “native heap.” This usually isn’t related to the Java object heap but rather to other memory allocation failures. These WON’T trigger HeapDumpOnOutOfMemoryError.
1.2. Why We Recommend Against Enabling HeapDumpOnOutOfMemoryError
Let’s dive into how HeapDumpOnOutOfMemoryError actually works:
The JVM enters a safepoint, pausing all application threads. For HeapDumpOnOutOfMemoryError specifically, it uses single-threaded dumping (unlike jcmd/jmap which can use multiple threads) to create multiple files. Then it exits the safepoint.
These multiple files are then merged into one and compressed.
The main bottleneck here is the first step - the writing process - and specifically, disk I/O performance. Let’s look at some real-world cloud storage performance standards:
- AWS EFS (standard storage): https://docs.aws.amazon.com/efs/latest/ug/performance.html
- AWS EBS (SSD equivalent): https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html
For a 4GB heap on EFS (which corresponds to a disk of under 100GB), writing would take at least 4 * 1024 / 300 = 13.65 seconds at 300 MB/s (and that’s at peak performance!). If peak performance is already being used elsewhere, you’re looking at 4 * 1024 / 15 = 273 seconds at 15 MB/s. Even with EBS at 1000 MB/s, you’d still need 4 * 1024 / 1000 ≈ 4 seconds. Remember, this is the time your application threads are completely frozen in a stop-the-world state! And this doesn’t even account for multiple container instances sharing the same machine. From a cost perspective, we can’t exactly give every microservice AWS EBS (SSD equivalent) storage.
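The arithmetic above is easy to rerun for your own heap size and storage tier. A minimal sketch follows. The class and method names are hypothetical, and it assumes the dump is roughly the size of the heap and that disk throughput is the only bottleneck.

```java
public class DumpPauseEstimate {
    // Lower bound on the stop-the-world pause, in seconds:
    // heap size (GB, converted to MB) divided by write throughput (MB/s).
    static double pauseSeconds(double heapGb, double throughputMbPerSec) {
        return heapGb * 1024 / throughputMbPerSec;
    }

    public static void main(String[] args) {
        System.out.println(pauseSeconds(4, 300));   // EFS at peak throughput
        System.out.println(pauseSeconds(4, 15));    // EFS without peak throughput
        System.out.println(pauseSeconds(4, 1000));  // EBS
    }
}
```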
So our recommendation? Skip HeapDumpOnOutOfMemoryError altogether!
2. What to Use Instead of HeapDumpOnOutOfMemoryError?
2.1. Use JFR for Memory Leak Detection
When I need to track down OutOfMemoryError issues, I typically rely on JFR’s Object Allocation Sample and Old Object Sample data to pinpoint problematic objects. Only when these approaches don’t yield results do I consider generating a heap dump.
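As a rough illustration, the two events can be switched on programmatically through the jdk.jfr API. This is only a sketch, not necessarily how the recording is set up in practice. It assumes a JDK that ships the jdk.ObjectAllocationSample and jdk.OldObjectSample events, and the class name, the sleep, and the output file name are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.Recording;

public class LeakRecording {
    // Starts a recording with the two events mentioned above,
    // capturing stack traces so samples can be traced back to code.
    static Recording start() {
        Recording recording = new Recording();
        recording.enable("jdk.ObjectAllocationSample").withStackTrace();
        recording.enable("jdk.OldObjectSample").withStackTrace();
        recording.start();
        return recording;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        try (Recording recording = start()) {
            Thread.sleep(10_000); // stand-in for real application work
            recording.dump(Path.of("leak-hunt.jfr")); // hypothetical file name
        }
    }
}
```

The resulting file can then be opened in JDK Mission Control, or inspected on the command line with something like jfr print --events OldObjectSample leak-hunt.jfr.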
2.2. Why Should Microservices Experiencing OutOfMemoryError Be Restarted?
Here’s the thing: most code, including JDK source code, doesn’t account for OutOfMemoryError at every memory allocation point. This can lead to inconsistent application state. For example, if an OutOfMemoryError is thrown partway through a HashMap rehash operation, the map is left partially updated and its state is corrupted. On top of that, libraries rarely catch Throwable; they typically only catch Exception, so an OutOfMemoryError passes straight through their error handling.
It’s simply not practical to handle OutOfMemoryError at every memory allocation point. To prevent unexpected consistency issues caused by OutOfMemoryError, the safest approach is to take the service offline and restart it.
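To make the point about catch (Exception) concrete, here is a small sketch. It uses a hypothetical library-style wrapper and a simulated OutOfMemoryError (constructed by hand, not a real allocation failure) to show that the error bypasses the wrapper’s handler.

```java
public class CatchScope {
    // Hypothetical library-style method: guards its work with catch (Exception).
    static String runGuarded(Runnable task) {
        try {
            task.run();
            return "completed";
        } catch (Exception e) {
            return "handled: " + e.getClass().getSimpleName();
        }
    }

    // Calls the guarded method and reports whether an OutOfMemoryError got past it.
    static String probe(Runnable task) {
        try {
            return runGuarded(task);
        } catch (OutOfMemoryError e) {
            return "escaped: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // prints "handled: IllegalStateException"
        System.out.println(probe(() -> { throw new IllegalStateException("boom"); }));
        // prints "escaped: OutOfMemoryError"
        System.out.println(probe(() -> { throw new OutOfMemoryError("simulated"); }));
    }
}
```

Any cleanup the wrapper would normally do in its catch block is skipped in the second case, which is one way state ends up inconsistent.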
2.3. How to Implement Automatic Restart for Microservices Experiencing OutOfMemoryError?
You can use -XX:OnOutOfMemoryError="/path/to/script.sh" to specify a script that handles:
- Graceful microservice shutdown
- Microservice restart
For Spring Boot applications, consider enabling local access to /actuator/shutdown to gracefully shut down the microservice. Some community members report that this can hang when an OutOfMemoryError occurs; this might be due to having HeapDumpOnOutOfMemoryError enabled, as described in section 1.2. Once the instance is down, Kubernetes will automatically spin up a new one.
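The shutdown call itself is a plain HTTP POST. Below is a minimal sketch of that request using the JDK’s built-in HTTP client. It assumes the actuator is reachable on localhost port 8080 and that the shutdown endpoint has been enabled and exposed; the class name, port, and timeouts are illustrative, not taken from the setup described above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ActuatorShutdown {
    // Assumption: the actuator listens locally on port 8080; adjust as needed.
    static final String SHUTDOWN_URL = "http://localhost:8080/actuator/shutdown";

    // Builds the POST request; the shutdown endpoint expects an empty body.
    static HttpRequest shutdownRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpResponse<String> response =
                client.send(shutdownRequest(SHUTDOWN_URL), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```

The timeouts matter here: if the service hangs as described above, the caller should give up and let the restart proceed by other means instead of waiting indefinitely.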