Why HeapDumpOnOutOfMemoryError Should Be Avoided in Production

NeatGuyCoding

1. Why We Don’t Recommend Enabling HeapDumpOnOutOfMemoryError

1.1. When HeapDumpOnOutOfMemoryError is Enabled, Which OutOfMemoryErrors Actually Trigger It?

Here’s something interesting - once you enable HeapDumpOnOutOfMemoryError, not every OutOfMemoryError will actually trigger a heap dump! Let’s break down the different types of OutOfMemoryError exceptions and see which ones play along:

  1. OutOfMemoryError: Java heap space and OutOfMemoryError: GC overhead limit exceeded: Both indicate insufficient Java heap memory. The first occurs during allocation when there’s not enough space left; the second is thrown when the JVM spends too much time in garbage collection while reclaiming very little heap. Both of these WILL trigger HeapDumpOnOutOfMemoryError.

  2. OutOfMemoryError: unable to create native thread: This happens when the system can’t create new platform threads. This one WON’T trigger HeapDumpOnOutOfMemoryError.

  3. OutOfMemoryError: Requested array size exceeds VM limit: Thrown when the requested array length exceeds the maximum the VM supports. This WILL trigger HeapDumpOnOutOfMemoryError.

  4. OutOfMemoryError: Compressed class space and OutOfMemoryError: Metaspace: Both relate to metaspace issues. Both WILL trigger HeapDumpOnOutOfMemoryError.

  5. OutOfMemoryError: Cannot reserve xxx bytes of direct buffer memory (allocated: xxx, limit: xxx): Before allocating a DirectByteBuffer, the JDK first reserves quota through the Bits class, which maintains a global totalCapacity variable tracking the total size of all DirectByteBuffers. You can limit this with -XX:MaxDirectMemorySize. This WON’T trigger HeapDumpOnOutOfMemoryError.

  6. OutOfMemoryError: map failed: This occurs during file memory mapping (MMAP) when system memory is insufficient. This WON’T trigger HeapDumpOnOutOfMemoryError.

There are also some additional cases:

  1. An OutOfMemoryError raised because Shenandoah cannot allocate memory for its region bitmaps WILL trigger HeapDumpOnOutOfMemoryError.

  2. OutOfMemoryError: Native heap allocation failed: The message might vary across operating systems, but typically includes “native heap.” This usually isn’t related to the Java object heap but to other memory allocation failures. These WON’T trigger HeapDumpOnOutOfMemoryError.
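The two lists above can be condensed into a small lookup. What follows is only a sketch of that summary, not JVM code: the class name OomHeapDumpTable and the method triggersHeapDump are hypothetical, the matching is a plain substring check on the error message, and the Shenandoah case is left out because the list above gives no message text for it.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper that encodes the lists above; it is not part of the JDK.
final class OomHeapDumpTable {
    // Message fragments for errors that WILL produce a dump, per the lists above.
    private static final String[] DUMPS = {
        "java heap space",
        "gc overhead limit exceeded",
        "requested array size exceeds vm limit",
        "compressed class space",
        "metaspace"
    };

    // Message fragments for errors that WON'T produce a dump, per the lists above.
    private static final String[] NO_DUMP = {
        "unable to create native thread",
        "direct buffer memory",
        "map failed",
        "native heap"
    };

    // Returns true/false for messages covered by the lists, empty for anything else.
    static Optional<Boolean> triggersHeapDump(String message) {
        String m = message.toLowerCase(Locale.ROOT);
        for (String marker : DUMPS) {
            if (m.contains(marker)) {
                return Optional.of(true);
            }
        }
        for (String marker : NO_DUMP) {
            if (m.contains(marker)) {
                return Optional.of(false);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Java heap space", "Metaspace", "Map failed", "unable to create native thread"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + triggersHeapDump(sample));
        }
    }
}
```

A lookup like this could be handy in log tooling, for example to decide whether it is worth searching for a dump file after an incident.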

1.2. Why We Recommend Against Enabling HeapDumpOnOutOfMemoryError

Let’s dive into how HeapDumpOnOutOfMemoryError actually works:

  1. The JVM enters a safepoint, pausing all application threads. For HeapDumpOnOutOfMemoryError specifically, it uses single-threaded dumping (unlike jcmd/jmap which can use multiple threads) to create multiple files. Then it exits the safepoint.

  2. These multiple files are then merged into one and compressed.

The main bottleneck here is the first step - the writing process - and specifically, disk I/O performance. Let’s look at some real-world cloud storage performance standards:

  1. AWS EFS (standard storage): https://docs.aws.amazon.com/efs/latest/ug/performance.html
  2. AWS EBS (SSD equivalent): https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html

For a 4GB heap on EFS (the tier that corresponds to a disk under 100GB), writing the dump would take at least 4 * 1024 MB / 300 MB/s = 13.65 seconds, and that’s at peak performance! If peak performance is already being used elsewhere, you’re looking at 4 * 1024 / 15 = 273 seconds. Even with EBS, you’d still need 4 * 1024 / 1000 = about 4 seconds.

Remember, this is time your application threads spend completely frozen in a stop-the-world state! And it doesn’t even account for multiple container instances sharing the same machine. From a cost perspective, we can’t exactly give every microservice AWS EBS (SSD equivalent) storage.
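To redo this arithmetic for your own heap size and storage, here is a minimal sketch. The class DumpPauseEstimate and the method pauseSeconds are hypothetical names. The throughput figures of 300, 15, and 1000 MB/s are the ones used in the calculation above, and the estimate assumes the dump is roughly as large as the heap.

```java
import java.util.Locale;

// Hypothetical calculator for the lower bound on the stop-the-world write time.
final class DumpPauseEstimate {
    // Seconds needed to write heapGiB of data at throughputMBps megabytes per second.
    static double pauseSeconds(double heapGiB, double throughputMBps) {
        if (throughputMBps <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return heapGiB * 1024 / throughputMBps;
    }

    public static void main(String[] args) {
        double heapGiB = 4;
        // Throughput values taken from the calculation in the text above.
        double[] throughputs = {300, 15, 1000};
        String[] labels = {"EFS at peak", "EFS with peak used elsewhere", "EBS"};
        for (int i = 0; i < throughputs.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%s: %.2f s",
                labels[i], pauseSeconds(heapGiB, throughputs[i])));
        }
    }
}
```

The result is a floor, not a prediction: it ignores contention from other containers writing to the same storage.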

So our recommendation? Skip HeapDumpOnOutOfMemoryError altogether!

2. What to Use Instead of HeapDumpOnOutOfMemoryError?

2.1. Use JFR for Memory Leak Detection

When I need to track down OutOfMemoryError issues, I typically rely on JFR’s Object Allocation Sample and Old Object Sample data to pinpoint problematic objects. Only when these approaches don’t yield results do I consider generating a heap dump.
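As a rough illustration, the two event types can be switched on programmatically through the jdk.jfr API. This is a sketch under assumptions, not a recommended production setup: the class LeakRecording, the recording name, and the 30-minute retention are made up for the example, and jdk.ObjectAllocationSample is only available on newer JDKs (16 and later, as far as I know).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

// Hypothetical helper that records only the two events mentioned above.
final class LeakRecording {
    static Recording startLeakRecording() {
        Recording recording = new Recording();
        recording.setName("oom-investigation"); // illustrative name
        // Samples of objects that stay alive, with the allocating stack trace.
        recording.enable("jdk.OldObjectSample").withStackTrace();
        // Samples of allocations, showing which code paths allocate the most.
        recording.enable("jdk.ObjectAllocationSample").withStackTrace();
        // Keep a bounded window of data; 30 minutes is an arbitrary choice.
        recording.setMaxAge(Duration.ofMinutes(30));
        recording.start();
        return recording;
    }

    // Writes what has been recorded so far; the recording keeps running.
    static Path dumpTo(Recording recording, Path target) throws IOException {
        recording.dump(target);
        return target;
    }

    public static void main(String[] args) throws IOException {
        try (Recording recording = startLeakRecording()) {
            // ... the application workload would run here ...
            Path out = dumpTo(recording, Files.createTempFile("oom-investigation", ".jfr"));
            System.out.println("Wrote " + out);
        }
    }
}
```

The resulting .jfr file can be opened in JDK Mission Control or printed with the jfr command-line tool that ships with the JDK.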

2.2. Why Should Microservices Experiencing OutOfMemoryError Be Restarted?

Here’s the thing - most code, including JDK source code, doesn’t account for OutOfMemoryError at every memory allocation point. This can lead to inconsistent application state. For example, if an OutOfMemoryError is thrown partway through a HashMap rehash operation, the map can be left in a partially updated, inconsistent state. Most libraries rarely catch Throwable; they typically catch only Exception.
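Here is a toy sketch of both effects. It doesn’t exhaust real memory; the OutOfMemoryError is thrown by hand halfway through a two-step update, and the class PartialUpdateDemo with its fields is invented purely for illustration.

```java
// Hypothetical demo: an update interrupted by a simulated OutOfMemoryError.
final class PartialUpdateDemo {
    private int debited;
    private int credited;

    // Two-step update; the invariant is debited == credited after each call.
    void transfer(int amount, boolean simulateOom) {
        debited += amount;
        if (simulateOom) {
            // Stands in for an allocation that fails between the two steps.
            throw new OutOfMemoryError("simulated allocation failure");
        }
        credited += amount;
    }

    boolean consistent() {
        return debited == credited;
    }

    static String attempt(PartialUpdateDemo state, boolean simulateOom) {
        try {
            try {
                state.transfer(10, simulateOom);
                return "ok";
            } catch (Exception e) {
                // Typical library-style handler: never sees an Error.
                return "caught by catch (Exception)";
            }
        } catch (OutOfMemoryError e) {
            return "OutOfMemoryError bypassed catch (Exception)";
        }
    }

    public static void main(String[] args) {
        PartialUpdateDemo state = new PartialUpdateDemo();
        System.out.println(attempt(state, false) + ", consistent=" + state.consistent());
        System.out.println(attempt(state, true) + ", consistent=" + state.consistent());
    }
}
```

After the second call the object is still reachable and still in use, but its invariant no longer holds, which is exactly the kind of state a restart clears.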

It’s simply not practical to handle OutOfMemoryError at every memory allocation point. To prevent unexpected consistency issues caused by OutOfMemoryError, the safest approach is to take the service offline and restart it.

2.3. How to Implement Automatic Restart for Microservices Experiencing OutOfMemoryError?

You can use -XX:OnOutOfMemoryError="/path/to/script.sh" to specify a script that handles:

  1. Graceful microservice shutdown
  2. Microservice restart

For Spring Boot applications, consider enabling local access to /actuator/shutdown to gracefully shut down the microservice. Some community members report that this can hang when an OutOfMemoryError occurs; this might be caused by having HeapDumpOnOutOfMemoryError enabled, as described in section 1.2. Kubernetes will automatically spin up a new instance.
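The script needs some way to call the shutdown endpoint. One option is a tiny Java client, sketched below under assumptions: the class ActuatorShutdown is hypothetical, http://localhost:8080 is a guessed default for the local address, and the timeouts are arbitrary. The endpoint itself has to be enabled and exposed in the application’s actuator configuration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client that a restart script could invoke.
final class ActuatorShutdown {
    // Builds the POST request; the shutdown endpoint does not accept GET.
    static HttpRequest shutdownRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/shutdown"))
            .timeout(Duration.ofSeconds(5)) // arbitrary; guards against a hung endpoint
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed default address; pass the real one as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
        try {
            HttpResponse<String> response =
                client.send(shutdownRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
            System.out.println("Shutdown response: " + response.statusCode());
        } catch (IOException e) {
            // Includes timeouts; the calling script can then stop the process forcibly.
            System.err.println("Graceful shutdown request failed: " + e);
        }
    }
}
```

The request timeout matters here: if the endpoint hangs as some users report, the script gets control back after a few seconds and can fall back to terminating the process.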
