microservices

Why HeapDumpOnOutOfMemoryError Should Be Avoided in Production

1 May 2025·702 words·4 mins

A comprehensive guide exploring why enabling HeapDumpOnOutOfMemoryError can cause significant performance issues in production environments, which OutOfMemoryError types actually trigger heap dumps, and better alternatives like JFR for memory leak detection and automatic service restart strategies.

Can GraalVM Native Image Processes Be Detected by jps? Plus Our Production Strategy

19 April 2024·335 words·2 mins

Discover when GraalVM Native Image processes show up in jps and learn our battle-tested approach for choosing between GraalVM Native Image and JVM in production environments. We break down our strategy for Lambda-style tasks versus long-running microservices.

Maximizing Request Throughput to Third-Party APIs: A Practical Testing Approach

18 April 2024·892 words·5 mins

Learn how to develop and test high-performance API clients using WebClient, Testcontainers, and Toxiproxy. This comprehensive guide covers asynchronous request handling, isolated testing environments, and realistic failure simulation for robust microservice development.

The Hidden Performance Killer: Why Code Location in Logs Can Destroy Your Microservice Performance

2 March 2022·898 words·5 mins

Discover how enabling code location in logs can cause severe CPU performance issues in microservices, especially reactive applications. This deep-dive analysis reveals the hidden costs of stack walking in Log4j2 and provides actionable solutions for high-throughput systems.

Spring Data Redis Connection Leak Mystery: When Your Microservice Goes Rogue

14 October 2021·1820 words·9 mins

A production incident investigation revealing how Spring Data Redis + Lettuce can leak connections when mixing SessionCallback and executeWithStickyConnection operations. Deep dive into connection management mechanisms, JFR analysis techniques, and practical solutions to prevent your Redis connection pool from becoming a black hole.

Gateway Avalanche Crisis: How Synchronous Redis Calls Nearly Brought Down Our System

1 September 2021·1662 words·8 mins

A deep dive into a production incident where our Spring Cloud Gateway experienced cascading failures due to blocking Redis operations. Learn how synchronous API calls in reactive environments can cause thread starvation, leading to health check failures and system-wide avalanches, and see the complete solution using async patterns.

Troubleshooting an SSL Performance Bottleneck Using JFR

27 March 2021·395 words·2 mins

An in-depth analysis of a microservice performance issue involving CPU spikes and database connection anomalies. Through JFR profiling, we discovered that the root cause was Java's SecureRandom blocking on /dev/random, and we present solutions based on /dev/urandom.