Engineering

Tackling a Mysterious JVM Safepoint Issue: A Journey from Problem to Solution

·1004 words·5 mins
A deep dive into diagnosing and resolving a production JVM issue where applications would freeze during hourly log synchronization tasks. We explore safepoint analysis, JVM log output blocking, asynchronous logging implementation, and WebFlux optimization to achieve a complete solution.
Spring Data Redis Connection Leak Mystery: When Your Microservice Goes Rogue

·1820 words·9 mins
An investigation of a production incident that reveals how Spring Data Redis with Lettuce can leak connections when SessionCallback and executeWithStickyConnection operations are mixed. A deep dive into connection management mechanisms, JFR analysis techniques, and practical solutions to prevent your Redis connection pool from becoming a black hole.
Gateway Avalanche Crisis: How Synchronous Redis Calls Nearly Brought Down Our System

·1662 words·8 mins
A deep dive into a production incident where our Spring Cloud Gateway suffered cascading failures caused by blocking Redis operations. Learn how synchronous API calls in reactive environments can cause thread starvation, which leads to health check failures and system-wide avalanches, and see the complete solution using async patterns.