troubleshooting

Tackling a Mysterious JVM Safepoint Issue: A Journey from Problem to Solution
·1004 words·5 mins
A deep dive into diagnosing and resolving a production JVM issue where applications would freeze during hourly log synchronization tasks. We explore safepoint analysis, JVM log output blocking, asynchronous logging implementation, and WebFlux optimization to achieve a complete solution.
Solving JVM Safepoint Delays: A Journey from EFS Integration to Async Logging
·970 words·5 mins
An in-depth investigation into mysterious JVM safepoint delays after upgrading to Java 17 and implementing centralized log collection with AWS EFS. We discovered how blocking file I/O during log output can freeze an entire JVM process, and solved it with async logging and a proper WebFlux implementation.
Troubleshooting Memory Issues After Spring Boot Upgrade: A Deep Dive into ResolvableType Object Creation
·1180 words·6 mins
An investigation into excessive memory allocation and frequent young GCs after upgrading to Spring Boot 2.4.6 + Spring Cloud 2020.0.x, revealing how BeanUtils.copyProperties creates large numbers of ResolvableType objects without caching in Spring 5.3.x versions.
MySQL Optimizer Statistics: Why Your Queries Choose the Wrong Index
·1600 words·8 mins
A deep dive into MySQL’s InnoDB optimizer statistics and how sampling inaccuracies can lead to poor index selection, causing dramatic performance differences between similar queries. Learn practical solutions to prevent slow SQL queries caused by optimizer misjudgments.
A Peculiar Bug Hunt: When Exceptions Lose Their Voice
·1195 words·6 mins
A deep dive into a production issue where exception logs mysteriously disappeared, leading us through Arthas debugging, Log4j2 internals, and the discovery that an exception’s getMessage() method was itself throwing an exception due to a Guava-Guice version incompatibility.
A Hidden Production Issue Discovered Through SQL Optimization
·1101 words·6 mins
When our operations team brought us a complex SQL query that was taking forever to execute, we thought it was just a performance issue. Little did we know, this investigation would uncover a deeply hidden character encoding mismatch that had been silently causing full table scans in our production database.