PostgreSQL vs MySQL: Finding the Right Fit for Your OLTP Workloads#
When choosing between PostgreSQL and MySQL for OLTP scenarios (assuming default transaction engines and encoding/compression settings), the decision often comes down to understanding their fundamental architectural differences. Let me walk you through the key distinctions that really matter in practice.
The Tale of Two Index Architectures#
The most significant difference lies in how these databases handle secondary indexes, and trust me, this impacts performance more than you might expect.
MySQL’s Clever Approach: Secondary index leaf nodes store primary key values (thanks to LiZN from the monstaxl WeChat account for the clarification). This design choice is actually quite brilliant for write-heavy workloads.
PostgreSQL’s Direct Method: Secondary index leaf nodes directly point to record locations, just like primary indexes. While this seems straightforward, it creates an interesting challenge.
Here’s where it gets fascinating: because of MVCC, a PostgreSQL update writes a new physical tuple version rather than modifying the row in place, so the row’s location changes — and with variable-length fields the new version often can’t even fit back on the same page. Every secondary index then needs a fresh entry pointing at the new location. MySQL sidesteps this entirely: its secondary indexes reference the primary key, which doesn’t move.
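To make the contrast concrete, here is a toy model (plain dictionaries standing in for B-trees, not real engine code) of the two designs. The table and column names are invented for illustration; the point is which indexes must be touched when a row version moves:

```python
# Toy model contrasting the two secondary-index designs.
# MySQL/InnoDB style: secondary index maps key -> primary key, so a row
# version moving on disk touches nothing but the clustered index.
# PostgreSQL style: secondary index maps key -> physical tuple id (tid),
# so every index needs a new entry when the tuple's location changes.

class MySQLStyleTable:
    def __init__(self):
        self.clustered = {}          # primary key -> row (the table IS the PK index)
        self.sec_email = {}          # email -> primary key

    def insert(self, pk, row):
        self.clustered[pk] = row
        self.sec_email[row["email"]] = pk

    def lookup_by_email(self, email):
        pk = self.sec_email[email]   # hop 1: secondary index gives the PK
        return self.clustered[pk]    # hop 2: clustered index gives the row

class PostgresStyleTable:
    def __init__(self):
        self.heap = {}               # tid -> row version
        self.next_tid = 0
        self.idx_pk = {}             # pk -> tid
        self.idx_email = {}          # email -> tid

    def _place(self, row):
        tid = self.next_tid          # MVCC: every version gets a NEW location
        self.next_tid += 1
        self.heap[tid] = row
        return tid

    def insert(self, pk, row):
        tid = self._place(row)
        self.idx_pk[pk] = tid        # every index stores the physical location
        self.idx_email[row["email"]] = tid

    def update(self, pk, row):
        tid = self._place(row)       # new version lands at a new tid, so ALL
        self.idx_pk[pk] = tid        # indexes must be repointed, even though
        self.idx_email[row["email"]] = tid  # the email didn't change

    def lookup_by_email(self, email):
        return self.heap[self.idx_email[email]]  # single hop straight to the row
```

The trade-off the article describes falls straight out of the model: the PostgreSQL-style `update` rewrites both indexes even for a balance-only change, while the MySQL-style read pays an extra lookup through the clustered index.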
Now, PostgreSQL does have a neat optimization called Heap-Only Tuples (HOT) that can avoid index updates — but only when no indexed column changes and the new tuple version fits on the same heap page. However, in real-world OLTP scenarios, HOT hit rates aren’t as impressive as we’d hope.
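The HOT eligibility rule can be sketched as a simple predicate. This is a deliberate simplification — real PostgreSQL checks free space on the specific heap page and compares indexed columns at the executor level — but it captures why expanding a varchar that happens to be indexed defeats the optimization:

```python
# Simplified sketch of PostgreSQL's Heap-Only Tuple (HOT) eligibility:
# the update may skip all index maintenance only if (1) no indexed column
# changed and (2) the new tuple version fits on the same heap page.

def can_use_hot(old_row, new_row, indexed_cols, page_free_bytes, new_tuple_bytes):
    if any(old_row[c] != new_row[c] for c in indexed_cols):
        return False                  # an indexed value changed -> normal update
    if new_tuple_bytes > page_free_bytes:
        return False                  # page is full -> new version goes elsewhere
    return True

old = {"id": 1, "email": "a@x", "balance": 10}

# A balance-only update on a page with room can be HOT:
print(can_use_hot(old, {"id": 1, "email": "a@x", "balance": 20},
                  indexed_cols=["id", "email"],
                  page_free_bytes=600, new_tuple_bytes=120))   # True

# Expanding an indexed field forces full index maintenance:
print(can_use_hot(old, {"id": 1, "email": "a-much-longer@x", "balance": 20},
                  indexed_cols=["id", "email"],
                  page_free_bytes=600, new_tuple_bytes=160))   # False
```

The second condition also explains why hot pages fill up under sustained updates and HOT hit rates degrade over time.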
The Bottom Line: MySQL typically outperforms PostgreSQL for tables with secondary indexes under high-concurrency updates, especially when dealing with variable-length field changes (like expanding varchar fields) and heavy insert workloads.
The Trade-off: MySQL’s secondary index reads require an additional primary key lookup, making them slower than PostgreSQL’s direct access approach. It’s all about choosing your battles!
MVCC: Two Philosophies, Different Outcomes#
The second major difference lies in their Multi-Version Concurrency Control implementations.
PostgreSQL’s xmin/xmax Dance: PostgreSQL uses an elegant xmin/xmax mechanism where:
- Updates create new row versions with xmin set to the current transaction ID, while marking the old version’s xmax
- Deletes simply set the row’s xmax to the current transaction ID
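The two bullets above translate into a short visibility check. This is a hedged sketch — real PostgreSQL snapshots also track in-progress transaction IDs and hint bits, which are omitted here — but it shows how an UPDATE leaves two versions behind and how each reader picks the right one:

```python
# Sketch of PostgreSQL-style tuple visibility via xmin/xmax. A snapshot taken
# at `snapshot_xid` sees a version iff its creator committed before the
# snapshot and its deleter (if any) did not. In-progress-xid tracking omitted.

def visible(tuple_ver, snapshot_xid, committed):
    xmin, xmax = tuple_ver["xmin"], tuple_ver["xmax"]
    if xmin not in committed or xmin >= snapshot_xid:
        return False                   # creating transaction not visible yet
    if xmax is not None and xmax in committed and xmax < snapshot_xid:
        return False                   # deleting transaction already visible
    return True

# UPDATE in txn 105 = new version (xmin=105) + old version stamped xmax=105:
versions = [
    {"xmin": 100, "xmax": 105, "balance": 10},   # old version
    {"xmin": 105, "xmax": None, "balance": 20},  # new version
]
committed = {100, 105}

# A snapshot taken between the two commits still sees the old row...
print([v["balance"] for v in versions if visible(v, 103, committed)])  # [10]
# ...while a later snapshot sees only the new one.
print([v["balance"] for v in versions if visible(v, 110, committed)])  # [20]
```

Note that both versions live in the heap — which is exactly why readers never block writers, and also where the bloat discussed below comes from.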
MySQL’s Undo Log Strategy: MySQL relies on row locks and undo logs, with each record containing hidden columns for transaction ID (trx_id) and rollback pointer (roll_pointer). InnoDB uses these to locate the correct row version for each transaction.
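The InnoDB side can be sketched the same way. In this toy model (invented row images and IDs; real InnoDB undo records are physical byte diffs, not full rows), the table stores only the newest version, and a reader walks the `roll_pointer` chain back through the undo log until it finds a version whose `trx_id` is visible to it:

```python
# Toy sketch of InnoDB-style version reconstruction: the clustered index holds
# only the newest row image; older versions are rebuilt by following
# roll_pointer into the undo log until trx_id is visible to the reader.

row = {"balance": 30, "trx_id": 9, "roll_pointer": 1}   # newest image
undo_log = [
    {"balance": 10, "trx_id": 5, "roll_pointer": None},  # entry 0: oldest
    {"balance": 20, "trx_id": 7, "roll_pointer": 0},     # entry 1
]

def read_version(row, undo_log, reader_view_xid):
    """Walk the chain back to the newest version visible to this reader."""
    cur = row
    while cur["trx_id"] >= reader_view_xid:      # too new for this reader
        if cur["roll_pointer"] is None:
            return None                          # no visible version exists
        cur = undo_log[cur["roll_pointer"]]      # step back one version
    return cur

print(read_version(row, undo_log, 10)["balance"])  # 30: sees the newest image
print(read_version(row, undo_log, 8)["balance"])   # 20: rebuilt from undo
print(read_version(row, undo_log, 6)["balance"])   # 10
```

The key structural difference from PostgreSQL: old versions live outside the table in the undo log and are purged cheaply once no reader needs them, so the table itself doesn’t bloat.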
PostgreSQL’s Reading Advantage: Old versions remain directly accessible, and reads never block updates on the same row. It’s beautiful for read-heavy scenarios.
The Dark Side: Frequent updates cause rapid table bloat. Vacuum sometimes can’t keep pace with high-speed writes, and autovacuum can misbehave in certain scenarios, so dead tuples accumulate until queries become extremely I/O-expensive and manual DBA intervention is required. Insert performance also suffers from this multi-versioning overhead.
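A back-of-envelope sketch shows why this is a treadmill problem rather than a one-off cost. The rates below are entirely hypothetical — the point is only that whenever the update rate exceeds the reclaim rate, dead tuples grow without bound:

```python
# Back-of-envelope model of bloat outrunning vacuum: every UPDATE leaves one
# dead tuple behind; vacuum reclaims them at a fixed rate. All numbers are
# made up for illustration -- only the update > reclaim inequality matters.

updates_per_sec = 5000          # hypothetical hot user-balance table
vacuum_reclaim_per_sec = 3000   # hypothetical vacuum throughput

dead = 0
for second in range(60):
    dead += updates_per_sec                       # each update kills a version
    dead = max(0, dead - vacuum_reclaim_per_sec)  # vacuum trails behind

print(dead)  # 120000 dead tuples after one minute at these assumed rates
```

Once the backlog grows, sequential scans and index scans both wade through dead tuples, which is the I/O blow-up described above.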
MySQL’s Writing Strength: Only the actively read/written rows get locked, allowing higher concurrent writes without interference.
When the Rubber Meets the Road#
Both databases face challenges with high-frequency updates on large tables, but the breaking points differ. We’re not talking about typical order or transaction tables here, but rather scenarios like user balance tables with constant updates.
PostgreSQL’s xmin/xmax MVCC design leads to faster table bloat compared to MySQL’s Oracle-like undo log approach. This means MySQL generally handles larger datasets before requiring sharding.
Interestingly, PostgreSQL explored the zheap project around 2020 to adopt an undo-based storage engine with in-place updates, in the Oracle/InnoDB style, but development seems to have stalled. You can still find traces at: https://wiki.postgresql.org/wiki/Zheap
The Practical Verdict#
For pure OLTP workloads, MySQL often proves more suitable. Modern cloud providers have largely solved the read performance gap with incredibly low-latency read replicas. Take Aurora, for instance – you can attach up to 12 read instances to a single write instance, with latency typically around 10ms (occasionally spiking to 300ms during peak traffic).
PostgreSQL shines in its rich ecosystem and OLAP capabilities. Many analytical databases actually use PostgreSQL’s wire protocol, and PostgreSQL’s development trajectory increasingly focuses on OLAP ecosystem enhancements.
Real-World Validation#
Uber’s 2015 migration from sharded PostgreSQL to sharded MySQL for their OLTP workloads provides compelling real-world evidence. Their detailed analysis is worth reading: https://www.uber.com/en-HK/blog/postgres-to-mysql-migration/
The choice ultimately depends on your specific workload characteristics, but understanding these fundamental differences helps make informed decisions rather than following trends or assumptions.