MySQL 5.7 的现状
MySQL 5.7 在可伸縮性方面並不出色。以下圖表展示了在特定配置下,MySQL 5.7.39 中的 TPC-C 吞吐量與并发性之間的關係。此包括將交易隔離等級設定為「Read Committed」並調整 `innodb_spin_wait_delay` 參數以減輕吞吐量下降。
圖1:BenchmarkSQL 測試中 MySQL 5.7.39 的可伸縮性问题
從圖表中可以看出,可伸縮性问题极大地限制了 MySQL 吞吐量的增加。例如,在 100 个并发之后,吞吐量開始下降。
為了解決上述性能崩潰問題,使用了 Percona 的線程池。以下圖表展示了在配置 Percona 線程池後,TPC-C 吞吐量與并发性之間的關係。
圖2:Percona 線程池減輕了 MySQL 5.7.39 的可伸縮性问题
雖然線程池引入了一些開銷,並且峰值性能有所下降,但它已減輕了高并发下的性能崩潰問題。
MySQL 8.0 的现状
讓我們看看 MySQL 8.0 在可伸縮性方面所做的努力。
重做日志優化
首個主要改進是重做日志優化 [3]。
commit 6be2fa0bdbbadc52cc8478b52b69db02b0eaff40
Author: Paweł Olchawa <[email protected]>
Date: Wed Feb 14 09:33:42 2018 +0100
WL#10310 Redo log optimization: dedicated threads and concurrent log buffer.
0. Log buffer became a ring buffer, data inside is no longer shifted.
1. User threads are able to write concurrently to log buffer.
2. Relaxed order of dirty pages in flush lists - no need to synchronize
the order in which dirty pages are added to flush lists.
3. Concurrent MTR commits can interleave on different stages of commits.
4. Introduced dedicated log threads which keep writing log buffer:
* log_writer: writes log buffer to system buffers,
* log_flusher: flushes system buffers to disk.
As soon as they finished writing (flushing) and there is new data to
write (flush), they start next write (flush).
5. User threads no longer write / flush log buffer to disk, they only
wait by spinning or on event for notification. They do not have to
compete for the responsibility of writing / flushing.
6. Introduced a ring buffer of events (one per log-block) which are used
by user threads to wait for written/flushed redo log to avoid:
* contention on single event
* false wake-ups of all waiting threads whenever some write/flush
has finished (we can wake-up only those waiting in related blocks)
7. Introduced dedicated notifier threads not to delay next writes/fsyncs:
* log_write_notifier: notifies user threads about written redo,
* log_flush_notifier: notifies user threads about flushed redo.
8. Master thread no longer has to flush log buffer.
...
30. Mysql test runner received a new feature (thanks to Marcin):
--exec_in_background.
Review: RB#15134
Reviewers:
- Marcin Babij <[email protected]>,
- Debarun Banerjee <[email protected]>.
Performance tests:
- Dimitri Kravtchuk <[email protected]>,
- Daniel Blanchard <[email protected]>,
- Amrendra Kumar <[email protected]>.
QA and MTR tests:
- Vinay Fisrekar <[email protected]>.
進行了一項測試,比較了在優化前後,不同并发等级的 TPC-C 吞吐量。具體細節如下圖所示:
圖3:不同並發等級下重做 日志优化的影響
圖表结果显示,在低並發等級下,吞吐量有了显著的提升。
通過鎖分片優化 Lock-Sys
第二大的改进是 Lock-Sys 的優化 [5]。
commit 1d259b87a63defa814e19a7534380cb43ee23c48
Author: Jakub Łopuszański <[email protected]>
Date: Wed Feb 5 14:12:22 2020 +0100
WL#10314 - InnoDB: Lock-sys optimization: sharded lock_sys mutex
The Lock-sys orchestrates access to tables and rows. Each table, and each row,
can be thought of as a resource, and a transaction may request access right for
a resource. As two transactions operating on a single resource can lead to
problems if the two operations conflict with each other, Lock-sys remembers
lists of already GRANTED lock requests and checks new requests for conflicts in
which case they have to start WAITING for their turn.
Lock-sys stores both GRANTED and WAITING lock requests in lists known as queues.
To allow concurrent operations on these queues, we need a mechanism to latch
these queues in safe and quick fashion.
In the past a single latch protected access to all of these queues.
This scaled poorly, and the managment of queues become a bottleneck.
In this WL, we introduce a more granular approach to latching.
Reviewed-by: Pawel Olchawa <[email protected]>
Reviewed-by: Debarun Banerjee <[email protected]>
RB:23836
基於優化 Lock-Sys 前後程序,使用 BenchmarkSQL 對 TPC-C 吞吐量與並發進行比較,具體結果如下圖所示:
圖4:不同並發等級下 Lock-Sys 優化的影響
從圖中可以看出,在高等級並發下,優化 Lock-Sys 對於吞吐量有著顯著的提升,而在低並發下,因為冲突較少,效果較不明顯。
為 trx-sys 實現鎖分片
第三大的改進是為 trx-sys 實現鎖分片。
commit bc95476c0156070fd5cedcfd354fa68ce3c95bdb
Author: Paweł Olchawa <[email protected]>
Date: Tue May 25 18:12:20 2021 +0200
BUG#32832196 SINGLE RW_TRX_SET LEADS TO CONTENTION ON TRX_SYS MUTEX
1. Introduced shards, each with rw_trx_set and dedicated mutex.
2. Extracted modifications to rw_trx_set outside its original critical sections
(removal had to be extracted outside trx_erase_lists).
3. Eliminated allocation on heap inside TrxUndoRsegs.
4. [BUG-FIX] The trx->state and trx->start_time became converted to std::atomic<>
fields to avoid risk of torn reads on egzotic platforms.
5. Added assertions which ensure that thread operating on transaction has rights
to do so (to show there is no possible race condition).
RB: 26314
Reviewed-by: Jakub Łopuszański [email protected]
基於這些優化前後,使用 BenchmarkSQL 對 TPC-C 吞吐量與並發進行比較,具體結果如下圖所示:
圖5:不同並發等級下 trx-sys 鎖分片的影響
從圖中可以看出,这一改进显著提升了 TPC-C 吞吐量,在 200 並發時达到高峰。值得注意的是,在 300 並發時,影響度有所下降,主要因為與 MVCC ReadView 相關的 trx-sys 子系統持續存在的可伸縮性问题。
優化 MySQL 8.0
其餘的改進為我們自立的研究。
MVCC 讀取视线
首批主要改進是 MVCC 讀取视线資料結構的 enhancements [1].
進行了性能對比測試以評估 MVCC 讀取视线優化的有效性。下方圖表展示了在修改 MVCC 讀取视线資料結構前後,不同並發等級下的 TPC-C 吞吐量對比。
圖 6: 修改 NUMA 中新型混合資料結構後的性能對比
從圖中可以看出,此次變革主要優化了可伸縮性,並在 NUMA 環境中提升了 MySQL 的峰值吞吐量。
解決雙重闩鎖問題
我們所做的第二項主要改進是解決「雙重闩鎖」問題,這裡的「雙重闩鎖」指的是 view_open
和 view_close
同時需要全局 trx-sys 闩鎖 [1]。
使用 MVCC 讀取视线優化版本,比較修改前後的 TPC-C 吞吐量。具體細節顯示在以下圖表中:
圖 7: 解決雙重闩鎖瓶頸後的性能提升
從圖中可以看出,修改後在高度並發條件下显著提高了可伸縮性。
交易節流機制
最後的改進是實現了一個交易節流機制,以防極端並發下性能 breakdown [1] [2] [4]。
以下圖表描繪了在應用交易限制後進行的TPC-C可伸縮性壓力測試。該測試在NUMA BIOS禁用 scenario下進行,將最多512個用戶線程的限制進入交易系統。
圖9:在BenchmarkSQL中應用交易限制機制的最大TPC-C吞吐量
從圖表中可以看出,應用交易限制機制顯著提高了MySQL的可伸縮性。
總結
總的來說,在BenchmarkSQL TPC-C測試的低衝突情景中,MySQL能夠在數十萬個同時連接下保持性能而不崩潰是完全可行的。
參考文献
- 王斌 (2024)。軟件工程問題解決之道:如何讓MySQL更好。
- 新的MySQL線程池
- Paweł Olchawa。2018。MySQL 8.0:新的無鎖、可伸縮的WAL設計。MySQL部落格檔案。
- Yu Xiangyao。一個在一千核上對並發控制的精確評估。博士論文,馬薩诸塞理工學院,2015年。
- MySQL 8.0參考手冊
Source:
https://dzone.com/articles/mysql-scalability-improvement-for-benchmarksql