MDEV-38936 Proactive handling of InnoDB tablespace full condition #4721
FarihaIS wants to merge 1 commit into
Feature request: allow additional use cases (for example, a tablespace that becomes larger than expected):
64ab2ed to 5bd3d38
@mikegriffin I have just pushed some new changes. Could you please take a look and confirm whether the new implementation addresses the additional use cases you mentioned above? Thank you!
iMineLink left a comment
Thanks for your contribution!
I left a few comments on the feature.
Since it only adds log messages and does not fix actual bugs related to excessive InnoDB tablespace size (such as the recently discovered MDEV-38898), please also wait for @dr-m's comments.
Nevertheless, it's fair to say that the feature, when disabled, seems to have a small runtime cost: a variable is checked in an ATTRIBUTE_COLD function, and new members are added to fil_space_t. Their footprint could be reduced by reordering the members to avoid padding, or eliminated by storing only the high 32 bits of the threshold and reordering.
5a0d8ee to 7c0e2a0
@iMineLink Thank you for the detailed review! I have addressed all your comments and updated the PR description to reflect the latest version of the feature. Please let me know if I have missed anything, thank you. I will wait for @dr-m's review in the meantime.
iMineLink left a comment
Thanks for addressing the previous review points. I have just a couple more points, then it's good for me.
@iMineLink thank you for the suggestions again, I've addressed all the new comments as well! Please let me know if you have any other thoughts while we wait for @dr-m's review.
dr-m left a comment
What is this good for? Do you have an example of already implemented external monitoring that would react when some warning messages appear in the server error log?
Could we have something that would better integrate with event handlers and other existing mechanisms?
gkodinov left a comment
This is a preliminary review. LGTM. Please keep working with Marko on his review.
59cedaa to 1f4e0c0
@dr-m thank you for your feedback. I have now addressed the two code changes you requested above. Please let me know if these changes look okay or if they need further modification. As for the questions you asked above:
These warnings would be helpful for external monitoring tools, for example, AWS RDS, which monitors the error log for operational alerts. This follows the same pattern as existing InnoDB warnings (undo truncation, system tablespace full, etc.).
Could you please help guide me to the kind of integration you're looking for? I'm not entirely sure what the new approach would look like, but I'm happy to make the changes once I have a clearer understanding.
I think @dr-m is after best practices for log messages in general, and for tooling integration. So perhaps MDEV-27147 (JSON error log to STDERR/STDOUT as an option), and perhaps https://opentelemetry.io/docs/specs/otel/logs/data-model/#events
"Threshold in bytes for tablespace size warnings (0 = disabled)",
NULL, NULL,
17592186044416ULL, /* Default setting */
Can this remain disabled by default?
Sure, I've disabled it by default.
@grooverdan Thanks for the pointers! Since this uses
@dr-m thank you for the detailed feedback! I've addressed/responded to all your comments above - could you please take a look and see if there are any other changes needed?
dr-m left a comment
Sorry for the delay. I think that as part of this, we must pay back some maintenance debt in the fil_space_extend() function.
SET @old_threshold = @@global.innodb_tablespace_size_warning_threshold;
SET @old_pct = @@global.innodb_tablespace_size_warning_pct;
# Test system variables
SHOW VARIABLES LIKE 'innodb_tablespace_size_warning_threshold';
Variable_name Value
innodb_tablespace_size_warning_threshold 0
SHOW VARIABLES LIKE 'innodb_tablespace_size_warning_pct';
Variable_name Value
innodb_tablespace_size_warning_pct 85
There is no point in saving and restoring the old values if the test fails when run with non-default values:
mysql-test/mtr --mysqld=--innodb-tablespace-size-warning-threshold=4 --mysqld=--innodb-tablespace-size-warning-pct=42 innodb.tablespace_size_warning
innodb.tablespace_size_warning [ fail ]
Test ended at 2026-06-05 12:33:20
CURRENT_TEST: innodb.tablespace_size_warning
--- /mariadb/main/mysql-test/suite/innodb/r/tablespace_size_warning.result 2026-06-05 12:27:30.660602135 +0300
+++ /mariadb/main/mysql-test/suite/innodb/r/tablespace_size_warning.reject 2026-06-05 12:33:20.040125327 +0300
@@ -6,10 +6,10 @@
# Test system variables
SHOW VARIABLES LIKE 'innodb_tablespace_size_warning_threshold';
Variable_name Value
-innodb_tablespace_size_warning_threshold 0
+innodb_tablespace_size_warning_threshold 4
SHOW VARIABLES LIKE 'innodb_tablespace_size_warning_pct';
Variable_name Value
-innodb_tablespace_size_warning_pct 85
+innodb_tablespace_size_warning_pct 42
# Test basic warning emission
SET GLOBAL innodb_tablespace_size_warning_threshold = 10485760;
SET GLOBAL innodb_tablespace_size_warning_pct = 70;
Result content mismatch
I don’t think there is a way to check the built-in default values in the regression test suite.
Makes sense, I removed the SHOW VARIABLES checks and save/restore logic. The test now sets explicit values before each test block and resets to defaults at the end.
--disable_query_log
let $i = 10;
while ($i) {
  eval INSERT INTO t1 (data) VALUES (REPEAT('a', 1024*1024));
  dec $i;
}
--enable_query_log
This can be written in a single line:
INSERT INTO t1(data) SELECT REPEAT('a',1024*1024) FROM seq_1_to_10;
For this to work, we will need the following at the start of the test file:
--source include/have_sequence.inc
Thank you for the pointer, I've replaced the affected lines with your suggested rewrite above!
--enable_query_log

let SEARCH_FILE=$MYSQLTEST_VARDIR/log/mysqld.1.err;
let SEARCH_PATTERN=Tablespace 'test/t1' size [^\n]* bytes reached [^\n]*% of configured threshold;
A more appropriate pattern for matching a string of digits would be \d+.
We seem to issue exact numbers. Therefore, I would look for exact messages, instead of filtering out the numbers.
Makes sense, I've updated the test with exact numbers as requested above, thank you.
/** Threshold value used for the last warning */
ulonglong m_last_warning_threshold{0};
Do we really have to allocate 64 bits for this? The files should grow by extents of FSP_EXTENT_SIZE, which is 1MiB, or 64 pages, whichever is greater (2MiB or 4MiB for the two largest innodb_page_size). At least 20 of the least significant bits would be constantly 0 in a byte counter.
Could we use a uint32_t counter of pages here? After all, a page is the smallest unit that we work with.
Good point, changed to uint32_t page counter!
const ulonglong threshold= fil_system.tablespace_size_warning_threshold;
const uint warning_pct= fil_system.tablespace_size_warning_pct;
Inside fil_space_extend(), which we are calling before entering here, we are acquiring and releasing fil_system.mutex. Hence, we should be able to read these fields from fil_system as normal data members, not Atomic_relaxed. Can you refactor the logic? I think that fil_space_extend would best be replaced with a member function fil_space_t::extend(uint32_t, mtr_t *mtr), which would include this warning logic. Each caller is going to assign size_in_header and invoke mtr->write<4,mtr_t::FORCED>. Therefore, that logic can be part of the replacement function fil_space_t::extend() itself.
I see, I refactored fil_space_extend() usage into a new fil_space_t::extend(uint32_t, buf_block_t*, mtr_t*) member function that handles the file extension, size_in_header update, mtr write, and size warning check. Both callers now use space->extend(), and I also removed Atomic_relaxed since fil_system.mutex protects access.
InnoDB write failures occur when tablespace files exceed filesystem size
limits. Current behavior logs errors but continues accepting
transactions, causing repeated failures and potential data integrity
issues.
Add proactive monitoring by emitting warnings when InnoDB tablespaces
approach a configurable size threshold.
Key features:
- Two new system variables:
* innodb_tablespace_size_warning_threshold (default 0, disabled):
Maximum tablespace size in bytes before warnings begin
* innodb_tablespace_size_warning_pct (default 85%): Percentage of
threshold at which to start emitting warnings
- Warning frequency:
* Below warning_pct: No warnings
* At or above warning_pct: Every 1% increase (85%, 86%, 87%, etc.)
- Per-tablespace tracking with automatic reset on TRUNCATE/DROP or
threshold/percentage changes
- Zero overhead when threshold is 0
- Progressive warnings capped at 100%
Implementation adds fil_space_t::extend() which consolidates file
extension, size_in_header update, and size warning checks.
Per-tablespace warning state is tracked in fil_space_t
(m_last_size_warning_pct, m_last_warning_threshold, m_last_warning_pct).
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
@dr-m thank you for the detailed feedback again! I've addressed all your comments above - could you please take a look and see if there are any other changes needed?
Description
InnoDB write failures occur when tablespace files exceed filesystem size limits (e.g. 16TB on ext4, 2TB on ext3; the limit varies by filesystem). Current behavior logs errors but continues accepting transactions, causing repeated failures, user disruption, and potential data integrity issues.
Add proactive monitoring by emitting warnings when InnoDB tablespaces approach a configurable size threshold.
Key features:
- Two new system variables:
  - innodb_tablespace_size_warning_threshold (default 0, disabled): Maximum tablespace size in bytes before warnings begin
  - innodb_tablespace_size_warning_pct (default 85%): Percentage of threshold at which to start emitting warnings
- Warning frequency:
  - Below warning_pct: No warnings
  - At or above warning_pct: Every 1% increase (85%, 86%, 87%, etc.)
- Per-tablespace tracking with automatic reset on TRUNCATE/DROP or threshold/percentage changes
Implementation adds fil_space_t::extend(), which consolidates file extension, size_in_header update, and size warning checks. Per-tablespace warning state is tracked in fil_space_t (m_last_size_warning_pct, m_last_warning_threshold, m_last_warning_pct).
Release Notes
Added proactive InnoDB tablespace size monitoring to prevent filesystem size limit failures. Two new system variables enable configurable warning thresholds with incremental warning frequency:
- innodb_tablespace_size_warning_threshold (default 0, disabled): Maximum size before warnings
- innodb_tablespace_size_warning_pct (default 85%): When to start warnings
Warning frequency:
- Below innodb_tablespace_size_warning_pct: No warnings
- At or above innodb_tablespace_size_warning_pct: Every 1% increase
How can this PR be tested?
Execute the innodb.tablespace_size_warning test in mysql-test-run. This commit adds a test in the innodb suite. The test validates:
- TRUNCATE TABLE resets warning state
Expected warning behavior in error log:
- Below innodb_tablespace_size_warning_pct (default 85%): No warnings
- At or above innodb_tablespace_size_warning_pct: Every 1% increase
Example:
[Warning] InnoDB: Tablespace 'test/t1' size 7340032 bytes reached 70% of configured threshold of 10485760 bytes
Basing the PR against the correct MariaDB version
Copyright
All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.