-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-38254 refixing MDEV-37686 #4472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…nch" This reverts commit 3241798. Due to MDEV-38212.
Instead of `NULL`ifying of this member a finer approach is taken to
screen the replicated temporary tables from the slave applier
when it runs within `start_new_trans` context.
For that *all* access critical `THD::{has,lock,unlock}_temporary_tables`
functions are made to check possible `start_new_trans` condition.
This measure forces `start_new_trans` executing slave thread to always take
the normal thread access path to a temporary table repository.
This commit also unifies with sql/temporary_tables.cc checking of the
current thread is slave with usage of THD::rgi_slave instead of
THD::slave_thread (last time chosen by MDEV-33426).
Tested with rpl.create_or_replace_mix2 of GTT feature branch.
|
|
bnestere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andrelkin,
I agree the patch is more concise and has a reduced scope, which is nice for a workaround. I don't think we need to decide if we want to put it into 12.3 right now though. Perhaps we can put this w/a just into the GTT branch for now, so QA can continue, and then revert MDEV-37686.
And generally to the w/a patch itself, I'd probably use MDEV-38212 as the "owner"; and then throughout the patch/commit, mention TODOs for MDEV-38254 to remove this w/a once a comprehensive fix is done.
|
It's a bit suspicious that innodb.log_file_size_online and main.lock_sync fail. Maybe a false-positive, of course. Anyway, the patch works fine for me. Let's keep it in gtt and monitor the QA results. |
|
I'm sorry Andrei, but I have to say this patch is un-reviewable. The problem is that the description is incomprehensible. What is the actual problem to be addressed? How does the And most importantly, have you completely understood all the code you are modifying and all the ways it interacts with other parts of the server? Brandon, if you have understood the issue in full, maybe you can help write up a good description? (And if you have not, you should not approve the patch...) |
|
Howdy Kristian!
# knielsen left a comment (MariaDB/server#4472)
I'm sorry Andrei, but I have to say this patch is un-reviewable. The problem is that the
description is incomprehensible.
The description might be indeed too short. It just assumed, apparently
too much, that the reviewer was familiar with that unfortunate MDEV-37686
What is the actual problem to be addressed?
So it is MDEV-37686. It was extensively analyzed on the MDEV itself,
by Sergei.
The issue appeared as an incorrect attempt to close a temporary table
by transaction that must not see it at all.
The original commit message of MDEV-37686 summed it up this way
MDEV-37686 rpl.create_or_replace_mix2 fails in MDEV-35915 branch
The test failed as the Jira ticket describes because of insufficient
main transaction isolation from an out-of-band dynamic "new" transaction.
Specifically the replicated transaction's temporary tables became
accessible to the "new" transaction.
How does the start_new_trans stuff work?
`start_new_trans` is a kind of a hook that creates a temporary transaction
context to execute auxiliary work not related to one of the main
transaction, like clearing statistics for a table that is re-created.
It must be isolated from the main transaction (not modify at least its
context), and that applies to the slave side as well.
Why does
start_new_trans have to do anything special for slave threads?
The slave side exposed `THD::rgi_slave->rli->save_temporary` table
repository to `start_new_trans`. So it thought of itself as slave
applier and regarded therefore the empty `THD::temporary_tables` (which
`start_new_trans()` ctor take care to clean) as natural state
for the slave applier that finds in the repo, stores a temporary table
pointer into THD::temporary_tables and in the end closes the temporary table.
Why does this turn up now, with
global temporary tables?
Let me offer for that to read Serg's analysis for MDEV-37686.
And most importantly, have you completely understood all the code you are modifying and all the
ways it interacts with other parts of the server?
I believe I did. From the start_new_trans point of view there most be
nothing wrong. Ideally it might have been design with own `THD` (I read
`start_new_trans::orig_thd` as a hint to that), and we would not have had MDEV-37686.
So all effects are effectively to the slave side.
The main part is that the slave temporary table repository is protected now to not let
`start_new_trans` access it. And if it needs own temporary tables it has
to use a normal "master" access path which remains intact.
Brandon, if you have understood the issue in full, maybe you can help write up a good
description? (And if you have not, you should not approve the patch...)
I hope I clarified, if not we'll see each other in about 24 hours and
I'll share everything what I am able to.
Cheers,
Andrei
PS. strangely `markdown` is not processed when I replied via email.
|
bnestere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, after discussing the issue in-person.
Description
MDEV-38254 re-fixing of MDEV-37686.
(this PR also brings the MDEV-38212 revert commit as a base for the actual patch)
Instead of
NULLifying of this member a finer approach is taken toscreen the replicated temporary tables from the slave applier
when it runs within
start_new_transcontext.For that all access critical
THD::{has,lock,unlock}_temporary_tablesfunctions are made to check possible
start_new_transcondition.This measure forces
start_new_transexecuting slave thread to always takethe normal thread access path to a temporary table repository.
This commit also unifies within
sql/temporary_tables.ccchecking of thecurrent thread is
slavewith usage ofTHD::rgi_slave insteadofTHD::slave_thread(last time chosen by MDEV-33426).Tested with
rpl.create_or_replace_mix2ofGTTfeature branch.While it's rather fair to qualify this patch as a work-around (of limited isolation of
start_new_transfeature)its direction is the best we can do practically at the moment for providing isolation.
Checking the small patch may dismiss or soften doubts Jira.
The patch adjusts MDEV-33426 fixes, cosmetically.
Other approaches have been discussed and the tentative winner
between myself and Brandon is an asynchronous
start_new_trans(see MDEV-38254 description too).Release Notes
TODO: What should the release notes say about this change?
Include any changed system variables, status variables or behaviour. Optionally list any https://mariadb.com/kb/ pages that need changing.
How can this PR be tested?
TODO: modify the automated test suite to verify that the PR causes MariaDB to behave as intended.
Consult the documentation on "Writing good test cases".
If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.
Basing the PR against the correct MariaDB version
mainbranch.PR quality check