feat: advanced join semantics, transactional integrity, and expanded SLT suite #13
📝 Walkthrough

This PR adds index creation support with B-tree indexing, extends transaction rollback semantics for DELETE/UPDATE operations via undo logs, introduces FULL and RIGHT join parsing, changes LIMIT/OFFSET to signed integers for unlimited support, and adds comprehensive test coverage for indexes, transactions, and limit/offset behavior.
Sequence Diagrams

```mermaid
sequenceDiagram
participant Client
participant QueryExecutor
participant Catalog
participant Storage as FileStorage
participant HeapTable
Client->>QueryExecutor: execute(CreateIndexStatement)
activate QueryExecutor
QueryExecutor->>Catalog: validate table exists
Catalog-->>QueryExecutor: table metadata
QueryExecutor->>Catalog: resolve column positions/types
Catalog-->>QueryExecutor: column info
QueryExecutor->>Catalog: add index entry
Catalog-->>QueryExecutor: index registered
QueryExecutor->>Storage: create B-tree index file
alt File Creation Success
Storage-->>QueryExecutor: file created
QueryExecutor->>HeapTable: scan for existing data
activate HeapTable
loop For each tuple
HeapTable-->>QueryExecutor: tuple
QueryExecutor->>Storage: insert into index
Storage-->>QueryExecutor: ack
end
deactivate HeapTable
QueryExecutor-->>Client: success
else File Creation Fails
Storage-->>QueryExecutor: error
QueryExecutor->>Catalog: remove index entry (rollback)
Catalog-->>QueryExecutor: ack
QueryExecutor-->>Client: error
end
deactivate QueryExecutor
```
```mermaid
sequenceDiagram
participant TransactionMgr as TransactionManager
participant UndoLog
participant HeapTable
participant BTreeIndex as IndexStructure
Note over TransactionMgr: ROLLBACK initiated
TransactionMgr->>UndoLog: iterate undo logs (reverse order)
activate UndoLog
loop For each undo log entry
alt Type == DELETE
UndoLog->>HeapTable: undo_remove(rid)
activate HeapTable
HeapTable->>HeapTable: reset xmax to 0
HeapTable-->>UndoLog: success
deactivate HeapTable
else Type == UPDATE
UndoLog->>HeapTable: physical_remove(rid)
activate HeapTable
HeapTable-->>UndoLog: removed new version
deactivate HeapTable
alt old_rid exists
UndoLog->>HeapTable: undo_remove(old_rid)
activate HeapTable
HeapTable->>HeapTable: restore old version
HeapTable-->>UndoLog: success
deactivate HeapTable
end
else Type == INSERT
UndoLog->>HeapTable: physical_remove(rid)
activate HeapTable
HeapTable-->>UndoLog: success
deactivate HeapTable
end
alt Index present
UndoLog->>BTreeIndex: update index (remove new, reinsert old if applicable)
BTreeIndex-->>UndoLog: ack
end
end
deactivate UndoLog
TransactionMgr-->>TransactionMgr: transaction rolled back
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
README.md (1)

**Lines 48-53:** ⚠️ Potential issue | 🟡 Minor

Fix the `run_test.sh` path in this build snippet. After `cd build`, `./tests/run_test.sh` is no longer relative to the repo root, so the command shown here will fail. Use `../tests/run_test.sh` from this directory, or keep the script invocation in the separate "Running Tests" section only.

Doc fix:

```diff
-make -j$(nproc) # Or ./tests/run_test.sh for automated multi-OS build
+make -j$(nproc) # Or ../tests/run_test.sh for automated multi-OS build
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 48-53, the build snippet after "cd build" calls ./tests/run_test.sh, which fails because it is relative to the build directory; update the snippet to invoke ../tests/run_test.sh, or remove the script invocation here and keep it only in the "Running Tests" section, so run_test.sh is referenced correctly when executed from the build directory.

src/executor/query_executor.cpp (1)
**Lines 491-495:** ⚠️ Potential issue | 🔴 Critical

Don't couple delete-side index cleanup to WAL. `old_tuple` is only loaded when `log_manager_` is present, but line 499's index-removal block also depends on it. With WAL disabled, the heap delete succeeds and each secondary index keeps a dangling entry for the deleted row.

Local fix:

```diff
-    /* Retrieve old tuple for logging */
-    Tuple old_tuple;
-    if (log_manager_ != nullptr && txn != nullptr) {
-        static_cast<void>(table.get(rid, old_tuple));
-    }
+    /* Retrieve old tuple for logging/index maintenance */
+    Tuple old_tuple;
+    if ((!table_meta->indexes.empty() || (log_manager_ != nullptr && txn != nullptr)) &&
+        !table.get(rid, old_tuple)) {
+        result.set_error("Failed to read tuple before delete");
+        return result;
+    }
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/executor/query_executor.cpp` around lines 491-495, the code only loads old_tuple when log_manager_ is set, but the index-removal code later uses old_tuple regardless, leaving dangling secondary-index entries when WAL is disabled; make the tuple fetch unconditional (remove the log_manager_ check) or otherwise ensure the index-removal path always has a valid old_tuple by calling table.get(rid, old_tuple) whenever txn != nullptr (or before running the index-removal block), so index cleanup does not depend on WAL being enabled.
🧹 Nitpick comments (2)
tests/logic/joins.slt (1)

**Lines 42-59:** Assert the unmatched right-side tuple identity, not just its amount.

These RIGHT/FULL cases only project `users_j.name` and `orders_j.amount`, so an implementation bug could still pass while returning the wrong right-side row. Including `orders_j.id` or `orders_j.user_id` would make the new outer-join coverage much tighter.

Suggested assertion upgrade:

```diff
-query TR
-SELECT users_j.name, orders_j.amount FROM users_j RIGHT JOIN orders_j ON users_j.id = orders_j.user_id ORDER BY orders_j.amount;
+query TIR
+SELECT users_j.name, orders_j.id, orders_j.amount
+FROM users_j RIGHT JOIN orders_j ON users_j.id = orders_j.user_id
+ORDER BY orders_j.amount;
 ----
-Alice 25.0
-Alice 50.0
-NULL 75.0
-Bob 100.0
+Alice 102 25.0
+Alice 101 50.0
+NULL 104 75.0
+Bob 103 100.0
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@tests/logic/joins.slt` around lines 42-59, the RIGHT and FULL join test cases only project users_j.name and orders_j.amount, which allows wrong right-side rows to slip by; update the two queries to also select a unique identifier from the right table (e.g., orders_j.id or orders_j.user_id) and update the expected result rows to include that identifier so the assertions verify the exact unmatched right-side tuple identity as well as the amount.

tests/logic/indexes.slt (1)
**Lines 32-39:** Strengthen the `DROP INDEX` assertion.

The follow-up `SELECT` only proves the table is still queryable; it would also pass if `DROP INDEX` silently left `idx_id` in place. Recreating `idx_id` after line 34, or asserting duplicate-name failure before the drop, would verify that the index entry was actually removed.

Suggested SLT tightening:

```diff
 # 5. Drop index and verify it still works
 statement ok
 DROP INDEX idx_id;
 
+statement ok
+CREATE INDEX idx_id ON idx_test (id);
+
+statement ok
+DROP INDEX idx_id;
+
 query IT
 SELECT id, val FROM idx_test WHERE id = 1;
 ----
 1 one
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@tests/logic/indexes.slt` around lines 32-39, the DROP INDEX test is weak because the subsequent SELECT only shows the table still works and doesn't prove idx_id was removed; verify the index was actually dropped by either attempting to CREATE INDEX idx_id after the DROP and asserting that creation succeeds (proving the name was freed), or by attempting a duplicate CREATE INDEX idx_id before the DROP and asserting it fails, then DROP idx_id and assert a subsequent CREATE succeeds.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 30d19d69-b48e-42d4-a4bd-67780a2855f1
📒 Files selected for processing (19)
- README.md
- docs/phases/README.md
- include/executor/operator.hpp
- include/executor/query_executor.hpp
- include/parser/statement.hpp
- include/parser/token.hpp
- include/storage/heap_table.hpp
- include/transaction/transaction.hpp
- src/executor/operator.cpp
- src/executor/query_executor.cpp
- src/network/server.cpp
- src/parser/lexer.cpp
- src/parser/parser.cpp
- src/storage/heap_table.cpp
- src/transaction/transaction_manager.cpp
- tests/logic/indexes.slt
- tests/logic/joins.slt
- tests/logic/limit_offset.slt
- tests/logic/transactions.slt
In `include/transaction/transaction.hpp`:

```diff
 void add_undo_log(UndoLog::Type type, const std::string& table_name,
-                  const storage::HeapTable::TupleId& rid) {
-    undo_logs_.push_back({type, table_name, rid});
+                  const storage::HeapTable::TupleId& rid,
+                  std::optional<storage::HeapTable::TupleId> old_rid = std::nullopt) {
+    undo_logs_.push_back({type, table_name, rid, old_rid});
```
🛠️ Refactor suggestion | 🟠 Major
Make UPDATE undo logs impossible to record without old_rid.
src/transaction/transaction_manager.cpp only restores the old tuple version for UndoLog::Type::UPDATE when old_rid is present. With this signature, a caller can forget the fourth argument and rollback will silently remove the new tuple while leaving the old one hidden. This API should enforce that invariant instead of relying on every call site to remember it.
One way to enforce the invariant:

```diff
 void add_undo_log(UndoLog::Type type, const std::string& table_name,
                   const storage::HeapTable::TupleId& rid,
                   std::optional<storage::HeapTable::TupleId> old_rid = std::nullopt) {
+    if (type == UndoLog::Type::UPDATE && !old_rid.has_value()) {
+        throw std::invalid_argument("UPDATE undo log requires old_rid");
+    }
     undo_logs_.push_back({type, table_name, rid, old_rid});
 }
```

🤖 Prompt for AI Agents
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@include/transaction/transaction.hpp` around lines 123 - 126, The add_undo_log
API currently allows creating an UPDATE undo log without old_rid; change the API
to enforce that UPDATE must provide old_rid by splitting into two overloads:
add_undo_log(UndoLog::Type type, const std::string& table_name, const
storage::HeapTable::TupleId& rid) for non-UPDATE types and
add_undo_log(UndoLog::Type type, const std::string& table_name, const
storage::HeapTable::TupleId& rid, const storage::HeapTable::TupleId& old_rid)
for UPDATE, and update both overloads to validate the invariant (e.g., assert or
throw if the caller uses the wrong overload: first must ensure type !=
UndoLog::Type::UPDATE, second should ensure type == UndoLog::Type::UPDATE)
before pushing into undo_logs_, so callers cannot omit old_rid for UPDATE and
rollback logic that checks old_rid remains correct.
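The overload split the prompt describes can be sketched as follows. This is a minimal, self-contained illustration, not the project's real header: `UndoLog`, `Transaction`, and a `std::uint64_t` stand-in for `TupleId` are simplified assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for the project's real types; TupleId is just uint64_t here.
struct UndoLog {
    enum class Type { INSERT, DELETE, UPDATE };
    Type type;
    std::string table_name;
    std::uint64_t rid;
    std::optional<std::uint64_t> old_rid;
};

class Transaction {
public:
    // Overload for non-UPDATE entries: there is no way to pass old_rid at all.
    void add_undo_log(UndoLog::Type type, const std::string& table_name, std::uint64_t rid) {
        if (type == UndoLog::Type::UPDATE) {
            throw std::invalid_argument("UPDATE undo log requires old_rid");
        }
        undo_logs_.push_back({type, table_name, rid, std::nullopt});
    }

    // Overload for UPDATE entries: old_rid is a required parameter.
    void add_undo_log(UndoLog::Type type, const std::string& table_name,
                      std::uint64_t rid, std::uint64_t old_rid) {
        if (type != UndoLog::Type::UPDATE) {
            throw std::invalid_argument("old_rid is only valid for UPDATE undo logs");
        }
        undo_logs_.push_back({type, table_name, rid, old_rid});
    }

    const std::vector<UndoLog>& undo_logs() const { return undo_logs_; }

private:
    std::vector<UndoLog> undo_logs_;
};
```

With this shape, forgetting `old_rid` on an UPDATE is a loud runtime error instead of a silently incomplete rollback, and the compiler steers callers toward the right overload.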
In `src/executor/query_executor.cpp`:

```cpp
std::vector<uint16_t> col_positions;
common::ValueType key_type = common::ValueType::TYPE_NULL;

for (const auto& col_name : stmt.columns()) {
    bool found = false;
    for (const auto& col : table_meta->columns) {
        if (col.name == col_name) {
            col_positions.push_back(col.position);
            key_type = col.type;
            found = true;
            break;
        }
    }
    if (!found) {
        result.set_error("Column not found: " + col_name);
        return result;
    }
}

/* Update Catalog */
const oid_t index_id = catalog_.create_index(stmt.index_name(), table_meta->table_id,
                                             col_positions, IndexType::BTree, stmt.unique());
```
Reject composite indexes until composite keys are actually supported.
This accepts multiple columns, but later in this function and the DML maintenance below only column_positions[0] is ever used as the physical key, while key_type is overwritten on each iteration. CREATE INDEX ... (a, b) therefore records a multi-column definition but builds a single-column B-tree, potentially with the wrong key type.
Suggested guard:

```diff
 std::vector<uint16_t> col_positions;
 common::ValueType key_type = common::ValueType::TYPE_NULL;
+
+if (stmt.columns().size() != 1) {
+    result.set_error("Only single-column indexes are supported");
+    return result;
+}
 for (const auto& col_name : stmt.columns()) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/executor/query_executor.cpp` around lines 282 - 303, The code currently
allows multiple columns in stmt.columns() but only uses column_positions[0] and
overwrites key_type, causing incorrect single-column index behavior; modify the
handler in the function that processes stmt.columns() (before building
col_positions/key_type and before calling catalog_.create_index) to reject
composite indexes: if stmt.columns().size() != 1 then call result.set_error with
a clear message (e.g., "Composite indexes not supported") and return result;
otherwise proceed and set col_positions and key_type from the single column only
(ensuring key_type is taken from that one column) before calling
catalog_.create_index(index_name(), table_meta->table_id, col_positions,
IndexType::BTree, stmt.unique()).
```cpp
/* Populate Index with existing data */
Schema schema;
for (const auto& col : table_meta->columns) {
    schema.add_column(col.name, col.type);
}
storage::HeapTable table(stmt.table_name(), bpm_, schema);
auto iter = table.scan();
storage::HeapTable::TupleMeta meta;
while (iter.next_meta(meta)) {
    if (meta.xmax == 0) {
        /* Extract key from tuple */
        if (!col_positions.empty()) {
            const common::Value& key = meta.tuple.get(col_positions[0]);
            index.insert(key, iter.current_id());
```
Fail the statement when secondary-index writes fail.
Every storage::BTreeIndex::insert/remove here returns bool, but the result is ignored. A failed backfill or DML-side index update leaves the heap and index out of sync, and execute_create_index can still return success with a partially populated index. The insert/delete/update maintenance paths need the same guard, ideally behind a shared helper.
Example fix for the backfill path:

```diff
-            index.insert(key, iter.current_id());
+            if (!index.insert(key, iter.current_id())) {
+                static_cast<void>(index.drop());
+                static_cast<void>(catalog_.drop_index(index_id));
+                result.set_error("Failed to populate index");
+                return result;
+            }
```

Also applies to: 399-407, 498-508, 583-612
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/executor/query_executor.cpp` around lines 317 - 330, The code ignores the
boolean return value of storage::BTreeIndex::insert/remove during the index
backfill and DML maintenance paths (e.g., in execute_create_index where the
backfill loop calls index.insert, and the delete/update paths around the other
noted regions); modify those call sites to check the returned bool and abort the
statement (propagate an error/return failure) if any index write fails.
Implement a small shared helper (e.g., bool apply_index_write(BTreeIndex&, const
common::Value& key, tuple_id_t id, Operation op)) used by execute_create_index
and the insert/delete/update maintenance code to perform the insert/remove, log
the failure with context, and return false so the caller can fail the statement
and avoid heap/index divergence.
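The shared helper the prompt sketches could look roughly like this. Everything here is a simplified stand-in for illustration: `FakeBTreeIndex` models only the bool-returning insert/remove contract, and `apply_index_write` is a hypothetical name, not an existing project function.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified stand-in for a B-tree index keyed by integers. insert() reports
// failure on a duplicate key; remove() reports failure on a missing key.
struct FakeBTreeIndex {
    std::map<std::int64_t, std::uint64_t> entries;
    bool insert(std::int64_t key, std::uint64_t rid) {
        return entries.emplace(key, rid).second;
    }
    bool remove(std::int64_t key) { return entries.erase(key) == 1; }
};

enum class IndexOp { Insert, Remove };

// Performs one index write and surfaces failure to the caller, so the
// statement can be failed instead of letting heap and index diverge.
bool apply_index_write(FakeBTreeIndex& index, std::int64_t key,
                       std::uint64_t rid, IndexOp op) {
    return (op == IndexOp::Insert) ? index.insert(key, rid)
                                   : index.remove(key);
}
```

A real version would take the project's `storage::BTreeIndex` and `common::Value` key and add context logging, but the essential point is that every call site checks the returned bool.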
In `src/network/server.cpp`:

```cpp
static_cast<void>(send(client_fd, ready.data(), ready.size(), 0));

// 2. Query Loop
executor::QueryExecutor exec(catalog_, bpm_, lock_manager_, transaction_manager_);
```
Rollback any open transaction on disconnect.
Making exec connection-scoped means an explicit transaction can now survive across statements. If the client drops the socket before COMMIT/ROLLBACK, handle_connection() just destroys exec; include/executor/query_executor.hpp:73 and src/executor/query_executor.cpp:176-198 show there is no destructor cleanup path, so the transaction stays registered in TransactionManager and keeps its locks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/network/server.cpp` at line 338, The connection-scoped QueryExecutor
(executor::QueryExecutor exec(...)) lets an explicit transaction outlive
statements and leaks if the client disconnects; modify handle_connection() so
that on socket disconnect/connection teardown you detect any active transaction
on exec and explicitly abort/rollback it (e.g., call a new or existing method
such as QueryExecutor::rollbackActiveTransaction() or invoke
TransactionManager's abort/rollback using the executor's current txn id), or
alternatively make exec statement-scoped instead of connection-scoped; locate
the instantiation "executor::QueryExecutor exec(catalog_, bpm_, lock_manager_,
transaction_manager_)" and add cleanup logic on all disconnect/error paths to
call rollback on the active transaction (or add a QueryExecutor destructor that
performs this rollback and ensure it's invoked on drop).
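One way to guarantee cleanup on every disconnect path is RAII: roll back any still-open transaction in the executor's destructor, so destroying the connection-scoped object cannot leak locks. A minimal sketch under simplified assumptions (`TransactionManager` here is a stand-in with only an `abort` method, not the project's real API):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified stand-in for the project's transaction manager.
struct TransactionManager {
    int aborted_count = 0;
    void abort(std::uint64_t /*txn_id*/) { ++aborted_count; }
};

class QueryExecutor {
public:
    explicit QueryExecutor(TransactionManager& tm) : tm_(tm) {}

    // Rolls back any transaction still open at destruction time, which
    // covers client disconnects and error paths automatically.
    ~QueryExecutor() {
        if (active_txn_) {
            tm_.abort(*active_txn_);
        }
    }

    void begin(std::uint64_t txn_id) { active_txn_ = txn_id; }
    void commit() { active_txn_.reset(); }

private:
    TransactionManager& tm_;
    std::optional<std::uint64_t> active_txn_;
};
```

Because `handle_connection()` destroys `exec` on every exit path, a destructor-based rollback needs no extra cleanup calls at each disconnect site.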
In `src/parser/parser.cpp`:

```cpp
} else if (consume(TokenType::Right)) {
    if (!consume(TokenType::Join)) {
        return nullptr;
    }
    join_type = SelectStatement::JoinType::Right;
} else if (consume(TokenType::Full)) {
    if (!consume(TokenType::Join)) {
        return nullptr;
    }
    join_type = SelectStatement::JoinType::Full;
```
Accept the optional OUTER keyword here.
These branches require JOIN immediately after RIGHT/FULL, so valid forms like RIGHT OUTER JOIN and FULL OUTER JOIN still fail to parse even though TokenType::Outer already exists.
Suggested fix:

```diff
 } else if (consume(TokenType::Right)) {
+    static_cast<void>(consume(TokenType::Outer));
     if (!consume(TokenType::Join)) {
         return nullptr;
     }
     join_type = SelectStatement::JoinType::Right;
 } else if (consume(TokenType::Full)) {
+    static_cast<void>(consume(TokenType::Outer));
     if (!consume(TokenType::Join)) {
         return nullptr;
     }
     join_type = SelectStatement::JoinType::Full;
```
join_type = SelectStatement::JoinType::Full;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } else if (consume(TokenType::Right)) { | |
| if (!consume(TokenType::Join)) { | |
| return nullptr; | |
| } | |
| join_type = SelectStatement::JoinType::Right; | |
| } else if (consume(TokenType::Full)) { | |
| if (!consume(TokenType::Join)) { | |
| return nullptr; | |
| } | |
| join_type = SelectStatement::JoinType::Full; | |
| } else if (consume(TokenType::Right)) { | |
| static_cast<void>(consume(TokenType::Outer)); | |
| if (!consume(TokenType::Join)) { | |
| return nullptr; | |
| } | |
| join_type = SelectStatement::JoinType::Right; | |
| } else if (consume(TokenType::Full)) { | |
| static_cast<void>(consume(TokenType::Outer)); | |
| if (!consume(TokenType::Join)) { | |
| return nullptr; | |
| } | |
| join_type = SelectStatement::JoinType::Full; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/parser/parser.cpp` around lines 136 - 145, The parser currently demands
JOIN immediately after consuming TokenType::Right or TokenType::Full so forms
like "RIGHT OUTER JOIN" fail; update the branches handling TokenType::Right and
TokenType::Full in parser.cpp (the code using consume(...) and setting join_type
to SelectStatement::JoinType::Right/Full) to optionally accept TokenType::Outer
before requiring TokenType::Join — i.e., after consuming Right/Full, call
consume(TokenType::Outer) if present (ignore its return if false) and then
ensure consume(TokenType::Join) succeeds, returning nullptr on failure as
before.
In `src/transaction/transaction_manager.cpp`:

```diff
 case UndoLog::Type::DELETE:
-    /* TODO: Implement DELETE undo */
-    static_cast<void>(0);
+    static_cast<void>(table.undo_remove(log.rid));
     break;
 case UndoLog::Type::UPDATE:
-    /* TODO: Implement UPDATE undo */
-    static_cast<void>(1);
+    /* For update, we physically remove the new version and reset xmax on the old one */
+    static_cast<void>(table.physical_remove(log.rid));
+    if (log.old_rid.has_value()) {
+        static_cast<void>(table.undo_remove(log.old_rid.value()));
+    }
```
Don't swallow undo failures during abort.
table.undo_remove() and table.physical_remove() both return bool, but this code drops the result and continues. If any undo step fails, the transaction is still marked aborted after a partial rollback.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/transaction/transaction_manager.cpp` around lines 180 - 188, The
undo/rollback code is ignoring the bool results from table.undo_remove(...) and
table.physical_remove(...), so failures are swallowed; capture each call’s
boolean return and propagate failure instead of discarding it (e.g., store the
result of table.undo_remove(log.rid) and table.physical_remove(log.rid), check
for false, log/record the failure and abort the rollback routine or return false
to the caller). Update the handling in the UndoLog::Type::DELETE and
UndoLog::Type::UPDATE branches to check the return values for
table.undo_remove(log.rid), table.physical_remove(log.rid) and
table.undo_remove(log.old_rid.value()), and ensure a failed undo causes the
abort path to stop and report failure rather than continuing silently.
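The check-and-propagate pattern the prompt asks for can be sketched like this. The types are simplified stand-ins (a `FakeHeapTable` with bool-returning undo primitives, an RID as `std::uint64_t`); the real rollback loop lives in `transaction_manager.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-in for the project's heap table.
struct FakeHeapTable {
    bool fail_physical_remove = false;
    bool undo_remove(std::uint64_t) { return true; }
    bool physical_remove(std::uint64_t) { return !fail_physical_remove; }
};

struct UndoLog {
    enum class Type { INSERT, DELETE, UPDATE };
    Type type;
    std::uint64_t rid;
    std::optional<std::uint64_t> old_rid;
};

// Walks the undo logs in reverse and returns false as soon as any undo
// step fails, instead of continuing silently past a partial rollback.
bool apply_undo(FakeHeapTable& table, const std::vector<UndoLog>& logs) {
    for (auto it = logs.rbegin(); it != logs.rend(); ++it) {
        switch (it->type) {
            case UndoLog::Type::DELETE:
                if (!table.undo_remove(it->rid)) return false;
                break;
            case UndoLog::Type::UPDATE:
                if (!table.physical_remove(it->rid)) return false;
                if (it->old_rid && !table.undo_remove(*it->old_rid)) return false;
                break;
            case UndoLog::Type::INSERT:
                if (!table.physical_remove(it->rid)) return false;
                break;
        }
    }
    return true;
}
```

The caller can then decide whether a failed undo should crash, retry, or leave the transaction in a visibly failed state, rather than marking it cleanly aborted.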
```cpp
case UndoLog::Type::UPDATE:
    /* For update, we physically remove the new version and reset xmax on the old one */
    static_cast<void>(table.physical_remove(log.rid));
    if (log.old_rid.has_value()) {
        static_cast<void>(table.undo_remove(log.old_rid.value()));
    }
```
UPDATE rollback does not repair indexes.
src/executor/query_executor.cpp:595-626 inserts index entries for new_tid before this undo record is written. This rollback branch only removes the new heap tuple and undeletes the old heap version; it never removes the new index entry, so indexed reads can still return a RID that was rolled back.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/transaction/transaction_manager.cpp` around lines 183 - 188, The UPDATE
undo branch (UndoLog::Type::UPDATE) removes the new heap tuple and undeletes the
old one but never updates indexes, leaving stale index entries for log.rid; fix
it by invoking the index cleanup/repair APIs around the existing calls (e.g.,
call the index manager to remove index entries for log.rid before/after
table.physical_remove(log.rid)) and, if log.old_rid.has_value(), re-insert or
restore index entries for log.old_rid (using the project's index insertion API)
so indexes match the restored heap; update the code around table.physical_remove
and table.undo_remove to call those index functions (use the project’s existing
methods such as remove_index_entries_for_rid and insert_index_entries_for_tuple
or their equivalents).
In `tests/logic/limit_offset.slt`:

```
# 7. Limit greater than available rows
query IT
SELECT id FROM lim_off_test ORDER BY id OFFSET 3 LIMIT 10;
----
4
5
```
This case won't catch an OFFSET ... LIMIT ... bug.
After OFFSET 3, only two rows remain, so the result is 4, 5 whether LIMIT 10 is applied or silently ignored. That makes this test too weak to validate the offset-before-limit path.
Suggested fix:

```diff
-SELECT id FROM lim_off_test ORDER BY id OFFSET 3 LIMIT 10;
+SELECT id FROM lim_off_test ORDER BY id OFFSET 1 LIMIT 1;
 ----
-4
-5
+2
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
# 7. Limit greater than available rows
query IT
SELECT id FROM lim_off_test ORDER BY id OFFSET 1 LIMIT 1;
----
2
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/logic/limit_offset.slt` around lines 47 - 52, The test uses "OFFSET 3
LIMIT 10" which doesn't validate offset-before-limit because only two rows
remain so LIMIT 10 is non-binding; change the query in
tests/logic/limit_offset.slt to use a LIMIT that is smaller than the remaining
rows (for example replace "OFFSET 3 LIMIT 10" with "OFFSET 3 LIMIT 1") and
update the expected output from "4\n5" to just "4" so the test will fail if
OFFSET is applied after LIMIT.
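The semantics this test should pin down, apply OFFSET first and LIMIT second, can be illustrated with a small sketch. This is not the project's executor, just the evaluation order the test needs to distinguish; negative values standing for "unlimited" mirrors the PR's switch to signed LIMIT/OFFSET.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Applies OFFSET before LIMIT, the standard SQL order. A negative offset
// means "no offset"; a negative limit means "no limit".
std::vector<int> offset_then_limit(const std::vector<int>& rows,
                                   std::int64_t offset, std::int64_t limit) {
    std::vector<int> out;
    std::int64_t skipped = 0;
    for (int row : rows) {
        if (offset > 0 && skipped < offset) { ++skipped; continue; }
        if (limit >= 0 && static_cast<std::int64_t>(out.size()) >= limit) break;
        out.push_back(row);
    }
    return out;
}
```

With rows 1..5, `OFFSET 3 LIMIT 10` yields {4, 5} whether or not the limit binds, while `OFFSET 3 LIMIT 1` yields {4} only if the offset is applied first, which is exactly the distinction the tightened test exploits.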
This follow-up PR further stabilizes the execution engine and expands test coverage:
- `HashJoinOperator` with proper matched-row tracking.
- Transaction rollback for `UPDATE` and `DELETE` operations, utilizing a newly expanded `UndoLog` and `HeapTable` rollback primitives.
- A `QueryExecutor` per connection, enabling multi-statement transactions.
- `LIMIT 0` and `OFFSET` edge case handling across the parser and executor.
- New SLT suites: `transactions.slt`, `indexes.slt`, and `limit_offset.slt`.

Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests