
feat: advanced join semantics, transactional integrity, and expanded SLT suite #13

Closed
poyrazK wants to merge 7 commits into main from feature/fix-joins-and-nulls

Conversation

poyrazK (Owner) commented Mar 10, 2026

This follow-up PR further stabilizes the execution engine and expands test coverage:

  1. FULL/RIGHT Join Support: Completed outer join semantics in HashJoinOperator with proper matched-row tracking.
  2. Transactional ROLLBACK: Implemented full undo logic for UPDATE and DELETE operations, utilizing a newly expanded UndoLog and HeapTable rollback primitives.
  3. Connection-Aware Execution: Refactored the PostgreSQL server to maintain a persistent QueryExecutor per connection, enabling multi-statement transactions.
  4. Pagination Fixes: Corrected LIMIT 0 and OFFSET edge case handling across the parser and executor.
  5. SLT Expansion: Added 40+ new logic tests across transactions.slt, indexes.slt, and limit_offset.slt.
  6. Documentation: Updated the project roadmap and features to reflect these Phase 9 advancements.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for RIGHT and FULL outer joins alongside existing INNER and LEFT joins.
    • Implemented B+ Tree indexing for persistent index creation and optimized query execution.
    • Enhanced LIMIT and OFFSET support with improved handling of edge cases.
    • Improved transaction rollback support for DELETE and UPDATE operations.
  • Documentation

    • Updated feature descriptions and build instructions.
  • Tests

    • Added comprehensive test suites for indexes, advanced joins, LIMIT/OFFSET, and transaction semantics.

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough


This PR adds index creation support with B-tree indexing, extends transaction rollback semantics for DELETE/UPDATE operations via undo logs, introduces FULL and RIGHT join parsing, changes LIMIT/OFFSET to signed integers for unlimited support, and adds comprehensive test coverage for indexes, transactions, and limit/offset behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>README.md, docs/phases/README.md | Updated build instructions and test runner guidance; documented Advanced Execution Engine features including FULL outer joins, B+ tree indexing, and transaction rollback support. |
| **Parser Token & Join Support**<br>include/parser/token.hpp, src/parser/lexer.cpp, src/parser/parser.cpp | Added TokenType::Full enum value; recognized "FULL" keyword in lexer; extended parser to handle RIGHT and FULL join types alongside existing INNER and LEFT joins. |
| **LIMIT/OFFSET Type Changes**<br>include/parser/statement.hpp, include/executor/operator.hpp, src/executor/operator.cpp | Changed LimitOperator and SelectStatement limit/offset fields from uint64_t to int64_t; updated initialization defaults to -1 and has_limit() logic to support unlimited (negative) limits; added type-safe comparisons in operator execution. |
| **Index Creation Support**<br>include/executor/query_executor.hpp, src/executor/query_executor.cpp | Added execute_create_index() method; enhanced build_plan() to detect indexed columns and use IndexScan when appropriate; implemented index file creation, catalog updates, and existing data population; updated IndexScanOperator to use rid-derived metadata for MVCC visibility. |
| **Transaction & Storage Rollback**<br>include/transaction/transaction.hpp, src/transaction/transaction_manager.cpp, include/storage/heap_table.hpp, src/storage/heap_table.cpp | Added optional old_rid field to UndoLog; implemented undo logic for DELETE (via undo_remove) and UPDATE (physical_remove + conditional undo_remove); added HeapTable::undo_remove() method to reset xmax for rollback. |
| **Server & Executor Lifecycle**<br>src/network/server.cpp | Moved QueryExecutor construction outside the per-statement loop to persist the executor across statements within a single query session. |
| **Test Suite Additions**<br>tests/logic/indexes.slt, tests/logic/transactions.slt, tests/logic/limit_offset.slt, tests/logic/joins.slt | Added comprehensive test coverage: index point/range lookups and post-drop verification; transaction commit, rollback, update visibility, and delete rollback scenarios; LIMIT/OFFSET edge cases with WHERE filters; RIGHT and FULL join test cases with NULL row validation. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant QueryExecutor
    participant Catalog
    participant Storage as FileStorage
    participant HeapTable
    
    Client->>QueryExecutor: execute(CreateIndexStatement)
    activate QueryExecutor
    QueryExecutor->>Catalog: validate table exists
    Catalog-->>QueryExecutor: table metadata
    
    QueryExecutor->>Catalog: resolve column positions/types
    Catalog-->>QueryExecutor: column info
    
    QueryExecutor->>Catalog: add index entry
    Catalog-->>QueryExecutor: index registered
    
    QueryExecutor->>Storage: create B-tree index file
    alt File Creation Success
        Storage-->>QueryExecutor: file created
        QueryExecutor->>HeapTable: scan for existing data
        activate HeapTable
        loop For each tuple
            HeapTable-->>QueryExecutor: tuple
            QueryExecutor->>Storage: insert into index
            Storage-->>QueryExecutor: ack
        end
        deactivate HeapTable
        QueryExecutor-->>Client: success
    else File Creation Fails
        Storage-->>QueryExecutor: error
        QueryExecutor->>Catalog: remove index entry (rollback)
        Catalog-->>QueryExecutor: ack
        QueryExecutor-->>Client: error
    end
    deactivate QueryExecutor
sequenceDiagram
    participant TransactionMgr as TransactionManager
    participant UndoLog
    participant HeapTable
    participant BTreeIndex as IndexStructure
    
    Note over TransactionMgr: ROLLBACK initiated
    TransactionMgr->>UndoLog: iterate undo logs (reverse order)
    activate UndoLog
    
    loop For each undo log entry
        alt Type == DELETE
            UndoLog->>HeapTable: undo_remove(rid)
            activate HeapTable
            HeapTable->>HeapTable: reset xmax to 0
            HeapTable-->>UndoLog: success
            deactivate HeapTable
        else Type == UPDATE
            UndoLog->>HeapTable: physical_remove(rid)
            activate HeapTable
            HeapTable-->>UndoLog: removed new version
            deactivate HeapTable
            
            alt old_rid exists
                UndoLog->>HeapTable: undo_remove(old_rid)
                activate HeapTable
                HeapTable->>HeapTable: restore old version
                HeapTable-->>UndoLog: success
                deactivate HeapTable
            end
        else Type == INSERT
            UndoLog->>HeapTable: physical_remove(rid)
            activate HeapTable
            HeapTable-->>UndoLog: success
            deactivate HeapTable
        end
        
        alt Index present
            UndoLog->>BTreeIndex: update index (remove new, reinsert old if applicable)
            BTreeIndex-->>UndoLog: ack
        end
    end
    deactivate UndoLog
    TransactionMgr-->>TransactionMgr: transaction rolled back

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

enhancement, feature, testing


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 19.23%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: advanced join semantics (RIGHT/FULL joins), transactional integrity (ROLLBACK support), and expanded test suite (SLT tests). All three aspects reflect substantial contributions across the changeset. |


coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
README.md (1)

48-53: ⚠️ Potential issue | 🟡 Minor

Fix the run_test.sh path in this build snippet.

After cd build, ./tests/run_test.sh is no longer relative to the repo root, so the command shown here will fail. Use ../tests/run_test.sh from this directory, or keep the script invocation in the separate “Running Tests” section only.

Doc fix
-make -j$(nproc) # Or ./tests/run_test.sh for automated multi-OS build
+make -j$(nproc) # Or ../tests/run_test.sh for automated multi-OS build
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 48 - 53, The build snippet after "cd build" calls
./tests/run_test.sh which is incorrect because it's relative to the build
directory; update the snippet to invoke ../tests/run_test.sh (or remove the
script invocation here and keep it only in the "Running Tests" section) so
run_test.sh is referenced from the repository root when executed from the build
directory; ensure the README's build commands show either the corrected
../tests/run_test.sh path or omit the script call altogether.
src/executor/query_executor.cpp (1)

491-495: ⚠️ Potential issue | 🔴 Critical

Don't couple delete-side index cleanup to WAL.

old_tuple is only loaded when log_manager_ is present, but Line 499's index-removal block also depends on it. With WAL disabled, the heap delete succeeds and each secondary index keeps a dangling entry for the deleted row.

Local fix
-        /* Retrieve old tuple for logging */
-        Tuple old_tuple;
-        if (log_manager_ != nullptr && txn != nullptr) {
-            static_cast<void>(table.get(rid, old_tuple));
-        }
+        /* Retrieve old tuple for logging/index maintenance */
+        Tuple old_tuple;
+        if ((!table_meta->indexes.empty() || (log_manager_ != nullptr && txn != nullptr)) &&
+            !table.get(rid, old_tuple)) {
+            result.set_error("Failed to read tuple before delete");
+            return result;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/query_executor.cpp` around lines 491 - 495, The code only loads
old_tuple when log_manager_ is set, but the index-removal code later uses
old_tuple regardless, leaving dangling secondary-index entries when WAL is
disabled; make the tuple fetch unconditional (remove the log_manager_ check) or
otherwise ensure the index-removal path always has a valid old_tuple by calling
table.get(rid, old_tuple) whenever txn != nullptr (or before running the
index-removal block). Update the logic around old_tuple, log_manager_, txn, and
the index-removal block so index cleanup does not depend on WAL being enabled.
🧹 Nitpick comments (2)
tests/logic/joins.slt (1)

42-59: Assert the unmatched right-side tuple identity, not just its amount.

These RIGHT/FULL cases only project users_j.name and orders_j.amount, so an implementation bug could still pass while returning the wrong right-side row. Including orders_j.id or orders_j.user_id would make the new outer-join coverage much tighter.

Suggested assertion upgrade
-query TR
-SELECT users_j.name, orders_j.amount FROM users_j RIGHT JOIN orders_j ON users_j.id = orders_j.user_id ORDER BY orders_j.amount;
+query TIR
+SELECT users_j.name, orders_j.id, orders_j.amount
+FROM users_j RIGHT JOIN orders_j ON users_j.id = orders_j.user_id
+ORDER BY orders_j.amount;
 ----
-Alice 25.0
-Alice 50.0
-NULL 75.0
-Bob 100.0
+Alice 102 25.0
+Alice 101 50.0
+NULL 104 75.0
+Bob 103 100.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/joins.slt` around lines 42 - 59, The RIGHT and FULL join test
cases only project users_j.name and orders_j.amount which allows wrong
right-side rows to slip by; update the two queries (the RIGHT JOIN and the FULL
JOIN in tests/logic/joins.slt) to also select a unique identifier from the right
table (e.g., orders_j.id or orders_j.user_id) and update the expected result
rows to include that identifier so the assertions verify the exact unmatched
right-side tuple identity as well as the amount.
tests/logic/indexes.slt (1)

32-39: Strengthen the DROP INDEX assertion.

The follow-up SELECT only proves the table is still queryable; it would also pass if DROP INDEX silently left idx_id in place. Recreating idx_id after Line 34, or asserting duplicate-name failure before the drop, would verify that the index entry was actually removed.

Suggested SLT tightening
 # 5. Drop index and verify it still works
 statement ok
 DROP INDEX idx_id;
 
+statement ok
+CREATE INDEX idx_id ON idx_test (id);
+
+statement ok
+DROP INDEX idx_id;
+
 query IT
 SELECT id, val FROM idx_test WHERE id = 1;
 ----
 1 one
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/indexes.slt` around lines 32 - 39, The DROP INDEX test is weak
because the subsequent SELECT only shows the table still works and doesn't prove
idx_id was removed; update the test to verify the index was actually dropped by
either attempting to CREATE INDEX idx_id ... after the DROP and asserting that
creation succeeds (proving the name was freed), or by attempting a duplicate
CREATE INDEX idx_id ... before the DROP and asserting it fails, then DROP idx_id
and assert a subsequent CREATE succeeds. Reference the index name idx_id and the
table/query SELECT id, val FROM idx_test WHERE id = 1 when adding the extra
assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/transaction/transaction.hpp`:
- Around line 123-126: The add_undo_log API currently allows creating an UPDATE
undo log without old_rid; change the API to enforce that UPDATE must provide
old_rid by splitting into two overloads: add_undo_log(UndoLog::Type type, const
std::string& table_name, const storage::HeapTable::TupleId& rid) for non-UPDATE
types and add_undo_log(UndoLog::Type type, const std::string& table_name, const
storage::HeapTable::TupleId& rid, const storage::HeapTable::TupleId& old_rid)
for UPDATE, and update both overloads to validate the invariant (e.g., assert or
throw if the caller uses the wrong overload: first must ensure type !=
UndoLog::Type::UPDATE, second should ensure type == UndoLog::Type::UPDATE)
before pushing into undo_logs_, so callers cannot omit old_rid for UPDATE and
rollback logic that checks old_rid remains correct.

In `@src/executor/query_executor.cpp`:
- Around line 317-330: The code ignores the boolean return value of
storage::BTreeIndex::insert/remove during the index backfill and DML maintenance
paths (e.g., in execute_create_index where the backfill loop calls index.insert,
and the delete/update paths around the other noted regions); modify those call
sites to check the returned bool and abort the statement (propagate an
error/return failure) if any index write fails. Implement a small shared helper
(e.g., bool apply_index_write(BTreeIndex&, const common::Value& key, tuple_id_t
id, Operation op)) used by execute_create_index and the insert/delete/update
maintenance code to perform the insert/remove, log the failure with context, and
return false so the caller can fail the statement and avoid heap/index
divergence.
- Around line 282-303: The code currently allows multiple columns in
stmt.columns() but only uses column_positions[0] and overwrites key_type,
causing incorrect single-column index behavior; modify the handler in the
function that processes stmt.columns() (before building col_positions/key_type
and before calling catalog_.create_index) to reject composite indexes: if
stmt.columns().size() != 1 then call result.set_error with a clear message
(e.g., "Composite indexes not supported") and return result; otherwise proceed
and set col_positions and key_type from the single column only (ensuring
key_type is taken from that one column) before calling
catalog_.create_index(index_name(), table_meta->table_id, col_positions,
IndexType::BTree, stmt.unique()).

In `@src/network/server.cpp`:
- Line 338: The connection-scoped QueryExecutor (executor::QueryExecutor
exec(...)) lets an explicit transaction outlive statements and leaks if the
client disconnects; modify handle_connection() so that on socket
disconnect/connection teardown you detect any active transaction on exec and
explicitly abort/rollback it (e.g., call a new or existing method such as
QueryExecutor::rollbackActiveTransaction() or invoke TransactionManager's
abort/rollback using the executor's current txn id), or alternatively make exec
statement-scoped instead of connection-scoped; locate the instantiation
"executor::QueryExecutor exec(catalog_, bpm_, lock_manager_,
transaction_manager_)" and add cleanup logic on all disconnect/error paths to
call rollback on the active transaction (or add a QueryExecutor destructor that
performs this rollback and ensure it's invoked on drop).

In `@src/parser/parser.cpp`:
- Around line 136-145: The parser currently demands JOIN immediately after
consuming TokenType::Right or TokenType::Full so forms like "RIGHT OUTER JOIN"
fail; update the branches handling TokenType::Right and TokenType::Full in
parser.cpp (the code using consume(...) and setting join_type to
SelectStatement::JoinType::Right/Full) to optionally accept TokenType::Outer
before requiring TokenType::Join — i.e., after consuming Right/Full, call
consume(TokenType::Outer) if present (ignore its return if false) and then
ensure consume(TokenType::Join) succeeds, returning nullptr on failure as
before.

In `@src/transaction/transaction_manager.cpp`:
- Around line 180-188: The undo/rollback code is ignoring the bool results from
table.undo_remove(...) and table.physical_remove(...), so failures are
swallowed; capture each call’s boolean return and propagate failure instead of
discarding it (e.g., store the result of table.undo_remove(log.rid) and
table.physical_remove(log.rid), check for false, log/record the failure and
abort the rollback routine or return false to the caller). Update the handling
in the UndoLog::Type::DELETE and UndoLog::Type::UPDATE branches to check the
return values for table.undo_remove(log.rid), table.physical_remove(log.rid) and
table.undo_remove(log.old_rid.value()), and ensure a failed undo causes the
abort path to stop and report failure rather than continuing silently.
- Around line 183-188: The UPDATE undo branch (UndoLog::Type::UPDATE) removes
the new heap tuple and undeletes the old one but never updates indexes, leaving
stale index entries for log.rid; fix it by invoking the index cleanup/repair
APIs around the existing calls (e.g., call the index manager to remove index
entries for log.rid before/after table.physical_remove(log.rid)) and, if
log.old_rid.has_value(), re-insert or restore index entries for log.old_rid
(using the project's index insertion API) so indexes match the restored heap;
update the code around table.physical_remove and table.undo_remove to call those
index functions (use the project’s existing methods such as
remove_index_entries_for_rid and insert_index_entries_for_tuple or their
equivalents).

In `@tests/logic/limit_offset.slt`:
- Around line 47-52: The test uses "OFFSET 3 LIMIT 10" which doesn't validate
offset-before-limit because only two rows remain so LIMIT 10 is non-binding;
change the query in tests/logic/limit_offset.slt to use a LIMIT that is smaller
than the remaining rows (for example replace "OFFSET 3 LIMIT 10" with "OFFSET 3
LIMIT 1") and update the expected output from "4\n5" to just "4" so the test
will fail if OFFSET is applied after LIMIT.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30d19d69-b48e-42d4-a4bd-67780a2855f1

📥 Commits

Reviewing files that changed from the base of the PR and between e8f2c69 and 2a22adf.

📒 Files selected for processing (19)
  • README.md
  • docs/phases/README.md
  • include/executor/operator.hpp
  • include/executor/query_executor.hpp
  • include/parser/statement.hpp
  • include/parser/token.hpp
  • include/storage/heap_table.hpp
  • include/transaction/transaction.hpp
  • src/executor/operator.cpp
  • src/executor/query_executor.cpp
  • src/network/server.cpp
  • src/parser/lexer.cpp
  • src/parser/parser.cpp
  • src/storage/heap_table.cpp
  • src/transaction/transaction_manager.cpp
  • tests/logic/indexes.slt
  • tests/logic/joins.slt
  • tests/logic/limit_offset.slt
  • tests/logic/transactions.slt

Comment on lines 123 to +126
 void add_undo_log(UndoLog::Type type, const std::string& table_name,
-                  const storage::HeapTable::TupleId& rid) {
-    undo_logs_.push_back({type, table_name, rid});
+                  const storage::HeapTable::TupleId& rid,
+                  std::optional<storage::HeapTable::TupleId> old_rid = std::nullopt) {
+    undo_logs_.push_back({type, table_name, rid, old_rid});


🛠️ Refactor suggestion | 🟠 Major

Make UPDATE undo logs impossible to record without old_rid.

src/transaction/transaction_manager.cpp only restores the old tuple version for UndoLog::Type::UPDATE when old_rid is present. With this signature, a caller can forget the fourth argument and rollback will silently remove the new tuple while leaving the old one hidden. This API should enforce that invariant instead of relying on every call site to remember it.

One way to enforce the invariant
 void add_undo_log(UndoLog::Type type, const std::string& table_name,
                   const storage::HeapTable::TupleId& rid,
                   std::optional<storage::HeapTable::TupleId> old_rid = std::nullopt) {
+    if (type == UndoLog::Type::UPDATE && !old_rid.has_value()) {
+        throw std::invalid_argument("UPDATE undo log requires old_rid");
+    }
     undo_logs_.push_back({type, table_name, rid, old_rid});
 }

Comment on lines +282 to +303
    std::vector<uint16_t> col_positions;
    common::ValueType key_type = common::ValueType::TYPE_NULL;

    for (const auto& col_name : stmt.columns()) {
        bool found = false;
        for (const auto& col : table_meta->columns) {
            if (col.name == col_name) {
                col_positions.push_back(col.position);
                key_type = col.type;
                found = true;
                break;
            }
        }
        if (!found) {
            result.set_error("Column not found: " + col_name);
            return result;
        }
    }

    /* Update Catalog */
    const oid_t index_id = catalog_.create_index(stmt.index_name(), table_meta->table_id,
                                                 col_positions, IndexType::BTree, stmt.unique());


⚠️ Potential issue | 🟠 Major

Reject composite indexes until composite keys are actually supported.

This accepts multiple columns, but later in this function and in the DML maintenance below only col_positions[0] is ever used as the physical key, while key_type is overwritten on each iteration. CREATE INDEX ... (a, b) therefore records a multi-column definition but builds a single-column B-tree, potentially with the wrong key type.

Suggested guard
     std::vector<uint16_t> col_positions;
     common::ValueType key_type = common::ValueType::TYPE_NULL;
+
+    if (stmt.columns().size() != 1) {
+        result.set_error("Only single-column indexes are supported");
+        return result;
+    }
 
     for (const auto& col_name : stmt.columns()) {

Comment on lines +317 to +330
    /* Populate Index with existing data */
    Schema schema;
    for (const auto& col : table_meta->columns) {
        schema.add_column(col.name, col.type);
    }
    storage::HeapTable table(stmt.table_name(), bpm_, schema);
    auto iter = table.scan();
    storage::HeapTable::TupleMeta meta;
    while (iter.next_meta(meta)) {
        if (meta.xmax == 0) {
            /* Extract key from tuple */
            if (!col_positions.empty()) {
                const common::Value& key = meta.tuple.get(col_positions[0]);
                index.insert(key, iter.current_id());


⚠️ Potential issue | 🔴 Critical

Fail the statement when secondary-index writes fail.

Every storage::BTreeIndex::insert/remove here returns bool, but the result is ignored. A failed backfill or DML-side index update leaves the heap and index out of sync, and execute_create_index can still return success with a partially populated index. The insert/delete/update maintenance paths need the same guard, ideally behind a shared helper.

Example fix for the backfill path
-                index.insert(key, iter.current_id());
+                if (!index.insert(key, iter.current_id())) {
+                    static_cast<void>(index.drop());
+                    static_cast<void>(catalog_.drop_index(index_id));
+                    result.set_error("Failed to populate index");
+                    return result;
+                }

Also applies to: 399-407, 498-508, 583-612


static_cast<void>(send(client_fd, ready.data(), ready.size(), 0));

// 2. Query Loop
executor::QueryExecutor exec(catalog_, bpm_, lock_manager_, transaction_manager_);


⚠️ Potential issue | 🔴 Critical

Rollback any open transaction on disconnect.

Making exec connection-scoped means an explicit transaction can now survive across statements. If the client drops the socket before COMMIT/ROLLBACK, handle_connection() just destroys exec; include/executor/query_executor.hpp:73 and src/executor/query_executor.cpp:176-198 show there is no destructor cleanup path, so the transaction stays registered in TransactionManager and keeps its locks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/network/server.cpp` at line 338, The connection-scoped QueryExecutor
(executor::QueryExecutor exec(...)) lets an explicit transaction outlive
statements and leaks if the client disconnects; modify handle_connection() so
that on socket disconnect/connection teardown you detect any active transaction
on exec and explicitly abort/rollback it (e.g., call a new or existing method
such as QueryExecutor::rollbackActiveTransaction() or invoke
TransactionManager's abort/rollback using the executor's current txn id), or
alternatively make exec statement-scoped instead of connection-scoped; locate
the instantiation "executor::QueryExecutor exec(catalog_, bpm_, lock_manager_,
transaction_manager_)" and add cleanup logic on all disconnect/error paths to
call rollback on the active transaction (or add a QueryExecutor destructor that
performs this rollback and ensure it's invoked on drop).
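One way to close the leak is the RAII route the prompt mentions, sketched here with simplified stand-ins for TransactionManager and QueryExecutor (the real classes take more collaborators and the abort also releases locks):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Stand-in: tracks which transactions are still registered (and hold locks).
struct TransactionManager {
    std::set<uint64_t> active;
    uint64_t next_id = 1;
    uint64_t begin() { active.insert(next_id); return next_id++; }
    void commit(uint64_t id) { active.erase(id); }
    void abort(uint64_t id) { active.erase(id); }  // would also release locks
};

class QueryExecutor {
public:
    explicit QueryExecutor(TransactionManager& tm) : tm_(tm) {}
    // RAII cleanup: a client that drops the socket mid-transaction no
    // longer leaves the transaction registered in TransactionManager.
    ~QueryExecutor() {
        if (txn_) tm_.abort(*txn_);
    }
    void begin() { if (!txn_) txn_ = tm_.begin(); }
    void commit() { if (txn_) { tm_.commit(*txn_); txn_.reset(); } }
private:
    TransactionManager& tm_;
    std::optional<uint64_t> txn_;
};
```

Because handle_connection() destroys exec on every exit path, a destructor-based rollback covers clean disconnects and error paths alike.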

Comment on lines +136 to +145
            } else if (consume(TokenType::Right)) {
                if (!consume(TokenType::Join)) {
                    return nullptr;
                }
                join_type = SelectStatement::JoinType::Right;
            } else if (consume(TokenType::Full)) {
                if (!consume(TokenType::Join)) {
                    return nullptr;
                }
                join_type = SelectStatement::JoinType::Full;

⚠️ Potential issue | 🟠 Major

Accept the optional OUTER keyword here.

These branches require JOIN immediately after RIGHT/FULL, so valid forms like RIGHT OUTER JOIN and FULL OUTER JOIN still fail to parse even though TokenType::Outer already exists.

Suggested fix
             } else if (consume(TokenType::Right)) {
+                static_cast<void>(consume(TokenType::Outer));
                 if (!consume(TokenType::Join)) {
                     return nullptr;
                 }
                 join_type = SelectStatement::JoinType::Right;
             } else if (consume(TokenType::Full)) {
+                static_cast<void>(consume(TokenType::Outer));
                 if (!consume(TokenType::Join)) {
                     return nullptr;
                 }
                 join_type = SelectStatement::JoinType::Full;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/parser.cpp` around lines 136 - 145, The parser currently demands
JOIN immediately after consuming TokenType::Right or TokenType::Full so forms
like "RIGHT OUTER JOIN" fail; update the branches handling TokenType::Right and
TokenType::Full in parser.cpp (the code using consume(...) and setting join_type
to SelectStatement::JoinType::Right/Full) to optionally accept TokenType::Outer
before requiring TokenType::Join — i.e., after consuming Right/Full, call
consume(TokenType::Outer) if present (ignore its return if false) and then
ensure consume(TokenType::Join) succeeds, returning nullptr on failure as
before.

Comment on lines 180 to +188
        case UndoLog::Type::DELETE:
-           /* TODO: Implement DELETE undo */
-           static_cast<void>(0);
+           static_cast<void>(table.undo_remove(log.rid));
            break;
        case UndoLog::Type::UPDATE:
-           /* TODO: Implement UPDATE undo */
-           static_cast<void>(1);
+           /* For update, we physically remove the new version and reset xmax on the old one */
+           static_cast<void>(table.physical_remove(log.rid));
+           if (log.old_rid.has_value()) {
+               static_cast<void>(table.undo_remove(log.old_rid.value()));
+           }

⚠️ Potential issue | 🟠 Major

Don't swallow undo failures during abort.

table.undo_remove() and table.physical_remove() both return bool, but this code drops the result and continues. If any undo step fails, the transaction is still marked aborted after a partial rollback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/transaction/transaction_manager.cpp` around lines 180 - 188, The
undo/rollback code is ignoring the bool results from table.undo_remove(...) and
table.physical_remove(...), so failures are swallowed; capture each call’s
boolean return and propagate failure instead of discarding it (e.g., store the
result of table.undo_remove(log.rid) and table.physical_remove(log.rid), check
for false, log/record the failure and abort the rollback routine or return false
to the caller). Update the handling in the UndoLog::Type::DELETE and
UndoLog::Type::UPDATE branches to check the return values for
table.undo_remove(log.rid), table.physical_remove(log.rid) and
table.undo_remove(log.old_rid.value()), and ensure a failed undo causes the
abort path to stop and report failure rather than continuing silently.
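A sketch of an abort path that propagates the first failed undo step, with HeapTable and UndoLog reduced to minimal stand-ins (the project's types carry more state):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Stand-in heap: `live` holds visible tuples, `removed` holds deleted
// tuples that can still be resurrected.
struct HeapTable {
    std::set<uint64_t> live;
    std::set<uint64_t> removed;
    bool undo_remove(uint64_t rid) {      // resurrect a deleted tuple
        if (removed.erase(rid) != 1) return false;
        live.insert(rid);
        return true;
    }
    bool physical_remove(uint64_t rid) {  // drop a tuple version outright
        return live.erase(rid) == 1;
    }
};

struct UndoLog {
    enum class Type { DELETE, UPDATE };
    Type type;
    uint64_t rid;
    std::optional<uint64_t> old_rid;
};

// Apply one undo record; return false as soon as any step fails so the
// caller can stop the rollback and report the failure.
bool apply_undo(HeapTable& table, const UndoLog& log) {
    switch (log.type) {
    case UndoLog::Type::DELETE:
        return table.undo_remove(log.rid);
    case UndoLog::Type::UPDATE:
        if (!table.physical_remove(log.rid)) return false;
        if (log.old_rid.has_value()) return table.undo_remove(*log.old_rid);
        return true;
    }
    return false;
}
```

The caller can then refuse to mark the transaction aborted (or escalate to a crash/log path) when apply_undo reports failure, instead of finishing a partial rollback silently.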

Comment on lines 183 to +188
        case UndoLog::Type::UPDATE:
-           /* TODO: Implement UPDATE undo */
-           static_cast<void>(1);
+           /* For update, we physically remove the new version and reset xmax on the old one */
+           static_cast<void>(table.physical_remove(log.rid));
+           if (log.old_rid.has_value()) {
+               static_cast<void>(table.undo_remove(log.old_rid.value()));
+           }

⚠️ Potential issue | 🟠 Major

UPDATE rollback does not repair indexes.

src/executor/query_executor.cpp:595-626 inserts index entries for new_tid before this undo record is written. This rollback branch only removes the new heap tuple and undeletes the old heap version; it never removes the new index entry, so indexed reads can still return a RID that was rolled back.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/transaction/transaction_manager.cpp` around lines 183 - 188, The UPDATE
undo branch (UndoLog::Type::UPDATE) removes the new heap tuple and undeletes the
old one but never updates indexes, leaving stale index entries for log.rid; fix
it by invoking the index cleanup/repair APIs around the existing calls (e.g.,
call the index manager to remove index entries for log.rid before/after
table.physical_remove(log.rid)) and, if log.old_rid.has_value(), re-insert or
restore index entries for log.old_rid (using the project's index insertion API)
so indexes match the restored heap; update the code around table.physical_remove
and table.undo_remove to call those index functions (use the project’s existing
methods such as remove_index_entries_for_rid and insert_index_entries_for_tuple
or their equivalents).
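The invariant the prompt is after can be sketched end to end: on UPDATE rollback, drop the index entry pointing at the rolled-back RID, restore the old heap version, then re-point the key at it. Index, HeapTable, and undo_update are all simplified stand-ins, not the project's API:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Stand-in single-column index: key -> rid.
struct Index {
    std::map<int64_t, uint64_t> entries;
    bool remove_rid(uint64_t rid) {  // drop the entry pointing at rid
        for (auto it = entries.begin(); it != entries.end(); ++it) {
            if (it->second == rid) { entries.erase(it); return true; }
        }
        return false;
    }
    void insert(int64_t key, uint64_t rid) { entries[key] = rid; }
};

struct HeapTable {
    std::set<uint64_t> live, removed;
    bool physical_remove(uint64_t rid) { return live.erase(rid) == 1; }
    bool undo_remove(uint64_t rid) {
        if (removed.erase(rid) != 1) return false;
        live.insert(rid);
        return true;
    }
};

// UPDATE rollback that also repairs the index, so indexed reads cannot
// return the rolled-back RID.
bool undo_update(HeapTable& table, Index& index, int64_t key,
                 uint64_t new_rid, uint64_t old_rid) {
    if (!index.remove_rid(new_rid)) return false;  // stale entry goes first
    if (!table.physical_remove(new_rid)) return false;
    if (!table.undo_remove(old_rid)) return false;
    index.insert(key, old_rid);  // index matches the restored heap again
    return true;
}
```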

Comment on lines +47 to +52
# 7. Limit greater than available rows
query IT
SELECT id FROM lim_off_test ORDER BY id OFFSET 3 LIMIT 10;
----
4
5

⚠️ Potential issue | 🟡 Minor

This case won't catch an OFFSET ... LIMIT ... bug.

After OFFSET 3, only two rows remain, so the result is 4, 5 whether LIMIT 10 is applied or silently ignored. That makes this test too weak to validate the offset-before-limit path.

Suggested fix
-SELECT id FROM lim_off_test ORDER BY id OFFSET 3 LIMIT 10;
+SELECT id FROM lim_off_test ORDER BY id OFFSET 1 LIMIT 1;
 ----
-4
-5
+2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/limit_offset.slt` around lines 47 - 52, The test uses "OFFSET 3
LIMIT 10" which doesn't validate offset-before-limit because only two rows
remain so LIMIT 10 is non-binding; change the query in
tests/logic/limit_offset.slt to use a LIMIT that is smaller than the remaining
rows (for example replace "OFFSET 3 LIMIT 10" with "OFFSET 3 LIMIT 1") and
update the expected output from "4\n5" to just "4" so the test will fail if
OFFSET is applied after LIMIT.
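The semantics this test should pin down are simple to model: skip `offset` rows first, then keep at most `limit` of what remains. A sketch of that ordering (not the project's executor code; `paginate` is a hypothetical name):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Apply OFFSET before LIMIT: skip `offset` rows, then keep at most `limit`.
std::vector<int> paginate(const std::vector<int>& rows, std::size_t offset,
                          std::size_t limit) {
    if (offset >= rows.size()) return {};
    const std::size_t end = std::min(rows.size(), offset + limit);
    return std::vector<int>(rows.begin() + static_cast<std::ptrdiff_t>(offset),
                            rows.begin() + static_cast<std::ptrdiff_t>(end));
}
```

With five rows, OFFSET 3 LIMIT 10 returns 4, 5 whether or not the limit is applied, which is why a binding limit such as OFFSET 1 LIMIT 1 is the discriminating case.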

@poyrazK poyrazK closed this Mar 10, 2026
