
feat: advanced join semantics, transactional integrity, and expanded SLT suite #13

Closed
poyrazK wants to merge 7 commits into main from feature/fix-joins-and-nulls

Conversation

poyrazK (Owner) commented Mar 10, 2026

This follow-up PR further stabilizes the execution engine and expands test coverage:

  1. FULL/RIGHT Join Support: Completed outer join semantics in HashJoinOperator with proper matched-row tracking.
  2. Transactional ROLLBACK: Implemented full undo logic for UPDATE and DELETE operations, utilizing a newly expanded UndoLog and HeapTable rollback primitives.
  3. Connection-Aware Execution: Refactored the PostgreSQL server to maintain a persistent QueryExecutor per connection, enabling multi-statement transactions.
  4. Pagination Fixes: Corrected LIMIT 0 and OFFSET edge case handling across the parser and executor.
  5. SLT Expansion: Added 40+ new logic tests across transactions.slt, indexes.slt, and limit_offset.slt.
  6. Documentation: Updated the project roadmap and features to reflect these Phase 9 advancements.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for RIGHT and FULL outer joins alongside existing INNER and LEFT joins.
    • Implemented B+ Tree indexing for persistent index creation and optimized query execution.
    • Enhanced LIMIT and OFFSET support with improved handling of edge cases.
    • Improved transaction rollback support for DELETE and UPDATE operations.
  • Documentation

    • Updated feature descriptions and build instructions.
  • Tests

    • Added comprehensive test suites for indexes, advanced joins, LIMIT/OFFSET, and transaction semantics.

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough


This PR adds index creation support with B-tree indexing, extends transaction rollback semantics for DELETE/UPDATE operations via undo logs, introduces FULL and RIGHT join parsing, changes LIMIT/OFFSET to signed integers for unlimited support, and adds comprehensive test coverage for indexes, transactions, and limit/offset behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>README.md, docs/phases/README.md | Updated build instructions and test runner guidance; documented Advanced Execution Engine features including FULL outer joins, B+ tree indexing, and transaction rollback support. |
| **Parser Token & Join Support**<br>include/parser/token.hpp, src/parser/lexer.cpp, src/parser/parser.cpp | Added TokenType::Full enum value; recognized "FULL" keyword in lexer; extended parser to handle RIGHT and FULL join types alongside existing INNER and LEFT joins. |
| **LIMIT/OFFSET Type Changes**<br>include/parser/statement.hpp, include/executor/operator.hpp, src/executor/operator.cpp | Changed LimitOperator and SelectStatement limit/offset fields from uint64_t to int64_t; updated initialization defaults to -1 and has_limit() logic to support unlimited (negative) limits; added type-safe comparisons in operator execution. |
| **Index Creation Support**<br>include/executor/query_executor.hpp, src/executor/query_executor.cpp | Added execute_create_index() method; enhanced build_plan() to detect indexed columns and use IndexScan when appropriate; implemented index file creation, catalog updates, and existing data population; updated IndexScanOperator to use rid-derived metadata for MVCC visibility. |
| **Transaction & Storage Rollback**<br>include/transaction/transaction.hpp, src/transaction/transaction_manager.cpp, include/storage/heap_table.hpp, src/storage/heap_table.cpp | Added optional old_rid field to UndoLog; implemented undo logic for DELETE (via undo_remove) and UPDATE (physical_remove + conditional undo_remove); added HeapTable::undo_remove() method to reset xmax for rollback. |
| **Server & Executor Lifecycle**<br>src/network/server.cpp | Moved QueryExecutor construction outside the per-statement loop to persist the executor across statements within a single query session. |
| **Test Suite Additions**<br>tests/logic/indexes.slt, tests/logic/transactions.slt, tests/logic/limit_offset.slt, tests/logic/joins.slt | Added comprehensive test coverage: index point/range lookups and post-drop verification; transaction commit, rollback, update visibility, and delete rollback scenarios; LIMIT/OFFSET edge cases with WHERE filters; RIGHT and FULL join test cases with NULL row validation. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant QueryExecutor
    participant Catalog
    participant Storage as FileStorage
    participant HeapTable
    
    Client->>QueryExecutor: execute(CreateIndexStatement)
    activate QueryExecutor
    QueryExecutor->>Catalog: validate table exists
    Catalog-->>QueryExecutor: table metadata
    
    QueryExecutor->>Catalog: resolve column positions/types
    Catalog-->>QueryExecutor: column info
    
    QueryExecutor->>Catalog: add index entry
    Catalog-->>QueryExecutor: index registered
    
    QueryExecutor->>Storage: create B-tree index file
    alt File Creation Success
        Storage-->>QueryExecutor: file created
        QueryExecutor->>HeapTable: scan for existing data
        activate HeapTable
        loop For each tuple
            HeapTable-->>QueryExecutor: tuple
            QueryExecutor->>Storage: insert into index
            Storage-->>QueryExecutor: ack
        end
        deactivate HeapTable
        QueryExecutor-->>Client: success
    else File Creation Fails
        Storage-->>QueryExecutor: error
        QueryExecutor->>Catalog: remove index entry (rollback)
        Catalog-->>QueryExecutor: ack
        QueryExecutor-->>Client: error
    end
    deactivate QueryExecutor
sequenceDiagram
    participant TransactionMgr as TransactionManager
    participant UndoLog
    participant HeapTable
    participant BTreeIndex as IndexStructure
    
    Note over TransactionMgr: ROLLBACK initiated
    TransactionMgr->>UndoLog: iterate undo logs (reverse order)
    activate UndoLog
    
    loop For each undo log entry
        alt Type == DELETE
            UndoLog->>HeapTable: undo_remove(rid)
            activate HeapTable
            HeapTable->>HeapTable: reset xmax to 0
            HeapTable-->>UndoLog: success
            deactivate HeapTable
        else Type == UPDATE
            UndoLog->>HeapTable: physical_remove(rid)
            activate HeapTable
            HeapTable-->>UndoLog: removed new version
            deactivate HeapTable
            
            alt old_rid exists
                UndoLog->>HeapTable: undo_remove(old_rid)
                activate HeapTable
                HeapTable->>HeapTable: restore old version
                HeapTable-->>UndoLog: success
                deactivate HeapTable
            end
        else Type == INSERT
            UndoLog->>HeapTable: physical_remove(rid)
            activate HeapTable
            HeapTable-->>UndoLog: success
            deactivate HeapTable
        end
        
        alt Index present
            UndoLog->>BTreeIndex: update index (remove new, reinsert old if applicable)
            BTreeIndex-->>UndoLog: ack
        end
    end
    deactivate UndoLog
    TransactionMgr-->>TransactionMgr: transaction rolled back

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

enhancement, feature, testing


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 19.23%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: advanced join semantics (RIGHT/FULL joins), transactional integrity (ROLLBACK support), and expanded test suite (SLT tests). All three aspects reflect substantial contributions across the changeset. |


coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
README.md (1)

48-53: ⚠️ Potential issue | 🟡 Minor

Fix the run_test.sh path in this build snippet.

After cd build, ./tests/run_test.sh is no longer relative to the repo root, so the command shown here will fail. Use ../tests/run_test.sh from this directory, or keep the script invocation in the separate “Running Tests” section only.

Doc fix
-make -j$(nproc) # Or ./tests/run_test.sh for automated multi-OS build
+make -j$(nproc) # Or ../tests/run_test.sh for automated multi-OS build
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 48 - 53, The build snippet after "cd build" calls
./tests/run_test.sh which is incorrect because it's relative to the build
directory; update the snippet to invoke ../tests/run_test.sh (or remove the
script invocation here and keep it only in the "Running Tests" section) so
run_test.sh is referenced from the repository root when executed from the build
directory; ensure the README's build commands show either the corrected
../tests/run_test.sh path or omit the script call altogether.
src/executor/query_executor.cpp (1)

491-495: ⚠️ Potential issue | 🔴 Critical

Don't couple delete-side index cleanup to WAL.

old_tuple is only loaded when log_manager_ is present, but Line 499's index-removal block also depends on it. With WAL disabled, the heap delete succeeds and each secondary index keeps a dangling entry for the deleted row.

Local fix
-        /* Retrieve old tuple for logging */
-        Tuple old_tuple;
-        if (log_manager_ != nullptr && txn != nullptr) {
-            static_cast<void>(table.get(rid, old_tuple));
-        }
+        /* Retrieve old tuple for logging/index maintenance */
+        Tuple old_tuple;
+        if ((!table_meta->indexes.empty() || (log_manager_ != nullptr && txn != nullptr)) &&
+            !table.get(rid, old_tuple)) {
+            result.set_error("Failed to read tuple before delete");
+            return result;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/query_executor.cpp` around lines 491 - 495, The code only loads
old_tuple when log_manager_ is set, but the index-removal code later uses
old_tuple regardless, leaving dangling secondary-index entries when WAL is
disabled; make the tuple fetch unconditional (remove the log_manager_ check) or
otherwise ensure the index-removal path always has a valid old_tuple by calling
table.get(rid, old_tuple) whenever txn != nullptr (or before running the
index-removal block). Update the logic around old_tuple, log_manager_, txn, and
the index-removal block so index cleanup does not depend on WAL being enabled.
🧹 Nitpick comments (2)
tests/logic/joins.slt (1)

42-59: Assert the unmatched right-side tuple identity, not just its amount.

These RIGHT/FULL cases only project users_j.name and orders_j.amount, so an implementation bug could still pass while returning the wrong right-side row. Including orders_j.id or orders_j.user_id would make the new outer-join coverage much tighter.

Suggested assertion upgrade
-query TR
-SELECT users_j.name, orders_j.amount FROM users_j RIGHT JOIN orders_j ON users_j.id = orders_j.user_id ORDER BY orders_j.amount;
+query TIR
+SELECT users_j.name, orders_j.id, orders_j.amount
+FROM users_j RIGHT JOIN orders_j ON users_j.id = orders_j.user_id
+ORDER BY orders_j.amount;
 ----
-Alice 25.0
-Alice 50.0
-NULL 75.0
-Bob 100.0
+Alice 102 25.0
+Alice 101 50.0
+NULL 104 75.0
+Bob 103 100.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/joins.slt` around lines 42 - 59, The RIGHT and FULL join test
cases only project users_j.name and orders_j.amount which allows wrong
right-side rows to slip by; update the two queries (the RIGHT JOIN and the FULL
JOIN in tests/logic/joins.slt) to also select a unique identifier from the right
table (e.g., orders_j.id or orders_j.user_id) and update the expected result
rows to include that identifier so the assertions verify the exact unmatched
right-side tuple identity as well as the amount.
tests/logic/indexes.slt (1)

32-39: Strengthen the DROP INDEX assertion.

The follow-up SELECT only proves the table is still queryable; it would also pass if DROP INDEX silently left idx_id in place. Recreating idx_id after Line 34, or asserting duplicate-name failure before the drop, would verify that the index entry was actually removed.

Suggested SLT tightening
 # 5. Drop index and verify it still works
 statement ok
 DROP INDEX idx_id;
 
+statement ok
+CREATE INDEX idx_id ON idx_test (id);
+
+statement ok
+DROP INDEX idx_id;
+
 query IT
 SELECT id, val FROM idx_test WHERE id = 1;
 ----
 1 one
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/indexes.slt` around lines 32 - 39, The DROP INDEX test is weak
because the subsequent SELECT only shows the table still works and doesn't prove
idx_id was removed; update the test to verify the index was actually dropped by
either attempting to CREATE INDEX idx_id ... after the DROP and asserting that
creation succeeds (proving the name was freed), or by attempting a duplicate
CREATE INDEX idx_id ... before the DROP and asserting it fails, then DROP idx_id
and assert a subsequent CREATE succeeds. Reference the index name idx_id and the
table/query SELECT id, val FROM idx_test WHERE id = 1 when adding the extra
assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/transaction/transaction.hpp`:
- Around line 123-126: The add_undo_log API currently allows creating an UPDATE
undo log without old_rid; change the API to enforce that UPDATE must provide
old_rid by splitting into two overloads: add_undo_log(UndoLog::Type type, const
std::string& table_name, const storage::HeapTable::TupleId& rid) for non-UPDATE
types and add_undo_log(UndoLog::Type type, const std::string& table_name, const
storage::HeapTable::TupleId& rid, const storage::HeapTable::TupleId& old_rid)
for UPDATE, and update both overloads to validate the invariant (e.g., assert or
throw if the caller uses the wrong overload: first must ensure type !=
UndoLog::Type::UPDATE, second should ensure type == UndoLog::Type::UPDATE)
before pushing into undo_logs_, so callers cannot omit old_rid for UPDATE and
rollback logic that checks old_rid remains correct.

In `@src/executor/query_executor.cpp`:
- Around line 317-330: The code ignores the boolean return value of
storage::BTreeIndex::insert/remove during the index backfill and DML maintenance
paths (e.g., in execute_create_index where the backfill loop calls index.insert,
and the delete/update paths around the other noted regions); modify those call
sites to check the returned bool and abort the statement (propagate an
error/return failure) if any index write fails. Implement a small shared helper
(e.g., bool apply_index_write(BTreeIndex&, const common::Value& key, tuple_id_t
id, Operation op)) used by execute_create_index and the insert/delete/update
maintenance code to perform the insert/remove, log the failure with context, and
return false so the caller can fail the statement and avoid heap/index
divergence.
- Around line 282-303: The code currently allows multiple columns in
stmt.columns() but only uses column_positions[0] and overwrites key_type,
causing incorrect single-column index behavior; modify the handler in the
function that processes stmt.columns() (before building col_positions/key_type
and before calling catalog_.create_index) to reject composite indexes: if
stmt.columns().size() != 1 then call result.set_error with a clear message
(e.g., "Composite indexes not supported") and return result; otherwise proceed
and set col_positions and key_type from the single column only (ensuring
key_type is taken from that one column) before calling
catalog_.create_index(index_name(), table_meta->table_id, col_positions,
IndexType::BTree, stmt.unique()).

In `@src/network/server.cpp`:
- Line 338: The connection-scoped QueryExecutor (executor::QueryExecutor
exec(...)) lets an explicit transaction outlive statements and leaks if the
client disconnects; modify handle_connection() so that on socket
disconnect/connection teardown you detect any active transaction on exec and
explicitly abort/rollback it (e.g., call a new or existing method such as
QueryExecutor::rollbackActiveTransaction() or invoke TransactionManager's
abort/rollback using the executor's current txn id), or alternatively make exec
statement-scoped instead of connection-scoped; locate the instantiation
"executor::QueryExecutor exec(catalog_, bpm_, lock_manager_,
transaction_manager_)" and add cleanup logic on all disconnect/error paths to
call rollback on the active transaction (or add a QueryExecutor destructor that
performs this rollback and ensure it's invoked on drop).

In `@src/parser/parser.cpp`:
- Around line 136-145: The parser currently demands JOIN immediately after
consuming TokenType::Right or TokenType::Full so forms like "RIGHT OUTER JOIN"
fail; update the branches handling TokenType::Right and TokenType::Full in
parser.cpp (the code using consume(...) and setting join_type to
SelectStatement::JoinType::Right/Full) to optionally accept TokenType::Outer
before requiring TokenType::Join — i.e., after consuming Right/Full, call
consume(TokenType::Outer) if present (ignore its return if false) and then
ensure consume(TokenType::Join) succeeds, returning nullptr on failure as
before.

In `@src/transaction/transaction_manager.cpp`:
- Around line 180-188: The undo/rollback code is ignoring the bool results from
table.undo_remove(...) and table.physical_remove(...), so failures are
swallowed; capture each call’s boolean return and propagate failure instead of
discarding it (e.g., store the result of table.undo_remove(log.rid) and
table.physical_remove(log.rid), check for false, log/record the failure and
abort the rollback routine or return false to the caller). Update the handling
in the UndoLog::Type::DELETE and UndoLog::Type::UPDATE branches to check the
return values for table.undo_remove(log.rid), table.physical_remove(log.rid) and
table.undo_remove(log.old_rid.value()), and ensure a failed undo causes the
abort path to stop and report failure rather than continuing silently.
- Around line 183-188: The UPDATE undo branch (UndoLog::Type::UPDATE) removes
the new heap tuple and undeletes the old one but never updates indexes, leaving
stale index entries for log.rid; fix it by invoking the index cleanup/repair
APIs around the existing calls (e.g., call the index manager to remove index
entries for log.rid before/after table.physical_remove(log.rid)) and, if
log.old_rid.has_value(), re-insert or restore index entries for log.old_rid
(using the project's index insertion API) so indexes match the restored heap;
update the code around table.physical_remove and table.undo_remove to call those
index functions (use the project’s existing methods such as
remove_index_entries_for_rid and insert_index_entries_for_tuple or their
equivalents).

In `@tests/logic/limit_offset.slt`:
- Around line 47-52: The test uses "OFFSET 3 LIMIT 10" which doesn't validate
offset-before-limit because only two rows remain so LIMIT 10 is non-binding;
change the query in tests/logic/limit_offset.slt to use a LIMIT that is smaller
than the remaining rows (for example replace "OFFSET 3 LIMIT 10" with "OFFSET 3
LIMIT 1") and update the expected output from "4\n5" to just "4" so the test
will fail if OFFSET is applied after LIMIT.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30d19d69-b48e-42d4-a4bd-67780a2855f1

📥 Commits

Reviewing files that changed from the base of the PR and between e8f2c69 and 2a22adf.

📒 Files selected for processing (19)
  • README.md
  • docs/phases/README.md
  • include/executor/operator.hpp
  • include/executor/query_executor.hpp
  • include/parser/statement.hpp
  • include/parser/token.hpp
  • include/storage/heap_table.hpp
  • include/transaction/transaction.hpp
  • src/executor/operator.cpp
  • src/executor/query_executor.cpp
  • src/network/server.cpp
  • src/parser/lexer.cpp
  • src/parser/parser.cpp
  • src/storage/heap_table.cpp
  • src/transaction/transaction_manager.cpp
  • tests/logic/indexes.slt
  • tests/logic/joins.slt
  • tests/logic/limit_offset.slt
  • tests/logic/transactions.slt

Comment on lines 123 to +126
 void add_undo_log(UndoLog::Type type, const std::string& table_name,
-                  const storage::HeapTable::TupleId& rid) {
-    undo_logs_.push_back({type, table_name, rid});
+                  const storage::HeapTable::TupleId& rid,
+                  std::optional<storage::HeapTable::TupleId> old_rid = std::nullopt) {
+    undo_logs_.push_back({type, table_name, rid, old_rid});


🛠️ Refactor suggestion | 🟠 Major

Make UPDATE undo logs impossible to record without old_rid.

src/transaction/transaction_manager.cpp only restores the old tuple version for UndoLog::Type::UPDATE when old_rid is present. With this signature, a caller can forget the fourth argument and rollback will silently remove the new tuple while leaving the old one hidden. This API should enforce that invariant instead of relying on every call site to remember it.

One way to enforce the invariant
 void add_undo_log(UndoLog::Type type, const std::string& table_name,
                   const storage::HeapTable::TupleId& rid,
                   std::optional<storage::HeapTable::TupleId> old_rid = std::nullopt) {
+    if (type == UndoLog::Type::UPDATE && !old_rid.has_value()) {
+        throw std::invalid_argument("UPDATE undo log requires old_rid");
+    }
     undo_logs_.push_back({type, table_name, rid, old_rid});
 }

Comment on lines +282 to +303
    std::vector<uint16_t> col_positions;
    common::ValueType key_type = common::ValueType::TYPE_NULL;

    for (const auto& col_name : stmt.columns()) {
        bool found = false;
        for (const auto& col : table_meta->columns) {
            if (col.name == col_name) {
                col_positions.push_back(col.position);
                key_type = col.type;
                found = true;
                break;
            }
        }
        if (!found) {
            result.set_error("Column not found: " + col_name);
            return result;
        }
    }

    /* Update Catalog */
    const oid_t index_id = catalog_.create_index(stmt.index_name(), table_meta->table_id,
                                                 col_positions, IndexType::BTree, stmt.unique());


⚠️ Potential issue | 🟠 Major

Reject composite indexes until composite keys are actually supported.

This accepts multiple columns, but later in this function and in the DML maintenance below only col_positions[0] is ever used as the physical key, while key_type is overwritten on each iteration. CREATE INDEX ... (a, b) therefore records a multi-column definition but builds a single-column B-tree, potentially with the wrong key type.

Suggested guard
     std::vector<uint16_t> col_positions;
     common::ValueType key_type = common::ValueType::TYPE_NULL;
+
+    if (stmt.columns().size() != 1) {
+        result.set_error("Only single-column indexes are supported");
+        return result;
+    }
 
     for (const auto& col_name : stmt.columns()) {

Comment on lines +317 to +330
    /* Populate Index with existing data */
    Schema schema;
    for (const auto& col : table_meta->columns) {
        schema.add_column(col.name, col.type);
    }
    storage::HeapTable table(stmt.table_name(), bpm_, schema);
    auto iter = table.scan();
    storage::HeapTable::TupleMeta meta;
    while (iter.next_meta(meta)) {
        if (meta.xmax == 0) {
            /* Extract key from tuple */
            if (!col_positions.empty()) {
                const common::Value& key = meta.tuple.get(col_positions[0]);
                index.insert(key, iter.current_id());


⚠️ Potential issue | 🔴 Critical

Fail the statement when secondary-index writes fail.

Every storage::BTreeIndex::insert/remove here returns bool, but the result is ignored. A failed backfill or DML-side index update leaves the heap and index out of sync, and execute_create_index can still return success with a partially populated index. The insert/delete/update maintenance paths need the same guard, ideally behind a shared helper.

Example fix for the backfill path
-                index.insert(key, iter.current_id());
+                if (!index.insert(key, iter.current_id())) {
+                    static_cast<void>(index.drop());
+                    static_cast<void>(catalog_.drop_index(index_id));
+                    result.set_error("Failed to populate index");
+                    return result;
+                }

Also applies to: 399-407, 498-508, 583-612


static_cast<void>(send(client_fd, ready.data(), ready.size(), 0));

// 2. Query Loop
executor::QueryExecutor exec(catalog_, bpm_, lock_manager_, transaction_manager_);


⚠️ Potential issue | 🔴 Critical

Rollback any open transaction on disconnect.

Making exec connection-scoped means an explicit transaction can now survive across statements. If the client drops the socket before COMMIT/ROLLBACK, handle_connection() just destroys exec; include/executor/query_executor.hpp:73 and src/executor/query_executor.cpp:176-198 show there is no destructor cleanup path, so the transaction stays registered in TransactionManager and keeps its locks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/network/server.cpp` at line 338, The connection-scoped QueryExecutor
(executor::QueryExecutor exec(...)) lets an explicit transaction outlive
statements and leaks if the client disconnects; modify handle_connection() so
that on socket disconnect/connection teardown you detect any active transaction
on exec and explicitly abort/rollback it (e.g., call a new or existing method
such as QueryExecutor::rollbackActiveTransaction() or invoke
TransactionManager's abort/rollback using the executor's current txn id), or
alternatively make exec statement-scoped instead of connection-scoped; locate
the instantiation "executor::QueryExecutor exec(catalog_, bpm_, lock_manager_,
transaction_manager_)" and add cleanup logic on all disconnect/error paths to
call rollback on the active transaction (or add a QueryExecutor destructor that
performs this rollback and ensure it's invoked on drop).
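One way to close the leak is the RAII route the prompt mentions, sketched here with simplified stand-ins for TransactionManager and QueryExecutor (the real classes take more collaborators and the abort also releases locks):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Stand-in: tracks which transactions are still registered (and hold locks).
struct TransactionManager {
    std::set<uint64_t> active;
    uint64_t next_id = 1;
    uint64_t begin() { active.insert(next_id); return next_id++; }
    void commit(uint64_t id) { active.erase(id); }
    void abort(uint64_t id) { active.erase(id); }  // would also release locks
};

class QueryExecutor {
public:
    explicit QueryExecutor(TransactionManager& tm) : tm_(tm) {}
    // RAII cleanup: a client that drops the socket mid-transaction no
    // longer leaves the transaction registered in TransactionManager.
    ~QueryExecutor() {
        if (txn_) tm_.abort(*txn_);
    }
    void begin() { if (!txn_) txn_ = tm_.begin(); }
    void commit() { if (txn_) { tm_.commit(*txn_); txn_.reset(); } }
private:
    TransactionManager& tm_;
    std::optional<uint64_t> txn_;
};
```

Because handle_connection() destroys exec on every exit path, a destructor-based rollback covers clean disconnects and error paths alike.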

Comment on lines +136 to +145
            } else if (consume(TokenType::Right)) {
                if (!consume(TokenType::Join)) {
                    return nullptr;
                }
                join_type = SelectStatement::JoinType::Right;
            } else if (consume(TokenType::Full)) {
                if (!consume(TokenType::Join)) {
                    return nullptr;
                }
                join_type = SelectStatement::JoinType::Full;

⚠️ Potential issue | 🟠 Major

Accept the optional OUTER keyword here.

These branches require JOIN immediately after RIGHT/FULL, so valid forms like RIGHT OUTER JOIN and FULL OUTER JOIN still fail to parse even though TokenType::Outer already exists.

Suggested fix
             } else if (consume(TokenType::Right)) {
+                static_cast<void>(consume(TokenType::Outer));
                 if (!consume(TokenType::Join)) {
                     return nullptr;
                 }
                 join_type = SelectStatement::JoinType::Right;
             } else if (consume(TokenType::Full)) {
+                static_cast<void>(consume(TokenType::Outer));
                 if (!consume(TokenType::Join)) {
                     return nullptr;
                 }
                 join_type = SelectStatement::JoinType::Full;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/parser.cpp` around lines 136 - 145, The parser currently demands
JOIN immediately after consuming TokenType::Right or TokenType::Full so forms
like "RIGHT OUTER JOIN" fail; update the branches handling TokenType::Right and
TokenType::Full in parser.cpp (the code using consume(...) and setting join_type
to SelectStatement::JoinType::Right/Full) to optionally accept TokenType::Outer
before requiring TokenType::Join — i.e., after consuming Right/Full, call
consume(TokenType::Outer) if present (ignore its return if false) and then
ensure consume(TokenType::Join) succeeds, returning nullptr on failure as
before.

Comment on lines 180 to +188
        case UndoLog::Type::DELETE:
-           /* TODO: Implement DELETE undo */
-           static_cast<void>(0);
+           static_cast<void>(table.undo_remove(log.rid));
            break;
        case UndoLog::Type::UPDATE:
-           /* TODO: Implement UPDATE undo */
-           static_cast<void>(1);
+           /* For update, we physically remove the new version and reset xmax on the old one */
+           static_cast<void>(table.physical_remove(log.rid));
+           if (log.old_rid.has_value()) {
+               static_cast<void>(table.undo_remove(log.old_rid.value()));
+           }

⚠️ Potential issue | 🟠 Major

Don't swallow undo failures during abort.

table.undo_remove() and table.physical_remove() both return bool, but this code drops the result and continues. If any undo step fails, the transaction is still marked aborted after a partial rollback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/transaction/transaction_manager.cpp` around lines 180 - 188, The
undo/rollback code is ignoring the bool results from table.undo_remove(...) and
table.physical_remove(...), so failures are swallowed; capture each call’s
boolean return and propagate failure instead of discarding it (e.g., store the
result of table.undo_remove(log.rid) and table.physical_remove(log.rid), check
for false, log/record the failure and abort the rollback routine or return false
to the caller). Update the handling in the UndoLog::Type::DELETE and
UndoLog::Type::UPDATE branches to check the return values for
table.undo_remove(log.rid), table.physical_remove(log.rid) and
table.undo_remove(log.old_rid.value()), and ensure a failed undo causes the
abort path to stop and report failure rather than continuing silently.
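A sketch of an abort path that propagates the first failed undo step, with HeapTable and UndoLog reduced to minimal stand-ins (the project's types carry more state):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Stand-in heap: `live` holds visible tuples, `removed` holds deleted
// tuples that can still be resurrected.
struct HeapTable {
    std::set<uint64_t> live;
    std::set<uint64_t> removed;
    bool undo_remove(uint64_t rid) {      // resurrect a deleted tuple
        if (removed.erase(rid) != 1) return false;
        live.insert(rid);
        return true;
    }
    bool physical_remove(uint64_t rid) {  // drop a tuple version outright
        return live.erase(rid) == 1;
    }
};

struct UndoLog {
    enum class Type { DELETE, UPDATE };
    Type type;
    uint64_t rid;
    std::optional<uint64_t> old_rid;
};

// Apply one undo record; return false as soon as any step fails so the
// caller can stop the rollback and report the failure.
bool apply_undo(HeapTable& table, const UndoLog& log) {
    switch (log.type) {
    case UndoLog::Type::DELETE:
        return table.undo_remove(log.rid);
    case UndoLog::Type::UPDATE:
        if (!table.physical_remove(log.rid)) return false;
        if (log.old_rid.has_value()) return table.undo_remove(*log.old_rid);
        return true;
    }
    return false;
}
```

The caller can then refuse to mark the transaction aborted (or escalate to a crash/log path) when apply_undo reports failure, instead of finishing a partial rollback silently.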

Comment on lines 183 to +188
        case UndoLog::Type::UPDATE:
-           /* TODO: Implement UPDATE undo */
-           static_cast<void>(1);
+           /* For update, we physically remove the new version and reset xmax on the old one */
+           static_cast<void>(table.physical_remove(log.rid));
+           if (log.old_rid.has_value()) {
+               static_cast<void>(table.undo_remove(log.old_rid.value()));
+           }

⚠️ Potential issue | 🟠 Major

UPDATE rollback does not repair indexes.

src/executor/query_executor.cpp:595-626 inserts index entries for new_tid before this undo record is written. This rollback branch only removes the new heap tuple and undeletes the old heap version; it never removes the new index entry, so indexed reads can still return a RID that was rolled back.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/transaction/transaction_manager.cpp` around lines 183 - 188, The UPDATE
undo branch (UndoLog::Type::UPDATE) removes the new heap tuple and undeletes the
old one but never updates indexes, leaving stale index entries for log.rid; fix
it by invoking the index cleanup/repair APIs around the existing calls (e.g.,
call the index manager to remove index entries for log.rid before/after
table.physical_remove(log.rid)) and, if log.old_rid.has_value(), re-insert or
restore index entries for log.old_rid (using the project's index insertion API)
so indexes match the restored heap; update the code around table.physical_remove
and table.undo_remove to call those index functions (use the project’s existing
methods such as remove_index_entries_for_rid and insert_index_entries_for_tuple
or their equivalents).
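The invariant the prompt is after can be sketched end to end: on UPDATE rollback, drop the index entry pointing at the rolled-back RID, restore the old heap version, then re-point the key at it. Index, HeapTable, and undo_update are all simplified stand-ins, not the project's API:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Stand-in single-column index: key -> rid.
struct Index {
    std::map<int64_t, uint64_t> entries;
    bool remove_rid(uint64_t rid) {  // drop the entry pointing at rid
        for (auto it = entries.begin(); it != entries.end(); ++it) {
            if (it->second == rid) { entries.erase(it); return true; }
        }
        return false;
    }
    void insert(int64_t key, uint64_t rid) { entries[key] = rid; }
};

struct HeapTable {
    std::set<uint64_t> live, removed;
    bool physical_remove(uint64_t rid) { return live.erase(rid) == 1; }
    bool undo_remove(uint64_t rid) {
        if (removed.erase(rid) != 1) return false;
        live.insert(rid);
        return true;
    }
};

// UPDATE rollback that also repairs the index, so indexed reads cannot
// return the rolled-back RID.
bool undo_update(HeapTable& table, Index& index, int64_t key,
                 uint64_t new_rid, uint64_t old_rid) {
    if (!index.remove_rid(new_rid)) return false;  // stale entry goes first
    if (!table.physical_remove(new_rid)) return false;
    if (!table.undo_remove(old_rid)) return false;
    index.insert(key, old_rid);  // index matches the restored heap again
    return true;
}
```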

Comment on lines +47 to +52
# 7. Limit greater than available rows
query IT
SELECT id FROM lim_off_test ORDER BY id OFFSET 3 LIMIT 10;
----
4
5

⚠️ Potential issue | 🟡 Minor

This case won't catch an OFFSET ... LIMIT ... bug.

After OFFSET 3, only two rows remain, so the result is 4, 5 whether LIMIT 10 is applied or silently ignored. That makes this test too weak to validate the offset-before-limit path.

Suggested fix
-SELECT id FROM lim_off_test ORDER BY id OFFSET 3 LIMIT 10;
+SELECT id FROM lim_off_test ORDER BY id OFFSET 1 LIMIT 1;
 ----
-4
-5
+2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/logic/limit_offset.slt` around lines 47 - 52, The test uses "OFFSET 3
LIMIT 10" which doesn't validate offset-before-limit because only two rows
remain so LIMIT 10 is non-binding; change the query in
tests/logic/limit_offset.slt to use a LIMIT that is smaller than the remaining
rows (for example replace "OFFSET 3 LIMIT 10" with "OFFSET 3 LIMIT 1") and
update the expected output from "4\n5" to just "4" so the test will fail if
OFFSET is applied after LIMIT.
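The semantics this test should pin down are simple to model: skip `offset` rows first, then keep at most `limit` of what remains. A sketch of that ordering (not the project's executor code; `paginate` is a hypothetical name):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Apply OFFSET before LIMIT: skip `offset` rows, then keep at most `limit`.
std::vector<int> paginate(const std::vector<int>& rows, std::size_t offset,
                          std::size_t limit) {
    if (offset >= rows.size()) return {};
    const std::size_t end = std::min(rows.size(), offset + limit);
    return std::vector<int>(rows.begin() + static_cast<std::ptrdiff_t>(offset),
                            rows.begin() + static_cast<std::ptrdiff_t>(end));
}
```

With five rows, OFFSET 3 LIMIT 10 returns 4, 5 whether or not the limit is applied, which is why a binding limit such as OFFSET 1 LIMIT 1 is the discriminating case.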

@poyrazK poyrazK closed this Mar 10, 2026
