Skip to content

Add skipDuplicates parameter on @entity directive#6458

Open
isum wants to merge 10 commits intomasterfrom
ion/skip-duplicates
Open

Add skipDuplicates parameter on @entity directive#6458
isum wants to merge 10 commits intomasterfrom
ion/skip-duplicates

Conversation

@isum
Copy link
Copy Markdown
Member

@isum isum commented Mar 26, 2026

Summary

Add skipDuplicates parameter to the @entity directive for immutable entities. @entity(immutable: true, skipDuplicates: true) skips duplicate inserts with warnings instead of failing indexing. This unblocks subgraph composition on Amp-powered subgraphs where SQL queries can produce the same entities across block ranges.

Changes

  • Parse and validate skipDuplicates from @entity directive (requires immutable: true)
  • Write-batch layer: warn and skip instead of failing on cross-block duplicate inserts
  • Store layer: warn and skip on unexpected update/delete for skip-duplicates entities
  • DB layer: ON CONFLICT (id) DO NOTHING for skip-duplicates INSERT queries, with affected-rows logging
  • Runner test for end-to-end validation

Default behavior (skipDuplicates: false) is unchanged - duplicates still fail indexing.

isum added 4 commits March 26, 2026 16:02
Add kw::SKIP_DUPLICATES constant, skip_duplicates bool field to ObjectType,
and parsing logic in ObjectType::new() defaulting to false when absent.
Added SkipDuplicatesRequiresImmutable error variant, bool_arg validation
for skipDuplicates in validate_entity_directives(), and three test functions
covering non-boolean value, mutable entity, and timeseries+skipDuplicates.
Add TypeInfo::skip_duplicates(), InputSchema::skip_duplicates(), and
EntityType::skip_duplicates() following the is_immutable() three-layer
delegation pattern. Object types return immutable && skip_duplicates;
Interface and Aggregation types return false.
Add skip_duplicates: bool field to RowGroup struct alongside immutable,
update RowGroup::new() to accept the parameter, and wire it from
entity_type.skip_duplicates() in RowGroups::group_entry(). All other
call sites (RowGroupForPerfTest, test helpers, example) pass false.
@isum isum self-assigned this Mar 26, 2026
@fordN fordN requested a review from lutter March 26, 2026 15:51
isum added 6 commits March 26, 2026 19:31
Modify RowGroup::append_row() so that when immutable=true and
skip_duplicates=true, cross-block duplicate inserts and
Overwrite/Remove operations log warnings and return Ok(()) instead
of failing. Same-block duplicates remain allowed. Default behavior
(skip_duplicates=false) is preserved exactly.

Added Logger field to RowGroup/RowGroups with CacheWeight impl,
threaded through all construction sites. 5 unit tests covering all
scenarios.
Added skip_duplicates: bool field to Table struct in relational.rs,
wired from EntityType::skip_duplicates() in Table::new(), copied in
Table::new_like(), and defaulted to false in make_poi_table().
Added logger parameter to Layout::update() and Layout::delete() in
relational.rs. When table.immutable && table.skip_duplicates, these
methods now log a warning and return Ok(0) instead of returning an
error. Default immutable behavior (skip_duplicates=false) is preserved.
Updated all callers including deployment_store.rs and test files.
Added 4 unit tests with SkipDupMink entity type to verify both
skip_duplicates and default immutable behavior.
Add conditional ON CONFLICT (primary_key) DO NOTHING clause to
InsertQuery::walk_ast() when table.immutable && table.skip_duplicates.
This handles cross-batch duplicates where an entity committed in a
previous batch is re-inserted due to the immutable entity cache
skipping store lookups.

Two unit tests verify: skip_duplicates inserts include ON CONFLICT,
default immutable inserts do not.
Restructured Layout::insert() from if-let-Err to match to capture
affected_rows from InsertQuery::execute(). Tracks expected vs actual
row counts across both the main batch path and the row-by-row fallback
path. Logs a warning when affected_rows < expected for skip_duplicates
immutable tables.
Add end-to-end runner test exercising @entity(immutable: true,
skipDuplicates: true) with duplicate entity inserts across blocks.

- tests/runner-tests/skip-duplicates/: subgraph with Ping entity using
  skipDuplicates, block handler that saves same ID every block
- tests/tests/runner_tests.rs: skip_duplicates() test syncs 4 blocks
  and verifies entity is queryable (indexing did not fail)
@isum isum force-pushed the ion/skip-duplicates branch from 3a426e4 to 78c451c Compare March 26, 2026 17:31
Copy link
Copy Markdown
Collaborator

@lutter lutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to consider here, too, is that there is really a strict hierarchy of mutability: mutable entities, immutable entities, and immutable with skipUpdates entities. That's currently expressed with 2 booleans. It might be clearer to express that with an enum

warn!(self.logger, "Skipping duplicate insert for immutable entity";
"entity" => row.key().to_string(),
"block" => row.block(),
"previous_block" => prev.block());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't be a warning; it's what the user asked for. I actually don't think we should log here and below and can therefore avoid giving a logger to the RowGroup. It seems these logs are mostly good during development to make sure what we expect to happen does happen.

if self.skip_duplicates {
warn!(self.logger, "Skipping unsupported operation for immutable entity";
"entity_type" => self.entity_type.to_string(),
"operation" => format!("{:?}", row));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not worth a warning either - it is expected behavior


fn skip_duplicates(&self) -> bool {
match self {
TypeInfo::Object(obj_type) => obj_type.immutable && obj_type.skip_duplicates,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we should have checked enough that we don't need the obj_type.immutable guard

}

#[test]
fn validate_entity_directives_skip_duplicates_non_boolean() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a whole setup where you can just create test schemas and say what result validation should provide in schema/test_schemas. Just add these as new test cases there. It's all driven by the badly named agg() test.

"expected_rows" => expected_rows,
"affected_rows" => affected_rows,
"skipped" => expected_rows - affected_rows);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't log this as a warning; maybe as a debug! log - there's nothing wrong, and we don't need to alarm anybody with this log. The message should make it clear that there is no reason to worry.

let ids = group.ids().join(", ");
warn!(logger, "Skipping immutable entity update in store layer";
"entity_type" => group.entity_type.to_string(),
"ids" => ids);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this log

let ids = group.ids().join(", ");
warn!(logger, "Skipping immutable entity delete in store layer";
"entity_type" => group.entity_type.to_string(),
"ids" => ids);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants