SDSTOR-21989: check index_table existence before using it by JacksonYao287 · Pull Request #409 · eBay/HomeObject

JacksonYao287 · 2026-05-06T01:50:06Z

If HomeObject crashes after the index_table is destroyed in destroy_pg but before the PG is recreated in baseline resync (BR), the PG index table cannot be found when HomeObject restarts. So we need to add a sanity check for whether index_table exists; if it does not, skip refreshing the PG metrics.
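A minimal C++ sketch of the skip described above. PG, IndexTable, total_entries and active_blob_count are hypothetical stand-ins, not HomeObject's real types; only the function name refresh_pg_statistics comes from this thread. A missing index table makes the function log and return early, leaving the previous metric values untouched, instead of asserting.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <memory>

// Hypothetical stand-ins; the real HomeObject types (HS_PG, the PG index table) differ.
struct IndexTable {
    uint64_t total_entries{0};
};

struct PG {
    uint16_t pg_id{0};
    std::shared_ptr<IndexTable> index_table_;  // null if we crashed between destroy_pg and BR recreation
    uint64_t active_blob_count{0};
};

// Refresh per-PG metrics. If the index table is gone, log and skip instead of
// asserting, leaving the previous metric values untouched.
bool refresh_pg_statistics(PG& pg) {
    auto index_table = pg.index_table_;
    if (!index_table) {
        std::cerr << "index table not found for pg=" << pg.pg_id << ", skip refreshing metrics\n";
        return false;
    }
    pg.active_blob_count = index_table->total_entries;
    return true;
}
```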

codecov-commenter · 2026-05-06T04:09:35Z

Codecov Report

❌ Patch coverage is 31.57895% with 13 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (stable/v4.x@2de356a). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/lib/homestore_backend/hs_pg_manager.cpp	33.33%	6 Missing and 2 partials ⚠️
src/lib/homestore_backend/hs_blob_manager.cpp	28.57%	3 Missing and 2 partials ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             stable/v4.x     #409   +/-   ##
==============================================
  Coverage               ?   52.71%           
==============================================
  Files                  ?       36           
  Lines                  ?     5332           
  Branches               ?      662           
==============================================
  Hits                   ?     2811           
  Misses                 ?     2225           
  Partials               ?      296

xiaoxichen · 2026-05-07T04:17:00Z

In general LGTM

During recovery, should we handle destroyed PGs better? Currently we only partially handle the case where the index table has been destroyed for a destroyed PG (HomeObject/src/lib/homestore_backend/hs_pg_manager.cpp, lines 951 to 955 in 77ba5c6):

    } else {
        RELEASE_ASSERT(hs_pg->pg_sb_->state == PGState::DESTROYED, "IndexTable should be recovered before PG");
        hs_pg->index_table_ = nullptr;
        LOGI("Index table not found for destroyed pg={}, index_table_uuid={}", pg_id, uuid_str);
    }

I think we should redo HSHomeObject::pg_destroy during recovery.

The index_table check logic is spread across the code. Would it be better to add an accessor to HS_PG that checks whether the index table exists and either logs (if the PG is destroyed) or asserts?
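A sketch of the suggested accessor, assuming a simplified HS_PG with a hypothetical PGState enum and std::abort as a stand-in for RELEASE_ASSERT. It is not the accessor that was actually merged; it only shows the "log if destroyed, otherwise assert" rule living in one place instead of at every call site.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <utility>

enum class PGState { ALIVE, DESTROYED };  // hypothetical subset of PG states
struct IndexTable {};

// Hypothetical HS_PG reduced to the pieces the accessor needs.
class HS_PG {
public:
    HS_PG(int pg_id, PGState state, std::shared_ptr<IndexTable> index_table)
        : pg_id_(pg_id), state_(state), index_table_(std::move(index_table)) {}

    // One place for the existence check: a missing table is only logged for a
    // DESTROYED PG; in any other state it is an invariant violation.
    std::shared_ptr<IndexTable> get_index_table() const {
        if (!index_table_) {
            if (state_ == PGState::DESTROYED) {
                std::fprintf(stderr, "index table not found for destroyed pg=%d\n", pg_id_);
            } else {
                std::fprintf(stderr, "index table missing for pg=%d in non-destroyed state\n", pg_id_);
                std::abort();  // stand-in for RELEASE_ASSERT
            }
        }
        return index_table_;
    }

private:
    int pg_id_;
    PGState state_;
    std::shared_ptr<IndexTable> index_table_;
};
```

Callers then only need a null check where a destroyed PG is legitimately possible.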

Besroy · 2026-05-07T07:21:06Z

Agreed, checking the PG state seems safer and cleaner. @yuwmao @koujl Do you recall the previous solution for handling crashes during destroy_pg? Perhaps we can integrate it with this fix as a unified solution.

JacksonYao287 · 2026-05-08T03:50:56Z

Agreed, we need to redo pg_destroy to make sure all PG resources are cleared before the PG is recreated by BR or permanently destroyed. cc @yuwmao @koujl: if you want to take this on, feel free to close this PR.
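One way to read "redo pg_destroy during recovery", sketched under assumptions: every cleanup step is idempotent, and the PG superblock is removed last, so a crash at any earlier point leaves the PG in DESTROYED state and the next restart simply redoes the destroy. RecoveredPG, redo_pg_destroy, recover_pgs and the modeled resources are hypothetical; the follow-up PR may be structured differently.

```cpp
#include <cassert>
#include <memory>
#include <vector>

enum class PGState { ALIVE, DESTROYED };
struct IndexTable {};

// Hypothetical view of a PG as found at recovery time; any resource may or
// may not have been freed before the crash.
struct RecoveredPG {
    int pg_id;
    PGState state;
    std::shared_ptr<IndexTable> index_table;
    bool shards_released;
    bool superblock_present;
};

// Every step is safe to repeat, so redoing it after a crash at any point is fine.
// The superblock goes last: until it is removed, the next restart redoes the destroy.
void redo_pg_destroy(RecoveredPG& pg) {
    if (pg.index_table) { pg.index_table.reset(); }  // destroy the index table if it survived
    pg.shards_released = true;                       // release shard/chunk resources (modeled as a flag)
    pg.superblock_present = false;                   // finally drop the PG superblock
}

// On recovery, finish every interrupted destroy; live PGs are left untouched.
void recover_pgs(std::vector<RecoveredPG>& pgs) {
    for (auto& pg : pgs) {
        if (pg.state == PGState::DESTROYED) { redo_pg_destroy(pg); }
    }
}
```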

yuwmao · 2026-05-08T15:21:47Z

No, I don't recall running into the destroy_pg crash issue before. But I also agree on redoing pg_destroy.

JacksonYao287 · 2026-05-11T08:01:53Z

I have added an accessor for index_table. Redoing pg_destroy will come in a later, separate PR.

Besroy · 2026-05-11T08:42:06Z

-    auto pg_index_table = hs_pg->index_table_;
-    RELEASE_ASSERT(pg_index_table, "Index table not found for PG pg_id={}", pg_id);
+    auto pg_index_table = hs_pg->get_index_table();
+    if (!pg_index_table) { return false; }


If the PG was destroyed, will GC/BR/put_blob/get_blob operations still occur after a restart? If not, can we limit the changes to refresh_pg_statistics?
The reason is that, when reviewing the callers of replace_blob_index, I noticed that returning false there causes an assertion failure that crashes the process, as it does in most functions that use the index table after retrieving it.

Sorry, I don't fully get the point. Can you please add more details here?

Actually, the assertion failure is necessary. There are two cases in which false is returned:

1. The PG index table is not found. This should not happen, because a PG cannot be destroyed while it has an ongoing GC task. Please refer to drain_pg_pending_gc_task, which waits until all GC tasks of this PG have completed.

2. Updating the PG index table failed. This probably means a partial update happened (some entries were updated successfully and others were not), which is very dangerous, since data loss will happen if we don't assert.
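The two failure cases above can be sketched as follows. replace_blob_index and drain_pg_pending_gc_task are named in the thread, but the types and bodies here (PgGcTracker, IndexTable::update, gc_replace_or_die) are hypothetical: a condition variable models waiting for a PG's pending GC tasks, and the GC caller aborts (standing in for RELEASE_ASSERT) on any false return, because this sketch has no way to roll back a partial update.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical index table; update() fails when fail_updates is set.
struct IndexTable {
    bool fail_updates{false};
    bool update(int /*blob_id*/) { return !fail_updates; }
};

// Hypothetical per-PG GC bookkeeping. drain() plays the role of
// drain_pg_pending_gc_task: destroy blocks until no GC task of the PG is in flight.
class PgGcTracker {
public:
    void task_started() {
        std::lock_guard<std::mutex> lk(mtx_);
        ++pending_;
    }
    void task_finished() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            --pending_;
        }
        cv_.notify_all();
    }
    void drain() {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return pending_ == 0; });
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    int pending_{0};
};

// Returns false in exactly the two cases from the comment above.
bool replace_blob_index(const std::shared_ptr<IndexTable>& table, const std::vector<int>& blob_ids) {
    if (!table) { return false; }  // case 1: no index table (should be impossible while GC runs)
    for (int id : blob_ids) {
        if (!table->update(id)) { return false; }  // case 2: earlier ids may already be updated
    }
    return true;
}

// GC caller: any false is fatal, since continuing after a partial update risks data loss.
void gc_replace_or_die(const std::shared_ptr<IndexTable>& table, const std::vector<int>& blob_ids) {
    if (!replace_blob_index(table, blob_ids)) {
        std::fprintf(stderr, "replace_blob_index failed, aborting\n");
        std::abort();  // stand-in for RELEASE_ASSERT
    }
}
```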

> If the PG was destroyed, will GC/BR/put_blob/get_blob operations still occur after a restart?

A PG is destroyed in three cases: BR (only on the follower), destroying the repl_dev (on all members), and replace_member (only on the member being replaced out).

In the BR case, the PG will be recreated, so GC/BR/put_blob/get_blob operations may well occur after restart.
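The three destroy paths, and whether the PG can see GC/BR/put_blob/get_blob again after a restart, can be summarized as a small lookup. DestroyReason and pg_recreated_after_destroy are illustrative names, not HomeObject APIs.

```cpp
#include <cassert>

// Hypothetical labels for the three destroy paths described above.
enum class DestroyReason {
    BaselineResync,  // only on the follower; the PG is recreated afterwards
    DestroyReplDev,  // on all members; permanent
    ReplaceMember,   // only on the member being replaced out; permanent on that member
};

// True if the PG comes back (and can receive GC/BR/put_blob/get_blob again) after a restart.
constexpr bool pg_recreated_after_destroy(DestroyReason reason) {
    return reason == DestroyReason::BaselineResync;
}
```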

What I mean is: keep the existing assert logic in those places unchanged, because:

1. Correct me if I'm wrong, but GC/put_blob/get_blob are not expected to run on a destroyed PG. BR will destroy the PG again, so this case shouldn't happen at that step either. This scenario would only show up during a restart when the PG state is inconsistent.

2. Even if you get nullptr in those places, you should still keep the original asserts. If you don't assert there, we will still assert later based on the return value, or crash when dereferencing a null pointer anyway.
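A sketch of the pattern suggested above: data-path callers go through the new accessor but keep the original assert, so a missing index table still fails fast at the point of use. HS_PG here is a hypothetical reduction, count_blob_entries is an illustrative caller, and SKETCH_RELEASE_ASSERT is a local stand-in for RELEASE_ASSERT.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <memory>

struct IndexTable { int entries{0}; };

// Hypothetical HS_PG reduced to the accessor added in this PR.
struct HS_PG {
    int pg_id;
    std::shared_ptr<IndexTable> index_table_;
    std::shared_ptr<IndexTable> get_index_table() const { return index_table_; }
};

// Local stand-in for RELEASE_ASSERT: log and abort in every build type.
#define SKETCH_RELEASE_ASSERT(cond, msg)        \
    do {                                        \
        if (!(cond)) {                          \
            std::fprintf(stderr, "%s\n", msg);  \
            std::abort();                       \
        }                                       \
    } while (0)

// Data path (GC, put_blob, get_blob): use the accessor, but keep the original assert.
// Returning an error here would only move the assert to the caller or lead to a null dereference.
int count_blob_entries(const HS_PG& pg) {
    auto table = pg.get_index_table();
    SKETCH_RELEASE_ASSERT(table, "Index table not found for PG");
    return table->entries;
}
```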

@Besroy, done. PTAL.

xiaoxichen

LGTM, considering this is a temporary fix.

JacksonYao287 requested a review from Besroy May 6, 2026 01:50

JacksonYao287 force-pushed the 21989 branch 2 times, most recently from 0a92251 to dca49f2 May 6, 2026 03:26

JacksonYao287 force-pushed the 21989 branch from dca49f2 to a46fd5c May 6, 2026 06:34

JacksonYao287 requested review from koujl and yuwmao May 8, 2026 03:54

check index_table existence before using it (d18a9cf)

JacksonYao287 force-pushed the 21989 branch from a46fd5c to fca6548 May 11, 2026 07:37

JacksonYao287 requested a review from xiaoxichen May 11, 2026 08:01

Besroy reviewed May 11, 2026

JacksonYao287 force-pushed the 21989 branch from fca6548 to 7848a6b May 11, 2026 15:28

fix comments (2c46bfb)

JacksonYao287 force-pushed the 21989 branch from 7848a6b to 2c46bfb May 13, 2026 00:46

JacksonYao287 requested a review from Besroy May 13, 2026 00:47

JacksonYao287 mentioned this pull request May 14, 2026

SDSTOR-21408: refine index_kv constructor #414 (Open)

xiaoxichen approved these changes May 14, 2026

JacksonYao287 merged commit f604993 into eBay:stable/v4.x May 14, 2026
25 checks passed

JacksonYao287 deleted the 21989 branch May 14, 2026 07:45

JacksonYao287 merged 2 commits into eBay:stable/v4.x from JacksonYao287:21989


5 participants
