Skip to content

fix: deduplicate BCOS directory project loads#5732

Open
ItsOtherMauridian wants to merge 1 commit into
Scottcjn:mainfrom
ItsOtherMauridian:fix-bcos-directory-project-upsert
Open

fix: deduplicate BCOS directory project loads#5732
ItsOtherMauridian wants to merge 1 commit into
Scottcjn:mainfrom
ItsOtherMauridian:fix-bcos-directory-project-upsert

Conversation

@ItsOtherMauridian
Copy link
Copy Markdown

Summary

Fixes #5731.

load_projects_from_json() used INSERT OR REPLACE, but the projects table only had an autoincrement id primary key. Re-running the JSON loader for the same github_repo therefore appended duplicate BCOS directory rows instead of replacing/updating the existing project entry.

This PR:

  • deduplicates existing projects rows by github_repo during init_db() before adding the constraint;
  • adds a unique index on projects(github_repo) so future INSERT OR REPLACE calls have a real conflict target;
  • adds regression coverage for repeated loads and pre-existing duplicate cleanup.

Validation

PYTHONPATH=/tmp/rustchain-flask-deps:. python3 -B -m pytest -q tests/test_bcos_directory_security.py --tb=short -p no:cacheprovider
# 6 passed

PYTHONPATH=/tmp/rustchain-flask-deps:. python3 -B -m py_compile bcos_directory.py tests/test_bcos_directory_security.py
# passed

Note: local environment needed Flask supplied via a temporary dependency target because the base runner here does not have Flask installed globally.

@github-actions github-actions Bot added BCOS-L1 Beacon Certified Open Source tier BCOS-L1 (required for non-doc PRs) tests Test suite changes size/M PR: 51-200 lines labels May 19, 2026
Copy link
Copy Markdown

@TJCurnutte TJCurnutte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved — this fixes the duplicate BCOS directory load path rather than only adding test coverage.

Validation I ran against head 04d278c9574c50b66eba607d7a66ce5f4a3b714c:

  • git diff --check origin/main...HEAD -- bcos_directory.py tests/test_bcos_directory_security.py
  • python3 -B -m py_compile bcos_directory.py tests/test_bcos_directory_security.py
  • PYTHONPATH=. python3 -B -m pytest -q tests/test_bcos_directory_security.py --tb=short -p no:cacheprovider6 passed in 0.05s
  • Direct SQLite origin-main-vs-PR probe: origin/main loaded the same data/projects.json twice into 2 rows for Scottcjn/Rustchain and had no unique index; this PR loaded it into 1 row and created unique idx_projects_github_repo.
  • Pre-existing duplicate cleanup probe: origin/main left both old and new rows; this PR kept the latest row (latest_sha='new') and then created the unique index without error.

Reasoning: the previous INSERT OR REPLACE had no conflict target because only the autoincrement id was unique, so repeated JSON loads appended duplicate project rows. Adding a unique index on github_repo, after a one-time cleanup of already-duplicated rows, gives the loader a real conflict target and keeps the migration safe for existing DBs. I do not see a blocker in the changed path.

Copy link
Copy Markdown
Contributor

@jaxint jaxint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work on this PR. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BCOS-L1 Beacon Certified Open Source tier BCOS-L1 (required for non-doc PRs) size/M PR: 51-200 lines tests Test suite changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: BCOS directory project loader duplicates rows because INSERT OR REPLACE has no unique key

3 participants