
feat(app-deployments): improve app:create performance with v2 format#7565

Open
adambenhassen wants to merge 24 commits into main from adam/improve-app-deployment-perf

Conversation


@adambenhassen adambenhassen commented Jan 27, 2026

Background

App deployment uploads can be slow when deploying many documents, especially when most documents haven't changed between versions. Currently, every document is re-uploaded on each deployment, even if it already exists from a previous version.

Description

Implements a new v2 storage format for app deployments that enables delta uploads: only uploading documents that don't already exist. This dramatically improves deployment speed for incremental updates.

Notable Changes

CLI:

  • Auto-detects v2 format when all hashes are SHA256 (no --format flag needed). Logs a message pointing to docs if hashes aren't SHA256.
  • --format flag is still available as an explicit override.
  • Delta optimization is built into createAppDeployment: existing hashes are returned in the mutation response, no separate query needed. Single round-trip.
  • --target is no longer required for delta optimization (server resolves from access token).
  • Added --showTiming flag to display per-batch timing breakdown (requires server-side FEATURE_FLAGS_APP_DEPLOYMENT_TIMINGS_ENABLED=1).
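The auto-detection rule from the first bullet can be sketched as follows. This is an illustrative snippet, not the actual CLI code; the regex and function name are assumptions:

```typescript
// Illustrative sketch of the CLI auto-detection described above.
// A hash counts as SHA256 when it is a 64-char hex digest, optionally
// prefixed with "sha256:". Names and regex are assumptions, not CLI code.
const SHA256_HEX = /^(sha256:)?[a-f0-9]{64}$/i;

function detectFormat(hashes: string[]): 'v1' | 'v2' {
  // v2 only when every hash is SHA256; anything else falls back to v1.
  return hashes.length > 0 && hashes.every(h => SHA256_HEX.test(h)) ? 'v2' : 'v1';
}
```

With codegen-style SHA256 hashes this yields 'v2'; arbitrary relay-style ids yield 'v1', which is when the CLI logs the pointer to the docs.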

Server:

  • createAppDeployment accepts optional hashes input and returns existingHashes: enables delta uploads in a single round-trip.
  • appDeploymentDocumentHashes query on Target type accepts hashes input and returns only the matching subset (server-side filtering via ClickHouse).
  • Uses cSql.longArray for safe handling of large hash arrays.
  • Delta query includes documents from pending (not yet activated) deployments, fixing dedup on re-runs of app:create.
  • Prevents mixing v1 and v2 formats on the same deployment version.
  • Upload timings use generic label/duration pairs, gated behind FEATURE_FLAGS_APP_DEPLOYMENT_TIMINGS_ENABLED env var + client showTimings opt-in.
  • Renamed S3_UPLOAD_CONCURRENCY to APP_DEPLOYMENT_S3_UPLOAD_CONCURRENCY.
  • Format and hash manifest are written during createAppDeployment (handles 100% and partial dedup).
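The single round-trip delta flow described above can be sketched roughly like this. All helper names are hypothetical; only createAppDeployment and existingHashes come from the PR:

```typescript
// Hypothetical sketch of the single round-trip delta upload: send all
// intended hashes with the create mutation, receive the already-existing
// subset back, and upload only the difference.
type CreateResult = { existingHashes: string[] };

async function deltaUpload(
  docs: Map<string, string>, // hash -> document body
  createDeployment: (hashes: string[]) => Promise<CreateResult>,
  uploadDocument: (hash: string, body: string) => Promise<void>,
): Promise<{ uploaded: number; skipped: number }> {
  // One round-trip: the mutation response already tells us what exists.
  const { existingHashes } = await createDeployment([...docs.keys()]);
  const existing = new Set(existingHashes);
  let uploaded = 0;
  for (const [hash, body] of docs) {
    if (existing.has(hash)) continue; // already stored by a previous version
    await uploadDocument(hash, body);
    uploaded++;
  }
  return { uploaded, skipped: existing.size };
}
```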

CDN:

  • Format is encoded in the apps-enabled S3 key body (v1-inactive/v2-inactive during upload, v1/v2 after activation). CDN reads the format via GET (previously HEAD) and resolves documents directly: no more v2-then-v1 fallback.
  • V2 hash manifest written during activation (app-v2-manifest/{targetId}/{appName}/{appVersion}) for version isolation. CDN fetches document and manifest in parallel, rejects hashes not belonging to the requested version.
  • Manifest is written before the apps-enabled active flag to prevent a race condition.
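A minimal sketch of how the CDN might decode the apps-enabled body into an active flag plus format, based on the states listed above. The function is illustrative, not the actual worker code:

```typescript
// Illustrative decoder for the apps-enabled key body described above.
// "1" is a legacy active v1 deployment; "0", "v1-inactive" and
// "v2-inactive" are not active.
type Resolution = { format: 'v1' | 'v2' } | 'not-active';

function resolveAppsEnabled(body: string): Resolution {
  switch (body.trim()) {
    case '1': // pre-format-encoding deployments: treat as active v1
    case 'v1':
      return { format: 'v1' };
    case 'v2':
      return { format: 'v2' };
    default:
      // '0', 'v1-inactive', 'v2-inactive', or anything unknown
      return 'not-active';
  }
}
```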

Backwards Compatibility

  • Existing deployments with apps-enabled body '1' are treated as active v1.
  • Legacy v2 deployments without a hash manifest skip the version isolation check (breadcrumb logged for monitoring).
  • V1 format continues to work unchanged.

Security

  • V2 documents are now version-isolated: a hash from a retired version cannot be accessed via an active version's path. The CDN verifies each hash against a per-version manifest.
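A sketch of the manifest check described above, assuming a manifest shaped as a list of hashes (a missing manifest means a legacy v2 deployment, where the check is skipped per Backwards Compatibility):

```typescript
// Sketch of the per-version isolation check: a hash is only served when it
// appears in the manifest of the requested version. A null manifest means a
// legacy v2 deployment, where the check is skipped (breadcrumb logged).
function isHashAllowed(manifestHashes: string[] | null, hash: string): boolean {
  if (manifestHashes === null) return true; // legacy v2: no manifest, skip check
  return manifestHashes.includes(hash);
}
```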

Limitations

  • Documents are only deduped against published (activated) apps, not freshly created ones, because ClickHouse's app_deployments table only contains activated deployments. That means we can't know which app_deployment_id values belong to a given target/appName for cross-version pending deployments; supporting that would require joining against Postgres data.

Benchmarks

All changes were benchmarked with the script provided in load-tests/app-deployments.

1000 Documents

| Scenario | Docs Uploaded | Parse | Validate | Coords | ClickHouse | S3 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| V1 | 1000 | 17ms | 401ms | 45ms | 1790ms | 8274ms | 9.1s |
| V2 Initial | 1000 | 10ms | 276ms | 48ms | 936ms | 7040ms | 7.6s |
| V2 Delta | 50 (950 skipped) | 1ms | 25ms | 5ms | 51ms | 323ms | 376ms |

Format Comparison

| Feature | V1 | V2 (SHA256) |
| --- | --- | --- |
| Hash format | Any alphanumeric | SHA256 only |
| Hash validation | Format only | Format + content match |
| S3 key | app/{target}/{name}/{version}/{hash} | app-v2/{target}/{name}/{hash} |
| Cross-version dedup | No | Yes |
| Delta uploads | No | Yes |
| Version isolation | Built-in (version in key) | Hash manifest per version |
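The key layouts from the table can be expressed as simple builders (helper names are assumptions). Note the v2 key omits the version segment, which is exactly what makes cross-version dedup possible; isolation comes from the per-version manifest instead:

```typescript
// The S3 key layouts from the comparison table above, as simple builders.
// v1 scopes documents by version; v2 is content-addressed per app.
const v1Key = (target: string, name: string, version: string, hash: string) =>
  `app/${target}/${name}/${version}/${hash}`;

const v2Key = (target: string, name: string, hash: string) =>
  `app-v2/${target}/${name}/${hash}`;
```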

Integration tests added

  • v2 format rejects non-sha256 hashes
  • v2 format rejects sha256 hash that does not match document content
  • v2 format accepts sha256 hash with sha256: prefix
  • v1 format (legacy) accepts any hash format
  • v2 format accepts sha256 hashes and CDN access works
  • appDeploymentDocumentHashes returns existing hashes for delta upload
  • v2 format prevents hash collision by rejecting non-sha256 hashes
  • v2 format enables cross-version document sharing
  • v1 format documents are accessible via CDN using format from apps-enabled key
  • appDeploymentDocumentHashes returns empty array for app with no previous documents
  • rejects mixing v1 and v2 document formats on the same deployment
  • re-running app:create deduplicates documents from the pending deployment
  • v2 version isolation: CDN rejects hash not belonging to the requested version
  • CDN uses format from apps-enabled key to resolve documents without fallback
  • v2 deployment with 100% dedup still resolves correctly via CDN
  • v2 deployment with partial dedup resolves both shared and new hashes

Manual e2e tests ran

CLI:

  • v2 auto-detection: detected sha256 hashes, used v2 format
  • Initial upload: 2 new documents, 0 skipped
  • Re-run dedup (same version): all documents already exist, nothing uploaded
  • Publish: published successfully
  • Cross-version dedup: 1 new, 2 skipped (shared with previous version)
  • Explicit v1 format: uploaded without auto-detect message
  • v1 auto-detection (non-sha256 hashes): detected as v1, logged docs link
  • Format mixing prevention: rejected v2 upload on v1 deployment
  • 100% dedup (v3.0.0): all docs exist, nothing uploaded, publish succeeds
  • 100% dedup (v4.0.0): all docs exist, nothing uploaded, publish succeeds
  • Timing output only when flag is enabled
  • Hash content mismatch: rejected with clear error showing expected vs provided

CDN:

  • v2 document accessible via correct version: 200
  • Hash from another version rejected (version isolation): 404
  • Shared hash accessible via both versions: 200
  • Hash not belonging to version rejected: 404
  • v1 inactive deployment not accessible: 404
  • v2 inactive deployment not accessible: 404
  • Backward compat (pre-format-encoding deployment): 200
  • v1 document accessible via CDN after publish: 200

To summarize what was tested:

  • Auto-detection (v1 and v2)
  • Delta uploads (same version, cross-version, 100% dedup)
  • Format mixing prevention
  • Hash content validation
  • CDN access (v1, v2, shared hashes)
  • Version isolation (reject hashes from other versions)
  • Inactive deployments rejected
  • Backward compatibility

Closes CONSOLE-1582

@gemini-code-assist
Contributor

Summary of Changes

Hello @adambenhassen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant performance improvement for app deployments by implementing a new 'v2' storage format. This new format allows for efficient delta uploads, where only new or changed documents are uploaded, and enables cross-version deduplication, reducing storage and transfer overhead. The changes span across the CLI, API, and CDN worker, ensuring a seamless and faster deployment experience, especially for applications with many documents and frequent updates.

Highlights

  • New V2 Storage Format for App Deployments: Introduced a new 'v2' storage format that enables delta uploads and cross-version deduplication for app deployment documents, significantly improving performance for incremental updates. This format requires SHA256 hashes and includes content validation.
  • CLI Enhancements for V2 Format: The CLI's app:create command now supports a --format flag (defaulting to 'v1', with 'v2' for delta uploads). When using 'v2', the CLI queries existing document hashes and skips re-uploading unchanged documents. A --showTiming flag was also added to display detailed performance breakdowns.
  • Parallel Data Ingestion: Improved the ingestion pipeline by implementing parallel insertion of documents into ClickHouse and S3, further boosting deployment speed. The default S3_UPLOAD_CONCURRENCY environment variable was increased to 100.
  • CDN Backward Compatibility: The CDN logic has been updated to first attempt fetching documents using the new 'v2' key format and gracefully fall back to the 'v1' key format if the 'v2' key is not found, ensuring compatibility with existing deployments.
  • Enhanced Hash Validation: The 'v2' format enforces strict SHA256 hash validation, including a check that the provided hash matches the document's content, preventing hash collisions and ensuring data integrity. The 'v1' format retains its legacy behavior of accepting any alphanumeric hash.
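The content-match validation described in the last highlight can be sketched as: recompute the SHA256 of the document body and compare it to the client-provided hash. This is illustrative code, not the server implementation:

```typescript
// Sketch of the content-match validation: recompute the SHA256 of the
// document body and compare against the client-provided hash (with an
// optional "sha256:" prefix). Illustrative, not the server implementation.
import { createHash } from 'node:crypto';

function hashMatchesContent(providedHash: string, body: string): boolean {
  const normalized = providedHash.replace(/^sha256:/i, '').toLowerCase();
  const actual = createHash('sha256').update(body).digest('hex');
  return normalized === actual;
}
```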

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new V2 storage format for app deployments, significantly improving performance through delta uploads and parallel processing. The changes include new CLI flags for format selection and timing, robust SHA256 hash validation, and optimized parallel uploads to S3 and ClickHouse. Extensive integration tests have been added to cover the new functionality, and a benchmark script with a detailed README provides clear performance comparisons. The changes are well-implemented and effectively address the stated performance goals. There is a minor style guide adherence opportunity regarding environment variable access in a utility script and a backend service provider.

Comment thread load-tests/app-deployments/run-benchmark.ts Outdated
@adambenhassen adambenhassen force-pushed the adam/improve-app-deployment-perf branch from d1672f8 to 2db60d1 on January 27, 2026 03:21
@github-actions
Contributor

github-actions bot commented Jan 27, 2026

🚀 Snapshot Release (alpha)

The latest changes of this PR are available as alpha on npm (based on the declared changesets):

| Package | Version |
| --- | --- |
| @graphql-hive/cli | 0.58.0-alpha-20260128114751-69c3012b7a3f4155c12d74c957774ed07b375362 |
| hive | 9.3.0-alpha-20260128114751-69c3012b7a3f4155c12d74c957774ed07b375362 |

@github-actions
Contributor

github-actions bot commented Jan 27, 2026

📚 Storybook Deployment

The latest changes are available as preview in: https://pr-7565.hive-storybook.pages.dev

@github-actions
Contributor

github-actions bot commented Jan 27, 2026

🐋 This PR was built and pushed to the following Docker images:

Targets: build

Platforms: linux/amd64

Image Tag: 69c3012b7a3f4155c12d74c957774ed07b375362

@github-actions
Contributor

💻 Website Preview

The latest changes are available as preview in: https://pr-7565.hive-landing-page.pages.dev

@adambenhassen adambenhassen requested a review from n1ru4l January 28, 2026 11:44
@adambenhassen adambenhassen force-pushed the adam/improve-app-deployment-perf branch from 133d445 to 69c3012 on January 28, 2026 11:46
Comment thread docker/docker-compose.dev.yml
Comment thread docker/docker-compose.dev.yml
Contributor

@n1ru4l n1ru4l left a comment

I added a few comments on the functionality, happy to discuss these more if needed.

The main issue for me is the fact that operations are no longer scoped to the app version. I proposed a potential solution to the problem.

Aside from that, the user experience from the CLI standpoint could be improved, as it is desired that using sha256 hashes becomes the default way people upload persisted documents.

Comment thread packages/libraries/cli/src/commands/app/create.ts Outdated
Comment thread packages/services/api/src/modules/app-deployments/module.graphql.ts Outdated
Comment thread packages/services/api/src/modules/app-deployments/module.graphql.ts Outdated
const response = await this.request({
key,
// Try v2 key first
const keyV2 = buildOperationS3BucketKeyV2(targetId, appName, hash);
Contributor

Instead of trying v2 first and then falling back to v1, maybe we encode the format within the "is app deployment active" key to avoid the extra roundtrip?

Collaborator Author

@adambenhassen adambenhassen Apr 8, 2026

Yeah, that sounds good. Let's have that.

Update: The only problem is that apps-enabled is only written during activation, and at that point we don't know the document format. The format is only known during addDocumentToAppDeployment.

I propose the following approach:

  1. Upload: write v1-inactive or v2-inactive to apps-enabled
  2. Activation: overwrite with v1 or v2 (read existing value, strip -inactive, write back)
  3. CDN: HEAD -> GET, read body. If it contains -inactive or is 0 -> not active. If v1/v2/1 -> active, use format accordingly.

Backward compat: existing deployments have 1 -> treated as active v1.
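Step 2 of the proposal above reduces to stripping the -inactive suffix on activation; a one-line sketch:

```typescript
// Sketch of step 2: activation reads the stored apps-enabled body and
// strips the "-inactive" suffix before writing it back. Legacy bodies
// like "1" pass through unchanged.
const activateBody = (body: string): string =>
  body.endsWith('-inactive') ? body.slice(0, -'-inactive'.length) : body;
```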

Collaborator Author

Check my proposed changes above with the latest changes: 5cc6655

Contributor

For a better user experience, we could auto-detect whether the hashes are sha256 and then automatically use the new format for uploading. If the format is not identified as sha256 we could print some information about this (e.g. mention the speed improvements and pointing to our documentation that shows how to configure relay or codegen).

We can still keep the format parameter, so the user could enforce the old format if they need to, just in case.

Collaborator Author

@adambenhassen adambenhassen Apr 8, 2026

I agree, this automates it. It also requires some changes to automatically resolve the target, but it's worth it. Implementing.

Comment thread packages/services/api/src/modules/app-deployments/providers/app-deployments.ts Outdated
Comment thread packages/libraries/cli/src/commands/app/create.ts
Comment thread packages/libraries/cli/src/commands/app/create.ts
Comment thread packages/services/cdn-worker/src/artifact-storage-reader.ts Outdated
Comment thread packages/services/cdn-worker/src/artifact-storage-reader.ts Outdated
@adambenhassen adambenhassen force-pushed the adam/improve-app-deployment-perf branch from 8991e50 to dd7a95a on April 8, 2026 18:14
@adambenhassen
Collaborator Author

Pending conflict resolution once all conversations are resolved.

@adambenhassen adambenhassen requested a review from n1ru4l April 8, 2026 19:44
Contributor

@n1ru4l n1ru4l left a comment

I added some more comments on the implementation and gave some suggestions.

Aside from that, I also noticed that skipped (deduplicated) documents are not visible via the Hive Console UI since we skip the ClickHouse insert, which would also lead to versions with re-used documents not showing up in the conditional breaking change detection? 😅 Do I understand that correctly?

Seems like it might make sense to make some more architectural decisions and eventual database adjustments, or maybe not refrain from inserting the documents into ClickHouse after all (and solely skip the S3 put request), or re-insert from ClickHouse to ClickHouse? Not sure about that.

Maybe we should step away from code iteration on the PR for a moment and write a more high-level RFC document, which would be faster to iterate on while thinking about all involved components. Digging through the code to understand the implementation details and discover flaws is quite time consuming for me.

@@ -67,6 +67,7 @@ export class AppDeploymentsManager {
name: string;
version: string;
};
hashes?: readonly string[] | null;
Contributor

Should there be an assertion that makes the hashes property mandatory for the v2 format?

Collaborator Author

@adambenhassen adambenhassen Apr 14, 2026

No separate format field exists on CreateAppDeploymentInput; hashes presence is the v2 signal. Without hashes, the code skips v2 writes and returns empty existingHashes, safely degrading to v1. The CLI enforces this: v2 always sends hashes, v1 sends undefined. Should I add an explicit format field to make the contract stricter?

EDIT: Added.

[this.s3[0].endpoint, this.s3[0].bucket, manifestKey].join('/'),
{ method: 'HEAD', aws: { signQuery: true } },
);
if (existingManifest.statusCode !== 200) {
Contributor

Why wouldn't there be an existing manifest at that stage? It should have been written there during app create according to the logic?

If we then create it only based on clickhouse - it would not contain all hashes so produce an invalid manifest?

So my question is, why do we have this logic here for falling back in the first place?

Collaborator Author

Ok, with the format field (from the comment above) now required on CreateAppDeploymentInput, v2 deployments always provide hashes at create time, so the manifest is always written during createAppDeployment. I'll remove the fallback and throw if the manifest is missing for a v2 activation; that surfaces the bug instead of silently producing an invalid manifest.

Comment on lines 207 to 217
input CreateAppDeploymentInput {
target: TargetReferenceInput
appName: String!
appVersion: String!
"""
Optional list of document hashes the client intends to upload.
If provided, the response will include the subset that already exist on the server,
enabling delta uploads (skip uploading documents that already exist).
"""
hashes: [String!]
}
Contributor

The way I understand the implementation providing hashes indicates that this will be v2 (sha256) app deployment version. In that case, should this be documented more explicitly here?

Collaborator Author

Addressed: added an explicit format field to CreateAppDeploymentInput

Comment on lines +137 to +141
"""
Given a list of document hashes, returns the subset that already exist for this app (across all versions).
Used for delta uploads - CLI can skip uploading documents that already exist.
"""
appDeploymentDocumentHashes(appName: String!, hashes: [String!]!): [String!]!
Contributor

Is this still needed if there is now CreateAppDeploymentOk.existingHashes?

Collaborator Author

It's redundant now indeed. Removing.

Comment on lines 35 to 49
-"""
-Fields available for sorting app deployments.
-"""
-enum AppDeploymentsSortField {
-  CREATED_AT
-  ACTIVATED_AT
-  LAST_USED
-}
-
 """
-Sort configuration for app deployments.
+Storage format for app deployment documents.
 """
-input AppDeploymentsSortInput {
-  field: AppDeploymentsSortField!
-  direction: SortDirectionType!
-}
+enum AppDeploymentFormatType {
+  """
+  V1 format: version-scoped storage. Each version stores documents separately.
+  Allows any hash format. No cross-version deduplication.
+  """
+  V1
+  """
+  V2 format: content-addressed storage with SHA256 hashes.
+  Enables cross-version deduplication and delta uploads.
+  """
+  V2
+}
Contributor

NIT: Should we use more descriptive naming like CUSTOM/ARBITRARY, SHA256 instead of V1 and V2?

Collaborator Author

Agreed, that makes more sense indeed.

* Check the format of existing documents for a deployment by sampling one hash.
* Returns 'v1', 'v2', or null if no documents exist.
*/
async getDeploymentDocumentFormat(appDeploymentId: string): Promise<'v1' | 'v2' | null> {
Contributor

Why not store the app deployment version format (whether it is using arbitrary hashes or sha256 hashes) on the pg record (app_deployments table) when initially created via Mutation.createAppDeployment?

That way on both createAppDeployment.addDocumentToAppDeployment and createAppDeployment.activeAppDeployments we avoid the roundtrip to s3 or clickhouse in order to compute the format.

Collaborator Author

Ok. I had tended to avoid changes involving Postgres. Implementing.

Comment on lines -409 to +415
if (
false ===
(await deps.isAppDeploymentActive(params.targetId, params.appName, params.appVersion))
) {
const deploymentStatus = await deps.getAppDeploymentStatus(
params.targetId,
params.appName,
params.appVersion,
);

if (!deploymentStatus.enabled) {
Contributor

The apps-enabled key can be repurposed to store more information at the same time:

  • whether the app is enabled
  • what format it contains (arbitrary hashes or sha256 hashes)
  • which documents belong to this app version

That means for legacy versions created before these adjustments it would always contain "0" or "1" (or be non-existent). For new ones it could be a JSON object that contains the above information e.g.

{
  "isActive": true,
  // arbitrary/custom or sha256
  "hashFormat": "sha256",
  "documents": [
    "hash1",
    "hash2",
    "hash3",
    // ....
  ]
}

This information could then be sufficient for the CDN handler to figure out on how to resolve the documents and it would reduce the need for one more S3 read.

Furthermore, it can act as a foundation for other parallel ongoing efforts.

Collaborator Author

Agreed, but out of scope for this PR. Merging status + format + hashes into one S3 key is a separate refactor.

Comment on lines +466 to +469
const existingManifest = await this.s3[0].client.fetch(
[this.s3[0].endpoint, this.s3[0].bucket, manifestKey].join('/'),
{ method: 'HEAD', aws: { signQuery: true } },
);
Contributor

how do we guarantee that all documents that were specified in the manifest were uploaded/are available?

Collaborator Author

We don't currently verify that. The manifest controls which hashes are allowed for version isolation, but if a document wasn't uploaded, the CDN returns 404 naturally. Adding an activation-time check (compare manifest hashes against uploaded documents) would catch incomplete deployments, but that's a separate concern from this PR.
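The suggested activation-time check could look roughly like this; documentExists is a hypothetical lookup against uploaded documents (e.g. an S3 HEAD or ClickHouse query), not an existing function:

```typescript
// Sketch of the suggested activation-time completeness check: every hash in
// the manifest must have an uploaded document. documentExists is a
// hypothetical lookup, not part of the actual codebase.
async function findMissingDocuments(
  manifestHashes: string[],
  documentExists: (hash: string) => Promise<boolean>,
): Promise<string[]> {
  const missing: string[] = [];
  for (const hash of manifestHashes) {
    if (!(await documentExists(hash))) missing.push(hash);
  }
  return missing; // a non-empty result would fail activation with a clear error
}
```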

@adambenhassen
Collaborator Author

Aside from that, I also noticed that skipped (deduplicated) documents are not visible via the Hive Console UI since we skip the ClickHouse insert, which would also lead to versions with re-used documents not showing up in the conditional breaking change detection? 😅 Do I understand that correctly?

Correct. In the custom format, every document goes through addDocumentsToAppDeployment, which inserts into ClickHouse, so the document list and breaking change detection are always complete. With SHA256 dedup, the CLI skips uploading existing documents, so they never get ClickHouse records for the new version, which is the whole point of dedup.

Proposal: after getExistingDocumentHashes identifies deduped hashes, copy their clickhouse records (body, operation name, coordinates) to the new deployment ID via INSERT...SELECT. One clickhouse query during createAppDeployment, no client involvement. This gives SHA256 the same clickhouse behavior as custom format.
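The proposed copy could take roughly this shape. Table and column names here are assumptions, and a real implementation should pass ids and hashes as bound parameters (e.g. via the cSql helpers mentioned above) rather than interpolating strings:

```typescript
// Rough, hypothetical shape of the proposed INSERT...SELECT copy: re-attach
// the deduped documents' ClickHouse rows to the new deployment id.
// WARNING: string interpolation is for illustration only; use parameterized
// queries in real code to avoid SQL injection.
function buildCopyQuery(newDeploymentId: string, hashes: string[]): string {
  const hashList = hashes.map(h => `'${h}'`).join(', ');
  return `
    INSERT INTO app_deployment_documents
      (app_deployment_id, document_hash, document_body, operation_name, schema_coordinates)
    SELECT '${newDeploymentId}', document_hash, document_body, operation_name, schema_coordinates
    FROM app_deployment_documents
    WHERE document_hash IN (${hashList})
  `;
}
```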

Maybe we should step away from code iteration on the PR for a moment and write a more high-level RFC document, which would be faster to iterate on while thinking about all involved components. Digging through the code to understand the implementation details and discover flaws is quite time consuming for me.

I'd rather you continue reviewing this PR to the completion of its scope. The remaining items are additive improvements, not flaws. The core functionality is covered by extensive tests, and the review hasn't surfaced any critical issues. The ClickHouse copy for deduped documents and the merged S3 key (status + format + hashes) can be follow-ups without blocking what's already working.
