
[bugfix] include image mode and size in tmp image cache key #9605

Open
he-yufeng wants to merge 1 commit into modelscope:main from he-yufeng:fix/image-cache-hash-include-dimensions


@he-yufeng


PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Fixes #9360.

Template._save_pil_image() keyed the temp image cache on sha256(image.tobytes()). Image.tobytes() returns only the flattened pixel stream, without the image mode, width, or height. Two images that share the same pixel bytes but differ in shape (e.g. 120x80 and 80x120) therefore hash to the same cache path. Since the method skips saving when the path already exists, the second image silently reuses the first image's PNG, so multimodal inference/training can read the wrong image.
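A minimal standard-library sketch of the collision. To avoid a Pillow dependency, an image is modelled here by its mode, width, height, and flattened pixel bytes (what Image.tobytes() would return). `old_cache_key` is a hypothetical stand-in for the previous hashing in Template._save_pil_image(), not the actual swift code.

```python
import hashlib


def old_cache_key(pixel_bytes: bytes) -> str:
    # Previous scheme (simplified): hash only the flattened pixel stream.
    return hashlib.sha256(pixel_bytes).hexdigest()


# 120x80 and 80x120 RGB images both flatten to 120 * 80 * 3 = 28800 bytes,
# so one byte stream can describe either shape.
pixels = bytes(i % 256 for i in range(120 * 80 * 3))

wide = ("RGB", 120, 80, pixels)  # (mode, width, height, pixel bytes)
tall = ("RGB", 80, 120, pixels)

# Mode and size never reach the hash, so both images map to one cache path.
wide_key = old_cache_key(wide[3])
tall_key = old_cache_key(tall[3])
print(wide_key == tall_key)  # True
```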

This PR includes the image mode and size in the hash input, so images with different modes or dimensions get distinct cache files. Behavior for any single image is unchanged: the file is still written once and reused on repeat.
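A minimal standard-library sketch of the new key. `new_cache_key` is hypothetical: it follows the meta prefix shown in the diff (f'{image.mode}-{image.width}x{image.height}-') but takes plain values instead of a PIL image.

```python
import hashlib


def new_cache_key(mode: str, width: int, height: int, pixel_bytes: bytes) -> str:
    # Prefix the hash input with mode and size, as in the PR's meta string.
    meta = f"{mode}-{width}x{height}-".encode()
    return hashlib.sha256(meta + pixel_bytes).hexdigest()


pixels = bytes(i % 256 for i in range(120 * 80 * 3))

wide_key = new_cache_key("RGB", 120, 80, pixels)
tall_key = new_cache_key("RGB", 80, 120, pixels)
repeat_key = new_cache_key("RGB", 120, 80, pixels)

print(wide_key != tall_key)    # True: transposed shapes no longer collide
print(wide_key == repeat_key)  # True: the same image still maps to one key
```

The key stays deterministic for a given image, which is what lets the existing "skip if the path exists" check keep reusing the file.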

Experiment results

Added test_save_pil_image_dimension_collision in tests/general/test_template.py: it builds two RGB images with identical pixel bytes but transposed dimensions, saves both, and asserts the cache paths differ and each saved file keeps its own size. The test fails on the previous hash and passes after this change.
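The test's assertions can be mirrored without Pillow or the Template class. In this sketch, `save_cached` and `read_size` are hypothetical: `save_cached` stands in for _save_pil_image(), derives the path from the mode-and-size key, skips the write when the file already exists, and writes a one-line text header followed by the raw bytes instead of a real PNG, so the saved size can be read back with the standard library.

```python
import hashlib
import os
import tempfile


def save_cached(cache_dir, mode, width, height, pixel_bytes):
    # Hypothetical stand-in for _save_pil_image(): key includes mode and size.
    meta = f"{mode}-{width}x{height}-".encode()
    key = hashlib.sha256(meta + pixel_bytes).hexdigest()
    path = os.path.join(cache_dir, f"{key}.img")
    if not os.path.exists(path):  # write once, reuse on repeat
        with open(path, "wb") as f:
            f.write(f"{mode} {width} {height}\n".encode())
            f.write(pixel_bytes)
    return path


def read_size(path):
    # Parse the header line written by save_cached.
    with open(path, "rb") as f:
        _mode, width, height = f.readline().decode().split()
    return int(width), int(height)


def test_dimension_collision():
    pixels = bytes(i % 256 for i in range(120 * 80 * 3))
    with tempfile.TemporaryDirectory() as cache_dir:
        wide = save_cached(cache_dir, "RGB", 120, 80, pixels)
        tall = save_cached(cache_dir, "RGB", 80, 120, pixels)
        assert wide != tall
        assert read_size(wide) == (120, 80)
        assert read_size(tall) == (80, 120)
        # Saving the same image again returns the existing path.
        assert save_cached(cache_dir, "RGB", 120, 80, pixels) == wide


test_dimension_collision()
```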

$ python -m pytest tests/general/test_template.py::test_save_pil_image_dimension_collision -q
1 passed

@gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request prevents cache collisions for images that share the same flattened pixel bytes but differ in mode or dimensions by prepending metadata (mode, width, and height) to the image bytes before hashing. A unit test has been added to verify this fix. The reviewer suggested using incremental hashing with hasher.update() to avoid unnecessary memory overhead from copying the entire image byte stream during concatenation.


Comment thread on swift/template/base.py
Comment on lines +817 to +818
meta = f'{image.mode}-{image.width}x{image.height}-'.encode()
img_hash = hashlib.sha256(meta + img_bytes).hexdigest()


Priority: medium

Concatenating meta + img_bytes creates a new bytes object in memory, which copies the entire image byte stream. For large images, this can lead to unnecessary memory overhead and performance degradation. Instead, you can update the hash incrementally using hasher.update() to avoid this extra memory allocation.

Suggested change
- meta = f'{image.mode}-{image.width}x{image.height}-'.encode()
- img_hash = hashlib.sha256(meta + img_bytes).hexdigest()
+ meta = f'{image.mode}-{image.width}x{image.height}-'.encode()
+ hasher = hashlib.sha256(meta)
+ hasher.update(img_bytes)
+ img_hash = hasher.hexdigest()
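SHA-256 is computed over the same input sequence either way, so the incremental form should yield exactly the digest the concatenation produces, and switching between them would not change any cache path. A quick standard-library check, with made-up meta and pixel values:

```python
import hashlib

meta = "RGB-120x80-".encode()
img_bytes = bytes(i % 256 for i in range(120 * 80 * 3))

# Form used in the PR: build one combined bytes object, then hash it.
concatenated = hashlib.sha256(meta + img_bytes).hexdigest()

# Suggested form: seed with the prefix, then feed the pixel bytes separately,
# which avoids allocating the combined meta + img_bytes copy.
hasher = hashlib.sha256(meta)
hasher.update(img_bytes)
incremental = hasher.hexdigest()

print(concatenated == incremental)  # True
```

Note that image.tobytes() itself still materialises the full pixel buffer; the suggestion only removes the second, concatenated copy.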



Development

Successfully merging this pull request may close these issues.

Image Cache Hash Collision via Missing Dimension Metadata
