Conversation

@usamoi
Contributor

@usamoi usamoi commented Dec 14, 2025

Used to implement the _mm512_shuffle_epi8 intrinsic.

@rustbot
Collaborator

rustbot commented Dec 14, 2025

Thank you for contributing to Miri! A reviewer will take a look at your PR, typically within a week or two.
Please remember to not force-push to the PR branch except when you need to rebase due to a conflict or when the reviewer asks you for it.

@rustbot rustbot added the S-waiting-on-review Status: Waiting for a review to complete label Dec 14, 2025
Member

@RalfJung RalfJung left a comment

Thanks for the PR! Since this just generalizes existing operations, this seems reasonable. I may become a bit hesitant if we start to add avx512-exclusive operations...

/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_shuffle_epi8>
/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_shuffle_epi8>
/// <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_shuffle_epi8>
fn pshufb<'tcx>(
Member

Could you add an explanation of what the types here are? Are these all u8 vectors?

Contributor Author

@usamoi usamoi Dec 15, 2025

It has currently been changed to:

/// Shuffles bytes from `left` using `right` as pattern.
///
/// `left` and `right` are both vectors of type `len` x i8. Only bits 0, 1, 2, 3 and 7 of each byte of
/// `right` matter; if bit 7 of each byte of `right` is set, the value of `dest` at the corresponding
/// byte will be set to 0.
///
/// Each 128-bit block is shuffled independently.

Member

It should also say what the other 4 bits do then. I guess it's something like this:
The lowest four bits of the byte of `right` at index i indicate which of the `left` values from the same 16-element block is used for index i in `dest`.

Contributor Author

Now it's

/// Shuffles bytes from `left` using `right` as pattern. Each 16-byte block is shuffled independently.
///
/// `left` and `right` are both vectors of type `len` x i8.
///
/// If the highest bit of a byte in `right` is not set, the corresponding byte in `dest` is taken from
/// the same 16-byte block of `left` at the position indicated by the lowest 4 bits of this byte in `right`.
/// If the highest bit of a byte in `right` is set, the corresponding byte in `dest` is set to `0`.
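
For readers who want to check the semantics described in this doc comment, here is a minimal sketch on plain byte slices. It is only an illustration: `pshufb_model` is a hypothetical helper, not the function from this PR, and it works on `u8` slices rather than on Miri's interpreter values.

```rust
/// Plain-Rust model of the byte shuffle described above (hypothetical, not Miri's code).
/// `left` and `right` must have the same length, which must be a multiple of 16.
fn pshufb_model(left: &[u8], right: &[u8]) -> Vec<u8> {
    assert_eq!(left.len(), right.len());
    assert_eq!(left.len() % 16, 0);
    right
        .iter()
        .enumerate()
        .map(|(i, &r)| {
            if r & 0x80 == 0 {
                // Round `i` down to the start of its 16-byte block,
                // then select within that block using the low 4 bits of `r`.
                let block_start = i & !15;
                left[block_start + usize::from(r % 16)]
            } else {
                // Highest bit set: the destination byte is zeroed.
                0
            }
        })
        .collect()
}

fn main() {
    // 32 bytes = two independent 16-byte blocks, holding the values 0..=31.
    let left: Vec<u8> = (0..32).collect();
    let mut right = vec![1u8; 32];
    right[0] = 0x80; // highest bit set: output byte 0 becomes 0
    right[17] = 0x7F; // highest bit clear, low 4 bits = 15
    let dest = pshufb_model(&left, &right);
    assert_eq!(dest[0], 0);
    assert_eq!(dest[1], 1); // block 0, position 1
    assert_eq!(dest[16], 17); // block 1, position 1
    assert_eq!(dest[17], 31); // block 1, position 15
    println!("{dest:?}");
}
```

Bits 4 to 6 of each pattern byte are ignored in this model, so `0x7F` selects position 15 of the current block.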


let res = if right & 0x80 == 0 {
// Shuffle each 128-bit (16-byte) block independently.
let j = u64::from(right % 16).strict_add(i & !15);
Member

Is the i & !15 here the same as i / 16? If yes, I think that would be clearer.

Contributor Author

No. It's i / 16 * 16.

Member

Ah, right.

Suggested change
- let j = u64::from(right % 16).strict_add(i & !15);
+ let block_start = i & !15; // round down to previous multiple of 16
+ let j = block_start.strict_add((right % 16).into());
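
To illustrate the exchange above: `i & !15` clears the low four bits of `i`, which rounds it down to a multiple of 16, whereas `i / 16` gives the block number. A quick check, using a hypothetical `block_start` helper that is not part of the PR:

```rust
/// Hypothetical helper: index of the first byte of the 16-byte block containing `i`.
fn block_start(i: u64) -> u64 {
    // Clearing the low 4 bits rounds down to the previous multiple of 16.
    i & !15
}

fn main() {
    for i in 0u64..64 {
        // Same value as dividing and multiplying back.
        assert_eq!(block_start(i), i / 16 * 16);
    }
    // The block start and the block number differ for any index >= 16.
    assert_eq!(block_start(37), 32);
    assert_eq!(37u64 / 16, 2);
    println!("i & !15 matches i / 16 * 16 for all i in 0..64");
}
```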

Member

Did you check this on real hardware?

Contributor Author

Yes. I've tested intrinsics-x86-avx512.rs on real hardware. I also ran Miri on my real-world use cases.

@rustbot rustbot removed the S-waiting-on-review Status: Waiting for a review to complete label Dec 14, 2025
@rustbot
Collaborator

rustbot commented Dec 14, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@rustbot rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Dec 14, 2025
@usamoi
Contributor Author

usamoi commented Dec 15, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Dec 15, 2025
Comment on lines 155 to 158
let b = _mm512_set_epi8(-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1);
Member

Using some indices other than 1 also seems like a good idea. In particular, please ensure the "wrap-around" is tested by also checking 127.

Why does index 1 read "14" for the first block? It seems to be indexing from the right...?

Contributor Author

> Why does index 1 read "14" for the first block?

_mm512_set_epi8 sets the elements of a vector in reverse order. To set the elements in forward order, _mm512_setr_epi8 should be used.
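
The ordering difference can be sketched with a plain-array model. This is an assumption-laden illustration, not the real intrinsics: `set_epi8_model` and `setr_epi8_model` are hypothetical helpers, and the example uses 16 lanes (the 128-bit analogue) to stay short.

```rust
/// Hypothetical model of a `setr`-style constructor:
/// arguments are stored in the order given, so the first argument is lane 0.
fn setr_epi8_model(args: [i8; 16]) -> [i8; 16] {
    args
}

/// Hypothetical model of a `set`-style constructor:
/// the first argument lands in the highest lane, so memory order is reversed.
fn set_epi8_model(mut args: [i8; 16]) -> [i8; 16] {
    args.reverse();
    args
}

fn main() {
    let args: [i8; 16] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
    let forward = setr_epi8_model(args);
    let reversed = set_epi8_model(args);
    // With `setr`, lane 1 holds the second argument.
    assert_eq!(forward[1], 1);
    // With `set`, lane 1 holds the second-to-last argument.
    assert_eq!(reversed[1], 14);
    println!("setr lane 1 = {}, set lane 1 = {}", forward[1], reversed[1]);
}
```

Under this model, a vector built from the arguments 0 through 15 with the `set`-style ordering holds 14 in lane 1, which is the kind of result the question above refers to.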

Member

Oh wow... what an odd choice for an API.

@RalfJung
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: Waiting for the PR author to address review comments and removed S-waiting-on-review Status: Waiting for a review to complete labels Dec 15, 2025
@usamoi
Contributor Author

usamoi commented Dec 15, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Dec 15, 2025
@RalfJung
Member

Thanks! I have done some more tweaking to the comments.

I'd appreciate it if you could also submit a PR to stdarch to update the test there as well -- it's generally preferable for the tests to be in sync between the two projects.

@RalfJung RalfJung enabled auto-merge December 16, 2025 08:04
@RalfJung RalfJung dismissed their stale review December 16, 2025 08:04

resolved

@RalfJung RalfJung added this pull request to the merge queue Dec 16, 2025
Merged via the queue into rust-lang:master with commit 088c054 Dec 16, 2025
13 checks passed
@rustbot rustbot removed the S-waiting-on-review Status: Waiting for a review to complete label Dec 16, 2025