Confine SIMD code to runtime-dispatched tiers (fixes #628) #630
Open
andrewkern wants to merge 6 commits into MesserLab:master from andrewkern:fix/simd-runtime-dispatch
9d515d1 Confine SIMD code to runtime-dispatched tiers (fixes #628) (andrewkern)
f0d2060 Update creation dates on the new SIMD dispatch files (andrewkern)
0cf3c61 Test every SIMD tier so the dispatch kernels are fully covered (andrewkern)
df85904 Fix Windows build: stop config.h flags from clobbering the SIMD ISA f… (andrewkern)
9d6e90d date swap (andrewkern)
ae3ef67 Restore creation dates and set new files' copyright year to 2026 (andrewkern)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,131 @@ | ||
| // | ||
| // eidos_simd.cpp | ||
| // Eidos | ||
| // | ||
| // Created by Andrew Kern on 5/21/2026. | ||
| // Copyright (c) 2026 Benjamin C. Haller. All rights reserved. | ||
| // A product of the Messer Lab, http://messerlab.org/slim/ | ||
| // | ||
| | ||
| // This file is part of Eidos. | ||
| // | ||
| // Eidos is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by | ||
| // the Free Software Foundation, either version 3 of the License, or (at your option) any later version. | ||
| // | ||
| // Eidos is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
| // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. | ||
| // | ||
| // You should have received a copy of the GNU General Public License along with Eidos. If not, see <http://www.gnu.org/licenses/>. | ||
| | ||
| /* | ||
| | ||
| SIMD runtime dispatcher. This translation unit is compiled at the plain | ||
| baseline ABI (no instruction-set flags); it owns the public Eidos_SIMD | ||
| function pointers and selects a tier for them at startup. See eidos_simd.h. | ||
| | ||
| */ | ||
| | ||
| #include "eidos_simd.h" | ||
| | ||
| #include <cstring> | ||
| | ||
| | ||
| // The public kernel pointers. They are statically initialized to the scalar | ||
| // tier so that a call is well-defined even if it somehow happens before | ||
| // Eidos_SIMD_Init() runs; the address of a function is a constant expression, | ||
| // so there is no static-initialization-order dependency here. | ||
| namespace Eidos_SIMD { | ||
| #define X(ret, name, params) ret (*name) params = &Eidos_SIMD_scalar::name; | ||
| EIDOS_SIMD_FUNCTION_TABLE | ||
| #undef X | ||
| } | ||
| | ||
| | ||
| enum class Eidos_SIMD_Tier { kScalar, kSSE42, kAVX2_FMA, kNEON }; | ||
| | ||
| static Eidos_SIMD_Tier sActiveTier = Eidos_SIMD_Tier::kScalar; | ||
| | ||
| | ||
| bool Eidos_SIMD_SelectTier(const char *tier_name) | ||
| { | ||
| // The scalar tier is built on every platform and always available. | ||
| if (std::strcmp(tier_name, "scalar") == 0) | ||
| { | ||
| Eidos_SIMD_Fill_scalar(); | ||
| sActiveTier = Eidos_SIMD_Tier::kScalar; | ||
| return true; | ||
| } | ||
| | ||
| #if EIDOS_SIMD_DISPATCH_X86 | ||
| // __builtin_cpu_supports() reads CPUID; it is available on GCC and Clang | ||
| // for x86 and works regardless of the flags this file was compiled with. | ||
| // AVX2 and FMA shipped together (Haswell), but we require both explicitly | ||
| // since the AVX2 tier and SLEEF both use FMA instructions. | ||
| if (std::strcmp(tier_name, "AVX2+FMA") == 0) | ||
| { | ||
| if (!(__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))) | ||
| return false; | ||
| Eidos_SIMD_Fill_avx2(); | ||
| sActiveTier = Eidos_SIMD_Tier::kAVX2_FMA; | ||
| return true; | ||
| } | ||
| if (std::strcmp(tier_name, "SSE4.2") == 0) | ||
| { | ||
| if (!__builtin_cpu_supports("sse4.2")) | ||
| return false; | ||
| Eidos_SIMD_Fill_sse42(); | ||
| sActiveTier = Eidos_SIMD_Tier::kSSE42; | ||
| return true; | ||
| } | ||
| #endif | ||
| | ||
| #if EIDOS_SIMD_DISPATCH_ARM | ||
| // NEON is baseline on every ARM64 CPU, so it is always available here. | ||
| if (std::strcmp(tier_name, "NEON") == 0) | ||
| { | ||
| Eidos_SIMD_Fill_neon(); | ||
| sActiveTier = Eidos_SIMD_Tier::kNEON; | ||
| return true; | ||
| } | ||
| #endif | ||
| | ||
| return false; | ||
| } | ||
| | ||
| void Eidos_SIMD_Init(void) | ||
| { | ||
| // Install the fastest tier the CPU supports. This is idempotent: calling it | ||
| // again re-runs detection and re-installs the same tier, which is how the | ||
| // SIMD self-tests restore normal dispatch after cycling through every tier. | ||
| #if EIDOS_SIMD_DISPATCH_X86 | ||
| if (Eidos_SIMD_SelectTier("AVX2+FMA")) | ||
| return; | ||
| if (Eidos_SIMD_SelectTier("SSE4.2")) | ||
| return; | ||
| #endif | ||
| #if EIDOS_SIMD_DISPATCH_ARM | ||
| if (Eidos_SIMD_SelectTier("NEON")) | ||
| return; | ||
| #endif | ||
| // Fallback for pre-AVX2/pre-SSE4.2 x86, unknown architectures, MSVC, and | ||
| // USE_SIMD=OFF builds: the scalar tier, which runs on any CPU. | ||
| Eidos_SIMD_SelectTier("scalar"); | ||
| } | ||
| | ||
| const char *Eidos_SIMD_ActiveTierName(void) | ||
| { | ||
| switch (sActiveTier) | ||
| { | ||
| case Eidos_SIMD_Tier::kAVX2_FMA: return "AVX2+FMA"; | ||
| case Eidos_SIMD_Tier::kSSE42: return "SSE4.2"; | ||
| case Eidos_SIMD_Tier::kNEON: return "NEON"; | ||
| case Eidos_SIMD_Tier::kScalar: return "scalar"; | ||
| } | ||
| return "scalar"; | ||
| } | ||
| | ||
| bool Eidos_SIMD_SLEEFActive(void) | ||
| { | ||
| // SLEEF transcendentals are wired up only for the AVX2+FMA and NEON tiers. | ||
| return (sActiveTier == Eidos_SIMD_Tier::kAVX2_FMA) || (sActiveTier == Eidos_SIMD_Tier::kNEON); | ||
| } | ||
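
The dispatcher in this file relies on an X-macro table of function pointers that are constant-initialized to the scalar tier and re-pointed by name at runtime. Since `eidos_simd.h` is not shown here, the following is only a minimal sketch of that pattern under assumed names: `DEMO_FUNCTION_TABLE`, the `demo`, `demo_scalar` and `demo_unrolled` namespaces, the two kernels, and `demo_select_tier()` are all hypothetical stand-ins, and the "unrolled" tier merely plays the role of a SIMD tier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for EIDOS_SIMD_FUNCTION_TABLE: one X(...) entry per kernel.
#define DEMO_FUNCTION_TABLE \
    X(double, sum, (const double *values, std::size_t count)) \
    X(void, scale, (double *values, std::size_t count, double factor))

// Reference tier: plain loops that run on any CPU.
namespace demo_scalar {
    double sum(const double *values, std::size_t count) {
        assert(values != nullptr || count == 0);
        double total = 0.0;
        for (std::size_t i = 0; i < count; ++i) total += values[i];
        return total;
    }
    void scale(double *values, std::size_t count, double factor) {
        for (std::size_t i = 0; i < count; ++i) values[i] *= factor;
    }
}

// Stand-in for a SIMD tier: same signatures, different implementation.
namespace demo_unrolled {
    double sum(const double *values, std::size_t count) {
        double even = 0.0, odd = 0.0;
        std::size_t i = 0;
        for (; i + 1 < count; i += 2) { even += values[i]; odd += values[i + 1]; }
        if (i < count) even += values[i];
        return even + odd;
    }
    void scale(double *values, std::size_t count, double factor) {
        for (std::size_t i = 0; i < count; ++i) values[i] *= factor;
    }
}

// Public pointers, constant-initialized to the scalar tier so that a call
// made before any tier selection is still well-defined.
namespace demo {
#define X(ret, name, params) ret (*name) params = &demo_scalar::name;
    DEMO_FUNCTION_TABLE
#undef X
}

// Re-point every table entry at the named tier; returns false for unknown names
// and leaves the current tier installed in that case.
bool demo_select_tier(const char *tier_name) {
    if (tier_name == nullptr) return false;
    if (std::strcmp(tier_name, "scalar") == 0) {
#define X(ret, name, params) demo::name = &demo_scalar::name;
        DEMO_FUNCTION_TABLE
#undef X
        return true;
    }
    if (std::strcmp(tier_name, "unrolled") == 0) {
#define X(ret, name, params) demo::name = &demo_unrolled::name;
        DEMO_FUNCTION_TABLE
#undef X
        return true;
    }
    return false;
}
```

Because callers only ever go through `demo::sum` and `demo::scale`, a self-test can select each tier by name in turn, compare results against the scalar tier, and then reinstall the normal choice, which is the role the PR's comment assigns to calling `Eidos_SIMD_Init()` again.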
I'm worried about compilers that don't support `__builtin_cpu_supports`. I guess this is on GCC 4.8+ and Clang 9+. Clang 9 was only released in 2019, so there might be older versions floating around pretty commonly. GCC 4.8+ is fine, though; that's the first GCC version that supported C++11, so it is required anyway. (Clang supported C++11 in version 3.3, so suddenly requiring Clang 9 is a BIG jump forward in requirements.) Older GCC/Clang, and other compilers like MSVC and Intel, will not have `__builtin_cpu_supports`, though. So I'm worried that we're substituting a common problem (compilers not supporting `__builtin_cpu_supports`) in place of a rare problem (hardware so old that this solution is needed in the first place).

I see that Google has a more general solution to this sort of problem: https://github.com/google/cpu_features. It is not simple, and would complexify SLiM's build significantly; it is not a single-header solution, unfortunately, it would require setting up a CMake subdirectory and stuff. Doubtless that would be a PITA to get working. But maybe it is what we have to do? Or we could detect compilers that don't have `__builtin_cpu_supports`, based on not-GCC, not-Clang, or old-GCC-or-Clang, and fall back to scalar while emitting a warning, or some such. Ugh. Thoughts?
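
The fallback floated in this comment (detect compilers that lack `__builtin_cpu_supports` and drop to scalar with a warning) might look roughly like the sketch below. It is an illustration only, not code from the PR: `DEMO_HAVE_CPU_SUPPORTS` and the `demo_*` functions are hypothetical, the GCC 4.8 threshold is taken from the comment above, and excluding the Intel compiler is a deliberately conservative assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// DEMO_HAVE_CPU_SUPPORTS is a hypothetical macro (not defined by the PR).
// It becomes 1 only on x86 targets whose compiler is believed to provide
// __builtin_cpu_supports(); everything else keeps the scalar fallback.
#define DEMO_HAVE_CPU_SUPPORTS 0
#if defined(__x86_64__) || defined(__i386__)
#  if defined(__clang__)
     // Clang: ask the compiler directly rather than guessing a version.
#    if __has_builtin(__builtin_cpu_supports)
#      undef DEMO_HAVE_CPU_SUPPORTS
#      define DEMO_HAVE_CPU_SUPPORTS 1
#    endif
#  elif defined(__GNUC__) && !defined(__INTEL_COMPILER) && \
        (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
     // GCC 4.8 or later (threshold assumed from the review comment).
#    undef DEMO_HAVE_CPU_SUPPORTS
#    define DEMO_HAVE_CPU_SUPPORTS 1
#  endif
#endif

// Name of the fastest x86 tier this build can detect, else "scalar".
// A real build would handle ARM separately; this sketch covers x86 only.
const char *demo_detect_tier() {
#if DEMO_HAVE_CPU_SUPPORTS
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return "AVX2+FMA";
    if (__builtin_cpu_supports("sse4.2"))
        return "SSE4.2";
    return "scalar";
#else
    // No usable CPU-detection builtin: warn once and stay on the scalar tier.
    static bool warned = false;
    if (!warned) {
        std::fputs("warning: no CPU feature detection; using scalar tier\n", stderr);
        warned = true;
    }
    return "scalar";
#endif
}

// True if the name is one of the three tiers this sketch can return.
bool demo_is_known_tier(const char *tier) {
    return tier != nullptr &&
           (std::strcmp(tier, "AVX2+FMA") == 0 ||
            std::strcmp(tier, "SSE4.2") == 0 ||
            std::strcmp(tier, "scalar") == 0);
}

// Example use: report the selected tier once at startup.
void demo_report_tier() {
    const char *tier = demo_detect_tier();
    assert(demo_is_known_tier(tier));
    std::printf("SIMD tier: %s\n", tier);
}
```

The trade-off is the one raised above: a compile-time guard like this can only be as accurate as its version list, whereas a configure-time compile check asks the actual compiler.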
Hmmm. A few thoughts.

`(defined(__GNUC__) || defined(__clang__))`, so on MSVC/Intel the entire dispatcher is `#if`'d out and the scalar path is the only one.

I think this can all be handled with a CMake check like

Want me to add that?
Yes, I think so. Of course then we have to worry about what version of CMake added CheckCXXSourceCompiles. This sort of thing sometimes feels like an infinite regress. I have never figured out how to find out the first version of CMake that added a given feature; they don't make it easy to tell and even Google seems to have no idea. But I'd say add this, let's not worry about the CMake version. If it causes problems for someone, they can always turn off SIMD altogether for their build, and this check would be inside the SIMD stuff, so that would fix the problem if their CMake is that old. Anyway, updating to a newer CMake version is easier than updating to new hardware, so this is certainly a step forward overall. :-> Thanks!