Cpu.h file
Namespace Death::
Namespaces
- namespace Death
- Shared root namespace.
- namespace Death::Cpu
- Compile-time and runtime CPU instruction set detection and dispatch.
Classes
- template<class T> struct Death::Cpu::TypeTraits
- Traits class for CPU detection tag types.
- struct Death::Cpu::ScalarT
- Scalar tag type.
- struct Death::Cpu::Sse2T
- SSE2 tag type.
- struct Death::Cpu::Sse3T
- SSE3 tag type.
- struct Death::Cpu::Ssse3T
- SSSE3 tag type.
- struct Death::Cpu::Sse41T
- SSE4.1 tag type.
- struct Death::Cpu::Sse42T
- SSE4.2 tag type.
- struct Death::Cpu::PopcntT
- POPCNT tag type.
- struct Death::Cpu::LzcntT
- LZCNT tag type.
- struct Death::Cpu::Bmi1T
- BMI1 tag type.
- struct Death::Cpu::Bmi2T
- BMI2 tag type. Available only on x86. See the Cpu namespace and the Bmi2 tag for more information.
- struct Death::Cpu::AvxT
- AVX tag type.
- struct Death::Cpu::AvxF16cT
- AVX F16C tag type.
- struct Death::Cpu::AvxFmaT
- AVX FMA tag type.
- struct Death::Cpu::Avx2T
- AVX2 tag type.
- struct Death::Cpu::Avx512fT
- AVX-512 Foundation tag type.
- struct Death::Cpu::NeonT
- NEON tag type.
- struct Death::Cpu::NeonFmaT
- NEON FMA tag type.
- struct Death::Cpu::NeonFp16T
- NEON FP16 tag type.
- struct Death::Cpu::Simd128T
- SIMD128 tag type.
- class Death::Cpu::Features
- Feature set.
Typedefs
- using DefaultBaseT = ScalarT
- Default base tag type.
- using DefaultExtraT = Implementation::Tags<0>
- Default extra tag type.
- using DefaultT = Implementation::Tags<static_cast<unsigned int>(TypeTraits<DefaultBaseT>::Index)|DefaultExtraT::Value>
- Default tag type.
Functions
- template<class T> auto tag() -> T constexpr
- Tag for a tag type.
- template<class T> auto features() -> Features constexpr
- Feature set for a tag type.
- template<class T, class U> auto operator|(T, U) -> Implementation::Tags<static_cast<unsigned int>(TypeTraits<T>::Index) | TypeTraits<U>::Index> constexpr
- template<class T, unsigned int value> auto operator|(T, Implementation::Tags<value>) -> Implementation::Tags<TypeTraits<T>::Index | value> constexpr
- template<class T, class U> auto operator&(T, U) -> Implementation::Tags<static_cast<unsigned int>(TypeTraits<T>::Index) & TypeTraits<U>::Index> constexpr
- template<class T, unsigned int value> auto operator&(T, Implementation::Tags<value>) -> Implementation::Tags<TypeTraits<T>::Index & value> constexpr
- template<class T, class U> auto operator^(T, U) -> Implementation::Tags<static_cast<unsigned int>(TypeTraits<T>::Index) ^ TypeTraits<U>::Index> constexpr
- template<class T, unsigned int value> auto operator^(T, Implementation::Tags<value>) -> Implementation::Tags<TypeTraits<T>::Index ^ value> constexpr
- auto compiledFeatures() -> Features constexpr
- CPU instruction sets enabled at compile time.
- auto runtimeFeatures() -> Features constexpr
- Detect available CPU instruction sets at runtime.
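The operator signatures above suggest that combining two tags produces a new tag type whose bits are the OR of the operands' TypeTraits indices. The following is a minimal, self-contained sketch of that idea and not the library's implementation: the Sketch namespace, the bit assignments and the bitsOf() helper are assumptions made for illustration, and a single generic operator| stands in for the separate overloads listed above.

```cpp
namespace Sketch {

// Hypothetical stand-ins for three of the tag types listed above.
struct PopcntT {};
struct LzcntT {};
struct Bmi1T {};

// A combined tag: the OR-ed bits are part of the type itself.
template<unsigned int value> struct Tags {
    enum: unsigned int { Value = value };
};

// Maps a tag type to its bit. The bit assignments here are made up.
template<class T> struct TypeTraits;
template<> struct TypeTraits<PopcntT> { enum: unsigned int { Index = 1 << 0 }; };
template<> struct TypeTraits<LzcntT> { enum: unsigned int { Index = 1 << 1 }; };
template<> struct TypeTraits<Bmi1T> { enum: unsigned int { Index = 1 << 2 }; };
template<unsigned int value> struct TypeTraits<Tags<value>> {
    enum: unsigned int { Index = value };
};

// OR-ing two tags yields a new tag type; no runtime state is involved.
template<class T, class U>
constexpr Tags<static_cast<unsigned int>(TypeTraits<T>::Index) | TypeTraits<U>::Index>
operator|(T, U) { return {}; }

// Helper (assumption, not part of the documented API) exposing a tag's bits.
template<class T> constexpr unsigned int bitsOf(T) { return TypeTraits<T>::Index; }

// The combination is resolved entirely at compile time.
static_assert(bitsOf(PopcntT{} | LzcntT{}) == 3u, "unexpected combined bits");

}
```

Because the result is encoded in the type, an expression such as Sketch::PopcntT{} | Sketch::Bmi1T{} can be used directly for overload selection.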
Variables
- ScalarT Scalar constexpr
- Scalar tag.
- Sse2T Sse2 constexpr
- SSE2 tag.
- Sse3T Sse3 constexpr
- SSE3 tag.
- Ssse3T Ssse3 constexpr
- SSSE3 tag.
- Sse41T Sse41 constexpr
- SSE4.1 tag.
- Sse42T Sse42 constexpr
- SSE4.2 tag.
- PopcntT Popcnt constexpr
- POPCNT tag.
- LzcntT Lzcnt constexpr
- LZCNT tag.
- Bmi1T Bmi1 constexpr
- BMI1 tag.
- Bmi2T Bmi2 constexpr
- BMI2 tag. BMI2 instructions. Available only on x86. This instruction set is treated as an extra, i.e. is neither a superset of nor implied by any other instruction set. See Usage with extra instruction sets for more information.
- AvxT Avx constexpr
- AVX tag.
- AvxF16cT AvxF16c constexpr
- AVX F16C tag.
- AvxFmaT AvxFma constexpr
- AVX FMA tag.
- Avx2T Avx2 constexpr
- AVX2 tag.
- Avx512fT Avx512f constexpr
- AVX-512 Foundation tag.
- NeonT Neon constexpr
- NEON tag.
- NeonFmaT NeonFma constexpr
- NEON FMA tag.
- NeonFp16T NeonFp16 constexpr
- NEON FP16 tag.
- Simd128T Simd128 constexpr
- SIMD128 tag.
- DefaultBaseT DefaultBase constexpr
- Default base tag.
- DefaultExtraT DefaultExtra constexpr
- Default extra tags.
- DefaultT Default constexpr
- Default tags.
Defines
- #define DEATH_CPU_DECLARE(tag)
- Declare a CPU tag for a compile-time dispatch.
- #define DEATH_CPU_SELECT(tag)
- Select a CPU tag for a compile-time dispatch.
- #define DEATH_CPU_DISPATCHER_BASE(function)
- Create a function for a runtime dispatch on a base CPU instruction set.
- #define DEATH_CPU_DISPATCHER(function, ...)
- Create a function for a runtime dispatch on a base CPU instruction set and select extra instruction sets.
- #define DEATH_CPU_DISPATCHED(dispatcher, ...)
- Create a dispatched function according to the build configuration.
- #define DEATH_CPU_DISPATCHED_POINTER(dispatcher, ...)
- Create a runtime-dispatched function pointer.
- #define DEATH_CPU_DISPATCHED_IFUNC(dispatcher, ...)
- Create a runtime-dispatched function via GNU IFUNC.
- #define DEATH_ENABLE_SSE2
- Enable SSE2 for given function.
- #define DEATH_ENABLE_SSE3
- Enable SSE3 for given function.
- #define DEATH_ENABLE_SSSE3
- Enable SSSE3 for given function.
- #define DEATH_ENABLE_SSE41
- Enable SSE4.1 for given function.
- #define DEATH_ENABLE_SSE42
- Enable SSE4.2 for given function.
- #define DEATH_ENABLE_POPCNT
- Enable POPCNT for given function.
- #define DEATH_ENABLE_LZCNT
- Enable LZCNT for given function.
- #define DEATH_ENABLE_BMI1
- Enable BMI1 for given function.
- #define DEATH_ENABLE_BMI2
- Enable BMI2 for given function.
- #define DEATH_ENABLE_AVX
- Enable AVX for given function.
- #define DEATH_ENABLE_AVX_F16C
- Enable AVX F16C for given function.
- #define DEATH_ENABLE_AVX_FMA
- Enable AVX FMA for given function.
- #define DEATH_ENABLE_AVX2
- Enable AVX2 for given function.
- #define DEATH_ENABLE_AVX512F
- Enable AVX-512 Foundation for given function.
- #define DEATH_ENABLE_NEON
- Enable NEON for given function.
- #define DEATH_ENABLE_NEON_FMA
- Enable NEON FMA for given function.
- #define DEATH_ENABLE_NEON_FP16
- Enable NEON FP16 for given function.
- #define DEATH_ENABLE_SIMD128
- Enable SIMD128 for given function.
- #define DEATH_ENABLE(...)
- Enable multiple targets for given function.
Define documentation
#define DEATH_CPU_DECLARE(tag)
Declare a CPU tag for a compile-time dispatch.
Meant to be used to declare a function overload that uses given combination of CPU instruction sets. The DEATH_
Internally, this macro expands to two function parameter declarations separated by a comma: one that ensures only an overload matching the desired instruction sets gets picked, and one that assigns an absolute priority to this overload.
#define DEATH_CPU_SELECT(tag)
Select a CPU tag for a compile-time dispatch.
Meant to be used to select among function overloads declared with DEATH_
Internally, this macro expands to two function parameter values separated by a comma, one that contains the desired instruction sets to filter the overloads against and another that converts the sets to an absolute priority to pick the best viable overload.
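The two-parameter mechanism described for the declare and select macros can be sketched in isolation. The code below is an illustration under stated assumptions, not the library's code: the SKETCH_CPU_DECLARE and SKETCH_CPU_SELECT macros, the Tags, Priority and bitCount helpers and the bit values are all hypothetical. The first parameter filters overloads (a tag converts only to tags with a subset of its bits), and the second ranks the remaining candidates by the number of instruction sets they use.

```cpp
#include <string>
#include <type_traits>

namespace Sketch {

// Made-up bit assignments for three extra instruction sets.
constexpr unsigned int Popcnt = 1 << 0, Lzcnt = 1 << 1, Bmi1 = 1 << 2;

// A tag converts implicitly to any tag with a strict subset of its bits,
// so overloads requiring bits the caller doesn't have are not viable.
template<unsigned int bits> struct Tags {
    constexpr Tags() = default;
    template<unsigned int other, class = typename std::enable_if<
        (other & bits) == bits && other != bits>::type>
    constexpr Tags(Tags<other>) {}
};

// Priority<n> derives from Priority<n - 1>; among viable overloads the one
// whose Priority parameter is closest to the argument wins.
template<unsigned int n> struct Priority: Priority<n - 1> {};
template<> struct Priority<0> {};

// Number of set bits, used as the absolute priority.
constexpr unsigned int bitCount(unsigned int v) {
    return v ? (v & 1) + bitCount(v >> 1) : 0;
}

// Hypothetical macros: two parameter declarations / two argument values.
#define SKETCH_CPU_DECLARE(bits) \
    Sketch::Tags<(bits)>, Sketch::Priority<Sketch::bitCount(bits)>
#define SKETCH_CPU_SELECT(bits) \
    Sketch::Tags<(bits)>{}, Sketch::Priority<Sketch::bitCount(bits)>{}

inline std::string describe(SKETCH_CPU_DECLARE(0)) { return "scalar"; }
inline std::string describe(SKETCH_CPU_DECLARE(Popcnt)) { return "popcnt"; }
inline std::string describe(SKETCH_CPU_DECLARE(Popcnt|Lzcnt)) { return "popcnt+lzcnt"; }

// Picks the best overload for a compile-time set of bits.
template<unsigned int bits> std::string describeFor() {
    return describe(SKETCH_CPU_SELECT(bits));
}

}
```

For instance, Sketch::describeFor<Sketch::Popcnt | Sketch::Bmi1>() resolves to the "popcnt" overload: the Popcnt|Lzcnt overload is filtered out by the first parameter, and the second parameter prefers "popcnt" over "scalar".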
#define DEATH_CPU_DISPATCHER_BASE(function)
Create a function for a runtime dispatch on a base CPU instruction set.
Given a set of function overloads named function that accept a CPU tag as a parameter, all returning a function pointer of the same type, creates a function with signature function(Cpu::Features) which will select among the overloads using a runtime-specified Cpu::
This function works with just a single base CPU instruction tag such as Cpu::
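As a rough illustration of what such a generated dispatcher amounts to, the sketch below writes one by hand. Everything in it is an assumption made for the example: the tag structs, the Features bitmask, the bit constants and the sum functions are hypothetical, and the "four-wide" variant is plain C++ standing in for a real SIMD implementation.

```cpp
#include <cstddef>

namespace Sketch {

// Hypothetical base tags; inheritance models "superset of".
struct ScalarT {};
struct Sse2T: ScalarT {};
struct Avx2T: Sse2T {};

// Hypothetical runtime feature set, a plain bitmask here.
struct Features { unsigned int bits; };
constexpr unsigned int Sse2Bit = 1 << 0, Avx2Bit = 1 << 1;

using SumFn = int(*)(const int*, std::size_t);

inline int sumScalar(const int* data, std::size_t count) {
    int sum = 0;
    for(std::size_t i = 0; i != count; ++i) sum += data[i];
    return sum;
}

// Stand-in for an SSE2 variant: plain C++ processing four items per step.
inline int sumFourWide(const int* data, std::size_t count) {
    int lanes[4] = {0, 0, 0, 0};
    std::size_t i = 0;
    for(; i + 4 <= count; i += 4)
        for(std::size_t j = 0; j != 4; ++j) lanes[j] += data[i + j];
    int sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for(; i != count; ++i) sum += data[i];
    return sum;
}

// Overloads taking a tag, all returning a function pointer of the same type.
inline SumFn sumImplementation(ScalarT) { return sumScalar; }
inline SumFn sumImplementation(Sse2T) { return sumFourWide; }

// Hand-written dispatcher: test from the most to the least capable set.
// There is no Avx2T overload, so that call falls back to the Sse2T one.
inline SumFn sumImplementation(Features features) {
    if(features.bits & Avx2Bit) return sumImplementation(Avx2T{});
    if(features.bits & Sse2Bit) return sumImplementation(Sse2T{});
    return sumImplementation(ScalarT{});
}

}
```

A caller would typically invoke the dispatcher once and keep the returned pointer, since the feature test is repeated on every dispatcher call.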
#define DEATH_CPU_DISPATCHER(function, ...)
Create a function for a runtime dispatch on a base CPU instruction set and select extra instruction sets.
Given a set of function overloads named function that accept a CPU tag combination wrapped in DEATH_function(Cpu::Features) which will select among the overloads using a runtime-specified Cpu::
For a dispatch using just the base instruction set use DEATH_
#define DEATH_CPU_DISPATCHED(dispatcher, ...)
Create a dispatched function according to the build configuration.
Assuming a dispatcher was defined with either DEATH_
See Usage for more information and overhead comparison.
#define DEATH_CPU_DISPATCHED_POINTER(dispatcher, ...)
Create a runtime-dispatched function pointer.
Assuming a dispatcher was defined with either DEATH_dispatcher for Cpu::
The pointer can be changed afterwards, such as for testing purposes. See also DEATH_
See Automatic cached dispatch for more information and overhead comparison.
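The cached-pointer approach can be sketched as a global function pointer that is initialized once from a dispatcher and can be reassigned later. This is an illustration only: detectFeatures(), PopcntBit and the popcount functions are hypothetical names, and the detection relies on the GCC/Clang __builtin_cpu_supports() builtin rather than on anything from this header.

```cpp
namespace Sketch {

using PopcountFn = unsigned int(*)(unsigned int);

constexpr unsigned int PopcntBit = 1 << 0; // made-up feature bit

// Portable fallback: clears the lowest set bit until none are left.
inline unsigned int popcountGeneric(unsigned int value) {
    unsigned int count = 0;
    for(; value; value &= value - 1) ++count;
    return count;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_HAS_POPCNT_VARIANT
// Compiled with POPCNT enabled for this one function only.
__attribute__((__target__("popcnt")))
inline unsigned int popcountHardware(unsigned int value) {
    return static_cast<unsigned int>(__builtin_popcount(value));
}
#endif

// Hypothetical stand-in for a runtime feature query.
inline unsigned int detectFeatures() {
    #ifdef SKETCH_HAS_POPCNT_VARIANT
    __builtin_cpu_init();
    if(__builtin_cpu_supports("popcnt")) return PopcntBit;
    #endif
    return 0;
}

// Dispatcher: maps a runtime feature set to an implementation.
inline PopcountFn popcountDispatcher(unsigned int features) {
    #ifdef SKETCH_HAS_POPCNT_VARIANT
    if(features & PopcntBit) return popcountHardware;
    #else
    static_cast<void>(features);
    #endif
    return popcountGeneric;
}

// The dispatched pointer: resolved once during static initialization,
// every later call costs a single indirect call.
PopcountFn popcount = popcountDispatcher(detectFeatures());

}
```

Since the pointer is an ordinary variable, a test can assign Sketch::popcount = Sketch::popcountGeneric to exercise the fallback on a machine that would otherwise pick the hardware variant.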
#define DEATH_CPU_DISPATCHED_IFUNC(dispatcher, ...)
Create a runtime-dispatched function via GNU IFUNC.
Available only if DEATH_dispatcher was defined with either DEATH_type. The function uses the GNU IFUNC mechanism, which causes the function call to be resolved to a function pointer returned by dispatcher for Cpu::
If DEATH_
See Automatic cached dispatch for more information and overhead comparison.
#define DEATH_ENABLE_SSE2
Enable SSE2 for given function.
On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse2"))), allowing use of SSE2 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.
As a special case, if DEATH_
Implied by DEATH_
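The effect of a per-function target attribute can be shown with a small sketch. SKETCH_ENABLE_SSE2 below is a hypothetical stand-in defined by hand from the expansion quoted above, and add4() is an invented example function. On other compilers or architectures the sketch falls back to plain C++.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h>
// Hypothetical stand-in for the documented macro on x86 GCC and Clang.
#define SKETCH_ENABLE_SSE2 __attribute__((__target__("sse2")))
#endif

namespace Sketch {

#ifdef SKETCH_ENABLE_SSE2
// SSE2 intrinsics are usable here even if the file is built without -msse2.
SKETCH_ENABLE_SSE2 inline void add4(const int* a, const int* b, int* out) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
}
#else
// Fallback when the macro isn't defined for this compiler or architecture.
inline void add4(const int* a, const int* b, int* out) {
    for(int i = 0; i != 4; ++i) out[i] = a[i] + b[i];
}
#endif

}
```

The attribute only makes the instructions available to the compiler. On 64-bit x86 SSE2 is part of the baseline, but for newer instruction sets the caller still has to verify at runtime that the CPU supports them before calling such a function.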
#define DEATH_ENABLE_SSE3
Enable SSE3 for given function.
On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse3"))), allowing use of SSE3 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse3 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_SSSE3
Enable SSSE3 for given function.
On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("ssse3"))), allowing use of SSSE3 and earlier SSE instructions inside a function annotated with this macro without having to specify -mssse3 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_SSE41
Enable SSE4.1 for given function.
On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse4.1"))), allowing use of SSE4.1 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse4.1 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_SSE42
Enable SSE4.2 for given function.
On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse4.2"))), allowing use of SSE4.2 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse4.2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_POPCNT
Enable POPCNT for given function.
On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("popcnt"))), allowing use of the POPCNT instructions inside a function annotated with this macro without having to specify -mpopcnt for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside other instruction sets without running into linker errors. Not defined on other compilers or architectures.
As a special case, if DEATH_
Neither a superset of nor implied by any other DEATH_ENABLE_* macro, so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.
#define DEATH_ENABLE_LZCNT
Enable LZCNT for given function.
On x86 GCC and Clang expands to __attribute__((__target__("lzcnt"))), allowing use of the LZCNT instructions inside a function annotated with this macro without having to specify -mlzcnt for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants and POPCNT this macro is not defined on clang-cl, as there LZCNT, BMI1, BMI2, AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside unrelated instruction sets without running into linker errors. Not defined on other compilers or architectures.
As a special case, if DEATH_
Neither a superset of nor implied by any other DEATH_ENABLE_* macro (not even DEATH_
#define DEATH_ENABLE_BMI1
Enable BMI1 for given function.
On x86 GCC and Clang expands to __attribute__((__target__("bmi"))), allowing use of the BMI1 instructions inside a function annotated with this macro without having to specify -mbmi for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants and POPCNT this macro is not defined on clang-cl, as there LZCNT, BMI1, BMI2, AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside unrelated instruction sets without running into linker errors. Not defined on other compilers or architectures.
As a special case, if DEATH_
Neither a superset of nor implied by any other DEATH_ENABLE_* macro, so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.
#define DEATH_ENABLE_BMI2
Enable BMI2 for given function.
On x86 GCC and Clang expands to __attribute__((__target__("bmi2"))), allowing use of the BMI2 instructions inside a function annotated with this macro without having to specify -mbmi2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants and POPCNT this macro is not defined on clang-cl, as there LZCNT, BMI1, BMI2, AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside unrelated instruction sets without running into linker errors. Not defined on other compilers or architectures.
As a special case, if DEATH_
Neither a superset of nor implied by any other DEATH_ENABLE_* macro (not even DEATH_
#define DEATH_ENABLE_AVX
Enable AVX for given function.
On x86 GCC and Clang expands to __attribute__((__target__("avx"))), allowing use of AVX and all earlier SSE instructions inside a function annotated with this macro without having to specify -mavx for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_AVX_F16C
Enable AVX F16C for given function.
On x86 GCC and Clang expands to __attribute__((__target__("f16c"))), allowing use of F16C instructions inside a function annotated with this macro without having to specify -mf16c for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside other instruction sets without running into linker errors. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_DEATH_ENABLE_*
macro so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.
#define DEATH_ENABLE_AVX_FMA
Enable AVX FMA for given function.
On x86 GCC and Clang expands to __attribute__((__target__("fma"))), allowing use of FMA instructions inside a function annotated with this macro without having to specify -mfma for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside other instruction sets without running into linker errors. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_DEATH_ENABLE_*
macro so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.
#define DEATH_ENABLE_AVX2
Enable AVX2 for given function.
On x86 GCC and Clang expands to __attribute__((__target__("avx2"))), allowing use of AVX2, FMA, F16C, AVX and all earlier SSE instructions inside a function annotated with this macro without having to specify -mavx2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_AVX512F
Enable AVX-512 Foundation for given function.
On x86 GCC 4.9+ and Clang expands to __attribute__((__target__("avx512f"))), allowing use of AVX-512 Foundation and all earlier AVX and SSE instructions inside a function annotated with this macro without having to specify -mavx512f for the whole compilation unit. On x86 MSVC 2017 15.3+ expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on other compilers, earlier compiler versions without AVX-512 support or other architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_NEON
Enable NEON for given function.
On 32-bit ARM GCC expands to __attribute__((__target__("fpu=neon"))), allowing use of NEON instructions inside a function annotated with this macro without having to specify -mfpu=neon for the whole compilation unit. On ARM MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. In contrast to GCC, this macro is not defined on Clang, as it makes the NEON intrinsics available only if enabled on compiler command line. Not defined on other compilers or architectures.
As a special case, if DEATH_-mfpu=neon
is unrecognized).
Implied by DEATH_
#define DEATH_ENABLE_NEON_FMA
Enable NEON FMA for given function.
On 32-bit ARM GCC expands to __attribute__((__target__("fpu=neon-vfpv4"))), allowing use of NEON FMA instructions inside a function annotated with this macro without having to specify -mfpu=neon-vfpv4 for the whole compilation unit. On ARM MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. In contrast to GCC, this macro is not defined on Clang, as it makes the NEON FMA intrinsics available only if enabled on compiler command line. Not defined on other compilers or architectures.
As a special case, if DEATH_-mfpu=neon-vfpv4
is unrecognized).
Superset of DEATH_
#define DEATH_ENABLE_NEON_FP16
Enable NEON FP16 for given function.
On ARM GCC expands to __attribute__((__target__("arch=armv8.2-a+fp16"))), allowing use of ARMv8.2-a NEON FP16 vector arithmetic inside a function annotated with this macro without having to specify -march=armv8.2-a+fp16 for the whole compilation unit. On ARM MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. In contrast to GCC, this macro is not defined on Clang, as it makes the NEON FP16 intrinsics available only if enabled on compiler command line. Not defined on other compilers or architectures.
As a special case, if DEATH_
Superset of DEATH_
#define DEATH_ENABLE_SIMD128
Enable SIMD128 for given function.
Given that it's currently not possible to selectively use 128-bit SIMD in a WebAssembly module without causing a compilation error on runtimes that don't support it, this macro is only defined if DEATH___attribute__((__target__("simd128")))
would be redundant if -msimd128
is passed on the command line.
The situation may change once the feature detection proposal is implemented, but likely only for instruction sets building on top of this one.
See Enabling instruction sets for particular functions for more information.
#define DEATH_ENABLE(...)
Enable multiple targets for given function.
Accepts a comma-separated list of DEATH_ENABLE_* macro suffixes, effectively enabling the given combination. For the macro to work, all DEATH_ENABLE_* macros corresponding to the arguments have to be defined; the common usage pattern is thus in combination with an #ifdef. See Enabling instruction sets for particular functions for more information and an example.
When multiple DEATH_ENABLE_* macros are specified one after another, Clang 8+ would pick only the first specified, and Clang before version 8 and GCC before version 12 only the last specified, ignoring the others. There the macro expands into a single combined __attribute__((__target__(...))) attribute. For GCC 12+ and other compilers except MSVC it's just a shorthand for multiple DEATH_ENABLE_* macros one after another. On MSVC it expands to nothing; there the functions aren't annotated in any way, and moreover the default preprocessor behavior would make this extremely tricky to implement.
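The single combined attribute mentioned above can be illustrated by hand. In the sketch below the SKETCH_ENABLE_* macros are hypothetical stand-ins written out manually (no variadic macro is implemented), bitsSet4() is an invented example, and the runtime check uses the GCC/Clang __builtin_cpu_supports() builtin as an assumption about how a caller might guard the call.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h>
// Hypothetical per-set macros, mirroring the documented expansions.
#define SKETCH_ENABLE_SSE41 __attribute__((__target__("sse4.1")))
#define SKETCH_ENABLE_POPCNT __attribute__((__target__("popcnt")))
// Hand-written equivalent of one combined target attribute.
#define SKETCH_ENABLE_SSE41_POPCNT __attribute__((__target__("sse4.1,popcnt")))
#endif

namespace Sketch {

// Portable fallback counting the set bits of four values.
inline int bitsSet4Generic(const unsigned int* values) {
    int count = 0;
    for(int i = 0; i != 4; ++i)
        for(unsigned int v = values[i]; v; v &= v - 1) ++count;
    return count;
}

// The #if guard mirrors the usage pattern of checking that every
// required macro is defined before using the combination.
#if defined(SKETCH_ENABLE_SSE41) && defined(SKETCH_ENABLE_POPCNT)
SKETCH_ENABLE_SSE41_POPCNT inline int bitsSet4Simd(const unsigned int* values) {
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(values));
    // _mm_extract_epi32() needs SSE4.1, _mm_popcnt_u32() needs POPCNT.
    return _mm_popcnt_u32(static_cast<unsigned int>(_mm_extract_epi32(v, 0))) +
           _mm_popcnt_u32(static_cast<unsigned int>(_mm_extract_epi32(v, 1))) +
           _mm_popcnt_u32(static_cast<unsigned int>(_mm_extract_epi32(v, 2))) +
           _mm_popcnt_u32(static_cast<unsigned int>(_mm_extract_epi32(v, 3)));
}
#endif

// Uses the SIMD variant only if the CPU reports both instruction sets.
inline int bitsSet4(const unsigned int* values) {
    #if defined(SKETCH_ENABLE_SSE41) && defined(SKETCH_ENABLE_POPCNT)
    __builtin_cpu_init();
    if(__builtin_cpu_supports("sse4.1") && __builtin_cpu_supports("popcnt"))
        return bitsSet4Simd(values);
    #endif
    return bitsSet4Generic(values);
}

}
```

Writing the two sets into one target string sidesteps the compiler differences described above, where only the first or only the last of several separate attributes would take effect.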