Shared/Cpu.h file

Namespace Death::Cpu and related macros.

Namespaces

namespace Death
Shared root namespace.
namespace Death::Cpu
Compile-time and runtime CPU instruction set detection and dispatch.

Classes

template<class T>
struct Death::Cpu::TypeTraits
Traits class for CPU detection tag types.
struct Death::Cpu::ScalarT
Scalar tag type.
struct Death::Cpu::Sse2T
SSE2 tag type.
struct Death::Cpu::Sse3T
SSE3 tag type.
struct Death::Cpu::Ssse3T
SSSE3 tag type.
struct Death::Cpu::Sse41T
SSE4.1 tag type.
struct Death::Cpu::Sse42T
SSE4.2 tag type.
struct Death::Cpu::PopcntT
POPCNT tag type.
struct Death::Cpu::LzcntT
LZCNT tag type.
struct Death::Cpu::Bmi1T
BMI1 tag type.
struct Death::Cpu::Bmi2T
BMI2 tag type. Available only on x86. See the Cpu namespace and the Bmi2 tag for more information.
struct Death::Cpu::AvxT
AVX tag type.
struct Death::Cpu::AvxF16cT
AVX F16C tag type.
struct Death::Cpu::AvxFmaT
AVX FMA tag type.
struct Death::Cpu::Avx2T
AVX2 tag type.
struct Death::Cpu::Avx512fT
AVX-512 Foundation tag type.
struct Death::Cpu::NeonT
NEON tag type.
struct Death::Cpu::NeonFmaT
NEON FMA tag type.
struct Death::Cpu::NeonFp16T
NEON FP16 tag type.
struct Death::Cpu::Simd128T
SIMD128 tag type.
class Death::Cpu::Features
Feature set.

Typedefs

using DefaultBaseT = ScalarT
Default base tag type.
using DefaultExtraT = Implementation::Tags<0>
Default extra tag type.
using DefaultT = Implementation::Tags<static_cast<unsigned int>(TypeTraits<DefaultBaseT>::Index)|DefaultExtraT::Value>
Default tag type.

Functions

template<class T>
auto tag() -> T constexpr
Tag for a tag type.
template<class T>
auto features() -> Features constexpr
Feature set for a tag type.
template<class T, class U>
auto operator|(T, U) -> Implementation::Tags<static_cast<unsigned int>(TypeTraits<T>::Index)|TypeTraits<U>::Index> constexpr
template<class T, unsigned int value>
auto operator|(T, Implementation::Tags<value>) -> Implementation::Tags<TypeTraits<T>::Index|value> constexpr
template<class T, class U>
auto operator&(T, U) -> Implementation::Tags<static_cast<unsigned int>(TypeTraits<T>::Index)&TypeTraits<U>::Index> constexpr
template<class T, unsigned int value>
auto operator&(T, Implementation::Tags<value>) -> Implementation::Tags<TypeTraits<T>::Index&value> constexpr
template<class T, class U>
auto operator^(T, U) -> Implementation::Tags<static_cast<unsigned int>(TypeTraits<T>::Index) ^ TypeTraits<U>::Index> constexpr
template<class T, unsigned int value>
auto operator^(T, Implementation::Tags<value>) -> Implementation::Tags<TypeTraits<T>::Index^ value> constexpr
auto compiledFeatures() -> Features constexpr
CPU instruction sets enabled at compile time.
auto runtimeFeatures() -> Features constexpr
Detect available CPU instruction sets at runtime.

Variables

ScalarT Scalar constexpr
Scalar tag.
Sse2T Sse2 constexpr
SSE2 tag.
Sse3T Sse3 constexpr
SSE3 tag.
Ssse3T Ssse3 constexpr
SSSE3 tag.
Sse41T Sse41 constexpr
SSE4.1 tag.
Sse42T Sse42 constexpr
SSE4.2 tag.
PopcntT Popcnt constexpr
POPCNT tag.
LzcntT Lzcnt constexpr
LZCNT tag.
Bmi1T Bmi1 constexpr
BMI1 tag.
Bmi2T Bmi2 constexpr
BMI2 tag. BMI2 instructions. Available only on x86. This instruction set is treated as an extra, i.e. is neither a superset of nor implied by any other instruction set. See Usage with extra instruction sets for more information.
AvxT Avx constexpr
AVX tag.
AvxF16cT AvxF16c constexpr
AVX F16C tag.
AvxFmaT AvxFma constexpr
AVX FMA tag.
Avx2T Avx2 constexpr
AVX2 tag.
Avx512fT Avx512f constexpr
AVX-512 Foundation tag.
NeonT Neon constexpr
NEON tag.
NeonFmaT NeonFma constexpr
NEON FMA tag.
NeonFp16T NeonFp16 constexpr
NEON FP16 tag.
Simd128T Simd128 constexpr
SIMD128 tag.
DefaultBaseT DefaultBase constexpr
Default base tag.
DefaultExtraT DefaultExtra constexpr
Default extra tags.
DefaultT Default constexpr
Default tags.

Defines

#define DEATH_CPU_DECLARE(tag)
Declare a CPU tag for a compile-time dispatch.
#define DEATH_CPU_SELECT(tag)
Select a CPU tag for a compile-time dispatch.
#define DEATH_CPU_DISPATCHER_BASE(function)
Create a function for a runtime dispatch on a base CPU instruction set.
#define DEATH_CPU_DISPATCHER(function, ...)
Create a function for a runtime dispatch on a base CPU instruction set and select extra instruction sets.
#define DEATH_CPU_DISPATCHED(dispatcher, ...)
Create a dispatched function according to the build configuration.
#define DEATH_CPU_DISPATCHED_POINTER(dispatcher, ...)
Create a runtime-dispatched function pointer.
#define DEATH_CPU_DISPATCHED_IFUNC(dispatcher, ...)
Create a runtime-dispatched function via GNU IFUNC.
#define DEATH_ENABLE_SSE2
Enable SSE2 for given function.
#define DEATH_ENABLE_SSE3
Enable SSE3 for given function.
#define DEATH_ENABLE_SSSE3
Enable SSSE3 for given function.
#define DEATH_ENABLE_SSE41
Enable SSE4.1 for given function.
#define DEATH_ENABLE_SSE42
Enable SSE4.2 for given function.
#define DEATH_ENABLE_POPCNT
Enable POPCNT for given function.
#define DEATH_ENABLE_LZCNT
Enable LZCNT for given function.
#define DEATH_ENABLE_BMI1
Enable BMI1 for given function.
#define DEATH_ENABLE_BMI2
Enable BMI2 for given function.
#define DEATH_ENABLE_AVX
Enable AVX for given function.
#define DEATH_ENABLE_AVX_F16C
Enable AVX F16C for given function.
#define DEATH_ENABLE_AVX_FMA
Enable AVX FMA for given function.
#define DEATH_ENABLE_AVX2
Enable AVX2 for given function.
#define DEATH_ENABLE_AVX512F
Enable AVX-512 Foundation for given function.
#define DEATH_ENABLE_NEON
Enable NEON for given function.
#define DEATH_ENABLE_NEON_FMA
Enable NEON FMA for given function.
#define DEATH_ENABLE_NEON_FP16
Enable NEON FP16 for given function.
#define DEATH_ENABLE_SIMD128
Enable SIMD128 for given function.
#define DEATH_ENABLE(...)
Enable multiple targets for given function.

Define documentation

#define DEATH_CPU_DECLARE(tag)

Declare a CPU tag for a compile-time dispatch.

Meant to be used to declare a function overload that uses given combination of CPU instruction sets. The DEATH_CPU_SELECT() macro is a counterpart used to select among overloads declared with this macro. See Usage with extra instruction sets for more information and usage example.

Internally, this macro expands to two function parameter declarations separated by a comma, one that ensures only an overload matching the desired instruction sets gets picked, and one that assigns an absolute priority to this overload.

#define DEATH_CPU_SELECT(tag)

Select a CPU tag for a compile-time dispatch.

Meant to be used to select, among function overloads declared with DEATH_CPU_DECLARE(), the one that best matches given combination of CPU instruction sets. See Usage with extra instruction sets for more information.

Internally, this macro expands to two function parameter values separated by a comma, one that contains the desired instruction sets to filter the overloads against and another that converts the sets to an absolute priority to pick the best viable overload.
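The filter-plus-priority idea behind DEATH_CPU_DECLARE() and DEATH_CPU_SELECT() can be illustrated with a standalone sketch. Everything in it (`sketch::Tags`, `sketch::Priority`, `SKETCH_DECLARE`, `SKETCH_SELECT`, the bit constants and the use of a plain bit count as the priority) is a hypothetical stand-in chosen for illustration, not the library's actual implementation. The first parameter is convertible only when the requested set contains all bits of the overload's set, and the second one makes the overload with the most matching bits win.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical instruction set bits, for illustration only
constexpr unsigned None = 0, A = 1, B = 2, C = 4;

// Filter parameter: Tags<Value> can be constructed from Tags<Other> only if
// Other contains every bit of Value, which removes non-matching overloads
template<unsigned Value> struct Tags {
    constexpr Tags() {}
    template<unsigned Other, typename std::enable_if<(Other & Value) == Value, int>::type = 0>
    constexpr Tags(Tags<Other>) {}
};

// Priority parameter: Priority<N> derives from Priority<N - 1>, so overload
// resolution prefers the conversion to the closest (highest) base
template<unsigned N> struct Priority: Priority<N - 1> {};
template<> struct Priority<0> {};

// Assumption of this sketch: the priority is simply the number of set bits
constexpr unsigned popcount(unsigned v) { return v ? (v & 1) + popcount(v >> 1) : 0; }

}

// Two parameter types for a declaration, two parameter values for a call
#define SKETCH_DECLARE(bits) sketch::Tags<(bits)>, sketch::Priority<sketch::popcount(bits)>
#define SKETCH_SELECT(bits) sketch::Tags<(bits)>{}, sketch::Priority<sketch::popcount(bits)>{}

// Overloads for three instruction set combinations, returning an identifier
constexpr int pick(SKETCH_DECLARE(sketch::None)) { return 0; }
constexpr int pick(SKETCH_DECLARE(sketch::A)) { return 1; }
constexpr int pick(SKETCH_DECLARE(sketch::A|sketch::B)) { return 2; }
```

With these overloads, `pick(SKETCH_SELECT(sketch::A|sketch::B|sketch::C))` resolves to the `A|B` overload, `pick(SKETCH_SELECT(sketch::A|sketch::C))` to the `A` overload, and `pick(SKETCH_SELECT(sketch::B))` falls back to the overload with no bits set, because the other two are filtered out.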

#define DEATH_CPU_DISPATCHER_BASE(function)

Create a function for a runtime dispatch on a base CPU instruction set.

Given a set of function overloads named function that accept a CPU tag as a parameter, all returning a function pointer of the same type, creates a function with signature function(Cpu::Features) which will select among the overloads using a runtime-specified Cpu::Features, using the same rules as the compile-time overload selection. For this macro to work, at the very least there has to be an overload with a Cpu::ScalarT argument. See Automatic runtime dispatch for more information.

This function works with just a single base CPU instruction tag such as Cpu::Avx2 or Cpu::Neon, but not the extra instruction sets like Cpu::Lzcnt or Cpu::AvxFma. For a dispatch that takes extra instruction sets into account as well use DEATH_CPU_DISPATCHER() instead.
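As a rough illustration of what such a generated function does, the following standalone sketch dispatches on a runtime bit mask. The tag types, the bit constants, `sumScalar()`, `sumWide()` and the plain `unsigned` used in place of Cpu::Features are all hypothetical stand-ins, and the hand-written conditional chain is only an assumption about the general shape of the generated code, not the macro's real expansion.

```cpp
namespace sketch {

// Hypothetical base tags; each derives from the instruction set it extends,
// so a tag without a matching overload falls back to the closest base
struct ScalarT {};
struct Sse2T: ScalarT {};
struct Avx2T: Sse2T {};

// Hypothetical runtime feature bits standing in for Cpu::Features
constexpr unsigned Sse2Bit = 1u, Avx2Bit = 2u;

using SumFn = int(*)(const int*, int);

int sumScalar(const int* data, int count) {
    int total = 0;
    for(int i = 0; i < count; ++i) total += data[i];
    return total;
}

// Placeholder for a variant that would use wider vector instructions
int sumWide(const int* data, int count) {
    int total = 0, i = 0;
    for(; i + 1 < count; i += 2) total += data[i] + data[i + 1];
    if(i < count) total += data[i];
    return total;
}

// Overloads taking a tag, all returning the same function pointer type.
// There is deliberately no Sse2T overload.
constexpr SumFn sumImplementation(ScalarT) { return sumScalar; }
constexpr SumFn sumImplementation(Avx2T) { return sumWide; }

// Hand-written equivalent of a dispatcher: probes from the most to the least
// capable base tag and reuses the compile-time overload rules for each
constexpr SumFn sumImplementation(unsigned features) {
    return (features & Avx2Bit) ? sumImplementation(Avx2T{}) :
           (features & Sse2Bit) ? sumImplementation(Sse2T{}) :
                                  sumImplementation(ScalarT{});
}

}
```

Passing `sketch::Sse2Bit` returns `sumScalar`, since the `Sse2T` tag converts to `ScalarT` when no closer overload exists. This mirrors the requirement that a scalar overload always has to be present.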

#define DEATH_CPU_DISPATCHER(function, ...)

Create a function for a runtime dispatch on a base CPU instruction set and select extra instruction sets.

Given a set of function overloads named function that accept a CPU tag combination wrapped in DEATH_CPU_DECLARE() as a parameter, all returning a function pointer of the same type, creates a function with signature function(Cpu::Features) which will select among the overloads using a runtime-specified Cpu::Features, using the same rules as the compile-time overload selection. The extra instruction sets considered in the overload selection are specified as additional parameters to the macro; specifying none is valid as well. For this macro to work, at the very least there has to be an overload with a DEATH_CPU_DECLARE(Cpu::Scalar) argument. See Automatic runtime dispatch for more information.

For a dispatch using just the base instruction set use DEATH_CPU_DISPATCHER_BASE() instead.

#define DEATH_CPU_DISPATCHED(dispatcher, ...)

Create a dispatched function according to the build configuration.

Assuming a dispatcher was defined with either DEATH_CPU_DISPATCHER() or DEATH_CPU_DISPATCHER_BASE(), will automatically use DEATH_CPU_DISPATCHED_POINTER(), DEATH_CPU_DISPATCHED_IFUNC() or nothing depending on the build configuration.

See Usage for more information and overhead comparison.

#define DEATH_CPU_DISPATCHED_POINTER(dispatcher, ...)

Create a runtime-dispatched function pointer.

Assuming a dispatcher was defined with either DEATH_CPU_DISPATCHER() or DEATH_CPU_DISPATCHER_BASE(), defines a function pointer variable with a signature specified in the second variadic argument. In a global constructor the variable is assigned a function pointer returned by dispatcher for Cpu::runtimeFeatures().

The pointer can be changed afterwards, such as for testing purposes. See also DEATH_CPU_DISPATCHED_IFUNC(), which avoids the overhead of function pointer indirection.

See Automatic cached dispatch for more information and overhead comparison.
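The cached-pointer approach can be pictured with the following standalone sketch. The names `sketch::sum`, `sumImplementation()`, `detectFeatures()` and `WideBit` are hypothetical, and `detectFeatures()` is a placeholder that always reports no extra features instead of calling Cpu::runtimeFeatures(). It only shows the general pattern of a global function pointer that is filled in during static initialization and can be reassigned later.

```cpp
#include <cassert>

namespace sketch {

using SumFn = int(*)(const int*, int);

int sumScalar(const int* data, int count) {
    assert(count >= 0);
    int total = 0;
    for(int i = 0; i < count; ++i) total += data[i];
    return total;
}

// Placeholder for a vectorized variant, a real one would use SIMD intrinsics
int sumUnrolled(const int* data, int count) {
    assert(count >= 0);
    int total = 0, i = 0;
    for(; i + 1 < count; i += 2) total += data[i] + data[i + 1];
    if(i < count) total += data[i];
    return total;
}

// Hypothetical feature bit and dispatcher returning a function pointer
constexpr unsigned WideBit = 1u;
SumFn sumImplementation(unsigned features) {
    return (features & WideBit) ? sumUnrolled : sumScalar;
}

// Placeholder for runtime detection, pretends only the scalar set is available
unsigned detectFeatures() { return 0u; }

// The dispatched "function": a pointer variable initialized before main()
// runs, called as sketch::sum(data, count) like a regular function
SumFn sum = sumImplementation(detectFeatures());

}
```

Because `sketch::sum` is an ordinary variable, a test can assign `sketch::sum = sketch::sumImplementation(sketch::WideBit)` to exercise a specific variant, at the cost of one indirect call per invocation.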

#define DEATH_CPU_DISPATCHED_IFUNC(dispatcher, ...)

Create a runtime-dispatched function via GNU IFUNC.

Available only if DEATH_CPU_USE_IFUNC is enabled. Assuming a dispatcher was defined with either DEATH_CPU_DISPATCHER() or DEATH_CPU_DISPATCHER_BASE(), defines a function with a signature specified via the third variadic argument. The signature has to match type. The function uses the GNU IFUNC mechanism, which causes the function call to be resolved to a function pointer returned by dispatcher for Cpu::runtimeFeatures(). The dispatch is performed by the dynamic linker during early startup and cannot be changed afterwards.

If DEATH_CPU_USE_IFUNC isn't available, is explicitly disabled or if you need to be able to subsequently change the dispatched-to function (such as for testing purposes), use DEATH_CPU_DISPATCHED_POINTER() instead.

See Automatic cached dispatch for more information and overhead comparison.

#define DEATH_ENABLE_SSE2

Enable SSE2 for given function.

On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse2"))), allowing use of SSE2 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_SSE2 is present (meaning SSE2 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Implied by DEATH_ENABLE_SSE3. See Enabling instruction sets for particular functions for more information.
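A minimal sketch of the per-function annotation follows. It uses a hypothetical `SKETCH_ENABLE_SSE2` macro that mimics the expansion described above instead of including the library header, and a hypothetical `addFour()` function. The platform checks are a simplification and do not reproduce the library's exact conditions.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    #include <emmintrin.h>
    #if defined(__GNUC__) || defined(__clang__)
        // GCC, Clang, clang-cl: enable SSE2 for the annotated function only
        #define SKETCH_ENABLE_SSE2 __attribute__((__target__("sse2")))
    #elif defined(_MSC_VER)
        // MSVC doesn't restrict use of intrinsics, nothing to annotate
        #define SKETCH_ENABLE_SSE2
    #endif
#endif

#ifdef SKETCH_ENABLE_SSE2
// SSE2 variant: adds four 32-bit integers at once
SKETCH_ENABLE_SSE2 void addFour(const std::int32_t* a, const std::int32_t* b, std::int32_t* out) {
    assert(a && b && out);
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
}
#else
// Scalar fallback used when the macro is not defined for this target
void addFour(const std::int32_t* a, const std::int32_t* b, std::int32_t* out) {
    assert(a && b && out);
    for(int i = 0; i != 4; ++i) out[i] = a[i] + b[i];
}
#endif
```

Guarding the SIMD variant with `#ifdef` matches the note that the macro is not defined on other compilers or architectures, so code using it needs a fallback path.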

#define DEATH_ENABLE_SSE3

Enable SSE3 for given function.

On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse3"))), allowing use of SSE3 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse3 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_SSE3 is present (meaning SSE3 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_SSE2, implied by DEATH_ENABLE_SSSE3. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_SSSE3

Enable SSSE3 for given function.

On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("ssse3"))), allowing use of SSSE3 and earlier SSE instructions inside a function annotated with this macro without having to specify -mssse3 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_SSSE3 is present (meaning SSSE3 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_SSE3, implied by DEATH_ENABLE_SSE41. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_SSE41

Enable SSE4.1 for given function.

On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse4.1"))), allowing use of SSE4.1 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse4.1 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_SSE41 is present (meaning SSE4.1 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_SSSE3, implied by DEATH_ENABLE_SSE42. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_SSE42

Enable SSE4.2 for given function.

On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("sse4.2"))), allowing use of SSE4.2 and earlier SSE instructions inside a function annotated with this macro without having to specify -msse4.2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_SSE42 is defined (meaning SSE4.2 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_SSE41, implied by DEATH_ENABLE_AVX. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_POPCNT

Enable POPCNT for given function.

On x86 GCC, Clang and clang-cl expands to __attribute__((__target__("popcnt"))), allowing use of the POPCNT instructions inside a function annotated with this macro without having to specify -mpopcnt for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside other instruction sets without running into linker errors. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_POPCNT is defined (meaning POPCNT is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Neither a superset of nor implied by any other DEATH_ENABLE_* macro, so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_LZCNT

Enable LZCNT for given function.

On x86 GCC and Clang expands to __attribute__((__target__("lzcnt"))), allowing use of the LZCNT instructions inside a function annotated with this macro without having to specify -mlzcnt for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants and POPCNT this macro is not defined on clang-cl, as there LZCNT, BMI1, BMI2, AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside unrelated instruction sets without running into linker errors. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_LZCNT is defined (meaning LZCNT is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Neither a superset of nor implied by any other DEATH_ENABLE_* macro (not even DEATH_ENABLE_BMI2, although the name would suggest that), so you may need to specify it together with others.

#define DEATH_ENABLE_BMI1

Enable BMI1 for given function.

On x86 GCC and Clang expands to __attribute__((__target__("bmi"))), allowing use of the BMI1 instructions inside a function annotated with this macro without having to specify -mbmi for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants and POPCNT this macro is not defined on clang-cl, as there LZCNT, BMI1, BMI2, AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside unrelated instruction sets without running into linker errors. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_BMI1 is defined (meaning BMI1 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Neither a superset of nor implied by any other DEATH_ENABLE_* macro, so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_BMI2

Enable BMI2 for given function.

On x86 GCC and Clang expands to __attribute__((__target__("bmi2"))), allowing use of the BMI2 instructions inside a function annotated with this macro without having to specify -mbmi2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants and POPCNT this macro is not defined on clang-cl, as there LZCNT, BMI1, BMI2, AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside unrelated instruction sets without running into linker errors. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_BMI2 is defined (meaning BMI2 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Neither a superset of nor implied by any other DEATH_ENABLE_* macro (not even DEATH_ENABLE_BMI1, although the name would suggest that), so you may need to specify it together with others.

#define DEATH_ENABLE_AVX

Enable AVX for given function.

On x86 GCC and Clang expands to __attribute__((__target__("avx"))), allowing use of AVX and all earlier SSE instructions inside a function annotated with this macro without having to specify -mavx for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_AVX is present (meaning AVX is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_SSE42, implied by DEATH_ENABLE_AVX2. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_AVX_F16C

Enable AVX F16C for given function.

On x86 GCC and Clang expands to __attribute__((__target__("f16c"))), allowing use of F16C instructions inside a function annotated with this macro without having to specify -mf16c for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside other instruction sets without running into linker errors. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_AVX_F16C is present (meaning AVX F16C is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_AVX on both GCC and Clang. However, it is not portably implied by any other DEATH_ENABLE_* macro, so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_AVX_FMA

Enable AVX FMA for given function.

On x86 GCC and Clang expands to __attribute__((__target__("fma"))), allowing use of FMA instructions inside a function annotated with this macro without having to specify -mfma for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on GCC 4.8, as there it's not generally possible to enable it alongside other instruction sets without running into linker errors. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_AVX_FMA is present (meaning AVX with FMA is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_AVX on both GCC and Clang. However, it is not portably implied by any other DEATH_ENABLE_* macro, so you may need to specify it together with others. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_AVX2

Enable AVX2 for given function.

On x86 GCC and Clang expands to __attribute__((__target__("avx2"))), allowing use of AVX2, AVX and all earlier SSE instructions inside a function annotated with this macro without having to specify -mavx2 for the whole compilation unit. On x86 MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_AVX2 is present (meaning AVX2 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_AVX, implied by DEATH_ENABLE_AVX512F. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_AVX512F

Enable AVX-512 Foundation for given function.

On x86 GCC 4.9+ and Clang expands to __attribute__((__target__("avx512f"))), allowing use of AVX-512 Foundation and all earlier AVX and SSE instructions inside a function annotated with this macro without having to specify -mavx512f for the whole compilation unit. On x86 MSVC 2017 15.3+ expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. Unlike the SSE variants this macro is not defined on clang-cl, as there AVX and newer intrinsics are provided only if enabled on compiler command line. Not defined on other compilers, earlier compiler versions without AVX-512 support or other architectures.

As a special case, if DEATH_TARGET_AVX512F is present (meaning AVX-512 Foundation is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_AVX2. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_NEON

Enable NEON for given function.

On 32-bit ARM GCC expands to __attribute__((__target__("fpu=neon"))), allowing use of NEON instructions inside a function annotated with this macro without having to specify -mfpu=neon for the whole compilation unit. On ARM MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. In contrast to GCC, this macro is not defined on Clang, as it makes the NEON intrinsics available only if enabled on compiler command line. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_NEON is present (meaning NEON is enabled for the whole compilation unit), this macro is defined as empty on all compilers. This is also the case for ARM64, where NEON support is implicit (and where -mfpu=neon is unrecognized).

Implied by DEATH_ENABLE_NEON_FMA. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_NEON_FMA

Enable NEON FMA for given function.

On 32-bit ARM GCC expands to __attribute__((__target__("fpu=neon-vfpv4"))), allowing use of NEON FMA instructions inside a function annotated with this macro without having to specify -mfpu=neon-vfpv4 for the whole compilation unit. On ARM MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. In contrast to GCC, this macro is not defined on Clang, as it makes the NEON FMA intrinsics available only if enabled on compiler command line. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_NEON_FMA is present (meaning NEON FMA is enabled for the whole compilation unit), this macro is defined as empty on all compilers. This is also the case for ARM64, where NEON support is implicit (and where -mfpu=neon-vfpv4 is unrecognized).

Superset of DEATH_ENABLE_NEON, implied by DEATH_ENABLE_NEON_FP16. See Enabling instruction sets for particular functions for more information and usage example.

#define DEATH_ENABLE_NEON_FP16

Enable NEON FP16 for given function.

On ARM GCC expands to __attribute__((__target__("arch=armv8.2-a+fp16"))), allowing use of ARMv8.2-a NEON FP16 vector arithmetic inside a function annotated with this macro without having to specify -march=armv8.2-a+fp16 for the whole compilation unit. On ARM MSVC expands to nothing, as the compiler doesn't restrict use of intrinsics in any way. In contrast to GCC, this macro is not defined on Clang, as it makes the NEON FP16 intrinsics available only if enabled on compiler command line. Not defined on other compilers or architectures.

As a special case, if DEATH_TARGET_NEON_FP16 is present (meaning NEON FP16 is enabled for the whole compilation unit), this macro is defined as empty on all compilers.

Superset of DEATH_ENABLE_NEON_FMA. See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE_SIMD128

Enable SIMD128 for given function.

Given that it's currently not possible to selectively use 128-bit SIMD in a WebAssembly module without causing a compilation error on runtimes that don't support it, this macro is only defined if DEATH_TARGET_SIMD128 is present (meaning SIMD128 is explicitly enabled for the whole compilation unit), and is always empty, as __attribute__((__target__("simd128"))) would be redundant if -msimd128 is passed on the command line.

The situation may change once the feature detection proposal is implemented, but likely only for instruction sets building on top of this one.

See Enabling instruction sets for particular functions for more information.

#define DEATH_ENABLE(...)

Enable multiple targets for given function.

Accepts a comma-separated list of DEATH_ENABLE_* macro suffixes, effectively enabling given combination. For the macro to work, all DEATH_ENABLE_* macros corresponding to the arguments have to be defined; the common usage pattern is thus in combination with an #ifdef. See Enabling instruction sets for particular functions for more information and an example.

When multiple DEATH_ENABLE_* macros are specified one after another, Clang 8+ would pick only the first specified, and Clang before version 8 and GCC before version 12 only the last specified, ignoring the others. There the macro expands into a single combined __attribute__((__target__(...))) attribute. For GCC 12+ and other compilers except MSVC it's just a shorthand for multiple DEATH_ENABLE_* macros one after another. On MSVC it expands to nothing, as the functions there aren't annotated in any way and the default preprocessor behavior would moreover make this extremely tricky to implement.
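The combined-attribute form can be sketched without the library as follows. The `SKETCH_ENABLE_SSE41_POPCNT` macro and the `countBits128()` function are hypothetical, and the single attribute with a comma-separated target list only approximates what a combined expansion might look like on GCC and Clang. The platform check is simplified and leaves out MSVC and clang-cl entirely.

```cpp
#include <cassert>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && !defined(_MSC_VER) && \
    (defined(__x86_64__) || defined(__i386__))
    #include <immintrin.h>
    // One attribute listing both targets, so neither of them gets ignored
    #define SKETCH_ENABLE_SSE41_POPCNT __attribute__((__target__("sse4.1,popcnt")))
#endif

#ifdef SKETCH_ENABLE_SSE41_POPCNT
// Counts set bits in four 32-bit values, using SSE4.1 to extract the lanes
// and POPCNT to count bits in each of them
SKETCH_ENABLE_SSE41_POPCNT int countBits128(const std::uint32_t* values) {
    assert(values);
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(values));
    return _mm_popcnt_u32(static_cast<unsigned>(_mm_extract_epi32(v, 0))) +
           _mm_popcnt_u32(static_cast<unsigned>(_mm_extract_epi32(v, 1))) +
           _mm_popcnt_u32(static_cast<unsigned>(_mm_extract_epi32(v, 2))) +
           _mm_popcnt_u32(static_cast<unsigned>(_mm_extract_epi32(v, 3)));
}
#else
// Portable fallback for targets where the combined macro is not defined
int countBits128(const std::uint32_t* values) {
    assert(values);
    int count = 0;
    for(int i = 0; i != 4; ++i)
        for(std::uint32_t x = values[i]; x; x >>= 1) count += int(x & 1u);
    return count;
}
#endif
```

As with the single-target macros, the `#ifdef` guard keeps the code compiling on targets where the combination is unavailable. The SIMD path assumes the CPU running the code supports both SSE4.1 and POPCNT, which a real application would verify through runtime dispatch.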