Commit 0431d6d

Clang: convert __m64 intrinsics to unconditionally use SSE2 instead of MMX. (#96540)
The MMX instruction set is legacy, and the SSE2 variants are in every way superior, when they are available -- and they have been available since the Pentium 4 was released, 20 years ago. Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally.

This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. Targeting these older CPUs remains supported -- simply without the ability to use MMX compiler intrinsics.

Migrating away from the use of MMX registers also fixes a rather non-obvious requirement. The long-standing programming model for these MMX intrinsics requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call `_mm_empty()` between using any MMX instruction and any x87 FPU instruction. If you neglect to, then every future x87 operation will return a NaN result. This requirement is not at all obvious to users of these intrinsic functions, and causes bugs that are very difficult to detect. Worse, even if the user did write code that correctly calls `_mm_empty()` in the right places, LLVM may sometimes reorder x87 and MMX operations around each other, unaware of this mode-switching issue. Eliminating the use of MMX registers eliminates this problem.

This change also deletes the now-unnecessary MMX `__builtin_ia32_*` functions from Clang. Only 3 MMX-related builtins remain in use -- `__builtin_ia32_emms`, used by `_mm_empty`, and `__builtin_ia32_vec_{ext,set}_v4hi`, used by `_mm_extract_pi16` and `_mm_insert_pi16`. Note particularly that the latter two lower to generic, non-MMX IR. Support for the LLVM intrinsics underlying these removed builtins still remains, for the moment.

The file `clang/www/builtins.py` has been updated with mappings from the newly-removed `__builtin_ia32` functions to the still-supported equivalents in `mmintrin.h`.

(Originally uploaded at https://reviews.llvm.org/D86855 and https://reviews.llvm.org/D94252)

Fixes issue #41665
Works towards #98272
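To make the mode-switching hazard concrete, here is a minimal sketch of the legacy programming model being retired (the intrinsics are real; the wrapper function and its trailing floating-point math are illustrative assumptions, and the NaN failure mode applies where that math is evaluated by the x87 unit, e.g. 32-bit x86):

    #include <xmmintrin.h>  /* pulls in mmintrin.h; _mm_avg_pu8, _mm_extract_pi16 */

    double average_plus(__m64 a, __m64 b, double bias) {
      __m64 avg = _mm_avg_pu8(a, b);   /* MMX-register op under the old model */
      int lane = _mm_extract_pi16(avg, 0);
      _mm_empty();                     /* mandatory before any x87 FP use;
                                          omitting it made later x87 results NaN */
      return lane + bias;              /* floating-point math follows */
    }

With this commit the intrinsics above are implemented with XMM registers, so the `_mm_empty()` call (and the reordering hazard) disappears.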
1 parent 10ff2bc · commit 0431d6d
19 files changed: +687, -702 lines

clang/docs/ReleaseNotes.rst (14 additions, 0 deletions)

@@ -172,6 +172,20 @@ AMDGPU Support
 X86 Support
 ^^^^^^^^^^^
 
+- The MMX vector intrinsic functions from ``*mmintrin.h`` which
+  operate on `__m64` vectors, such as ``_mm_add_pi8``, have been
+  reimplemented to use the SSE2 instruction-set and XMM registers
+  unconditionally. These intrinsics are therefore *no longer
+  supported* if MMX is enabled without SSE2 -- either from targeting
+  CPUs from the Pentium-MMX through the Pentium 3, or explicitly via
+  passing arguments such as ``-mmmx -mno-sse2``.
+
+- The compiler builtins such as ``__builtin_ia32_paddb`` which
+  formerly implemented the above MMX intrinsic functions have been
+  removed. Any uses of these removed functions should migrate to the
+  functions defined by the ``*mmintrin.h`` headers. A mapping can be
+  found in the file ``clang/www/builtins.py``.
+
 Arm and AArch64 Support
 ^^^^^^^^^^^^^^^^^^^^^^^
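As a sketch of the migration path the second release note describes (the before/after pairing follows the `clang/www/builtins.py` mapping; `__v8qi` is the vector typedef `mmintrin.h` already provides):

    #include <mmintrin.h>

    __m64 add_bytes(__m64 a, __m64 b) {
      /* Before: called the now-removed builtin directly:
       *   return (__m64)__builtin_ia32_paddb((__v8qi)a, (__v8qi)b);
       * After: use the equivalent mmintrin.h intrinsic: */
      return _mm_add_pi8(a, b);
    }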

clang/include/clang/Basic/BuiltinsX86.def (2 additions, 101 deletions)

@@ -47,107 +47,8 @@ TARGET_BUILTIN(__builtin_ia32_writeeflags_u32, "vUi", "n", "")
 // doesn't work in the presence of re-declaration of _mm_prefetch for windows.
 TARGET_BUILTIN(_mm_prefetch, "vcC*i", "nc", "mmx")
 TARGET_BUILTIN(__builtin_ia32_emms, "v", "n", "mmx")
-TARGET_BUILTIN(__builtin_ia32_paddb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_paddw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_paddd, "V2iV2iV2i", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_paddsb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_paddsw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_paddusb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_paddusw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psubb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psubw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psubd, "V2iV2iV2i", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psubsb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psubsw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psubusb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psubusw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pmulhw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pmullw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pmaddwd, "V2iV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pand, "V1OiV1OiV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pandn, "V1OiV1OiV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_por, "V1OiV1OiV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pxor, "V1OiV1OiV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psllw, "V4sV4sV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pslld, "V2iV2iV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psllq, "V1OiV1OiV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrlw, "V4sV4sV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrld, "V2iV2iV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrlq, "V1OiV1OiV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psraw, "V4sV4sV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrad, "V2iV2iV1Oi", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psllwi, "V4sV4si", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pslldi, "V2iV2ii", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psllqi, "V1OiV1Oii", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrlwi, "V4sV4si", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrldi, "V2iV2ii", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrlqi, "V1OiV1Oii", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psrawi, "V4sV4si", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_psradi, "V2iV2ii", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_packsswb, "V8cV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_packssdw, "V4sV2iV2i", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_packuswb, "V8cV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_punpckhbw, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_punpckhwd, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_punpckhdq, "V2iV2iV2i", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_punpcklbw, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_punpcklwd, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_punpckldq, "V2iV2iV2i", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pcmpeqb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pcmpeqw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pcmpeqd, "V2iV2iV2i", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pcmpgtb, "V8cV8cV8c", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pcmpgtw, "V4sV4sV4s", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_pcmpgtd, "V2iV2iV2i", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_maskmovq, "vV8cV8cc*", "nV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_movntq, "vV1Oi*V1Oi", "nV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_vec_init_v2si, "V2iii", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_vec_init_v4hi, "V4sssss", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_vec_init_v8qi, "V8ccccccccc", "ncV:64:", "mmx")
-TARGET_BUILTIN(__builtin_ia32_vec_ext_v2si, "iV2ii", "ncV:64:", "mmx")
-
-// MMX2 (MMX+SSE) intrinsics
-TARGET_BUILTIN(__builtin_ia32_cvtpi2ps, "V4fV4fV2i", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_cvtps2pi, "V2iV4f", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_cvttps2pi, "V2iV4f", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pavgb, "V8cV8cV8c", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pavgw, "V4sV4sV4s", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pmaxsw, "V4sV4sV4s", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pmaxub, "V8cV8cV8c", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pminsw, "V4sV4sV4s", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pminub, "V8cV8cV8c", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pmovmskb, "iV8c", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pmulhuw, "V4sV4sV4s", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_psadbw, "V4sV8cV8c", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_pshufw, "V4sV4sIc", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_vec_ext_v4hi, "iV4sIi", "ncV:64:", "mmx,sse")
-TARGET_BUILTIN(__builtin_ia32_vec_set_v4hi, "V4sV4siIi", "ncV:64:", "mmx,sse")
-
-// MMX+SSE2
-TARGET_BUILTIN(__builtin_ia32_cvtpd2pi, "V2iV2d", "ncV:64:", "mmx,sse2")
-TARGET_BUILTIN(__builtin_ia32_cvtpi2pd, "V2dV2i", "ncV:64:", "mmx,sse2")
-TARGET_BUILTIN(__builtin_ia32_cvttpd2pi, "V2iV2d", "ncV:64:", "mmx,sse2")
-TARGET_BUILTIN(__builtin_ia32_paddq, "V1OiV1OiV1Oi", "ncV:64:", "mmx,sse2")
-TARGET_BUILTIN(__builtin_ia32_pmuludq, "V1OiV2iV2i", "ncV:64:", "mmx,sse2")
-TARGET_BUILTIN(__builtin_ia32_psubq, "V1OiV1OiV1Oi", "ncV:64:", "mmx,sse2")
-
-// MMX+SSSE3
-TARGET_BUILTIN(__builtin_ia32_pabsb, "V8cV8c", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_pabsd, "V2iV2i", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_pabsw, "V4sV4s", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_palignr, "V8cV8cV8cIc", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_phaddd, "V2iV2iV2i", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_phaddsw, "V4sV4sV4s", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_phaddw, "V4sV4sV4s", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_phsubd, "V2iV2iV2i", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_phsubsw, "V4sV4sV4s", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_phsubw, "V4sV4sV4s", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_pmaddubsw, "V8cV8cV8c", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_pmulhrsw, "V4sV4sV4s", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_pshufb, "V8cV8cV8c", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_psignw, "V4sV4sV4s", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_psignb, "V8cV8cV8c", "ncV:64:", "mmx,ssse3")
-TARGET_BUILTIN(__builtin_ia32_psignd, "V2iV2iV2i", "ncV:64:", "mmx,ssse3")
+TARGET_BUILTIN(__builtin_ia32_vec_ext_v4hi, "sV4sIi", "ncV:64:", "sse")
+TARGET_BUILTIN(__builtin_ia32_vec_set_v4hi, "V4sV4ssIi", "ncV:64:", "sse")
 
 // SSE intrinsics.
 TARGET_BUILTIN(__builtin_ia32_comieq, "iV4fV4f", "ncV:128:", "sse")
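For readers decoding the two surviving entries: under Clang's builtin prototype encoding, "sV4sIi" reads as 'returns short; takes a vector of four shorts and an int that must be a constant expression' (note the return type was also narrowed from int to short here), and "ncV:64:" marks the builtin nothrow and const with a 64-bit minimum vector width. A hedged usage sketch -- these builtins are normally reached only through `_mm_extract_pi16`/`_mm_insert_pi16`:

    #include <mmintrin.h>  /* defines __m64 and the __v4hi vector typedef */

    short second_lane(__m64 v) {
      /* The index argument must be an integer constant expression ("Ii"). */
      return __builtin_ia32_vec_ext_v4hi((__v4hi)v, 1);
    }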

clang/lib/CodeGen/CGBuiltin.cpp (2 additions, 6 deletions)

@@ -14523,12 +14523,7 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
     // TODO: If we had a "freeze" IR instruction to generate a fixed undef
     // value, we should use that here instead of a zero.
     return llvm::Constant::getNullValue(ConvertType(E->getType()));
-  case X86::BI__builtin_ia32_vec_init_v8qi:
-  case X86::BI__builtin_ia32_vec_init_v4hi:
-  case X86::BI__builtin_ia32_vec_init_v2si:
-    return Builder.CreateBitCast(BuildVector(Ops),
-                                 llvm::Type::getX86_MMXTy(getLLVMContext()));
-  case X86::BI__builtin_ia32_vec_ext_v2si:
+  case X86::BI__builtin_ia32_vec_ext_v4hi:
   case X86::BI__builtin_ia32_vec_ext_v16qi:
   case X86::BI__builtin_ia32_vec_ext_v8hi:
   case X86::BI__builtin_ia32_vec_ext_v4si:
@@ -14546,6 +14541,7 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
     // Otherwise we could just do this in the header file.
     return Builder.CreateExtractElement(Ops[0], Index);
   }
+  case X86::BI__builtin_ia32_vec_set_v4hi:
   case X86::BI__builtin_ia32_vec_set_v16qi:
   case X86::BI__builtin_ia32_vec_set_v8hi:
   case X86::BI__builtin_ia32_vec_set_v4si:
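A sketch of what the shared generic path now produces for the v4hi cases (the IR in the comment is illustrative, with hypothetical value names; the point, per the commit message, is that these builtins lower to generic, non-MMX IR):

    #include <xmmintrin.h>

    int low_word(__m64 v) {
      /* Lowers via Builder.CreateExtractElement to plain vector IR,
         roughly: %e = extractelement <4 x i16> %v, i64 0
         (plus an integer extension), not to an MMX-register pextrw. */
      return _mm_extract_pi16(v, 0);
    }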

clang/lib/Headers/emmintrin.h (22 additions, 19 deletions)

@@ -52,9 +52,12 @@ typedef __bf16 __m128bh __attribute__((__vector_size__(16), __aligned__(16)));
 #define __DEFAULT_FN_ATTRS                                                    \
   __attribute__((__always_inline__, __nodebug__,                             \
                  __target__("sse2,no-evex512"), __min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS_MMX                                               \
-  __attribute__((__always_inline__, __nodebug__,                             \
-                 __target__("mmx,sse2,no-evex512"), __min_vector_width__(64)))
+
+#define __trunc64(x)                                                         \
+  (__m64) __builtin_shufflevector((__v2di)(x), __extension__(__v2di){}, 0)
+#define __anyext128(x)                                                       \
+  (__m128i) __builtin_shufflevector((__v2si)(x), __extension__(__v2si){}, 0, \
+                                    1, -1, -1)
 
 /// Adds lower double-precision values in both operands and returns the
 /// sum in the lower 64 bits of the result. The upper 64 bits of the result

@@ -1486,8 +1489,8 @@ static __inline__ int __DEFAULT_FN_ATTRS _mm_cvttsd_si32(__m128d __a) {
 /// \param __a
 ///    A 128-bit vector of [2 x double].
 /// \returns A 64-bit vector of [2 x i32] containing the converted values.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX _mm_cvtpd_pi32(__m128d __a) {
-  return (__m64)__builtin_ia32_cvtpd2pi((__v2df)__a);
+static __inline__ __m64 __DEFAULT_FN_ATTRS _mm_cvtpd_pi32(__m128d __a) {
+  return __trunc64(__builtin_ia32_cvtpd2dq((__v2df)__a));
 }
 
 /// Converts the two double-precision floating-point elements of a

@@ -1505,8 +1508,8 @@ static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX _mm_cvtpd_pi32(__m128d __a) {
 /// \param __a
 ///    A 128-bit vector of [2 x double].
 /// \returns A 64-bit vector of [2 x i32] containing the converted values.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX _mm_cvttpd_pi32(__m128d __a) {
-  return (__m64)__builtin_ia32_cvttpd2pi((__v2df)__a);
+static __inline__ __m64 __DEFAULT_FN_ATTRS _mm_cvttpd_pi32(__m128d __a) {
+  return __trunc64(__builtin_ia32_cvttpd2dq((__v2df)__a));
 }
 
 /// Converts the two signed 32-bit integer elements of a 64-bit vector of

@@ -1520,8 +1523,8 @@ static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX _mm_cvttpd_pi32(__m128d __a) {
 /// \param __a
 ///    A 64-bit vector of [2 x i32].
 /// \returns A 128-bit vector of [2 x double] containing the converted values.
-static __inline__ __m128d __DEFAULT_FN_ATTRS_MMX _mm_cvtpi32_pd(__m64 __a) {
-  return __builtin_ia32_cvtpi2pd((__v2si)__a);
+static __inline__ __m128d __DEFAULT_FN_ATTRS _mm_cvtpi32_pd(__m64 __a) {
+  return (__m128d) __builtin_convertvector((__v2si)__a, __v2df);
 }
 
 /// Returns the low-order element of a 128-bit vector of [2 x double] as

@@ -2108,9 +2111,8 @@ static __inline__ __m128i __DEFAULT_FN_ATTRS _mm_add_epi32(__m128i __a,
 /// \param __b
 ///    A 64-bit integer.
 /// \returns A 64-bit integer containing the sum of both parameters.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX _mm_add_si64(__m64 __a,
-                                                            __m64 __b) {
-  return (__m64)__builtin_ia32_paddq((__v1di)__a, (__v1di)__b);
+static __inline__ __m64 __DEFAULT_FN_ATTRS _mm_add_si64(__m64 __a, __m64 __b) {
+  return (__m64)(((unsigned long long)__a) + ((unsigned long long)__b));
 }
 
 /// Adds the corresponding elements of two 128-bit vectors of [2 x i64],

@@ -2431,9 +2433,9 @@ static __inline__ __m128i __DEFAULT_FN_ATTRS _mm_mullo_epi16(__m128i __a,
 /// \param __b
 ///    A 64-bit integer containing one of the source operands.
 /// \returns A 64-bit integer vector containing the product of both operands.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX _mm_mul_su32(__m64 __a,
-                                                            __m64 __b) {
-  return __builtin_ia32_pmuludq((__v2si)__a, (__v2si)__b);
+static __inline__ __m64 __DEFAULT_FN_ATTRS _mm_mul_su32(__m64 __a, __m64 __b) {
+  return __trunc64(__builtin_ia32_pmuludq128((__v4si)__anyext128(__a),
+                                             (__v4si)__anyext128(__b)));
 }
 
 /// Multiplies 32-bit unsigned integer values contained in the lower

@@ -2539,9 +2541,8 @@ static __inline__ __m128i __DEFAULT_FN_ATTRS _mm_sub_epi32(__m128i __a,
 ///    A 64-bit integer vector containing the subtrahend.
 /// \returns A 64-bit integer vector containing the difference of the values in
 ///    the operands.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_MMX _mm_sub_si64(__m64 __a,
-                                                            __m64 __b) {
-  return (__m64)__builtin_ia32_psubq((__v1di)__a, (__v1di)__b);
+static __inline__ __m64 __DEFAULT_FN_ATTRS _mm_sub_si64(__m64 __a, __m64 __b) {
+  return (__m64)((unsigned long long)__a - (unsigned long long)__b);
 }
 
 /// Subtracts the corresponding elements of two [2 x i64] vectors.

@@ -4889,8 +4890,10 @@ void _mm_pause(void);
 #if defined(__cplusplus)
 } // extern "C"
 #endif
+
+#undef __anyext128
+#undef __trunc64
 #undef __DEFAULT_FN_ATTRS
-#undef __DEFAULT_FN_ATTRS_MMX
 
 #define _MM_SHUFFLE2(x, y) (((x) << 1) | (y))

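A short note on the two new helpers, rewritten as standalone functions (a sketch; the function names are illustrative stand-ins for the header's macros, and `{0}` replaces the header's `__extension__` empty initializer): `__builtin_shufflevector` yields one result lane per index given, and an index of -1 leaves that lane undefined -- which is how `__anyext128` gets an "any-extend" instead of a zero-extend.

    #include <emmintrin.h>

    /* __trunc64: take lane 0 of the <2 x i64> view, i.e. keep only the
       low 64 bits of a 128-bit vector. */
    static inline __m64 trunc64_sketch(__m128i x) {
      return (__m64)__builtin_shufflevector((__v2di)x, (__v2di){0}, 0);
    }

    /* __anyext128: place the 64-bit value in the low lanes; the -1
       indices leave the high lanes undefined, which is safe because
       callers only consume results derived from the low half. */
    static inline __m128i anyext128_sketch(__m64 x) {
      return (__m128i)__builtin_shufflevector((__v2si)x, (__v2si){0}, 0, 1, -1, -1);
    }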