-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[StrTable] Mechanically convert NVPTX builtins to use TableGen #122873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang Author: Chandler Carruth (chandlerc) ChangesThis switches them to use tho common TableGen layer, extending it to support the missing features needed by the NVPTX backend. The biggest thing was to build a TableGen system that computes the cumulative SM and PTX feature sets the same way the macros did. That's done with some string concatenation tricks in TableGen, but they worked out pretty neatly and are very comparable in complexity to the macro version. Then the actual defines were mapped over using a very hacky Python script. It was never productionized or intended to work in the future, but for posterity: https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66 Last but not least, there was a very odd "bug" in one of the converted builtins' prototype in the TableGen model: it didn't handle uses of This should provide direct benefits of allowing future refactorings to more directly leverage TableGen to express builtins more structurally rather than textually. It will also make my efforts to move builtins to string tables significantly more effective for the NVPTX backend where the X-macro approach resulted in significantly less efficient string tables than other targets due to the long repeated feature strings. Patch is 121.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122873.diff 6 Files Affected:
diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def
deleted file mode 100644
index 969dd9e41ebfa3..00000000000000
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ /dev/null
@@ -1,1116 +0,0 @@
-//===--- BuiltinsPTX.def - PTX Builtin function database ----*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file defines the PTX-specific builtin function database. Users of
-// this file must define the BUILTIN macro to make use of this information.
-//
-//===----------------------------------------------------------------------===//
-
-// The format of this database matches clang/Basic/Builtins.def.
-
-#if defined(BUILTIN) && !defined(TARGET_BUILTIN)
-# define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) BUILTIN(ID, TYPE, ATTRS)
-#endif
-
-#pragma push_macro("SM_53")
-#pragma push_macro("SM_70")
-#pragma push_macro("SM_72")
-#pragma push_macro("SM_75")
-#pragma push_macro("SM_80")
-#pragma push_macro("SM_86")
-#pragma push_macro("SM_87")
-#pragma push_macro("SM_89")
-#pragma push_macro("SM_90")
-#pragma push_macro("SM_90a")
-#pragma push_macro("SM_100")
-#define SM_100 "sm_100"
-#define SM_90a "sm_90a"
-#define SM_90 "sm_90|" SM_90a "|" SM_100
-#define SM_89 "sm_89|" SM_90
-#define SM_87 "sm_87|" SM_89
-#define SM_86 "sm_86|" SM_87
-#define SM_80 "sm_80|" SM_86
-#define SM_75 "sm_75|" SM_80
-#define SM_72 "sm_72|" SM_75
-#define SM_70 "sm_70|" SM_72
-
-#pragma push_macro("SM_60")
-#define SM_60 "sm_60|sm_61|sm_62|" SM_70
-#define SM_53 "sm_53|" SM_60
-
-#pragma push_macro("PTX42")
-#pragma push_macro("PTX60")
-#pragma push_macro("PTX61")
-#pragma push_macro("PTX62")
-#pragma push_macro("PTX63")
-#pragma push_macro("PTX64")
-#pragma push_macro("PTX65")
-#pragma push_macro("PTX70")
-#pragma push_macro("PTX71")
-#pragma push_macro("PTX72")
-#pragma push_macro("PTX73")
-#pragma push_macro("PTX74")
-#pragma push_macro("PTX75")
-#pragma push_macro("PTX76")
-#pragma push_macro("PTX77")
-#pragma push_macro("PTX78")
-#pragma push_macro("PTX80")
-#pragma push_macro("PTX81")
-#pragma push_macro("PTX82")
-#pragma push_macro("PTX83")
-#pragma push_macro("PTX84")
-#pragma push_macro("PTX85")
-#pragma push_macro("PTX86")
-#define PTX86 "ptx86"
-#define PTX85 "ptx85|" PTX86
-#define PTX84 "ptx84|" PTX85
-#define PTX83 "ptx83|" PTX84
-#define PTX82 "ptx82|" PTX83
-#define PTX81 "ptx81|" PTX82
-#define PTX80 "ptx80|" PTX81
-#define PTX78 "ptx78|" PTX80
-#define PTX77 "ptx77|" PTX78
-#define PTX76 "ptx76|" PTX77
-#define PTX75 "ptx75|" PTX76
-#define PTX74 "ptx74|" PTX75
-#define PTX73 "ptx73|" PTX74
-#define PTX72 "ptx72|" PTX73
-#define PTX71 "ptx71|" PTX72
-#define PTX70 "ptx70|" PTX71
-#define PTX65 "ptx65|" PTX70
-#define PTX64 "ptx64|" PTX65
-#define PTX63 "ptx63|" PTX64
-#define PTX62 "ptx62|" PTX63
-#define PTX61 "ptx61|" PTX62
-#define PTX60 "ptx60|" PTX61
-#define PTX42 "ptx42|" PTX60
-
-#pragma push_macro("AND")
-#define AND(a, b) "(" a "),(" b ")"
-
-// Special Registers
-
-BUILTIN(__nvvm_read_ptx_sreg_tid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_tid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_tid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_tid_w, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_ntid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ntid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ntid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ntid_w, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_w, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_w, "i", "nc")
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctarank, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctarank, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_is_explicit_cluster, "b", "nc", AND(SM_90, PTX78))
-
-BUILTIN(__nvvm_read_ptx_sreg_laneid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_warpid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nwarpid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_warpsize, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_smid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nsmid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_gridid, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_eq, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_le, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_lt, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_ge, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_gt, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_clock, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_clock64, "LLi", "n")
-BUILTIN(__nvvm_read_ptx_sreg_globaltimer, "LLi", "n")
-
-BUILTIN(__nvvm_read_ptx_sreg_pm0, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_pm1, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_pm2, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n")
-
-// MISC
-
-BUILTIN(__nvvm_prmt, "UiUiUiUi", "")
-BUILTIN(__nvvm_exit, "v", "r")
-BUILTIN(__nvvm_reflect, "UicC*", "r")
-TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", AND(SM_70, PTX63))
-
-// Min Max
-
-TARGET_BUILTIN(__nvvm_fmin_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_xorsign_abs_f16, "hhh", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_bf16, "yyy", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_bf16, "yyy", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_bf16x2, "V2yV2yV2y", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_bf16x2, "V2yV2yV2y", "",
- AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmin_f, "fff", "")
-BUILTIN(__nvvm_fmin_ftz_f, "fff", "")
-TARGET_BUILTIN(__nvvm_fmin_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmin_d, "ddd", "")
-
-TARGET_BUILTIN(__nvvm_fmax_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_xorsign_abs_f16, "hhh", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_bf16, "yyy", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_bf16, "yyy", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_bf16x2, "V2yV2yV2y", "",
- AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_bf16x2, "V2yV2yV2y", "",
- AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmax_f, "fff", "")
-BUILTIN(__nvvm_fmax_ftz_f, "fff", "")
-TARGET_BUILTIN(__nvvm_fmax_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmax_d, "ddd", "")
-
-// Multiplication
-
-BUILTIN(__nvvm_mulhi_i, "iii", "")
-BUILTIN(__nvvm_mulhi_ui, "UiUiUi", "")
-BUILTIN(__nvvm_mulhi_ll, "LLiLLiLLi", "")
-BUILTIN(__nvvm_mulhi_ull, "ULLiULLiULLi", "")
-
-BUILTIN(__nvvm_mul_rn_ftz_f, "fff", "")
-BUILTIN(__nvvm_mul_rn_f, "fff", "")
-BUILTIN(__nvvm_mul_rz_ftz_f, "fff", "")
-BUILTIN(__nvvm_mul_rz_f, "fff", "")
-BUILTIN(__nvvm_mul_rm_ftz_f, "fff", "")
-BUILTIN(__nvvm_mul_rm_f, "fff", "")
-BUILTIN(__nvvm_mul_rp_ftz_f, "fff", "")
-BUILTIN(__nvvm_mul_rp_f, "fff", "")
-
-BUILTIN(__nvvm_mul_rn_d, "ddd", "")
-BUILTIN(__nvvm_mul_rz_d, "ddd", "")
-BUILTIN(__nvvm_mul_rm_d, "ddd", "")
-BUILTIN(__nvvm_mul_rp_d, "ddd", "")
-
-BUILTIN(__nvvm_mul24_i, "iii", "")
-BUILTIN(__nvvm_mul24_ui, "UiUiUi", "")
-
-// Div
-
-BUILTIN(__nvvm_div_approx_ftz_f, "fff", "")
-BUILTIN(__nvvm_div_approx_f, "fff", "")
-
-BUILTIN(__nvvm_div_rn_ftz_f, "fff", "")
-BUILTIN(__nvvm_div_rn_f, "fff", "")
-BUILTIN(__nvvm_div_rz_ftz_f, "fff", "")
-BUILTIN(__nvvm_div_rz_f, "fff", "")
-BUILTIN(__nvvm_div_rm_ftz_f, "fff", "")
-BUILTIN(__nvvm_div_rm_f, "fff", "")
-BUILTIN(__nvvm_div_rp_ftz_f, "fff", "")
-BUILTIN(__nvvm_div_rp_f, "fff", "")
-
-BUILTIN(__nvvm_div_rn_d, "ddd", "")
-BUILTIN(__nvvm_div_rz_d, "ddd", "")
-BUILTIN(__nvvm_div_rm_d, "ddd", "")
-BUILTIN(__nvvm_div_rp_d, "ddd", "")
-
-// Sad
-
-BUILTIN(__nvvm_sad_i, "iiii", "")
-BUILTIN(__nvvm_sad_ui, "UiUiUiUi", "")
-
-// Floor, Ceil
-
-BUILTIN(__nvvm_floor_ftz_f, "ff", "")
-BUILTIN(__nvvm_floor_f, "ff", "")
-BUILTIN(__nvvm_floor_d, "dd", "")
-
-BUILTIN(__nvvm_ceil_ftz_f, "ff", "")
-BUILTIN(__nvvm_ceil_f, "ff", "")
-BUILTIN(__nvvm_ceil_d, "dd", "")
-
-// Abs
-
-BUILTIN(__nvvm_fabs_ftz_f, "ff", "")
-BUILTIN(__nvvm_fabs_f, "ff", "")
-BUILTIN(__nvvm_fabs_d, "dd", "")
-
-// Round
-
-BUILTIN(__nvvm_round_ftz_f, "ff", "")
-BUILTIN(__nvvm_round_f, "ff", "")
-BUILTIN(__nvvm_round_d, "dd", "")
-
-// Trunc
-
-BUILTIN(__nvvm_trunc_ftz_f, "ff", "")
-BUILTIN(__nvvm_trunc_f, "ff", "")
-BUILTIN(__nvvm_trunc_d, "dd", "")
-
-// Saturate
-
-BUILTIN(__nvvm_saturate_ftz_f, "ff", "")
-BUILTIN(__nvvm_saturate_f, "ff", "")
-BUILTIN(__nvvm_saturate_d, "dd", "")
-
-// Exp2, Log2
-
-BUILTIN(__nvvm_ex2_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_ex2_approx_f, "ff", "")
-BUILTIN(__nvvm_ex2_approx_d, "dd", "")
-TARGET_BUILTIN(__nvvm_ex2_approx_f16, "hh", "", AND(SM_75, PTX70))
-TARGET_BUILTIN(__nvvm_ex2_approx_f16x2, "V2hV2h", "", AND(SM_75, PTX70))
-
-BUILTIN(__nvvm_lg2_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_lg2_approx_f, "ff", "")
-BUILTIN(__nvvm_lg2_approx_d, "dd", "")
-
-// Sin, Cos
-
-BUILTIN(__nvvm_sin_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_sin_approx_f, "ff", "")
-
-BUILTIN(__nvvm_cos_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_cos_approx_f, "ff", "")
-
-// Fma
-
-TARGET_BUILTIN(__nvvm_fma_rn_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_sat_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_sat_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_f16, "hhhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_relu_f16, "hhhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_sat_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_sat_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_f16x2, "V2hV2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_relu_f16x2, "V2hV2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_bf16, "yyyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_bf16, "yyyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_bf16x2, "V2yV2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_bf16x2, "V2yV2yV2yV2y", "", AND(SM_80, PTX70))
-BUILTIN(__nvvm_fma_rn_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rn_f, "ffff", "")
-BUILTIN(__nvvm_fma_rz_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rm_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rm_f, "ffff", "")
-BUILTIN(__nvvm_fma_rp_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rp_f, "ffff", "")
-BUILTIN(__nvvm_fma_rn_d, "dddd", "")
-BUILTIN(__nvvm_fma_rz_d, "dddd", "")
-BUILTIN(__nvvm_fma_rm_d, "dddd", "")
-BUILTIN(__nvvm_fma_rp_d, "dddd", "")
-
-// Rcp
-
-BUILTIN(__nvvm_rcp_rn_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rn_f, "ff", "")
-BUILTIN(__nvvm_rcp_rz_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rm_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rm_f, "ff", "")
-BUILTIN(__nvvm_rcp_rp_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rp_f, "ff", "")
-
-BUILTIN(__nvvm_rcp_rn_d, "dd", "")
-BUILTIN(__nvvm_rcp_rz_d, "dd", "")
-BUILTIN(__nvvm_rcp_rm_d, "dd", "")
-BUILTIN(__nvvm_rcp_rp_d, "dd", "")
-
-BUILTIN(__nvvm_rcp_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_approx_ftz_d, "dd", "")
-
-// Sqrt
-
-BUILTIN(__nvvm_sqrt_rn_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rn_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rz_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rm_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rm_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rp_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rp_f, "ff", "")
-BUILTIN(__nvvm_sqrt_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_approx_f, "ff", "")
-
-BUILTIN(__nvvm_sqrt_rn_d, "dd", "")
-BUILTIN(__nvvm_sqrt_rz_d, "dd", "")
-BUILTIN(__nvvm_sqrt_rm_d, "dd", "")
-BUILTIN(__nvvm_sqrt_rp_d, "dd", "")
-
-// Rsqrt
-
-BUILTIN(__nvvm_rsqrt_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_rsqrt_approx_f, "ff", "")
-BUILTIN(__nvvm_rsqrt_approx_d, "dd", "")
-
-// Add
-
-BUILTIN(__nvvm_add_rn_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rn_f, "fff", "")
-BUILTIN(__nvvm_add_rz_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rz_f, "fff", "")
-BUILTIN(__nvvm_add_rm_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rm_f, "fff", "")
-BUILTIN(__nvvm_add_rp_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rp_f, "fff", "")
-
-BUILTIN(__nvvm_add_rn_d, "ddd", "")
-BUILTIN(__nvvm_add_rz_d, "ddd", "")
-BUILTIN(__nvvm_add_rm_d, "ddd", "")
-BUILTIN(__nvvm_add_rp_d, "ddd", "")
-
-// Convert
-
-BUILTIN(__nvvm_d2f_rn_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rn, "fd", "")
-BUILTIN(__nvvm_d2f_rz_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rz, "fd", "")
-BUILTIN(__nvvm_d2f_rm_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rm, "fd", "")
-BUILTIN(__nvvm_d2f_rp_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rp, "fd", "")
-
-BUILTIN(__nvvm_d2i_rn, "id", "")
-BUILTIN(__nvvm_d2i_rz, "id", "")
-BUILTIN(__nvvm_d2i_rm, "id", "")
-BUILTIN(__nvvm_d2i_rp, "id", "")
-
-BUILTIN(__nvvm_d2ui_rn, "Uid", "")
-BUILTIN(__nvvm_d2ui_rz, "Uid", "")
-BUILTIN(__nvvm_d2ui_rm, "Uid", "")
-BUILTIN(__nvvm_d2ui_rp, "Uid", "")
-
-BUILTIN(__nvvm_i2d_rn, "di", "")
-BUILTIN(__nvvm_i2d_rz, "di", "")
-BUILTIN(__nvvm_i2d_rm, "di", "")
-BUILTIN(__nvvm_i2d_rp, "di", "")
-
-BUILTIN(__nvvm_ui2d_rn, "dUi", "")
-BUILTIN(__nvvm_ui2d_rz, "dUi", "")
-BUILTIN(__nvvm_ui2d_rm, "dUi", "")
-BUILTIN(__nvvm_ui2d_rp, "dUi", "")
-
-BUILTIN(__nvvm_f2i_rn_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rn, "if", "")
-BUILTIN(__nvvm_f2i_rz_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rz, "if", "")
-BUILTIN(__nvvm_f2i_rm_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rm, "if", "")
-BUILTIN(__nvvm_f2i_rp_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rp, "if", "")
-
-BUILTIN(__nvvm_f2ui_rn_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rn, "Uif", "")
-BUILTIN(__nvvm_f2ui_rz_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rm_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rm, "Uif", "")
-BUILTIN(__nvvm_f2ui_rp_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rp, "Uif", "")
-
-BUILTIN(__nvvm_i2f_rn, "fi", "")
-BUILTIN(__nvvm_i2f_rz, "fi", "")
-BUILTIN(__nvvm_i2f_rm, "fi", "")
-BUILTIN(__nvvm_i2f_rp, "fi", "")
-
-BUILTIN(__nvvm_ui2f_rn, "fUi", "")
-BUILTIN(__nvvm_ui2f_rz, "fUi", "")
-BUILTIN(__nvvm_ui2f_rm, "fUi", "")
-BUILTIN(__nvvm_ui2f_rp, "fUi", "")
-
-BUILTIN(__nvvm_lohi_i2d, "dii", "")
-
-BUILTIN(__nvvm_d2i_lo, "id", "")
-BUILTIN(__nvvm_d2i_hi, "id", "")
-
-BUILTIN(__nvvm_f2ll_rn_ftz, "LLif", "")
-BUI...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Note that the PR doesn't actually change the lines that If instead folks would prefer me to apply the changes from |
eb81d2d
to
ab21fc2
Compare
Well, I did this and wrote this reply after looking at the diff shown by But looking at it in GitHub, this file is already inconsistent and many parts are clang-formatted. Nevermind, no real consistency argument. I've updated the PR to be formatted. |
0ff09d9
to
8af82ca
Compare
Pushed an update that rebases on main, and more notably use an improved script that preserves the grouping of builtins and the comments describing them. I noticed that there were interesting and important comments here, so I reworked things so we don't lose any information. |
8af82ca
to
70d8066
Compare
Ping -- a week now with no review. |
This switches them to use tho common TableGen layer, extending it to support the missing features needed by the NVPTX backend. The biggest thing was to build a TableGen system that computes the cumulative SM and PTX feature sets the same way the macros did. That's done with some string concatenation tricks in TableGen, but they worked out pretty neatly and are very comparable in complexity to the macro version. Then the actual defines were mapped over using a very hacky Python script. It was never productionized or intended to work in the future, but for posterity: https://gist.github.com/chandlerc/5e5b5e4f023e1ee29babcbe486770d49 Last but not least, there was a very odd "bug" in one of the converted builtins' prototype in the TableGen model: it didn't handle uses of `Z` and `U` both as *qualifiers* of a single type, treating `Z` as its own `int32_t` type. So my hacky Python script converted `ZUi` into two types, an `int32_t` and an `unsigned int`. This produced a very wrong prototype. But the tests caught this nicely and I fixed it manually rather than trying to improve the Python script as it occurred in exactly one place I could find. This should provide direct benefits of allowing future refactorings to more directly leverage TableGen to express builtins more structurally rather than textually. It will also make my efforts to move builtins to string tables significantly more effective for the NVPTX backend where the X-macro approach resulted in *significantly* less efficient string tables than other targets due to the long repeated feature strings.
70d8066
to
2b92f63
Compare
Ping! I've updated this to incorporate the changes in #123398 to the NVPTX.def file this is replacing. Adding the author & reviewers of that PR to this -- I'd really like to either get this landed or figure out what other approach to use it avoid having to continually update this to reflect more changes. =] |
Thanks for this! |
def __nvvm_cp_async_ca_shared_global_4 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>; | ||
def __nvvm_cp_async_ca_shared_global_8 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>; | ||
def __nvvm_cp_async_ca_shared_global_16 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>; | ||
def __nvvm_cp_async_cg_shared_global_16 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, the "..." is the way varargs are represented, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
LGTM overall. I work with these builtins only occasionally. So, let us wait for Artem's review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall with a couple of nits.
I like the direction of the change. Tablegen is probably a better way to handle NVPTX quirks, than a preprocessor. We may finally be able to replace the string-based constraints that are growing a bit too long already with something more flexible (custom code blocks similar to LLVM's constraints + sensible error message generation to go along with them)
def PTX85 : PTX<"85", PTX86>; | ||
def PTX84 : PTX<"84", PTX85>; | ||
def PTX83 : PTX<"83", PTX84>; | ||
def PTX82 : PTX<"82", PTX83>; | ||
def PTX81 : PTX<"81", PTX82>; | ||
def PTX80 : PTX<"80", PTX81>; | ||
def PTX78 : PTX<"78", PTX80>; | ||
def PTX77 : PTX<"77", PTX78>; | ||
def PTX76 : PTX<"76", PTX77>; | ||
def PTX75 : PTX<"75", PTX76>; | ||
def PTX74 : PTX<"74", PTX75>; | ||
def PTX73 : PTX<"73", PTX74>; | ||
def PTX72 : PTX<"72", PTX73>; | ||
def PTX71 : PTX<"71", PTX72>; | ||
def PTX70 : PTX<"70", PTX71>; | ||
def PTX65 : PTX<"65", PTX70>; | ||
def PTX64 : PTX<"64", PTX65>; | ||
def PTX63 : PTX<"63", PTX64>; | ||
def PTX62 : PTX<"62", PTX63>; | ||
def PTX61 : PTX<"61", PTX62>; | ||
def PTX60 : PTX<"60", PTX61>; | ||
def PTX42 : PTX<"42", PTX60>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bulk of the record generation for SM and PTX versions could probably be collapsed further into a loop over the list of known versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? This seemed pretty compact. I'm not a TableGen expert though.
I'll leave that to a follow-up though for someone else... I'm really just trying to get off the X-macros to address string table issues.
If you're really worried about the intermediate state, let me know and I'll work on a follow-up myself, but this doesn't seem any worse than the preprocessor tricks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine as is. Now that we've switched to tablegen this regular pattern is a low hanging fruit for the future cleanups.
if (T.consume_back("*")) { | ||
// Pointers may have an address space qualifier immediately before them. | ||
std::optional<unsigned> AS = ConsumeAddrSpace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any tests for the tablegen parser changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert at any of this... I'm just muddling my way through to try and make some tactical improvements.
There is a file target-builtins-prototype-parser.td
, but it doesn't test many aspects of this code... Given that it didn't seem to try to be comprehensive, and that NVPTX provided some pretty reasonable tests, I just focused there. That's also what I did with the analogous change to x86. 🤷
Let me know if you'd like me to add more tests to cover this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert at any of this... I'm just muddling my way through to try and make some tactical improvements.
Welcome to the club. This matches my own thoughts. :-)
I agree that it would not improve effective test coverage much, but it would be useful for whoever may need to tinker with these parts of tablegen in the future. Having wider range of inputs elsewhere is great, but it is way less convenient to work with, considering that it tends to be much larger and has additional irrelevant moving parts.
Thanks! I'm merging as is to unblock stuff. If you'd particularly like some follow-up, let me know, and I'll send those as follow-up PRs. I've left responses for why I picked the current structure in-line. =] |
This switches them to use tho common TableGen layer, extending it to support the missing features needed by the NVPTX backend.
The biggest thing was to build a TableGen system that computes the cumulative SM and PTX feature sets the same way the macros did. That's done with some string concatenation tricks in TableGen, but they worked out pretty neatly and are very comparable in complexity to the macro version.
Then the actual defines were mapped over using a very hacky Python script. It was never productionized or intended to work in the future, but for posterity:
https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66
Last but not least, there was a very odd "bug" in one of the converted builtins' prototype in the TableGen model: it didn't handle uses of
Z
andU
both as qualifiers of a single type, treatingZ
as its ownint32_t
type. So my hacky Python script convertedZUi
into two types, anint32_t
and anunsigned int
. This produced a very wrong prototype. But the tests caught this nicely and I fixed it manually rather than trying to improve the Python script as it occurred in exactly one place I could find.This should provide direct benefits of allowing future refactorings to more directly leverage TableGen to express builtins more structurally rather than textually. It will also make my efforts to move builtins to string tables significantly more effective for the NVPTX backend where the X-macro approach resulted in significantly less efficient string tables than other targets due to the long repeated feature strings.