[StrTable] Mechanically convert NVPTX builtins to use TableGen #122873

Merged: 1 commit, Jan 28, 2025

chandlerc
Member

This switches the NVPTX builtins to use the common TableGen layer, extending it to support the missing features needed by the NVPTX backend.

The biggest thing was to build a TableGen system that computes the cumulative SM and PTX feature sets the same way the macros did. That's done with some string concatenation tricks in TableGen, but they worked out pretty neatly and are very comparable in complexity to the macro version.
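
To make "cumulative" concrete, here is a minimal sketch that reuses a handful of the macros from the removed `BuiltinsNVPTX.def` (shown in the diff below) and expands one feature requirement. The macro definitions are copied from that file; the helper `clusterBuiltinFeatures` is hypothetical and exists only for this illustration.

```cpp
#include <string>

// A few of the macros from the removed BuiltinsNVPTX.def, copied as they
// appear in the diff. Adjacent string literals concatenate, so each macro
// expands to its own version plus every newer one.
#define SM_100 "sm_100"
#define SM_90a "sm_90a"
#define SM_90 "sm_90|" SM_90a "|" SM_100

#define PTX86 "ptx86"
#define PTX85 "ptx85|" PTX86
#define PTX84 "ptx84|" PTX85
#define PTX83 "ptx83|" PTX84
#define PTX82 "ptx82|" PTX83
#define PTX81 "ptx81|" PTX82
#define PTX80 "ptx80|" PTX81
#define PTX78 "ptx78|" PTX80

#define AND(a, b) "(" a "),(" b ")"

// Hypothetical helper (not part of Clang): returns the feature string that
// a builtin declared with AND(SM_90, PTX78) ends up carrying.
std::string clusterBuiltinFeatures() { return AND(SM_90, PTX78); }
```

Here `clusterBuiltinFeatures()` returns `(sm_90|sm_90a|sm_100),(ptx78|ptx80|ptx81|ptx82|ptx83|ptx84|ptx85|ptx86)`. Every builtin gated on `AND(SM_90, PTX78)` carries this whole string.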

Then the actual defines were mapped over using a very hacky Python script. It was never productionized or intended to work in the future, but for posterity:

https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66

Last but not least, there was a very odd "bug" in one of the converted builtins' prototype in the TableGen model: it didn't handle uses of Z and U both as qualifiers of a single type, treating Z as its own int32_t type. So my hacky Python script converted ZUi into two types, an int32_t and an unsigned int. This produced a very wrong prototype. But the tests caught this nicely and I fixed it manually rather than trying to improve the Python script as it occurred in exactly one place I could find.
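
A minimal sketch of the distinction, assuming the legacy encoding in which `Z`, `U`, and `L` are prefix modifiers that attach to the base type letter that follows them, and `C` and `*` are suffixes on the preceding type. `splitPrototype` is a hypothetical, heavily simplified splitter written for this illustration; it is neither the Python script from the gist nor Clang's parser, and it ignores vector types and every other modifier.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified splitter for legacy prototype strings such as
// "UicC*". Prefix modifiers accumulate until a base type letter is seen;
// suffixes attach to the type that precedes them.
std::vector<std::string> splitPrototype(const std::string &proto) {
  const std::string prefixes = "LZU"; // only the prefixes this sketch knows
  const std::string suffixes = "C*";  // const and pointer
  std::vector<std::string> types;
  std::string current;
  bool haveBase = false; // true once `current` contains its base letter

  for (char c : proto) {
    bool isPrefix = prefixes.find(c) != std::string::npos;
    bool isSuffix = suffixes.find(c) != std::string::npos;
    // After a base letter, anything that is not a suffix starts a new type.
    if (haveBase && !isSuffix) {
      types.push_back(current);
      current.clear();
      haveBase = false;
    }
    current += c;
    if (!isPrefix && !isSuffix)
      haveBase = true;
  }
  if (!current.empty())
    types.push_back(current);
  return types;
}
```

With this reading, `ZUi` stays one type with two qualifiers, whereas treating `Z` as a complete type on its own would yield two types, which is the wrong prototype described above.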

This should provide direct benefits of allowing future refactorings to more directly leverage TableGen to express builtins more structurally rather than textually. It will also make my efforts to move builtins to string tables significantly more effective for the NVPTX backend where the X-macro approach resulted in significantly less efficient string tables than other targets due to the long repeated feature strings.
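
To see why repeated feature strings matter for a string table, consider this generic sketch. `StringTable` is a hypothetical class for illustration only; it does not describe how Clang's builtin string tables or this PR are implemented. All strings live in one buffer and entries refer to them by offset, so a string stored once costs its length plus a terminator, while the same literal repeated for every builtin costs that much per occurrence unless something deduplicates it.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical deduplicating string table: one NUL-separated buffer plus a
// lookup map from string contents to the offset where they were stored.
class StringTable {
public:
  // Returns the offset of `s` in the buffer, appending it only if it has
  // not been stored before.
  std::size_t add(const std::string &s) {
    auto it = offsets_.find(s);
    if (it != offsets_.end())
      return it->second;
    std::size_t offset = buffer_.size();
    buffer_ += s;
    buffer_.push_back('\0');
    offsets_.emplace(s, offset);
    return offset;
  }

  // Returns the NUL-terminated string stored at `offset`.
  const char *get(std::size_t offset) const { return buffer_.c_str() + offset; }

  // Total bytes used by the buffer, terminators included.
  std::size_t size() const { return buffer_.size(); }

private:
  std::string buffer_;
  std::unordered_map<std::string, std::size_t> offsets_;
};
```

Adding the same feature string for many builtins leaves `size()` unchanged after the first insertion; without the lookup map, each builtin would add another full copy.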

@llvmbot added the labels clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") on Jan 14, 2025
@llvmbot
Member

llvmbot commented Jan 14, 2025

@llvm/pr-subscribers-clang

Author: Chandler Carruth (chandlerc)

Patch is 121.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122873.diff

6 Files Affected:

  • (removed) clang/include/clang/Basic/BuiltinsNVPTX.def (-1116)
  • (added) clang/include/clang/Basic/BuiltinsNVPTX.td (+885)
  • (modified) clang/include/clang/Basic/CMakeLists.txt (+4)
  • (modified) clang/include/clang/Basic/TargetBuiltins.h (+1-1)
  • (modified) clang/lib/Basic/Targets/NVPTX.cpp (+1-5)
  • (modified) clang/utils/TableGen/ClangBuiltinsEmitter.cpp (+37)
diff --git a/clang/include/clang/Basic/BuiltinsNVPTX.def b/clang/include/clang/Basic/BuiltinsNVPTX.def
deleted file mode 100644
index 969dd9e41ebfa3..00000000000000
--- a/clang/include/clang/Basic/BuiltinsNVPTX.def
+++ /dev/null
@@ -1,1116 +0,0 @@
-//===--- BuiltinsPTX.def - PTX Builtin function database ----*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file defines the PTX-specific builtin function database.  Users of
-// this file must define the BUILTIN macro to make use of this information.
-//
-//===----------------------------------------------------------------------===//
-
-// The format of this database matches clang/Basic/Builtins.def.
-
-#if defined(BUILTIN) && !defined(TARGET_BUILTIN)
-#   define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) BUILTIN(ID, TYPE, ATTRS)
-#endif
-
-#pragma push_macro("SM_53")
-#pragma push_macro("SM_70")
-#pragma push_macro("SM_72")
-#pragma push_macro("SM_75")
-#pragma push_macro("SM_80")
-#pragma push_macro("SM_86")
-#pragma push_macro("SM_87")
-#pragma push_macro("SM_89")
-#pragma push_macro("SM_90")
-#pragma push_macro("SM_90a")
-#pragma push_macro("SM_100")
-#define SM_100 "sm_100"
-#define SM_90a "sm_90a"
-#define SM_90 "sm_90|" SM_90a "|" SM_100
-#define SM_89 "sm_89|" SM_90
-#define SM_87 "sm_87|" SM_89
-#define SM_86 "sm_86|" SM_87
-#define SM_80 "sm_80|" SM_86
-#define SM_75 "sm_75|" SM_80
-#define SM_72 "sm_72|" SM_75
-#define SM_70 "sm_70|" SM_72
-
-#pragma push_macro("SM_60")
-#define SM_60 "sm_60|sm_61|sm_62|" SM_70
-#define SM_53 "sm_53|" SM_60
-
-#pragma push_macro("PTX42")
-#pragma push_macro("PTX60")
-#pragma push_macro("PTX61")
-#pragma push_macro("PTX62")
-#pragma push_macro("PTX63")
-#pragma push_macro("PTX64")
-#pragma push_macro("PTX65")
-#pragma push_macro("PTX70")
-#pragma push_macro("PTX71")
-#pragma push_macro("PTX72")
-#pragma push_macro("PTX73")
-#pragma push_macro("PTX74")
-#pragma push_macro("PTX75")
-#pragma push_macro("PTX76")
-#pragma push_macro("PTX77")
-#pragma push_macro("PTX78")
-#pragma push_macro("PTX80")
-#pragma push_macro("PTX81")
-#pragma push_macro("PTX82")
-#pragma push_macro("PTX83")
-#pragma push_macro("PTX84")
-#pragma push_macro("PTX85")
-#pragma push_macro("PTX86")
-#define PTX86 "ptx86"
-#define PTX85 "ptx85|" PTX86
-#define PTX84 "ptx84|" PTX85
-#define PTX83 "ptx83|" PTX84
-#define PTX82 "ptx82|" PTX83
-#define PTX81 "ptx81|" PTX82
-#define PTX80 "ptx80|" PTX81
-#define PTX78 "ptx78|" PTX80
-#define PTX77 "ptx77|" PTX78
-#define PTX76 "ptx76|" PTX77
-#define PTX75 "ptx75|" PTX76
-#define PTX74 "ptx74|" PTX75
-#define PTX73 "ptx73|" PTX74
-#define PTX72 "ptx72|" PTX73
-#define PTX71 "ptx71|" PTX72
-#define PTX70 "ptx70|" PTX71
-#define PTX65 "ptx65|" PTX70
-#define PTX64 "ptx64|" PTX65
-#define PTX63 "ptx63|" PTX64
-#define PTX62 "ptx62|" PTX63
-#define PTX61 "ptx61|" PTX62
-#define PTX60 "ptx60|" PTX61
-#define PTX42 "ptx42|" PTX60
-
-#pragma push_macro("AND")
-#define AND(a, b) "(" a "),(" b ")"
-
-// Special Registers
-
-BUILTIN(__nvvm_read_ptx_sreg_tid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_tid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_tid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_tid_w, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_ntid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ntid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ntid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ntid_w, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_ctaid_w, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_x, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_y, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_z, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nctaid_w, "i", "nc")
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_clusterid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_nclusterid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctaid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_x, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_y, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_z, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctaid_w, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_ctarank, "i", "nc", AND(SM_90, PTX78))
-TARGET_BUILTIN(__nvvm_read_ptx_sreg_cluster_nctarank, "i", "nc", AND(SM_90, PTX78))
-
-TARGET_BUILTIN(__nvvm_is_explicit_cluster, "b", "nc", AND(SM_90, PTX78))
-
-BUILTIN(__nvvm_read_ptx_sreg_laneid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_warpid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nwarpid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_warpsize, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_smid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_nsmid, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_gridid, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_eq, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_le, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_lt, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_ge, "i", "nc")
-BUILTIN(__nvvm_read_ptx_sreg_lanemask_gt, "i", "nc")
-
-BUILTIN(__nvvm_read_ptx_sreg_clock, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_clock64, "LLi", "n")
-BUILTIN(__nvvm_read_ptx_sreg_globaltimer, "LLi", "n")
-
-BUILTIN(__nvvm_read_ptx_sreg_pm0, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_pm1, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_pm2, "i", "n")
-BUILTIN(__nvvm_read_ptx_sreg_pm3, "i", "n")
-
-// MISC
-
-BUILTIN(__nvvm_prmt, "UiUiUiUi", "")
-BUILTIN(__nvvm_exit, "v", "r")
-BUILTIN(__nvvm_reflect, "UicC*", "r")
-TARGET_BUILTIN(__nvvm_nanosleep, "vUi", "n", AND(SM_70, PTX63))
-
-// Min Max
-
-TARGET_BUILTIN(__nvvm_fmin_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_xorsign_abs_f16, "hhh", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_bf16, "yyy", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_bf16, "yyy", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_bf16x2, "V2yV2yV2y", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_bf16x2, "V2yV2yV2y", "",
-               AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmin_f, "fff", "")
-BUILTIN(__nvvm_fmin_ftz_f, "fff", "")
-TARGET_BUILTIN(__nvvm_fmin_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmin_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmin_ftz_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmin_d, "ddd", "")
-
-TARGET_BUILTIN(__nvvm_fmax_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_f16, "hhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_f16, "hhh", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_xorsign_abs_f16, "hhh", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_f16x2, "V2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_xorsign_abs_f16x2, "V2hV2hV2h", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_bf16, "yyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_bf16, "yyy", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_bf16, "yyy", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_bf16x2, "V2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_bf16x2, "V2yV2yV2y", "",
-               AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_bf16x2, "V2yV2yV2y", "",
-               AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmax_f, "fff", "")
-BUILTIN(__nvvm_fmax_ftz_f, "fff", "")
-TARGET_BUILTIN(__nvvm_fmax_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_f, "fff", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fmax_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-TARGET_BUILTIN(__nvvm_fmax_ftz_nan_xorsign_abs_f, "fff", "", AND(SM_86, PTX72))
-BUILTIN(__nvvm_fmax_d, "ddd", "")
-
-// Multiplication
-
-BUILTIN(__nvvm_mulhi_i, "iii", "")
-BUILTIN(__nvvm_mulhi_ui, "UiUiUi", "")
-BUILTIN(__nvvm_mulhi_ll, "LLiLLiLLi", "")
-BUILTIN(__nvvm_mulhi_ull, "ULLiULLiULLi", "")
-
-BUILTIN(__nvvm_mul_rn_ftz_f,  "fff", "")
-BUILTIN(__nvvm_mul_rn_f,  "fff", "")
-BUILTIN(__nvvm_mul_rz_ftz_f,  "fff", "")
-BUILTIN(__nvvm_mul_rz_f,  "fff", "")
-BUILTIN(__nvvm_mul_rm_ftz_f,  "fff", "")
-BUILTIN(__nvvm_mul_rm_f,  "fff", "")
-BUILTIN(__nvvm_mul_rp_ftz_f,  "fff", "")
-BUILTIN(__nvvm_mul_rp_f,  "fff", "")
-
-BUILTIN(__nvvm_mul_rn_d,  "ddd", "")
-BUILTIN(__nvvm_mul_rz_d,  "ddd", "")
-BUILTIN(__nvvm_mul_rm_d,  "ddd", "")
-BUILTIN(__nvvm_mul_rp_d,  "ddd", "")
-
-BUILTIN(__nvvm_mul24_i,  "iii", "")
-BUILTIN(__nvvm_mul24_ui,  "UiUiUi", "")
-
-// Div
-
-BUILTIN(__nvvm_div_approx_ftz_f,  "fff", "")
-BUILTIN(__nvvm_div_approx_f,  "fff", "")
-
-BUILTIN(__nvvm_div_rn_ftz_f,  "fff", "")
-BUILTIN(__nvvm_div_rn_f,  "fff", "")
-BUILTIN(__nvvm_div_rz_ftz_f,  "fff", "")
-BUILTIN(__nvvm_div_rz_f,  "fff", "")
-BUILTIN(__nvvm_div_rm_ftz_f,  "fff", "")
-BUILTIN(__nvvm_div_rm_f,  "fff", "")
-BUILTIN(__nvvm_div_rp_ftz_f,  "fff", "")
-BUILTIN(__nvvm_div_rp_f,  "fff", "")
-
-BUILTIN(__nvvm_div_rn_d,  "ddd", "")
-BUILTIN(__nvvm_div_rz_d,  "ddd", "")
-BUILTIN(__nvvm_div_rm_d,  "ddd", "")
-BUILTIN(__nvvm_div_rp_d,  "ddd", "")
-
-// Sad
-
-BUILTIN(__nvvm_sad_i, "iiii", "")
-BUILTIN(__nvvm_sad_ui, "UiUiUiUi", "")
-
-// Floor, Ceil
-
-BUILTIN(__nvvm_floor_ftz_f, "ff", "")
-BUILTIN(__nvvm_floor_f, "ff", "")
-BUILTIN(__nvvm_floor_d, "dd", "")
-
-BUILTIN(__nvvm_ceil_ftz_f, "ff", "")
-BUILTIN(__nvvm_ceil_f, "ff", "")
-BUILTIN(__nvvm_ceil_d, "dd", "")
-
-// Abs
-
-BUILTIN(__nvvm_fabs_ftz_f, "ff", "")
-BUILTIN(__nvvm_fabs_f, "ff", "")
-BUILTIN(__nvvm_fabs_d, "dd", "")
-
-// Round
-
-BUILTIN(__nvvm_round_ftz_f, "ff", "")
-BUILTIN(__nvvm_round_f, "ff", "")
-BUILTIN(__nvvm_round_d, "dd", "")
-
-// Trunc
-
-BUILTIN(__nvvm_trunc_ftz_f, "ff", "")
-BUILTIN(__nvvm_trunc_f, "ff", "")
-BUILTIN(__nvvm_trunc_d, "dd", "")
-
-// Saturate
-
-BUILTIN(__nvvm_saturate_ftz_f, "ff", "")
-BUILTIN(__nvvm_saturate_f, "ff", "")
-BUILTIN(__nvvm_saturate_d, "dd", "")
-
-// Exp2, Log2
-
-BUILTIN(__nvvm_ex2_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_ex2_approx_f, "ff", "")
-BUILTIN(__nvvm_ex2_approx_d, "dd", "")
-TARGET_BUILTIN(__nvvm_ex2_approx_f16, "hh", "", AND(SM_75, PTX70))
-TARGET_BUILTIN(__nvvm_ex2_approx_f16x2, "V2hV2h", "", AND(SM_75, PTX70))
-
-BUILTIN(__nvvm_lg2_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_lg2_approx_f, "ff", "")
-BUILTIN(__nvvm_lg2_approx_d, "dd", "")
-
-// Sin, Cos
-
-BUILTIN(__nvvm_sin_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_sin_approx_f, "ff", "")
-
-BUILTIN(__nvvm_cos_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_cos_approx_f, "ff", "")
-
-// Fma
-
-TARGET_BUILTIN(__nvvm_fma_rn_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_sat_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_sat_f16, "hhhh", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_f16, "hhhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_relu_f16, "hhhh", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_sat_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_sat_f16x2, "V2hV2hV2hV2h", "", AND(SM_53, PTX42))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_f16x2, "V2hV2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_ftz_relu_f16x2, "V2hV2hV2hV2h", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_bf16, "yyyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_bf16, "yyyy", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_bf16x2, "V2yV2yV2yV2y", "", AND(SM_80, PTX70))
-TARGET_BUILTIN(__nvvm_fma_rn_relu_bf16x2, "V2yV2yV2yV2y", "", AND(SM_80, PTX70))
-BUILTIN(__nvvm_fma_rn_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rn_f, "ffff", "")
-BUILTIN(__nvvm_fma_rz_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rm_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rm_f, "ffff", "")
-BUILTIN(__nvvm_fma_rp_ftz_f, "ffff", "")
-BUILTIN(__nvvm_fma_rp_f, "ffff", "")
-BUILTIN(__nvvm_fma_rn_d, "dddd", "")
-BUILTIN(__nvvm_fma_rz_d, "dddd", "")
-BUILTIN(__nvvm_fma_rm_d, "dddd", "")
-BUILTIN(__nvvm_fma_rp_d, "dddd", "")
-
-// Rcp
-
-BUILTIN(__nvvm_rcp_rn_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rn_f, "ff", "")
-BUILTIN(__nvvm_rcp_rz_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rm_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rm_f, "ff", "")
-BUILTIN(__nvvm_rcp_rp_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_rp_f, "ff", "")
-
-BUILTIN(__nvvm_rcp_rn_d, "dd", "")
-BUILTIN(__nvvm_rcp_rz_d, "dd", "")
-BUILTIN(__nvvm_rcp_rm_d, "dd", "")
-BUILTIN(__nvvm_rcp_rp_d, "dd", "")
-
-BUILTIN(__nvvm_rcp_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_rcp_approx_ftz_d, "dd", "")
-
-// Sqrt
-
-BUILTIN(__nvvm_sqrt_rn_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rn_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rz_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rm_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rm_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rp_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_rp_f, "ff", "")
-BUILTIN(__nvvm_sqrt_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_sqrt_approx_f, "ff", "")
-
-BUILTIN(__nvvm_sqrt_rn_d, "dd", "")
-BUILTIN(__nvvm_sqrt_rz_d, "dd", "")
-BUILTIN(__nvvm_sqrt_rm_d, "dd", "")
-BUILTIN(__nvvm_sqrt_rp_d, "dd", "")
-
-// Rsqrt
-
-BUILTIN(__nvvm_rsqrt_approx_ftz_f, "ff", "")
-BUILTIN(__nvvm_rsqrt_approx_f, "ff", "")
-BUILTIN(__nvvm_rsqrt_approx_d, "dd", "")
-
-// Add
-
-BUILTIN(__nvvm_add_rn_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rn_f, "fff", "")
-BUILTIN(__nvvm_add_rz_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rz_f, "fff", "")
-BUILTIN(__nvvm_add_rm_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rm_f, "fff", "")
-BUILTIN(__nvvm_add_rp_ftz_f, "fff", "")
-BUILTIN(__nvvm_add_rp_f, "fff", "")
-
-BUILTIN(__nvvm_add_rn_d, "ddd", "")
-BUILTIN(__nvvm_add_rz_d, "ddd", "")
-BUILTIN(__nvvm_add_rm_d, "ddd", "")
-BUILTIN(__nvvm_add_rp_d, "ddd", "")
-
-// Convert
-
-BUILTIN(__nvvm_d2f_rn_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rn, "fd", "")
-BUILTIN(__nvvm_d2f_rz_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rz, "fd", "")
-BUILTIN(__nvvm_d2f_rm_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rm, "fd", "")
-BUILTIN(__nvvm_d2f_rp_ftz, "fd", "")
-BUILTIN(__nvvm_d2f_rp, "fd", "")
-
-BUILTIN(__nvvm_d2i_rn, "id", "")
-BUILTIN(__nvvm_d2i_rz, "id", "")
-BUILTIN(__nvvm_d2i_rm, "id", "")
-BUILTIN(__nvvm_d2i_rp, "id", "")
-
-BUILTIN(__nvvm_d2ui_rn, "Uid", "")
-BUILTIN(__nvvm_d2ui_rz, "Uid", "")
-BUILTIN(__nvvm_d2ui_rm, "Uid", "")
-BUILTIN(__nvvm_d2ui_rp, "Uid", "")
-
-BUILTIN(__nvvm_i2d_rn, "di", "")
-BUILTIN(__nvvm_i2d_rz, "di", "")
-BUILTIN(__nvvm_i2d_rm, "di", "")
-BUILTIN(__nvvm_i2d_rp, "di", "")
-
-BUILTIN(__nvvm_ui2d_rn, "dUi", "")
-BUILTIN(__nvvm_ui2d_rz, "dUi", "")
-BUILTIN(__nvvm_ui2d_rm, "dUi", "")
-BUILTIN(__nvvm_ui2d_rp, "dUi", "")
-
-BUILTIN(__nvvm_f2i_rn_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rn, "if", "")
-BUILTIN(__nvvm_f2i_rz_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rz, "if", "")
-BUILTIN(__nvvm_f2i_rm_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rm, "if", "")
-BUILTIN(__nvvm_f2i_rp_ftz, "if", "")
-BUILTIN(__nvvm_f2i_rp, "if", "")
-
-BUILTIN(__nvvm_f2ui_rn_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rn, "Uif", "")
-BUILTIN(__nvvm_f2ui_rz_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rm_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rm, "Uif", "")
-BUILTIN(__nvvm_f2ui_rp_ftz, "Uif", "")
-BUILTIN(__nvvm_f2ui_rp, "Uif", "")
-
-BUILTIN(__nvvm_i2f_rn, "fi", "")
-BUILTIN(__nvvm_i2f_rz, "fi", "")
-BUILTIN(__nvvm_i2f_rm, "fi", "")
-BUILTIN(__nvvm_i2f_rp, "fi", "")
-
-BUILTIN(__nvvm_ui2f_rn, "fUi", "")
-BUILTIN(__nvvm_ui2f_rz, "fUi", "")
-BUILTIN(__nvvm_ui2f_rm, "fUi", "")
-BUILTIN(__nvvm_ui2f_rp, "fUi", "")
-
-BUILTIN(__nvvm_lohi_i2d, "dii", "")
-
-BUILTIN(__nvvm_d2i_lo, "id", "")
-BUILTIN(__nvvm_d2i_hi, "id", "")
-
-BUILTIN(__nvvm_f2ll_rn_ftz, "LLif", "")
-BUI...
[truncated]

github-actions bot commented Jan 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@chandlerc
Member Author

> ⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

Note that the PR doesn't actually change the lines that clang-format changes here, and the clang-format change makes these lines inconsistent with the rest of the file, so I intentionally did not apply these formatting changes. My guess was that the community would prefer to not have deviations from the surrounding file introduced like this.

If instead folks would prefer me to apply the changes from clang-format, happy to do so. Just let me know.

@chandlerc
Member Author

> ⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️
>
> Note that the PR doesn't actually change the lines that clang-format changes here, and the clang-format change makes these lines inconsistent with the rest of the file, so I intentionally did not apply these formatting changes. My guess was that the community would prefer to not have deviations from the surrounding file introduced like this.

Well, I did this and wrote this reply after looking at the diff shown by git...

But looking at it in GitHub, this file is already inconsistent and many parts are clang-formatted. Never mind, no real consistency argument. I've updated the PR to be formatted.

@chandlerc chandlerc force-pushed the tg-nvptx-builtins branch 2 times, most recently from 0ff09d9 to 8af82ca on January 18, 2025 07:44
@chandlerc
Member Author

Pushed an update that rebases on main and, more notably, uses an improved script that preserves the grouping of builtins and the comments describing them. I noticed that there were interesting and important comments here, so I reworked things so we don't lose any information.

@chandlerc chandlerc changed the title Mechanically convert NVPTX builtins to use TableGen [StrTable] Mechanically convert NVPTX builtins to use TableGen Jan 20, 2025
@chandlerc
Member Author

Ping -- a week now with no review.

This switches them to use the common TableGen layer, extending it to
support the missing features needed by the NVPTX backend.

The biggest thing was to build a TableGen system that computes the
cumulative SM and PTX feature sets the same way the macros did. That's
done with some string concatenation tricks in TableGen, but they worked
out pretty neatly and are very comparable in complexity to the macro
version.

Then the actual defines were mapped over using a very hacky Python
script. It was never productionized or intended to work in the future,
but for posterity:

https://gist.github.com/chandlerc/5e5b5e4f023e1ee29babcbe486770d49

Last but not least, there was a very odd "bug" in one of the converted
builtins' prototype in the TableGen model: it didn't handle uses of `Z`
and `U` both as *qualifiers* of a single type, treating `Z` as its own
`int32_t` type. So my hacky Python script converted `ZUi` into two
types, an `int32_t` and an `unsigned int`. This produced a very wrong
prototype. But the tests caught this nicely and I fixed it manually
rather than trying to improve the Python script as it occurred in
exactly one place I could find.

This should provide direct benefits of allowing future refactorings to
more directly leverage TableGen to express builtins more structurally
rather than textually. It will also make my efforts to move builtins to
string tables significantly more effective for the NVPTX backend where
the X-macro approach resulted in *significantly* less efficient string
tables than other targets due to the long repeated feature strings.
@chandlerc
Member Author

Ping!

I've updated this to incorporate the changes in #123398 to the NVPTX.def file this is replacing.

Adding the author & reviewers of that PR to this -- I'd really like to either get this landed or figure out what other approach to use to avoid having to continually update this to reflect more changes. =]

@durga4github
Contributor

> Ping!
>
> I've updated this to incorporate the changes in #123398 to the NVPTX.def file this is replacing.

Thanks for this!

def __nvvm_cp_async_ca_shared_global_4 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>;
def __nvvm_cp_async_ca_shared_global_8 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>;
def __nvvm_cp_async_ca_shared_global_16 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>;
def __nvvm_cp_async_cg_shared_global_16 : NVPTXBuiltinSMAndPTX<"void(void address_space<3> *, void const address_space<1> *, ...)", SM_80, PTX70>;
Contributor

@durga4github durga4github Jan 27, 2025

So the "..." is how varargs are represented, right?

Member Author

Yep.

@durga4github
Contributor

LGTM overall. I work with these builtins only occasionally. So, let us wait for Artem's review.

Member

@Artem-B Artem-B left a comment

LGTM overall with a couple of nits.

I like the direction of the change. TableGen is probably a better way to handle NVPTX quirks than a preprocessor. We may finally be able to replace the string-based constraints, which are already growing a bit too long, with something more flexible (custom code blocks similar to LLVM's constraints, plus sensible error message generation to go along with them).

Comment on lines +53 to +74
def PTX85 : PTX<"85", PTX86>;
def PTX84 : PTX<"84", PTX85>;
def PTX83 : PTX<"83", PTX84>;
def PTX82 : PTX<"82", PTX83>;
def PTX81 : PTX<"81", PTX82>;
def PTX80 : PTX<"80", PTX81>;
def PTX78 : PTX<"78", PTX80>;
def PTX77 : PTX<"77", PTX78>;
def PTX76 : PTX<"76", PTX77>;
def PTX75 : PTX<"75", PTX76>;
def PTX74 : PTX<"74", PTX75>;
def PTX73 : PTX<"73", PTX74>;
def PTX72 : PTX<"72", PTX73>;
def PTX71 : PTX<"71", PTX72>;
def PTX70 : PTX<"70", PTX71>;
def PTX65 : PTX<"65", PTX70>;
def PTX64 : PTX<"64", PTX65>;
def PTX63 : PTX<"63", PTX64>;
def PTX62 : PTX<"62", PTX63>;
def PTX61 : PTX<"61", PTX62>;
def PTX60 : PTX<"60", PTX61>;
def PTX42 : PTX<"42", PTX60>;
Member

The bulk of the record generation for SM and PTX versions could probably be collapsed further into a loop over the list of known versions.

Member Author

Maybe? This seemed pretty compact. I'm not a TableGen expert though.

I'll leave that to a follow-up though for someone else... I'm really just trying to get off the X-macros to address string table issues.

If you're really worried about the intermediate state, let me know and I'll work on a follow-up myself, but this doesn't seem any worse than the preprocessor tricks.

Member

It's fine as is. Now that we've switched to TableGen, this regular pattern is low-hanging fruit for future cleanups.
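
The loop idea can be sketched outside TableGen. The following is a minimal C++ illustration, not TableGen and not part of this PR: the hypothetical helper `cumulativeFeatures` walks an ordered version list from newest to oldest and builds, for each version, the `|`-joined string covering it and every newer version, which is the shape the macros in the removed `.def` file spelled out one line at a time.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: `versions` must be ordered oldest to newest. The
// result maps each version to the features for it and all newer versions,
// for example "ptx85" followed by "|ptx86".
std::map<std::string, std::string>
cumulativeFeatures(const std::string &prefix,
                   const std::vector<std::string> &versions) {
  std::map<std::string, std::string> result;
  std::string newer; // cumulative string of the versions already visited
  for (auto it = versions.rbegin(); it != versions.rend(); ++it) {
    std::string features = prefix + *it;
    if (!newer.empty())
      features += "|" + newer;
    result[*it] = features;
    newer = features;
  }
  return result;
}
```

Adding a new version then means appending one entry to the list rather than adding a record and rewiring its predecessor.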

if (T.consume_back("*")) {
  // Pointers may have an address space qualifier immediately before them.
  std::optional<unsigned> AS = ConsumeAddrSpace();
Member

Do we have any tests for the tablegen parser changes?

Member Author

I'm not an expert at any of this... I'm just muddling my way through to try and make some tactical improvements.

There is a file target-builtins-prototype-parser.td, but it doesn't test many aspects of this code... Given that it didn't seem to try to be comprehensive, and that NVPTX provided some pretty reasonable tests, I just focused there. That's also what I did with the analogous change to x86. 🤷

Let me know if you'd like me to add more tests to cover this.

Member

> I'm not an expert at any of this... I'm just muddling my way through to try and make some tactical improvements.

Welcome to the club. This matches my own thoughts. :-)

I agree that it would not improve effective test coverage much, but it would be useful for whoever may need to tinker with these parts of TableGen in the future. Having a wider range of inputs elsewhere is great, but that is much less convenient to work with, since it tends to be much larger and has additional irrelevant moving parts.
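
As a rough picture of what such a focused test could exercise, here is a standalone sketch of stripping a trailing `address_space<N>` qualifier from a type string. It is written against `std::string_view` rather than LLVM's `StringRef`; `consumeAddrSpace` and `rtrim` are hypothetical free functions, and this is not the `ConsumeAddrSpace` in `ClangBuiltinsEmitter.cpp`, whose exact behavior is not shown in this thread.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: drops trailing spaces.
inline std::string_view rtrim(std::string_view s) {
  while (!s.empty() && s.back() == ' ')
    s.remove_suffix(1);
  return s;
}

// Hypothetical sketch: if `type` ends with "address_space<N>", remove that
// qualifier (and surrounding spaces) from `type` and return N. Otherwise
// leave `type` untouched and return std::nullopt. Overflow of very long
// digit runs is not handled.
std::optional<unsigned> consumeAddrSpace(std::string_view &type) {
  constexpr std::string_view prefix = "address_space<";
  std::string_view t = rtrim(type);
  if (t.empty() || t.back() != '>')
    return std::nullopt;
  t.remove_suffix(1);

  // Read the decimal number from right to left.
  unsigned value = 0, scale = 1;
  std::size_t digits = 0;
  while (!t.empty() && std::isdigit(static_cast<unsigned char>(t.back()))) {
    value += scale * static_cast<unsigned>(t.back() - '0');
    scale *= 10;
    t.remove_suffix(1);
    ++digits;
  }

  if (digits == 0 || t.size() < prefix.size() ||
      t.substr(t.size() - prefix.size()) != prefix)
    return std::nullopt;
  t.remove_suffix(prefix.size());
  type = rtrim(t);
  return value;
}
```

A small test can then feed it the piece of a parameter such as `void const address_space<1> *` that remains after the trailing `*` has been consumed, and check both the returned number and the remaining text.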

@chandlerc
Member Author

> LGTM overall with a couple of nits.

Thanks! I'm merging as is to unblock stuff. If you'd particularly like some follow-up, let me know, and I'll send those as follow-up PRs. I've left responses for why I picked the current structure in-line. =]

@chandlerc chandlerc merged commit b968fd9 into llvm:main Jan 28, 2025
8 checks passed
@chandlerc chandlerc deleted the tg-nvptx-builtins branch January 28, 2025 06:45