sync : ggml #2953

Merged Mar 27, 2025

59 commits:
a483632
scripts : update sync
ggerganov Mar 27, 2025
b86a9d3
cmake: Comment out GGML_BIN_DIR for now (ggml/1139)
ckastner Mar 10, 2025
5852519
cmake: Enable specifying exact PowerPC CPU architecture (ggml/1138)
ckastner Mar 10, 2025
17e9fa0
ggml : skip intermediate .air file when compiling .metallib (llama/12…
danbev Mar 7, 2025
fd1017e
ggml-backend : make path_str compatible with C++20 (llama/12269)
ctrysbita Mar 8, 2025
a62e631
opencl: use OpenCL C standard supported by the device (llama/12221)
linehill Mar 10, 2025
06c9ec3
musa: support new arch mp_31 and update doc (llama/12296)
yeahdongcn Mar 10, 2025
31b7816
mat vec double buffer (llama/12188)
netrunnereve Mar 10, 2025
df910e1
metal : Cache the Metal library at the device context level (llama/12…
BB-fat Mar 11, 2025
aba8f6b
ggml-backend : fix backend search path (llama/12330)
jklincn Mar 11, 2025
e19010d
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows …
IMbackK Mar 11, 2025
287cabf
vulkan: fix bug in coopmat1 mul_mat_id (llama/12316)
jeffbolznv Mar 12, 2025
7a610d9
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
IMbackK Mar 12, 2025
13b405c
sycl : variable sg_size support for mmvq kernels (llama/12336)
Alcpz Mar 12, 2025
a3792c5
MUL_MAT optimization (llama/12382)
noemotiovon Mar 15, 2025
da2a49c
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (…
fairydreaming Mar 15, 2025
d8c4fea
SYCL: Delete redundant plus sign and space (llama/12391)
aubreyli Mar 15, 2025
3486a3e
SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)
qnixsynapse Mar 17, 2025
f542b3f
cmake : enable building llama.cpp using system libggml (llama/12321)
ckastner Mar 17, 2025
6867da5
vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)
jeffbolznv Mar 17, 2025
f207c27
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bound…
jeffbolznv Mar 17, 2025
a870ef9
vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309)
jeffbolznv Mar 17, 2025
998c027
vulkan: subgroup size tuning (llama/12087)
daniandtheweb Mar 17, 2025
187e0ea
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)
jeffbolznv Mar 17, 2025
41fd69c
ggml-vulkan: remove unused find_program(glslc) (llama/12416)
guusw Mar 17, 2025
1b9e698
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
gaugarg-nv Mar 17, 2025
89d7fe0
llama: Add support for RWKV v7 architecture (llama/12412)
MollySophia Mar 17, 2025
4211248
fixed compilation warnings in ggml-sycl (llama/12424)
lslusarczyk Mar 18, 2025
2291a60
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentat…
0cc4m Mar 18, 2025
7f7b60c
ggml : add SVE support for q6_K_q8_K (llama/12361)
fj-y-saito Mar 18, 2025
4148f21
SYCL: using graphs is configurable by environment variable and compil…
lslusarczyk Mar 18, 2025
fdd1b5a
musa: override warp_size of musa device to 32 (llama/12445)
yeahdongcn Mar 18, 2025
0937343
opencl: improve profiling (llama/12442)
lhez Mar 18, 2025
4c38eba
vulkan: Submit once enough matmul work has been recorded (llama/12406)
jeffbolznv Mar 19, 2025
b5150fb
Fix visionOS build and add CI (llama/12415)
guusw Mar 19, 2025
558be6c
vulkan: optimize iq1 coopmat2 dequant functions (llama/12427)
jeffbolznv Mar 19, 2025
c6af07f
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llam…
gaugarg-nv Mar 19, 2025
82dcc7e
ggml : block interleaving support for Q4_K quantization for x86 AVX2 …
Srihari-mcw Mar 20, 2025
9ed8723
sycl: cleanup oneDNN related code (llama/12097)
sgeor255 Mar 21, 2025
fb34bd2
Fix build on Windows when ccache enabled (ggml/9954) (llama/9976)
shou692199 Mar 21, 2025
56ef27a
vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/1…
netrunnereve Mar 21, 2025
e40edc7
Vulkan: RTE rounding for cpy to quant (llama/12480)
stduhpf Mar 21, 2025
68bd610
vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)
jeffbolznv Mar 22, 2025
79fa117
musa: refine compute capability (llama/12493)
yeahdongcn Mar 22, 2025
d2a5593
ggml : fix quantized cpy op (llama/12310)
ggerganov Mar 22, 2025
2e99ad1
vulkan: fix mul_mat_vec failure in backend tests (llama/12529)
jeffbolznv Mar 24, 2025
0f041a1
CUDA: Fix clang warnings (llama/12540)
yeahdongcn Mar 24, 2025
37113d1
opencl: simplify kernel embedding logic in cmakefile (llama/12503)
lhez Mar 24, 2025
6e05a8d
SYCL: disable Q4_0 reorder optimization (llama/12560)
qnixsynapse Mar 25, 2025
84b9ebe
ggml-cpu : update KleidiAI to v1.5.0 (llama/12568)
eddnjjn Mar 25, 2025
5dd73ab
ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544)
ggerganov Mar 26, 2025
d2de670
metal : refactor mat-vec code (llama/12569)
ggerganov Mar 26, 2025
a33c0f6
HIP: Add support for RDNA4 targets (llama/12372)
slojosic-amd Mar 26, 2025
fa94edc
SYCL: implement memset ggml backend buffer interface (llama/12580)
qnixsynapse Mar 27, 2025
96a8d31
llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)
amritahs-ibm Mar 27, 2025
550bbbe
ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)
ggerganov Mar 27, 2025
63067b0
sync : ggml
ggerganov Mar 27, 2025
c30117d
files : remove old wkv6 (#0)
ggerganov Mar 27, 2025
b834c53
xcf : fix visionOS build
ggerganov Mar 27, 2025
Files changed:
8 changes: 4 additions & 4 deletions build-xcframework.sh
@@ -448,8 +448,8 @@ cmake -B build-visionos -G Xcode \
-DCMAKE_SYSTEM_NAME=visionOS \
-DCMAKE_OSX_SYSROOT=xros \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=xros \
--DCMAKE_C_FLAGS="-D_XOPEN_SOURCE=700 -Du_int=unsigned\ int -Du_char=unsigned\ char -Du_short=unsigned\ short ${COMMON_C_FLAGS}" \
--DCMAKE_CXX_FLAGS="-D_XOPEN_SOURCE=700 -Du_int=unsigned\ int -Du_char=unsigned\ char -Du_short=unsigned\ short ${COMMON_CXX_FLAGS}" \
+-DCMAKE_C_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_C_FLAGS}" \
+-DCMAKE_CXX_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_CXX_FLAGS}" \
-S .
cmake --build build-visionos --config Release -- -quiet

@@ -461,8 +461,8 @@ cmake -B build-visionos-sim -G Xcode \
-DCMAKE_SYSTEM_NAME=visionOS \
-DCMAKE_OSX_SYSROOT=xrsimulator \
-DCMAKE_XCODE_ATTRIBUTE_SUPPORTED_PLATFORMS=xrsimulator \
--DCMAKE_C_FLAGS="-D_XOPEN_SOURCE=700 -Du_int=unsigned\ int -Du_char=unsigned\ char -Du_short=unsigned\ short ${COMMON_C_FLAGS}" \
--DCMAKE_CXX_FLAGS="-D_XOPEN_SOURCE=700 -Du_int=unsigned\ int -Du_char=unsigned\ char -Du_short=unsigned\ short ${COMMON_CXX_FLAGS}" \
+-DCMAKE_C_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_C_FLAGS}" \
+-DCMAKE_CXX_FLAGS="-D_XOPEN_SOURCE=700 ${COMMON_CXX_FLAGS}" \
-S .
cmake --build build-visionos-sim --config Release -- -quiet

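Context for the hunks above: the removed -Du_int=... defines were a workaround for BSD typedefs that <sys/types.h> hides on Darwin once a strict feature level like _XOPEN_SOURCE=700 is set. This sync instead defines _DARWIN_C_SOURCE for visionOS builds (see the ggml/src/CMakeLists.txt hunk below), which re-exposes those typedefs. A minimal C++ sketch of the mechanism, assuming Darwin's header guards:

// Sketch only: with _XOPEN_SOURCE=700 alone, Darwin's <sys/types.h> hides
// u_int/u_char/u_short; defining _DARWIN_C_SOURCE restores them, so the
// per-typedef -D workarounds are no longer needed.
#define _DARWIN_C_SOURCE
#include <sys/types.h>

u_int  flags   = 0;  // compiles because _DARWIN_C_SOURCE re-exposes u_int
u_char version = 1;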
7 changes: 6 additions & 1 deletion ggml/CMakeLists.txt
@@ -127,10 +127,12 @@ endif()
option(GGML_LASX "ggml: enable lasx" ON)
option(GGML_LSX "ggml: enable lsx" ON)
option(GGML_RVV "ggml: enable rvv" ON)
+option(GGML_RV_ZFH "ggml: enable riscv zfh" OFF)
option(GGML_VXE "ggml: enable vxe" ON)

option(GGML_CPU_ALL_VARIANTS "ggml: build all variants of the CPU backend (requires GGML_BACKEND_DL)" OFF)
set(GGML_CPU_ARM_ARCH "" CACHE STRING "ggml: CPU architecture for ARM")
+set(GGML_CPU_POWERPC_CPUTYPE "" CACHE STRING "ggml: CPU type for PowerPC")


if (WIN32)
@@ -190,6 +192,7 @@ option(GGML_OPENMP "ggml: use OpenMP"
option(GGML_RPC "ggml: use RPC" OFF)
option(GGML_SYCL "ggml: use SYCL" OFF)
option(GGML_SYCL_F16 "ggml: use 16 bit floats for sycl calculations" OFF)
+option(GGML_SYCL_GRAPH "ggml: enable graphs in the SYCL backend" ON)
set (GGML_SYCL_TARGET "INTEL" CACHE STRING
"ggml: sycl target device")
set (GGML_SYCL_DEVICE_ARCH "" CACHE STRING
@@ -199,6 +202,8 @@ option(GGML_OPENCL "ggml: use OpenCL"
option(GGML_OPENCL_PROFILING "ggml: use OpenCL profiling (increases overhead)" OFF)
option(GGML_OPENCL_EMBED_KERNELS "ggml: embed kernels" ON)
option(GGML_OPENCL_USE_ADRENO_KERNELS "ggml: use optimized kernels for Adreno" ON)
+set (GGML_OPENCL_TARGET_VERSION "300" CACHE STRING
+"gmml: OpenCL API version to target")

# toolchain for vulkan-shaders-gen
set (GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "" CACHE FILEPATH "ggml: toolchain file for vulkan-shaders-gen")
26 changes: 26 additions & 0 deletions ggml/cmake/common.cmake
@@ -0,0 +1,26 @@
function(ggml_get_flags CCID CCVER)
set(C_FLAGS "")
set(CXX_FLAGS "")

if (CCID MATCHES "Clang")
set(C_FLAGS -Wunreachable-code-break -Wunreachable-code-return)
set(CXX_FLAGS -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi)

if (
(CCID STREQUAL "Clang" AND CCVER VERSION_GREATER_EQUAL 3.8.0) OR
(CCID STREQUAL "AppleClang" AND CCVER VERSION_GREATER_EQUAL 7.3.0)
)
list(APPEND C_FLAGS -Wdouble-promotion)
endif()
elseif (CCID STREQUAL "GNU")
set(C_FLAGS -Wdouble-promotion)
set(CXX_FLAGS -Wno-array-bounds)

if (CCVER VERSION_GREATER_EQUAL 8.1.0)
list(APPEND CXX_FLAGS -Wextra-semi)
endif()
endif()

set(GF_C_FLAGS ${C_FLAGS} PARENT_SCOPE)
set(GF_CXX_FLAGS ${CXX_FLAGS} PARENT_SCOPE)
endfunction()
2 changes: 1 addition & 1 deletion ggml/cmake/ggml-config.cmake.in
@@ -5,7 +5,7 @@

set_and_check(GGML_INCLUDE_DIR "@PACKAGE_GGML_INCLUDE_INSTALL_DIR@")
set_and_check(GGML_LIB_DIR "@PACKAGE_GGML_LIB_INSTALL_DIR@")
-set_and_check(GGML_BIN_DIR "@PACKAGE_GGML_BIN_INSTALL_DIR@")
+#set_and_check(GGML_BIN_DIR "@PACKAGE_GGML_BIN_INSTALL_DIR@")

find_package(Threads REQUIRED)

24 changes: 24 additions & 0 deletions ggml/include/ggml.h
@@ -454,6 +454,7 @@ extern "C" {
GGML_OP_RMS_NORM,
GGML_OP_RMS_NORM_BACK,
GGML_OP_GROUP_NORM,
+GGML_OP_L2_NORM,

GGML_OP_MUL_MAT,
GGML_OP_MUL_MAT_ID,
@@ -502,6 +503,7 @@ extern "C" {
GGML_OP_ADD_REL_POS,
GGML_OP_RWKV_WKV6,
GGML_OP_GATED_LINEAR_ATTN,
+GGML_OP_RWKV_WKV7,

GGML_OP_UNARY,

@@ -1095,6 +1097,18 @@ extern "C" {
int n_groups,
float eps);

+// l2 normalize along rows
+// used in rwkv v7
+GGML_API struct ggml_tensor * ggml_l2_norm(
+struct ggml_context * ctx,
+struct ggml_tensor * a,
+float eps);
+
+GGML_API struct ggml_tensor * ggml_l2_norm_inplace(
+struct ggml_context * ctx,
+struct ggml_tensor * a,
+float eps);

// a - x
// b - dy
GGML_API struct ggml_tensor * ggml_rms_norm_back(
@@ -1890,6 +1904,16 @@ extern "C" {
struct ggml_tensor * state,
float scale);

+GGML_API struct ggml_tensor * ggml_rwkv_wkv7(
+struct ggml_context * ctx,
+struct ggml_tensor * r,
+struct ggml_tensor * w,
+struct ggml_tensor * k,
+struct ggml_tensor * v,
+struct ggml_tensor * a,
+struct ggml_tensor * b,
+struct ggml_tensor * state);

// custom operators

typedef void (*ggml_unary_op_f32_t) (const int, float *, const float *);
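For orientation, here is a hedged sketch of how the new entry point might be called from graph-building code; only the declarations above come from this sync, and the epsilon value is an arbitrary illustration:

#include "ggml.h"

// Build a tensor that L2-normalizes each row of x, as used by the RWKV v7
// path alongside ggml_rwkv_wkv7. The eps value here is illustrative only.
static struct ggml_tensor * l2_norm_rows(struct ggml_context * ctx,
                                         struct ggml_tensor  * x) {
    return ggml_l2_norm(ctx, x, 1e-6f);
}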
38 changes: 10 additions & 28 deletions ggml/src/CMakeLists.txt
@@ -1,4 +1,5 @@
include(CheckCXXCompilerFlag)
include("../cmake/common.cmake")

add_compile_definitions(GGML_SCHED_MAX_COPIES=${GGML_SCHED_MAX_COPIES})

@@ -24,33 +25,6 @@ if (NOT MSVC)
endif()
endif()

-function(ggml_get_flags CCID CCVER)
-set(C_FLAGS "")
-set(CXX_FLAGS "")
-
-if (CCID MATCHES "Clang")
-set(C_FLAGS -Wunreachable-code-break -Wunreachable-code-return)
-set(CXX_FLAGS -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi)
-
-if (
-(CCID STREQUAL "Clang" AND CCVER VERSION_GREATER_EQUAL 3.8.0) OR
-(CCID STREQUAL "AppleClang" AND CCVER VERSION_GREATER_EQUAL 7.3.0)
-)
-list(APPEND C_FLAGS -Wdouble-promotion)
-endif()
-elseif (CCID STREQUAL "GNU")
-set(C_FLAGS -Wdouble-promotion)
-set(CXX_FLAGS -Wno-array-bounds)
-
-if (CCVER VERSION_GREATER_EQUAL 8.1.0)
-list(APPEND CXX_FLAGS -Wextra-semi)
-endif()
-endif()
-
-set(GF_C_FLAGS ${C_FLAGS} PARENT_SCOPE)
-set(GF_CXX_FLAGS ${CXX_FLAGS} PARENT_SCOPE)
-endfunction()

if (GGML_FATAL_WARNINGS)
if (CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR CMAKE_CXX_COMPILER_ID MATCHES "Clang")
list(APPEND C_FLAGS -Werror)
@@ -102,7 +76,11 @@ if (GGML_CCACHE)
set(GGML_CCACHE_VARIANT sccache)
endif()
# TODO: should not be set globally
-set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${GGML_CCACHE_VARIANT}")
+if (GGML_SYCL AND GGML_CCACHE_FOUND AND WIN32)
+set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "ccache compiler_type=icl")
+else ()
+set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${GGML_CCACHE_VARIANT}")
+endif ()
set(ENV{CCACHE_SLOPPINESS} time_macros)
message(STATUS "${GGML_CCACHE_VARIANT} found, compilation results will be cached. Disable with GGML_CCACHE=OFF.")
else()
@@ -351,6 +329,10 @@ if (CMAKE_SYSTEM_NAME MATCHES "Android")
target_link_libraries(ggml-base PRIVATE dl)
endif()

if(CMAKE_SYSTEM_NAME MATCHES "visionOS")
target_compile_definitions(ggml-base PUBLIC _DARWIN_C_SOURCE)
endif()

if (BUILD_SHARED_LIBS)
foreach (target ggml-base ggml)
set_target_properties(${target} PROPERTIES POSITION_INDEPENDENT_CODE ON)
17 changes: 12 additions & 5 deletions ggml/src/ggml-backend-reg.cpp
@@ -76,7 +76,14 @@ namespace fs = std::filesystem;
static std::string path_str(const fs::path & path) {
std::string u8path;
try {
+#if defined(__cpp_lib_char8_t)
+// C++20 and later: u8string() returns std::u8string
+std::u8string u8str = path.u8string();
+u8path = std::string(reinterpret_cast<const char*>(u8str.c_str()));
+#else
+// C++17: u8string() returns std::string
u8path = path.u8string();
+#endif
} catch (...) {
}
return u8path;
@@ -490,7 +497,7 @@ static ggml_backend_reg_t ggml_backend_load_best(const char * name, bool silent,
search_paths.push_back(get_executable_path());
search_paths.push_back(fs::current_path());
} else {
-search_paths.push_back(user_search_path);
+search_paths.push_back(fs::u8path(user_search_path));
}

int best_score = 0;
@@ -504,9 +511,9 @@ static ggml_backend_reg_t ggml_backend_load_best(const char * name, bool silent,
fs::directory_iterator dir_it(search_path, fs::directory_options::skip_permission_denied);
for (const auto & entry : dir_it) {
if (entry.is_regular_file()) {
-auto filename = entry.path().filename().native();
-auto ext = entry.path().extension().native();
-if (filename.find(file_prefix) == 0 && ext == file_extension) {
+auto filename = entry.path().filename();
+auto ext = entry.path().extension();
+if (filename.native().find(file_prefix) == 0 && ext == file_extension) {
dl_handle_ptr handle { dl_load_library(entry) };
if (!handle && !silent) {
GGML_LOG_ERROR("%s: failed to load %s\n", __func__, path_str(entry.path()).c_str());
@@ -537,7 +544,7 @@ static ggml_backend_reg_t ggml_backend_load_best(const char * name, bool silent,
// try to load the base backend
for (const auto & search_path : search_paths) {
fs::path filename = backend_filename_prefix().native() + name_path.native() + backend_filename_extension().native();
-fs::path path = search_path.native() + filename.native();
+fs::path path = search_path / filename;
if (fs::exists(path)) {
return get_reg().load_backend(path, silent);
}
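The last hunk fixes a subtle path-construction bug: concatenating the native() strings drops the directory separator, while fs::path's operator/ inserts it. A standalone C++ sketch (not ggml code) of the difference:

#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

int main() {
    const fs::path dir  = "backends";
    const fs::path file = "libggml-cuda.so";

    const fs::path joined = dir / file;                   // "backends/libggml-cuda.so"
    const fs::path glued  = dir.native() + file.native(); // "backendslibggml-cuda.so"

    assert(joined != glued);  // the old string concatenation produced the glued form
    return 0;
}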
8 changes: 6 additions & 2 deletions ggml/src/ggml-cann/aclnn_ops.cpp
@@ -2790,10 +2790,14 @@ static void ggml_cann_mul_mat_quant(ggml_backend_cann_context& ctx,
(char*)output_buffer + batch1 * output_stride, ACL_FLOAT16,
output_elem_size, output_ne, output_nb, 2, ACL_FORMAT_ND,
output_ne_offset);
+int64_t antiquantGroupSize = 0;
+if (src0->ne[0] > QK8_0) {
+antiquantGroupSize = QK8_0;
+}

ACL_CHECK(aclnnWeightQuantBatchMatmulV2GetWorkspaceSize(
acl_input_tensor, acl_weight_tensor, acl_scale_tensor, nullptr,
-nullptr, nullptr, nullptr, QK8_0, acl_output_tensor,
+nullptr, nullptr, nullptr, antiquantGroupSize, acl_output_tensor,
&workspaceSize, &executor));
if (workspaceAddr == nullptr) {
workspaceAddr = workspace_allocator.alloc(workspaceSize);
@@ -2833,7 +2837,7 @@ static void ggml_cann_mul_mat_quant(ggml_backend_cann_context& ctx,

ACL_CHECK(aclnnWeightQuantBatchMatmulV2GetWorkspaceSize(
acl_input_tensor, acl_weight_tensor, acl_scale_tensor,
-nullptr, nullptr, nullptr, nullptr, QK8_0,
+nullptr, nullptr, nullptr, nullptr, antiquantGroupSize,
acl_output_tensor, &workspaceSize, &executor));
ACL_CHECK(aclnnWeightQuantBatchMatmulV2(
workspaceAddr, workspaceSize, executor, ctx.stream()));
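A hedged reading of this change, assuming that an antiquant group size of 0 selects per-channel dequantization in aclnnWeightQuantBatchMatmulV2: rows no longer than one Q8_0 block now fall back to group size 0 instead of being rejected, which is why the supports_op restriction in ggml-cann.cpp below could be deleted. A C++ mirror of the selection logic:

#include <cstdint>

// Mirrors the antiquantGroupSize selection added above; the "0 means
// per-channel" interpretation is an assumption, not confirmed by the diff.
static int64_t pick_antiquant_group_size(int64_t row_len) {
    const int64_t QK8_0 = 32;  // elements per Q8_0 block in ggml
    return row_len > QK8_0 ? QK8_0 : 0;
}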
5 changes: 0 additions & 5 deletions ggml/src/ggml-cann/ggml-cann.cpp
@@ -1689,11 +1689,6 @@ static bool ggml_backend_cann_supports_op(ggml_backend_dev_t dev,
case GGML_OP_MUL_MAT: {
switch (op->src[0]->type) {
case GGML_TYPE_Q8_0:
-// Current groupsize should not be greater than k-1 in
-// aclnnWeightQuantBatchMatmulV2GetWorkspaceSize
-if (op->src[0]->ne[0] <= QK8_0) {
-return false;
-}
case GGML_TYPE_F16:
case GGML_TYPE_F32:
case GGML_TYPE_Q4_0:
42 changes: 30 additions & 12 deletions ggml/src/ggml-cpu/CMakeLists.txt
@@ -287,17 +287,31 @@ function(ggml_add_cpu_backend_variant_impl tag_name)
endif()
endif()
endif()
elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64")
elseif ("${CMAKE_SYSTEM_PROCESSOR} " STREQUAL "ppc64le " OR "${CMAKE_SYSTEM_PROCESSOR} " STREQUAL "powerpc ")
message(STATUS "PowerPC detected")
execute_process(COMMAND bash -c "grep POWER /proc/cpuinfo | head -n 1" OUTPUT_VARIABLE POWER_M)
if (${POWER_M} MATCHES "POWER10")
list(APPEND ARCH_FLAGS -mcpu=power10)
elseif (${POWER_M} MATCHES "POWER9")
list(APPEND ARCH_FLAGS -mcpu=power9)
elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64le")
list(APPEND ARCH_FLAGS -mcpu=powerpc64le -mtune=native)
if (GGML_NATIVE)
if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64")
file(READ "/proc/cpuinfo" POWER10_M)
elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "powerpc")
execute_process(COMMAND bash -c "prtconf |grep 'Implementation' | head -n 1" OUTPUT_VARIABLE POWER10_M)
endif()

string(REGEX MATCHALL "POWER *([0-9]+)" MATCHED_STRING "${POWER10_M}")
string(REGEX REPLACE "POWER *([0-9]+)" "\\1" EXTRACTED_NUMBER "${MATCHED_STRING}")

if (EXTRACTED_NUMBER GREATER_EQUAL 10)
list(APPEND ARCH_FLAGS -mcpu=power10 -mpowerpc64)
elseif (EXTRACTED_NUMBER EQUAL 9)
list(APPEND ARCH_FLAGS -mcpu=power9 -mpowerpc64)
elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64le")
list(APPEND ARCH_FLAGS -mcpu=powerpc64le -mtune=native)
else()
list(APPEND ARCH_FLAGS -mcpu=native -mtune=native -mpowerpc64)
endif()
else()
list(APPEND ARCH_FLAGS -mcpu=powerpc64 -mtune=native)
if (GGML_CPU_POWERPC_CPUTYPE)
list(APPEND ARCH_FLAGS -mcpu=${GGML_CPU_POWERPC_CPUTYPE})
endif()
endif()
elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "loongarch64")
message(STATUS "loongarch64 detected")
Expand All @@ -312,7 +326,11 @@ function(ggml_add_cpu_backend_variant_impl tag_name)
elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "riscv64")
message(STATUS "RISC-V detected")
if (GGML_RVV)
-list(APPEND ARCH_FLAGS -march=rv64gcv -mabi=lp64d)
+if (GGML_RV_ZFH)
+list(APPEND ARCH_FLAGS -march=rv64gcv_zfhmin -DGGML_RV_ZFH -mabi=lp64d)
+else()
+list(APPEND ARCH_FLAGS -march=rv64gcv -mabi=lp64d)
+endif()
endif()
elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "s390x")
message(STATUS "s390x detected")
@@ -351,9 +369,9 @@ function(ggml_add_cpu_backend_variant_impl tag_name)

# Fetch KleidiAI sources:
include(FetchContent)
set(KLEIDIAI_COMMIT_TAG "v1.3.0")
set(KLEIDIAI_COMMIT_TAG "v1.5.0")
set(KLEIDIAI_DOWNLOAD_URL "https://github.com/ARM-software/kleidiai/archive/refs/tags/${KLEIDIAI_COMMIT_TAG}.tar.gz")
set(KLEIDIAI_ARCHIVE_MD5 "060bd2dc64642b091f461cc8dd7426d9")
set(KLEIDIAI_ARCHIVE_MD5 "ea22e1aefb800e9bc8c74d91633cc58e")

if (POLICY CMP0135)
cmake_policy(SET CMP0135 NEW)
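For readers who do not speak CMake regex, a hedged C++ mirror of the new POWER-generation probe (the build itself runs string(REGEX ...) over /proc/cpuinfo on Linux and over prtconf output on AIX; this sketch covers only the Linux side):

#include <fstream>
#include <iterator>
#include <regex>
#include <string>

// Extract N from the first "POWER<N>" occurrence in a CPU description,
// mirroring the MATCHALL/REPLACE pair above; returns 0 if nothing matches.
static int power_generation(const std::string & cpu_desc) {
    std::smatch m;
    if (std::regex_search(cpu_desc, m, std::regex("POWER *([0-9]+)"))) {
        return std::stoi(m[1].str());
    }
    return 0;
}

int main() {
    std::ifstream f("/proc/cpuinfo");
    const std::string text((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());
    // e.g. 10 -> -mcpu=power10, 9 -> -mcpu=power9, else fallback flags
    return power_generation(text) >= 9 ? 0 : 1;
}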