Incorrect Free Memory Reporting for Intel Arc(TM) A770 Graphics #750

Open
@avimanyu786

Description

The free GPU memory reported through the Intel Compute Runtime is inconsistent with the value reported by tools such as xpu-smi. On Intel Arc(TM) A770 Graphics, the reported free memory always equals the total memory, even while memory is being consumed. The issue was observed both from Python (dpctl) and from a standalone C++ executable.

Steps to Reproduce

  1. Set up an environment with the Intel Compute Runtime and xpu-smi installed.
  2. Save the following C++ code as, say, mem.cpp:
#include <iostream>
#include <vector>
#include <string>
#include <sycl/sycl.hpp>

int main(void) {
    sycl::queue q{sycl::default_selector_v};

    const sycl::device &dev = q.get_device();
    const std::string &dev_name = dev.get_info<sycl::info::device::name>();
    const std::string &driver_ver = dev.get_info<sycl::info::device::driver_version>();

    std::cout << "Device: " << dev_name << " ["  << driver_ver << "]" << std::endl;

    auto global_mem_size = dev.get_info<sycl::info::device::global_mem_size>();

    std::cout << "Global device memory size: " << global_mem_size << " bytes" << std::endl;

    if (dev.has(sycl::aspect::ext_intel_free_memory)) {
        auto free_memory = dev.get_info<sycl::ext::intel::info::device::free_memory>();
        std::cout << "Free memory: " << free_memory << " bytes" << std::endl;
        std::cout << "Implied memory in use: " << global_mem_size - free_memory << " bytes" << std::endl;
    } else {
        std::cout << "Free memory descriptor is not available" << std::endl;
    }

    return 0;
}
  3. Compile the code to obtain the binary:
icpx -fsycl mem.cpp -o mem.x
  4. Execute the compiled binary with the environment variable ZES_ENABLE_SYSMAN set to 1:
export ZES_ENABLE_SYSMAN=1
./mem.x
  5. Compare the output with the results from xpu-smi:
xpu-smi stats -d 0

Observed Behavior

The C++ code consistently reports the same value for global_mem_size and free_memory, implying 0 bytes of used memory, even when memory is being consumed by the GPU. In contrast, xpu-smi correctly reports non-zero GPU memory usage.

Expected Behavior

The free_memory value reported by the Intel Compute Runtime should reflect the actual free memory, showing a decrease when GPU memory is used, consistent with the output from xpu-smi.
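
The expected behavior can be made checkable by comparing the drop in reported free memory against a known allocation size. The helper below is a minimal sketch in plain C++ with no SYCL dependency. `MemReading`, `used_bytes`, and `free_memory_tracks_allocation` are hypothetical names introduced here for illustration, and the tolerance parameter is an assumption, since a driver may round allocations to some granularity.

```cpp
#include <cstdint>

// Hypothetical container for one pair of values read from the device,
// e.g. global_mem_size and the ext_intel free_memory descriptor.
struct MemReading {
    std::uint64_t total_bytes;
    std::uint64_t free_bytes;
};

// Used memory implied by a reading. Clamps to 0 instead of wrapping
// around if free memory is ever reported as larger than total memory.
inline std::uint64_t used_bytes(const MemReading& r) {
    return r.free_bytes > r.total_bytes ? 0 : r.total_bytes - r.free_bytes;
}

// Returns true if free memory dropped between `before` and `after` by at
// least (allocated_bytes - tolerance_bytes). A stale reading, where free
// memory does not move at all, fails the check for any allocation larger
// than the tolerance.
inline bool free_memory_tracks_allocation(const MemReading& before,
                                          const MemReading& after,
                                          std::uint64_t allocated_bytes,
                                          std::uint64_t tolerance_bytes) {
    const std::uint64_t drop = before.free_bytes > after.free_bytes
                                   ? before.free_bytes - after.free_bytes
                                   : 0;
    const std::uint64_t min_expected = allocated_bytes > tolerance_bytes
                                           ? allocated_bytes - tolerance_bytes
                                           : 0;
    return drop >= min_expected;
}
```

In a reproducer, the two readings would be taken before and after a device allocation of known size (for example one made with `sycl::malloc_device`). With the behavior described in this issue, both readings are identical and the check returns false.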

Environment Details

  • OS: HiveOS (based on Ubuntu 20.04 and 22.04)
  • GPU: Intel(R) Arc(TM) A770 Graphics
  • GPU driver versions tested:
    • 1.3.27642
    • 1.3.29735
  • Intel Compute Runtime: Relevant versions for the above drivers
  • Compiler: Intel DPC++/C++ Compiler (icpx)

Additional Information

This issue is also tracked in the dpctl repository. The problem appears to stem from the GPU driver or the Intel Compute Runtime itself rather than from dpctl, since a standalone C++ executable reproduces it.

Please let me know if further information or testing is required. Thank you for investigating this issue.

Metadata

    Labels

    L0 Sysman: Issue related to L0 Sysman
