[libc][bug] Fix out of bound write in memcpy w/ software prefetching #90591

gchatelet · 2024-04-30T10:56:16Z

This patch adds tests for memcpy and memset making sure that we don't access buffers out of bounds. It relies on POSIX mmap / mprotect and works only when FULL_BUILD_MODE is disabled.

The bug showed up while enabling software prefetching. loop_and_tail_offset is always running at least one iteration but in some configurations loop unrolled prefetching was actually needing only the tail operation and no loop iterations at all.

llvmbot · 2024-04-30T10:56:46Z

@llvm/pr-subscribers-libc

Author: Guillaume Chatelet (gchatelet)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/90591.diff

4 Files Affected:

(modified) libc/fuzzing/string/CMakeLists.txt (+16)
(added) libc/fuzzing/string/memcpy_fuzz.cpp (+55)
(added) libc/fuzzing/string/memset_fuzz.cpp (+44)
(added) libc/fuzzing/string/protected_pages.h (+74)

diff --git a/libc/fuzzing/string/CMakeLists.txt b/libc/fuzzing/string/CMakeLists.txt
index 9dd4fceee3b596..6147ebfcb5013f 100644
--- a/libc/fuzzing/string/CMakeLists.txt
+++ b/libc/fuzzing/string/CMakeLists.txt
@@ -25,6 +25,22 @@ add_libc_fuzzer(
     libc.src.string.strlen
 )
 
+add_libc_fuzzer(
+  memcpy_fuzz
+  SRCS
+    memcpy_fuzz.cpp
+  DEPENDS
+    libc.src.string.memcpy
+)
+
+add_libc_fuzzer(
+  memset_fuzz
+  SRCS
+    memset_fuzz.cpp
+  DEPENDS
+    libc.src.string.memset
+)
+
 add_libc_fuzzer(
   memcmp_fuzz
   SRCS
diff --git a/libc/fuzzing/string/memcpy_fuzz.cpp b/libc/fuzzing/string/memcpy_fuzz.cpp
new file mode 100644
index 00000000000000..f9d98613cd7e23
--- /dev/null
+++ b/libc/fuzzing/string/memcpy_fuzz.cpp
@@ -0,0 +1,55 @@
+//===-- memcpy_fuzz.cpp ---------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// Fuzzing test for llvm-libc memcpy implementation.
+///
+//===----------------------------------------------------------------------===//
+#include "protected_pages.h"
+#include "src/string/memcpy.h"
+#include <stddef.h> // size_t
+#include <stdint.h> // uint8_t
+#include <stdlib.h> // rand
+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t data_size) {
+  static constexpr size_t MAX_SIZE = 1024;
+  static ProtectedPages regions;
+  static const Page write_buffer = regions.GetPageA().WithAccess(PROT_WRITE);
+  static const Page read_buffer = [&]() {
+    // We fetch page B in write mode.
+    auto region = regions.GetPageB().WithAccess(PROT_WRITE);
+    // And fill it with random numbers.
+    for (size_t i = 0; i < region.page_size; ++i)
+      region.page_ptr[i] = rand();
+    // Then return it in read mode.
+    return region.WithAccess(PROT_READ);
+  }();
+  // We fill 'size' with data coming from lib_fuzzer, this limits exploration to
+  // 2 bytes.
+  uint16_t size = 0;
+  if (data_size != sizeof(size))
+    return 0;
+  __builtin_memcpy(&size, data, sizeof(size));
+  if (size >= MAX_SIZE || size >= GetPageSize())
+    return 0;
+  // We cross-check the function from two sources and two destinations.
+  // The first of them (bottom) is always page aligned.
+  // The second one (top) is not necessarily aligned.
+  // Both sources and destinations are checked for out of bound accesses.
+  const uint8_t *sources[2] = {read_buffer.bottom(size), read_buffer.top(size)};
+  uint8_t *destinations[2] = {write_buffer.bottom(size),
+                              write_buffer.top(size)};
+  for (const uint8_t *src : sources) {
+    for (uint8_t *dst : destinations) {
+      LIBC_NAMESPACE::memcpy(dst, src, size);
+      for (size_t i = 0; i < size; ++i)
+        if (src[i] != dst[i])
+          __builtin_trap();
+    }
+  }
+  return 0;
+}
diff --git a/libc/fuzzing/string/memset_fuzz.cpp b/libc/fuzzing/string/memset_fuzz.cpp
new file mode 100644
index 00000000000000..445bb7b6fa2d6b
--- /dev/null
+++ b/libc/fuzzing/string/memset_fuzz.cpp
@@ -0,0 +1,44 @@
+//===-- memset_fuzz.cpp ---------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// Fuzzing test for llvm-libc memcset implementation.
+///
+//===----------------------------------------------------------------------===//
+#include "protected_pages.h"
+#include "src/string/memset.h"
+#include <stddef.h> // size_t
+#include <stdint.h> // uint8_t
+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t data_size) {
+  static constexpr size_t MAX_SIZE = 1024;
+  static ProtectedPages regions;
+  static const Page write_buffer = regions.GetPageA().WithAccess(PROT_WRITE);
+  // We fill 'size' with data coming from lib_fuzzer, this limits exploration to
+  // 2 bytes.
+  uint16_t size = 0;
+  uint8_t fill_char = 0;
+  if (data_size != sizeof(size) + sizeof(fill_char))
+    return 0;
+  __builtin_memcpy(&size, data, sizeof(size));
+  __builtin_memcpy(&fill_char, data + sizeof(size), sizeof(fill_char));
+  if (size >= MAX_SIZE || size >= GetPageSize())
+    return 0;
+  // We cross-check the function from two sources and two destinations.
+  // The first of them (bottom) is always page aligned.
+  // The second one (top) is not necessarily aligned.
+  // Both sources and destinations are checked for out of bound accesses.
+  uint8_t *destinations[2] = {write_buffer.bottom(size),
+                              write_buffer.top(size)};
+  for (uint8_t *dst : destinations) {
+    LIBC_NAMESPACE::memset(dst, fill_char, size);
+    for (size_t i = 0; i < size; ++i)
+      if (dst[i] != fill_char)
+        __builtin_trap();
+  }
+  return 0;
+}
diff --git a/libc/fuzzing/string/protected_pages.h b/libc/fuzzing/string/protected_pages.h
new file mode 100644
index 00000000000000..fa09f9ac7947f1
--- /dev/null
+++ b/libc/fuzzing/string/protected_pages.h
@@ -0,0 +1,74 @@
+#ifndef LIBC_FUZZING_STRING_PROTECTED_PAGES_H
+#define LIBC_FUZZING_STRING_PROTECTED_PAGES_H
+
+#include <stddef.h>   // size_t
+#include <stdint.h>   // uint8_t
+#include <sys/mman.h> // mmap, munmap
+#include <unistd.h>   // sysconf, _SC_PAGESIZE
+
+// Returns mmap page size.
+size_t GetPageSize() { return sysconf(_SC_PAGESIZE); }
+
+// Represents a page of memory which access can be configured throught the
+// 'WithAccess' function. Accessing data above or below this page will trap as
+// it is sandwiched between two pages with no read / write access.
+struct Page {
+  // Returns an aligned pointer that can be accessed up to page_size. Accessing
+  // data at ptr[-1] will fault.
+  uint8_t *bottom(size_t size) const {
+    if (size >= page_size)
+      __builtin_trap();
+    return page_ptr;
+  }
+  // Returns a pointer to a buffer that can be accessed up to size. Accessing
+  // data at ptr[size] will fault.
+  uint8_t *top(size_t size) const { return page_ptr + page_size - size; }
+
+  Page &WithAccess(int protection) {
+    if (mprotect(page_ptr, page_size, protection) != 0)
+      __builtin_trap();
+    return *this;
+  }
+
+  const size_t page_size;
+  uint8_t *const page_ptr;
+};
+
+// Allocates 5 consecutive pages that will trap if accessed.
+// +-----------------+
+// | page 0 (FAULT)  |
+// | page 1 (CUSTOM) |
+// | page 2 (FAULT)  |
+// | page 3 (CUSTOM) |
+// | page 4 (FAULT)  |
+// +-----------------+
+// The pages 1 and 3 can be retrieved as with 'GetPageA' / 'GetPageB' and their
+// accesses can be customized through the 'WithAccess' function.
+struct ProtectedPages {
+  static constexpr size_t PAGES = 5;
+
+  ProtectedPages()
+      : page_size(GetPageSize()),
+        ptr(mmap(/*address*/ nullptr, /*length*/ PAGES * page_size,
+                 /*protection*/ PROT_NONE,
+                 /*flags*/ MAP_PRIVATE | MAP_ANONYMOUS, /*fd*/ -1,
+                 /*offset*/ 0)) {
+    if (reinterpret_cast<intptr_t>(ptr) == -1)
+      __builtin_trap();
+  }
+  ~ProtectedPages() { munmap(ptr, PAGES * page_size); }
+
+  Page GetPageA() const { return Page{page_size, page<1>()}; }
+  Page GetPageB() const { return Page{page_size, page<3>()}; }
+
+private:
+  template <size_t index> uint8_t *page() const {
+    static_assert(index < PAGES);
+    return static_cast<uint8_t *>(ptr) + (index * page_size);
+  }
+
+  const size_t page_size;
+  void *const ptr = nullptr;
+};
+
+#endif // LIBC_FUZZING_STRING_PROTECTED_PAGES_H

legrosbuffle · 2024-04-30T11:29:14Z

libc/fuzzing/string/memset_fuzz.cpp

+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t data_size) {
+  static constexpr size_t MAX_SIZE = 1024;
+  static ProtectedPages pages;


I think using msan should catch these without having to write a custom class and make the fuzz test clearer. It should be as simple as __msan_poison the dst (to check we have no spurious reads) and _msan_unpoison the src (to allow reads).

So for a reason that I don't understand it seems that passing -fsanitize=memory globally does not instrument the memcpy code. As a consequence, I couldn't get the sanitizer to find the bug.
The compiled memcpy library contains a symbol for sanitizer initialization but the memcpy code is left un-instrumented.

Having all libc functions instrumented is one of the goals of the project so we definitely need to find the root cause.

In the meantime, I'm willing to submit this code as-is if you don't mind as it's a reproducer for a bug in memcpy with software prefetching when AVX is enabled.

Does it have something to do with the function name?

This bug showed up when running fuzzers newly added fuzzers llvm#90591.

libc/fuzzing/string/protected_pages.h

libc/fuzzing/string/memset_fuzz.cpp

libc/fuzzing/string/memcpy_fuzz.cpp

…t build bot breakage

gchatelet · 2024-05-14T11:36:33Z

So as discussed offline, I turned the fuzzers into tests and fixed the bug in this patch to avoid breaking the build bots.
Contrary to what I originally thought the tests do exercise the software prefetching code.

gchatelet · 2024-05-14T11:38:23Z

The last presubmit run is indeed running the tests

...
[4131/4223] Running unit test libc.test.src.string.memcpy_x86_64_opt_sw_prefetch_avx_test.__unit__
[ RUN      ] LlvmLibcMemcpyTest.SizeSweep
[       OK ] LlvmLibcMemcpyTest.SizeSweep (took 157 us)
[ RUN      ] LlvmLibcMemcpyTest.CheckAccess
[       OK ] LlvmLibcMemcpyTest.CheckAccess (took 98 us)
Ran 2 tests.  PASS: 2  FAIL: 0
...

[libc] Add fuzzers for memcpy and memset

83ab25f

llvmbot added the libc label Apr 30, 2024

gchatelet requested a review from legrosbuffle April 30, 2024 10:57

gchatelet added 3 commits April 30, 2024 11:00

rename regions to pages

768448b

Rephrase comments

8fa3e16

Update protected_pages header and documentation

6c9e5e1

legrosbuffle reviewed Apr 30, 2024

View reviewed changes

gchatelet requested a review from legrosbuffle April 30, 2024 13:45

gchatelet added a commit to gchatelet/llvm-project that referenced this pull request Apr 30, 2024

[libc][bug] Fix out of bound write in memcpy wi software prefetching

b41ea8e

This bug showed up when running fuzzers newly added fuzzers llvm#90591.

gchatelet mentioned this pull request Apr 30, 2024

[libc][bug] Fix out of bound write in memcpy w/ software prefetching #90613

Closed

legrosbuffle reviewed Apr 30, 2024

View reviewed changes

gchatelet added 4 commits May 14, 2024 09:47

Improve documentation

e0deb56

Convert fuzzers into tests and fix the bug at the same time to preven…

af4068b

…t build bot breakage

Fix macro comments

42f38ac

Fix documentation

5ea47aa

gchatelet changed the title ~~[libc] Add fuzzers for memcpy and memset~~ [libc][bug] Fix out of bound write in memcpy w/ software prefetching May 14, 2024

gchatelet mentioned this pull request May 14, 2024

[libc] Test memory functions with software prefetching turned on #92089

Closed

gchatelet requested a review from legrosbuffle May 14, 2024 11:46

legrosbuffle approved these changes May 14, 2024

View reviewed changes

gchatelet merged commit 292b300 into llvm:main May 14, 2024
4 checks passed

gchatelet deleted the add_mem_fuzzers branch May 14, 2024 11:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[libc][bug] Fix out of bound write in memcpy w/ software prefetching #90591

[libc][bug] Fix out of bound write in memcpy w/ software prefetching #90591

gchatelet commented Apr 30, 2024 •

edited

Loading

Uh oh!

llvmbot commented Apr 30, 2024

Uh oh!

legrosbuffle Apr 30, 2024

Uh oh!

gchatelet Apr 30, 2024

Uh oh!

nickdesaulniers May 20, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gchatelet commented May 14, 2024

Uh oh!

gchatelet commented May 14, 2024

Uh oh!

Uh oh!

Uh oh!

[libc][bug] Fix out of bound write in memcpy w/ software prefetching #90591

[libc][bug] Fix out of bound write in memcpy w/ software prefetching #90591

Conversation

gchatelet commented Apr 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Apr 30, 2024

Uh oh!

legrosbuffle Apr 30, 2024

Choose a reason for hiding this comment

Uh oh!

gchatelet Apr 30, 2024

Choose a reason for hiding this comment

Uh oh!

nickdesaulniers May 20, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gchatelet commented May 14, 2024

Uh oh!

gchatelet commented May 14, 2024

Uh oh!

Uh oh!

Uh oh!

gchatelet commented Apr 30, 2024 •

edited

Loading