[clang] Warn about memset/memcpy to NonTriviallyCopyable types #111434

Open: wants to merge 4 commits into main

serge-sans-paille
Collaborator

This implements a warning similar to GCC's -Wclass-memaccess. memcpy and memmove require both pointer operands to point to trivially copyable types, so warn when that's not the case. The same check is applied to the destination of memset and bzero.
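As a minimal sketch, assuming C++17 and `<type_traits>`, here is the property the warning checks, plus one way to keep raw byte copies restricted to types where they are well-defined. `Trivial`, `NonTrivial` and `raw_copy` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical example type: implicit copy/move/destructor, so trivially copyable.
struct Trivial {
  int a;
  double b;
};

// Hypothetical example type: the user-provided copy constructor makes it
// NOT trivially copyable.
struct NonTrivial {
  NonTrivial() = default;
  NonTrivial(const NonTrivial &other) : value(other.value) {}
  int value = 0;
};

static_assert(std::is_trivially_copyable_v<Trivial>);
static_assert(!std::is_trivially_copyable_v<NonTrivial>);

// Hypothetical helper: refuses at compile time to memcpy a type that is not
// trivially copyable, i.e. the same calls the proposed warning targets.
template <typename T>
void raw_copy(T *dst, const T *src) {
  static_assert(std::is_trivially_copyable_v<T>,
                "memcpy requires a trivially copyable type");
  std::memcpy(dst, src, sizeof(T));
}

// raw_copy(&nt_dst, &nt_src);  // would fail the static_assert for NonTrivial
```

With the patch applied, a plain `memcpy(&nt_dst, &nt_src, sizeof(NonTrivial))` would be diagnosed instead; the `static_assert` simply moves that check into the type system.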

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 7, 2024
@llvmbot
Collaborator

llvmbot commented Oct 7, 2024

@llvm/pr-subscribers-pgo
@llvm/pr-subscribers-libcxx

@llvm/pr-subscribers-clang

Author: None (serge-sans-paille)

Full diff: https://github.com/llvm/llvm-project/pull/111434.diff

3 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+4)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+24)
  • (modified) clang/test/SemaCXX/constexpr-string.cpp (+4)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 583475327c5227..d9bff4a559b3b7 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -790,6 +790,10 @@ def warn_cstruct_memaccess : Warning<
   "%1 call is a pointer to record %2 that is not trivial to "
   "%select{primitive-default-initialize|primitive-copy}3">,
   InGroup<NonTrivialMemaccess>;
+def warn_cxxstruct_memaccess : Warning<
+  "%select{destination for|source of|first operand of|second operand of}0 this "
+  "%1 call is a pointer to record %2 that is not trivially-copyable">,
+  InGroup<NonTrivialMemaccess>;
 def note_nontrivial_field : Note<
   "field is non-trivial to %select{copy|default-initialize}0">;
 def err_non_trivial_c_union_in_invalid_context : Error<
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 2bcb930acdcb57..46dda34d0ac8f3 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -8899,18 +8899,42 @@ void Sema::CheckMemaccessArguments(const CallExpr *Call,
           << ArgIdx << FnName << PointeeTy
           << Call->getCallee()->getSourceRange());
     else if (const auto *RT = PointeeTy->getAs<RecordType>()) {
+
+      auto IsTriviallyCopyableCXXRecord = [](auto const *RT) {
+        auto const *D = RT->getDecl();
+        if (!D)
+          return true;
+        auto const *RD = dyn_cast<CXXRecordDecl>(D);
+        if (!RD)
+          return true;
+        RD = RD->getDefinition();
+        if (!RD)
+          return true;
+        return RD->isTriviallyCopyable();
+      };
+
       if ((BId == Builtin::BImemset || BId == Builtin::BIbzero) &&
           RT->getDecl()->isNonTrivialToPrimitiveDefaultInitialize()) {
         DiagRuntimeBehavior(Dest->getExprLoc(), Dest,
                             PDiag(diag::warn_cstruct_memaccess)
                                 << ArgIdx << FnName << PointeeTy << 0);
         SearchNonTrivialToInitializeField::diag(PointeeTy, Dest, *this);
+      } else if ((BId == Builtin::BImemset || BId == Builtin::BIbzero) &&
+                 !IsTriviallyCopyableCXXRecord(RT)) {
+        DiagRuntimeBehavior(Dest->getExprLoc(), Dest,
+                            PDiag(diag::warn_cxxstruct_memaccess)
+                                << ArgIdx << FnName << PointeeTy);
       } else if ((BId == Builtin::BImemcpy || BId == Builtin::BImemmove) &&
                  RT->getDecl()->isNonTrivialToPrimitiveCopy()) {
         DiagRuntimeBehavior(Dest->getExprLoc(), Dest,
                             PDiag(diag::warn_cstruct_memaccess)
                                 << ArgIdx << FnName << PointeeTy << 1);
         SearchNonTrivialToCopyField::diag(PointeeTy, Dest, *this);
+      } else if ((BId == Builtin::BImemcpy || BId == Builtin::BImemmove) &&
+                 !IsTriviallyCopyableCXXRecord(RT)) {
+        DiagRuntimeBehavior(Dest->getExprLoc(), Dest,
+                            PDiag(diag::warn_cxxstruct_memaccess)
+                                << ArgIdx << FnName << PointeeTy);
       } else {
         continue;
       }
diff --git a/clang/test/SemaCXX/constexpr-string.cpp b/clang/test/SemaCXX/constexpr-string.cpp
index c456740ef7551f..26e2e138ef34e0 100644
--- a/clang/test/SemaCXX/constexpr-string.cpp
+++ b/clang/test/SemaCXX/constexpr-string.cpp
@@ -603,12 +603,16 @@ namespace MemcpyEtc {
   };
   constexpr bool test_nontrivial_memcpy() { // expected-error {{never produces a constant}}
     NonTrivial arr[3] = {};
+    // expected-warning@+2 {{source of this '__builtin_memcpy' call is a pointer to record 'NonTrivial' that is not trivially-copyable}}
+    // expected-note@+1 {{explicitly cast the pointer to silence this warning}}
     __builtin_memcpy(arr, arr + 1, sizeof(NonTrivial)); // expected-note 2{{non-trivially-copyable}}
     return true;
   }
   static_assert(test_nontrivial_memcpy()); // expected-error {{constant}} expected-note {{in call}}
   constexpr bool test_nontrivial_memmove() { // expected-error {{never produces a constant}}
     NonTrivial arr[3] = {};
+    // expected-warning@+2 {{source of this '__builtin_memcpy' call is a pointer to record 'NonTrivial' that is not trivially-copyable}}
+    // expected-note@+1 {{explicitly cast the pointer to silence this warning}}
     __builtin_memcpy(arr, arr + 1, sizeof(NonTrivial)); // expected-note 2{{non-trivially-copyable}}
     return true;
   }
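To make the new `else if` branches easier to follow, here is a rough standalone model of when `warn_cxxstruct_memaccess` would fire. It is a sketch, not Clang's code: `std::is_trivially_copyable_v` stands in for `CXXRecordDecl::isTriviallyCopyable()`, and `MemFn`, `NonTrivial` and `wouldWarnCxxMemaccess` are hypothetical names. It also assumes a complete pointee type, since the patch's lambda skips records without a definition.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for Clang's Builtin::BImemset etc. "Other" covers any
// other builtin routed through the same check, which the new branches ignore.
enum class MemFn { Memset, Bzero, Memcpy, Memmove, Other };

// Hypothetical example type: a user-provided destructor makes it
// not trivially copyable.
struct NonTrivial {
  ~NonTrivial() {}
};

// Rough model of the new C++ branches; Pointee must be a complete type.
template <typename Pointee>
constexpr bool wouldWarnCxxMemaccess(MemFn fn) {
  using T = std::remove_cv_t<Pointee>;
  if constexpr (!(std::is_class_v<T> || std::is_union_v<T>)) {
    // Not a record type: e.g. void after an explicit (void*) cast, or a scalar.
    return false;
  } else {
    // The new branches only cover memset/bzero and memcpy/memmove.
    if (fn == MemFn::Other)
      return false;
    return !std::is_trivially_copyable_v<T>;
  }
}
```

The `void` case is why casting the pointer explicitly silences the diagnostic: once the pointee type is no longer a record, the record branches are never reached.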

@serge-sans-paille
Collaborator Author

This change is triggering warnings at several points in the codebase; I'll fix those too.

@serge-sans-paille serge-sans-paille requested a review from a team as a code owner October 8, 2024 12:47
@llvmbot llvmbot added compiler-rt libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. PGO Profile Guided Optimizations llvm:adt labels Oct 8, 2024
@serge-sans-paille
Collaborator Author

  • Added patches to make sure LLVM, libc++ and compiler-rt don't trigger the new warning.

@serge-sans-paille
Collaborator Author

cc @ldionne for the compiler-rt part.

@serge-sans-paille
Collaborator Author

Some of the libc++ builds fail; they are marked as "build stopped: interrupted by user." I can't find the cause.

@@ -102,7 +102,7 @@ struct __aliasing_iterator_wrapper {
 
   _LIBCPP_HIDE_FROM_ABI _Alias operator*() const _NOEXCEPT {
     _Alias __val;
-    __builtin_memcpy(&__val, std::__to_address(__base_), sizeof(value_type));
+    __builtin_memcpy(&__val, reinterpret_cast<void*>(std::__to_address(__base_)), sizeof(value_type));
Contributor

IMO this is a false positive. This inspects the non-trivial object and implicitly starts the lifetime of _Alias, which is perfectly well-defined behaviour.
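A minimal sketch of the pattern under discussion, assuming C++17: the leading bytes of a non-trivially-copyable object are copied into a trivially copyable `Alias`, and the source pointer is explicitly cast to `const void*`, which per the patch's existing logic keeps the new warning quiet. `Wrapped` and `read_as` are hypothetical names, not libc++'s; whether such a read is well-defined for every source type is exactly the point debated in this thread.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical source type: not trivially copyable (user-provided copy ctor),
// but standard-layout, so its first member sits at offset 0.
struct Wrapped {
  explicit Wrapped(int v) : value(v) {}
  Wrapped(const Wrapped &other) : value(other.value) {}
  int value;
};

static_assert(!std::is_trivially_copyable_v<Wrapped>);
static_assert(std::is_standard_layout_v<Wrapped>);

// Copy the leading sizeof(Alias) bytes of *src into a fresh Alias object.
// Only the destination type is required to be trivially copyable here.
template <typename Alias, typename Source>
Alias read_as(const Source *src) {
  static_assert(std::is_trivially_copyable_v<Alias>);
  static_assert(sizeof(Alias) <= sizeof(Source));
  Alias val;
  // The explicit cast makes the pointee type void, so the proposed
  // warning does not inspect Source.
  std::memcpy(&val, static_cast<const void *>(src), sizeof(Alias));
  return val;
}
```

For example, `read_as<int>(&w)` on a `Wrapped w(42)` reads back the stored `int`.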

Contributor

Also, the cast should probably be a static_cast<void*>.

Collaborator Author

Ack on the false positive. I'll switch to a static_cast, which should silence the warning.

@carlosgalvezp
Contributor

carlosgalvezp commented Oct 9, 2024

Please note: GCC is stricter; for std::memset, it requires the type to be trivial, not just trivially copyable:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107361

I seem to remember it was also stricter for std::memcpy, but I haven't been able to reproduce it.

I haven't looked at the patch in detail, but for consistency with GCC it should also support suppressing the warning by explicitly casting the pointers to void* (maybe it's already in place).

@serge-sans-paille
Collaborator Author

> Please note: GCC is stricter; for std::memset, it requires the type to be trivial, not just trivially copyable:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107361

ACK. Not a standard requirement though, is it?

> I seem to remember it was also stricter for std::memcpy, but I haven't been able to reproduce it.
>
> I haven't looked at the patch in detail, but for consistency with GCC it should also support suppressing the warning by explicitly casting the pointers to void* (maybe it's already in place).

Yeah, that's already part of the current logic in Clang (hence the various casts added in this PR).

@carlosgalvezp
Contributor

> ACK. Not a standard requirement though, is it?

Correct, it's only UB for non-trivially-copyable types. My point was more about whether we want to be consistent with GCC or not. I don't have a strong opinion on that.

@philnik777
Contributor

> > ACK. Not a standard requirement though, is it?
>
> Correct, it's only UB for non-trivially-copyable types. My point was more about whether we want to be consistent with GCC or not. I don't have a strong opinion on that.

I can see an argument that a user is probably doing something wrong if they memset a non-trivial type, especially since compilers are perfectly capable of optimizing zero-initializing loops into memsets. For memcpy/memmove we should definitely only diagnose non-trivially-copyable cases, since using memcpy on trivially copyable objects is a very common optimization.
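The difference between the two policies can be seen with standard type traits. A minimal sketch, assuming C++17 (note that `std::is_trivial` is deprecated as of C++26): `Counter` is a hypothetical type that is trivially copyable but not trivial, so a GCC-style "trivial only" rule for memset would flag it, while memcpy of it is fine under the trivially-copyable rule this PR implements. `memset_allowed_strict`, `memcpy_allowed` and `copy_counter` are hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical type: the default member initializer makes the default
// constructor non-trivial, but copying it is still trivial.
struct Counter {
  int count = 42;
};

static_assert(std::is_trivially_copyable_v<Counter>);
static_assert(!std::is_trivial_v<Counter>);

// Hypothetical predicates mirroring the two rules discussed above.
template <typename T>
constexpr bool memset_allowed_strict = std::is_trivial_v<T>;  // GCC-style memset rule
template <typename T>
constexpr bool memcpy_allowed = std::is_trivially_copyable_v<T>;  // rule in this PR

// memcpy of a trivially copyable type is well-defined and preserves the value.
inline Counter copy_counter(const Counter &src) {
  Counter dst;  // count starts at 42 and is then overwritten
  std::memcpy(&dst, &src, sizeof(Counter));
  return dst;
}
```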

Collaborator

@AaronBallman AaronBallman left a comment

Thank you for this! It's missing a lot of test coverage that should be added to clang/test/SemaCXX, and you should also add a release note to clang/docs/ReleaseNotes.rst so users know about the new diagnostic.

Comment on lines 794 to 795
"%select{destination for|source of|first operand of|second operand of}0 this "
"%1 call is a pointer to record %2 that is not trivially-copyable">,
Collaborator

Suggested change
-  "%select{destination for|source of|first operand of|second operand of}0 this "
-  "%1 call is a pointer to record %2 that is not trivially-copyable">,
+  "%select{destination for|source of|first operand of|second operand of}0 call to "
+  "%1 is a pointer to non-trivially copyable type %2">,

Slight rewording

@@ -8899,18 +8899,42 @@ void Sema::CheckMemaccessArguments(const CallExpr *Call,
           << ArgIdx << FnName << PointeeTy
           << Call->getCallee()->getSourceRange());
     else if (const auto *RT = PointeeTy->getAs<RecordType>()) {
+
+      auto IsTriviallyCopyableCXXRecord = [](auto const *RT) {
Collaborator

Suggested change
-      auto IsTriviallyCopyableCXXRecord = [](auto const *RT) {
+      auto IsTriviallyCopyableCXXRecord = [](const RecordType *RT) {

Do not use auto when there's only one type this can be called with (I just spent a lot of time writing comments that I had to delete because this can only accept a RecordType).

Actually, I don't think we even need the lambda. You could precalculate this as a local variable:

bool IsTriviallyCopyableCXXRecord = false;
if (const auto *RD = RT->getAsCXXRecordDecl())
  IsTriviallyCopyableCXXRecord = RD->isTriviallyCopyable();

should suffice. (The call should return false if the record has no definition, because that's an incomplete type; that's a good test case to make sure we have.)

Collaborator

Actually, @erichkeane just reminded me that we have QualType::isTriviallyCopyableType() -- that's an even better interface to use IMO.

@@ -8899,18 +8899,42 @@ void Sema::CheckMemaccessArguments(const CallExpr *Call,
           << ArgIdx << FnName << PointeeTy
           << Call->getCallee()->getSourceRange());
     else if (const auto *RT = PointeeTy->getAs<RecordType>()) {
+
+      auto IsTriviallyCopyableCXXRecord = [](auto const *RT) {
+        auto const *D = RT->getDecl();
Collaborator

Suggested change
-        auto const *D = RT->getDecl();
+        const Decl *D = RT->getDecl();

Please only use auto when the type is spelled out in the initialization. Also, we use const Type and not Type const as the prevailing style, so you should stick with that.

+        auto const *D = RT->getDecl();
+        if (!D)
+          return true;
+        auto const *RD = dyn_cast<CXXRecordDecl>(D);
Collaborator

Suggested change
-        auto const *RD = dyn_cast<CXXRecordDecl>(D);
+        const auto *RD = dyn_cast<CXXRecordDecl>(D);

@@ -102,7 +102,7 @@ struct __aliasing_iterator_wrapper {
 
   _LIBCPP_HIDE_FROM_ABI _Alias operator*() const _NOEXCEPT {
     _Alias __val;
-    __builtin_memcpy(&__val, std::__to_address(__base_), sizeof(value_type));
+    __builtin_memcpy(&__val, static_cast<const void*>(std::__to_address(__base_)), sizeof(value_type));
Contributor

Just to make it visible again: IMO this is a false positive, since this is only inspecting a non-trivial type, which is perfectly well-defined behaviour.

Collaborator Author

There's one thing I don't understand: my understanding is that passing a pointer to a non-trivially copyable type as memcpy's first or second argument is UB, and _BaseIter may be a non-trivially copyable type, so what happens then?

@serge-sans-paille
Collaborator Author

@AaronBallman doc, tests and style updated. Thanks for the review!

Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category compiler-rt libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. llvm:adt PGO Profile Guided Optimizations
5 participants