[clang-format] Add space after a word token #92741

Closed
wants to merge 2 commits into from

robincaloudis
Contributor

Closes #92688

@robincaloudis
Contributor Author

robincaloudis commented May 20, 2024

Since I am by no means an expert on Clang, a few questions arose:

  • clang-format formats the code correctly, with no code changes needed, when a different function name such as xooor is used instead of xor. Why is xor tokenized as a unary operator in C in the first place?
  • Do C and C++ use the same tokenization?
  • How can clang-format properly distinguish between C and C++?

@robincaloudis robincaloudis marked this pull request as ready for review May 20, 2024 19:19
@llvmbot
Member

llvmbot commented May 20, 2024

@llvm/pr-subscribers-clang-format

Author: Robin Caloudis (robincaloudis)

Changes

Closes #92688


Full diff: https://github.com/llvm/llvm-project/pull/92741.diff

2 Files Affected:

  • (modified) clang/lib/Format/TokenAnnotator.cpp (+2-1)
  • (modified) clang/unittests/Format/FormatTest.cpp (+7)
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index 7c4c76a91f2c5..7786b85e8a1fc 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -5280,7 +5280,8 @@ bool TokenAnnotator::spaceRequiredBefore(const AnnotatedLine &Line,
     // handled.
     if (Left.is(tok::amp) && Right.is(tok::r_square))
       return Style.SpacesInSquareBrackets;
-    return Style.SpaceAfterLogicalNot && Left.is(tok::exclaim);
+    return (Style.SpaceAfterLogicalNot && Left.is(tok::exclaim)) ||
+           Right.is(TT_BinaryOperator);
   }
 
   // If the next token is a binary operator or a selector name, we have
diff --git a/clang/unittests/Format/FormatTest.cpp b/clang/unittests/Format/FormatTest.cpp
index 6f57f10e12e88..ca0edd7b22630 100644
--- a/clang/unittests/Format/FormatTest.cpp
+++ b/clang/unittests/Format/FormatTest.cpp
@@ -24545,6 +24545,13 @@ TEST_F(FormatTest, STLWhileNotDefineChed) {
                "#endif // while");
 }
 
+TEST_F(FormatTest, BinaryOperatorAfterUnaryOperator) {
+  verifyFormat("void test(void) {\n"
+               "  static void (*xor)(uint8_t *, size_t, uint8_t);\n"
+               "  xor = resolve_xor_x86();\n"
+               "}");
+}
+
 TEST_F(FormatTest, OperatorSpacing) {
   FormatStyle Style = getLLVMStyle();
   Style.PointerAlignment = FormatStyle::PAS_Right;

@owenca
Contributor

owenca commented May 21, 2024

Since I am by no means an expert on Clang, a few questions arose:

  • clang-format formats the code correctly, with no code changes needed, when a different function name such as xooor is used instead of xor. Why is xor tokenized as a unary operator in C in the first place?
  • Do C and C++ use the same tokenization?

clang-format formats all C code as C++. Since xor is a C++ keyword, it's lexed as tok::caret, which is then erroneously annotated as TT_UnaryOperator. This bug was uncovered by #90161.
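
As a small illustration of why the keyword matters: in standard C++, `xor` is an alternative spelling of the `^` operator, whereas in C it is an ordinary identifier unless `<iso646.h>` is included. The sketch below uses two hypothetical helper functions (`viaKeyword` and `viaCaret`, which are not part of clang-format) to show that both spellings produce the same operator, which is consistent with a C++ lexer reporting `xor` as a caret token.

```cpp
// In C++, `xor` is an alternative token for `^`, so it cannot be used as an
// identifier the way the C snippet in the test case uses it.
// Both functions are hypothetical helpers written only for this illustration.
constexpr unsigned viaKeyword(unsigned A, unsigned B) { return A xor B; }
constexpr unsigned viaCaret(unsigned A, unsigned B) { return A ^ B; }

// Both spellings compile to the same operator.
static_assert(viaKeyword(0xCu, 0xAu) == 0x6u, "xor behaves like ^");
static_assert(viaKeyword(0xFFu, 0x0Fu) == viaCaret(0xFFu, 0x0Fu),
              "xor and ^ are interchangeable");
```

Some compilers only accept the alternative tokens in a conforming mode, so treat this as a sketch for a standard-conforming C++ compiler.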

  • How can clang-format properly distinguish between C and C++?

clang-format can't do it properly. @mydeveloperday, @HazardyKnusperkeks, and @rymiel may know more about why we didn't add LK_C to the Language option.

@@ -5280,7 +5280,8 @@ bool TokenAnnotator::spaceRequiredBefore(const AnnotatedLine &Line,
     // handled.
     if (Left.is(tok::amp) && Right.is(tok::r_square))
       return Style.SpacesInSquareBrackets;
-    return Style.SpaceAfterLogicalNot && Left.is(tok::exclaim);
+    return (Style.SpaceAfterLogicalNot && Left.is(tok::exclaim)) ||
+           Right.is(TT_BinaryOperator);
Contributor

This line was deleted in #90161. Adding it back will hide the uncovered bug and only fix #92688 superficially IMO. For example, SpaceBeforeAssignmentOperators will have no effect.
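
One way to picture the concern is with a toy model. Every type and function below (`ToyStyle`, `ToyToken`, `spaceRequiredPatched`) is a hypothetical stand-in rather than the real `clang::format` code, and the later assignment check is an assumption about where the style option would normally be consulted. The sketch only shows how an early return on a binary operator can bypass an option such as SpaceBeforeAssignmentOperators.

```cpp
// Toy model only: hypothetical stand-ins, not the real clang::format classes.
enum class Kind { UnaryOperator, BinaryOperator, Other };

struct ToyStyle {
  bool SpaceAfterLogicalNot = false;
  bool SpaceBeforeAssignmentOperators = true;
};

struct ToyToken {
  Kind Type = Kind::Other;
  bool IsExclaim = false;
  bool IsAssignment = false;
};

// Mirrors the shape of the proposed change: once Left is (mis)annotated as a
// unary operator, a binary operator on the right forces a space and the
// function returns before any later check can run.
bool spaceRequiredPatched(const ToyStyle &Style, const ToyToken &Left,
                          const ToyToken &Right) {
  if (Left.Type == Kind::UnaryOperator)
    return (Style.SpaceAfterLogicalNot && Left.IsExclaim) ||
           Right.Type == Kind::BinaryOperator;
  // Assumed later check that honors the style option.
  if (Right.IsAssignment)
    return Style.SpaceBeforeAssignmentOperators;
  return Right.Type == Kind::BinaryOperator;
}
```

In this model, with SpaceBeforeAssignmentOperators set to false, a left token annotated as a unary operator followed by `=` still gets a space, while an ordinary identifier followed by `=` does not. That difference is the sense in which the patch would mask the misannotation instead of fixing it.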

@mydeveloperday
Contributor

Because if this code was in a .h and not a .c you wouldn't know what language you were in

@mydeveloperday
Contributor

I thought we had some code like this around using a variable called "try"

@HazardyKnusperkeks
Contributor

Since I am by no means an expert on Clang, a few questions arose:

  • clang-format formats the code correctly, with no code changes needed, when a different function name such as xooor is used instead of xor. Why is xor tokenized as a unary operator in C in the first place?
  • Do C and C++ use the same tokenization?

clang-format formats all C code as C++. Since xor is a C++ keyword, it's lexed as tok::caret, which is then erroneously annotated as TT_UnaryOperator. This bug was uncovered by #90161.

  • How can clang-format properly distinguish between C and C++?

clang-format can't do it properly. @mydeveloperday, @HazardyKnusperkeks, and @rymiel may know more about why we didn't add LK_C to the Language option.

If I remember correctly, I was in favor of adding a C language.

Because if this code was in a .h and not a .c you wouldn't know what language you were in

There are certainly headers which are ambiguous, and we could add an option to set the language of such headers. But if we hit a class, a namespace, or a ::, we could be fairly certain that it is a C++ header. This would be similar to guessIsObjC, and we could skip the check if the new option is set to C++ (which of course would be the default, to keep existing behavior).
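
A minimal sketch of such a heuristic, assuming a plain text scan instead of clang-format's token stream: `looksLikeCpp` is a hypothetical helper and not an existing clang-format function, and it makes no attempt to skip comments, string literals, or preprocessor lines.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of clang-format: returns true if the text
// contains `::` or the whole words `class` or `namespace`.
// Simplification: comments and string literals are scanned like code.
bool looksLikeCpp(const std::string &Text) {
  for (std::size_t I = 0; I < Text.size();) {
    const unsigned char C = static_cast<unsigned char>(Text[I]);
    if (std::isalnum(C) || C == '_') {
      // Collect one identifier-like word so that `subclass` does not match.
      const std::size_t Begin = I;
      while (I < Text.size() &&
             (std::isalnum(static_cast<unsigned char>(Text[I])) ||
              Text[I] == '_'))
        ++I;
      const std::string Word = Text.substr(Begin, I - Begin);
      if (Word == "class" || Word == "namespace")
        return true;
      continue;
    }
    // A scope-resolution operator is a strong hint for C++.
    if (C == ':' && I + 1 < Text.size() && Text[I + 1] == ':')
      return true;
    ++I;
  }
  return false;
}
```

With this sketch, the declaration from the test case, `static void (*xor)(uint8_t *, size_t, uint8_t);`, is not flagged as C++, while `namespace foo {}` or `std::size_t n;` is. A header containing none of these markers stays ambiguous, which is the case the proposed option would cover.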

@owenca
Contributor

owenca commented May 22, 2024

@robincaloudis See #92880.

@robincaloudis
Contributor Author

robincaloudis commented May 22, 2024

Thanks @owenca, @mydeveloperday and @HazardyKnusperkeks for the explanation and insights! I'm closing this PR as @owenca found a much better solution.

@owenca
Contributor

owenca commented Feb 22, 2025

Because if this code was in a .h and not a .c you wouldn't know what language you were in

This would be addressed by #128287 and #128122.

Development

Successfully merging this pull request may close these issues.

[clang-format] A space is missing after a word token
5 participants