Skip to content

[clang-format] Break after string literals with trailing line breaks #76795

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 8, 2024

Conversation

kadircet
Copy link
Member

@kadircet kadircet commented Jan 3, 2024

This restores a subset of functionality that was forego in
d68826d.

Streaming multiple string literals is rare enough in practice, hence
that change makes sense in general. But it seems people were
incidentally relying on this for having line breaks after string
literals that ended with \n.

This patch tries to restore that behavior to prevent regressions in the
upcoming LLVM release, until we can implement some configuration based
approach as proposed in #69859.

@llvmbot
Copy link
Member

llvmbot commented Jan 3, 2024

@llvm/pr-subscribers-clang-format

Author: kadir çetinkaya (kadircet)

Changes

This restores a subset of functionality that was forego in
d68826d.

Streaming multiple string literals is rare enough in practice, hence
that change makes sense in general. But it seems people were
incidentally relying on this for having line breaks after string
literals that ended with \n.

This patch tries to restore that behavior to prevent regressions in the
upcoming LLVM release, until we can implement some configuration based
approach as proposed in #69859.


Full diff: https://github.com/llvm/llvm-project/pull/76795.diff

2 Files Affected:

  • (modified) clang/lib/Format/TokenAnnotator.cpp (+8)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+10)
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index 3ac3aa3c5e3a22..27c6bb0ef6fe4f 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -5151,6 +5151,14 @@ bool TokenAnnotator::mustBreakBefore(const AnnotatedLine &Line,
     return true;
   if (Left.IsUnterminatedLiteral)
     return true;
+  if (Right.is(tok::lessless) && Right.Next && Left.is(tok::string_literal) &&
+      // FIXME: Breaking after newlines seems useful in general. Turn this into
+      // an option and Recognize more cases like endl etc, and break independent
+      // of what comes after operator lessless.
+      Right.Next->is(tok::string_literal) &&
+      Left.TokenText.ends_with("\\n\"")) {
+    return true;
+  }
   if (Right.is(TT_RequiresClause)) {
     switch (Style.RequiresClausePosition) {
     case FormatStyle::RCPS_OwnLine:
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 2cafc0438ffb46..cd3ffb611839d2 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -10,6 +10,7 @@
 
 #include "FormatTestUtils.h"
 #include "TestLexer.h"
+#include "clang/Basic/TokenKinds.h"
 #include "gtest/gtest.h"
 
 namespace clang {
@@ -2499,6 +2500,15 @@ TEST_F(TokenAnnotatorTest, BraceKind) {
   EXPECT_BRACE_KIND(Tokens[6], BK_Block);
 }
 
+TEST_F(TokenAnnotatorTest, StreamOperator) {
+  auto Tokens = annotate("\"foo\\n\" << aux << \"foo\\n\" << \"foo\";");
+  ASSERT_EQ(Tokens.size(), 9u) << Tokens;
+  EXPECT_FALSE(Tokens[1]->MustBreakBefore);
+  EXPECT_FALSE(Tokens[3]->MustBreakBefore);
+  // Only break between string literals if the former ends with \n.
+  EXPECT_TRUE(Tokens[5]->MustBreakBefore);
+}
+
 } // namespace
 } // namespace format
 } // namespace clang

@s1Sharp
Copy link

s1Sharp commented Jan 3, 2024

It seems your changes remain compatible with the old clang-format behavior.
Сode looks quite correct

Copy link
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a formatting test?

This restores a subset of functionality that was forego in
d68826d.

Streaming multiple string literals is rare enough in practice, hence
that change makes sense in general. But it seems people were
incidentally relying on this for having line breaks after string
literals that ended with `\n`.

This patch tries to restore that behavior to prevent regressions in the
upcoming LLVM release, until we can implement some configuration based
approach as proposed in llvm#69859.
@kadircet kadircet force-pushed the break_after_newline branch from b09c89e to ecc4284 Compare January 4, 2024 07:37
@kadircet
Copy link
Member Author

kadircet commented Jan 4, 2024

Can you also add a formatting test?

done

@kadircet
Copy link
Member Author

kadircet commented Jan 5, 2024

ping @HazardyKnusperkeks @owenca this is blocking releases on our side, anything concerning here?

Comment on lines +5158 to +5159
Right.Next->is(tok::string_literal) && Left.is(tok::string_literal) &&
Left.TokenText.ends_with("\\n\"")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also handle \n and endl now. For example:

Suggested change
Right.Next->is(tok::string_literal) && Left.is(tok::string_literal) &&
Left.TokenText.ends_with("\\n\"")) {
Right.Next->is(tok::string_literal) &&
((Left.is(tok::string_literal) && Left.TokenText.ends_with("\\n\"")) ||
Left.TokenText == "'\\n'" || Left.TokenText == "endl")) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also handle \n and endl now. For example:

yes, but that'll be "new" behavior, and even if we do that, i'd rather do that in a separate patch that can be reviewed/reverted on it's own.

as mentioned, this is mostly to restore some of the "incidental" behavior that was relied on by codebases in the wild. so that changes in the new clang-format release will be minimal.

@kadircet kadircet requested a review from owenca January 8, 2024 09:20
@kadircet kadircet merged commit 27f5479 into llvm:main Jan 8, 2024
@kadircet kadircet deleted the break_after_newline branch January 8, 2024 10:11
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…lvm#76795)

This restores a subset of functionality that was forego in
d68826d.

Streaming multiple string literals is rare enough in practice, hence
that change makes sense in general. But it seems people were
incidentally relying on this for having line breaks after string
literals that ended with `\n`.

This patch tries to restore that behavior to prevent regressions in the
upcoming LLVM release, until we can implement some configuration based
approach as proposed in llvm#69859.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants