-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[clang] Introduce "binary" StringLiteral for #embed data #127629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
StringLiteral is used as internal data of EmbedExpr and we directly use it as an initializer if a single EmbedExpr appears in the initializer list of a char array. It is fast and convenient, but it is causing problems when string literal character values are checked because #embed data values are within a range [0-2^(char width)] but ordinary StringLiteral is of maybe signed char type. This PR introduces new kind of StringLiteral to hold binary data coming from an embedded resource to mitigate these problems. The new kind of StringLiteral is not assumed to have signed char type. The new kind of StringLiteral also helps to prevent crashes when trying to find StringLiteral token locations since these simply do not exist for binary data. Fixes llvm#119256
@llvm/pr-subscribers-clang Author: Mariya Podchishchaeva (Fznamznon) ChangesStringLiteral is used as internal data of EmbedExpr and we directly use it as an initializer if a single EmbedExpr appears in the initializer list of a char array. It is fast and convenient, but it is causing problems when string literal character values are checked because #embed data values are within a range [0-2^(char width)] but ordinary StringLiteral is of maybe signed char type. Fixes #119256 Full diff: https://github.com/llvm/llvm-project/pull/127629.diff 5 Files Affected:
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
llvm::APInt(N->getValue().getBitWidth(), | ||
EExpr->Data->BinaryData->getCodeUnit(CurOffset), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change that line (and nothing else) does it still works?
I have a feeling this is enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish it did. This particular line helps for generic case of #embed
inside of an initializer list. The rest that this patch is adding is for #embed "fast path" where we simply put StringLiteral to the initializer list instead of EmbedExpr when a single #embed is used to initialize char array.
When we do that, the iterators of EmbedExpr won't be in use and the fail from #119256 is still in place.
ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! It also needs a release note for the fix. :-)
In general, I think this seems reasonable, but I'd like confirmation from @cor3ntin given his somewhat recent thinking about unevaluated string literals and where those end up touching the rest of the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay
No changelog because we want to backport?
LGTM, but i think this could use some comments
@@ -1752,7 +1752,8 @@ enum class StringLiteralKind { | |||
UTF8, | |||
UTF16, | |||
UTF32, | |||
Unevaluated | |||
Unevaluated, | |||
Binary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining this is for embed?
I'm sorry it took me a while to understand how this patch works.
(The reason is that this allows us to not "cast" to char in getCodeUnitS()
- which is only used in C23 mode)
Maybe also add a comment in getCodeUnitS
and/or CheckC23ConstexprInitStringLiteral
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments in 8ff7d03 . Do you think it turned out helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, Thanks!
No changelog because I forgot. Should we backport though? |
This seems simple enough to warrant backporting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -1752,7 +1752,8 @@ enum class StringLiteralKind { | |||
UTF8, | |||
UTF16, | |||
UTF32, | |||
Unevaluated | |||
Unevaluated, | |||
Binary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, Thanks!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/12901 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/9355 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/9949 Here is the relevant piece of the build log for the reference
|
Ooops, I'll fix the bots quickly |
The bots are upset by llvm#127629 . Fix that.
The bots are upset by #127629 . Fix that.
…(#132201) The bots are upset by llvm/llvm-project#127629 . Fix that.
StringLiteral is used as internal data of EmbedExpr and we directly use it as an initializer if a single EmbedExpr appears in the initializer list of a char array. It is fast and convenient, but it is causing problems when string literal character values are checked because #embed data values are within a range [0-2^(char width)] but ordinary StringLiteral is of maybe signed char type. This PR introduces new kind of StringLiteral to hold binary data coming from an embedded resource to mitigate these problems. The new kind of StringLiteral is not assumed to have signed char type. The new kind of StringLiteral also helps to prevent crashes when trying to find StringLiteral token locations since these simply do not exist for binary data. Fixes llvm#119256
The bots are upset by llvm#127629 . Fix that.
StringLiteral is used as internal data of EmbedExpr and we directly use it as an initializer if a single EmbedExpr appears in the initializer list of a char array. It is fast and convenient, but it is causing problems when string literal character values are checked because #embed data values are within a range [0-2^(char width)] but ordinary StringLiteral is of maybe signed char type. This PR introduces new kind of StringLiteral to hold binary data coming from an embedded resource to mitigate these problems. The new kind of StringLiteral is not assumed to have signed char type. The new kind of StringLiteral also helps to prevent crashes when trying to find StringLiteral token locations since these simply do not exist for binary data. Fixes llvm#119256
The bots are upset by llvm#127629 . Fix that.
…(#132201) The bots are upset by llvm/llvm-project#127629 . Fix that.
StringLiteral is used as internal data of EmbedExpr and we directly use it as an initializer if a single EmbedExpr appears in the initializer list of a char array. It is fast and convenient, but it is causing problems when string literal character values are checked because #embed data values are within a range [0-2^(char width)] but ordinary StringLiteral is of maybe signed char type.
This PR introduces new kind of StringLiteral to hold binary data coming from an embedded resource to mitigate these problems. The new kind of StringLiteral is not assumed to have signed char type. The new kind of StringLiteral also helps to prevent crashes when trying to find StringLiteral token locations since these simply do not exist for binary data.
Fixes #119256