Better safeguard against incompatible builtin handles between dialects #15961

clonker · 2025-03-20T12:12:26Z

Inspired by the remarks by @cameel in #15952 (comment), I have refactored EVMDialect a bit so that there is a class for generating the builtin vector. It automatically inserts blanks if conditions aren't met and does not give direct access to the underlying raw data structure.
Also, I have added a unit test which takes the default dialect (without EOF) as well as the latest dialect with EOF and tests whether the populated builtin functions with their handles contained in the respective dialects are preserved when compared to all other dialects with and without object access.

cameel

Looks like a step in a good direction, but if we're refactoring these builtins, there's a way to do it that will more completely eliminate the possibility of introducing these kinds of mistakes. We should simply separate their declarations from the logic that defines them. See comments below for details.

cameel · 2025-03-20T20:22:59Z

libyul/backends/evm/EVMDialect.cpp

 	if (!_eofVersion.has_value())
 	{
-		builtins.emplace_back(createIfObjectAccess("datasize", 1, 1, SideEffects{}, ControlFlowSideEffects{}, {LiteralKind::String}, [](
+		builtins.addIfObjectAccess(_objectAccess, "datasize", 1, 1, SideEffects{}, ControlFlowSideEffects{}, {LiteralKind::String}, [](
 			FunctionCall const& _call,


The fact that now some conditions are handled here and some in Builtins still bothers me a bit. I think it would be best if we made the whole thing more declarative. I.e. have one place that declares the builtins along with their scopes (i.e. EVM/EOF version range and state of object access). Then have another place that loops over all builtins that decides whether to actually define them based on their scopes.

The definition loop would then be the single place that decides whether the opcode should be skipped and whether to leave an empty spot in m_functions.

This would be a more comprehensive solution for the problem, because we would no longer touch that definition loop when adding new opcodes. We would only touch declarations and scopes without the risk of accidentally breaking the logic.
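A minimal sketch of that split, not the actual EVMDialect code. `Scope`, `BuiltinDeclaration`, `allDeclarations` and `defineBuiltins` are hypothetical names, the two boolean parameters stand in for the full EVM/EOF version range, and the scope given to `add` is an assumption made for illustration.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical scope of a builtin: the conditions under which a dialect defines it.
struct Scope
{
	bool requiresObjectAccess = false;
	bool requiresEOF = false;
	bool requiresNonEOF = false;
};

// Declaration only: a name plus its scope, no dialect-specific logic.
struct BuiltinDeclaration
{
	std::string name;
	Scope scope;
};

// The only place that is edited when a builtin is added.
std::vector<BuiltinDeclaration> const& allDeclarations()
{
	static std::vector<BuiltinDeclaration> const declarations = {
		{"add", {}},
		{"datasize", {true, false, true}},
		{"eofcreate", {true, true, false}},
		{"returncontract", {true, true, false}},
	};
	return declarations;
}

// Definition loop: decides per declaration whether to define it or leave an empty slot.
// Slots are never dropped, so index i refers to the same builtin in every dialect.
std::vector<std::optional<std::string>> defineBuiltins(bool _objectAccess, bool _eof)
{
	std::vector<std::optional<std::string>> functions;
	for (BuiltinDeclaration const& declaration: allDeclarations())
	{
		Scope const& scope = declaration.scope;
		bool const inScope =
			(!scope.requiresObjectAccess || _objectAccess) &&
			(!scope.requiresEOF || _eof) &&
			(!scope.requiresNonEOF || !_eof);
		if (inScope)
			functions.emplace_back(declaration.name);
		else
			functions.emplace_back(std::nullopt);
	}
	return functions;
}
```

With this shape, `defineBuiltins(false, false)` and `defineBuiltins(true, false)` return vectors of the same length, and the slot for `datasize` is empty in the first and filled in the second, so a handle taken from one stays meaningful in the other.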

BTW, not sure "scope" is the best way to call it, but I could not think of anything better so far. Could be just "conditions", but that sounds very non-specific.

After thinking about it for a while, I think I do like Scope!

cameel · 2025-03-20T21:11:14Z

libyul/backends/evm/EVMDialect.cpp

+			!(_opcode >= evmasm::Instruction::DUP1 && _opcode <= evmasm::Instruction::DUP16) &&
+			!(_opcode >= evmasm::Instruction::SWAP1 && _opcode <= evmasm::Instruction::SWAP16) &&
+			!isPushInstruction(_opcode) &&
+			_opcode != evmasm::Instruction::JUMP &&
+			_opcode != evmasm::Instruction::JUMPI &&
+			_opcode != evmasm::Instruction::JUMPDEST &&
+			_opcode != evmasm::Instruction::DATALOADN &&
+			_opcode != evmasm::Instruction::EOFCREATE &&
+			_opcode != evmasm::Instruction::RETURNCONTRACT &&
+			_opcode != evmasm::Instruction::RJUMP &&
+			_opcode != evmasm::Instruction::RJUMPI &&
+			_opcode != evmasm::Instruction::CALLF &&
+			_opcode != evmasm::Instruction::JUMPF &&
+			_opcode != evmasm::Instruction::DUPN &&
+			_opcode != evmasm::Instruction::SWAPN &&
+			_opcode != evmasm::Instruction::RETF &&


For builtins based on opcodes, this part of the condition is the declarative part. I'd move it to a separate function that returns a boolean. Then when adding a new opcode we'd only really modify that function.

In fact, I'd actually create more than one function. I think we should make it more semantic by having smaller functions expressing specific properties the excluded opcodes satisfy:

All the DUP/SWAP/PUSH variants: low-level stack manipulation opcodes.

We should also use SemanticInformation::isSwapInstruction()/SemanticInformation::isDupInstruction() here.

All the JUMP variants and RETF/CALLF: low-level control flow opcodes

DATALOADN, EOFCREATE, RETURNCONTRACT: these are actually exposed, but not directly. We have builtins replacing them. We can define a separate function for them, but a nicer alternative would be to be able to declare for a builtin that it replaces an instruction and have the definition logic exclude that instruction automatically.

Executing these functions and EVMVersion::hasOpcode() would then be a fixed part of the definition logic and would not have to be adjusted. It would be the equivalent of checking the scope of a builtin.
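To illustrate the grouping, here is a rough sketch that uses a stand-in `Instruction` enum with only a handful of opcodes instead of `evmasm::Instruction`. `isReplacedByBuiltin` and `isExposedAsBuiltin` are hypothetical names, and real code would presumably delegate to `SemanticInformation::isSwapInstruction()`/`isDupInstruction()` rather than list opcodes by hand.

```cpp
// Stand-in for evmasm::Instruction; only a few opcodes, enumerator values are arbitrary.
enum class Instruction
{
	ADD, DUP1, DUP16, SWAP1, SWAP16, PUSH1, DUPN, SWAPN,
	JUMP, JUMPI, JUMPDEST, RJUMP, RJUMPI, CALLF, JUMPF, RETF,
	DATALOADN, EOFCREATE, RETURNCONTRACT
};

// DUP/SWAP/PUSH variants: low-level stack manipulation.
bool isLowLevelStackManipulationInstruction(Instruction _opcode)
{
	switch (_opcode)
	{
	case Instruction::DUP1:
	case Instruction::DUP16:
	case Instruction::DUPN:
	case Instruction::SWAP1:
	case Instruction::SWAP16:
	case Instruction::SWAPN:
	case Instruction::PUSH1:
		return true;
	default:
		return false;
	}
}

// JUMP variants and RETF/CALLF: low-level control flow.
bool isLowLevelControlFlowInstruction(Instruction _opcode)
{
	switch (_opcode)
	{
	case Instruction::JUMP:
	case Instruction::JUMPI:
	case Instruction::JUMPDEST:
	case Instruction::RJUMP:
	case Instruction::RJUMPI:
	case Instruction::CALLF:
	case Instruction::JUMPF:
	case Instruction::RETF:
		return true;
	default:
		return false;
	}
}

// Opcodes that are exposed only through a replacement builtin.
bool isReplacedByBuiltin(Instruction _opcode)
{
	return
		_opcode == Instruction::DATALOADN ||
		_opcode == Instruction::EOFCREATE ||
		_opcode == Instruction::RETURNCONTRACT;
}

// Fixed part of the definition logic: it does not change when an opcode is added.
bool isExposedAsBuiltin(Instruction _opcode)
{
	return
		!isLowLevelStackManipulationInstruction(_opcode) &&
		!isLowLevelControlFlowInstruction(_opcode) &&
		!isReplacedByBuiltin(_opcode);
}
```

Adding a new excluded opcode then means touching exactly one of the three small predicates, while `isExposedAsBuiltin` stays as it is.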

I have separated out the builtin function definitions into their own class, which can then serve as a reference database for all EVM dialects. They can iterate over it and declare that they have a specific function by simply referring to it.
That should also reduce memory pressure a bit, as there's only ever one copy of a builtin function (ignoring these no-output builtins).
So in principle the dialect is then simply a bool field (or implicitly convertible to one at least) over all possible builtins.

cameel · 2025-03-20T21:27:13Z

libyul/backends/evm/EVMDialect.cpp

+
+/// Make sure to only add builtins in a way that is consistent over EVM versions. If the order depends on the
+/// EVM version - which can easily happen using conditionals -, different dialects' builtin handles
+/// become inherently incompatible.


Thinking about it, the compatibility between dialects with and without object access should be more prominently documented in the header. I'd mention it for strictAssemblyForEVMObjects() and maybe also for the whole EVMDialect class.

TBH we guarantee so much compatibility between them that I'm not sure we should even be calling these distinct dialects. More like flavors of the same dialect. It seems like an orthogonal property that could apply to any dialect.

test/libyul/EVMDialectCompatibility.h

test/libyul/EVMDialectCompatibility.cpp

cameel · 2025-03-20T22:15:49Z

test/libyul/EVMDialectCompatibility.cpp

+	builtin_function_handle_compatibility_eof,
+	bdata::monomorphic::grid(
+		bdata::make(generateEVMDialectConfigurationsToTest(std::nullopt)) + bdata::make(generateEVMDialectConfigurationsToTest(1)),


This seems to actually be testing both EOF and non-EOF despite the name.

And a much more relevant difference is that it's using the latest rather than the default EVM version. In the not-so-distant future both current and latest will have EOF, but the fact that current is not always the latest will remain. If you replace the withEOF check with withEOF && evmVersion.supportsEOF() it will be more future-proof. You can then unify the tests and make the source EVM version just another dimension.

Also, I think we should have a test case where we cover the "inline" -> "with objects" conversion for all dialects. Converting between distinct dialects is nice to have, but these pairs are the most relevant, because for them we actually do perform such a conversion in practice.

github-actions · 2025-04-18T12:06:17Z

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

cameel

Looks good overall, I see no big issues, but there are a few minor things that could be improved or cleaned up.

Especially the instruction and replaced scopes seem redundant to me.

libyul/backends/evm/EVMBuiltins.cpp

cameel · 2025-05-12T11:17:09Z

libyul/backends/evm/EVMBuiltins.cpp

+	m_functions.emplace_back(objectAccess | requiresEOF, eofcreateBuiltin());
+	m_functions.emplace_back(objectAccess | requiresEOF, returncontractBuiltin());
+
+	using namespace std::string_literals;


Let's keep these at the top of the file.

It seems better to me if using-directives are as local as possible: less potential for conflicting definitions. I have rewritten it to not use the string literal at all, so your comment is resolved, but still. This is a general pet peeve of mine with the style we use in implementation units. I'd prefer if we explicitly declared the namespaces (or namespace hierarchies) and the implementation lived in them.

cameel · 2025-05-12T11:42:16Z

libyul/backends/evm/EVMBuiltins.cpp

+			opcode == evmasm::Instruction::SWAPN ||
+			opcode == evmasm::Instruction::DUPN ||
+			evmasm::SemanticInformation::isSwapInstruction(opcode) ||
+			evmasm::SemanticInformation::isDupInstruction(opcode)


These functions already check for SWAPN/DUPN.

Suggested change

-			opcode == evmasm::Instruction::SWAPN ||
-			opcode == evmasm::Instruction::DUPN ||
-			evmasm::SemanticInformation::isSwapInstruction(opcode) ||
-			evmasm::SemanticInformation::isDupInstruction(opcode)
+			SemanticInformation::isSwapInstruction(opcode) ||
+			SemanticInformation::isDupInstruction(opcode)

Same in isLowLevelStackManipulationInstruction().

Also, shouldn't you be skipping PUSH and isLowLevelControlFlowInstruction() here as well? We don't expose them in any dialect.

> These functions already check for SWAPN/DUPN.
> Same in isLowLevelStackManipulationInstruction().

I purposefully did it that way because of how AssemblyItem is designed. We have the opcode here (type evmasm::Instruction) which is implicitly converted to evmasm::AssemblyItem. We added asserts to not create assembly items for SWAPN/DUPN with the implicit conversion constructor but to always use the static helper methods. This makes it necessary to test for them individually and beforehand. It is quite unfortunate.

> Also, shouldn't you be skipping PUSH and isLowLevelControlFlowInstruction() here as well? We don't expose them in any dialect.

Not if we want the builtins thing to be a field over all possible (except for swap/dup) functions - which was my intention at least.

libyul/backends/evm/EVMBuiltins.h

cameel · 2025-05-12T11:59:18Z

libyul/backends/evm/EVMBuiltins.h

+	static Scopes constexpr instruction{1 << instructionBit};
+	static Scopes constexpr replaced{1 << replacedInstructionBit};
+	static Scopes constexpr objectAccess{1 << objectAccessBit};
+	static Scopes constexpr requiresEOF{1 << requiresEOFBit};
+	static Scopes constexpr requiresNonEOF{1 << requiresNonEOFBit};


Suggested change

-	static Scopes constexpr instruction{1 << instructionBit};
-	static Scopes constexpr replaced{1 << replacedInstructionBit};
-	static Scopes constexpr objectAccess{1 << objectAccessBit};
-	static Scopes constexpr requiresEOF{1 << requiresEOFBit};
-	static Scopes constexpr requiresNonEOF{1 << requiresNonEOFBit};
+	static Scopes constexpr instructionScope{1 << instructionBit};
+	static Scopes constexpr nonInstructionScope{1 << replacedInstructionBit};
+	static Scopes constexpr objectAccessScope{1 << objectAccessBit};
+	static Scopes constexpr eofScope{1 << requiresEOFBit};
+	static Scopes constexpr nonEOFScope{1 << requiresNonEOFBit};

Alternatively, might be enough to place them inside Scopes without the suffix so that you can refer to them as e.g. Scope::eof, but that would conflict with functions you already have there.

I prefer it without the Scope suffix tbh. The type already tells the rest of the story.

libyul/backends/evm/EVMBuiltins.cpp

cameel · 2025-05-12T12:44:34Z

libyul/backends/evm/EVMDialect.cpp

+		builtinShouldBeAdded &= !scopes.requiresEOF() || _eofVersion.has_value();
+		builtinShouldBeAdded &= !scopes.requiresNonEOF() || !_eofVersion.has_value();


Please assert that both flags are not set at the same time.

Same for instruction and replaced.

Agree for the EOF ones, added an assert for that. Not sure why instruction and replaced should be mutually exclusive though. One means that it is the builtin belonging to an instruction, replaced means it was replaced by something else and shouldn't be added to a dialect (in favor of whatever it was replaced with).

Ah you mean that replaced should only be set if instruction is set? I'm not sure if that isn't too restrictive. Happy to add it if you insist but I don't find it particularly necessary.
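For reference, the EOF part of this exchange can be sketched as follows. This is a reduced, hypothetical version of the scope flags: it keeps only three bits, `Scopes::bit` and the free function `builtinShouldBeAdded` are illustrative names, and the real class in EVMBuiltins.h may look different.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical, reduced scope flags: object access, EOF only, non-EOF only.
struct Scopes
{
	std::bitset<3> value;

	static Scopes bit(std::size_t _index)
	{
		Scopes scopes;
		scopes.value.set(_index);
		return scopes;
	}

	bool objectAccess() const { return value.test(0); }
	bool requiresEOF() const { return value.test(1); }
	bool requiresNonEOF() const { return value.test(2); }
};

Scopes operator|(Scopes _a, Scopes _b)
{
	return Scopes{_a.value | _b.value};
}

Scopes const objectAccess = Scopes::bit(0);
Scopes const requiresEOF = Scopes::bit(1);
Scopes const requiresNonEOF = Scopes::bit(2);

bool builtinShouldBeAdded(Scopes _scopes, bool _objectAccess, std::optional<unsigned> _eofVersion)
{
	// A builtin cannot require EOF and the absence of EOF at the same time.
	assert(!(_scopes.requiresEOF() && _scopes.requiresNonEOF()));

	bool shouldBeAdded = true;
	shouldBeAdded &= !_scopes.objectAccess() || _objectAccess;
	shouldBeAdded &= !_scopes.requiresEOF() || _eofVersion.has_value();
	shouldBeAdded &= !_scopes.requiresNonEOF() || !_eofVersion.has_value();
	return shouldBeAdded;
}
```

For example, `builtinShouldBeAdded(objectAccess | requiresEOF, true, 1)` holds, while the same scopes with `std::nullopt` as the EOF version do not.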

cameel · 2025-05-14T16:12:15Z

libyul/backends/evm/EVMBuiltins.h

+		/// whether the corresponding evm builtin function is an instruction builtin
+		bool instruction() const { return value.test(instructionBit); }
+		/// whether the corresponding evm builtin has been replaced by another builtin, ie, should be skipped
+		bool replaced() const { return value.test(replacedInstructionBit); }


I'm not sure we need these two. As long as an instruction matches the EVM version and EOF, we always add it.

And when we replace an instruction we do not use the original in any dialect. We can just replace it permanently in EVMBuiltins and not even expose the fact that it has been replaced to EVMDialect.

I mean, it could make sense to also go the other way and just define builtins for all instructions and only have EVMDialect worry about choosing the ones it wants (possibly never choosing some of them). But you don't seem to be going for that here, since you don't create builtins for SWAP and DUP instructions. If you're skipping some of them, you can just as well skip all the ones we never use.

My idea was indeed to just declare all of them. Then the dialect is essentially just a bool field over all functions. For SWAP and DUP I made an exception because we currently cannot define functions for them because we can't define side effects:

solidity/libevmasm/SemanticInformation.cpp

Line 481 in 68c4aa3

assertThrow(!isDupInstruction(_instruction) && !isSwapInstruction(_instruction), AssemblyException, "");

cameel · 2025-05-14T17:39:37Z

libyul/backends/evm/NoOutputAssembly.cpp

+		}
+
+		return modifiedBuiltins;
+	}();


I'm not a fan of this pattern. The () at the end just doesn't stand out enough. And it's not just me: Improving Readability of IIFE.

These two especially would be perfectly fine as normal, static functions with well-defined inputs and outputs and descriptive names.

Conversely, you'd probably be able to find people who like the pattern, and evidence in its favor. I do, for example. It makes it very easy for me to see that a variable is assigned the result of an expression and that the implementation of the expression is right below it. It is a matter of preference and of what your brain can easily digest when reading it, as is the case with many things :)

Here especially I think the function is quite localized and does not necessarily warrant a 'full' static function, the IIFE improves readability and also clarifies scoping for definition of m_functions. The second one is purposefully not static as it depends on the input dialect.

I have modified it so that (I hope) it is more to your liking but I want to note that I think it rather hurts readability (for me at least) and also doesn't change anything in terms of semantics.
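For readers unfamiliar with the two styles being weighed here, a small side-by-side sketch. The function names and the filtering rule are made up for illustration and have nothing to do with the actual NoOutputAssembly.cpp code.

```cpp
#include <string>
#include <vector>

// Named static function: explicit inputs and outputs, descriptive name.
static std::vector<std::string> withoutEmptyNames(std::vector<std::string> const& _names)
{
	std::vector<std::string> result;
	for (std::string const& name: _names)
		if (!name.empty())
			result.push_back(name);
	return result;
}

std::vector<std::string> viaNamedFunction(std::vector<std::string> const& _names)
{
	std::vector<std::string> const filtered = withoutEmptyNames(_names);
	return filtered;
}

std::vector<std::string> viaIIFE(std::vector<std::string> const& _names)
{
	// Immediately invoked lambda: the initialization logic sits right below the variable,
	// but the only hint that it is called is the trailing "()".
	std::vector<std::string> const filtered = [&]() {
		std::vector<std::string> result;
		for (std::string const& name: _names)
			if (!name.empty())
				result.push_back(name);
		return result;
	}();
	return filtered;
}
```

Both variants compute the same value and allow `filtered` to be `const`; the difference is only in where the logic lives and how visible the call is.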

clonker requested a review from cameel March 20, 2025 12:12

clonker mentioned this pull request Mar 20, 2025

IRGeneratorForStatements: Remove outdated check against usr$ prefixing of builtins #15952

Merged

clonker force-pushed the safeguard_builtin_handles branch 2 times, most recently from 27f4e90 to 106775f Compare March 20, 2025 15:17

cameel reviewed Mar 20, 2025

cameel added the refactor label Mar 20, 2025

clonker force-pushed the safeguard_builtin_handles branch from 106775f to 3e3258d Compare April 4, 2025 05:27

github-actions bot added the stale label Apr 18, 2025

cameel removed the stale label Apr 18, 2025

clonker force-pushed the safeguard_builtin_handles branch 19 times, most recently from 5128ef7 to 3ed3248 Compare April 29, 2025 08:26

clonker requested a review from cameel April 29, 2025 08:38

clonker force-pushed the safeguard_builtin_handles branch from 3ed3248 to 8a7d619 Compare April 29, 2025 08:47

clonker force-pushed the safeguard_builtin_handles branch 3 times, most recently from 5784211 to b19cde2 Compare May 9, 2025 12:47

cameel reviewed May 14, 2025

clonker force-pushed the safeguard_builtin_handles branch from b19cde2 to 4a48ff7 Compare May 22, 2025 08:53

clonker added 2 commits May 22, 2025 11:21

Add EVMVersion::current() helper

e37a5e2

First define _all_ EVM functions, then declare them based on EVM version and configuration

f2f1a0e

clonker force-pushed the safeguard_builtin_handles branch from 4a48ff7 to a8e5124 Compare May 22, 2025 09:21

clonker added 4 commits May 22, 2025 11:33

Separate out builtin function collection into its own class

fd67703

This makes for a better separation between the declarations and the logic that defines builtins. # Conflicts: # libyul/backends/evm/EVMDialect.cpp

Add allEOFVersions to EVMVersion

610cc0b

EVMVersion::allVersions is constexpr

d923c0f

EVMDialect: add test that enforces builtin handle compatibility

1d8b77d

Between current default dialect as well as latest dialect and all other dialects.

clonker force-pushed the safeguard_builtin_handles branch from a8e5124 to 1d8b77d Compare May 22, 2025 09:33
