An introduction to evm-trace, a fast and correct Python library to work with Ethereum Virtual Machine traces.
There are two common trace formats, which we’ll refer to as Geth-style and Parity-style traces. Geth traces provide opcode-level resolution, while the most commonly used Parity trace format provides a call tree.
Our goal today would be to make opcode-resolution traces 100x faster, making them from a little-known Parity trace format. We’ll also make them compatible with Geth traces so they can be used as a drop-in replacement.
Geth traces return frames containing the full state of the virtual machine at each step of the execution, including the program counter, opcode, remaining gas, gas cost of the operation, call depth, items on the stack, a snapshot of the memory, and a storage diff.
Each subcall starts with a fresh and separate memory, stack, and storage changes.
It’s a powerful but a bit unwieldy format, since you need to do extra work to know things like what’s the current contract being called or what’s the calldata.
I started digging into this when I stumbled upon several transactions that crash Erigon if you request a debug trace. It appeared that they generated very large traces, and soon enough I was able to confirm why when Erigon re-enabled streaming for heavy RPC endpoints.
In Geth format each frame contains the full copy of the EVM memory, which can lead to traces that don’t fit in the system memory. A contract expanding the memory to 100kb could generate traces as large as 70 GB.
This can be addressed by streaming and iteratively decoding the response as you get it from the Ethereum client. This is what ape does by default with Geth traces when you call receipt.get_transaction_trace()
.
If we look up the memory expansion gas cost formulas at evm.codes, it becomes apparent that EVM memory is not that expensive and we can possibly construct or find transactions with even larger traces.
After sampling peak memory usage over 10,000 most recent blocks, we find even larger traces. For one of those, Erigon hangs up on us after 30 minutes, after returning 460 GB and 898,836 frames.
We can patch Erigon to increase the write timeout, and learn that the full trace consists of 960,198 frames, takes up 524 GB and 34m7s to fetch, before deserialization or processing. We can calculate that we got this trace at 275 MB/s and 469 frames/s.
Keep that number in mind, because with a method presented below, we can recompute the same trace using 137 MB of data (3916x less), which we can fetch in 5.6s, deserialize in 7.1s and replay in 4.6 seconds, with full processing taking 117x less time than just fetching a Geth trace.
Parity traces are quite different from Geth, the ad hoc traces come in three different flavors: trace
, vmTrace
, and stateDiff
.
Parity traces are supported in Erigon, Nethermind, Besu, Foundry.
They are not supported in Geth, Hardhat, Ganache.
Anything beyondtrace
type requires an archive node.
The trace
is the most widely used mode, it returns a call tree, similar to what you can see on ethtx.info or when calling receipt.show_trace()
in ape. If you need a raw call tree, you can get it from chain.provider.get_call_tree(tx)
. You can also provide a custom display class and only display the parts you care about.
The stateDiff
mode shows the changes of balance, code, nonce, and storage grouped by account.
A more interesting problem here is how to map storage keys to readable names like balanceOf[account]
. I’ve seen it solved in Tenderly, which is a proprietary tool.
Maybe you, the reader, could be the one to tackle this problem and release it as an open source tool for everyone to use.
The vmTrace
mode is one of the most enigmatic and rarely used, mainly because it was never really documented. This mode will be the main focus of this article and I’ll show you how you can make a drop-in replacement for Geth traces that requires 1000x less data and works 100x faster.
vmTrace returns a nested structure of incremental updates which we can use to replay the transaction, while tracking stack, memory, storage at each step of the execution. Of course, the incremental format doesn’t come for free and requires some legwork.
Here is what we get in a vmTrace
response:
VMTrace
represents a call and contains all subcalls
code
EVM bytecode to be executedops
list of VMOperation
to be executedVMOperation
represents a step of the execution
pc
program countercost
gas cost of the operationex
the result of the execution, could be null
if the operation has revertedsub
list of VMTrace
subcallsop
opcode nameidx
index in the call treeVMExecutedOperation
represents side effects of executing the operation
used
incorrectly named, shows the remaining gaspush
the items to be placed on the stackmem
the memory deltastore
the storage deltaMemoryDiff
represents a memory update
off
memory offset where to write the datadata
the bytes to write starting at the offsetStorageDiff
represents a storage write
key
storage key to write toval
value to writeI defined this structure using msgspec to get fast deserialization and type validation.
Now we need to replay the transaction, applying the incremental updates and untangling the nested structure into a flat execution order.
I borrowed the real Stack and Memory implementations from py-evm, which is a great EVM reference itself.
I wrote a small harness to do pairwise comparison of each frame of my synthetic trace with Geth trace frames, which exited after the first divergence in stack or memory. I learned that Geth sends frames after memory expansion but before anything from ex
has been applied.
Memory was no problem, but the stack was missing a few important pieces. We had push
, but no info about the number of items popped from the stack by each opcode.
I’ve assembled this list by hand using this file from ethereumjs
as a reference. These opcodes pop a fixed amount of items:
EXTCODEHASH
, ISZERO
, NOT
, BALANCE
, CALLDATALOAD
, EXTCODESIZE
, BLOCKHASH
, POP
, MLOAD
, SLOAD
, JUMP
, SELFDESTRUCT
SHL
, SHR
, SAR
, REVERT
, ADD
, MUL
, SUB
, DIV
, SDIV
, MOD
, SMOD
, EXP
, SIGNEXTEND
, LT
, GT
, SLT
, SGT
, EQ
, AND
, XOR
, OR
, BYTE
, SHA3
, MSTORE
,MSTORE8
, SSTORE
, JUMPI
, RETURN
RETURNDATACOPY
, ADDMOD
, MULMOD
, CALLDATACOPY
, CODECOPY
, CREATE
CREATE2
, EXTCODECOPY
STATICCALL
, DELEGATECALL
CALL
, CALLCODE
These opcodes can be expressed as a formula:
LOG0-LOG4
pops 2-6 itemsSWAP1-SWAP16
pops¹ 2-17 itemsDUP1-DUP16
pops¹ 1-16 items¹ not really, this is only true for how the vmTrace format is served
Some transactions reconciled, but the stack still diverged, especially for large transactions. Using foundry’s debugger I found that the incremental updates have missed occasional pushes for several opcodes, namely SMOD
, CHAINID
, COINBASE
. I fixed this in Erigon and the patch has already made it to 2022.07.03-alpha
release.
This finding has made me question a lot of things. Most people fully trust the data they get from the node, but Ethereum is a very complex system and bugs happen. It’s great having so many clients and references and different approaches to check on each other.
I’ve added full block tracing with opcode-level resolution and an option to return a memoryview
of the memory bytearray
instead of copying it for each frame. It works like a pointer and comes with a caveat that you must process the frame immediately after receiving it from the iterator, otherwise the value may be mutated. It should work for most streaming workflows with traces, since you are usually scanning for some specific frames and don’t look back.
I’ve benchmarked evm-trace against the largest traces I could identify. Most of them were replayed in under 3 seconds, with no significant slowdown even when a lot of memory is allocated. I’ve also benchmarked it against 1000 most recent Ethereum mainnet blocks. It was pleasant seeing the needle has moved from seconds per trace to traces per second.
When dealing with full blocks, you can expect to fetch a median of 27.4 transactions per second from the node and replay 108.8 transactions per second at around 252,000 frames per second. When requesting traces of individual transactions, your client replays all transactions in the block up to the requested transaction index to get to the correct state and discards all other traces. To give you the intuition, here is how many frames you would usually encounter in a block:
This lightning-fast drop-in replacement for Geth-style traces is available in the latest evm-trace and soon will be integrated into ape, your new favorite Ethereum development and data science framework.