Scaling Relic Protocol with zk-SNARKs
Our previous post gave a high-level overview of how Relic Protocol enables smart contracts to trustlessly access all of Ethereum’s historical state. As promised, this follow-up post will give more details about our scaling approach, including the zk-SNARK circuits, proof system, and on-chain verifier.
Motivation
We’ll first give some motivation for our scaling approach, which we hope has some general lessons about scaling blockchain applications.
Quick Recap of State Proofs
Ethereum block headers contain commitments to the entire state (as of the end of the block). Given a block header, smart contracts can use Merkle-Patricia Trie (MPT) proofs to verify any account data, storage slot, transaction data, or receipt contained in the block.
This is, surprisingly, the easy part of provable historical state access. The hard part is: how can smart contracts check that a given block header is valid? The Ethereum Virtual Machine (EVM) only gives access to the last 256 block hashes, so how can we know if a supposed block header from 10000 blocks ago is valid, and doesn’t actually contain commitments to incorrect data? As described in the last post, the naive solutions to this problem are completely impractical due to the costs of Ethereum storage and calldata. Thankfully, there are better ideas.
Compression: Merkle Trees
First, we drastically reduce the amount of storage required by using Merkle Trees. The idea is to divide the block history into fixed-size chunks, build a Merkle Tree of each chunk’s block hashes, and commit only these Merkle roots on-chain. When we need to verify a block hash on-chain, we provide the Merkle proof that the block hash is at the correct index in the Merkle tree.
For Relic, we chose a chunk size of 8192 blocks. This reduces the required storage slots from 15.5 million to only about 2000, while still allowing us to import new block hashes roughly once per day (8192 blocks take about 27.3 hours).
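To make this concrete, here is a minimal off-chain sketch in Python. The function names are ours, not Relic’s API; SHA-256 stands in as the node hash (our tree hash choice is discussed later); only the proof-verification step corresponds to on-chain work.

```python
import hashlib

CHUNK_SIZE = 8192  # blocks per chunk, as chosen for Relic

def hash_pair(left: bytes, right: bytes) -> bytes:
    # Internal-node hash; SHA-256 is used here for illustration.
    return hashlib.sha256(left + right).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Root of a full binary Merkle tree over a power-of-two number of leaves.
    assert len(leaves) & (len(leaves) - 1) == 0, "expected a power-of-two leaf count"
    level = leaves
    while len(level) > 1:
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    # Sibling hashes from leaf `index` up to the root.
    proof, level = [], leaves
    while len(level) > 1:
        proof.append(level[index ^ 1])
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_merkle_proof(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    # Recompute the root from a leaf, its index within the chunk, and its sibling hashes.
    node = leaf
    for sibling in proof:
        node = hash_pair(node, sibling) if index % 2 == 0 else hash_pair(sibling, node)
        index //= 2
    return node == root
```

Only the chunk roots need to live in contract storage; the leaf, its index, and the sibling hashes are supplied by the caller in calldata.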
This solves the storage cost problem, but how can we trust that the Merkle roots were computed correctly? We need some way to ensure that the committed Merkle Trees do not contain incorrect block hashes. Unfortunately, computing the Merkle roots on-chain by passing all of the headers in calldata is cost-prohibitive: the entire history of block headers is over 7GB of data (most of which are non-zero bytes), so this would require about 100 billion gas, or about 2000 ETH at a 20 gwei gas price.
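For a rough sense of where those figures come from, here is a back-of-the-envelope estimate of the calldata cost alone (our numbers; it ignores the additional gas spent actually hashing the headers):

```python
# Rough back-of-the-envelope estimate (assumed figures, calldata cost only)
header_bytes = 7 * 10**9        # ~7 GB of historical block headers
gas_per_byte = 16               # EIP-2028 calldata cost for a non-zero byte
gas_price_gwei = 20

calldata_gas = header_bytes * gas_per_byte                   # ~1.1e11 gas
cost_eth = calldata_gas * gas_price_gwei / 10**9             # gwei -> ETH
print(f"{calldata_gas:.1e} gas, about {cost_eth:,.0f} ETH")  # ~1.1e11 gas, about 2,240 ETH
```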
First Thought: Use a Rollup
If verifying the Merkle roots costs too much on Ethereum, could we instead compute the Merkle roots on some L2 and then send a message to L1 with the verified Merkle roots?
Well, optimistic rollups still require the transaction calldata to be pushed to L1 (due to the data availability problem), so this would only marginally reduce gas costs. In the future, zk-rollups can solve the data availability problem by instead publishing only the state differences to L1, which requires much less data. Of course, validating 7GB of block header data would still require substantial fees due to high proving cost, but it should be much cheaper than an optimistic rollup. Sadly, at the time of writing, no battle-tested rollup is scalable enough for our needs.
Second Thought: Custom “Optimistic” Solution
As is common with scaling solutions, one feasible approach uses economic incentives and fraud proofs to reduce the need for trust. For an “optimistic” approach, we could publish the Merkle roots on-chain and offer a bounty to anyone who can prove the data is invalid. After all, block hashes are publicly available, so anybody should be able to validate the Merkle roots and challenge them if they are incorrect. The challenge procedure is as follows:
- The challenger identifies the last Merkle root they believe to be incorrect.
- The challenger calls the contract’s challenge function, which accepts an entire chunk of block headers.
- The challenge function checks the headers’ validity, i.e. that they are properly linked to each other and that they connect to the first block of the next chunk (if there is no next chunk yet, an EVM-accessible blockhash can be used instead).
- The challenge function computes the verifiably correct Merkle root. If it doesn’t match what was stored, the challenger receives the bounty.
(There are some ways to optimize this, but hopefully it gives the idea; a minimal sketch of the core challenge check follows below.)
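The following Python model illustrates the check such a challenge function might perform. It is our sketch, not Relic’s actual contract; it reuses merkle_root from the earlier snippet and assumes the rlp and eth_utils packages for header decoding and Keccak hashing.

```python
import rlp                      # RLP decoding of block headers
from eth_utils import keccak    # keccak-256 of the RLP-encoded header = Ethereum block hash

def parent_hash_of(header_rlp: bytes) -> bytes:
    # parentHash is the first field of an RLP-encoded block header.
    return rlp.decode(header_rlp)[0]

def challenge_succeeds(headers: list[bytes],
                       stored_root: bytes,
                       next_parent_hash: bytes) -> bool:
    """Return True if the challenger should win the bounty.

    `headers` is the full chunk of RLP-encoded headers being challenged;
    `next_parent_hash` is the parentHash of the first block of the next chunk
    (or an EVM-accessible blockhash if no next chunk exists yet).
    """
    block_hashes = [keccak(h) for h in headers]
    # 1. The supplied headers must form a valid chain among themselves...
    for prev_hash, header in zip(block_hashes, headers[1:]):
        if parent_hash_of(header) != prev_hash:
            return False  # the challenger's own data is invalid
    # 2. ...and connect to known-good data on the right edge of the chunk.
    if block_hashes[-1] != next_parent_hash:
        return False
    # 3. The verifiably correct root must differ from what was committed on-chain.
    return merkle_root(block_hashes) != stored_root
```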
While this approach could work, it relies on economic assumptions rather than cryptographic assumptions, so some tradeoffs have to be made:
- Settlement time: we must give sufficient time for challenges to occur before relying on the published data. As a new project, we would need an especially large challenge window to allow sufficient trust in the verification.
- Capital inefficiency: we must lock up a meaningful amount of capital to incentivize the verification.
Our Solution: Custom SNARK Circuits
Terminology
For those who are new to zk-SNARK technology, we’ll define a few terms that we use in this section:
- (Arithmetic) Circuit: a model of computation where inputs are fed through a directed graph of operations (called gates) resulting in some outputs. An arithmetic circuit is one where the gates are addition and multiplication. As opposed to most other models of computation, a circuit represents a fixed-size computation.
- SNARK: a proof system where a prover performs some computation and generates a corresponding proof that it was performed correctly, such that a verifier can check the proof using significantly fewer resources than the prover. In most SNARK constructions, the computation is represented as an arithmetic circuit. For apps building on blockchains, the verifier is usually a smart contract.
- zk-SNARK: a SNARK where some of the inputs to the computation are hidden, meaning the verifier does not learn the values of those inputs; it learns only that those inputs produce the output of the computation.
- Output: a node in a circuit that is designated to be a “result” of the computation. In SNARKs, an output value is often thought of as just another public input, where the circuit computation instead enforces that this input equals the expected computed value.
- Custom gates: additional operations supported by some proof system, often used because representing some operations in terms of arithmetic gates can be too costly. Most custom gate implementations use lookup tables.
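As a toy illustration of the circuit terminology, the computation out = (a + b) * c can be written gate by gate. This is only an illustration; real circuits are expressed with a constraint-system library over the proof system’s field.

```python
# Toy arithmetic circuit for out = (a + b) * c, written gate by gate.
P = 2**61 - 1  # an arbitrary prime standing in for the proof system's field

def toy_circuit(a: int, b: int, c: int) -> int:
    w = (a + b) % P      # addition gate
    out = (w * c) % P    # multiplication gate (the designated output)
    return out

# A SNARK prover would produce a proof that some (possibly hidden) a, b, c were fed
# through exactly these gates to yield the claimed output, without revealing a, b, c.
```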
Why use custom SNARKs?
The appeal of using a zk-rollup was the massive reduction in calldata costs, but transaction fees may still be high due to proving overhead. After all, we need to transmit 7GB of data and perform roughly 30 million Keccak hashes, which are notoriously expensive to do in arithmetic circuits. Implementing our own SNARK circuits has a few advantages:
- No virtual machine overhead: zk-rollups process transactions in some virtual machine, e.g. StarkNet has the Cairo VM, while zkSync 2.0, Polygon Hermez, Scroll, and others have some type of “zk-EVM”. Although these VMs are more convenient to use than circuits, it should always be possible to squeeze out more performance at the circuit level.
- Optimized proof system: with such a specific application in mind, we can select and optimize a proving system for our use case.
- No cross-domain messaging: if the SNARK computation is directly verified on L1, there is no need to handle L2->L1 messaging, simplifying the logic.
- Production-ready: while zk-rollups and zk-VMs are still being battle-tested and optimized, the underlying SNARK/STARK technologies are much easier to ship today.
Circuit Architecture
To produce validated Merkle roots, our circuits must verify two properties:
- The input block headers are correctly linked to one another.
- The hashes of the input block headers form the output Merkle root.
Because Keccak hashes are expensive, we won’t be able to hash an entire chunk of block headers in a single circuit. Thankfully, our computation has a nice recursive structure which lets us divide and conquer, using two circuit designs:
Inner Circuit: takes M (some small power of 2) block headers as hidden input, verifies they are properly linked, and outputs:
1. `merkleRoot`: the Merkle root of the M block hashes
2. `parentHash`: the parent block hash from the first block header
3. `lastHash`: the last input block’s hash

Recursive Circuit: takes two proofs (and their outputs) as input, recursively verifies them, checks they are linked (the `lastHash` of the first proof equals the `parentHash` of the second proof), and outputs:
1. `merkleRoot`: the Merkle root of the two input `merkleRoot` values
2. `parentHash`: the `parentHash` from the first proof
3. `lastHash`: the `lastHash` from the second proof
The recursive circuit’s proofs and outputs can then be fed back into itself to recursively aggregate more and more block hashes into a single proof to verify on-chain!
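Stripped of the proof system, the statements these two circuits enforce look roughly like the following reference computation. This is our sketch, not the circuit code itself; it reuses keccak, parent_hash_of, merkle_root, and hash_pair from the earlier snippets, and the names are ours.

```python
from dataclasses import dataclass

@dataclass
class Outputs:
    merkle_root: bytes
    parent_hash: bytes
    last_hash: bytes

def inner_statement(headers: list[bytes]) -> Outputs:
    # What the inner circuit enforces over its M hidden block headers.
    block_hashes = [keccak(h) for h in headers]
    for prev_hash, header in zip(block_hashes, headers[1:]):
        assert parent_hash_of(header) == prev_hash      # headers are properly linked
    return Outputs(merkle_root=merkle_root(block_hashes),
                   parent_hash=parent_hash_of(headers[0]),
                   last_hash=block_hashes[-1])

def recursive_statement(left: Outputs, right: Outputs) -> Outputs:
    # What the recursive circuit enforces, given that both sub-proofs already verified.
    assert right.parent_hash == left.last_hash          # the two sub-chains are linked
    return Outputs(merkle_root=hash_pair(left.merkle_root, right.merkle_root),
                   parent_hash=left.parent_hash,
                   last_hash=right.last_hash)
```

Applying the recursive step over log2(8192 / M) levels aggregates a full chunk’s worth of inner proofs into a single proof.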
Choosing a Proof System
With this circuit architecture in mind, we need a proof system which can efficiently handle Keccak hashes and recursive verification. As always, it would be nice to have no trusted setup (or at least a universal trusted setup), so we don’t have any Relic-specific toxic waste. And of course, it must be efficient to verify the proofs in the EVM, so it should either use very small fields or Ethereum’s natively supported BN-254 curve (sometimes called BN256).
There are a few options that meet these criteria, but two stood out:
- Polygon’s Plonky2: a relatively new proof system with incredibly fast recursive proving times. However, this speed comes at the cost of large proofs, which means high calldata costs. For applications that need very high throughput (such as a zk-rollup), this is a worthwhile tradeoff, but our use case can trade longer proving times for cheaper verification.
- Aztec’s UltraPLONK: a more battle-tested proof system which has been used in production for zkSync, Zcash, and others. Conveniently, many of the custom gates we need (for Keccak, SHA, etc) are already implemented by Matter Labs for use in zkSync 2.0. Although the proving times are higher than Plonky2, PLONK has very small proof sizes, so verification costs are currently much lower.
Due to the cheaper verification costs and existing tooling, UltraPLONK fits our needs better. However, high proving times could pose challenges, since we must be fast enough to keep up with the stream of new Ethereum blocks. Thankfully, after a few optimizations, proving times are low enough that a single low-end consumer GPU can easily keep up with the chain.
Implementation and Optimizations
Our proof system is based on the Matter Labs fork of bellman, but with a few changes:
- Added GPU proving support based on Filecoin’s work
- Optimized a few inner functions based on profiling results
Additionally, since proving times are roughly proportional to the number of gates, we used a few tricks to reduce the size of our circuits:
- Used hash function gadgets based on lookup tables, greatly reducing the number of gates needed to perform hashes.
- Used SHA-256 instead of Keccak to build our Merkle trees. Keccak must be used to compute the block hashes, but we can use any EVM-efficient hash function for our Merkle Trees. SHA-256 offers a nice balance between circuit size and EVM gas cost, due to the Ethereum precompile.
- Modified the recursive aggregation logic to use a fixed inner-proof verification key (VK). In our case, the inner proofs come from the same circuit, so we can hard-code the verification key to reduce the number of gates needed to aggregate proofs. This comes at the cost of having a different VK for each proof size (e.g. a proof containing 8K blocks has a different VK than a proof containing 16K blocks), but we only need to support on-chain verification of a few proof sizes.
If you’re curious, our circuit code is public on GitHub. You may also want to check out our recursive PLONK verifier written in Yul assembly, which reduces gas costs for verification on-chain.
Future Work
Our usage of SNARKs so far has enabled very cheap verification of historical block hashes in the EVM. As a result, the gas costs to access historical state are dominated by the Merkle-Patricia Trie proofs. We are experimenting with aggregating many MPT proofs into a single SNARK, which has the potential to bring the proving cost per state fact down dramatically.
Questions? Comments? Join our discord and let’s discuss!