Executive Summary
Hi, I’m Mofi, a protocol engineer at OP Labs. OP Labs is a software development company focused on the Optimism ecosystem and a core developer of the OP Stack. We provide some services to, but do not represent or speak on behalf of, the Optimism Foundation.
This upgrade is proposed in response to security vulnerabilities identified during a series of third-party security audits by Spearbit, Cantina, and Code4rena. None of the vulnerabilities have been exploited, and user assets are not and were never at risk. However, out of an abundance of caution, the permissioned fallback mechanism has been activated in order to avoid any potential instability while the vulnerabilities are patched. For more information on the permissioned fallback mechanism and the OP Stack’s defense-in-depth approach to Fault Proof security, see the documentation.
The upgrade includes both a set of smart contract upgrades to fix the vulnerabilities identified in the audit as well as an L2 hardfork to improve the stability and performance of the fault proof system. In addition, we propose extending the capabilities of the Guardian and DeputyGuardian to set the anchor state for the fault proof system in order to prevent referencing invalid anchor states. Aside from implementing these fixes, the primary impact of this upgrade would be to reset user withdrawals at the planned time, similar to the initial Fault Proof upgrade.
Motivation
As described in the original upgrade proposal, our rollout strategy for Fault Proofs focuses on securing the fundamental security mechanisms first, then building confidence in the correctness of the rest of the system over time. Successfully performing this strategy requires that:
The fallback mechanisms are activated whenever there is a risk of a security vulnerability.
Any vulnerabilities that may arise are swiftly patched.
Therefore, the Foundation (via the Deputy Guardian role) activated the permissioned fallback, which restricts output proposals to a trusted proposer, and we at OP Labs created this proposal to resolve the vulnerabilities identified by the security audits. Note that the Guardian role is generally authorized to enforce “Safety over Liveness” in the system, meaning that it can pause and unpause withdrawals and, as part of the Fault Proofs upgrade, was authorized to intervene in the event that a bug would allow an invalid L2 output to be finalized.
Specifications
The specification for the proposed changes can be found in the specs repo. Details of the individual audit issues are enumerated in the Audit Issues section below.
Technical Details
In the section below, we will summarize each of the audit issues we are fixing as well as the smart contracts/off-chain components affected by the upgrade. The full reports for each audit can be found here:
Spearbit
Cantina
Code4rena
Note that this upgrade only contains the most important issues (in our opinion) from each audit. We plan on addressing other, lower-severity issues in future upgrades.
Audit Issues
The table below lists all the audit issues that are fixed in this upgrade. Issue severities have been updated to match the Optimism ImmuneFi bounty. While the auditors did discover some high severity issues, no user assets were ever at risk. All of the audit issues listed below can be detected by our monitoring tooling. Had an exploit been detected, the Deputy Guardian role - which is held by the Optimism Foundation and revocable by the Security Council - would have been expected to blacklist any exploitable dispute games or activate the permissioned fallback.
Note that we have updated the spec to clarify some assumptions around the Cannon VM and program. Specifically, we trust that the Go compiler will emit proper MIPS32 programs. As a result, the Cantina issues that reference problems related to invalid MIPS32 programs are considered out-of-scope and will not be fixed.
Unless otherwise noted, vulnerabilities in the dispute game all occur at MAX_GAME_DEPTH and are therefore classified as medium.
| Issue ID | Severity | Summary |
| --- | --- | --- |
| Cantina 3.1.1: Allocation overflow could allow for arbitrary code execution | High | Mmap calls did not perform memory bounds checking, which allowed memory pointers to wrap around to zero and access the entire memory of the fault proof program, including the data and text sections of active MIPS programs. This could lead to arbitrary code execution within the VM, effectively breaking the VM’s correctness guarantees. No PoC was given, and it is our belief that such an exploit is infeasible due to memory protections employed by the Go runtime. Nevertheless, given the potential impact of the issue, we worked with the auditors to classify this as a high severity issue and deploy a fix. |
| C4 H-01: Invalid DISPUTED_L2_BLOCK_NUMBER is passed to VM | High | An attacker can counter a valid output claim by providing a trace containing one block after the original claim. For example, if an output root is proposed for block 13, the attacker could counter using a trace that includes valid blocks up to block 14. This issue is classified as high severity since it occurs above MAX_GAME_DEPTH. |
| Spearbit 5.1.1: PreimageOracle.loadPrecompilePreimagePart: an outOfGas error in the precompile will overwrite correct preimageParts | Medium | See the notes on Spearbit 5.1.1 below. |
| C4 H-02: The LPP challenge period can cause malicious and freeloader claims to be uncounterable and can also cause freeloader claims to be abused to entrap honest challengers | Medium | The clock extension mechanism is designed to give an honest actor time to counter freeloader claims, even though in that case they will inherit the opposing chess clock, which may have very limited time remaining. Since the LPP challenge period was longer than the clock extension period, the clock extension granted in that case would not be sufficient to allow the honest actor to complete a call to step. While the overall game still resolves correctly, the challenger would lose bonds posted in attempting to counter the freeloader claim. |
| C4 H-05: An attacker can bypass the challenge period during LPP finalization | Medium | An attacker can bypass the large preimage challenge period by calling addLeavesLPP with _finalize set to false. Since the challenge period timestamp is never set, attackers can then call squeezeLPP, thereby bypassing the challenge period and inserting invalid data into the preimage oracle. |
| Cantina 3.3.5: Wrong implementation of srav | Low | The srav instruction does not mask the 5 lower bits of rs, which is nonconformant with the MIPS specification. This could lead to undefined behavior. |
| Cantina 3.4.2: Location of registers array in memory should be verified | Low | The on-chain MIPS VM does not verify that the registers array is allocated right after the state struct. Validating this would make the code more defensive. |
| Spearbit 5.2.5: Preimage proposals can be initialized multiple times | Low | initLPP() does not check if a proposal already exists. This could lead to loss of funds if a user provides the same LPP UUID multiple times. |
| Spearbit 5.2.3: Extension period not applied correctly for next root when SPLIT_DEPTH is set to 1 or less | Low | When SPLIT_DEPTH is set to 1, the extension period for the next root is calculated as zero, which results in no extension period being applied. If SPLIT_DEPTH is set to zero, subsequent game moves will revert due to an integer underflow. |
| Spearbit 5.1.2: Invalid Ancestor Lookup Leading to Out-of-Bounds Array Access | Low | An out-of-bounds array access can occur when MAX_GAME_DEPTH = SPLIT_DEPTH + 1. While this is unlikely to happen in practice, we have updated the FDG constructor to require that SPLIT_DEPTH + 1 < MAX_GAME_DEPTH. |
| Spearbit 5.2.4: Inconsistent _partOffset check and memory boundaries in loadLocalData function | Low | The _partOffset parameter is handled inconsistently in some places within the loadLocalData function. |
| Spearbit 5.2.6: _clockExtension and _maxClockDuration are not validated correctly in DisputeGame constructor | Low | If a dispute game is initialized with a clock extension set to more than half of the max duration, move transactions during the execution trace bisection will revert, since the difference between 2 * the clock extension and the game’s max duration will underflow. |
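The three constructor-related findings above (Spearbit 5.2.3, 5.1.2 and 5.2.6) all come down to parameter combinations that should be rejected up front. The following is a minimal Python sketch of such checks, written only from the summaries in the table; the function name, argument names and error messages are hypothetical, and the real checks live in the Solidity FaultDisputeGame constructor, which may differ in detail.

```python
def validate_game_config(split_depth: int, max_game_depth: int,
                         clock_extension: int, max_clock_duration: int) -> None:
    """Raise ValueError for configurations the audit summaries describe as unsafe.

    All names here are illustrative; this is not the on-chain constructor.
    """
    # Spearbit 5.2.3: a split depth of 0 or 1 breaks the extension arithmetic.
    if split_depth < 2:
        raise ValueError("split depth must be at least 2")
    # Spearbit 5.1.2: MAX_GAME_DEPTH == SPLIT_DEPTH + 1 permits an out-of-bounds
    # ancestor lookup, so the split must sit strictly above that boundary.
    if split_depth + 1 >= max_game_depth:
        raise ValueError("split depth + 1 must be below max game depth")
    # Spearbit 5.2.6: 2 * clock extension must not exceed the max clock duration,
    # otherwise the subtraction during bisection underflows.
    if 2 * clock_extension > max_clock_duration:
        raise ValueError("clock extension too large for max clock duration")


# Illustrative values only (depths in tree levels, durations in seconds).
validate_game_config(split_depth=30, max_game_depth=73,
                     clock_extension=10_800, max_clock_duration=302_400)
```

Rejecting these combinations at construction time means a misconfigured game can never be deployed, rather than failing later in the middle of a dispute.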
Notes on Spearbit 5.1.1
By calling loadPrecompilePreimagePart with less gas than necessary, an attacker could produce an outOfGas error in the precompile. If there is enough gas left in the loadPrecompilePreimagePart function, a valid preimage could be overwritten with the outOfGas error itself. This would result in an incorrect game outcome.
The function of the loadPrecompilePreimagePart method is to allow certain expensive precompiles - namely ecrecover, ecpairing, and kzg_point_evaluation - to be accelerated. Accelerated precompiles offload their execution to an L1 oracle. Since precompiles are implemented natively rather than via EVM opcodes, this improves Cannon’s performance and allows challengers to quickly generate traces for blocks filled with these computationally expensive calls. To address the outOfGas issue, we’re adding a minimum gas requirement to the PreimageOracle to ensure there’s enough gas to accelerate precompiles on L1.
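To make the overwrite scenario concrete, here is a toy Python model of a store that records precompile results, with and without a minimum gas check. It is only a sketch of the failure mode described above: the class, method and parameter names are hypothetical, gas is modelled as a plain integer, and none of this mirrors the actual PreimageOracle interface.

```python
from typing import Callable, Dict, Tuple

OUT_OF_GAS: Tuple[bool, bytes] = (False, b"")  # stand-in for a failed precompile call


class PreimageOracleSketch:
    """Toy model of a precompile-result store; not the real PreimageOracle."""

    def __init__(self, enforce_min_gas: bool) -> None:
        self.enforce_min_gas = enforce_min_gas
        self.parts: Dict[bytes, Tuple[bool, bytes]] = {}

    def load_precompile_part(self, key: bytes, required_gas: int,
                             gas_supplied: int,
                             run_precompile: Callable[[], bytes]) -> None:
        if self.enforce_min_gas and gas_supplied < required_gas:
            # Patched behaviour: refuse the call instead of recording a failure.
            raise ValueError("insufficient gas to accelerate precompile")
        if gas_supplied < required_gas:
            # Unpatched behaviour: the out-of-gas failure replaces the stored part.
            self.parts[key] = OUT_OF_GAS
        else:
            self.parts[key] = (True, run_precompile())


unpatched = PreimageOracleSketch(enforce_min_gas=False)
unpatched.load_precompile_part(b"k", 100, 100, lambda: b"ok")  # valid result stored
unpatched.load_precompile_part(b"k", 100, 10, lambda: b"ok")   # overwritten by failure
```

With `enforce_min_gas=True`, the second call raises and the previously stored valid result is left untouched, which is the property the minimum gas requirement is meant to provide.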
Another related issue with loadPrecompilePreimagePart is that the gas available to accelerate precompiles on L1 may be insufficient given the cost of executing them on L2. This is because the gas provided to the precompile accelerator contract on L1 can never exceed 63/64ths of the gas limit on L2. This is a problem for precompiles that have a dynamic gas cost of execution. Of the accelerated precompiles, ecPairing is the only one that contains this vulnerability, as its gas cost scales with its input size.
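The arithmetic behind this can be sketched in a few lines of Python. The 63/64 forwarding rule comes from EIP-150 and the ecPairing price (45,000 gas plus 34,000 per 192-byte pair) from EIP-1108; the 30,000,000 gas budget used in the example is purely an assumed figure for illustration, and the function names are hypothetical.

```python
def max_forwardable_gas(gas_available: int) -> int:
    """EIP-150: a call can forward at most all but one 64th of the remaining gas."""
    return gas_available - gas_available // 64


def ec_pairing_gas(input_len: int) -> int:
    """EIP-1108 pricing: 45,000 base gas plus 34,000 per 192-byte pair of points."""
    return 45_000 + 34_000 * (input_len // 192)


def can_accelerate(input_len: int, gas_available: int) -> bool:
    """True if the pairing cost fits within what the caller is able to forward."""
    return ec_pairing_gas(input_len) <= max_forwardable_gas(gas_available)


budget = 30_000_000                      # assumed gas budget, for illustration only
print(max_forwardable_gas(budget))       # 29531250
print(can_accelerate(1_152, budget))     # True: 6 pairs cost 249,000 gas
print(can_accelerate(192_000, budget))   # False: 1,000 pairs cost 34,045,000 gas
```

Because the cost grows linearly with the input length, some input size always exceeds any fixed budget, which is why only the dynamically priced precompile is affected.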
To fix this problem, we are proposing an L2 hardfork to limit the maximum input size provided to the ecPairing precompile to 112687 bytes. This number is high enough to enable all known use cases of the ecPairing precompile, but low enough to enable the challenger to generate traces for larger blocks in a timely manner. While this is technically a divergence from the EVM, our on-chain data has found no calls to the ecPairing precompile with an input size over 1187 bytes. The provided limit is therefore 2 orders of magnitude larger than any known use case, which we believe is sufficiently safe.
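As a rough illustration of the proposed rule, the sketch below rejects ecPairing inputs above the limit and computes the headroom relative to the largest input size quoted in the paragraph above. The constant and function names are hypothetical, and raising an exception is only one way to model a failed precompile call; the actual check is implemented in op-geth.

```python
MAX_EC_PAIRING_INPUT_BYTES = 112_687   # limit proposed in this upgrade
LARGEST_OBSERVED_INPUT_BYTES = 1_187   # largest input quoted in the text above


def check_ec_pairing_input(data: bytes) -> None:
    """Reject inputs larger than the proposed limit (illustrative only)."""
    if len(data) > MAX_EC_PAIRING_INPUT_BYTES:
        raise ValueError("ecPairing input exceeds the maximum allowed size")


def headroom() -> float:
    """Ratio between the proposed limit and the largest observed input."""
    return MAX_EC_PAIRING_INPUT_BYTES / LARGEST_OBSERVED_INPUT_BYTES


check_ec_pairing_input(bytes(1_152))   # six 192-byte pairs: accepted
print(round(headroom(), 1))            # 94.9
```

A ratio of roughly 95 is what the text rounds to two orders of magnitude.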
Additional Fixes
In addition to the audit fixes above, we are also proposing the following additional fixes:
We propose reducing the ChannelTimeout value from 200 blocks to 50. Cannon has a limited amount of memory - approximately 1.1 GB - and is not currently garbage collected. The longer channel timeout caused Cannon to run out of memory on OP Sepolia, and was close to the limit on mainnet. Reducing the ChannelTimeout significantly increases the amount of memory available to Cannon and reduces the risk of an OOM occurring.
An ImmuneFi bounty hunter noticed that DelayedWETH’s recover function is not robust against transfers which need more than 2300 gas. We propose modifying DelayedWETH such that the owner can always recover funds regardless of how much gas is required.
We have updated the Guardian and DeputyGuardian roles to have the permission to set the anchor state back to a valid game. This allows the DeputyGuardian to fix the anchor state registry in the event that a game with an invalid proposal incorrectly resolves as DEFENDER_WINS. Referencing an existing game ensures the Guardian or Deputy Guardian can’t set an arbitrary anchor state - it has to be one that the fault dispute game found to be valid.
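The constraint described here, that the anchor can only be pointed at an existing game the system resolved as valid, can be sketched as a small Python model. Everything below is hypothetical and simplified (the class names, fields and the single guardian address are illustrative); it is meant to show the shape of the check, not the AnchorStateRegistry or DeputyGuardianModule code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class GameStatus(Enum):
    IN_PROGRESS = 0
    CHALLENGER_WINS = 1
    DEFENDER_WINS = 2


@dataclass(frozen=True)
class Game:
    root_claim: str
    l2_block_number: int
    status: GameStatus


class AnchorRegistrySketch:
    """Toy registry: the guardian may only re-anchor to a game resolved as valid."""

    def __init__(self, guardian: str) -> None:
        self.guardian = guardian
        self.anchor: Optional[Tuple[str, int]] = None

    def set_anchor_state(self, caller: str, game: Game) -> None:
        if caller != self.guardian:
            raise PermissionError("only the guardian may set the anchor state")
        if game.status is not GameStatus.DEFENDER_WINS:
            raise ValueError("anchor must reference a game resolved as DEFENDER_WINS")
        # The anchor is copied from the game; the caller cannot supply raw values.
        self.anchor = (game.root_claim, game.l2_block_number)
```

The design point is that the privileged role selects among outcomes the dispute game already produced, rather than supplying an output root of its own.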
All proposed contract changes can be found in the op-contracts/v1.6.0 release.
Impacted Components
This upgrade involves both L1 smart contracts as well as the node and execution client software.
The following contracts are modified as part of this upgrade:
On-Chain MIPS VM:
MIPS.sol
PreimageOracle.sol
Dispute Game:
FaultDisputeGame.sol
PermissionedDisputeGame.sol
DelayedWETH.sol
DeputyGuardianModule.sol
AnchorStateRegistry.sol
OP Node has been updated to process the reduced ChannelTimeout. OP Geth has been updated to limit the maximum input size to ecPairing.
Security Considerations
These changes are all in response to vulnerabilities discovered during external security audits. No vulnerabilities were found in the fallback mechanisms, which were themselves audited prior to deploying Fault Proofs in June.
As per the Audit Framework, the dispute game and MIPS contracts fall into the liveness/reputational risk category, which does not require audits. The fallback mechanisms make any bugs simple to recover from and pose no risk to user funds. Therefore, we have opted not to pursue a fix review for the changes made in this proposal. We propose addressing any additional issues discovered in a similar manner to the way they are being addressed here, specifically:
Depending on the issues at hand, Labs would recommend that the Deputy Guardian trigger the fallback or blacklist specific dispute games.
Labs (or others in the core developer community) would create a governance proposal to resolve the issues.
There’s quite a bit of nuance to when the fallbacks should be activated. In light of this audit, we propose adopting the following rubric to decide if/when the fallback should be activated. If an issue is costly to exploit - e.g., it requires playing the game to MAX_GAME_DEPTH - then we propose disclosing it immediately and using the dispute game blacklist to mitigate any attempts to exploit it. The dispute game blacklist seizes any bonds paid by an attacker, which makes attempting to exploit dispute resolution deeply unprofitable. On the other hand, if issues are not costly to exploit then we propose activating the fallback prior to disclosure. In both cases, fixes for the vulnerabilities would be proposed as a regular protocol upgrade in the nearest voting cycle.
Consistent with the OP Labs Audit Framework, we have not had the contents of the hardfork audited. However, OP Labs did perform a security review of these changes. Risk analysis of each L2 change is below.
Limiting the size of ecPairing’s input is considered a low-risk change. Implementation bugs would not put user assets at risk. Even though this is technically a divergence from the EVM, our data suggests that there have been no usages of the ecPairing precompile with an input size > 1152 bytes, which is far below the limit we will be imposing.
Reducing the ChannelTimeout is considered a low-risk change. Implementation bugs would not put user assets at risk.
Impact Summary
OP Labs does not anticipate any downtime due to this upgrade.
If this proposal is approved, node operators must upgrade their node software prior to September 11th in order to avoid a chain split.
As a result of triggering the fallback, all pending withdrawals will be invalidated. Users with pending withdrawals will need to re-prove them against an output proposal submitted by the permissioned proposer. This means that withdrawals initiated less than one week before the upgrade is executed will only be finalized one week after the upgrade is complete. For example, a withdrawal initiated 6 days before the upgrade would take a total of 13 days to finalize. In addition, proposals made within a week of the permissionless game being reactivated will also be invalidated.
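The 13-day figure in the example above follows from simple addition, sketched below. The function name is hypothetical, and the model makes two simplifying assumptions that are not stated in the proposal: the withdrawal is re-proven immediately when the upgrade executes, and the finalization window is exactly seven days.

```python
FINALIZATION_WINDOW_DAYS = 7  # the one-week window referred to in the text


def total_days_to_finalize(days_before_upgrade: float) -> float:
    """Total time from initiation to finalization for a still-pending withdrawal.

    Assumes the withdrawal is re-proven at the moment the upgrade executes.
    """
    if not 0 <= days_before_upgrade < FINALIZATION_WINDOW_DAYS:
        raise ValueError("only withdrawals initiated within the last week are modelled")
    # Time already elapsed is lost; a fresh window starts at the upgrade.
    return days_before_upgrade + FINALIZATION_WINDOW_DAYS


print(total_days_to_finalize(6))  # 13
```

In this model the worst case approaches 14 days, for a withdrawal initiated just under a week before the upgrade.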
Users will be unable to provide more than 112687 bytes of input to the ecPairing precompile.
Proposers (other than the trusted proposer operated by OP Labs) will be unable to propose their own outputs until the fallback is deactivated following the L1 upgrades.
All client-side tooling is unaffected.
Action Plan
If this vote passes, the Granite upgrade will be scheduled for execution on September 11th at 16:00:01 UTC. The upgrade will occur automatically for nodes on a release which contains the baked-in activation time. Granite is code complete in the optimism monorepo at commit a81de910dc2fd9b2f67ee946466f2de70d62611a and op-geth at commit 0f5b9dcfd2ac66f6fd8faae526b1549721f5f392. The smart contracts release is op-contracts/v1.6.0-rc.1. The op-node and op-geth releases will be finalized if this proposal passes.
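Hardfork activation times are typically expressed as Unix timestamps, so operators checking a release may want the activation time in that form. The snippet below converts the scheduled time using only the Python standard library. The year 2024 is an assumption taken from the dates in the changelog at the end of this post, and the function name is hypothetical.

```python
from datetime import datetime, timezone

# Scheduled activation: September 11th, 16:00:01 UTC (year 2024 assumed).
GRANITE_ACTIVATION = datetime(2024, 9, 11, 16, 0, 1, tzinfo=timezone.utc)


def activation_timestamp() -> int:
    """Unix timestamp (seconds since epoch) of the scheduled activation."""
    return int(GRANITE_ACTIVATION.timestamp())


print(activation_timestamp())  # 1726070401
```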
This upgrade has already been activated on internal devnets and the Sepolia Superchain in coordination with Base and Conduit.
The overall upgrade plan is as follows:
1. Update the Absolute Pre-State: Prior to the hardfork activation, we will update the absolute pre-state as done in the Fjord upgrade. This ensures that the new op-program can be used with the upgraded protocol, and must be performed prior to hardfork activation. See the Fjord upgrade proposal for more details. This upgrade is transparent to users, and no action is required.
2. Activate the Hardfork on L2: The hardfork will activate on the L2 network at the scheduled time. Node operators must upgrade to the versions described above to avoid a chain split. Once upgraded, no further action is required.
3. Update the L1 Smart Contracts: Finally, we will update the L1 smart contracts to new versions that contain fixes for the audit issues. This upgrade is transparent to users, and no action is required. This update will also deactivate the fallback mechanism and revert to permissionless proposing.
The Security Council and Optimism Foundation must sign the transactions for steps 1 and 3 prior to the hardfork activation. This sequence is crucial to prevent breaking the fault proof system.
Emergency Cancellation
The releases above will contain a Granite activation at the above-mentioned time. If a critical security issue is found between approval and rollout, the Optimism Foundation and Security Council should coordinate an emergency cancellation. Node operators can quickly react by using the --override.granite flag on both op-node and op-geth.
Conclusion
This proposal outlines the Granite network upgrade, which responds to security vulnerabilities identified by third-party auditors. This upgrade brings better security and performance to the fault proof system.
Proposal Edit Changelog
8/21/2024 - In Additional Fixes - clarified the reason the Guardian and DeputyGuardian roles have extended capabilities.
8/21/2024 - Added etherscan references to the newly deployed contract implementations.
8/29/2024 - Fixed etherscan links to the deployed contract implementations.
OP Labs, a software development company focused on the Optimism ecosystem, is proposing the Granite network upgrade to address security vulnerabilities identified by third-party audits. The upgrade includes smart contract modifications and an L2 hardfork to enhance the fault proof system's stability and performance. Additional changes and fixes are described, impacting various components. Node operators are required to upgrade their software to avoid a chain split. If approved, the upgrade will be activated on September 11th, 16:00:01 UTC. Mandatory actions, potential impacts, and emergency cancellation procedures are outlined in the proposal.
donnoh: inphi:
As a result of falling back to the permissioned game, we realized that switching back to the permissionless game could in theory cause the anchor state to reference an old state prior to the switch to the permissioned game. To remedy this, we have updated the Guardian and DeputyGuardian roles to have the permission to set the anchor state.
Can you elaborate on when this scenario can happen? Aren’t there solutions that don’t involve permissioned roles?
Executive Summary
Hi I’m Mofi, a protocol engineer at OP Labs. OP Labs is a software development co…
Executive Summary
Hi I’m Mofi, a protocol engineer at OP Labs. OP Labs is a software development company focused on the Optimism ecosystem and a core developer of the OP Stack. We provide some services to, but do not represent or speak on behalf of, the Optimism Foundation.
This upgrade is proposed in response to security vulnerabilities identified during a series of third-party security audits by Spearbit, Cantina, and Code 4 rena. None of the vulnerabilities have been exploited, and user assets are not and were never at risk. However, out of an abundance of caution, the permissioned fallback mechanism has been activated in order to avoid any potential instability while the vulnerabilities are patched. For more information on the permissioned fallback mechanism and the OP Stack’s defense-in-depth approach to Fault Proof security, see the documentation.
The upgrade includes both a set of smart contract upgrades to fix the vulnerabilities identified in the audit as well as an L 2 hardfork to improve the stability and performance of the fault proof system. In addition, we propose extending the capabilities of the Guardian and DeputyGuardian to set the anchor state for the fault proof system in order to prevent referencing invalid anchor states. Aside from implementing these fixes, the primary impact of this upgrade would be to reset user withdrawals at the planned time, similar to the initial Fault Proof upgrade.
Motivation
As described in the original upgrade proposal, our rollout strategy for Fault Proofs focuses on securing the fundamental security mechanisms first, then building confidence on the correctness of the rest of the system over time. Successfully performing this strategy requires that:
The fallback mechanisms are activated whenever there is a risk of a security vulnerability.
Any vulnerabilities that may arise are swiftly patched.
Therefore, the Foundation (via the Deputy Guardian role) activated the permissioned fallback which restricts output proposals to a trusted proposer and we at OP Labs created this proposal to resolve the vulnerabilities identified by the security audits. Note that the Guardian role is generally authorized to enforce “Safety over Liveness” in the system, meaning that it can pause and unpause withdrawals and as part of the Fault Proofs upgrade was authorized to intervene in the event that a bug would allow an invalid L 2 output to be finalized.
Specifications
The specification for the proposed changes can be found in the specs repo. Details of the individual audit issues are enumerated in the Audit Issues section below.
Technical Details
In the section below, we will summarize each of the audit issues we are fixing as well as the smart contracts/off-chain components affected by the upgrade. The full reports for each audit can be found here:
Spearbit
Cantina
Code 4 rena
Note that this upgrade only contains the most important issues (in our opinion) from each audit. We plan on addressing other, lower-severity issues in future upgrades.
Audit Issues
The table below lists all the audit issues that are fixed in this upgrade. Issue severities have been updated to match the Optimism ImmuneFi bounty. While the auditors did discover some high severity issues, no user assets were ever at risk. All of the audit issues listed below can be detected by our monitoring tooling. Had an exploit been detected, the Deputy Guardian role - which is held by the Optimism Foundation and revocable by the Security Council - would have been expected to blacklist any exploitable dispute games or activate the permissioned fallback.
Note that we have updated the spec to clarify some assumptions around the Cannon VM and program. Specifically, we trust that the Go compiler will emit proper MIPS 32 programs. As a result, the Cantina issues that reference problems related to invalid MIPS 32 programs are considered out-of-scope and will not be fixed.
Unless otherwise noted, vulnerabilities in the dispute game all occur at MAX_GAME_DEPTH and are therefore classified as medium.
Issue ID
Severity
Summary
Cantina 3 . 1 . 1 : Allocation overflow could allow for arbitrary code execution
High
Mmap calls did not perform memory bounds checking, which allowed memory pointers to wrap around to zero and access the entire memory of the fault proof program including the data and text sections of active MIPS programs. This could lead to arbitrary code execution within the VM, effectively breaking the VM’s correctness guarantees. No PoC was given, and it is our belief that such an exploit is infeasible due to memory protections employed by the Go runtime. Nevertheless, given the potential impact of the issue we worked with the auditors to classify this as a high severity issue and deploy a fix.
C 4 H- 01 : Invalid DISPUTED_L 2 _BLOCK_NUMBER is passed to VM
High
An attacker can counter a valid output claim by providing a trace containing one block after the original claim. For example, if an output root is proposed for block 13 , the attacker could counter using a trace that includes valid blocks up to block 14 . This issue is classified as high severity since it occurs above MAX_GAME_DEPTH.
Spearbit 5 . 1 . 1 : PreimageOracle.loadPrecompilePreimagePart an outOfGas error in the precompile will overwrite correct preimageParts
Medium
See section below.
C 4 H- 02 : The LPP challenge period can cause malicious and freeloader claims to be uncounterable and can also cause freeloader claims to be abused to entrap honest challengers
Medium
The clock extension mechanism is designed to give an honest actor time to counter freeloader claims, even though in that case they will inherit the opposing chess clock which may have very limited time remaining. Since the LPP challenge period was longer than the clock extension period, the clock extension granted in that case would not be sufficient to allow the honest actor to complete a call to step. While the overall game still resolves correctly, the challenger would lose bonds posted in attempting to counter the freeloader claim.
C 4 H- 05 : An attacker can bypass the challenge period during LPP finalization
Medium
An attacker can bypass the large preimage challenge period by calling addLeavesLPP with _finalize set to false. Since the challenge period timestamp is never set, attackers can then call squeezeLPP, thereby bypassing the challenge period and inserting invalid data into the preimage oracle.
Cantina 3 . 3 . 5 : Wrong implementation of srav
Low
The srav instruction does not mask the 5 lower bits of rs, which is nonconformant with the MIPS specification. This could lead to undefined behavior.
Cantina 3 . 4 . 2 : Location of registers array in memory should be verified
Low
The on-chain MIPS VM does not verify that the registers array is allocated right after the state struct. Validating this would make the code more defensive.
Spearbit 5 . 2 . 5 : Preimage proposals can be initialized multiple times
Low
initLPP() does not check if a proposal already exists. This could lead to loss of funds if a user provides the same LPP UUID multiple times.
Spearbit 5 . 2 . 3 : Extension period not applied correctly for next root when SPLIT_DEPTH is set to 1 or less
Low
When SPLIT_DEPTH is set to 1 , the extension period for the next root is calculated as zero which results in no extension period being applied. If SPLIT_DEPTH is set to zero, subsequent game moves will result due to an integer underflow.
Spearbit 5 . 1 . 2 : Invalid Ancestor Lookup Leading to Out-of-Bounds Array Access
Low
An out of bounds array access can occur when MAX_DEPTH = SPLIT_DEPTH + 1 . While this is unlikely to happen in practice, we have updated the FDG constructor to require that SPLIT_DEPTH + 1 >= MAX_GAME_DEPTH.
Spearbit 5 . 2 . 4 : Inconsistent _partOffset check and memory boundaries in loadLocalData function
Low
The _partOffset parameter is handled inconsistently in some places within the loadLocalData function.
Spearbit 5 . 2 . 6 : _clockExtension and _maxClockDuration are not validated correctly in DisputeGame constructor
Low
If a dispute game is initialized with a clock extension set to more than half of the max duration, move transactions during the execution trace bisection will revert since the difference between 2 * the clock extension and the game’s max duration will underflow.
Notes on Spearbit 5 . 1 . 1
By calling loadPrecompilePreimagePart with less gas than necessary, an attacker could produce an outOfGas error in the precompile. If there is enough gas left in the loadPrecompilePart function, a valid preimage could be overwritten with the outOfGas error itself. This would result in an incorrect game outcome.
The function of the loadPrecompilePreimagePart method is to allow certain expensive precompiles - namely ecrecover, ecpairing, and kzg_point_evaluation - to be accelerated. Accelerated precompiles offload their execution to an L 1 oracle. Since precompiles are implemented natively rather than via EVM opcodes, this improves Cannon’s performance and allows challengers to quickly generate traces for blocks filled with these computationally expensive calls. To address the outOfGas issue, we’re adding a minimum gas requirement to the PreimageOracle to ensure there’s enough gas to accelerate precompiles on L 1 .
Another related issue with loadPrecompilePreimagePart is that the gas required to accelerate precompiles on L 1 may be insufficient given the cost of executing them on L 2 . This is because the gas provided to the precompile accelerator contract on L 1 can never exceed 63 / 64 th of the gas limit on L 2 . This is a problem for precompiles that have a dynamic gas cost of execution. Of the accelerated precompiles, ecPairing is the only one that contains this vulnerability as its gas cost scales with its input size.
To fix this problem, we are proposing an L 2 hardfork to limit the maximum input size provided to the ecPairing precompile to 112687 bytes. This number is high enough to enable all known use cases of the ecPairing precompile, but low enough to enable the challenger to generate traces for larger blocks in a timely manner. While this is technically a divergence from the EVM, our on-chain data has found no calls to the ecPairing precompile with an input size over 1187 bytes. The provided limit is therefore 2 orders of magnitude larger than any known use case, which we believe is sufficiently safe.
Additional Fixes
In addition to the audit fixes above, we are also proposing the following additional fixes:
We propose reducing the ChannelTimeout value from 200 blocks to 50 . Canon has a limited amount of memory - approximately 1 . 1 GB - and is not currently garbage collected. The longer channel timeout caused Cannon to run out of memory on OP Sepolia, and was close to the limit on mainnet. Reducing the ChannelTimeout significantly increases the amount of memory available to Cannon and reduces the risk of an OOM occurring.
An ImmuneFi bounty hunter noticed that DelayedWETH’s recover function is not robust against transfers which need more than 2300 gas. We propose modifying DelayedWETH such that the owner can always recover funds regardless of how much gas is required.
We have updated the Guardian and DeputyGuardian roles to have the permission to set the anchor state back to a valid game. This allows the DeputyGuardian to fix the anchor state registry in the event that a game with an invalid proposal incorrectly resolves as DEFENDER_WINS. Referencing existing game ensures the Guardian or Deputy Guardian can’t set an arbitrary anchor state - it has to be one that the fault dispute game found to be valid.
All proposed contract changes can be found in the op-contracts/v 1 . 6 . 0 release.
Impacted Components
This upgrade involves both L 1 smart contracts as well as the node and execution client software.
The following contracts are modified as part of this upgrade:
On-Chain MIPS VM:
MIPS.sol
PreimageOracle.sol
Dispute Game
FaultDisputeGame.sol
PermissionedDisputeGame.sol
DelayedWETH.sol
DeputyGuardianModule.sol
AnchorStateRegistry.sol
OP Node has been updated to process the reduced ChannelTimeout. OP Geth has been updated to limit the maximum input size to ecPairing.
Security Considerations
These changes are all in response to vulnerabilities discovered during external security audits. No vulnerabilities were found in the fallback mechanisms, which were themselves audited prior to deploying Fault Proofs in June.
As per the Audit Framework, the dispute game and MIPS contracts fall into the liveness/reputational risk category, which does not require audits. The fallback mechanisms make any bugs simple to recover from and pose no risk to user funds. Therefore, we have opted not to pursue a fix review for the changes made in this proposal. We propose addressing any additional issues discovered in a similar manner to the way they are being addressed here, specifically:
Depending on the issues at hand, Labs recommends that the Deputy Guardian trigger the fallback or blacklist specific dispute games.
Labs (or others in the core developer community) would create a governance proposal to resolve the issues.
There’s quite a bit of nuance to when the fallbacks should be activated. In light of this audit, we propose adopting the following rubric to decide if/when the fallback should be activated. If an issue is costly to exploit (e.g., it requires playing the game to MAX_GAME_DEPTH) then we propose disclosing it immediately and using the dispute game blacklist to mitigate any attempts to exploit it. The dispute game blacklist will seize any bonds paid by an attacker, and makes attempting to exploit dispute resolution deeply unprofitable. On the other hand, if issues are not costly to exploit then we propose activating the fallback prior to disclosure. In both cases, fixes for the vulnerabilities would be proposed as a regular protocol upgrade in the nearest voting cycle.
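The rubric above can be written down as a small decision function. This is only a sketch of the proposed policy, not tooling that exists anywhere; the names are hypothetical, and judging whether an issue is "costly to exploit" remains a human call.

```python
from enum import Enum


class Response(Enum):
    # Hypothetical labels for the two branches of the rubric.
    DISCLOSE_AND_BLACKLIST = "disclose immediately and blacklist exploit attempts"
    FALLBACK_THEN_DISCLOSE = "activate the fallback before disclosure"


def choose_response(costly_to_exploit: bool) -> Response:
    """Pick the mitigation path for a newly found fault proof issue."""
    if costly_to_exploit:
        # The blacklist seizes the attacker's bonds, so an attempt is unprofitable.
        return Response.DISCLOSE_AND_BLACKLIST
    # Cheap exploits are mitigated first and disclosed afterwards.
    return Response.FALLBACK_THEN_DISCLOSE
```

Either branch is followed by the same step: a fix proposed as a regular protocol upgrade in the nearest voting cycle.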
Consistent with the OP Labs Audit Framework, we have not had the contents of the hardfork audited. However, OP Labs did perform a security review of these changes. Risk analysis of each L2 change is below.
Limiting the size of ecPairing’s input is considered a low-risk change. Implementation bugs would not put user assets at risk. Even though this is technically a divergence from the EVM, our data suggests that there have been no usages of the ecPairing precompile with an input size > 1152 bytes, which is far below the limit we will be imposing.
Reducing the ChannelTimeout is considered a low-risk change. Implementation bugs would not put user assets at risk.
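For a sense of scale on the ecPairing limit, the sketch below relates input size to the number of pairs and to precompile gas. The 112687-byte limit is the one stated in this proposal. The 192 bytes per pair and the 45000 + 34000 per pair gas schedule are the standard values from EIP-197 and EIP-1108. The function names are hypothetical.

```python
PAIR_SIZE_BYTES = 192        # one G1 point (64 bytes) plus one G2 point (128 bytes)
MAX_INPUT_BYTES = 112_687    # input size limit proposed for ecPairing
BASE_GAS = 45_000            # EIP-1108 base cost
PER_PAIR_GAS = 34_000        # EIP-1108 cost per pair


def max_pairs(limit_bytes: int = MAX_INPUT_BYTES) -> int:
    """Largest number of whole pairs that fits under the byte limit."""
    return limit_bytes // PAIR_SIZE_BYTES


def pairing_gas(input_bytes: int) -> int:
    """Gas charged by the precompile for a well-formed input."""
    if input_bytes % PAIR_SIZE_BYTES != 0:
        raise ValueError("input must be a multiple of 192 bytes")
    return BASE_GAS + PER_PAIR_GAS * (input_bytes // PAIR_SIZE_BYTES)


def is_allowed(input_bytes: int) -> bool:
    """True if the input is within the proposed limit."""
    return input_bytes <= MAX_INPUT_BYTES
```

Under these numbers the limit admits up to 586 pairs (about 19.97M gas), while a 1152-byte input is 6 pairs (249,000 gas).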
Impact Summary
OP Labs does not anticipate any downtime due to this upgrade.
If this proposal is approved, node operators must upgrade their node software prior to September 11th in order to avoid a chain split.
As a result of triggering the fallback, all pending withdrawals will be invalidated. Users with pending withdrawals will need to re-prove them against an output proposal submitted by the permissioned proposer. This means that withdrawals initiated less than one week before the upgrade is executed will only be finalized one week after the upgrade is complete. For example, a withdrawal initiated 6 days before the upgrade would take a total of 13 days to finalize. In addition, proposals made within a week of the permissionless game being reactivated will also be invalidated.
Users will be unable to provide more than 112687 bytes of input to the ecPairing precompile.
Proposers (other than the trusted proposer operated by OP Labs) will be unable to propose their own outputs until the fallback is deactivated following the L1 upgrades.
All client-side tooling is unaffected.
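The withdrawal timing example above (6 days before the upgrade, 13 days in total) follows from simple arithmetic, sketched here. The function name is hypothetical, and the model makes two simplifying assumptions: the user re-proves as soon as the upgrade completes, and a withdrawal at least one week old was already finalized before the upgrade.

```python
FINALIZATION_DAYS = 7  # one-week window described in this proposal


def total_days_to_finalize(days_before_upgrade: float) -> float:
    """Days from initiating a withdrawal until it can be finalized."""
    if days_before_upgrade < 0:
        raise ValueError("days_before_upgrade must be non-negative")
    if days_before_upgrade >= FINALIZATION_DAYS:
        # Assumed to have been finalized before the upgrade, so no extra delay.
        return FINALIZATION_DAYS
    # The proof is reset at the upgrade, so the one-week clock restarts there.
    return days_before_upgrade + FINALIZATION_DAYS
```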
Action Plan
If this vote passes, the Granite upgrade will be scheduled for execution on September 11th at 16:00:01 UTC. The upgrade will occur automatically for nodes on a release which contains the baked-in activation time. Granite is code complete in the optimism monorepo at commit a81de910dc2fd9b2f67ee946466f2de70d62611a and op-geth at commit 0f5b9dcfd2ac66f6fd8faae526b1549721f5f392. The smart contracts release is op-contracts/v1.6.0-rc.1. The op-node and op-geth releases will be finalized if this proposal passes.
This upgrade has already been activated on internal devnets and the Sepolia Superchain in coordination with Base and Conduit.
The overall upgrade plan is as follows:
Update the Absolute Pre-State: Prior to the hardfork activation, we will update the absolute pre-state as done in the Fjord upgrade. This ensures that the new op-program can be used with the upgraded protocol, and must be performed prior to hardfork activation. See the Fjord upgrade proposal for more details. This upgrade is transparent to users, and no action is required.
Activate the Hardfork on L2: The hardfork will activate on the L2 network at the scheduled time. Node operators must upgrade to the versions described above to avoid a chain split. Once upgraded, no further action is required.
Update the L1 Smart Contracts: Finally, we will update the L1 smart contracts to new versions that contain fixes for the audit issues. This upgrade is transparent to users, and no action is required. This update will also deactivate the fallback mechanism and revert to permissionless proposing.
The Security Council and Optimism Foundation must sign the transactions for steps 1 and 3 prior to the hardfork activation. This sequence is crucial to prevent breaking the fault proof system.
Emergency Cancellation
The releases above will contain a Granite activation at the above-mentioned time. If a critical security issue is found between approval and rollout, the Optimism Foundation and Security Council should coordinate an emergency cancellation. Node operators can quickly react by using the --override.granite flag on both op-node and op-geth.
Conclusion
This proposal outlines the Granite network upgrade, which responds to security vulnerabilities identified by third-party auditors. This upgrade brings better security and performance to the fault proof system.
Proposal Edit Changelog
8/21/2024 - In Additional Fixes - clarified the reason the Guardian and DeputyGuardian roles have extended capabilities.
8/21/2024 - Added etherscan references to the newly deployed contract implementations.
8/29/2024 - Fixed etherscan links to the deployed contract implementations.
donnoh: inphi:
As a result of falling back to the permissioned game, we realized that switching back to the permissionless game could in theory cause the anchor state to reference an old state prior to the switch to the permissioned game. To remedy this, we have updated the Guardian and DeputyGuardian roles to have the permission to set the anchor state.
can you better elaborate when this scenario can happen? Aren’t there solutions that don’t involve permissioned roles?
I am an Optimism delegate with sufficient voting power and I believe this proposal is ready to move to a vote.
This seems like a highly technical upgrade so though the post makes sense I can’t say I myself can say it’s all fine.
I will trust the auditors and developers here and just give the okay as a delegate for this to go to a vote.
I am an Optimism delegate with sufficient voting power and I believe the proposal is ready to move to a vote.
Seems a reasonable upgrade to address the vulnerabilities of the security audits and prioritize user safety/reinforce the fault proof system.
I am an Optimism delegate with sufficient voting power and I believe this proposal is ready to move to a vote.
Can someone explain the respectedGameType (0 → 1) change?
Etherscan transaction: 0x493e2f3354e8c6c46fb37925a13c02364c1f3b38f88548b9bb4673e3fc762e69
e.g. why hasn’t a PermissionedDisputeGame been posted yet? The current AnchorStateRegistry is 4m blocks old.
ajsutton: There have been a number of proposals already made with the permissioned game type (about one an hour as was done with the permissionless games). The AnchorStateRegistry is only updated once the dispute period for games has elapsed and the game resolves as Defender Wins. It’s then used as the starting point for new games after that. Having an old anchor state just means there are more blocks that could be disputed in the top half of the dispute game which narrows down to find the first disputed block.
Emergency upgrade may affect protocols that rely on dispute game such as ENS gateway. Need to test whether such a case is handled.
clowes.eth: chom:
Emergency upgrade may affect protocols that rely on dispute game such as ENS gateway. Need to test whether such a case is handled.
This is the case.
ajsutton:
No, the anchor state is just the starting point for new dispute games of that game type.
Is there documentation for this?
—-
Whilst I appreciate that this is an ‘emergency’ upgrade and a good opportunity to prepare for the future, tooling utilising the implementation as is(was) is prone to breaking - it would be great to have additional documentation and clarity on what can change and under what circumstances.
inphi: Although the fallback was activated, strictly speaking, an emergency upgrade has not been performed. The contracts deployed on op-mainnet have not been changed from those approved in Protocol Upgrade #7: Fault Proofs. The ability for the Guardian or Deputy Guardian to switch to the permissioned fallback was built into that proposal as part of a staged, responsible rollout of fault proofs.
That said, other applications that are using the OptimismPortal.respectedGameType will be affected by the change and need to wait for new dispute games to resolve. The exact impact will depend on the details of the application and how they’re using dispute game results.
Thanks for the response. Let me rephrase: once the gameType switch was made, why wasn’t the new respectedGameType’s anchor root set to the last finalized game?
Similar to the new setAnchorState(), there could be copyAnchorState(GameType, GameType).
There’s no need to adjust the anchor state. It doesn’t affect withdrawals at all and will just naturally be updated when the next game resolves.
raffy.eth: Isn’t it the latest on-chain finalized state?
ajsutton: No, the anchor state is just the starting point for new dispute games of that game type.
raffy.eth: Where can I query the current finalized root on mainnet?
Game 1633 (first gameType 1) is starting from the current gameType 1 anchor state 0x2694ac14dcf54b7a77363e3f60e6462dc78da0d43d1e2f058dbb6a1488814977 @ block 120059863 (95 days ago)
ajsutton:
It doesn’t affect withdrawals at all
proveWithdrawalTransaction() apparently doesn’t require finalization… shouldn’t that be != DEFENDER_WINS ? This was clarified on Discord (there is another finalization step. No withdrawals have been processed post gameType change.)
inphi (replying to donnoh’s question about the anchor state change): Apologies, the description of this change is inaccurate - I’ll get the post updated to fix that. There is a new setAnchorState method on the AnchorStateRegistry which can only be called by the guardian or deputy guardian, but it accepts as input a reference to an existing dispute game which resolved as DEFENDER_WINS. This provides an ability to reset the anchor state back to a valid game in the event that a game with an invalid proposal incorrectly resolves as DEFENDER_WINS and updates the anchor state registry with that invalid proposal. Previously this would need to be done by upgrading the FaultDisputeGame contract used for a game type to one that uses a new AnchorStateRegistry. Referencing an existing game ensures the Guardian or Deputy Guardian can’t set an arbitrary anchor state - it has to be one that the fault dispute game found to be valid.
The situation of having an old anchor state when we switch back to the permissionless game can be solved without needing a permissioned role by creating a permissionless game periodically. While the portal won’t respect that game while in the fallback state, it will still update the anchor state if it resolves as DEFENDER_WINS, which will subsequently be respected by the fault proof game.
inphi: The safeguards, including the ability to change the respected game type, are documented in the original fault proofs governance proposal, the Optimism docs for fault proofs, and the OP Stack specification.
If you need help understanding how to work with these safeguards, the OP Stack developer forum is a good place to ask questions. We welcome feedback to help us improve, especially if there are areas where we can increase clarity!
I am an Optimism delegate with sufficient voting power and I believe this proposal is ready to move to a vote.
From my understanding, this hardfork upgrade has not completed an audit so far.
Such an upgrade shouldn’t pass without an audit, as it is already a fix of existing bugs.
If there are further bugs introduced in this upgrade, it will result in irreparable harm to the chain which will require further fixes.
inphi: The staged rollout of fault proofs is designed to ensure that the chain and user assets remain secure even if there are issues within the fault proof system, and we believe that passing this upgrade is appropriate and consistent with our previously published audit framework. This allows us to continue to improve the system safely and build confidence in it over time. The safeguards were audited prior to the initial launch of fault proofs and none of the findings from these three audits allowed those safeguards to be circumvented. Notably, this proposal fixes three issues which were not identified by these audits, despite the very high quality of the auditors. If you have any specific concerns with the changes outlined in the proposal, we are always happy to discuss further! The audit framework gives some more detail on how OP Labs thinks about audits.
could you also elaborate on Cantina 3.3.5, i.e. the incorrect implementation of the srav opcode? why is it considered low severity?
secondly, to me it’s a bit concerning that the fault proof system is outside the scope of external audits. I understand that in Stage 1, given the fallback, the blacklisting mechanism, the pause button and the Security Council, a bug is unlikely to affect the safety of the system. Citing Vitalik, the amazing property of Stage 1 rollups is being able to be safe AND live assuming <25% of honest members in the SC, also assuming small probability of bugs. If this probability is not small, and the system is often in the fallback state, the system mostly relies on a >75% honesty, defeating the purpose of Stage 1.
inphi: could you also elaborate on Cantina 3.3.5, i.e. the incorrect implementation of the srav opcode? why is it considered low severity?
The srav opcode deviating from the spec is only considered low priority because the actual MIPS instructions generated by the Go compiler when compiling op-program are not affected by the difference in behavior. So it doesn’t currently have an impact on the fault proof system but may in the future due to changes in the Go compiler or op-program. Additional information about this class of bugs can be found in this section of the Cannon FPVM spec.
secondly, to me it’s a bit concerning that the fault proof system is outside the scope of external audits. I understand that in Stage 1, given the fallback, the blacklisting mechanism, the pause button and the Security Council, a bug is unlikely to affect the safety of the system. Citing Vitalik, the amazing property of Stage 1 rollups is being able to be safe AND live assuming <25% of honest members in the SC, also assuming small probability of bugs. If this probability is not small, and the system is often in the fallback state, the system mostly relies on a >75% honesty, defeating the purpose of Stage 1.
The audit framework gives some more detail on the reasoning behind the choice to leave the fault proof system outside of the initial scope. We feel that one of the best ways to gain confidence in the fault proof system is to operate it in the real world. This proposal is a good example of that - three audits have been completed and while they did find a number of issues, there were also issues they did not find that were identified either through the bug bounty program or from experience actually working with the fault proof system.
You are absolutely right that reducing the probability of bugs in the fault proof–as this upgrade itself seeks to accomplish–is important, and we felt that the path we took would ultimately get us to a bug-minimized system the fastest. At the end of the day, the community should be empowered to hold us accountable to how upgrades happen. While we (and even those who voted no on the original upgrade) continue to believe that the risks posed by this approach are not about the fundamental security of the system, there should continue to be space for the community to push back if they feel that other (i.e. reputational) risks are too high going forward.
The SEED Latam delegation, as we have communicated here, with @Joxes being an Optimism delegate with sufficient voting power, believes this proposal is ready to move towards a vote.
I am an Optimism Delegate with sufficient voting power and I believe this proposal is ready to move to a vote.
Here is a non-technical summary of the Granite upgrade proposal on behalf of the Developer Advisory Board:
Major changes:
In response to security audits on the Fault Proof system, this upgrade aims to make three major changes:
Fix the individual vulnerabilities defined in the audits.
There were 3 audits performed. Here are the reports: Spearbit, Cantina and Code4rena.
This upgrade fixes important issues identified in these audits. Lower-severity issues will be planned in future upgrades.
The DAB and the security audit firms haven’t reviewed the fixes, but they have been reviewed by the OP Labs team and remain behind all the audited safeguards.
A non-technical summary for the audit issues is added to the appendix.
Make other changes to the smart contract system to improve robustness
In response to these audits, the system was put into “fallback” mode, where only the trusted proposer operated by OP Labs can propose state. After this upgrade is complete, the system will be put back into permissionless mode.
Make DelayedWETH robust to ETH transfers: The DelayedWETH contract holds the bonded ETH for each fault dispute game. A plain ETH transfer forwards only 2300 gas to the receiver. However, the receiver can execute code on receiving ETH, which can require more gas. DelayedWETH is not robust against such transfers. The proposed fix is to remove the gas restriction so that the owner can always recover funds.
Grant privileged actors the power to set anchor state: As a result of switching back to the permissionless fault dispute game, the anchor state (proved state) from before the fallback mechanism was activated can be referenced again in the fault dispute game. To prevent this, the Guardian and DeputyGuardian roles are being given the permission to set this anchor state themselves.
There will be a hardfork to the L2 node software that makes two changes to improve the stability and performance of the fault proof system
Reduce memory load to run off-chain node software: The off-chain software uses “channels” to pass data between processes. Any data received from channels after a certain amount of time (called ChannelTimeout) is considered invalid. This upgrade reduces that time to reduce the load on memory.
Limit the maximum input size to a precompile: Whenever a smart contract A calls another smart contract B, the gas it can forward to the call is limited (63/64ths of A’s gas budget).
This can cause issues in the Fault Dispute Game, because there are certain precompiles where we need to call them on L1 to prove the result from L2. If they used enough gas on L2, it may not be possible for the 63/64ths to be sufficient on L1, so it will always fail.
To solve this, a change has been made to op-geth to limit the maximum amount of gas that can be used for these precompiles on L2.
Onchain data shows that the new limit is 2x higher than the largest call that has ever occurred, so there should be no user impact.
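For readers who want to see the 63/64 rule in numbers, here is a short sketch. The rule itself comes from EIP-150: a call can forward at most the remaining gas minus one 64th of it, rounded down. The function names are hypothetical and the code ignores every other cost around the call.

```python
def max_forwarded_gas(available: int) -> int:
    """Most gas a call can pass on, given the gas remaining at the call site."""
    return available - available // 64


def min_gas_at_call_site(required: int) -> int:
    """Smallest remaining gas at the call site that lets `required` be forwarded."""
    gas = required
    # max_forwarded_gas never decreases as gas grows, so a linear scan is enough here.
    while max_forwarded_gas(gas) < required:
        gas += 1
    return gas
```

For example, a caller with 6400 gas left can forward at most 6300, so a callee that needs more than that fails even though the caller still has gas to spare. This is why an upper bound on how much work a precompile can do on L2 keeps the same call provable on L1.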
User Impact
This upgrade resets user withdrawals. Any proofs submitted within the 7 days before the upgrade will be invalid, and users will have to resubmit them.
Appendix
Cantina 3.1.1: During fault proof execution, the program makes certain assumptions. This bug, if realized, can invalidate those assumptions, leading to incorrect verification of the fault proof. Although no proof of concept was given to show an exploit, this bug fix is prioritized as a cautionary measure.
C4 H-01: A claim proposed for block n can be countered using information from block n+1. This is incorrect behavior, as only information up to block n should be relevant in this scenario.
Spearbit 5.1.1: Whenever a smart contract A calls another smart contract B, the gas it can forward to the call is limited (63/64ths of A’s gas budget). If the call to B reverts, A can continue executing the code that comes after the call until it runs out of gas, finishes executing its own code, or reverts. The audit discovered that when PreimageOracle.loadPrecompilePreimagePart (smart contract A) calls precompiles (B), the call can revert if not enough gas is passed to it. On revert, the function uses the error message instead of the value that would have been returned had the call succeeded.
Only one precompile is vulnerable to this attack. The gas needed for precompile execution depends on the size of the input passed to it. The proposed fix is to put an upper limit on this input size. Onchain data shows that this limit is twice the largest input that has ever been passed to the precompile, hence there is no user impact.
C4 H-02: In a particular section of the fault dispute game (requesting the preimage of hash values), there is a mismatch between the relative amounts of time an honest actor and a malicious actor get. This fix adjusts these values so that an honest actor gets more time to post the response.
C4 H-05: Invalid data can be used in the fault dispute game by passing false as an argument to a function call. This leads to the challenge period never kicking in.
The rest are low-severity issues.
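The Spearbit 5.1.1 pattern can be illustrated with a toy model. This is a sketch in Python under loud assumptions, not the PreimageOracle contract: the cost formula, the stand-in precompile, the revert payload, and `MAX_INPUT_SIZE` are all hypothetical, chosen only to show how revert data can be mistaken for a result and how an input-size cap bounds the gas needed.

```python
from typing import Tuple

MAX_INPUT_SIZE = 50  # hypothetical cap, standing in for the limit added by the fix


def precompile_cost(data: bytes) -> int:
    """Hypothetical gas cost that grows with input size."""
    return 100 + 10 * len(data)


def call_precompile(data: bytes, gas: int) -> Tuple[bool, bytes]:
    """Toy low-level call: returns (success, return data or revert data)."""
    if gas < precompile_cost(data):
        return False, b"revert: out of gas"
    return True, data[::-1]  # stand-in for the real precompile result


def load_preimage_unchecked(data: bytes, gas: int) -> bytes:
    """Models the flaw: revert data is treated as if it were the result."""
    _success, returned = call_precompile(data, gas)
    return returned


def accepts_input(data: bytes) -> bool:
    """With the cap, the worst-case gas needed is known in advance."""
    return len(data) <= MAX_INPUT_SIZE


WORST_CASE_GAS = precompile_cost(b"\x00" * MAX_INPUT_SIZE)
```

In this model, any input that passes `accepts_input` succeeds when given `WORST_CASE_GAS`, so a caller that can always forward that much gas never ends up storing revert data.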
On behalf of the Developer Advisory Board, we approve this upgrade to move to a vote.
Although the fallback was activated, strictly speaking, an emergency upgrade has not been performed. The contracts deployed on op-mainnet have not been changed from those approved in Protocol Upgrade #7: Fault Proofs. The ability for the Guardian or Deputy Guardian to switch to the permissioned fallback was built into that proposal as part of a staged, responsible rollout of fault proofs.
That said, other applications that are using the OptimismPortal.respectedGameType will be affected by the change and need to wait for new dispute games to resolve. The exact impact will depend on the details of the application and how they’re using dispute game results.
Apologies, the description of this change is inaccurate - I’ll get the post updated to fix that. There is a new setAnchorState method on the AnchorStateRegistry which can only be called by the guardian or deputy guardian, but it accepts as input a reference to an existing dispute game which resolved as DEFENDER_WINS. This provides an ability to reset the anchor state back to a valid game in the event that a game with an invalid proposal incorrectly resolves as DEFENDER_WINS and updates the anchor state registry with that invalid proposal. Previously this would need to be done by upgrading the FaultDisputeGame contract used for a game type to one that uses a new AnchorStateRegistry. Referencing an existing game ensures the Guardian or Deputy Guardian can’t set an arbitrary anchor state - it has to be one that the fault dispute game found to be valid.
The situation of having old anchor state when we switch back to the permissioned game can be solved without needing a permissioned role by creating a permissionless game periodically. While the portal won’t respect that game while in the fallback state, it will still update the anchor state if it resolves as DEFENDER_WINS, which will subsequently be respected by the fault proof game.
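The access rule described in this reply can be sketched as a small model. This is a hedged Python illustration, not the Solidity AnchorStateRegistry: the class, field, and method names are hypothetical, and only the two checks stated above are modeled (the caller must be the Guardian or Deputy Guardian, and the referenced game must have resolved as DEFENDER_WINS).

```python
from dataclasses import dataclass
from enum import Enum


class GameStatus(Enum):
    IN_PROGRESS = 0
    CHALLENGER_WINS = 1
    DEFENDER_WINS = 2


@dataclass(frozen=True)
class DisputeGame:
    game_type: int
    root_claim: str
    l2_block_number: int
    status: GameStatus


class AnchorStateRegistryModel:
    """Toy registry: anchors can only be reset to an already-resolved valid game."""

    def __init__(self, guardian: str, deputy_guardian: str) -> None:
        self.authorized = {guardian, deputy_guardian}
        self.anchors = {}  # game_type -> (root_claim, l2_block_number)

    def set_anchor_state(self, caller: str, game: DisputeGame) -> None:
        if caller not in self.authorized:
            raise PermissionError("caller is not the guardian or deputy guardian")
        if game.status is not GameStatus.DEFENDER_WINS:
            raise ValueError("game did not resolve as DEFENDER_WINS")
        # The anchor is copied from the game, never supplied freely by the caller.
        self.anchors[game.game_type] = (game.root_claim, game.l2_block_number)
```

Because the anchor values are read from the referenced game, the privileged caller chooses which valid game to anchor to but cannot introduce an arbitrary state.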
Although the fallback was activated, strictly speaking, an emergency upgrade has not been performed. The contracts deployed on op-mainnet have not been changed from those approved in Protocol Upgrade #7: Fault Proofs. The ability for the Guardian or Deputy Guardian to switch to the permissioned fallback was built into that proposal as part of a staged, responsible rollout of fault proofs.
The safeguards, including the ability to change the respected game type, are documented in the original fault proofs governance proposal, the Optimism docs for fault proofs and the OP Stack specification.
If you need help understanding how to work with these safeguards, the OP Stack developer forum is a good place to ask questions. We welcome feedback to help us improve, especially if there are areas where we can increase clarity!
The staged rollout of fault proofs is designed to ensure that the chain and user assets remain secure even if there are issues within the fault proof system, and we believe that passing this upgrade is appropriate and consistent with our previously published audit framework. This allows us to continue improving the system safely and to build confidence in it over time. The safeguards were audited prior to the initial launch of fault proofs and none of the findings from these three audits allowed those safeguards to be circumvented. Notably, this proposal fixes three issues which were not identified by these audits - despite the very high quality of the auditors. If you have any specific concerns with the changes outlined in the proposal, we are always happy to discuss further! The audit framework gives some more detail on how OP Labs thinks about audits.
Could you also elaborate on Cantina 3.3.5, i.e. the incorrect implementation of the srav opcode? Why is it considered low severity?
The srav opcode deviating from the spec is only considered low priority because the actual MIPS instructions generated by the Go compiler when compiling op-program are not affected by the difference in behavior. So it doesn’t currently have an impact on the fault proof system, but it may in the future due to changes in the Go compiler or op-program. Additional information about this class of bugs can be found in this section of the Cannon FPVM spec.
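For readers unfamiliar with the instruction, the behavior the MIPS32 specification gives for srav (shift right arithmetic variable) can be sketched as follows. This is an illustrative Python model of the spec semantics only; it is not Cannon's implementation and does not reproduce the specific deviation reported in the audit.

```python
def srav(rt: int, rs: int) -> int:
    """MIPS32 SRAV: arithmetic right shift of rt by the low 5 bits of rs.

    Inputs and the result are treated as 32-bit register values.
    """
    shift = rs & 0x1F                # only the low 5 bits of rs are used
    value = rt & 0xFFFFFFFF
    if value & 0x80000000:           # reinterpret as a signed 32-bit integer
        value -= 1 << 32
    return (value >> shift) & 0xFFFFFFFF  # Python's >> on ints is arithmetic
```

An implementation that, for example, used more than the low 5 bits of the shift amount would differ from this model only for inputs a compiler might never emit, which is the kind of situation the reply describes.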
Secondly, to me it’s a bit concerning that the fault proof system is outside the scope of external audits. I understand that in Stage 1, given the fallback, the blacklisting mechanism, the pause button and the Security Council, a bug is unlikely to affect the safety of the system. Citing Vitalik, the amazing property of Stage 1 rollups is being able to be safe AND live assuming < 25% of honest members in the SC, also assuming a small probability of bugs. If this probability is not small, and the system is often in the fallback state, the system mostly relies on > 75% honesty, defeating the purpose of Stage 1.
The audit framework gives some more detail on the reasoning behind the choice to leave the fault proof system outside of the initial scope. We feel that one of the best ways to gain confidence in the fault proof system is to operate it in the real world. This proposal is a good example of that - three audits have been completed and while they did find a number of issues, there were also issues they did not find that were identified either through the bug bounty program or from experience actually working with the fault proof system.
You are absolutely right that reducing the probability of bugs in the fault proof system, as this upgrade itself seeks to accomplish, is important, and we felt that the path we took would ultimately get us to a bug-minimized system the fastest. At the end of the day, the community should be empowered to hold us accountable for how upgrades happen. While we (and even those who voted no on the original upgrade) continue to believe that the risks posed by this approach are not about the fundamental security of the system, there should continue to be space for the community to push back if they feel that other (i.e. reputational) risks are too high going forward.
Hi @inphi,
Thanks for putting up the detailed proposal about a potential upgrade to fix the vulnerabilities found in the audits conducted after the deployment of the Fault Proofs upgrade. We appreciate the team’s effort in fixing the issues and improving the system further while activating the permissioned fallback mechanism with proper coordination and caution.
We would like to clarify two points:
Looking at Cantina’s audit report, Cantina 3.1.1 was rated “Critical”, the most severe bug class, which must be fixed as soon as possible, whereas you listed this bug’s severity as “High”. That may be because the team considered a potential exploit infeasible given the Go runtime’s memory protection, but we believe this is misleading, as the severity is important information for evaluating how the Fault Proofs system should be reviewed and audited going forward. You mentioned that other issues not found by the audits were identified by running the system in production, but finding them did not necessarily require deploying the system before the audits were complete.
In the last upgrade proposal, we (alongside @zachobront) expressed concern that the system would be deployed without proper audits of the upgrade code, and we understood your clarification of how OP Labs viewed the upgrade and its audits. Still, we suggested that, in coordination with the Security Council, OP Labs could reconsider the deployment timing. Apparently, the deployment occurred as planned, and now a critical bug has caused a fallback operation. Was there any discussion of the concerns we raised? How is the Security Council responsible for the situation?
The following reflects the views of L2BEAT’s governance team, composed of @krst and @Sinkas, and it’s based on the combined research, fact-checking, and ideation of the two.
We’ll be voting FOR this proposal as we find it important to fix the already known bugs in the production environment. However, we would like to raise our concern as to whether the current approach of releasing early and relying on fallback mechanisms to prevent anything bad from happening is the right one.
As @Zachobront mentioned in a comment under the Protocol Upgrade #7 proposal, the Foundation’s approach to the fault dispute mechanism poses a reputational risk. As events have shown, Zach’s concerns were on point, and we have now had to revert to the permissioned fallback mechanism while the bugs found in the fault dispute mechanism are patched.
While it might not seem like a big deal, given that users’ funds were not at risk thanks to the fallbacks, it actually is, since there’s a very thin line between the current situation and a case where the Security Council is needed to secure the chain.
Luca Donnoh, a researcher at L2BEAT, has written an article that explains the risks associated with a potential lack of trust in the fault proof mechanism:
… Even if the protocol requires a lot of funds to be pooled to protect it, one can argue that finding liquidity is not a difficult task since it eventually guarantees very high profits, assuming that the proof system works correctly. We argue that this assumption shouldn’t be taken lightly. Let’s say that an attacker actually spends billions of dollars to attack a protocol, and then signals on a social or with an onchain message that they found a bug in the challenge protocol where defenders are guaranteed to lose their funds. No one knows if the bug actually exists or if it’s just a bluff, but it can be used as an effective deterrent to prevent reaching the target amount of funds needed to save the chain. …
In simple terms, while the approach of deploying early and “testing in production” is safe in the sense that there’s no (or very limited) risk to users’ funds, as there are fallback mechanisms in place, we feel that if it leads to repeated instances where we actually have to use those fallbacks, it can damage confidence in the design of the system in the long run, and therefore make it much harder to get it working in a Stage 2 environment where no such fallbacks will be available.
Apparently I was confused about the deadline for this proposal. I was certain that it ended at 7 pm UTC, but it ended a few minutes before I posted the rationale. However, the vote still passed and we were supportive of it, so no harm done, but I am sorry about it and we will make sure to vote earlier in the future to avoid such cases.
We vote FOR this proposal.
In order to explain our rationale behind this decision, we want to provide some context and considerations from our perspective.
Background:
The Fault Proof proposal was introduced three months ago. The upgrade included one of the most anticipated implementations: real fraud proofs. However, the proposal included two key aspects that raised many questions and doubts about the actual impact and risk of the upgrade: the lack of a complete audit of the system and the conception of the Guardian roles as a consequence.
Amid various concerns and questions, a key comment by Zach was raised regarding the risks introduced by this upgrade; most of them were understood by us within the “reputational risk” category, as the Guardian role was specifically set to minimize any existential risk. For us, our position was to abstain due to the present risk, aligning with the opinion of the DAB leader. We highly value the minimization of reputational risk. Nevertheless, the Collective sent a strong signal in favor of the upgrade.
Granite:
As detailed, bugs were found, which was certainly an expected outcome given the circumstances. Most of the fixes are related to the findings identified in the audit results. In order to move forward, that is, to return to the mode where fault proofs are fully operational, these fixes need to be implemented.
Implementing this upgrade should objectively move us to a safer stage than before, prior to the permissioned mode being triggered. However, it is mentioned how complex Fault Proofs are, so more bugs could still be present. As the safeguards are assumed to be well-audited and managed by the Security Council and Foundation, it should be acceptable to continue having the system as is, even though there are multiple concerns about its design, implementation, and maturity.
Going ahead:
All the discussions across various instances about this upgrade have left us with several points to consider for the Collective:
Highlight the importance of the Developer Advisory Board in keeping delegates well-informed and, in a sense, making recommendations and outlining expected outcomes for each possible choice. Also, all delegates should ensure that every aspect of protocol upgrade proposals is sufficiently understood before offering support.
Some members of the Collective have expressed a preference for more conservative measures, which should be taken into account. In an audits-versus-shipping trade-off, the balance might lean more towards the former.
Related to point (2), the Collective should revisit the Audit Framework, as the sense of reputational risk could be more highly valued than the current version suggests.
Expectations on what the Fault Proof roadmap should look like, including the communication of the current constraints and challenges around it and how the system should evolve, regardless of the approach to a multi-proof system.
The pertinent disclosure of how the running and monitoring of the system actually work, and which nice-to-have features would be appropriate to encourage, for governance’s awareness. This includes any action that could favor the redundancy of the system’s monitoring.
We are aware of the current security gaps in the fault proofs and how crucial the proposed solution is for the future of the chain. With these debugging processes, even the weakest links in the Optimism chain will be strengthened, resulting in a more robust structure. Therefore, as the ITU Blockchain Delegation Committee, we support this proposal.