Amazon Managed Blockchain and various AWS partners offer an efficient way to utilize Ethereum nodes without the hassle of managing your own infrastructure. However, there are instances—especially when running archive nodes or engaging in Ethereum staking—where managed nodes may fall short. In such cases, you might opt to run your own Ethereum nodes on AWS.
To establish a self-managed node, you need to configure server-side software components known as Ethereum clients. Since The Merge, every Ethereum node must run two clients: an execution layer (EL) client and a consensus layer (CL) client. Together, these clients synchronize the global state with other nodes in the distributed Ethereum database clusters, usually called blockchain networks, such as Mainnet, Goerli, and Sepolia. When you first install and configure both clients, they hold no data and must catch up with the current state of the blockchain network maintained by other nodes. This process, known as the initial sync, can take multiple days because of the large amount of data that needs to be synchronized.
In this article, we share our insights on setting up Ethereum nodes on AWS and strategies to expedite the initial sync, enabling you to quickly bring new nodes online when necessary.
Accelerating the Initial Sync
When you start both clients for a new Ethereum mainnet node, the CL client must sync from the genesis block up to The Merge before the EL client can begin syncing blocks. During this waiting period, the EL client either remains idle or downloads only receipts and block headers. In our tests, syncing from the genesis block with a CL client like Prysm took around four days.
To expedite this process, most CL clients offer a checkpoint sync option that allows them to sync only from the latest beacon chain checkpoint. The beacon chain introduced a consensus engine that replaced proof-of-work mining with proof-of-stake validation. When configuring a checkpoint sync, you need to provide a URL to a trusted checkpoint sync provider. The Ethereum community maintains a list of public endpoints for checkpoint sync providers to choose from. For further details on checkpoint sync, you can check out this guide.
Using checkpoint sync, the CL client syncs the state of the Ethereum beacon chain from the latest checkpoint, thus becoming fully operational without needing to sync all previous blocks. This process typically completes in minutes, after which the CL client instructs the paired EL client to start synchronizing blocks for the EL blockchain. While CL clients synced from checkpoints are sufficient for validators and to initiate EL clients, those synced from the genesis block allow for querying chain history and state, which is essential for archive nodes. If you wish to leverage checkpoint sync while needing additional analytics functionality, you can utilize a CL client like Lighthouse, which supports backfill sync of previous blocks all the way to the genesis and can optionally reconstruct the state as well.
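As a rough illustration of the checkpoint sync setup described above, the following Python sketch assembles a command line for a Lighthouse beacon node. The checkpoint URL, JWT secret path, and the helper name `lighthouse_checkpoint_args` are placeholders chosen for this example, not values from our deployment. Verify the flags against the Lighthouse version you run.

```python
from typing import List


def lighthouse_checkpoint_args(
    checkpoint_url: str,
    jwt_path: str,
    engine_url: str = "http://127.0.0.1:8551",
    network: str = "mainnet",
    reconstruct_states: bool = False,
) -> List[str]:
    """Build an argv list for a Lighthouse beacon node that starts from a checkpoint.

    All argument values are illustrative; choose a checkpoint provider you trust.
    """
    if not checkpoint_url.startswith(("http://", "https://")):
        raise ValueError("checkpoint_url must be an HTTP(S) URL")
    args = [
        "lighthouse", "bn",
        "--network", network,
        # Start from the provider's latest finalized checkpoint instead of genesis.
        "--checkpoint-sync-url", checkpoint_url,
        # Engine API endpoint and shared JWT secret of the paired EL client.
        "--execution-endpoint", engine_url,
        "--execution-jwt", jwt_path,
    ]
    if reconstruct_states:
        # Optional: rebuild historic states after backfill (slow and disk-heavy).
        args.append("--reconstruct-historic-states")
    return args


# Example with a placeholder provider URL and secret path.
print(" ".join(lighthouse_checkpoint_args(
    "https://checkpoint.example.org", "/secrets/jwt.hex")))
```

The resulting list can be passed to a process supervisor or to `subprocess.run`. Keeping it as a list avoids shell quoting problems.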
You can configure EL clients such as Go Ethereum (Geth), Hyperledger Besu, and Nethermind as full nodes to minimize disk space usage and maintain pruned states for the most recent 128 blocks. Some EL clients in full node mode also support a faster sync method known as snap sync. This method is approximately ten times quicker than syncing state from the genesis block via full sync. Another node type is the archive node, which not only retains all blocks but also builds an archive of historical states to enhance historical query functionality. However, EL clients configured as archive nodes cannot utilize snap sync, and syncing them can take 5 to 7 days. For more information on node types and sync modes, refer to the Nodes and Clients documentation.
From our experience, utilizing a combination of the checkpoint sync option in the CL client and the snap sync option in the EL client can reduce the initial sync time from 5-7 days to about one day for full nodes. If you require the enhanced functionality of the EL client as an archive node, you should still use the checkpoint sync option in your CL client to save yourself multiple days of syncing time.
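The EL side of that combination can be sketched in the same way. The helper below builds a Go Ethereum command line for either a snap-synced full node or an archive node. The data directory, JWT path, and the function name `geth_args` are assumptions made for the example, and other EL clients use different flags.

```python
from typing import List


def geth_args(datadir: str, jwt_path: str, archive: bool = False) -> List[str]:
    """Build an argv list for a Geth mainnet node paired with a local CL client.

    Paths are placeholders; the Engine API is bound to localhost on port 8551.
    """
    args = [
        "geth", "--mainnet",
        "--datadir", datadir,
        "--authrpc.addr", "127.0.0.1",
        "--authrpc.port", "8551",
        "--authrpc.jwtsecret", jwt_path,
    ]
    if archive:
        # Archive nodes cannot snap sync: replay blocks and keep all historical state.
        args += ["--syncmode", "full", "--gcmode", "archive"]
    else:
        # Full node: snap sync fetches recent state instead of replaying from genesis.
        args += ["--syncmode", "snap"]
    return args


print(" ".join(geth_args("/data/geth", "/secrets/jwt.hex")))
```

The same JWT secret file must be given to the CL client so that the two clients can authenticate to each other.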
Once your first node is synchronized, you can use the AWS Cloud to scale your nodes horizontally. For improved performance, we recommend storing blockchain data on a separate Amazon Elastic Block Store (Amazon EBS) volume and, once the initial sync is complete, copying that data to an Amazon Simple Storage Service (Amazon S3) bucket. Later, when you bring new nodes online, you can copy the blockchain data from the S3 bucket to cut their initial sync time to under an hour. In some scenarios, however, even an initial sync from a recent data copy can take longer, for example when your node syncs the delta from a peer with limited resources, such as a slow network connection or an overloaded CPU. It is therefore essential to monitor your node for slow sync, and you may need to restart it to force connections to other peers.
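One way to watch for a slow sync is to poll the standard `eth_syncing` JSON-RPC method and compare how far `currentBlock` advanced between two polls. The sketch below assumes an EL client that exposes HTTP JSON-RPC. The helper names and the progress threshold are illustrative, not something we prescribe.

```python
import json
import urllib.request
from typing import Union

SyncResult = Union[bool, dict]


def fetch_eth_syncing(rpc_url: str, timeout: float = 5.0) -> SyncResult:
    """Call eth_syncing; the result is False or an object with hex block numbers."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_syncing", "params": []}
    ).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["result"]


def blocks_behind(result: SyncResult) -> int:
    """Blocks between the node's current block and the highest block it knows of."""
    if result is False:
        return 0  # the node reports that it is not syncing
    return max(0, int(result["highestBlock"], 16) - int(result["currentBlock"], 16))


def is_sync_slow(previous: SyncResult, current: SyncResult, min_progress: int) -> bool:
    """True if the node is still syncing but advanced fewer than min_progress blocks."""
    if previous is False or current is False:
        return False  # no baseline, or the node no longer reports syncing
    progress = int(current["currentBlock"], 16) - int(previous["currentBlock"], 16)
    return progress < min_progress
```

A monitoring loop could call `fetch_eth_syncing` every few minutes, pass consecutive results to `is_sync_slow`, and restart the client after several slow intervals in a row. A sensible `min_progress` depends on the client and instance type, so it is best derived from your own measurements.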
Alternatively, you can utilize the Amazon EBS Snapshots feature instead of transferring data to and from S3. However, nodes initialized this way may experience prolonged higher I/O latencies while their EBS snapshots are loaded from S3. In our experiments, copying data from S3 using the s5cmd tool takes about 36 minutes per 1 TiB, whereas EBS initialization without the Amazon EBS fast snapshot restore feature can take several hours.
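To make the S3 figure concrete, the sketch below turns the measured rate of about 36 minutes per 1 TiB into a copy-time estimate and builds an `s5cmd cp` command for the download. The bucket name, prefix, and helper names are hypothetical. Actual throughput depends on instance type, volume configuration, and object sizes.

```python
from typing import List

# Rate observed in the experiments described above; treat it as an estimate.
MINUTES_PER_TIB = 36.0


def estimate_copy_minutes(size_gib: float, minutes_per_tib: float = MINUTES_PER_TIB) -> float:
    """Estimate the S3 copy time for a data set of size_gib gibibytes."""
    if size_gib < 0:
        raise ValueError("size_gib must be non-negative")
    return size_gib / 1024.0 * minutes_per_tib


def s5cmd_copy_args(bucket: str, prefix: str, dest_dir: str) -> List[str]:
    """Build an s5cmd command that copies every object under a prefix to dest_dir."""
    source = f"s3://{bucket}/{prefix.strip('/')}/*"
    return ["s5cmd", "cp", source, dest_dir.rstrip("/") + "/"]


# Example: a 2.5 TiB data copy with a hypothetical bucket and prefix.
print(estimate_copy_minutes(2560))
print(" ".join(s5cmd_copy_args("my-node-data", "geth/", "/data")))
```

Comparing the estimate with the observed copy time is a quick way to spot a node whose volume or network is under-provisioned.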
This strategy of maintaining your own copy of your client’s blockchain data on AWS works particularly well for EL clients like Erigon, which operate as archive nodes and lack the snap sync option. While it may take several days for such clients to download necessary data and construct the final state, once the copy is available, bringing a new node online will require only 2 to 3 hours.
Solution Overview
To implement the concepts discussed above, we set up three instances of Amazon Elastic Compute Cloud (EC2) using the cost-effective AWS Graviton processor for all clients. Each node has two EBS volumes attached: one as the root volume and another designated for blockchain state data. We refer to one node as a “sync node,” dedicated to synchronizing with the Ethereum mainnet. The other two nodes are “RPC nodes,” which provide the RPC API to user applications (also referred to as decentralized apps, or dApps). While functionally similar, separating node deployments by role enhances the scalability and reliability of the solution.
The RPC nodes belong to an Amazon EC2 Auto Scaling group, which enables rapid provisioning from the sync node's data copy. Both RPC nodes sit behind an Application Load Balancer that distributes traffic between them. You can also choose different Amazon EC2 instance types for the sync and RPC nodes to optimize cost and performance. For instance, a sync node can be a smaller instance type, since its sole purpose is to keep up with the chain head. We used AWS Compute Optimizer to help size the sync node and found that the r6g.2xlarge EC2 instance type, paired with an EBS gp3 volume with 5,700 provisioned IOPS and 250 MiB/s of provisioned throughput, suffices for a Go Ethereum client acting as a sync node. Other instance types might perform better with different clients. The RPC nodes may require higher specifications for both EC2 instances and EBS volumes, depending on the frequency and type of API calls made by applications. After initializing from the data copy generated by the sync node, the RPC nodes continue to synchronize with other nodes in the Ethereum network to provide up-to-date information.
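The volume settings for the sync node can be captured as a small parameter builder. The sketch below returns a dictionary whose keys match the EC2 `CreateVolume` API parameters (for example, as keyword arguments to boto3's `create_volume`), where gp3 throughput is expressed in MiB/s. The Availability Zone, the volume size, and the helper name are assumptions made for the example. The checks cover only the gp3 baseline values, so consult the current Amazon EBS documentation for upper limits.

```python
from typing import Dict, Union

GP3_BASELINE_IOPS = 3000           # included with every gp3 volume
GP3_BASELINE_THROUGHPUT_MIB = 125  # MiB/s included with every gp3 volume


def gp3_volume_params(
    availability_zone: str,
    size_gib: int,
    iops: int = 5700,
    throughput_mib_s: int = 250,
) -> Dict[str, Union[str, int]]:
    """Build CreateVolume parameters for the blockchain data volume.

    The default IOPS and throughput are the sync node values quoted above;
    the size and Availability Zone must be supplied by the caller.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    if iops < GP3_BASELINE_IOPS:
        raise ValueError("gp3 volumes provide at least 3,000 IOPS")
    if throughput_mib_s < GP3_BASELINE_THROUGHPUT_MIB:
        raise ValueError("gp3 volumes provide at least 125 MiB/s")
    return {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput_mib_s,
    }


# Example: a hypothetical 2 TiB data volume in us-east-1a.
print(gp3_volume_params("us-east-1a", 2048))
```

RPC nodes that serve heavier query loads can reuse the same helper with higher `iops` and `throughput_mib_s` values.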