We have recently received a number of questions about how resource collections work in the AWS SDK for Ruby. To clarify, I will walk through some common patterns and how to use them. The examples focus on Amazon EC2, but the same ideas apply to the other service interfaces.
Let’s begin by launching an IRB session and establishing a service interface to communicate with EC2:
$ irb -r rubygems -r aws-sdk
> ec2 = AWS::EC2.new(:access_key_id => "KEY", :secret_access_key => "SECRET")
The EC2 service offers a wide array of collections. Initially, it’s essential to locate an Amazon Machine Image (AMI) to initiate instances. We can access the images available through the images collection:
> ec2.images
=> <AWS::EC2::ImageCollection>
Notice that this method returns immediately. The Ruby SDK loads its collections lazily, so merely retrieving the collection does not send a request. This is useful because you often do not need the entire collection. For instance, if you already have the AMI ID, you can reference the image directly:
> image = ec2.images["ami-310bcb58"]
=> <AWS::EC2::Image id:ami-310bcb58>
Again, this operation completes quickly. We have told the SDK which AMI we want (ami-310bcb58), but we have not yet asked for anything about it. To get the description, we can do:
> image.description
=> "Amazon Linux AMI i386 EBS"
This request takes a bit longer, and if logging is activated, you will observe a message such as:
[AWS EC2 200 0.411906] describe_images(:image_ids=>["ami-310bcb58"])
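The lazy behavior described above can be illustrated with a small toy model. This is only a sketch: `FakeService`, `LazyImage`, and `LazyImageCollection` are hypothetical stand-ins written for this example, not classes from the SDK, and the real implementation may differ considerably. It shows that building a handle costs nothing and that each attribute read issues its own request.

```ruby
# Toy model only: FakeService, LazyImage and LazyImageCollection are
# hypothetical stand-ins for illustration, not classes from the AWS SDK.
class FakeService
  attr_reader :calls

  def initialize(images)
    @images = images
    @calls = []
  end

  # Records every "request" so we can see when one is actually made.
  def describe_images(image_ids)
    @calls << [:describe_images, image_ids]
    image_ids.map { |id| @images.fetch(id) }
  end
end

class LazyImage
  attr_reader :id

  def initialize(service, id)
    @service = service
    @id = id
  end

  # Each attribute read issues a fresh request; nothing is stored.
  def description
    @service.describe_images([id]).first[:description]
  end
end

class LazyImageCollection
  def initialize(service)
    @service = service
  end

  # Building a handle performs no request.
  def [](id)
    LazyImage.new(@service, id)
  end
end

service = FakeService.new({ "ami-310bcb58" => { :description => "Amazon Linux AMI i386 EBS" } })
image = LazyImageCollection.new(service)["ami-310bcb58"]
puts service.calls.size   # prints 0: no request has been made yet
puts image.description    # prints the description after one request
puts service.calls.size   # prints 1
```

Calling `image.description` a second time in this model would record a second request, which mirrors the no-caching default.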
At this point, having requested the description for this AMI, the SDK will query EC2 for the specific information required. The SDK does not retain this information, so if we repeat the request, it will issue another query. While this might seem inefficient initially, the lack of caching allows for straightforward polling for state changes. For instance, to wait until an instance transitions from pending status, we can execute:
> sleep 1 until ec2.instances["i-123"].status != :pending
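The one-liner above loops forever if the instance never leaves the pending state. A bounded variant might look like the sketch below. `wait_until` is a hypothetical helper written for this example, not an SDK method, and the array of statuses is a stand-in for repeated calls to `ec2.instances["i-123"].status`.

```ruby
# Hypothetical helper, not part of the SDK: poll a block until it returns a
# truthy value, raising if the timeout (in seconds) expires first.
def wait_until(timeout:, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Stand-in for a remote status check: :pending twice, then :running.
statuses = [:pending, :pending, :running]
current = nil
wait_until(timeout: 5, interval: 0.01) do
  current = statuses.shift
  current != :pending
end
puts current   # prints "running"
```

Because the SDK does not cache by default, each evaluation of the block would fetch a fresh status, which is what makes this kind of polling work.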
The [] method is useful for retrieving details about a single resource, but how do we access information about multiple resources? Let’s again consider EC2 images. First, we can count the available images:
> ec2.images.to_a.size
[AWS EC2 200 29.406704] describe_images()
=> 7677
The to_a method generates an array of all images. To gather information about these images, we can use Enumerable methods such as map or inject. For example, to obtain all image descriptions, we can use:
> ec2.images.map(&:description)
However, this can take a long time. Because the SDK does not cache by default, it makes one request to list the images and then one more request per image to retrieve its description. That is a lot of round trips, and it is wasteful because EC2 already returns all of the necessary information in the first response. By default the SDK does not keep that data, so it has to be fetched again for each image. We can do better with:
> AWS.memoize { ec2.images.map(&:description) }
The AWS.memoize method instructs the SDK to retain all data retrieved from the service within the scope of the block. When it retrieves the list of images along with their descriptions, it stores this data in a thread-local cache. Then, when we invoke Image#description for each item in the array, the SDK checks the cache before making another request to the service.
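The idea of a block-scoped cache can be sketched in a few lines of plain Ruby. This is a toy model under stated assumptions, not the SDK's implementation: the `Memo` module and its methods are hypothetical, it caches per key only (the SDK fills its cache from list responses), and it uses `Thread.current[]`, which modern Ruby scopes to the current fiber.

```ruby
# Toy model only: Memo is a hypothetical module for illustration, not SDK code.
module Memo
  KEY = :toy_memo_cache

  # Enable a cache for the duration of the block, then restore the prior state.
  def self.memoize
    previous = Thread.current[KEY]
    Thread.current[KEY] = previous || {}
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Use the cached value when a cache is active; otherwise always run the block.
  def self.fetch(key)
    cache = Thread.current[KEY]
    return yield unless cache
    cache.fetch(key) { cache[key] = yield }
  end
end

requests = 0
lookup = lambda do |id|
  Memo.fetch(id) do
    requests += 1            # counts simulated round trips
    "description of #{id}"
  end
end

2.times { lookup.call("ami-1") }
puts requests   # prints 2: no cache outside a memoize block
Memo.memoize { 2.times { lookup.call("ami-1") } }
puts requests   # prints 3: the second call inside the block was a cache hit
```

Scoping the cache to a block keeps the default behavior (always fetch fresh data) intact everywhere else, so polling loops still observe state changes.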
This overview only scratches the surface of what collections in the AWS SDK for Ruby can do. Beyond the basic patterns discussed here, many of our APIs support more advanced filtering and pagination. For more information about these APIs, consult the SDK’s API reference documentation.