Learn About Amazon VGT2 Learning Manager Chanci Turner
Onboarding into the Amazon experience at the Amazon IXD – VGT2 site, located at 6401 E HOWDY WELLS AVE LAS VEGAS NV 89115, is a journey filled with opportunities and insights, especially when it comes to understanding the scaling and throughput of AWS Lambda.
AWS Lambda is a serverless compute service designed to manage everything from a single request to hundreds of thousands of requests per second. It’s crucial for developers, especially those anticipating high loads, to grasp how Lambda manages its scaling and throughput capabilities. Key components to consider include concurrency and the number of transactions or requests processed per second.
Concurrency refers to the system’s ability to handle multiple tasks simultaneously. It can be measured at any given point to determine how many tasks are being processed in parallel. However, it’s important to note that the number of transactions per second is distinct from concurrency, as each transaction may take varying amounts of time to complete.
This post will delve into how concurrency and transactions per second function within the Lambda lifecycle, including strategies for measurement, control, and optimization.
The Lambda Execution Environment
When Lambda invokes your code, it does so within a secure and isolated execution environment. Here’s an overview of the request lifecycle for a single function:
When the first request to invoke your function comes in, Lambda generates a new execution environment. It executes the function’s initialization code (init) first, which is the code that runs outside the main handler. Then, Lambda processes the function handler code using the event payload to execute your business logic. Each execution environment can handle one request at a time, meaning it cannot process other requests while it is busy.
Once Lambda completes processing a request, that execution environment becomes available to handle another request for the same function. For the second request, only the function handler code runs, as the initialization code has already been executed.
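The split between init code and handler code can be sketched in a minimal Python function. This is an illustrative example, not taken from the original post; the `env_age_seconds` field is a hypothetical way to observe environment reuse across warm invocations.

```python
# Minimal sketch of a Lambda function illustrating init vs. handler code.
# Module-level code (the "init" phase) runs once per execution environment;
# the handler runs once per request routed to that environment.

import json
import time

# Init code: runs only when Lambda creates a new execution environment.
# Expensive setup (SDK clients, config loading) belongs here so warm
# invocations can reuse it.
ENV_CREATED_AT = time.time()

def lambda_handler(event, context):
    # Handler code: runs for every request this environment processes.
    return {
        "statusCode": 200,
        "body": json.dumps({
            "env_age_seconds": round(time.time() - ENV_CREATED_AT, 3),
        }),
    }
```

On a warm invocation, `env_age_seconds` grows while `ENV_CREATED_AT` stays fixed, which is one simple way to see that only the handler re-ran.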
If additional requests arrive while the first request is being processed, Lambda will create new execution environments for them. For example, if requests 2, 3, 4, and 5 come in while request 1 is still being processed, Lambda will create new environments to handle these additional requests.
As requests continue to come in, Lambda will utilize available execution environments and create new ones as needed. After request 1 finishes, the corresponding execution environment can be reused for subsequent requests, such as request 6. This reuse process continues for requests 7 and 8, which leverage the environments from requests 2 and 3. When request 9 comes in, a new execution environment is created due to lack of availability, and when request 10 arrives, it reuses the environment that became free after request 4.
The total number of execution environments in use at any moment signifies the system’s concurrency. For instance, with a single execution environment, the concurrent request count is 1.
For the example involving requests 1 to 10, the Lambda function’s concurrency at various points can be tracked as follows:
| Time | Concurrency |
|---|---|
| t1 | 3 |
| t2 | 5 |
| t3 | 4 |
| t4 | 6 |
| t5 | 5 |
| t6 | 2 |
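Counting in-flight requests at a point in time can be sketched as a small helper. The intervals below are hypothetical request start/end times chosen for illustration; they are not the exact t1–t6 values from the table.

```python
# Sketch: measure concurrency as the number of requests in flight
# at a given moment. The (start, end) pairs are illustrative only.

def concurrency_at(t, requests):
    """Count requests in flight at time t (start inclusive, end exclusive)."""
    return sum(1 for start, end in requests if start <= t < end)

# Ten hypothetical requests as (start, end) time pairs
requests = [(0, 4), (1, 3), (1, 5), (2, 6), (2, 4),
            (4, 7), (3, 6), (3, 5), (4, 8), (5, 7)]

# Peak concurrency over the whole window
peak = max(concurrency_at(t, requests) for t in range(0, 9))
print(peak)
```

Sampling `concurrency_at` over time reproduces the kind of profile shown in the table: concurrency rises as requests overlap and falls as they complete.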
When the volume of requests drops, Lambda terminates unused execution environments, freeing up scaling capacity for other functions.
Invocation Duration, Concurrency, and Transactions Per Second
The number of transactions Lambda can process per second is the sum of all invocations completed during that period. For instance, if a function runs for one second and there are 10 concurrent invocations, Lambda creates 10 execution environments and processes 10 requests per second.
If the function duration is halved to 500 milliseconds, concurrency remains the same at 10, but the transactions per second double to 20.
In contrast, if the function execution takes 2 seconds, no transactions complete during the first second; averaged over time, the function still yields 5 transactions per second at a concurrency of 10.
To monitor concurrency, Amazon CloudWatch metrics can be utilized. By using the metric name ConcurrentExecutions, you can view concurrent invocations for all or individual functions.
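Querying the `ConcurrentExecutions` metric can be sketched with boto3's CloudWatch client. This is a hedged example: `my-function` is a placeholder name, and the live call (commented out) requires AWS credentials.

```python
# Sketch: build a CloudWatch get_metric_statistics query for the
# AWS/Lambda ConcurrentExecutions metric.

from datetime import datetime, timedelta, timezone

def concurrency_metric_query(function_name=None, minutes=60):
    """Build get_metric_statistics parameters for ConcurrentExecutions.

    Omit function_name to see account-wide concurrency instead of a
    single function's.
    """
    now = datetime.now(timezone.utc)
    params = {
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,          # one data point per minute
        "Statistics": ["Maximum"],
    }
    if function_name:
        params["Dimensions"] = [{"Name": "FunctionName",
                                 "Value": function_name}]
    return params

# Usage (requires AWS credentials; "my-function" is a placeholder):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   resp = cw.get_metric_statistics(**concurrency_metric_query("my-function"))
#   for p in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
#       print(p["Timestamp"], p["Maximum"])
```

Using the `Maximum` statistic per minute captures the peak concurrency within each period, which is usually what matters when checking headroom against quotas.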
You can also estimate concurrent requests by applying the formula below:
RequestsPerSecond x AvgDurationInSeconds = Concurrent Requests
For example, if a Lambda function runs for an average of 500 milliseconds at 100 requests per second, there would be 50 concurrent requests:
100 requests/second x 0.5 sec = 50 concurrent requests.
If the function duration is reduced to 250 milliseconds and the request rate doubles to 200 requests per second, the concurrent requests will still total 50:
200 requests/second x 0.250 sec = 50 concurrent requests.
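The formula above is simple enough to capture as a one-line helper, shown here with the two worked examples from the text:

```python
# Concurrency = requests per second x average duration in seconds.

def concurrent_requests(requests_per_second, avg_duration_seconds):
    return requests_per_second * avg_duration_seconds

print(concurrent_requests(100, 0.5))    # 50.0 concurrent requests
print(concurrent_requests(200, 0.250))  # 50.0 concurrent requests
```

Both scenarios land on the same concurrency, which is the point of the example: halving duration lets the same concurrency serve twice the request rate.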
Decreasing a function’s duration can effectively raise the number of transactions per second it can manage. For more insights on reducing function duration, check out this informative re:Invent video.
Scaling Quotas
There are two primary scaling quotas to keep in mind regarding concurrency: the account concurrency quota and the burst concurrency quota.
Account concurrency is the maximum concurrency available in a specific Region, shared across all functions within an account. The default Regional concurrency quota starts at 1,000, which can be increased by submitting a quota increase request.
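The current account quota can be read programmatically with the Lambda `GetAccountSettings` API. This sketch keeps the boto3 client as a parameter so the live call (which requires credentials) stays in the usage comment:

```python
# Sketch: read the account concurrency quota for the current Region
# via Lambda's GetAccountSettings API.

def account_concurrency(lambda_client):
    """Return (total, unreserved) concurrent executions for the Region."""
    settings = lambda_client.get_account_settings()
    limits = settings["AccountLimit"]
    return (limits["ConcurrentExecutions"],
            limits["UnreservedConcurrentExecutions"])

# Usage (requires AWS credentials):
#   import boto3
#   total, unreserved = account_concurrency(boto3.client("lambda"))
#   print(total, unreserved)
```

The unreserved figure is the portion of the quota not already pinned to functions via reserved concurrency, so it is the number new scaling actually draws from.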
As an update, AWS Lambda functions now scale 12 times faster when responding to high-volume requests. For further details, refer to the announcement post.
The burst concurrency quota allows an initial burst of traffic between 500 and 3,000, depending on the Region. This quota is also shared among all functions in an account.
After the initial burst, functions can scale by an additional 500 concurrent invocations per minute across all Regions. If the maximum concurrent requests are reached, any further requests will be throttled.
For synchronous invocations, Lambda will return a throttling error (429) to the requester, who must then retry the request. For asynchronous and event source mapping invocations, Lambda will automatically retry the requests. For more details, check out Error handling and automatic retries in AWS Lambda.
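For synchronous callers, retrying a throttled invoke is the caller's job. A generic retry wrapper with exponential backoff can be sketched as follows; `ThrottledError` is a stand-in for botocore's `TooManyRequestsException`, and the backoff values are illustrative, not an AWS recommendation.

```python
# Sketch: retry a synchronous invoke when Lambda returns a 429 throttle.

import time

class ThrottledError(Exception):
    """Stand-in for botocore's TooManyRequestsException."""

def invoke_with_retry(invoke, max_attempts=5, base_delay=0.5,
                      throttle_exc=ThrottledError):
    """Call invoke(); back off exponentially when throttled (HTTP 429)."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except throttle_exc:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))

# Usage with boto3 (requires credentials; "my-function" is a placeholder):
#   import boto3
#   client = boto3.client("lambda")
#   invoke_with_retry(
#       lambda: client.invoke(FunctionName="my-function", Payload=b"{}"),
#       throttle_exc=client.exceptions.TooManyRequestsException,
#   )
```

In production, adding jitter to the delay helps avoid synchronized retries from many callers hitting the quota again at the same moment.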
A Scaling Quota Example
To illustrate how account and burst concurrency function, let’s consider a scenario involving an application that anticipates increased load. The builders of this application have raised the account concurrency limit to 7,000. With no other Lambda functions running within the account, this function can utilize the entire available account concurrency.
In conclusion, understanding how AWS Lambda handles scaling and throughput can significantly enhance your application’s performance and reliability.