By Rehan Van Der Merwe
In this post, we’ll be diving into the technical details, like configuration options and any limitations you need to know about, as well as looking at how to use this technical knowledge to effectively design serverless and Lambda systems.
At the end of it all, you should have a clearer understanding of the key considerations you need to bear in mind when designing around AWS Lambda.
When you hear the word ‘serverless’, AWS Lambda is most likely the first thing you think about. That’s no surprise; the tech took our industry by storm and brought with it a whole new paradigm of solutions.
AWS Lambda was the first Function as a Service (FaaS) technology I was exposed to, and like others, I was highly critical at first. There are no servers to manage, it auto-scales, has fault tolerance built-in, and is pay per usage—all of which sounds like a dream.
With great power comes great responsibility. Serverless design requires knowledge of different services and how they interact with each other.
Just like any other technology, there are some tricky waters to navigate, but they are far outweighed by the power of what serverless has to offer. To stop this dream from turning into a nightmare, here are a few things to keep in mind when designing with AWS Lambda.
The memory setting of your Lambda determines both the amount of power and the unit of billing. There are 44 options to choose from, ranging from the lowest, 128 MB, to the largest, 3,008 MB. This gives you quite a variety to choose from!
If you allocate too little memory, your program might take longer to execute and might even exceed the time limit, which stands at 15 minutes. On the other hand, if you assign too much memory, your function might not even use a quarter of all that power and end up costing you a fortune.
It’s crucial to find your function’s sweet spot. AWS states that if you assign 1,792 MB, you get the equivalent of 1 vCPU, which is a thread of either an Intel Xeon core or an AMD EPYC core. That’s about as much as they say about the relationship between the memory setting and CPU power.
A few people have experimented and concluded that beyond 1,792 MB of memory you do indeed get a second CPU core, and so on; however, the utilization of these cores can’t be determined.
Cheaper isn’t always better. Sometimes choosing a higher memory option that is more expensive upfront can reduce the overall execution time: the same amount of work gets done in a smaller time period. By fine-tuning the memory setting and finding the optimal point, you can make your functions execute faster than they would at a lower memory setting.
You may end up paying the same—or even less—for your function than with the lower alternative.
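To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The per-GB-second price is an assumption based on published Lambda pricing at the time of writing, and the two duration measurements are hypothetical:

```python
# Rough cost model for the duration portion of a Lambda bill. The price
# per GB-second is an assumption based on published Lambda pricing at the
# time of writing (~$0.0000166667); the two measurements are hypothetical.
PRICE_PER_GB_SECOND = 0.0000166667

def invocation_cost(memory_mb, duration_s):
    # billed duration cost only; ignores the small per-request fee
    return (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND

cost_low = invocation_cost(128, 10.0)    # CPU-starved: 10 s at 128 MB
cost_high = invocation_cost(1024, 1.0)   # 8x the memory, finishes in 1 s
print(f"128 MB: ${cost_low:.7f}, 1,024 MB: ${cost_high:.7f}")
```

In this made-up example the larger setting is both ten times faster and slightly cheaper, because billing is based on GB-seconds, not seconds alone.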
The bottom line is that CPU and memory should not be high on your design consideration list. AWS Lambda, just like other serverless technologies, is meant to scale horizontally.
Breaking the problem into smaller, more manageable pieces and processing them in parallel is faster than many vertically scaled applications. Design the function and then fine-tune the memory setting later as needed.
AWS Lambda has two invocation models and three invocation types. In other words, there are two methods of acquiring the data (the Push and Pull models) and three methods through which the data is sent to the Lambda function.
The invocation model and type determine the function’s characteristics around failures, retries, and scaling, which we’ll rely on later.
The sending part can then be done in one of three ways, known as the invocation type: RequestResponse (synchronous), Event (asynchronous), and DryRun (which only validates access without invoking the function).
Below are a few examples that showcase the different models and invocation types available:
To my knowledge, there are no Pull models that do Event-type invocations. Pull models are further divided into two sections: stream-based and non-stream-based. Also note that the API Gateway invocation type can be changed to Event (async) by adding a header before sending the data to the Lambda.
This is probably one of the most important considerations: how a Lambda fails and retries depends on the invocation type. For all Event-based invocations, if the Lambda throws an error it will be invoked two more times, so three times in total, separated by a delay.
This behavior can only be configured for async invocations; the retry value can be between 0 and 2.
If a Dead Letter Queue (DLQ) is configured, the failed event will be sent to the configured SQS queue or SNS topic; otherwise, the error will only be recorded in CloudWatch.
With the RequestResponse invocation type, the caller needs to act on the error returned. For API Gateway (Push + RequestResponse), the caller can, for example, log the failure and then retry.
When it comes to Kinesis Streams (Pull stream-based + RequestResponse), it acts as a FIFO queue/stream: if the first message errors in the Lambda, it blocks the whole stream from being processed until that message either expires or is processed successfully.
A failed message or batch can be removed by using properties on the event source mapping, such as Maximum Record Age, Maximum Retry Attempts, and Bisect Batch on Function Failure; more on this in Part 2 under Handling errors.
Idempotent system: A system will always output the same result given the same input.
It’s important to understand the failure and retry behavior of each invocation type. As a general rule of thumb, design all your functions to be idempotent.
This means that if the function is invoked multiple times with the same input data, the output will (and must) always be the same. When you design like this, the retry behavior will not be a problem in your system.
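As a sketch of this idea, here is a minimal idempotent handler. The in-memory dict stands in for a durable store such as a DynamoDB table with a conditional write; the event shape and names are illustrative:

```python
# Minimal sketch of an idempotent handler. The in-memory dict stands in
# for a durable store such as a DynamoDB table with a conditional write;
# the event shape and names are illustrative.
processed = {}

def handler(event):
    key = event["id"]                         # a stable, caller-supplied key
    if key in processed:                      # a retry of an event we already saw
        return processed[key]                 # return the original result
    result = {"total": event["amount"] * 2}   # the actual work
    processed[key] = result                   # record the result for retries
    return result

first = handler({"id": "evt-1", "amount": 21})
second = handler({"id": "evt-1", "amount": 21})   # a Lambda retry
print(first == second)
```

Because the result is keyed on the event’s identity, a retry returns the stored result instead of doing the work twice.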
AWS provides Versions and Aliases out of the box for your Lambda code. This might not be as straightforward and useful as you would think. A few things to keep in mind:
There are three ways in which you can use versioning and aliases. The first is a single Lambda function that gets a new version number whenever there is a change to code or configuration; the alias is used as the stage and pointed at the correct version of the function.
Again, it’s imperative to note that older versions cannot be changed. If something needs to change for, say, version 3 (now the Live alias/stage), it can’t; you can’t even quickly increase the timeout setting.
In order to change it, you would need to redeploy version 3 as version 5 with the new setting and then point the Live alias to version 5. Keeping in mind that version 5 actually contains older code than version 4, this gets unnecessarily complex very quickly.
The second method that comes to mind is a blue-green deployment, which is a little less complex. Here you would have three different Lambdas, one for each stage, with blue being the old version and green being the new version.
Just like before, each new deployment of a Lambda is versioned. When you are ready to make the new code changes live, you create an alias that sends, for example, 10% of traffic to the old version and the other 90% of requests to the new version.
This is called a canary deployment; although AWS doesn’t label it as such, it allows you to gradually shift traffic to the new version.
The third method is the simplest and plays nicely with IaC (Infrastructure as Code) tools like the AWS CDK, CloudFormation, SAM, and Terraform, as well as CI/CD (Continuous Integration/Continuous Deployment) pipelines. It’s based on the principle that each Lambda is “tightly” coupled with its environment/infrastructure.
The whole environment and Lambda are deployed together; any rollback means that a previous version of the infrastructure and Lambda needs to be deployed again.
This offloads the responsibility of versioning to the IaC tool being used. Each Lambda function name includes the stage and is deployed as a whole, with the infrastructure.
The main reason to place a Lambda inside a VPC is so that it can access other AWS resources inside the VPC on their internal IP addresses/endpoints.
If the function does not need to access any resources inside the VPC, it is strongly advised to leave it outside the VPC. The reason is that, inside the VPC, each Lambda container will create a new Elastic Network Interface (ENI) and be assigned an IP address.
Your Lambda will be constrained by how fast this can scale and by the number of IP addresses and ENIs available to you.
AWS Lambda creates a Hyperplane Elastic Network Interface (ENI) for every unique security group & subnet combination across all your functions.
This one-time setup can take up to 90 seconds; all subsequent invocations will use this shared network interface. They won’t suffer any additional cold start time waiting for an ENI to be attached at invocation time.
As soon as you place the Lambda inside the VPC, it loses all connectivity to the public internet. This is because the ENIs attached to the Lambdas only have private IP addresses.
So it is best practice to assign the Lambda to three private subnets inside the VPC, and route traffic from those private subnets through a NAT in one of the public subnets. The NAT will then have a public IP and send all traffic to the Internet Gateway.
This has the added benefit that egress traffic from all Lambdas will come from a single IP address, but it introduces a single point of failure; this is, of course, mitigated by using a NAT Gateway rather than a NAT instance.
As with all AWS services, the principle of least privilege should be applied to the IAM Roles of Lambda functions. When creating IAM Roles, don’t set the Resource to all (*), set the specific resource. Setting and assigning IAM roles this way can be annoying, but is worth the effort in the end.
By glancing at the IAM Role you will then be able to know what resources are being accessed by the Lambda and then also how they are being used (from the Action attribute). It can also be used for discovering service dependencies at a glance.
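For illustration, a least-privilege policy might look like the following; the account ID, region, and table name are hypothetical:

```python
import json

# Illustrative least-privilege role policy: both the actions and the
# resource are scoped instead of using "*". The account ID, region, and
# table name are hypothetical.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/orders",
    }],
}

# Glancing at the role tells you this function reads and writes one table.
print(json.dumps(policy, indent=2))
```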
If your function is inside a VPC, there must be enough IP addresses and ENIs for scaling. A Lambda can potentially scale to such an extent that it depletes all the IPs and/or ENIs for the subnets/VPC it is placed in.
To prevent this, set the concurrency of the Lambda to something reasonable. Advanced networking and Hyperplane ENIs mitigate the risk of running out of IPs within your subnets, but it can still happen if your subnets are very small.
By default, AWS sets a limit of 1,000 concurrent executions for all the Lambdas combined in your account. Of this, you can reserve up to 900; the other 100 are kept unreserved for functions with no concurrency limits.
Reserving concurrency like this is a “poor man’s method of throttling” and a safety net in many cases.
For Push model invocations (ex: S3 Events), Lambda scales with the number of incoming requests until concurrency or account limit is reached. For all Pull model invocation types, scaling is not instant.
For the stream-based Pull model with RequestResponse invocation types (ex: DynamoDB Streams and Kinesis), the number of concurrent Lambdas running will be the same as the number of shards in the stream.
As opposed to this, for the non-stream-based Pull model with RequestResponse invocation types (ex: SQS), Lambdas will be gradually spun up to clear the queue as quickly as possible: starting with five concurrent Lambdas, then increasing by 60 per minute, up to 1,000 in total or until the limits are reached.
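The ramp-up just described can be sketched as a simple model; the numbers mirror the figures in the text, not an official AWS formula:

```python
# Back-of-the-envelope model of the SQS ramp-up described above: start at
# five concurrent functions, add 60 per minute, and cap at 1,000 (or at a
# lower function/account concurrency limit). These numbers mirror the
# text, not an official AWS formula.
def sqs_concurrency(minutes_elapsed, limit=1000):
    return min(5 + 60 * minutes_elapsed, limit)

for minutes in (0, 1, 5, 20):
    print(minutes, "min ->", sqs_concurrency(minutes))
```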
Each Lambda is an actual container on a server. When your Lambda is invoked it will try to send the data to a warm Lambda, a Lambda container that is already started and just sitting there waiting for event data.
If it does not find any warm Lambda containers, it will start/launch a new Lambda container, wait for it to be ready and then send the event data. This wait time can be significant in certain cases.
When your Lambda is inside a VPC, the cold start time can increase even more if it needs to wait for an ENI (private IP address) before being ready.
Even milliseconds can be significant in certain environments.
There are two methods for keeping a Lambda container warm. The first and only method until re:Invent 2019 was to manually ping the Lambda function.
This is usually done with a CloudWatch Event Rule (cron) and another Lambda; the cron can be set to run every five minutes.
The CloudWatch rule will invoke the ping Lambda, which in turn pings the function that you want to keep warm. Keep in mind that one ping will only keep one warm Lambda container alive; if you want to keep three Lambda containers warm, the ping Lambda must invoke the function three times in parallel.
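A common convention for this ping pattern is to mark warming events with a flag that the handler checks first; the field name used here is an assumption of this example, not an AWS field:

```python
# Sketch of the ping pattern: the warming event carries a marker field
# ("warmer" is a convention of this example, not an AWS field) and the
# handler returns immediately, so warm-up invocations stay cheap and
# don't touch downstream resources.
def handler(event, context=None):
    if event.get("warmer"):
        return {"warmed": True}              # exit early; container stays warm
    # ... real work would happen here ...
    return {"statusCode": 200, "body": "real response"}

print(handler({"warmer": True}))
print(handler({"path": "/orders"}))
```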
The second, and now preferred, method is setting Provisioned Concurrency, which will keep a specified number of warm Lambda containers around for you.
Similar to the ping method, you still experience a cold start, but it happens in the background. Your users will immediately hit one of the warm Lambdas and won’t experience any latency associated with cold starts.
You can even use Application Auto Scaling to scale the number of warm containers to keep around based on a Schedule or Target tracking policy.
How you handle errors and failures depends on the use case and the service that invoked the Lambda. There are different types of errors that can occur.
Remember: certain errors can’t be caught by the runtime environment. As an example, in NodeJS, if you throw an error inside a promise without using the reject callback, the whole runtime will crash. It won’t even report the error to CloudWatch Logs or Metrics; it just ends.
These must be caught by your code; most runtimes emit events on exit that report the exit code and reason before the process ends. When it comes to SQS, a message can be delivered more than once, and if it fails, it’ll be re-queued after the visibility timeout and then retried. When your function has a concurrency of less than five, the AWS polling function will still take messages from the queue and try to invoke your function.
This will return a concurrency-limit-reached exception, and the message will then be marked as unsuccessful and returned to the queue; this is unofficially called “over-polling.” If you have a DLQ configured, messages might be sent there without ever being processed, but we’ll say more about this later.
Then, for stream-based services like DynamoDB and Kinesis streams, you have to handle the error within the function or it’ll be retried indefinitely; you can’t use the built-in Lambda DLQs here. For all other async invocations, a failed first invocation will be retried up to two more times.
These retries mostly happen within three minutes of each other, but in rare cases, it may take up to six hours, and there might also be more than three retries.
Dead Letter Queues (DLQs) to the rescue. Maybe not: DLQs only apply to async invocations; they don’t work for services like SQS, DynamoDB streams, and Kinesis streams. For SQS, use a Redrive Policy on the SQS queue and specify the Dead Letter Queue settings there.
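As a sketch, the rule-of-thumb redrive settings discussed here could be expressed as follows; the 30-second function timeout and the queue ARN are assumed values for this example:

```python
import json

# Sketch of rule-of-thumb SQS redrive settings: a visibility timeout of
# at least six times the function timeout, and a maxReceiveCount of at
# least five. The 30-second function timeout and the queue ARN are
# assumed values for this example.
function_timeout_s = 30

redrive_policy = {
    "deadLetterTargetArn": "arn:aws:sqs:eu-west-1:123456789012:orders-dlq",
    "maxReceiveCount": 5,
}
queue_attributes = {
    "VisibilityTimeout": str(6 * function_timeout_s),   # 180 seconds
    "RedrivePolicy": json.dumps(redrive_policy),
}
print(queue_attributes["VisibilityTimeout"])
```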
It’s important to set the visibility timeout to at least six times the timeout of your function, and the maxReceiveCount value to at least five. This helps prevent over-polling: messages being throttled and then sent to the DLQ while the Lambda concurrency is low. Alternatively, you could handle all errors in your code with a try-catch-finally block.
You get more control over your error handling this way and can send the error to a DLQ yourself. Now that the events/messages are in the DLQ and the error is fixed, these events have to be replayed so that they’re processed. They need to be taken off the DLQ, and then that event must be sent to the Lambda once again so that it can be processed successfully.
There are different methods to do this, and since it might not happen often, a small script to pull the messages and invoke the Lambda will do the trick. Replaying functionality can also be built into the Lambda itself, so that if it receives a message from the DLQ, it knows to extract the original message and run the function again.
At re:Invent 2019, AWS announced new features to stop a few bad messages from holding up the whole queue for Kinesis and DynamoDB streams. The expiry time can now be controlled by setting the Maximum Record Age on the event source mapping; this value can be between 60 and 21,600 seconds.
The Maximum Retry Attempts can also be set; this value can be between 0 and 10,000.
Bisect Batch on Function Failure can be used to help isolate these “poison pill” events within a batch. It will recursively split failed batches into smaller batches and retry them until it has isolated the problematic records. These will then be sent to your DLQ if you have one configured.
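The bisecting idea can be illustrated locally with a recursive split; `process` is a stand-in for your handler, and the record shape is made up for this example:

```python
# Local sketch of what Bisect Batch on Function Failure does: when a batch
# fails, split it in half and retry each half, recursing until the "poison
# pill" records are isolated. `process` is a stand-in for your handler and
# the record shape is made up for this example.
def process(batch):
    if any(record.get("poison") for record in batch):
        raise ValueError("batch failed")

def isolate_poison(batch, dead_letters):
    try:
        process(batch)                       # whole batch succeeded
    except ValueError:
        if len(batch) == 1:
            dead_letters.append(batch[0])    # isolated a poison record
        else:
            mid = len(batch) // 2            # bisect and retry each half
            isolate_poison(batch[:mid], dead_letters)
            isolate_poison(batch[mid:], dead_letters)

records = [{"id": i, "poison": i == 6} for i in range(10)]
dlq = []
isolate_poison(records, dlq)
print(dlq)
```

Only the single problematic record ends up in the dead-letter list; the healthy records in the batch are processed normally.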
Lambda Destinations can be used to notify other services about a function’s success or failure. This does not replace the DLQs on Lambdas, both can be used in parallel.
Destination on Failure can be used to send information about failed events to an SQS queue, SNS topic, another Lambda, or an EventBridge event bus. Unlike DLQs, which only receive the message, the destination record includes additional function execution information, such as the stack trace of the failure.
AWS Step Functions also give you granular control over how errors are handled. We can control how many times a step needs to be retried, the delay between retries, and the next state. With all these methods available, it’s crucial that your function is idempotent. Even something complex like a credit card transaction can be made idempotent by first checking whether a transaction with your stored transaction ID already exists or has already succeeded.
If it doesn’t, only then carry out the credit deduction. If you can’t get your functions to be idempotent, consider the Saga pattern: for each action, there must also be a rollback action.
Taking the credit card example again, the Lambda that has a Create Transaction function must also have a Reverse Transaction function, so that if an error happens after the transaction has been created, it can propagate back and the reverse transaction can be fired, leaving the state exactly as it was before the transaction began.
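A minimal sketch of that rollback idea, with illustrative function names and a toy balance standing in for a real payment API:

```python
# Minimal saga sketch: every action registers its compensation, and on a
# failure the completed steps are compensated in reverse order, restoring
# the original state. Names are illustrative; a real payment flow would
# call your transaction service instead of mutating a dict.
def debit(s):
    s["balance"] -= 40                       # "create transaction"

def credit(s):
    s["balance"] += 40                       # "reverse transaction"

def fail(s):
    raise RuntimeError("downstream error")   # a later step blows up

def run_saga(steps, state):
    done = []
    try:
        for action, compensate in steps:
            action(state)
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):    # roll back in reverse order
            compensate(state)
    return state

print(run_saga([(debit, credit), (fail, lambda s: None)], {"balance": 100}))
```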
Of course, it’s never this straightforward when working with money, but it’s a solid example.
Duplicate messages can be identified by looking at the context.awsRequestId inside the Lambda; if a function cannot be made idempotent, this ID should be used to de-dupe messages.
Store the ID in a cache like Redis, or in a database, and use it in the de-dupe logic. This introduces a new level of complexity to the code, so keep it as a last resort and always try to code your functions to be idempotent.
A Lambda can also call the context.getRemainingTimeInMillis() function to find out how much time is left before the function times out. If processing takes longer than usual, it can stop, gracefully run some end-of-function logic, and return a soft error to the caller.
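A sketch of that guard, using a fake context object that mimics the real context's remaining-time method; the 500 ms safety margin is a number chosen for this example:

```python
import time

# Sketch of the remaining-time guard. FakeContext mimics the real Lambda
# context's getRemainingTimeInMillis(); the 500 ms safety margin is a
# number chosen for this example.
class FakeContext:
    def __init__(self, timeout_ms):
        self.deadline = time.monotonic() * 1000 + timeout_ms

    def get_remaining_time_in_millis(self):
        return max(0, self.deadline - time.monotonic() * 1000)

def handler(items, context, margin_ms=500):
    processed = 0
    for _ in items:
        if context.get_remaining_time_in_millis() < margin_ms:
            # stop early, run cleanup logic, report a partial result
            return {"processed": processed, "partial": True}
        processed += 1                       # stand-in for the real per-item work
    return {"processed": processed, "partial": False}

print(handler(range(100), FakeContext(timeout_ms=30_000)))
```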
Coupling goes beyond Lambda design considerations; it’s more about the system as a whole. Lambdas within a microservice are sometimes tightly coupled, but this is nothing to worry about as long as the data passed between Lambdas within their little black box of a microservice is not sent over pure HTTP and isn’t synchronous.
Lambdas shouldn’t be directly coupled to one another in a RequestResponse fashion, but asynchronously. Consider the scenario where an S3 Event invokes a Lambda function, that Lambda needs to call another Lambda within the same microservice, and so on. You might be tempted to implement direct coupling, like allowing Lambda 1 to use the AWS SDK to call Lambda 2 directly. This introduces problems: a failure or throttle in any downstream Lambda can lose the message, the caller has to wait (and pay) while the downstream functions execute, and demand is directly coupled to processing capacity.
This process can be redesigned to be event-driven. Not only is this the solution to all the problems introduced by the direct coupling method, but it also provides a method of replaying the DLQ if an error occurs at each Lambda. No message will be lost or need to be stored externally, and the demand is decoupled from the processing.
The direct coupling method would have failed if more than 1,000 objects were uploaded at once, generating events that invoke the first Lambda. This way, Lambda 1 can set its concurrency to five and use the batch size to take only a set number of records from the queue, thus controlling maximum throughput.
Going beyond a single microservice, when events are passed between them, both need to understand and agree upon the structure of the data. Most of the time both microservices can’t be updated at the exact same time, so be sure to version all your events.
This way you can change all the microservices that listen for event version 1 and add the code to handle version 2. Then update the emitting microservice to emit version 2 instead of 1, always with backward compatibility in mind.
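A sketch of what version-aware consumption can look like; the field names and payloads are illustrative:

```python
# Sketch of versioned event handling: the consumer dispatches on the
# event's version field and keeps the old path until every emitter has
# been upgraded. Field names and payloads are illustrative.
def handle(event):
    version = event.get("version", 1)
    if version == 1:
        # v1 sent a single "name" field
        first, _, last = event["name"].partition(" ")
    elif version == 2:
        # v2 splits the field; the consumer now supports both
        first, last = event["firstName"], event["lastName"]
    else:
        raise ValueError(f"unsupported event version: {version}")
    return {"first": first, "last": last}

print(handle({"version": 1, "name": "Ada Lovelace"}))
print(handle({"version": 2, "firstName": "Ada", "lastName": "Lovelace"}))
```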
Batching is particularly useful in high-transaction environments. SQS and Kinesis streams are some of the services that offer message batching, sending a batch to the Lambda function instead of each and every message separately.
By batching messages in groups of around 10 instead of processing them one at a time, you might cut your AWS Lambda bill by a factor of 10 and see an increase in system throughput.
One of the downsides to batching is that it makes error handling complex. For example, one message might throw an error while the other nine are processed successfully.
Then the Lambda needs to either manually put that one failure on the DLQ, or return an error so that the external error handling mechanisms, like the Lambda DLQ, do their jobs. It could be the case that a whole batch of messages needs to be reprocessed; here, being idempotent is again the key to success.
If you’re taking things to the next level, sometimes batching is not enough. Consider a use case where you’ve got a CSV file with millions of records that need to be inserted into DynamoDB. The file is too big to load into the Lambda memory, so instead, you stream it within your Lambda from S3.
The Lambda can then put the data on an SQS queue, and another Lambda can take the rows in batches of 10 and write them to DynamoDB using the batch interface.
This sounds okay, right? The thing is, a much higher throughput and lower cost can actually be achieved if the Lambda function that streams the data writes to DynamoDB in parallel. Start building groups of batch write API calls, where each can hold a maximum of 25 records.
These can then be limited to roughly 40 parallel/concurrent batch writes; without much tuning, you will be able to reach 2,000 writes per second.
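A sketch of the chunk-and-parallelize approach; `write_batch` is a stub standing in for the real BatchWriteItem call, and the parallelism cap mirrors the figure above:

```python
# Sketch of grouping records into DynamoDB batch-write sized chunks and
# capping how many run in flight. `write_batch` is a stub standing in for
# the real BatchWriteItem call.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 25          # BatchWriteItem accepts at most 25 items
MAX_IN_FLIGHT = 40       # rough parallelism cap from the text

def chunk(records, size=BATCH_SIZE):
    return [records[i:i + size] for i in range(0, len(records), size)]

def write_batch(batch):
    return len(batch)    # stub: a real version would call BatchWriteItem

records = list(range(1_000))
batches = chunk(records)
with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
    written = sum(pool.map(write_batch, batches))

print(len(batches), "batches,", written, "records written")
```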
Modern problems require modern solutions. Since traditional tools won’t work in a Lambda and serverless environment, it’s difficult to gain visibility into, and monitor, the system.
There are many tools out there to help with this, including internal AWS services like X-Ray, CloudWatch Logs, CloudWatch Alarms, and CloudWatch Insights. You could also turn to third-party tools like Epsagon, IOpipe, Lumigo, Thundra, and Datadog, to name just a few. All these tools deliver valuable insights in the form of logs and charts to help evaluate and monitor the serverless environment. One of the best things you can do is to get visibility early on and fine-tune your Lambda architecture.
Finding the root cause of a problem and tracing the event as it goes through the whole system can be extremely valuable.
Writing structured logs is extremely important, since it allows you to run CloudWatch Insight queries over all your Lambda functions, giving you insights that were previously only available using third-party tools.
Always try to make Lambda functions stateless and idempotent, regardless of the invocation type and model. Lambdas aren’t designed to work on a single big task, so break it down into smaller tasks and process them in parallel. After that, the single best thing to do is to measure three times and cut once: do a lot of upfront planning, experiments, and research, and you’ll soon develop a good intuition for serverless design.