Ask the Expert: 13 AWS Lambda design considerations you need to know about
In this post, we’ll be diving into the technical details, like configuration options and any limitations you need to know about, as well as looking at how to use this technical knowledge to effectively design serverless and Lambda systems.
At the end of it all, you should have a clearer understanding of the key considerations you need to bear in mind when designing around AWS Lambda.
When you hear the word ‘serverless’, AWS Lambda is most likely the first thing you think about. That’s no surprise; the tech hit our industry by storm and brings with it a whole new paradigm of solutions.
AWS Lambda was the first Function as a Service (FaaS) technology I was exposed to, and like others, I was highly critical at first. There are no servers to manage, it auto-scales, has fault tolerance built-in, and is pay per usage—all of which sounds like a dream.
With great power comes great responsibility. Serverless design requires knowledge of different services and how they interact with each other.
Just like any other technology, there are some tricky waters to navigate, but they are far outweighed by the power of what serverless has to offer. To stop this dream from turning into a nightmare, here are a few things to keep in mind when designing with AWS Lambda.
The memory setting of your Lambda determines both the amount of power and unit of billing. There are 44 options to choose from between the slowest or lowest 128 MB, and the largest, 3,008 MB. This gives you quite a variety to choose from!
If you allocate too little memory, your program might take longer to execute and might even exceed the time limit, which stands at 15 minutes. On the other hand, if you assign too much memory, your function might not even use a quarter of all that power and end up costing you a fortune.
It’s crucial to find your function’s sweet spot. AWS states that if you assign 1,792 MB, you get the equivalent of 1 vCPU, which is a thread of either an Intel Xeon core or an AMD EPYC core. That’s about as much as they say about the relationship between the memory setting and CPU power.
There are a few people who’ve experimented and come to the conclusion that after 1,792 MB of memory, you do indeed get a second CPU core and so on, however, the utilization of these cores can’t be determined.
Cheaper isn’t always better—sometimes choosing a higher memory option that is more expensive upfront can reduce the overall execution time. This means that the same amount of work can be done within a smaller time period, so by fine-tuning the memory settings and finding the optimal point, you can make your functions execute faster as opposed to the same low memory setting.
You may end up paying the same—or even less—for your function than with the lower alternative.
The bottom line is that CPU and memory should not be high on your design consideration list. AWS Lambda, just like other serverless technologies, is meant to scale horizontally.
Breaking the problem into smaller, more manageable pieces and processing them in parallel is faster than many vertically scaled applications. Design the function and then fine-tune the memory setting later as needed.
Breaking the problem into smaller manageable pieces and processing them in parallel is faster than many vertically scaled applications.
AWS Lambda has two invocation models and three invocation types. What this means is that there are two methods of acquiring the data and three methods through which the data is sent to the Lambda function.
The invocation model and type determine the characteristics behind how the function responds to things like failures, retries and scaling that we’ll use later on.
- Push: when another service sends information to Lambda.
- Pull: an AWS managed Lambda polls data from another service and sends the information to Lambda.
The sending part can then be done in one of three ways, and is known as the invocation type:
- Request Response: this is a synchronous action; meaning that the request will be sent and the response will be waited on. This way, the caller can receive the status for the processing of the data.
- Event: this an asynchronous action; the request data will be sent and the Lambda only acknowledges that it received the event. In this case, the caller doesn’t care about the success of processing that particular event. Its only job was to deliver the data.
- Dry Run: this is just a testing function to check that the caller is permitted to invoke the function.
Below are a few examples that showcase the different models and invocation types available:
- API Gateway Request is a Push model and by default has a Request Response invocation The HTTP request is sent through to the Lambda function, the API gateway then waits for the Lambda function to return the response.
- S3 Events notifications, SNS Message, Cloudwatch Events is a Push model and Event invocation
- SQS Message is a Pull model and a Request Response invocation AWS has a Lambda function that pulls data from the Queue and then sends it to your Lambda function. If it returns successfully, the AWS-managed polling Lambda will remove it from the queue.
- DynamoDB Streams and Kinesis Streams are Pull models and have a Request Response invocation. This one is particularly interesting as it pulls data from the stream and then invokes our Lambda synchronously. Later, you’ll see that if the Lambda fails it will try and process that message indefinitely (or until it expires), keeping other messages from being processed as a result.
To my knowledge, there are no Pull models that do Event type invocations. Pull models are further divided into two sections, stream-based and non-stream based. Also, note that the API Gateway invocation type can be changed to Event (async) by adding a header before sending the data to the Lambda.
This is most probably one of the most important considerations: how a Lambda fails and retries is based on the invocation type. For all Event-based invocations, if Lambda throws an error it will be invoked two more times—so three times in total, separated by a delay.
If a Dead Letter Queue (DLQ) is configured, the message will be sent to the configured SQS or SNS topic, or the error will just be sent to CloudWatch.
Be the first to know about the jobs you want.
Get the latest hand-picked AWS roles direct to your inbox with our jobs by email service.
With the RequestResponse invocation type, the caller needs to act on the error returned. For API Gateway (Push + Request Response) the caller can maybe log the failure, then retry again.
When it comes to Kinesis Streams (Pull stream-based + Request Response) it acts as a FIFO queue/stream. This means if the first message is processed in error by the Lambda, it will block the whole stream from being processed until that message either expires or is processed successfully.
Idempotent system: A system will always output the same result given the same input.
It’s important to understand the failure and retry behavior of each invocation type, as a general rule of thumb, design all your functions to be idempotent.
This basically just means that if the function is invoked multiple times with the same input data then the output will/must always be the same. When you design like this, the retry behavior will not be a problem in your system.
AWS provides Versions and Aliases out of the box for your Lambda code. This might not be as straightforward and useful as you would think. A few things to keep in mind:
- Versioning only applies to the Lambda code, not to the Infrastructure that it uses and depends on.
- Once a version is published, it basically becomes read-only.
There are three ways in which you can use versioning and aliases. A single Lambda function that gets a new version number whenever there is a change to code or configuration. The alias will be used as the stage and pointed to the correct version of the Lambda function.
Again, it’s imperative to note that if something for the older versions, for example, version 3 (now the Live alias/stage) needs to change it cannot, so you can’t even quickly increase the timeout setting.
In order to change it, you would need to redeploy version 3 as version 5 with the new setting and then point the Live alias to version 5. Then keeping in mind that Version 5 is actually older than version 4, this gets unnecessarily complex very quickly.
The second method that comes to mind is a blue-green deployment. Which is a little less complex where you would have three different Lambdas, one for each stage—blue being the old version and green being the new version.
Just like before each new deployment of a Lambda is versioned. Then when you are ready to make the new code changes live, you create an alias that specifies, for example, 10% of traffic uses the old version and then 90% of the requests go to the new version.
This is called Canary Deployments, although AWS doesn’t label it as such, it allows you to gradually shift traffic to the new version.
The third method is the simplest and plays nicely with IaC (Infrastructure as Code) tools like CloudFormation, SAM and CICD (Continuous Integration Continuous Deployment) pipelines. It’s based on the principle that each Lambda is “tightly” coupled with its environment/infrastructure.
The whole environment and Lambda are deployed together, any rollback will mean that a previous version of the infrastructure and Lambda needs to be deployed again.
This offloads the responsibility of versioning to the IaC tool being used. Each Lambda function name includes the stage and is deployed as a whole, with the infrastructure.
The main reason to place a Lambda inside a VPC is so that it can access other AWS resources inside the VPC on their internal IP addresses/endpoints.
If the function does not need to access any resources inside the VPC, it is strongly advised to leave it outside the VPC. The reason being that inside the VPC each Lambda container will create a new Elastic Network Interface (ENI) and be IP address.
Your Lambda will be constrained by how fast this can scale and the amount of IP addresses and ENIs you have.
As soon as you place the Lambda inside the VPC, it loses all connectivity to the public internet. This is because the ENIs attached to the Lambdas only have private IP addresses.
So it is best practice to assign the Lambda to three private subnets inside the VPC, then connect the private subnets to go through a NAT in one of the public subnets. The NAT will then have a public IP and send all traffic to the Internet Gateway.
This also has a benefit that the egress traffic from all Lambdas will come from a single IP address, but it introduces a single point of failure, this is of course mitigated by using the NAT Gateway over the NAT instance.
As with all AWS services, the principle of least privilege should be applied to the IAM Roles of Lambda functions. When creating IAM Roles, don’t set the Resource to all (*), set the specific resource. Setting and assigning IAM roles this way can be annoying, but is worth the effort in the end.
By glancing at the IAM Role you will then be able to know what resources are being accessed by the Lambda and then also how they are being used (from the Action attribute). It can also be used for discovering service dependencies at a glance.
If your function is inside a VPC, there must be enough IP addresses and ENIs for scaling. A Lambda can potentially scale to such an extent that it depletes all the IPs and/or ENIs for the subnets/VPC it is placed in.
To prevent this, set the concurrency of the Lambda to something reasonable. By default, AWS sets a limit of 1000 concurrent executions for all the Lambdas combined in your account, of which you can assign 900 and the other 100 is reserved for functions with no limits.
For Push model invocations (ex: S3 Events), Lambda scales with the number of incoming requests until concurrency or account limit is reached. For all Pull model invocation types, scaling is not instant.
For the stream-based Pull model with Request Response invocation types (ex: DynamoDB Streams and Kinesis) the amount of concurrent Lambdas running will be the same as the number of shards for the stream.
As opposed to the non-stream based Pull model with Request Response invocation types (ex: SQS), Lambdas will be gradually spun up to clear the Queue as quick as possible. Starting with five concurrent Lambdas, then increasing with 60 per minute up to 1000 in total, or until the limits are reached again.
Be the master of your AWS destiny.
Let our expert consultants find the perfect role for you, wherever you are in the world.
Each Lambda is an actual container on a server. When your Lambda is invoked it will try to send the data to a warm Lambda, a Lambda container that is already started and just sitting there waiting for event data.
If it does not find any warm Lambda containers, it will start/launch a new Lambda container, wait for it to be ready and then send the event data. This wait time can be significant in certain cases.
When your Lambda is inside a VPC, the cold start time increases even more as it needs to wait for an ENI (private IP address) before being ready.
Even milliseconds can be significant in certain environments. The only method to keep a container warm is to manually ping it. This is usually done with a Cloudwatch Event Rule (cron) and another Lambda, the cron can be set for five minutes.
The CloudWatch rule will invoke the Lambda that will ping the function that you want to keep warm, keep in mind that one ping will only keep one warm Lambda container alive. If you want to keep three Lambda containers warm, then the ping Lambda must concurrently invoke the function three times in parallel.
General design considerations
How you handle errors and failures all depends on the use case and the Lambda service that invoked the Lambda. There are different types of errors that can occur:
- Configuration These happen when you’ve incorrectly specified the file or handler, or have incorrect directory structures, missing dependencies, or the function has insufficient privileges. Most of these won’t be a surprise and can be caught and fixed after deployment.
- Runtime These are usually related to the code running in the function, unforeseen bugs that we introduce ourselves and will be caught by the runtime environment.
- Soft errors Usually not a bug, but an action that our code identifies as an error. One example being when the Lambda code intentionally throws an error after retrying three times to call a third-party API.
- Timeouts and memory They fall in a special kind of category as our code usually runs without problems, but it might have received a bigger event than we expected and has to do more work than we budget for. They can be fixed by either changing the code or the configuration values.
Remember—certain errors can’t be caught by the runtime environment. As an example, in NodeJS, if you throw an error inside a promise without using the reject callback, then the whole runtime will crash. It won’t even report the error to CloudWatch Logs or Metrics, it just ends.
These must be caught by your code, as most runtimes have events that are emitted on exit and report the exit code and reason before exiting. When it comes to SQS, a message can be delivered more than once, and if it fails, it’ll be re-queued after the visibility timeout and then retried. When your function has a concurrency of less than five, the AWS polling function will still take messages from the queue and try to invoke your function.
This will return a concurrency limit reached exception, and the message will then be marked as being unsuccessful and returned to the queue—this is unofficially called “over-polling.” If you have a DLQ configured on your function, messages might be sent there without being processed, but we’ll say more about this later.
Then for stream-based services like DynamoDB and Kinesis streams, you have to handle the error within the function or it’ll be retried indefinitely; you can’t use the built-in Lambda DLQs here. For all other async invocations, if it fails the first invocation, it will retry two or more times.
These retries mostly happen within three minutes of each other, but in rare cases, it may take up to six hours, and there might also be more than three retries.
Dead Letter Queues (DLQ) to the rescue. Maybe not, DLQs only apply to async invocations; it doesn’t work for services like SQS, DynamoDB streams and Kinesis streams. For SQS use a Redrive Policy on the SQS queue and specify the Dead Letter Queue settings there.
It’s important to set the visibility timeout to at least six times the timeout of your function and the maxReceiveCount value to at least five. This helps prevent over-polling, messages being throttled and then being sent to the DLQ when the Lambda concurrency is low. Alternatively, you could handle all errors in your code with a try-catch-finally block.
You get more control over your error handling this way and can send the error to a DLQ yourself. Now that the events/messages are in the DLQ and the error is fixed, these events have to be replayed so that they’re processed. They need to be taken off the DLQ, and then that event must be sent to the Lambda once again so that it can be processed successfully.
There are different methods to do this and it might not happen often, so a small script to pull the messages and invoke the Lambda will do the trick. Replaying functionality can also be built into the Lambda, so that it knows it if receives a message from the DLQ to extract the original message and run the function.
The trigger between the DLQ and the Lambda will always be disabled but then enabled after the code is fixed and the message can be reprocessed. AWS Step Functions also give you granular control over how errors are handled. We can control how many times it needs to be retried, the delay between retries, and the next state. With all these methods available, it’s crucial that your function is idempotent. Even for something complex like credit card transactions, it can be made idempotent by first checking if the transaction with your stored transaction callback ID has been successful, or if it exists.
If it doesn’t, then only carry out the credit deduction. If you can’t get your functions to be idempotent, consider the Saga pattern. For each action, there must also be a rollback action.
Taking the credit card example again, the Lambda that has a Create Transaction function must also have a Reverse Transaction function, so that if an error happens after the transaction has been created, it can propagate back and the reverse transaction function can be fired. So that the state is exactly the same as it was before the transaction began.
Of course, it’s never this straightforward when working with money, but it’s a solid example.
Looking to take your AWS career to the next level?
Start your search today and find out exactly why 96% of job seekers would recommend us to a friend.
Duplicate messages can be identified by looking at the context.awsRequestId inside the Lambda. It can be used to de-dupe duplicate messages, if a function cannot be idempotent then this should be used.
Store this ID in a cache like Redis or a DB to use it in the de-dupe logic; this introduces a new level of complexity to the code, so keep it as a last resort and always try to code your functions to be idempotent.
A Lambda can also look at the context.getRemainingTimeInMillis() function to know how much time is left before the function will end. This is so that if processing takes longer than usual, it can stop, gracefully do some end function logic, and return a soft error to the caller.
Coupling goes beyond Lambda design considerations—it’s more about the system as a whole. Lambdas within a microservice are sometimes tightly coupled, but this is nothing to worry about as long as the data passed between Lambdas within their little black box of a microservice is not over-pure HTTP and isn’t synchronous.
Lambdas shouldn’t be directly coupled to one another in a Request Response fashion, but asynchronously. Consider the scenario when an S3 Event invokes a Lambda function, then that Lambda also needs to call another Lambda within that same microservice and so on. You might be tempted to implement direct coupling, like allowing Lambda 1 to use the AWS SDK to call Lambda 2 and so on. This introduces some of the following problems:
- If Lambda 1 is invoking Lambda 2 synchronously, it needs to wait for the latter to be done first. Lambda 1 might not know that Lambda 2 also called Lambda 3 synchronously, and Lambda 1 may now need to wait for both Lambda 2 and 3 to finish successfully. Lambda 1 might timeout as it needs to wait for all the Lambdas to complete first, and you’re also paying for each Lambda while they wait.
- What if Lambda 3 has a concurrency limit set and is also called by another service? The call between Lambda 2 and 3 will fail until it has concurrency again. The error can be returned to all the way back to Lambda 1 but what does Lambda 1 then do with the error? It has to store that the S3 event was unsuccessful and that it needs to replay it.
This process can be redesigned to be event-driven: Not only is this the solution to all the problems introduced by the direct coupling method, but it also provides a method of replaying the DLQ if an error occurred for each Lambda. No message will be lost or need to be stored externally, and the demand is decoupled from the processing.
The direct coupling method would have had failures if more than 1,000 objects were uploaded at once and generated events to invoke the first Lambda. This way, Lambda 1 can set its concurrency to be five and use the batch size to only take X amount of records from the queue and thus control maximum throughput.
Going beyond a single microservice, when events are passed between them, both need to understand and agree upon the structure of the data. Most of the time both microservices can’t be updated at the exact same time, so be sure to version all your events.
This way you can change all the microservices that listen for event version 1 and add the code to handle version 2. Then update the emitting microservice to emit version 2 instead of 1, always with backward compatibility in mind.
Batching is particularly useful in high transaction environments. SQS and Kinesis streams are some services that offer batching messages, sending the batch to the Lambda function instead of each and every message separately.
By batching the values in groups of around 10 messages instead of one, you might reduce your AWS Lambda bill by 10 and see an increase in system performance throughput. One of the downsides to batching is that it makes error handling complex. For example, one message might throw an error and the other nine are processed successfully.
Then Lambda needs to either manually put that one failure on the DLQ, or return an error so that the external error handling mechanisms, like the Lambda DLQ, do their jobs. It could be that case that a whole batch of messages need to be reprocessed; here, being idempotent is again the key to success.
If you’re taking things to the next level, sometimes batching is not enough. Consider a use case where you’ve got a CSV file with millions of records that need to be inserted into DynamoDB. The file is too big to load into the Lambda memory, so instead, you stream it within your Lambda from S3.
The Lambda can then put the data on an SQS queue and another Lambda that can take the rows in batches of 10 and write them to DynamoDB using the batch interface.
This sounds okay, right? The thing is, a much higher throughput and lower cost can actually be achieved if the Lambda function that streams the data writes to DynamoDB in parallel. Start building groups of batch write API calls, where each can hold a maximum of 25 records.
These can then be started and limited to roughly 40 parallel/concurrent batch writes, without much tuning, you will be able to reach 2k writes per second.
Modern problems require modern solutions. Since traditional tools won’t work for a Lambda and serverless environment, it’s difficult to find visibility and to monitor the system.
There are many tools out there to help with this including internal AWS Services like X-Ray, CloudWatch Logs, CloudWatch Alarms, and CloudWatch Insights. You could also turn to third-party tools like Epsagon, IOPipe, Lumigo, Thunderbird, and Datadog to just name a few. All these tools deliver valuable insights in the form of logs and charts to help evaluate and monitor the serverless environment. One of the best things you can do is to get visibility early on and fine-tune your Lambda architecture.
Finding the root cause of a problem and tracing the event as it goes through the whole system can be extremely valuable.
Within a Lambda your code can do parallel work. If you’re receiving a batch of 10 messages, instead of doing 10 downstream API calls synchronously, do them in parallel. It reduces the run time as well as the cost of the function. This should always be considered first before moving on to the more complex fan out-fan in.
Lambdas doing work in parallel will usually benefit from an increase in memory.
All functions must be idempotent, and do consider making them stateless. One example we can look at is when functions need to keep some small amount of session data for API calls. Let the caller send the SessionData with the SessionID and make sure both these fields are encrypted. Then, decrypt their values and use it in the Lambda, this can spare you from carrying out repeated external calls or using a cache.
You might not need an external cache; a small amount of data can be stored above the Lambda function handler method in memory. This data will remain for the duration of the Lambda container.
Alternately, each Lambda is allowed 500 MB of file storage in /tmp, and data can be cached here as a file system call will always be faster than a network call.
Keep data inside the same region, to avoid paying for data to be transferred beyond that region.
Only put your Lambda inside a VPC if it needs to access private services or needs to go through a NAT for downstream services that whitelist IPs.
Remember NAT data transfer costs money, and services like S3 and DynamoDB are publicly available. All data that flows to your Lambdas inside the VPC will need to go through the NAT.
Consider using S3 and DynamoDB VPC Gateway Endpoints—they’re free and you only pay for Interface Endpoints.
Batching messages can increase throughput and reduce Lambda invocations, but this also increases complexity of error handling.
Big files can be streamed from S3.
Step functions are expensive, so try and avoid them for high throughput systems.
If you have a monorepo where all the Lambda functions live for that microservice, consider creating a directory that gets symlinked into each of those Lambda directories. Shared code only needs to be managed in one place; alternately, you could look into putting shared code into a Lambda layer.
A lot of AWS Services make use of encryption to protect data, so consider using that instead of encrypting on an application level.
The less code you write the less technical debt you build up.
Where possible use the AWS Services as they are intended to be used, or you could end up needing to build tooling around your specific use case and manipulate those services. This once again adds to your technical debt. AWS Lambda scales quickly and integration with other systems—like a MySQL DB with a limited amount of open connections—can quickly become a problem. There is no silver bullet for integrating a service that scales and one that doesn’t. The best thing that can be done is to either limit the scaling or implement queuing mechanisms between the two.
Environment variables should not hold security credentials, so try using AWS SSM‘s Parameter Store instead. It’s free and is great for most use cases; when you want higher concurrency, consider using Secret Manger, it also supports secret rotation for RDS but it comes at higher cost than Parameter Store.
Consider using Infrastructure as Code (IaC) if you aren’t.
API Gateway has the ability to proxy the HTTP request to other AWS Services, eliminating the need for an intermediate Lambda function. Using Velocity Mapping Templates you can change the normal POST parameters into the arguments that DynamoDB requires for the PUT action. This is great for simple logic where the Lambda would have just transformed the request before doing the DynamoDB command.
Always try to make Lambda functions stateless and idempotent, regardless of the invocation type and model. Lambdas aren’t designed to work on a single big task, so break it down to smaller tasks and process it in parallel. After that, the single best thing to do would be to measure three times and cut once; do a lot of upfront planning, experiments, and research and you’ll soon develop a good intuition for serverless design.
Find great AWS jobs in your sleep
Upload your resume and let our expert consultants find the perfect role for you.