AWS Lambda: Concurrency

AWS Lambda concurrency is crucial for application performance and stability. Many teams first notice it when it is too late, in the middle of a production incident.

For example, you may skip concurrency configuration entirely and suddenly find that requests and processing stop working because AWS throttles new Lambda executions. Or you may configure some concurrency and still get throttled. Finally, you may restrict concurrency so tightly that the function cannot keep up with the request throughput, and requests wait in a queue for a while.

Sounds familiar? Let's walk through the various Lambda concurrency configurations so you can feel confident about the topic.

AWS Lambda: concurrency types

Here are the concurrency configurations available for Lambda functions in AWS:

Unreserved concurrency
Reserved concurrency
Provisioned concurrency
For SQS events - maximum concurrency
For Kinesis events - the parallelization factor

AWS Lambda: unreserved concurrency

Every AWS account has a Lambda concurrency pool in each Region. Its size is 1000 by default, and you can ask AWS Support to increase the limit. Whenever a Lambda execution starts, it takes one slot from the pool. Once it finishes, it returns the slot to the pool. As a result, you can have up to 1000 concurrent Lambda executions at a time. If you try to start a 1001st execution while 1000 are already running, the new execution is throttled. So, if the pool is empty, a new execution can start only after another one completes.
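The slot accounting described above can be sketched as a toy model. This is only an illustration of the take-a-slot, return-a-slot behaviour, not how AWS implements it; the class and method names are hypothetical.

```python
class ConcurrencyPool:
    """Toy model of the account-level Lambda concurrency pool (illustrative only)."""

    def __init__(self, limit: int = 1000) -> None:
        self.limit = limit      # pool size, 1000 by default per the text above
        self.in_use = 0         # slots currently taken by running executions

    def try_start(self) -> bool:
        """Take one slot; return False (throttled) when the pool is empty."""
        if self.in_use >= self.limit:
            return False
        self.in_use += 1
        return True

    def finish(self) -> None:
        """Return one slot to the pool when an execution completes."""
        if self.in_use > 0:
            self.in_use -= 1


pool = ConcurrencyPool(limit=1000)
started = [pool.try_start() for _ in range(1000)]  # fill the pool
throttled_attempt = pool.try_start()               # the 1001st execution
pool.finish()                                      # one execution completes
retry_attempt = pool.try_start()                   # a slot is free again
```

In this model the 1001st attempt returns False, and the retry succeeds only after one running execution has finished.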

AWS Lambda: reserved concurrency

You can configure a Lambda function with reserved concurrency. With reserved concurrency, the function gets its own piece of the pool. For example, if you configure a function with a reserved concurrency of 5, the unreserved pool shrinks by 5 and becomes a pool of 995. Reserved concurrency is also a cap: the function can run fewer executions than the reserved number at a time, but never more. So if the function has a reserved concurrency of 5 and 5 executions are already running, a new execution cannot start and is throttled. The advantage of this approach is that even if the unreserved pool is empty, the function can still run as long as it has reserved capacity left.
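The arithmetic above can be checked with a small model. It is a sketch under simplifying assumptions: the function names "orders" and "reports" are hypothetical, and the model ignores everything except the reserved and unreserved slot counts.

```python
from typing import Dict

ACCOUNT_LIMIT = 1000  # default pool size mentioned in the text


def unreserved_pool_size(reservations: Dict[str, int],
                         account_limit: int = ACCOUNT_LIMIT) -> int:
    """Slots left for functions that have no reserved concurrency."""
    return account_limit - sum(reservations.values())


def can_start(function_name: str, running: int, unreserved_in_use: int,
              reservations: Dict[str, int],
              account_limit: int = ACCOUNT_LIMIT) -> bool:
    """Return True if one more execution of the function may start."""
    reserved = reservations.get(function_name)
    if reserved is not None:
        # Reserved concurrency is both a guarantee and a cap.
        return running < reserved
    # Functions without a reservation share what is left of the pool.
    return unreserved_in_use < unreserved_pool_size(reservations, account_limit)


# Hypothetical setup: "orders" reserves 5 slots, "reports" reserves nothing.
reservations = {"orders": 5}
remaining = unreserved_pool_size(reservations)
```

With this setup, remaining is 995, a sixth concurrent "orders" execution is refused, and "orders" can still start when the unreserved pool is fully used by other functions.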

AWS Lambda: provisioned concurrency

With this concurrency configuration, you can keep several instances in a warm, pre-initialized state (read more about AWS Lambda instances and start types). As a result, some number of Lambda instances are always ready to handle incoming requests without the additional init time (cold start time). Once you have more concurrent executions than the provisioned concurrency you configured, the extra requests are handled by new instances that go through a cold start and take slots from the unreserved pool.

AWS charges for provisioned instances for as long as they are configured, much like a running VM, so be mindful of the cost. Lambda provisioned concurrency does not come for free.

Provisioned concurrency cannot be configured on the unpublished $LATEST version. You configure it on a published version or an alias and then invoke the function through that version or alias. Publishing versions and creating aliases is outside the scope of this post.
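As a rough illustration of the overflow behaviour, the helper below splits a number of concurrent requests into those served by provisioned (warm) instances and those that spill over to on-demand instances. The function name is hypothetical, and the model assumes every provisioned instance is idle and ready.

```python
from typing import Tuple


def split_warm_and_overflow(concurrent_requests: int,
                            provisioned: int) -> Tuple[int, int]:
    """Return (served by provisioned instances, spilled over to on-demand)."""
    if concurrent_requests < 0 or provisioned < 0:
        raise ValueError("counts must be non-negative")
    warm = min(concurrent_requests, provisioned)
    overflow = concurrent_requests - warm  # these may hit a cold start
    return warm, overflow


burst = split_warm_and_overflow(concurrent_requests=7, provisioned=5)
quiet = split_warm_and_overflow(concurrent_requests=3, provisioned=5)
```

Here burst is (5, 2): five requests land on warm instances and two overflow. In the quiet case all three requests are served warm, and two provisioned instances sit idle but are still billed.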
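For orientation, here is a minimal sketch of the request parameters for the Lambda PutProvisionedConcurrencyConfig API, as exposed by boto3's put_provisioned_concurrency_config. The helper only builds the arguments so it can run without AWS credentials; the function name "orders" and the alias "live" are hypothetical.

```python
from typing import Any, Dict


def provisioned_concurrency_request(function_name: str, qualifier: str,
                                    instances: int) -> Dict[str, Any]:
    """Build keyword arguments for put_provisioned_concurrency_config."""
    if qualifier == "$LATEST":
        # Provisioned concurrency needs an alias or a published version.
        raise ValueError("use an alias or a published version, not $LATEST")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,  # alias name or published version number
        "ProvisionedConcurrentExecutions": instances,
    }


request = provisioned_concurrency_request("orders", "live", 5)

# With boto3 installed and credentials configured, the call would be:
#   import boto3
#   boto3.client("lambda").put_provisioned_concurrency_config(**request)
```

Callers then need to invoke the function through the same alias to land on the provisioned instances.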

AWS Lambda: SQS trigger max concurrency

When a Lambda function has an SQS trigger configured, AWS runs a polling process in the background. It constantly reads from the queue, and when new messages arrive, it starts new Lambda executions according to the trigger configuration (batch size, batching window).

It is possible to configure how many lambda executions can be started by this polling process.

For example, you set max concurrency to 5 with no batching (a batch size of 1). When 7 messages arrive, the polling process starts 5 executions. The remaining two messages wait in the queue until one of the executions completes.
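The 7-messages, 5-executions example can be sketched as follows. This is a toy model of the cap only, assuming a batch size of 1; the function name and message ids are hypothetical, and the real poller is more involved.

```python
from collections import deque
from typing import List, Tuple


def dispatch(messages: List[str],
             max_concurrency: int) -> Tuple[List[str], List[str]]:
    """Start up to max_concurrency executions; the rest stay in the queue."""
    waiting = deque(messages)
    in_flight: List[str] = []
    while waiting and len(in_flight) < max_concurrency:
        in_flight.append(waiting.popleft())  # one execution per message
    return in_flight, list(waiting)


messages = [f"msg-{i}" for i in range(1, 8)]  # 7 messages
in_flight, waiting = dispatch(messages, max_concurrency=5)
```

After the call, in_flight holds msg-1 through msg-5 and waiting holds msg-6 and msg-7, which are picked up as soon as running executions complete.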

The executions here are taken from the unreserved concurrency pool. So, if the unreserved concurrency pool is empty, new executions will be throttled.

Another thing to note is that SQS trigger invocation is synchronous (read more about AWS Lambda Invocation Types).

Combine SQS trigger max concurrency with reserved concurrency

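You can use reserved concurrency and SQS trigger max concurrency together. Reserved concurrency guarantees slots for the function, so it can always start executions for critical processing tasks even when the unreserved pool is empty. Set the reserved concurrency to at least the max concurrency value; using the same number for both is the simplest choice. If the reserved concurrency is lower, the polling process can try to start more executions than the function is allowed to run, and the extra ones are throttled.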
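A minimal sketch of how the two settings could be kept consistent. It builds the arguments for boto3's create_event_source_mapping (ScalingConfig.MaximumConcurrency) and put_function_concurrency (ReservedConcurrentExecutions) without calling AWS. The function name, the queue ARN, and the rule that rejects a reservation lower than the max concurrency are assumptions of this sketch.

```python
from typing import Any, Dict, Tuple


def sqs_concurrency_requests(function_name: str, queue_arn: str,
                             max_concurrency: int,
                             reserved: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (event source mapping kwargs, reserved concurrency kwargs)."""
    if not 2 <= max_concurrency <= 1000:
        # The SQS maximum concurrency setting accepts values from 2 to 1000.
        raise ValueError("max_concurrency must be between 2 and 1000")
    if reserved < max_concurrency:
        # Sketch-level rule: avoid throttling caused by a too-small reservation.
        raise ValueError("reserved concurrency is lower than max concurrency")
    mapping = {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": 1,
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }
    reservation = {
        "FunctionName": function_name,
        "ReservedConcurrentExecutions": reserved,
    }
    return mapping, reservation


mapping, reservation = sqs_concurrency_requests(
    "orders", "arn:aws:sqs:us-east-1:123456789012:orders-queue",
    max_concurrency=5, reserved=5)

# With boto3 installed and credentials configured:
#   client = boto3.client("lambda")
#   client.put_function_concurrency(**reservation)
#   client.create_event_source_mapping(**mapping)
```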

AWS Lambda: Kinesis events parallelization factor

The parallelization factor for Amazon Kinesis lets you speed up stream processing by specifying how many batches from a single shard Lambda processes concurrently. This helps you process the stream faster without over-scaling the number of shards. With a parallelization factor above 1, multiple Lambda invocations process data from the same Kinesis shard at the same time, which increases throughput. The default factor is 1, and it can be increased up to 10. Record order is still preserved for records that share a partition key, though not across the whole shard.
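A small sketch of what the factor means in numbers. The first helper computes the upper bound on concurrent invocations for a stream. The second is a toy illustration of why per-key order can survive parallel processing: records with the same partition key always map to the same concurrent batch. The hashing scheme is an assumption for illustration, not the algorithm Lambda uses, and all names are hypothetical.

```python
import hashlib


def max_concurrent_invocations(shards: int, parallelization_factor: int = 1) -> int:
    """Upper bound on concurrent invocations for one Kinesis stream."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return shards * parallelization_factor


def batch_index(partition_key: str, parallelization_factor: int) -> int:
    """Toy routing: the same partition key always lands in the same batch."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % parallelization_factor


default_limit = max_concurrent_invocations(shards=4)                            # factor 1
boosted_limit = max_concurrent_invocations(shards=4, parallelization_factor=10)
```

For a 4-shard stream, default_limit is 4 and boosted_limit is 40. Those invocations still take slots from the concurrency pool like any other execution.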