Use Concurrency on AWS Lambda to Maximize Performance

September 17, 2024

Cloud Native App Dev

Learn the differences between multithreading and multiprocessing, and the right configuration for different types of applications. This comprehensive guide uses a relatable post office analogy to explain complex concepts, helping you optimize your Lambda functions for better performance and cost-efficiency.

In this article, we will be discussing the benefits of running application code concurrently within AWS Lambda functions.

Understanding Concurrency: Multithreading vs Multiprocessing

Concurrency is the ability to run multiple instances of the same program in parallel or seemingly in parallel. Concurrency can be achieved using a variety of methods, and most commonly it is done through multithreading, multiprocessing, or asynchronous programming. The first two methods refer to the manipulation of the computer’s CPU while asynchronous programming is more of a programming paradigm.

In this article, we will focus on multithreading and multiprocessing. To understand these two concepts, we will walk through the scenario of sending five letters at the post office.

Single-thread

In a single-threaded model, a post office worker will send one letter, wait for the return letter to arrive from the recipient, and then proceed to send the next letter. The worker waits for each letter to return before being able to send the next one. If you think that’s inefficient, you’re right, and of course post offices are not designed that way.

Multithreading

In a multithreaded model, the worker is able to send the first, then the next, until all five letters are sent. Since the worker can send each letter one after the other in sequence it is not blocked from sending new letters while waiting on the recipient.

As you can see, in the case of multithreaded concurrency, the worker does not need to wait for a response before sending the other mail.

Single-process

Now that the letters have been sent to be delivered, we turn our focus to the delivery process where a worker will deliver the letters.

In a single-process model, imagine a single delivery worker delivering each of the five letters one by one. The second letter cannot be delivered until the first is completed. Also, any new letter that gets added to the process will require more time for the additional delivery. Just like with the single-threaded scenario, this process is not efficient at all. If this were the actual delivery process, the post office would have a mailing crisis.

Multiprocessing

In a real mail delivery scenario, there’s not just one delivery worker delivering mail. The post office has multiple delivery workers that quickly and efficiently spread out the delivery process to work in parallel. Each worker can deliver a separate letter and that process is independent of the status of other workers. Adding a new delivery worker will speed up the process, and adding a new letter to one worker does not slow down the other workers.

Comparing the scenarios above, in the multithreaded example, the post office worker is more efficient than in the single-threaded model.

In the case of multiprocessing, we see that there is true concurrency as the delivery workers are sent out in parallel to deliver the mail. Each new delivery worker that is added to the process results in more letters being delivered with no significant time increase.

Default Concurrency in AWS Lambda

As a starting point it should be noted that Lambda already offers concurrency and can support a multiprocessing model. The best-case scenario for achieving concurrency from the post office example is exactly how Lambda operates. This is done in the context that multiple requests for a Lambda function can execute multiple instances of that single function in parallel. For example, as shown in the image below, if there are multiple requests for the Lambda function, AWS will automatically provision new environments to execute the required number of concurrent Lambda functions. As you may have guessed, this is especially useful for cases when the application receives a lot of traffic.

Expanding on our post office example, we can describe how concurrency is handled in Lambda functions - it is like adding a new post office as the city population grows larger. Each new post office can handle and process more letters and can do it independently of the other post offices. The new post offices are analogous to a new execution of the Lambda function, and within each function instance we can use multithreading (one worker sending letters without being blocked) or multiprocessing (multiple delivery workers delivering mail) to solve our concurrency problems.

In the diagram above, each new Lambda function represents a new instance of a single function that is being run multiple times in parallel. It shows the ability to scale out to do more work without requiring more time. This horizontal scaling has massive elasticity but it does reach an eventual limit. It is limited to 1,000 concurrent executions within an AWS account, but this limit can be increased by requesting a quota increase to AWS.

For more information on the horizontal scalability of AWS Lambda functions, the developer guide documentation provides excellent guidance.

Though it’s great that AWS can execute multiple instances of your Lambda function, it is important to remember that this will have an impact on your bill. Since Lambda is billed (mostly) based on the total execution time and the memory allocated to your function, any cost-conscious person will seek to improve their Lambda function efficiency as much as possible.

To achieve concurrency within the AWS Lambda environment, we have to turn to the concurrency options provided by the Lambda runtime. With Python, you can choose multithreading or multiprocessing based on the context of your application.

Code Examples to Explore Concurrency

With AWS Lambda, each environment is provided with vCPU based on the amount of memory allocated. There is no clear relationship between memory and vCPU, but based on their documentation:

At 1,769 MB, a function has the equivalent of one vCPU (one vCPU-second of credits per second).

For our examples, we will use Python 3.11 with 1,024 MB of memory allocated unless otherwise specified. We have two separate mock functions: one for I/O operations and one for CPU-intensive operations. For the I/O operation, the function sleeps for ten seconds. For the CPU-intensive operation, the function counts down from the INITIAL_COUNT variable, which is set to 200 million, until it hits zero.

import threading
import os
import time
from multiprocessing import Process, current_process

NUM_TIMES_TO_RUN = 3
INITIAL_COUNT = 200000000

def mock_io_function(index):
    start = time.time()
         print(f"Thread {index} executed.")
    time.sleep(10)
    print(f"Thread {index} complete in: {time.time() - start}.")

def mock_cpu_intensive_function(initial, index):
    start = time.time()
    pid = os.getpid()
    processName = current_process().name
    
    print(f"{pid} * {processName} ---> Process {index} executed.")
    while (initial > 0):
        initial -= 1
    print(f"{pid} * {processName} ---> Process {index} finished in: {time.time() - start}")

def single_thread_single_process_io_example():
    start_time = time.time()
    
    for i in range(NUM_TIMES_TO_RUN):
        mock_io_function(i)
    
    end_time = time.time()
    print(f"It took {end_time - start_time} seconds to finish running the single thread, single process IO example.")

def multithread_io_example():
    start_time = time.time()
    
    threads = []
    for i in range(NUM_TIMES_TO_RUN):
        thread = threading.Thread(target=mock_io_function, args=(i,))
        threads.append(thread)
        thread.start()
    
    # Wait for all threads to complete
    for thread in threads:
        thread.join()
    
    end_time = time.time()
    print(f"It took {end_time - start_time} seconds to finish running the multithreaded I/O example.")

def single_thread_single_process_cpu_example():
    start_time = time.time()
    
    for i in range(NUM_TIMES_TO_RUN):
        mock_cpu_intensive_function(INITIAL_COUNT, i)
    
    end_time = time.time()
    print(f"It took {end_time - start_time} seconds to finish running the single thread, single process CPU intensive function.")

def multithread_cpu_example():
    start_time = time.time()
    
    threads = []
    for i in range(NUM_TIMES_TO_RUN):
        thread = threading.Thread(target=mock_cpu_intensive_function, args=(INITIAL_COUNT, i,))
        threads.append(thread)
        thread.start()
    
    # Wait for all threads to complete
    for thread in threads:
        thread.join()
    
    end_time = time.time()
    print(f"It took {end_time - start_time} seconds to finish running the multithreaded CPU intensive function.")

def multiprocess_cpu_example():
    start_time = time.time()
    
    processes = []
    for i in range(NUM_TIMES_TO_RUN):
        process = Process(target=mock_cpu_intensive_function, args=(INITIAL_COUNT, i, ))
        processes.append(process)

    for process in processes:
        process.start()

    # Wait for all threads to complete
    for process in processes:
        process.join()
    
    end_time = time.time()
    print(f"It took {end_time - start_time} seconds to finish running the mock CPU intensive function.")

def lambda_handler(event, context):
    event_type = event.get("type")
    if event_type == "single_thread_single_process_io":
        single_thread_single_process_io_example()
    elif event_type == "multi_thread_io":
        multithread_io_example()
    elif event_type == "single_thread_single_process_cpu":
        single_thread_single_process_cpu_example()
    elif event_type == "multi_thread_cpu":
        multithread_cpu_example()
    elif event_type == "multi_process_cpu":
        multiprocess_cpu_example()

We will specify which test to run using the Lambda console test Event JSON, for example:

{
  "type": "multi_thread_io"
}

Example: Single-thread, single-process with I/O-intensive task

{
  "type": "single_thread_single_process_io"
}

Since each run of the mock I/O function takes ten seconds to complete, we can expect that the single-thread, single-process function takes thirty seconds to complete if we run it three times (NUM_TIMES_TO_RUN variable). From the output of the function, we can see that’s exactly what happened:

Thread 0 executed.
Thread 0 complete in: 10.00019884109497.
Thread 1 executed.
Thread 1 complete in: 10.000157356262207.
Thread 2 executed.
Thread 2 complete in: 10.00011134147644.
It took 30.00074791908264 seconds to finish running the single thread, single process IO example.

Example: Multithreading with I/O-intensive task

{
  "type": "multi_thread_io"
}

In the case of multiple threads, each thread executes the target function but rather than waiting for the target function to finish, it spawns the next thread in parallel. This reduces the completion time of running the mock I/O function three times to just ten seconds:

Thread 0 executed.
Thread 1 executed.
Thread 2 executed.
Thread 0 complete in: 10.000297546386719.
Thread 1 complete in: 10.000250101089478.
Thread 2 complete in: 10.000178813934326.
It took 10.000999927520752 seconds to finish running the multithreaded I/O example.

As you can see, multithreading is a viable solution to handling multiple I/O requests, as the Lambda environment does not have to wait idly for the I/O request to be complete. It should be noted that thorough testing should be done to configure the appropriate number of threads based on your application context. Having too few threads leads to a longer overall processing time, and too many threads leads to idle resources and inefficiency.

Example: Single-thread, single-process with CPU-intensive task

{
  "type": "single_thread_single_process_cpu"
}

When we run the mock CPU-intensive function three times sequentially, it takes around 62 seconds to complete:

8 * MainProcess ---> Process 0 executed.
8 * MainProcess ---> Process 0 finished in: 20.814173460006714
8 * MainProcess ---> Process 1 executed.
8 * MainProcess ---> Process 1 finished in: 20.759973526000977
8 * MainProcess ---> Process 2 executed.
8 * MainProcess ---> Process 2 finished in: 20.822417497634888
It took 62.3967809677124 seconds to finish running the single thread, single process CPU intensive function.

Example: Multithreading with CPU-intensive task

{
  "type": "multi_thread_cpu"
}

What should be expected for multiple threads when running a CPU-intensive task? Well, let's see:

8 * MainProcess ---> Process 0 executed.
8 * MainProcess ---> Process 1 executed.
8 * MainProcess ---> Process 2 executed.
8 * MainProcess ---> Process 1 finished in: 58.669029712677
8 * MainProcess ---> Process 0 finished in: 58.99575591087341
8 * MainProcess ---> Process 2 finished in: 59.18151044845581
It took 59.27164316177368 seconds to finish running the multithreaded CPU intensive function.

Well, that didn’t help much, did it? This is because Python has a Global Interpreter Lock (GIL). Unless a thread is idly waiting, multiple threads cannot be executed in parallel. Since we are actively counting down, each thread is executed, but it is not complete until all three runs are done.

Example: Multiprocessing with CPU-intensive task

{
  "type": "multi_process_cpu"
}

This is where multiprocessing comes in handy. Instead of spawning separate threads, we create multiple parallel processes to run these functions. You may be wondering, why can’t we do this for I/O tasks? The answer to that is you can, but it isn’t recommended when multithreading is a viable option. You are making multiple processes wait idly, and processes are costly.

Now, if we incorporate multiprocessing for our CPU-intensive task, it should be faster. But, we run into a weird situation where it isn’t more efficient:

10 * Process-1 ---> Process 0 executed.
11 * Process-2 ---> Process 1 executed.
12 * Process-3 ---> Process 2 executed.
10 * Process-1 ---> Process 0 finished in: 49.40802764892578
11 * Process-2 ---> Process 1 finished in: 53.059587240219116
12 * Process-3 ---> Process 2 finished in: 58.38159775733948
It took 58.39720845222473 seconds to finish running the mock CPU intensive function.

It takes around 58 seconds to finish, which is hardly an improvement! This is likely because the Lambda is currently only associated with 1 vCPU at 1,796 MB of memory, and multiprocessing on 1 vCPU can be highly inefficient for CPU-heavy tasks.

We can see the improvement from more vCPU by increasing the memory, to say 4,096 MB.

If we run the single-thread, single-process CPU-intensive task with 4,096 MB of memory, it speeds up the completion time:

9 * MainProcess ---> Process 0 executed.
9 * MainProcess ---> Process 0 finished in: 10.742210865020752
9 * MainProcess ---> Process 1 executed.
9 * MainProcess ---> Process 1 finished in: 10.742223024368286
9 * MainProcess ---> Process 2 executed.
9 * MainProcess ---> Process 2 finished in: 10.740402460098267
It took 32.22504734992981 seconds to finish running the single thread, single process CPU intensive function.

And for multiprocessing, we see an even better improvement:

10 * Process-1 ---> Process 0 executed.
11 * Process-2 ---> Process 1 executed.
12 * Process-3 ---> Process 2 executed.
10 * Process-1 ---> Process 0 finished in: 15.486387968063354
11 * Process-2 ---> Process 1 finished in: 15.498388051986694
12 * Process-3 ---> Process 2 finished in: 15.6626296043396
It took 15.672866344451904 seconds to finish running the mock CPU intensive function.

With increased memory and consequently increased vCPU, we can see that it took less than half the time to process.

Cost Considerations

The grid below shows a cost breakdown based on the run-time duration of the function. For each invocation of the function, the cost breakdown is:

We can see that for both multithreading and multiprocessing, they reduce the per invocation cost for their respective I/O and CPU-intensive tasks. We see that not only do these concurrency models run their workloads faster, they also reduce costs.

Conclusion

Concurrency is a powerful tool to make your applications more efficient. The three different types of concurrency discussed in this post have specific use cases where they are beneficial. If the application is I/O intensive, a multithreaded approach is preferred so that the thread isn’t blocked by the I/O request. If the application is CPU intensive, a multiprocessing approach is preferred so that each task can be executed in parallel. And if the application is traffic intensive, the application will horizontally scale by default so that multiple instances of the function can be running in parallel.

If you have questions about implementing the right concurrency model for your applications on AWS, Caylent can help. We can work with you and your team to build cost-conscious and efficient solutions to deliver value for your business and your customers.

Cloud Native App Dev

Kevin Nha

Kevin is a Sr. Software Engineer in the Cloud Native Applications practice at Caylent. He has built many solutions using TypeScript, Python, and Java, and has in-depth experience with building serverless applications on AWS. Having previously worked at Amazon, Kevin has an in-depth understanding of AWS technologies and closely works within the Leadership Principles. He enjoys building and rebuilding applications in the AWS ecosystem and helping clients build cloud-native applications.

View Kevin's articles

Accelerate your cloud native journey

Leveraging our deep experience and patterns

Get in touch

Automated Testing with Jest on AWS

Learn how to automate testing and safeguard your JavaScript apps using Jest with AWS CodeBuild and CodePipeline.

Cloud Native App Dev

Infrastructure & DevOps Modernization

AWS Announcements

July 20, 2023

AWS Lambda Performance Boost with SnapStart

Supercharge AWS Lambda cold start times by up to 90% by leveraging AWS Lambda SnapStart and Firecracker, helping you minimize latency without any additional caching costs.

AWS Announcements

Cloud Native App Dev

March 9, 2022

Caylent Catalysts

Learn how we develop and implement Caylent Catalysts - a set of accelerators designed to fuel your AWS cloud adoption initiatives.

Migrations

Cloud Native App Dev

Security

View all blog posts

Understanding Concurrency: Multithreading vs Multiprocessing

Single-thread

Multithreading

Single-process

Multiprocessing

Default Concurrency in AWS Lambda

Code Examples to Explore Concurrency

Example: Single-thread, single-process with I/O-intensive task

Example: Multithreading with I/O-intensive task

Example: Single-thread, single-process with CPU-intensive task

Example: Multithreading with CPU-intensive task

Example: Multiprocessing with CPU-intensive task

Cost Considerations

Conclusion

Kevin Nha

Accelerate your cloud native journey

Related Blog Posts

Automated Testing with Jest on AWS

AWS Lambda Performance Boost with SnapStart

Caylent Catalysts