Author:

Fernando Sigüenza

Published on:

November 25, 2024


Creating a Serverless AI Bot


Introduction

As technology continues to advance, organisations and developers are constantly seeking efficient, scalable, and cost-effective solutions for their applications.

In this article, we’ll explore how to create a powerful, serverless AI bot by leveraging key Amazon Web Services (AWS) technologies like Lambda, Bedrock, and DynamoDB. 

Implementing this solution with a serverless architecture is often more cost-effective than traditional server-based services. Furthermore, serverless can automatically scale to handle increased load without manual intervention, ensuring the AI bot can accommodate growing demand or traffic spikes without performance degradation.

This bot can significantly enhance daily work efficiency and reduce costs by automating routine tasks, providing instant responses, and scaling seamlessly to meet demand.


Understanding Serverless Architecture

Serverless architecture is a cloud computing execution model where the cloud provider dynamically manages the allocation and provisioning of servers, allowing developers to focus on writing code without worrying about server management.

One of the key advantages of serverless architecture is the pay-as-you-go pricing model: you’re charged only for the compute resources you actually use, and there’s no concept of idle time; you aren’t charged when your code isn’t running.

Let’s explore the AWS services we will be using in this article:

AWS Lambda allows you to execute code without provisioning or managing servers. It supports multiple programming languages, making it an ideal solution for building scalable, event-driven applications. You can focus on writing code while AWS handles all the underlying infrastructure. The service automatically scales your application by running code in response to each trigger, ensuring optimal performance during peak times and cost-effectiveness during quieter periods.

Amazon DynamoDB is a serverless NoSQL database service that offers fast and predictable performance with seamless scalability. With its flexible data model and reliable performance, DynamoDB can handle a wide variety of use cases, from simple key-value storage to complex data structures for large-scale web applications. 

Amazon API Gateway is a fully managed service that enables you to create, publish, maintain, monitor, and secure APIs at any scale. It acts as a ‘front door’ for applications to access data, business logic, or functionality from your backend services. API Gateway allows you to create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. With its integration capabilities, you can connect API Gateway to various AWS services like Lambda, EC2, or any web application.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies, including Amazon, Anthropic, AI21 Labs, and Stability AI. It enables developers to build and scale generative AI applications quickly and securely, without the need for extensive machine learning expertise. The service provides tools for customising these models with your data, allowing you to create tailored AI solutions for your specific needs.


Hands-On: Creating the Bot

This is a high-level architecture of what we will be building:


Now, let’s build the AI bot step by step:


Step 1: Creating the Lambda function that will call Bedrock

  1. The Lambda code

For this example, we will use Claude as the model, invoked through the Bedrock Runtime client.

For the sake of the example, we will just go with some standard configurations for tokens and temperature, but you could consider passing these values in the event payload or via environment variables.

The main idea here is that our Lambda function will do a simple passthrough from the user prompt to the Bedrock model and return the response.

Before diving into the code (a sketch follows this list), here are some key points about Lambda to keep in mind:

  • Events: Lambda functions are triggered by events from other services, such as HTTP requests, database changes, file uploads, or scheduled tasks. The event data is passed to the function as a parameter; in our case, the trigger will be API Gateway.
  • Handler: The entry point of the Lambda function. In Python, it’s typically named `lambda_handler(event, context)`.
  • Event parameter: This parameter contains the function’s input data. In our case, it will include the user’s prompt and ID from the API Gateway.
  • Context parameter: Provides runtime information about the function’s execution environment.
  • Environment Variables: Lambda allows you to set environment variables for your function.
  • Return value: The function’s output, which can be returned to the caller or used by other AWS services.
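
With that in mind, here is a minimal sketch of the handler. The model ID, token limit, and temperature are illustrative assumptions, not settings prescribed by this article:

```python
import json
import boto3

# Bedrock Runtime client used to invoke the foundation model
bedrock = boto3.client("bedrock-runtime")

# Assumed model ID; use whichever Claude version is enabled in your account
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def lambda_handler(event, context):
    # The user's prompt arrives in the event (via API Gateway in later steps)
    prompt = event.get("prompt", "")

    # Standard token/temperature settings for this example; consider moving
    # them to environment variables or the event payload
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.5,
        "messages": [{"role": "user", "content": prompt}],
    })

    # Simple passthrough: send the prompt to Claude and return the reply
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    result = json.loads(response["body"].read())
    return {"response": result["content"][0]["text"]}
```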

  2. Configuring the permissions so you can call Bedrock

For the Lambda function to call Amazon Bedrock (or any other AWS service), we need to add a permission to its execution role policy that allows the specific action, as sketched below. Remember the principle of least privilege: avoid adding permissions you don’t need.
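
A minimal policy statement might look like the following; the resource ARN pattern is an assumption and should be narrowed to the exact model you use:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-*"
    }
  ]
}
```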


Step 2: Storing the context in DynamoDB

When interacting with bots, you will usually need to revisit previous messages or ask things like “Can you expand on those concepts?” or “As I asked before…”, which requires maintaining the context of the conversation. This is difficult to achieve because each Lambda invocation is a completely separate execution from the previous one, so we need an additional resource to store the conversation context; we can’t keep it in memory.

We have several options for storage, each with its own benefits and use cases:

  • In-memory Cache (e.g., Redis or ElastiCache): Best for maintaining context during active, ongoing conversations

   – Pros: Extremely fast access, ideal for live conversations

   – Cons: Data is volatile, potential data loss on service restarts

  • DynamoDB: Ideal for preserving context between separate sessions (hours or days apart)

   – Pros: Persistent storage, scalable, good for long-term data retention

   – Cons: Slightly higher latency compared to in-memory solutions

A combined approach offers the best of both worlds: the speed of in-memory storage for active conversations and long-term persistence with DynamoDB. This gives optimal performance for live interactions while safeguarding against data loss, making it ideal for systems that need both real-time responsiveness and historical context preservation.

We’ll implement a combined approach, using Redis for in-memory caching and DynamoDB for long-term persistence.


First, set up a Redis cluster using Amazon ElastiCache:


Name: UserContextCache

Port: 6379 (default)

Feel free to adjust the other parameters, like node type and number of replicas, depending on your system’s performance needs.

In our Lambda function, we will initialize the Redis client like this:
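
(A sketch; this assumes the redis-py package is bundled with the deployment and that the cluster endpoint is supplied via an assumed REDIS_HOST environment variable.)

```python
import os
import redis  # redis-py, bundled with the deployment package

# Reuse the client across invocations by creating it outside the handler
redis_client = redis.Redis(
    host=os.environ["REDIS_HOST"],  # e.g. the UserContextCache primary endpoint
    port=6379,
    decode_responses=True,          # return str instead of bytes
)
```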

And this is how we can get or store the context:
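
A minimal sketch, assuming a simple per-user key format and a fixed TTL for active conversations (both are illustrative choices):

```python
import json

CONTEXT_TTL_SECONDS = 3600  # assumed TTL; expire idle conversations after an hour

def get_context(user_id):
    # Return the cached conversation for this user, or an empty history
    cached = redis_client.get(f"context:{user_id}")
    return json.loads(cached) if cached else []

def store_context(user_id, messages):
    # Overwrite the cached conversation and refresh its TTL
    redis_client.set(f"context:{user_id}", json.dumps(messages), ex=CONTEXT_TTL_SECONDS)
```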
You also need to allow your Lambda function to reach Redis. Access to ElastiCache is not managed through IAM policies; instead, you’ll need to configure your Lambda function to access the cluster from within your VPC.
💡 Ensure your Lambda function is in the same VPC as your ElastiCache cluster. The Redis connection is managed at the network level, not through IAM policies:
– Configure the security group of your Lambda function to allow outbound traffic to the ElastiCache security group.
– Configure the ElastiCache security group to allow inbound traffic from the Lambda function’s security group on the Redis port (typically 6379).

Now, for the long-term store, we will create a DynamoDB table that holds each user’s context. DynamoDB is just one example that provides great flexibility; you can consider other database alternatives. Here are the steps:


1. Create a DynamoDB table (console steps below; a programmatic sketch follows):

      • Table name: UserContext
      • Partition key: user_id (String)
      • Sort key: timestamp (Number)
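
If you prefer to create the table programmatically rather than through the console, a boto3 sketch might look like this (the on-demand billing mode is an assumption):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Same schema as the console steps above
dynamodb.create_table(
    TableName="UserContext",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},     # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",  # assumed; on-demand suits a serverless bot
)
```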

2. Now that we have a table, we need to update the Lambda code to use the context as needed (a sketch follows the list):

  • Create a DynamoDB client
  • Retrieve the user_id from the event
  • Fetch the previous context from DynamoDB
  • Append the new prompt to the context and call Bedrock with the full context
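
A minimal sketch of the DynamoDB side; the function names and the limit of ten messages are illustrative assumptions:

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("UserContext")

def fetch_context(user_id, limit=10):
    # Query the most recent messages for this user
    response = table.query(
        KeyConditionExpression=Key("user_id").eq(user_id),
        ScanIndexForward=False,  # newest first, via the timestamp sort key
        Limit=limit,
    )
    # Reverse so the conversation reads in chronological order
    return list(reversed(response["Items"]))

def save_message(user_id, role, text):
    # Append one message ("user" or "assistant") to the stored context
    table.put_item(Item={
        "user_id": user_id,
        "timestamp": int(time.time() * 1000),
        "role": role,
        "message": text,
    })
```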


3. And of course, update the permissions so Lambda can call DynamoDB:
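
A minimal policy statement might look like this; the region and account placeholders are assumptions to be replaced with your own values:

```json
{
  "Effect": "Allow",
  "Action": ["dynamodb:Query", "dynamodb:PutItem"],
  "Resource": "arn:aws:dynamodb:<region>:<account-id>:table/UserContext"
}
```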


So right now, you should have a Lambda function that can store context in Redis and DynamoDB for different users. Let’s see how we can expose this using API Gateway.

It’s important to note that the current data model stores only one context per user. This means that if a user engages in multiple conversations or sessions, only the most recent context will be preserved. If you would like to store multiple sessions per user, you need to make some small modifications.

You can include a session_id parameter that acts as a unique identifier for each conversation a user has with the bot. Combining the user_id and the session_id allows you to handle multiple separate conversations per user, enabling context switching between different topics or time periods.
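
For example, a small helper (illustrative, not from the original article) could build a composite key that both the cache entries and the table items share:

```python
def context_key(user_id, session_id):
    # One key per conversation, so sessions don't overwrite each other
    return f"{user_id}#{session_id}"
```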

💡 Security Considerations for Storing Context
When storing conversation context in DynamoDB, it’s crucial to adopt robust security measures to protect potentially sensitive information. Enable encryption at rest using AWS KMS and implement fine-grained access controls with IAM policies. Consider using VPC endpoints to keep traffic within the AWS network, and apply client-side encryption for highly sensitive data. Regularly audit and rotate access keys and encryption keys, and set up CloudTrail to monitor DynamoDB API calls. Additionally, establish a data retention policy to automatically delete old conversation contexts. These measures will significantly enhance data protection and help ensure compliance with relevant regulations.


Step 3: Exposing the Lambda with API Gateway

  • Open the AWS Management Console and navigate to API Gateway.
  • Click “Create API” and choose “REST API” (not private).
  • Select “New API” and give it a name (e.g., “BotAPI”), then click “Create API”.
  • Click “Actions” and select “Create Resource”:
    – Enter a resource name (e.g., “bot”)
    – Click “Create Resource”
  • With the new resource selected, click “Actions” and select “Create Method”, choose “POST” from the dropdown, and click the checkmark.
  • In the setup pane:
    – Select “Lambda Function” for Integration type
    – Check “Use Lambda Proxy integration”
    – Choose your Lambda function
    – Click “Save” and “OK” to add permission to your Lambda function
  • Click “Actions” and select “Deploy API”:
    – Select “[New Stage]” for Deployment stage
    – Enter a stage name (e.g., “prod”)
    – Click “Deploy”


Note the “Invoke URL” at the top of the stage editor page. This is your API endpoint.


You might need to update your Lambda function to handle the API Gateway proxy integration and parse the prompt, user_id, and session_id from the request body:
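
A sketch of what that handler might look like; generate_response is a hypothetical helper standing in for the Redis, DynamoDB, and Bedrock logic from the previous steps:

```python
import json

def lambda_handler(event, context):
    # With Lambda proxy integration, the HTTP body arrives as a JSON string
    body = json.loads(event.get("body") or "{}")
    user_id = body.get("user_id")
    session_id = body.get("session_id")
    prompt = body.get("prompt")

    if not all([user_id, session_id, prompt]):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "user_id, session_id and prompt are required"}),
        }

    # Hypothetical helper wiring together context storage and the Bedrock call
    answer = generate_response(user_id, session_id, prompt)

    # Proxy integration expects statusCode plus a string body
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"response": answer}),
    }
```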


Now your Lambda function is exposed via API Gateway and can be called from any client that can make HTTP requests.


The key is that each client formats the request body correctly with a `user_id`, `session_id`, and `prompt`, sends a POST request to your API endpoint, and then handles the response.
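
For example, a client call might look like this (the URL placeholders are assumptions; substitute the Invoke URL from your deployed stage):

```python
import requests  # third-party HTTP client

# Assumed placeholders; use your own API ID and region
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/prod/bot"

resp = requests.post(API_URL, json={
    "user_id": "user-123",
    "session_id": "session-456",
    "prompt": "Can you expand on those concepts?",
})
print(resp.json()["response"])
```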


Conclusion

By leveraging these AWS services, you can create a powerful, scalable, and cost-effective serverless AI bot that can handle a wide range of tasks.

While we’ve created a single API endpoint, this doesn’t limit the bot to a single point of access. The RESTful nature of our API allows it to be easily integrated with various clients and services.

You can connect it to multiple clients, such as other APIs or Slack, or even create custom plugins for Visual Studio Code, IntelliJ, and other IDEs, to interact with the bot directly from your editor and stay focused without switching to a web browser to search for information.

Creating a centralised bot with multiple access points can significantly enhance productivity, streamline workflows, and provide consistent, intelligent assistance across your organisation. As AI and serverless technologies continue to evolve, the bot can easily adapt and improve, combining the best of cloud computing and artificial intelligence to create a powerful tool.




