Learn everything you need to know about Amazon Nova Act, a groundbreaking AI-powered tool that combines intelligent UI understanding with a Python SDK, enabling developers to create more reliable browser automation compared to traditional methods.
Guille Ojeda is a Software Architect at Caylent and a content creator. He has published 2 books, over 100 blogs, and writes a free newsletter called Simple AWS, with over 45,000 subscribers. Guille has been a developer, tech lead, cloud engineer, cloud architect, and AWS Authorized Instructor and has worked with startups, SMBs and big corporations. Now, Guille is focused on sharing that experience with others.
Automating interactions with web applications is a common, yet often frustrating, task. While APIs offer clean integration points, many essential internal tools, legacy systems, or third-party websites lack adequate API coverage. This forces us back to crafting browser automation scripts, which are notoriously brittle, often breaking with the smallest UI change, leading to maintenance headaches and unreliable processes. Can AI offer a more robust way to interact directly with web UIs?
Amazon is exploring this space with Amazon Nova Act, currently available as a research preview. It combines a new AI model trained for UI understanding with a Python SDK, aiming to let developers build agents capable of performing actions within a web browser more reliably than traditional methods or some current AI agent approaches. The idea isn't necessarily full autonomy, but rather providing developers with better tools to automate specific, necessary browser-based workflows where APIs fall short.
This article provides a technical deep dive into the Amazon Nova Act SDK, based entirely on the information released by Amazon. We'll explore its core philosophy, particularly its emphasis on reliability, examine how to get started with the SDK, dissect the practical building blocks it offers for browser automation, look at the provided examples, and discuss the important considerations and limitations that come with using a research preview technology.
Amazon Nova Act centers around an AI model specifically trained to understand web page structures and execute actions, exposed through the nova-act Python SDK. You interact with it by giving it natural language commands to perform tasks within a browser session it controls via Playwright.
The defining characteristic highlighted by Amazon is a deliberate focus on reliability. If you've experimented with AI agents prompted with high-level goals (like "Plan a complete vacation itinerary"), you might have encountered unpredictable behavior or frequent failures, especially with complex UI interactions. Amazon acknowledges this challenge, citing accuracy rates as low as 30-60% for state-of-the-art models on such tasks.
Nova Act takes a different path. Instead of aiming for end-to-end task completion from a single high-level prompt, it's designed for developers to break down workflows into a sequence of smaller, explicit, composable commands. Each command, representing a discrete step like "click the 'Submit' button" or "select 'Option 3' from the dropdown," is passed to the SDK's act() method. The premise is that by focusing the AI on smaller, well-defined actions, the reliability of each step, and thus the overall workflow, can be significantly improved. Why? Because this reduces ambiguity and the scope for misinterpretation by the AI model, leading to more predictable outcomes for each step compared to open-ended goals.
To this end, Amazon states they concentrated on achieving high accuracy (over 90% on internal tests) for fundamental UI actions known to trip up other models, such as interacting correctly with date pickers, dropdown menus, and popups. They also published benchmark results on ScreenSpot Web (evaluating interaction with text and visual elements based on language instructions) and GroundUI Web (evaluating interaction with various UI elements), showing competitive performance which further underscores this focus on dependable UI actuation. Early tests even showed some capability transfer to novel environments like web games, suggesting a degree of generalization in its UI understanding.
It's important to frame this correctly, however. Based on the provided information, Nova Act in its current form isn't positioned as a fully autonomous agent ready for complex, open-ended problem solving. It appears to be a tool for developers to build more reliable automation for specific, often multi-step, browser-based tasks where precision and predictability are key.
And critically, Nova Act is currently a research preview. It's experimental technology, evolving, and comes with specific caveats and limitations we'll detail later.
Before you can start automating, you need to set up your environment. You'll need macOS or Ubuntu, and Python 3.10 or higher.
The setup involves two main steps: authentication and SDK installation.
1. Authentication: Nova Act requires an API key to authenticate your requests.
First, get a key by visiting the Nova Act portal and following the instructions to generate one. Next, set it in an environment variable named NOVA_ACT_API_KEY.
Security Reminder: Your API key grants access to Nova Act under your account. Protect it carefully. Do not embed it directly in your code, commit it to version control, or share it. Anyone with access to your key can use Nova Act under your account. If you believe your key has been compromised, you should contact Amazon support via nova-act@amazon.com to have it deactivated and request a new one.
2. Installation: With Python 3.10+ available and the environment variable set, you can install the SDK using pip: pip install nova-act.
A quick note: the very first time you run a script using NovaAct, you might notice a delay of a minute or two before things start happening. This is expected. The SDK needs to download and install the necessary Playwright browser components in the background. Subsequent runs should initialize much faster, typically within a few seconds.
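The prerequisites above can be checked before a first run. The sketch below is not part of the SDK: preflight_problems is a hypothetical helper, and it accepts any Linux system because the standard library cannot cheaply confirm Ubuntu specifically.

```python
import os
import platform
import sys


def preflight_problems(env=os.environ, version=sys.version_info, system=None) -> list:
    """Return human-readable problems with the local setup (empty if none).

    Hypothetical helper, not part of nova-act. The defaults read the real
    environment; pass explicit values to check a different configuration.
    """
    system = system or platform.system()
    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10 or higher is required")
    # platform.system() reports "Darwin" on macOS and "Linux" on Ubuntu.
    if system not in ("Darwin", "Linux"):
        problems.append("macOS or Ubuntu is required")
    if not env.get("NOVA_ACT_API_KEY"):
        problems.append("NOVA_ACT_API_KEY is not set")
    return problems
```

Calling preflight_problems() with no arguments inspects the current machine; an empty list means the documented requirements appear to be met.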
Let's look at the basic usage pattern through the simple example provided in the SDK documentation: adding a coffee maker to an Amazon shopping cart. This gives you a feel for how NovaAct works in practice.
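A sketch of that workflow is shown here. The prompts are illustrative, and the helper names add_coffee_maker_to_cart and main are hypothetical; the code assumes the nova-act package is installed and NOVA_ACT_API_KEY is set.

```python
def add_coffee_maker_to_cart(n) -> None:
    """Run the workflow as three small, explicit steps on a started session."""
    n.act("search for a coffee maker")
    n.act("select the first result")
    n.act("scroll down or up until you see 'add to cart' and then click 'add to cart'")


def main() -> None:
    # Imported lazily so the helper above can be exercised without the SDK.
    from nova_act import NovaAct

    # The context manager starts the browser and stops it on exit.
    with NovaAct(starting_page="https://www.amazon.com") as n:
        add_coffee_maker_to_cart(n)
```

Call main() to run it against a real browser session.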
This example uses Script Mode, which is how you'd typically embed Nova Act within an automated script. The with NovaAct(...) as n: construct conveniently manages the browser lifecycle, ensuring it starts and stops correctly.
Executing this script performs the described sequence, automating the initial steps of purchasing an item.
Now let's look at Interactive Mode, which you'd use for experimentation or step-by-step debugging. This allows you to use Nova Act within a standard Python REPL (note: the documentation mentions ipython is not currently supported).
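In a REPL you would type the calls one at a time. The sketch below wraps the same start, act, stop lifecycle in a hypothetical helper, run_steps_manually, so that stop() still runs when a step raises. It assumes only that the object passed in exposes start(), act() and stop().

```python
from typing import Iterable


def run_steps_manually(n, prompts: Iterable[str]) -> list:
    """Drive a NovaAct-like object without the `with` block.

    Hypothetical helper: `n` must expose start(), act(prompt) and stop().
    """
    results = []
    n.start()  # launches the browser session
    try:
        for prompt in prompts:
            # In a REPL you would inspect the browser between these calls.
            results.append(n.act(prompt))
    finally:
        n.stop()  # always close the browser, even if a step raised
    return results
```

In an actual REPL session, the equivalent is n = NovaAct(starting_page="https://www.amazon.com"), then n.start(), one n.act(...) per step, and finally n.stop().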
This mode allows you to send commands one at a time using n.act() and observe the result in the browser before proceeding. Remember to call n.start() after initialization and n.stop() when you're finished to manage the browser session correctly. It's important, as the documentation highlights, not to manually interfere with the browser window while an act() command is executing, as the agent's internal state won't reflect your manual changes.
act() Method
Now that you've seen a basic example, let's dive deeper into the core concepts you'll need to use Nova Act effectively, starting with how to structure your prompts.
As mentioned earlier, Nova Act's design encourages a specific prompting style focused on reliability. The key is to avoid high-level, ambiguous goals and instead provide clear, sequential, step-by-step instructions. Why? This approach aims to minimize the chances of the AI misinterpreting the goal or getting lost in a complex UI, which is a common failure mode for agents given less specific instructions. By breaking the task down, you make each step more deterministic.
Prompting Guidelines:
1. Be Prescriptive and Specific: Clearly state the exact action the agent should take in the current step.
Avoid: n.act("Reorder my last pizza")
Prefer: n.act("Click 'Account', then 'Order History', find the most recent order from 'Pizza Place', and click the 'Reorder' button")
Avoid: n.act("Check train times")
Prefer: n.act("In the 'From' field enter 'Downtown Station', in the 'To' field enter 'Uptown Station', select tomorrow's date, and click 'Find Trains'")
2. Decompose Complex Tasks: Break down larger workflows into a series of distinct act() calls. Each call should represent a logical step a human would take.
Avoid: n.act("Find the highest-rated hotel in Seattle under $200 for next weekend and book it")
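A decomposed version of the hotel task might look like the sketch below. The individual prompts and the names HOTEL_STEPS and book_hotel are illustrative assumptions, not taken from the SDK documentation; the right wording depends on the site being automated.

```python
# Hypothetical step list: one prompt per action a person would take.
HOTEL_STEPS = [
    "search for hotels in Seattle for next weekend",
    "set the maximum price filter to $200 per night",
    "sort the results by guest rating, highest first",
    "select the first hotel in the results",
    "click the button to book the room",
]


def book_hotel(n, steps=HOTEL_STEPS) -> None:
    """Issue each step as its own act() call instead of one broad goal."""
    for step in steps:
        n.act(step)
```

Keeping the steps in a plain list makes it easy to adjust or reorder a single step when the trace shows where the agent went wrong.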
Adhering to this strategy of clear, decomposed steps is presented as the most effective way to build robust and maintainable browser automations using Nova Act in its current form.
The NovaAct Class and act() Method:
The NovaAct class is your main tool. Initializing it (n = NovaAct(...)) sets up the Playwright-managed browser session.
The central piece of the interaction is the n.act() method. It takes your natural language prompt as input, sends it to the Nova Act AI model along with the current state of the web page, receives back a plan of low-level browser actions (like clicks, typing sequences, scrolls), and executes that plan in the browser.
Key parameters for the act() method that you'll often use include:
prompt (str): Your natural language instruction for this step.
max_steps (int, default: 30): A safeguard. It caps the number of individual browser interactions (clicks, key presses, etc.) the agent will attempt for a single act() call before stopping. This helps prevent the agent from getting stuck in unexpected loops.
schema (Dict[str, Any], optional): Used for structured data extraction. You provide a JSON schema definition (as a Python dictionary), and the agent will attempt to return information from the page matching that structure.
timeout (int, optional): An overall time limit in seconds for the act() call to complete.
With these fundamentals covered, let's explore the specific building blocks the SDK offers.
The Nova Act SDK provides patterns and integrates with tools to handle common, practical automation needs effectively. These building blocks allow you to move beyond simple clicks and searches towards more sophisticated automation workflows.
Often, you need to extract specific data from a page, not just interact with it. Nova Act integrates with Pydantic to make this more reliable. Instead of asking for free-form text which might be inconsistent, you define a structure for the data you need.
The process involves:
1. Defining a Pydantic model for the data you need and generating its JSON schema (.model_json_schema()).
2. Calling n.act() with a prompt requesting the data, passing the schema via the schema argument.
3. Checking the returned ActResult's matches_schema attribute.
4. If it is True, validating and parsing result.parsed_response using your Pydantic model (.model_validate()).
Why use Pydantic schemas? This approach strongly guides the AI to return information in the precise format you expect. It transforms potentially unstructured web content into validated, predictable Python objects, making the data extraction far more robust and easier to integrate into the rest of your application logic compared to parsing free-form text responses.
Consider the example for extracting book data:
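The sketch below follows the four steps listed earlier. The Book and BookList models, the prompt, and the helper name extract_books are illustrative; the code assumes Pydantic v2 and a session already open on a page that lists books.

```python
from typing import Optional

from pydantic import BaseModel


class Book(BaseModel):
    title: str
    author: str


class BookList(BaseModel):
    books: list[Book]


def extract_books(n) -> Optional[BookList]:
    """Ask the agent for the books on the current page as structured data."""
    result = n.act(
        "Return the books in the Fiction list",
        schema=BookList.model_json_schema(),
    )
    if not result.matches_schema:
        # The response did not fit the schema; treat it as "no data".
        return None
    # Validate the raw response into typed Python objects.
    return BookList.model_validate(result.parsed_response)
```

Returning None on a schema mismatch is one design choice; raising an exception is equally reasonable when missing data should stop the workflow.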
For simple boolean checks, use the provided BOOL_SCHEMA:
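A minimal sketch of a yes/no check. The helper ask_yes_no is hypothetical and takes the schema as an argument so it can be exercised without the SDK; in real use you would pass the BOOL_SCHEMA imported from nova_act, as main does.

```python
def ask_yes_no(n, question: str, bool_schema: dict) -> bool:
    """Return True only when the agent gives a schema-valid 'yes' answer."""
    result = n.act(question, schema=bool_schema)
    if not result.matches_schema:
        # An unparseable answer is treated as "no" in this sketch.
        return False
    return bool(result.parsed_response)


def main() -> None:
    # Imported lazily so ask_yes_no can be exercised without the SDK.
    from nova_act import BOOL_SCHEMA, NovaAct

    with NovaAct(starting_page="https://www.amazon.com") as n:
        if ask_yes_no(n, "Am I logged in?", BOOL_SCHEMA):
            print("You are logged in")
```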
Remember to place data extraction prompts in separate act() calls from those performing actions.
While one NovaAct instance runs sequentially, you can achieve concurrency by running multiple NovaAct instances in parallel using Python's concurrent.futures.ThreadPoolExecutor. This is particularly good for tasks like scraping data from many URLs or performing independent checks across different web interfaces simultaneously.
The documentation describes this as creating a "browser use map-reduce". The core idea is to submit multiple independent NovaAct tasks (each running in its own thread and controlling its own browser instance) to the executor and collect results as they complete.
For a detailed code example of this pattern, please refer to the apartments_caltrain.py sample script included in the Nova Act SDK repository; the SDK documentation shows the same pattern fetching book data for multiple years in parallel. It demonstrates how to set up the ThreadPoolExecutor, submit tasks, and collect results using as_completed. Remember that when running in parallel, proper handling of the user_data_dir (using the default cloning behavior) is important to ensure session isolation.
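The submit-and-collect shape can be sketched with the standard library alone. Here map_in_parallel is a hypothetical helper, not an SDK function; it assumes each task opens and closes its own NovaAct session internally, and it skips inputs whose task raised.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def map_in_parallel(task: Callable, inputs: Iterable, max_workers: int = 4) -> dict:
    """Run task(item) for each input on its own thread and gather results.

    Failed inputs are skipped so one bad page does not sink the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the input that produced it.
        future_to_input = {executor.submit(task, item): item for item in inputs}
        for future in as_completed(future_to_input):
            item = future_to_input[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                print(f"Skipping {item}: {exc}")
    return results
```

For example, a get_books(year) function that wraps a with NovaAct(...) as n: block could be passed as task with range(2010, 2025) as inputs. Keeping max_workers small limits how many browsers run at once.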
Many useful automation tasks involve sites requiring login. Since NovaAct starts with a clean slate by default (temporary user_data_dir), you need a way to handle authentication cookies and session state.
The user_data_dir parameter in the NovaAct constructor lets you specify a path to a persistent Chrome profile directory. Nova Act can then use the cookies and local storage within that profile.
The recommended way to prepare such a profile is to dedicate a directory, then use the helper script nova_act.samples.setup_chrome_user_data_dir.py provided with the SDK. Running python -m nova_act.samples.setup_chrome_user_data_dir --user_data_dir /path/to/profile launches a browser using that directory; you log in to your sites manually, then press Enter to save the session state.
Remember the clone_user_data_dir=True default behavior: Nova Act copies the specified profile to a temporary location for each run. This protects your original profile and is necessary for parallel execution. Keep this enabled unless you have a specific need to work directly on the original profile with only a single NovaAct instance.
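One way to wire a prepared profile into a session is sketched below. profile_session_kwargs is a hypothetical helper, and the profile path and prompt in main are placeholders; only the constructor parameters starting_page, user_data_dir and clone_user_data_dir come from the text above.

```python
import os


def profile_session_kwargs(starting_page: str, profile_dir: str, clone: bool = True) -> dict:
    """Build NovaAct constructor arguments for a previously prepared profile."""
    profile_dir = os.path.abspath(os.path.expanduser(profile_dir))
    if not os.path.isdir(profile_dir):
        # Fail early rather than silently starting with an empty profile.
        raise FileNotFoundError(f"Profile directory not found: {profile_dir}")
    return {
        "starting_page": starting_page,
        "user_data_dir": profile_dir,
        # Keep cloning on unless a single instance must edit the original.
        "clone_user_data_dir": clone,
    }


def main() -> None:
    # Imported lazily so the helper above can be exercised without the SDK.
    from nova_act import NovaAct

    kwargs = profile_session_kwargs("https://www.amazon.com", "~/nova-act-profile")
    with NovaAct(**kwargs) as n:
        n.act("open my order history")
```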
This is critically important. Never include passwords, API keys, credit card numbers, or other sensitive data directly in the prompt string you pass to n.act().
Prompts and interaction data (including potentially screenshots) might be collected by Amazon during the research preview for model improvement. Putting secrets in prompts creates an unnecessary security risk.
The secure method involves leveraging Playwright's direct interaction capabilities, which bypass the AI model for sensitive input:
1. Use n.act() to navigate and place focus on the sensitive input field (e.g., password box).
2. Obtain the secret securely within your script (e.g., using getpass.getpass() for interactive input).
3. Access the Playwright Page object via n.page and use n.page.keyboard.type() to directly input the sensitive string.
Here's the recommended pattern:
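A sketch of those three steps. The helper name sign_in, the username, and the login URL in main are placeholders; the prompts are illustrative and would need adjusting to the actual login form.

```python
import getpass


def sign_in(n, username: str, password: str) -> None:
    """Log in without ever placing the password in a prompt."""
    # The prompt carries the username, but only moves focus to the password field.
    n.act(f"enter username {username} and click on the password field")
    # Playwright types the secret directly; it is not part of any prompt.
    n.page.keyboard.type(password)
    n.act("sign in")


def main() -> None:
    # Imported lazily so sign_in can be exercised without the SDK.
    from nova_act import NovaAct

    password = getpass.getpass("Password: ")  # read without echoing
    with NovaAct(starting_page="https://example.com/login") as n:
        sign_in(n, "janedoe", password)
```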
Security Caveat Reminder: Be aware that if sensitive information typed via Playwright is visibly displayed on the screen when a subsequent n.act() call runs, it might still be captured in screenshots collected during the preview.
If focus is tricky, try the workaround: n.act("enter '' in the password field") followed by n.page.keyboard.type(password).
Nova Act does not solve CAPTCHAs. Workflows encountering them require human assistance. The suggested pattern is: detect the CAPTCHA, pause the script, prompt the user to solve it manually in the controlled browser, and then resume.
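The detect, pause, resume pattern can be sketched as follows. wait_for_captcha is a hypothetical helper; it takes the boolean schema and the pause function as arguments so it can be exercised without the SDK or a terminal. In real use you would pass the BOOL_SCHEMA exported by nova_act.

```python
from typing import Callable


def wait_for_captcha(n, bool_schema: dict, pause: Callable[[str], str] = input) -> bool:
    """Pause for a human if a CAPTCHA is visible. Returns True if we paused."""
    result = n.act("Is there a captcha on the screen?", schema=bool_schema)
    if result.matches_schema and result.parsed_response:
        # Block until the user has solved it in the controlled browser.
        pause("Please solve the captcha in the browser, then press Enter: ")
        return True
    return False
```

Calling this helper right after steps that tend to trigger a challenge (such as a login) keeps the rest of the workflow unattended.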
Nova Act handles standard actions like searching and downloading files:
Searching: Provide instructions to find the search field, enter text, and submit.
If needed, be more specific about submission:
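Both forms of the search prompt can be captured in one small helper. The name search and the press_enter flag are hypothetical, and the prompt wording is illustrative.

```python
def search(n, query: str, press_enter: bool = False) -> None:
    """Search the current site, optionally spelling out how to submit."""
    prompt = f"search for {query}"
    if press_enter:
        # Extra guidance for sites where the agent does not submit on its own.
        prompt += ". type enter to initiate the search."
    n.act(prompt)
```

Start with the short form and switch to press_enter=True only if the trace shows the query being typed but never submitted.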
File Downloads: Use Playwright's expect_download() context manager combined with the act() call that triggers the download.
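A sketch of the download pattern. The helper name download_file and its parameters are hypothetical; expect_download(), the .value property and save_as() are part of Playwright's synchronous Python API.

```python
def download_file(n, trigger_prompt: str, destination: str) -> str:
    """Trigger a download via act() and save the file with Playwright."""
    # expect_download() must wrap the action that starts the download.
    with n.page.expect_download() as download_info:
        n.act(trigger_prompt)
    # .value resolves to the Download object once the download has started.
    download_info.value.save_as(destination)
    return destination
```

A call might look like download_file(n, "click on the download button", "my_downloaded_file").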
A few other configuration points mentioned in the documentation:
Custom user agent: Set it during NovaAct initialization using the user_agent="MyCustomAgent/1.0" parameter.
Logging: Control verbosity with the NOVA_ACT_LOG_LEVEL environment variable, using standard Python logging level integers (e.g., 20 for INFO, 10 for DEBUG).
When your automation doesn't behave as expected, you need tools to see what went wrong. Nova Act provides two useful mechanisms:
Action traces: After each act() call, an HTML trace file is generated. The location of this file is printed in the console logs. Opening this file provides a step-by-step visual replay of that specific act() command, showing screenshots and identified elements. This is extremely helpful for pinpointing where the agent deviated from your expectation. The logs_directory parameter in the NovaAct constructor controls where these traces are saved.
Video recording: Record the entire browser session by setting record_video=True and providing a logs_directory when initializing NovaAct. This allows you to watch the full workflow, which can reveal issues spanning multiple act() calls.
Using Nova Act effectively requires acknowledging its current status as experimental research preview software. This comes with important caveats and responsibilities:
Keep these Known Limitations in mind:
Pay close attention to the Important Considerations (Disclosures) provided by Amazon:
Contact nova-act@amazon.com for data deletion requests.
Amazon encourages providing feedback during this preview phase. You can report bugs, suggest improvements, or share your experiences by emailing nova-act@amazon.com. Including the session ID from logs and relevant script details is helpful for bug reports.
Reliably automating tasks through web interfaces remains a difficult challenge, especially when APIs are inadequate or missing. Traditional UI automation scripts are fragile and tend to break when the UI changes. Amazon Nova Act offers an early look at an alternative approach: leveraging AI trained for UI understanding, accessible via a Python SDK designed for developers.
Its core idea, focusing on reliability through developer-guided, composable commands rather than autonomous interpretation of high-level goals, presents a different strategy given the current state of agentic computer use technology.
However, it's important to remember that Nova Act is a research preview. It's experimental, it comes with known limitations, and it requires careful handling, particularly regarding security and data privacy (remember the disclosures about data collection). It is not yet a general-purpose autonomous agent but a specialized toolkit for engineers building targeted browser automation solutions.
Amazon hints at a longer-term vision involving more advanced training techniques like reinforcement learning to enable agents capable of more complex tasks, but it might take several months, if not a couple of years, until we reach a solid and reliable state. For now, Nova Act provides a tangible first step towards that reliability.