AWS Nova Act is automating your browser in wild new ways


I have been talking a lot about MCP lately and how your LLM agents can use that interface to act on your behalf. Let’s take a look at the other side of the spectrum, where AWS’s Nova Act uses the old-fashioned browser interface to browse the web and interact with websites for you.

With just a few lines of code and a natural language prompt, you can send an AI agent off on an adventure browsing the internet on your behalf. This goes beyond research: the agent can create actual transactions, like making purchases for you (scary, right?).

Here is their example:

from nova_act import NovaAct

# user_data_dir, headless, and order are defined elsewhere in their script.
with NovaAct(
    starting_page="https://order.sweetgreen.com",
    user_data_dir=user_data_dir,
    headless=headless,
) as nova:
    # One natural language prompt that walks the agent through the whole order.
    nova.act(
        "If there is a cookie banner, close it. "
        "Click Menu at the top of the page. "
        "Click Delivery on the sidebar. "
        "Select 'Home' address. "
        f"Scroll down and click on '{order}'. "
        "Click 'Add to Bag'. "
        "If visible, click 'Continue to bag', otherwise click the bag icon. "
        "Click 'Continue to checkout'. "
        "Select a 20% tip. "
        "Click 'Place Order'."
    )

Typing all this out for just one order seems slower than doing it yourself. The real power comes with scale. In the demo they said this runs every week, meaning that it could actually save someone time… until it runs while they are on vacation.
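
To make the weekly run a bit less scary, here is a minimal sketch of a vacation guard around the order. The helper names (build_order_prompt, should_run, place_weekly_order) and the list of vacation date ranges are my own assumptions for illustration, not part of the Nova Act SDK. Only NovaAct and act come from their example, and I shortened the prompt to keep things readable.

```python
from datetime import date


def build_order_prompt(order: str) -> str:
    """Build the natural language steps for one order (shortened from the example)."""
    return (
        "If there is a cookie banner, close it. "
        "Click Menu at the top of the page. "
        f"Scroll down and click on '{order}'. "
        "Click 'Add to Bag'. "
        "Click 'Continue to checkout'. "
        "Click 'Place Order'."
    )


def should_run(today: date, vacations: list[tuple[date, date]]) -> bool:
    """Return False when today falls inside any (start, end) range, inclusive."""
    return not any(start <= today <= end for start, end in vacations)


def place_weekly_order(order: str, today: date, vacations: list[tuple[date, date]]) -> bool:
    """Skip the order on vacation days; otherwise hand the prompt to Nova Act."""
    if not should_run(today, vacations):
        return False
    # Imported lazily so the helpers above work without the SDK installed.
    from nova_act import NovaAct

    with NovaAct(starting_page="https://order.sweetgreen.com") as nova:
        nova.act(build_order_prompt(order))
    return True
```

A scheduler such as cron could call place_weekly_order once a week. Actually placing an order would still need the SDK installed and configured, plus a logged-in browser profile like the user_data_dir in their example.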

Soon I am sure you won’t even need the code. This will likely be a standard feature in most browsers over the next few years. AI agents and browsers will likely evolve into one interface.

So what is the better way for your AI agent/browser to interact with the web? I think a clearly defined protocol (similar to MCP) will reduce the error rate of your AI agents. But for those cavemen who refuse to release a standardized interface optimized for AI agents, this browser-clicking behavior will be the likely fallback.
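
Here is a rough sketch of that "protocol first, browser as fallback" idea. Everything in it is hypothetical: run_task, protocol_client, and browser_agent are names I made up, and the two callables stand in for whatever MCP-style client or browser agent (such as Nova Act) you would actually plug in.

```python
from typing import Callable, Optional


def run_task(
    task: str,
    protocol_client: Optional[Callable[[str], str]],
    browser_agent: Callable[[str], str],
) -> tuple[str, str]:
    """Prefer the structured protocol; fall back to browser clicking.

    Returns (channel_used, result).
    """
    if protocol_client is not None:
        try:
            return "protocol", protocol_client(task)
        except Exception:
            # The structured interface failed, so fall through to the browser.
            pass
    return "browser", browser_agent(task)
```

If the site offers no standardized interface, you pass None for protocol_client and the task goes straight to the browser agent.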

Circling back to how this could be used at scale in a big business, I am struggling a bit to come up with use cases.

Question for you:

Besides DDoSing your competition, what would be some real-world use cases for this?

PS: If you want to learn more about tech like this, sign up to get an early copy of my new eBook, The CTOs Guide To AI/ML On AWS.