Beyond The Gang Of Four: Practical Design Patterns For Modern AI Systems
The Gang of Four's 23 object-oriented patterns shaped how an entire generation of developers designed software. In the 2010s, cloud computing introduced patterns like publish-subscribe ("pub-sub"), microservices, event-driven workflows, and serverless models that now power most cloud-based distributed systems.
Similarly, before the current AI boom, the machine learning community had already developed "ML design patterns". When you build and deploy ML models, you face specific challenges, and patterns like Checkpointing, Feature Stores, and Versioning have become standard practice.
Why should you care about these patterns? They help you solve known problems in standardized ways. Instead of reinventing solutions, you use a shared vocabulary. When you say "Singleton", "Pub-Sub", or "Feature Store", your team immediately understands your approach. This speeds up your development, reduces errors, and makes your systems easier to maintain.
Modern AI systems bring new challenges that neither classic software nor conventional ML patterns fully address.
For example, how do you guide model output and prevent misleading content? How do you build user experiences that help users understand, trust, and effectively use AI-powered applications? How do you manage agent interactions in multi-agent systems? How do you reduce compute costs to make your product sustainable?
Figure 1: An illustration of a well-architected modern AI-based system
To help develop a well-architected AI system as shown in Figure 1, many AI patterns have emerged across the industry. In this article, I won't invent new design patterns. Instead, I'll show you how existing patterns fit together. I organize key emerging patterns into five categories that build on each other as you scale your AI system.
I specifically cover best practices for building user-facing AI applications using existing models, mainly accessed through API calls. While I focus on text-based interactions, you can also apply these patterns across multimodal applications. However, I deliberately don't address model training, customization, hosting, or model optimization since these typically fall outside the workflow of developers using API-based AI models. I also don't cover agentic AI systems or patterns for multi-agent interactions, as these topics deserve their own dedicated discussions.
Prompting and Context Patterns
Unlike traditional software, where you explicitly code system behavior, in modern AI systems, behavior heavily depends on the instructions and context you provide to large language models (LLMs) or large multimodal models (LMMs). To create effective AI outputs, you must provide effective guidance: crafting precise prompts, supplying relevant context (or external knowledge) directly within your prompt, or both.
Prompting might seem trivial at first. After all, you send free-form text to a model, so what could go wrong? However, how you phrase a prompt and what context you provide can drastically change your model's behavior, and there's no compiler to catch errors or a standard library of techniques. Creating prompts that reliably and consistently produce your desired behavior becomes difficult, especially as tasks grow more complex.
If you use prompting and context patterns effectively, you can improve the model's reasoning, accuracy, consistency, and adherence to instructions. Equally important, you can create reusable prompts that generalize across models, tasks, and domains.
Let's examine four specific prompting patterns that will help you standardize and refine your approach:
Table 1: Prompting Issues and When to Apply Each Pattern
Few-Shot Prompting Pattern
Few-Shot Prompting is one of the most straightforward yet powerful prompting approaches. Without examples, your model might generate inconsistent outputs, struggle with task ambiguity, or fail to meet your specific requirements. You can solve this problem by providing the model with a handful of examples (input-output pairs) in the prompt and then providing the actual input. You are essentially providing training data on the fly. This allows the model to generalize without re-training or fine-tuning.
Let's look at a very simple example (using "GPT-4o-mini" via OpenAI's API; you can run similar prompts locally using Ollama or Hugging Face Transformers):
PROMPT:
Classify the sentiment of the following sentences as Positive, Negative, or Neutral.

Sentence: "I absolutely loved the new Batman movie!"
Sentiment: Positive

Sentence: "The food was okay, nothing special."
Sentiment: Neutral

Sentence: "I'm really disappointed by the poor customer service."
Sentiment: Negative

Sentence: "The book was thrilling and kept me engaged the whole time."
Sentiment:

RESPONSE: (GPT-4o-mini)
Positive

Do frontier models today need these few-shot examples to complete their task correctly? No - they already excel at zero-shot learning and don't need spoon-fed examples to understand basic instructions.
However, you can think of Few-Shot Prompting as a personalization tool, guiding the model to cater to your specific needs: you can enforce a particular output format, match your preferred tone, or show the model how to handle tricky edge cases.
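To make this concrete, here is a minimal sketch of the pattern using the OpenAI Python SDK (assuming the openai package is installed and an API key is configured; the helper function and example data are illustrative):

```python
# Few-shot sentiment classification via the OpenAI Chat Completions API.
# A minimal sketch; assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

# The few-shot examples act as on-the-fly "training data".
FEW_SHOT_EXAMPLES = [
    ("I absolutely loved the new Batman movie!", "Positive"),
    ("The food was okay, nothing special.", "Neutral"),
    ("I'm really disappointed by the poor customer service.", "Negative"),
]

def classify_sentiment(sentence: str) -> str:
    lines = ["Classify the sentiment of the following sentences "
             "as Positive, Negative, or Neutral.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f'Sentence: "{text}"')
        lines.append(f"Sentiment: {label}")
    lines.append(f'Sentence: "{sentence}"')
    lines.append("Sentiment:")

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "\n".join(lines)}],
        temperature=0,  # deterministic labels suit classification
    )
    return response.choices[0].message.content.strip()

print(classify_sentiment("The book was thrilling and kept me engaged the whole time."))
```

Keeping the examples in a data structure, rather than hard-coding the prompt string, makes it easy to version and swap example sets per task.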
Role Prompting Pattern
In Role Prompting, you instruct the model to take on a specific persona or role, which helps contextualize its responses. For example: "You are a biology professor. Answer the question in detail".
Without guidance, your model's style or assumptions may not match your needs. Role prompting can steer your model's output style (e.g., "Act as a JSON formatter"), tone (e.g., "Pretend you are an angry pirate for this response"), and/or content boundaries (e.g., "You are a legal assistant"). You can implement role prompting via system messages (in chat models) or by simply including a persona statement in your prompt. It's as simple as that - just tell your model who it should be, as illustrated in the example below.
PROMPT (without role prompting):
Explain photosynthesis in 1-2 sentences.

RESPONSE: (GPT-4o-mini)
Photosynthesis is the process by which plants, algae, and some bacteria convert light energy into chemical energy, producing glucose and oxygen. This occurs in the chloroplasts, using sunlight, water, and carbon dioxide.

PROMPT (with role prompting):
You are a primary school teacher. Explain photosynthesis to your students in 1-2 sentences.

RESPONSE: (GPT-4o-mini)
Photosynthesis is how plants make their own food using sunlight, water, and air. They turn these things into sugar to grow and give off oxygen, which we breathe!

Popular AI assistants use role prompts extensively.
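In code, role prompting is just a system message. A minimal sketch with the OpenAI SDK (the persona string and helper function are illustrative):

```python
# Role prompting via the system message; the user question stays the same
# while the persona changes the style and framing of the answer.
from openai import OpenAI

client = OpenAI()

def explain(question: str, persona: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": persona},   # sets the role
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(explain("Explain photosynthesis in 1-2 sentences.",
              "You are a primary school teacher."))
```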
Chain-of-Thought (CoT) Prompting Pattern
If you are a software developer trying to solve a complex algorithmic problem or a software architect trying to analyze complex system bottlenecks and vulnerabilities, you will probably brainstorm various ideas with your colleagues to understand their pros and cons, break down the problem into smaller tasks, and then solve it iteratively, rather than jumping to the solution right away. In Chain-of-Thought (CoT) prompting, you encourage the model to follow a very similar process and think aloud by breaking the problem down into a step-by-step process. Especially when prompting complex tasks, CoT can prevent the model from jumping to pre-learned and often incorrect patterns prematurely.
If you're using state-of-the-art or advanced reasoning models today (e.g., Claude 3.7, OpenAI o1), you may observe step-by-step reasoning under the hood without explicit prompting. However, explicitly instructing models to "think step by step" remains a powerful approach, especially when using older or smaller models. OpenAI and Anthropic both recommend using CoT in their prompt engineering guides.
By nudging the model to produce intermediate reasoning steps, you can improve its accuracy on complex, multi-step tasks and make its reasoning easier to inspect and debug.
Let's look at an example below:
PROMPT (with CoT):
A user recently watched and enjoyed the movies 'Inception,' 'The Social Network,' and 'Blade Runner 2049.' Think step-by-step about what these movies have in common, such as themes, genres, style, or storytelling, and then recommend a few more movies or TV shows the user might like.

RESPONSE: (o4-mini)
Here's my quick breakdown and some picks you might enjoy:

Step-by-step commonalities
1. High-concept premises – each film poses a big "what if?" (dream control, social media origin, AI-driven future).
2. Smart, twisty narratives – nonlinear or slowly-unfolding storytelling that rewards close attention.
3. Tech-oriented/social commentary – they explore how technology shapes identity, society, or reality.
4. Striking visuals & mood – strong production design, immersive cinematography, atmospheric score.

Recommendations
● Memento (2000) – Nolan's early mind-bender of memory loss told in reverse.
● Ex Machina (2014) – A tense, intimate study of AI consciousness and human manipulation.
● Arrival (2016) – Thought-provoking first-contact drama where language and time intertwine.
● Black Mirror (TV series, 2011– ) – Standalone episodes examining near-future tech's dark side.

Each shares that blend of brains-and-cinema you seem to love. Enjoy!

Retrieval-Augmented Generation (RAG) Pattern
From startups to tech giants alike, RAG has become a standard best practice across the AI industry. Advanced models learn from vast amounts of real-world data. Ask them about history, science, or popular facts, and they'll usually answer correctly. Yet these models have limitations. Their training cuts off at a specific date, their knowledge is general rather than specialized, and they don't have access to the newest, proprietary, or dynamically changing information.
This is precisely where RAG helps. RAG combines the model's reasoning abilities with real-time access to external knowledge (like databases, vector stores, or documents). So you get the best of both worlds.
Imagine building a chatbot for your law firm. With RAG, when a client asks about a specific legal issue, your chatbot can instantly retrieve relevant statutes and recent case summaries from your internal knowledge base, creating an accurate, well-supported response.
Figure 2: Retrieval-Augmented Generation
When building AI systems, you should consider using RAG when your users need up-to-date, proprietary, or domain-specific information that the model was not trained on, or when answers must be grounded in citable sources.
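The core retrieve-then-generate loop is simple. Below is a minimal sketch using OpenAI embeddings, with a tiny in-memory document list standing in for a real vector store (the documents and prompt wording are illustrative):

```python
# A minimal RAG sketch: embed documents and the query, retrieve the closest
# passages, and ground the answer in them.
import numpy as np
from openai import OpenAI

client = OpenAI()

DOCUMENTS = [
    "Statute 12.4: Tenants must receive 30 days' written notice before eviction.",
    "Case summary: Smith v. Jones (2023) upheld the 30-day notice requirement.",
    "Firm policy: All client advice must cite the governing statute.",
]

def embed(texts: list[str]) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])

DOC_VECTORS = embed(DOCUMENTS)

def answer_with_rag(question: str, top_k: int = 2) -> str:
    query_vec = embed([question])[0]
    # OpenAI embeddings are unit-normalized, so a dot product is cosine similarity.
    scores = DOC_VECTORS @ query_vec
    context = "\n".join(DOCUMENTS[i] for i in np.argsort(scores)[-top_k:])
    prompt = (
        "Answer using ONLY the context below. If it is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer_with_rag("How much notice does a tenant get before eviction?"))
```

In production you would replace the in-memory list with a vector database and add chunking, metadata filtering, and citation of retrieved sources.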
Responsible AI Patterns
The Prompting and Context Patterns we discussed thus far can help reduce ambiguity, inconsistency, and hallucinations through better instructions and grounded context. However, you may soon notice that additional safeguards are needed to handle ethical, fairness, and safety issues. Even accurate responses can be biased, harmful, or inappropriate. This is where Responsible AI Patterns come in.
As part of building responsible AI systems, you must reduce hallucinations, prevent inappropriate or disallowed content, mitigate biases, and ensure transparency around AI decision-making. Otherwise, your AI outputs may mislead users, spread misinformation, or even create liability issues.
Techniques like RAG, discussed earlier, already help reduce hallucinations by grounding outputs in external context. Let us look at a few additional patterns that focus on safety, fairness, and ethical compliance, going beyond accuracy alone.
Figure 3: Sequence Diagram Illustrating Responsible Patterns in Modern AI-based Systems
Output Guardrails Pattern
Even when you do everything right, models may still produce incorrect, biased, or harmful content. You need guardrails! These are rules, checks, or interventions applied after the model generates an output. They act as your final defense to modify or block content before it reaches users. Guardrails are particularly important in sensitive domains such as legal or medical applications.
Depending on your domain and use case, you can implement guardrails in several ways: filtering outputs against blocklists or PII detectors, running them through a moderation classifier, or appending mandatory disclaimers, as the sketch below illustrates.
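As one illustration, here is a sketch of a post-generation guardrail layer; the blocked patterns and disclaimer rule are placeholders for your actual domain policy:

```python
# A simple post-generation guardrail: check the model's output and block or
# modify it before it reaches the user. Rules here are illustrative only.
import re

BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like strings (PII leakage)
]

def apply_guardrails(model_output: str) -> str:
    # Hard block: refuse outputs that match any forbidden pattern.
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_output):
            return "Sorry, I can't share that information."
    # Soft intervention: legal-sounding answers get a mandatory disclaimer.
    if "statute" in model_output.lower():
        model_output += "\n\n(This is general information, not legal advice.)"
    return model_output
```

In practice you would layer several such checks, from cheap regex filters up to a dedicated moderation model, and log every intervention for auditing.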
Many model providers also integrate fairness and ethics checks into their own guardrail pipelines. For example, Anthropic's Claude models follow a constitutional approach where outputs are revised according to predefined ethical principles. However, having your own guardrail layer will provide a consistent experience for your users, regardless of which model or provider you use.
Model Critic Pattern
Beyond basic guardrails, you can use a dedicated fact-checking or "critic" model to verify your primary model's output. This second model can be a different model, or the same one acting in a "critic" or "judge" role. It's analogous to an editor reviewing and correcting an author's draft. Even if the first pass contains hallucinations, this verification loop makes the model check its facts, reducing false information and bias in your final output.
Adding a secondary judge or critic isn't always practical, as it increases system complexity, latency, and cost. However, you should definitely consider this approach for automated QA testing. Consider a scenario where your production system uses a smaller "mini" or "nano" LLM variant for efficiency. You could use the larger model as a judge in your offline testing to validate accuracy and ensure responsible outputs are generated. GitHub Copilot, for example, uses a second LLM to evaluate its primary model.
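A minimal sketch of this judge loop, using a small model to answer and a larger one to critique (the models and rubric here are illustrative):

```python
# Model Critic sketch: a larger model grades the primary model's answer.
# Most useful in offline QA pipelines, where latency is not a concern.
from openai import OpenAI

client = OpenAI()

def generate(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # small, cheap production model
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

def critique(question: str, answer: str) -> str:
    judge_prompt = (
        "You are a strict fact-checker. Review the answer for factual errors, "
        "unsupported claims, or bias. Reply with 'PASS' or 'FAIL: <reason>'.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # larger model acting as the judge
        messages=[{"role": "user", "content": judge_prompt}],
    )
    return response.choices[0].message.content

question = "When did the first person walk on the Moon?"
answer = generate(question)
print(answer, "->", critique(question, answer))
```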
User Experience (UX) Patterns
After stabilizing your outputs with proper prompts and guardrails, your next big concern is the user experience (UX). AI systems don't behave like traditional software and often produce unpredictable, open-ended content that may occasionally be wrong, slow, or confusing. Users also have different expectations for these tools: they might want to ask follow-up questions, refine the AI's responses, or see disclaimers when the AI isn't sure.
That's why well-defined UX patterns are essential. They help you, the developer, handle these new types of interaction in a user-friendly way, keeping users engaged and satisfied while promoting transparency.
Let's look at a few illustrative UX patterns in detail.
Contextual Guidance Pattern
This may seem obvious, but many new AI tools launch without proper user guidance. Users often don't understand how to interact with these tools or know their capabilities and limitations. Don't assume users will immediately know how to use your tool. Lower their learning curve by providing prompt examples, contextual tips, and quick feature overviews. Show these aids at the right moment in users' journey when they need them. For instance, in Notion, pressing the spacebar in an empty page triggers writing suggestions (since users likely want to draft content), while selecting text brings up editing options like "Improve writing" or "Change tone", displayed alongside the original text for easy comparison.
Figure 4: An illustration of contextual guidance
Editable Output Pattern
With GenAI models, there is no single correct answer in many scenarios. Your best output depends on the context, application, and user preferences. Recognizing this, you should consider letting users modify or rewrite generated content. This creates a better perception of human-AI collaboration. Your tool will no longer be a black box, giving users control over their final outputs. Sometimes, this is an obvious feature (like GitHub Copilot letting users edit suggested code directly in their IDE). In other cases, it's a deliberate design choice (such as ChatGPT's canvas).
Figure 5: An illustration of the editable output pattern
Iterative Exploration Pattern
Never assume the first output will satisfy users. Include "regenerate" or "try again" buttons so users can quickly iterate. For image generation, show multiple options at once. In chatbots, allow users to refine or follow up on responses. This feedback loop helps users discover the best output without feeling stuck. Microsoft research shows that when users try many prompts, newer attempts sometimes perform worse than earlier ones – so letting them revert to previous outputs (or combine parts from different generations) significantly improves their experience.
Figure 6: An illustration of an AI video editor tool allowing iterative exploration
AI-Ops Patterns
When you start putting your AI software into production, you'll face new operational challenges that traditional software doesn't have. You'll still need versioning, monitoring, and rollbacks, but now your core "logic" lives in prompts, model configurations, and generative pipelines. Additionally, GenAI outputs can be unpredictable, requiring new testing and evaluation methods.
Think of AI-Ops as DevOps specifically for modern AI systems. You're not just deploying code; you're shipping AI "knowledge" embedded in prompt-model-config combinations that might change weekly. You must manage performance and cost, track user interactions, identify regressions, and maintain reliable, available systems.
You can adopt many familiar operational tactics from traditional software, plus an entirely new set of AI-specific methods you've never needed before. Let's look at a couple of AI-specific patterns in detail (though this is just a tiny sample of the complete playbook) to understand the nuances of AI-Ops.
Metrics-Driven AI-Ops Pattern
When your change goes to production, track everything: latency, token usage, user acceptance rate, and cost per call. Define the success metrics that matter most for your business. It could be a daily acceptance score from user feedback or a "hallucination rate" measured by an LLM-judge pipeline. Set up alerts if those metrics dip. This data-driven approach lets you quickly detect when a new model or prompt version hurts quality; you can then roll back or run an A/B test to confirm. Think of metrics as your safety net in an unpredictable environment.
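As a sketch, you can wrap every model call to capture latency, token usage, and estimated cost (the pricing constants and the emit sink are placeholders for your own monitoring stack):

```python
# Per-call metrics capture: wrap each model call and record latency, token
# usage, and estimated cost. Pricing figures here are illustrative only.
import time
from openai import OpenAI

client = OpenAI()
COST_PER_1K_INPUT, COST_PER_1K_OUTPUT = 0.00015, 0.0006  # illustrative rates

def emit(metric: str, value: float) -> None:
    print(f"metric {metric}={value}")  # replace with StatsD/CloudWatch/etc.

def tracked_completion(prompt: str, model: str = "gpt-4o-mini") -> str:
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    usage = response.usage
    emit("latency_ms", (time.perf_counter() - start) * 1000)
    emit("tokens_in", usage.prompt_tokens)
    emit("tokens_out", usage.completion_tokens)
    emit("cost_usd", usage.prompt_tokens / 1000 * COST_PER_1K_INPUT
                   + usage.completion_tokens / 1000 * COST_PER_1K_OUTPUT)
    return response.choices[0].message.content
```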
Prompt-Model-Config Versioning Pattern
Your AI system can fail if there are uncontrolled prompt changes, configuration tweaks, or ad-hoc model swaps. If you treat each (prompt, model, configuration) combination as a "release", you can manage it like any other software build: tag it with a version and validate it with QA tests against a golden dataset to ensure no regressions. Automated pipelines can run these test queries whenever you update a prompt, modify config settings, or switch from one model to another. If the outputs degrade according to your metrics, you revert. This discipline prevents "stealth changes" that break your UX.
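A minimal sketch of what such a release and its regression gate might look like (the schema, golden examples, and pass threshold are illustrative; run_model stands in for your inference call):

```python
# Treat each (prompt, model, config) combination as a versioned release and
# gate deployments on a regression check against a golden dataset.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Release:
    version: str           # e.g., "sentiment-v1.3.0"
    prompt_template: str
    model: str
    temperature: float

GOLDEN_SET = [
    # (input, expected substring in output), curated by your team
    ("I loved it!", "Positive"),
    ("Terrible experience.", "Negative"),
]

def passes_regression(release: Release,
                      run_model: Callable[[Release, str], str],
                      threshold: float = 0.95) -> bool:
    hits = 0
    for text, expected in GOLDEN_SET:
        output = run_model(release, text)
        hits += int(expected.lower() in output.lower())
    return hits / len(GOLDEN_SET) >= threshold  # deploy only if it passes
```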
In addition to these AI-specific practices, you should continue standard operational best practices from traditional software development.
Here is how your typical AI-Ops workflow would look:
Figure 7: An AI-Ops workflow illustration for managing, testing, and deploying AI System changes
Optimization Patterns
As your AI application grows, you'll face operational bottlenecks. API rate limits, increasing latency, and rapidly rising inference costs can quickly become significant challenges. That impressive prototype your leadership loved suddenly becomes unsustainable in production.
These problems are common but manageable if you follow some best practices. For example, don't automatically choose the biggest, most powerful model; ask yourself whether your task can be handled faster, cheaper, or with reused outputs. Similarly, make smart optimization choices for your system, whether that means redirecting traffic away from unnecessarily powerful models, caching predictable responses, batching queries in near real time, or developing smaller specialized models.
Let us dive into three powerful optimization patterns that you can directly implement in your AI workflows:
Prompt Caching Pattern
The fastest, cheapest LLM call is the one you don't make. Consider caching and reusing responses if your system frequently uses identical or similar prompts. This works exceptionally well for documentation assistants, customer support bots, or internal chat tools where user questions often repeat.
Even more effective is prefix caching, where you cache the expensive part of the prompt (e.g., system instructions or few-shot examples). Amazon Bedrock (and many other providers) supports this feature natively and reports up to 85% latency reduction on large prompts.
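At the application level, even a simple exact-match cache can eliminate repeated calls. A minimal sketch (swap the in-memory dict for Redis or similar in production; provider-side prefix caching happens below this layer):

```python
# Application-level prompt cache: hash the normalized prompt and reuse the
# stored response on a hit. Only covers exact (normalized) repeats.
import hashlib
from typing import Callable

_cache: dict[str, str] = {}  # swap for Redis or similar in production

def cached_call(prompt: str, call_model: Callable[[str], str]) -> str:
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only pay for genuine misses
    return _cache[key]
```

For "similar but not identical" prompts, teams often extend this with embedding-based semantic caching, at the cost of occasional false hits.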
Continuous Dynamic Batching Pattern
If you manage a high-volume AI system, maximizing GPU utilization and system throughput is critical for minimizing costs and scaling efficiently. If you process each query sequentially, you will underutilize your computing resources, pay more in fees, and perhaps hit API limits sooner.
Instead of processing each request as soon as it arrives, consider waiting briefly, perhaps tens to hundreds of milliseconds, depending on your application's latency tolerance, to batch your incoming requests dynamically. You can then process these batches through your inference servers and LLMs. This approach can help increase your system's throughput and ensure your GPUs operate at near-optimal utilization.
While you could implement custom queuing and batching logic in bespoke systems, production-ready tools such as vLLM, NVIDIA Triton Inference Server, and Amazon Bedrock offer robust, out-of-the-box solutions suitable for most use cases.
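For intuition, here is a simplified version of the batching loop such tools implement internally (a sketch only; run_batch stands in for your batched inference call):

```python
# Dynamic batching sketch: collect requests for a short window (or until the
# batch is full), then run them through the model in one call. The window and
# batch size are the knobs to tune against your latency budget.
import asyncio
from typing import Callable

MAX_BATCH = 16   # largest batch to send to the model at once
WINDOW_MS = 50   # how long to wait for more requests to arrive

async def batch_worker(queue: asyncio.Queue,
                       run_batch: Callable[[list[str]], list[str]]):
    loop = asyncio.get_running_loop()
    while True:
        prompt, future = await queue.get()        # block until first request
        batch = [(prompt, future)]
        deadline = loop.time() + WINDOW_MS / 1000
        while len(batch) < MAX_BATCH and (timeout := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break                              # window closed; process now
        results = run_batch([p for p, _ in batch]) # one batched inference call
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)                 # resolve each waiting caller
```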
Intelligent Model Routing Pattern
Rather than indiscriminately sending every request to your largest, most expensive model, implement intelligent model routing. The idea is simple but powerful: introduce a lightweight, preliminary model at the entry point, similar to a reverse proxy or API gateway in traditional microservices. Like a reverse proxy, this model can help with load balancing between models, caching frequent responses, and gracefully handling fallbacks. Like an API gateway, it intelligently routes queries to the appropriate downstream models based on the complexity or context of each request.
For common or repetitive queries, the routing model can directly pull from caches or prefetches, altogether avoiding model inference. For queries requiring moderate reasoning or domain-specific knowledge, route to specialized, cost-effective models. You should only route the most complex or ambiguous queries to your largest, general-purpose models.
The Intelligent Model Routing Pattern is particularly useful if you are building general-purpose systems handling diverse queries. It balances cost-efficiency and model accuracy, ensuring each query uses precisely the computational resources it requires.
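A minimal routing sketch (the complexity heuristic and model tiers are illustrative; in practice the router is often itself a small, fast model):

```python
# Intelligent routing sketch: a cheap classifier decides which downstream
# model handles each query, so simple queries never hit the expensive tier.
def classify_complexity(query: str) -> str:
    # Crude heuristic stand-in for a lightweight classifier model.
    if len(query.split()) < 12 and "?" in query:
        return "simple"
    if any(word in query.lower() for word in ("why", "design", "analyze")):
        return "complex"
    return "moderate"

MODEL_TIERS = {
    "simple": "gpt-4o-mini",   # cheap tier, often backed by a cache
    "moderate": "gpt-4o",      # mid tier or a domain-specific model
    "complex": "o1",           # largest reasoning tier
}

def route(query: str) -> str:
    return MODEL_TIERS[classify_complexity(query)]

print(route("What are your opening hours?"))          # -> cheap tier
print(route("Analyze the trade-offs in this design.")) # -> reasoning tier
```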
Advanced Patterns
This article explored foundational patterns that can help you incorporate best practices into different stages of AI software development. There are several advanced areas we intentionally didn't cover; however, I want to briefly mention a few key topics with many emerging patterns and best practices, as they're becoming critically important in modern AI systems.
These advanced concepts are beyond our current scope, but recognizing their importance is key to keeping up with evolving trends in modern AI systems. Watch out for the ever-growing collection of innovative AI patterns, and keep adding them to your arsenal. They can help you unlock even more powerful and specialized applications!
AI Agents: From Co-pilot To Autopilot
AI is moving from "co-pilot" to "autopilot". The development of generative artificial intelligence is increasingly focused on "agentic AI": the use of AI agents that perform tasks autonomously, either within fixed parameters or to achieve goals set by the user.
Bring in the agents
AI agents are not new, but they are becoming ever more sophisticated. In their basic form they are simply tools built to carry out tasks such as answering queries from a script, as chatbots do, or fetching information from the web. These functions are limited, with no follow-up action taken without further input. Such reactive AI systems operate solely on programmed responses.
More complex AI agents, with autonomy and adaptability, have also been around for a long time. They control home thermostats and automate factory processes.
This type of technology is, however, rapidly developing capabilities beyond fetching and delivering information or performing distinct tasks.
Such adaptable AI can perform increasingly complex actions in pursuit of a goal and without taking a prescribed path. Using advanced machine learning and neural networks, it can understand context, analyse and respond to dynamic situations, learn from experience and use problem-solving and reasoning to make strategic decisions.
Predictive capabilities based on historical statistical analysis add another layer, enabling AI agents to plan, automate and execute tasks as well as to make informed decisions with specific goals in mind. They carry out their tasks after being given natural language prompts and without constant user input. They can also be designed to check each other's work in an iterative process that improves quality and reliability.
Several developments have enabled AI agents to become more complex while at the same time being easier to use. Generative AI has provided a natural language interface, broadening access to AI, especially for users who are less tech-savvy. Generative AI interprets a prompt from the user, then other AI fulfils the task. Google says: "Generative AI is just one piece of the AI puzzle. Other AI technologies, like predictive AI, vision AI, and conversational AI, are crucial for building sophisticated AI agents."
Advances in computing power and memory have enabled large language models and more sophisticated machine learning. The understanding of context and the ability to plan has improved as AI systems learn more data and improve their capacity to remember interactions.
These are the foundations for AI agents, with the ease of interaction accelerating development as more users gain access. At the same time AI itself is speeding up the innovation cycle, refining its outputs and creating iterative processes at ever higher speeds.
Hype or reality?
AI agents can speed up analysis and decisions, as well as taking over certain functions from employees, but they still fall short of full autonomy.
Cassie Kozyrkov, the founder and chief executive of Decision Intelligence and formerly chief decision scientist at Google, says AI agents' main role in an enterprise still lies in taking over repetitive tasks with "well understood and well designed processes" that do not require "creative spin".
While there is huge potential for agentic AI to perform ever more complex tasks, Pascal Bornet, an expert in automation and author of Agentic Artificial Intelligence, points to a "significant gap" between hype and reality. Even with a clear directive, systems cannot yet perform complex tasks end to end, especially in nuanced or novel situations, without some human oversight.
That said, the field "is advancing rapidly". Bornet likens development to the progression from fully manual to fully autonomous cars, which is rated from level zero to level five. Currently, autonomous cars operate at levels two to four, depending on the environment. Automation can handle many tasks but human oversight, and occasional intervention, is needed.
AI agents are at a similar stage. Most operate at levels two or three, with some "specialised systems" reaching level four in tightly defined domains. Level five, where agents fully understand, plan and execute complex missions with minimal human input across any domain or corporate boundary, remains theoretical.
Given the challenges involved in folding capabilities into a coherent system, fully integrated multimodal agents are some way off but Bornet says the building blocks are in place. He says some applications, such as that developed for veterinarians by Pets at Home, the UK FT250 company, exemplify audio processing but multimodal systems will require a sophisticated orchestration of agents with different types of expertise.
While some sectors have adopted agentic AI more than others, as covered below, it can be put to work in functions that are common to most businesses. Bornet says the opportunity is systemic. "Agentic AI isn't coming for a [single] department, it's coming for all of them. Every workflow with friction is a use case waiting to be transformed."
Currently agents are used mostly in internal roles to gain efficiency and savings rather than top-line growth. A 2025 report from UK Finance co-authored with Accenture said: "Most near-term uses involve single-agent deployments targeting productivity and efficiency gains and improvements to customer and colleague experience". The trade body found "relatively few" examples within financial services aimed at increasing sales or revenue. It also noted that most deployments were "closely monitored by an employee acting as a competent supervisor".
Across industry, AI that reduces the time spent on mundane work, "freeing up" employees for more creative or skilled tasks, has been adopted fastest.
Bornet and his team have gathered data from 167 companies in various sectors that have deployed what he classifies as level three LLM-based agents in production environments. Customer service, internal operations, and sales and marketing functions have seen the highest adoption, with benefits ranging from time savings of 12 to 30 per cent in customer service, 30 to 90 per cent in internal operations and increased revenue of nine to 21 per cent for sales and marketing teams.
It should be noted that the use of AI agents alongside humans does not always enhance performance. An analysis of a customer service software company by the US National Bureau of Economic Research found that AI both improved issue resolution and cut the time taken. However it was newer staff who benefited most, with the AI electronically transferring the knowledge of experienced people. The performance of older hands did not improve.
The reverse can be true in roles that are highly skilled. Attila Kecsmar, the co-founder and chief executive of Antavo, the AI loyalty cloud programme platform, says that in more technical areas, such as programming, those who use AI without an adequate understanding of the output will struggle, while the productivity and speed of competent workers will be supercharged.
Customer service
This has been the most visible deployment of AI from a consumer perspective, but feedback has been mixed. Industry proponents highlight how well chatbots perform, but customer surveys suggest the opposite. Preferences could change as customer service agents develop and digital natives make up more of the consumer base. Better responses and 24/7 support may improve customer perceptions.
Older agents answered queries based on set scripts that quickly ran out of road, especially with complex queries. Newer agents, given their ability to remember and respond to dynamic inputs, are more responsive. They are able to act on up-to-date client data as well as recall historical interactions with customers.
With agentic AI, customer service interfaces have developed beyond dial-up chatbots. Google Gemini is behind Volkswagen's MyVW app, a virtual assistant that answers a driver's queries about their car.
Coding
The application of AI in coding is well documented. In a report by the McKinsey consultancy, Lenovo said that its engineers' speed and quality of code production improved by 10 per cent.
Kecsmar agrees that agent-supported engineers can achieve much more but says this in turn will lead to rising expectations for human productivity and performance. Given natural language interfaces, it is increasingly feasible for laypeople to write code.
This is the real revolution in agentic AI, Kozyrkov says. "Before, you had to go and get yourself schooled in the arcane arts of some new language and now you don't — you speak your mother tongue and it works."
While this presents an opportunity, she cautions that it is also one of the greatest risks in deploying AI in an enterprise. "Unfortunately the mother tongue is vague and not everybody knows when they're being ambiguous. Now you can program a machine without thinking it through, so it's hardly a surprise that you get unintended consequences."
Marketing and campaign management
As covered in our report on personalisation and marketing, AI has hugely expanded the reach of marketing departments, enabling mass communications to be targeted at ever smaller segments.
AI agents can take this further. Antavo has developed an AI agent for its brand customers which helps them to devise and communicate loyalty programmes and campaigns. It can decide an appropriate approach for a brand in any sector and analyse data and give ideas, illustrated with charts, on how to optimise and develop a programme. It can also look inwards, finding and delivering relevant information to help customer service employees resolve consumers' queries.
Human resources
AI agents can be used in hiring, scheduling meetings, retention and management, predicting turnover and identifying where training may be required.
Virtual assistants
These are capable of executing simple tasks with minimal supervision, such as scheduling meetings with clients, sending standard emails and handling general client communications. Claude, Anthropic's AI model, can find information from many sources on a computer so that it can complete a form.
Finance
Applications include AI systems that can make trading decisions based on real-time data analysis or suggest investment strategies based on a client's profile. AI can also help identify fraud, flagging its suspicions in real time.
Healthcare
Autonomous diagnostic tools can identify problems using patient histories and images, recommend personalised healthcare treatments, monitor patient health and recommend or remind people about follow-up actions. AI agents can be deployed in robotic-assisted surgery to improve control and accuracy. Pattern recognition, deep learning and computer vision all enhance machines' ability to adjust surgical incisions in real time. Systems such as Philips' IntelliVue Guardian manage post-surgical complications by providing early warnings for the patients most at risk.
Law
In addition to simple and repetitive tasks such as contract drafting, agents can advise on cases. Based on analysis of historical data or judges' rulings, they can predict potential outcomes of a suit and suggest arguments.
Already A&O Shearman, the international law firm, is using an AI tool created in collaboration with Harvey, a start-up. This makes use of a business's financial information to assess in which jurisdictions a client needs to file in the event of a merger. It then identifies any missing data and drafts the information requests for each party.
Manufacturing and logistics
While autonomous cars have yet to reach the mainstream, autonomous lorries are about to arrive. Aurora Innovation, which works with Volvo, Uber and FedEx in the US, plans to use 10 driverless lorries between Dallas and Houston. AI agents are also used in manufacturing for monitoring and maintaining equipment and optimising processes. They can perform quality control on both inputs and outputs with greater consistency than humans.
Retail
Besides the chatbots deployed in customer service, AI agents can be used along the supply chain to monitor and manage inventory levels based on historical data and to predict trends and demand.
Drawbacks
There are various issues that enterprises need to consider when adopting AI.
Companies operating with legacy tech, or which have inadequate or inconsistent data, will find it harder to make progress. Any data quality issues experienced when training agents will be exacerbated by "slop", the colloquial name for the proliferation of LLM-created content.
EY says this could be solved in part by agents sourcing information from several inputs rather than relying on static scraped data. For instance, iterative AI could gather data from wearables, which would layer current and contextual data on top of historical information.
Connection within and between companies is hampered by data incompatibilities as well as the inadequacies of existing application programming interfaces. Bornet says the lack of a standard protocol presents a hurdle to multi-agent systems that might otherwise cross corporate boundaries.
Kecsmar believes this problem may itself be solved by agents. "In future the agents developed around data exchange skills will be able to create their own data exchange. They will be uploaded with how their host company communicates data and they will have a tool call to interface data between different sources."
Trust is a problem in several areas, for instance in sectors where the options for reversal are limited. "'Fully automate and leave it' in the financial services industry is a terrible idea," Kozyrkov says, adding that "the golden rule of AI is that it makes mistakes". Consumers might be unwilling to let agents have autonomy over their bank accounts or credit cards. There is also a lack of trust among leaders in terms of AI performance and with workers who face the risk of replacement. Once systems can link up across business boundaries, will companies trust external agents?
Use of untrammelled AI also adds to cyber security threats by increasing points of access and the risk of unexpected actions. Kozyrkov says: "One of the top suggestions is: limit its access. Don't give it any data that you wouldn't want leaked." Granting AI the same access as a human employee dramatically increases the attack surface, meaning systems are more vulnerable.
Constraint on computing capacity is a further hurdle. Despite the investment in infrastructure the competition for stretched resources is fierce. Still, no user pays what it costs to run an AI query even in energy terms, a point raised at an FT Climate Capital Council round table last year. For companies using commercial services, current pricing is based on the number of employees — but what will happen if staff levels shrink due to AI adoption?
Companies also need to consider the ethical implications of AI adoption. Research at Cambridge University notes that, if they cannot already, agents may soon be able to predict our habits and spending patterns and influence or manipulate them, although this is likely to be of greater concern to consumers.
Accountability is another imponderable. With whom does this lie when agents are carrying out end to end tasks without human intervention, or with connections to other companies?
How to adopt AI agents
As with any new technology, it is important to identify business needs first. Bornet says the most sophisticated option is not necessarily always the best.