Introduction to Large Language Models (LLMs)

U.S. Central Command Employs Large Language Model-based Artificial Intelligence

Amidst an intense geopolitical environment and active war in the Middle East, the U.S. Central Command (CENTCOM) is pulling in certain artificial intelligence (AI)-based tools, namely large language models, to gain efficiencies at the operational level. CENTCOM is already seeing value from large language model applications used for code augmentation, document disclosure processing and general enterprise use.

About a year ago, Sky Moore, the command's chief technology officer (CTO), started looking into how large language models could help CENTCOM. Officials are "learning as they go" and are quickly discerning where applying large language model solutions works and where it does not.

"We've discovered that there are places where we can deliver exquisite value and chip away seconds, minutes and hours of your time, and there are places where it makes absolutely no sense," Moore noted, speaking in October at NVIDIA's artificial intelligence (AI) conference in Washington, D.C. "We have had an abundance of experience with both. And we are just dipping a toe into the opportunities that large language models can provide for us."

The command's first foray last year was an example of where applying a large language model did not make sense.

"On October 7 [2023], everything changed for us with the Hamas-Israel war," Moore stated. "We saw increased attacks on our forces in Iraq and Syria. The Houthis began firing just about everything that they could into the Red Sea, disrupting maritime traffic and so on. Within a 24-hour period, the pace and operational tempo of our command changed." 

Correspondingly, the command's communications requirements surged. The CTO's office considered how a large language model could assist with quickly summarizing information, briefings and meetings, and then pushing out information during and about critical events. Information needed to move rapidly not only within CENTCOM but also outside the command, to the Joint Staff, the National Security Council and other organizations.

"And so the first question that our users came back to us with was, 'Is there any way to make this easier? Can you help us with summaries?'" she said. "But the next piece of taking all of those summaries of different meetings and being able to push information into different templates for whoever might need to receive it turned out to be much harder than we initially thought it was."

The CTO's office found that the information had to be delivered to a crisis action team at a stand-up meeting, and the list of who needed to present changed every time. For that initial large language model application, the hope of reusing a single defined prompt that would consistently produce effective summaries quickly proved unrealistic, Moore said.

[Photo caption: Alexis Bonnell (l), chief information officer and director of the Digital Capabilities Directorate of the Air Force Research Laboratory, speaks to an attendee at the Air, Space and Cyber Conference in National Harbor, Maryland, September 17, 2024. Among other efforts, Bonnell conducted research into how AFRL and the greater Air Force could benefit from artificial intelligence and, with senior leaders, has helped usher in large language model experimentation and use at a wide scale. Credit: Marc DeNofio, AFRL]

However, three use cases over the past nine months have been promising—in code augmentation and generation, document disclosure processes and general office use.

With access to a specific platform called CENTGPT, designed for the Secure Internet Protocol Router Network (SIPRNet) with the needs of CENTCOM's operating environment in mind, more gains are coming.

Software programmers at the command are using the tool for code generation and augmentation, Moore shared. "What we discovered right off the bat was that our software programmers were overjoyed by the ability to have a large language model where they could put in a simple query in a particular message format to deliver a certain function," she said. "By having a large language model available on a SIPR environment, we opened the complete aperture of what our programmers could do."

CENTGPT has improved the programmers' effectiveness and, frankly, elevated their workplace contentment. "Code generation and augmentation is a good fit for a large language model tool because it can easily catch errors. You'll be very quick to notice if your code just doesn't generate the output you want," Moore explained.

The officials are also applying CENTGPT to perform machine-assisted processing for document and information disclosure.

The command generates an "inordinate" number of documents daily, and with that activity comes a requirement for disclosure, Moore said, whether it is a U.S. citizen requesting documents through the Freedom of Information Act, for example, or the command needing to disclose certain information to a partner nation during allied operations.

"There are a variety of different requirements that come out of this," Moore continued. "And it means a huge burden on teams to sift through the trove of documents to be able to filter out what needs to be disclosed. We think that large language models can help us a lot with that 'first triage.' A large language model is not going to magically ingest all of your documents and clearly spit out the ones that can be disclosed. But what it can do is say, 'I have high confidence these documents can be disclosed.' And so machine-assisted disclosure has breathed new life for our foreign disclosure officers who have previously been drowning in documentation."

The third area where CENTCOM is applying large language models is for general office tasks. "And this one, I think, will always be growing and developing in whatever ways and where our users see fit," the CTO stated. "We now have something on the order of 500 plus folks on CENTGPT. Every hour, the number of users online at any time has increased. So it used to be like single digits, and now we're up closer to 50."

That regular use means the command's staff is starting to pick apart what adds value for them, Moore continued. She expects the tool to be applied to many different workflows over time. And here again, having the platform on a secret network means the ability to harness common Internet-like capabilities, such as a browser-style search.

"If you are on a secret network and you need to [search] something, you must quickly turn to your other server," she clarified. "And if you are lucky enough that the screens are here, you must remember what you saw on one screen and then type it onto another one."

Instead, CENTGPT's query functions can act like a browser to find information quickly for users.

In addition, the tool has proven helpful for summarizing large documents. "If you needed to read through a 50-page document and another 100-page document, and you wanted to be able to get really quickly the general summary of what these documents are, [it identifies] the piece that I need to read through. That alone is value add, chipping away at some of the workflow that otherwise [would] have taken hours on that time."

Moore cautioned that the output of large language models cannot be followed blindly and that the command has implemented awareness processes to reinforce this practice.

"We are really clear to our users that there is risk associated with large language models," she specified. "We force our users to sign a document before they get their own account to say, 'I acknowledge that regardless of what this model gives me, I am responsible for the outputs that I present. I am the human who is responsible for vetting the material.' But really, what we've discovered again is that our humans do that because they understand their responsibilities, and more importantly, they understand the content of what they're asking for and can vet it faster than any of us."

The CENTGPT platform is based on the Department of the Air Force's (DAF's) NIPRGPT, unveiled to reporters at the Pentagon in June. That platform is tied to applications on the U.S. Department of Defense's (DoD's) Nonclassified Internet Protocol Router Network (NIPRNet). It was approved for Impact Level 4 security environments and is hosted on the DoD network through the Defense Information Systems Agency high-performance computing (HPC) office's cloud compute platform. It requires Common Access Card (CAC) authentication to enter the system.

The NIPRGPT platform is meant to be a "sandbox," or safe experimental platform, for understanding, researching, testing and leveraging natural language models and other generative AI tools on a large scale. DAF Chief Information Officer Venice Goodwine led the NIPRGPT effort along with other department officials, including Air Force Research Laboratory (AFRL) Chief Information Officer Alexis Bonnell and other AFRL researchers such as Collen Roller, a senior computer scientist and natural language processing engineer at the laboratory. NIPRGPT stems from Roller's Dark Saber software platform, developed at AFRL's Information Directorate in Rome, New York.

Having a secure government capability for the DoD to use large language models was important, Roller said, speaking to reporters in June and at the October NVIDIA AI conference. "We don't have people throwing documents or information into ChatGPT anymore, where OpenAI is collecting that information," he said. "We need to be active in making solutions that are in our specific environments where we can both hold and present information."

As users learn to leverage this type of AI in a secure environment, the DAF is also using the platform to see how warfighters interact with large language models. Moreover, the platform's use will also help inform future policy, acquisition and investment decisions related to AI and large language models.

Echoing the results seen at CENTCOM, the DAF's goal is to bring AI into operations and day-to-day tasks and to increase human-machine comfort and productivity. Chandra Donelson, DAF chief data and artificial intelligence officer, characterized what warfighters had already crafted with the NIPRGPT capability as "phenomenal."

Roller sees the platform, which has been out for a year, already reducing workplace toil. "For the use cases that we are seeing for NIPRGPT, we see a lot of people using this for basic toil reduction tasks, the monotonous tasks," he noted. "For where I have to open up a blank Word document and start coming up with an outline for my presentation, or maybe it's the fact that I have to write a [bullet] background paper on a topic, and I really don't want to start from nothing. By plugging a prompt into a large language model, you really have the opportunity to advance yourself, to put yourself more in an editing role than sitting there late at night just trying to get the paper done for whatever the requirement is that you have."

One of his favorite use cases is pulling all of his daily meeting summaries into NIPRGPT to summarize "what did I do today," he said. The platform offers a simple summary that is easily digested.

"With Gen AI, there are so many amazing things that are being brought to fruition, capability-wise, with this technology," Roller noted. "This is truly going to be a game changer."


How To Architect Multi-Agent AI Systems For Complex Workflows

Gopikrishnan Anilkumar is a Principal Product Manager at Amazon, where he builds and manages AI products.

The landscape of artificial intelligence is undergoing a significant transformation. As the capabilities of large language models grow, we are beginning to see a shift away from isolated question-answering systems toward something far more powerful and flexible: multi-agent AI systems capable of orchestrating complex workflows across domains, tools and tasks.

These systems represent a fundamental evolution in how we design intelligent software. Rather than relying on a single, monolithic agent to handle every aspect of a problem, we break down the cognitive load across a network of specialized agents, each responsible for a particular role. This approach opens new possibilities for scalability, reliability and modularity—but it also introduces new layers of architectural complexity.

In this article, I explore the emerging design patterns behind multi-agent systems, the technical challenges they present and why building them requires rethinking how we approach AI system design from the ground up.

Moving From Single-Model Systems To Agent Ecosystems

Most current AI applications follow a relatively simple architecture: a user interacts with a frontend that sends prompts to a backend model, which then returns a generated response. This single-model approach is sufficient for narrow use cases such as customer support chatbots or knowledge-based Q&A systems. However, it quickly reaches its limits when we demand greater reasoning depth, multi-step decision making or interaction with external systems.

This is where the multi-agent paradigm becomes relevant. In a multi-agent system, we distribute responsibilities across multiple autonomous components—or "agents"—that can communicate, delegate tasks and collaborate toward a shared goal. For example, one agent might specialize in understanding user intent, another in planning a series of actions, another in executing those actions via tools or APIs and yet another in validating the results. This division of labor mirrors how human teams function, with different members playing distinct but interdependent roles.
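A minimal sketch of that division of labor, assuming a simple message-passing interface between agents; the `Message` type and agent names below are illustrative, not drawn from any specific framework or from the author's own systems.

```python
# Illustrative agent pipeline: specialized agents exchanging structured messages.
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Message:
    sender: str
    content: str
    metadata: dict = field(default_factory=dict)

class Agent(Protocol):
    name: str
    def handle(self, msg: Message) -> Message: ...

class IntentAgent:
    name = "intent"
    def handle(self, msg: Message) -> Message:
        # In practice this would call an LLM to classify the user's goal.
        return Message(self.name, f"intent understood: {msg.content}")

class PlannerAgent:
    name = "planner"
    def handle(self, msg: Message) -> Message:
        # Would decompose the understood intent into an ordered action list.
        return Message(self.name, f"plan drafted for: {msg.content}")

def run_pipeline(agents: list, user_input: str) -> Message:
    msg = Message("user", user_input)
    for agent in agents:  # each agent transforms the message and hands off
        msg = agent.handle(msg)
    return msg

print(run_pipeline([IntentAgent(), PlannerAgent()], "review this NDA").content)
```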

The move from a monolithic intelligence model to a modular agentic system aligns with the principle of separation of concerns, which is foundational in systems and software architecture.

A Practical Example: Contract Review With Multi-Agent Collaboration

To understand how this looks in practice, consider a legal-tech company building an AI assistant for automated contract review. Reviewing commercial agreements involves extracting legal clauses, assessing risk, comparing versions and drafting summaries (a task far too nuanced and layered for a single prompt or monolithic LLM call).

Instead, the company designs a multi-agent system composed of several cooperating agents.

When a new contract is uploaded, a planner agent first decomposes the task into subtasks: clause extraction, risk scoring, document comparison and summary generation. It assigns each part to a specialized agent.

A dedicated extraction agent reads the contract and identifies key clauses such as termination, indemnity and payment terms. These extractions are passed to a risk analysis agent, which evaluates them against internal legal policies and regulatory templates to flag problematic or missing content. If the contract is a renewal or amendment, a comparison agent highlights all deviations from the previous version.

Next, a summary agent generates a human-readable report with highlights, compliance scores and recommended actions. Finally, a validator agent reviews the output for consistency, checks formatting and ensures nothing critical is missing before delivering the report to the legal team.

In this architecture, each agent is optimized for its role, and communication happens through structured messages or shared task states. This modular system is not only more accurate and interpretable, but also easier to maintain and scale across different contract types and jurisdictions.
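Under the stated assumptions (agent internals stubbed, since the example does not specify them), one way this architecture could look in code is plain functions standing in for agents, with a shared task-state dictionary carrying the structured hand-offs:

```python
# Sketch of the contract-review flow: a planner decomposes the job,
# then specialized "agents" (stubbed functions here) fill in shared state.
def plan(contract: str) -> list[str]:
    # Planner agent: decompose the upload into subtasks.
    return ["extract", "score_risk", "summarize", "validate"]

def extract_clauses(state: dict) -> None:
    # Extraction agent: find termination, indemnity, payment terms, etc.
    state["clauses"] = {"termination": "...", "indemnity": "...", "payment": "..."}

def score_risk(state: dict) -> None:
    # Risk analysis agent: evaluate clauses against policy (stubbed).
    state["risks"] = {name: "low" for name in state["clauses"]}

def summarize(state: dict) -> None:
    # Summary agent: produce the human-readable report.
    state["report"] = f"{len(state['clauses'])} clauses reviewed; risks: {state['risks']}"

def validate(state: dict) -> None:
    # Validator agent: ensure nothing critical is missing before delivery.
    assert state.get("clauses") and state.get("report"), "incomplete review"

STEPS = {"extract": extract_clauses, "score_risk": score_risk,
         "summarize": summarize, "validate": validate}

def review(contract: str) -> dict:
    state = {"contract": contract}
    for step in plan(contract):
        STEPS[step](state)  # structured hand-off via shared task state
    return state

print(review("Sample commercial agreement text")["report"])
```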

Specialization, Orchestration And System Design

The contract review example demonstrates the benefits of agent specialization and role clarity. When agents are designed to perform narrow tasks—such as clause extraction, risk analysis, or validation—they can be more easily evaluated, reused and optimized. They can also be paired with tailored prompts or model variants, reducing cognitive load and improving output quality.

To support this modularity, the system must include robust orchestration. Some workflows benefit from simple chaining, where agents operate in a predefined sequence. Others require dynamic planning, where the sequence of actions depends on the document's structure or detected anomalies. Planning-based orchestration introduces flexibility but also complexity: It must account for conditional logic, mid-task replanning and fallback paths in case of uncertainty or failure.
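The contrast might be sketched as follows: `chain` runs a fixed, predefined sequence, while `dynamic_orchestrate` picks each next step from the current state, with a fallback path when a step fails. The function names and escalation rule are illustrative assumptions, not a prescribed design.

```python
# Sketch: static chaining vs. planning-based orchestration with fallback.
from typing import Callable, Optional

Step = Callable[[dict], dict]

def chain(steps: list[Step], state: dict) -> dict:
    # Simple chaining: agents operate in a predefined sequence.
    for step in steps:
        state = step(state)
    return state

def dynamic_orchestrate(state: dict, registry: dict[str, Step],
                        next_step: Callable[[dict], Optional[str]]) -> dict:
    # Dynamic planning: the next action depends on what the state shows,
    # which enables conditional logic and mid-task replanning.
    while (name := next_step(state)) is not None:
        try:
            state = registry[name](state)
        except Exception as err:
            state["error"] = repr(err)  # record context for the fallback path
            state["route"] = "human"    # escalate on uncertainty or failure
            break
    return state
```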

Delegation models can add scalability by allowing asynchronous execution. In large-scale deployments—for example, reviewing hundreds of contracts in parallel—the system might spin up multiple agent clusters coordinated by a central scheduler. These execution models must be chosen carefully based on latency needs, task complexity and reliability requirements.
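A hedged sketch of that delegation model, using Python's asyncio with a semaphore as the central scheduler's concurrency cap; `review_one` is a stand-in for a full agent-cluster invocation, not a real API.

```python
# Sketch: a central scheduler fanning out contract reviews asynchronously.
import asyncio

async def review_one(contract_id: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for a real agent-cluster call
    return f"contract {contract_id}: reviewed"

async def scheduler(contract_ids: list[int], max_concurrent: int = 8) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)  # tune for latency/reliability needs
    async def bounded(cid: int) -> str:
        async with sem:
            return await review_one(cid)
    return await asyncio.gather(*(bounded(cid) for cid in contract_ids))

print(asyncio.run(scheduler(list(range(100))))[:3])
```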

Engineering For Failure And Observability

As with any distributed system, multi-agent architectures must be designed for resilience. Each agent must gracefully handle timeouts, malformed input or missing data from upstream components. Errors should propagate with clear context and actionable logging. The orchestration layer must detect failure patterns and adapt—whether by retrying, skipping optional steps or escalating the task to a human reviewer.
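As an illustration, a generic wrapper could implement the retry/skip/escalate policy described here; the retry counts and backoff below are assumed defaults, not values from the article.

```python
# Sketch: bounded retries, optional-step skipping, human escalation.
import time

def run_with_resilience(step, state: dict, *, optional: bool = False,
                        retries: int = 2, backoff_s: float = 0.5) -> dict:
    for attempt in range(retries + 1):
        try:
            return step(state)
        except Exception as err:
            # Propagate failures with clear context for actionable logging.
            print(f"step={getattr(step, '__name__', step)} "
                  f"attempt={attempt} failed: {err!r}")
            time.sleep(backoff_s * (attempt + 1))
    if optional:
        return state                   # skip the optional step and move on
    state["escalate_to_human"] = True  # orchestrator hands the task off
    return state
```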

To enable troubleshooting and continuous improvement, observability is essential. Each agent's input, output, latency and metadata should be logged. Developers should be able to reconstruct task graphs showing the sequence of agent decisions, branching logic and results. These graphs not only support debugging but also inform model evaluation, versioning and agent performance monitoring.
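A minimal tracing sketch along these lines records a span per agent call, with parent links so task graphs can be reconstructed afterward; in a real deployment this would feed a proper telemetry backend rather than an in-memory list.

```python
# Sketch: log each agent call's input, output, latency and lineage.
import time
import uuid

TRACE: list[dict] = []  # stand-in for a real tracing/telemetry backend

def traced(agent_name: str, fn):
    def wrapper(payload, parent_id=None):
        span_id = str(uuid.uuid4())
        start = time.perf_counter()
        result = fn(payload)
        TRACE.append({
            "span": span_id, "parent": parent_id, "agent": agent_name,
            "input": payload, "output": result,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        return result
    return wrapper

extract = traced("extractor", lambda text: f"clauses from: {text[:24]}...")
extract("This Agreement is made between the parties...")
print(TRACE[-1]["agent"], round(TRACE[-1]["latency_ms"], 3), "ms")
```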

Conclusion: Designing Intelligence As A System

The future of AI applications lies in moving beyond single-shot generation to modular systems composed of intelligent, cooperating agents. These agents should be treated not as isolated prompts, but as components of an orchestrated system with state, responsibility and accountability.

Designing these systems requires more than model engineering. It requires system thinking: defining interfaces, handling failure, maintaining observability and building trust through structure and predictability. In multi-agent systems, the intelligence isn't just in the models—it's in the architecture.

As more enterprises adopt AI to automate sophisticated workflows, the success of these deployments will hinge on how well we can design, orchestrate and evolve these ecosystems of intelligent agents—not just how well we prompt a single model.



The United Arab Emirates Releases A Tiny But Powerful AI Model

The United Arab Emirates has released an open source model that performs advanced reasoning as well as the best offerings from both the United States and China—one of the strongest signs so far that the nation's big investments in artificial intelligence are starting to pay off.

The new model, K2 Think, comes from researchers at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) located in the UAE's capital, Abu Dhabi. The model—one of the first so-called sovereign AI models that incorporates technical advances needed for reasoning—is being made available for free by G42, an Emirati tech conglomerate backed by Abu Dhabi's sovereign wealth funds. G42 is running the model on a cluster of Cerebras chips, an alternative to Nvidia's hardware.

K2 Think is one of the UAE's contributions to the global race to demonstrate prowess in a technology widely expected to have huge economic and geopolitical implications. The US and China are considered the dominant players in this contest. But many smaller nations, especially ones with considerable wealth to invest, are also racing to develop their own "sovereign" AI models.

K2 Think is relatively modest in size, with 32 billion parameters. It is not a complete large language model but rather a model specialized for reasoning, capable of answering complex questions through a simulated kind of deliberation rather than quickly synthesizing information to provide an output. For such tasks, the researchers say it performs on par with reasoning models from OpenAI and DeepSeek, which have more than 200 billion parameters.

"This is a technical innovation or, in my opinion, a disruption," Eric Xing, MBZUAI's president and lead AI researcher, told WIRED ahead of today's announcement.

Xing says the model demonstrates a particularly effective combination of a number of recent technical innovations. These include fine-tuning on long strings of simulated reasoning, an agentic planning process that breaks problems down in different ways, and reinforcement learning that trains the model to reach verifiably correct answers. Other innovations allow the model to be served very efficiently on Cerebras chips.
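As a toy illustration of the last of those ideas, reward from verifiably correct answers, and emphatically not K2 Think's actual training code (which MBZUAI details in its technical report), a verifier can grade a model's final answer programmatically and supply the reward signal for reinforcement learning:

```python
# Toy verifier: reward 1.0 only when the model's final number matches
# a known solution, as in RL setups with verifiable rewards.
import re

def verify_math_answer(model_output: str, expected: int) -> float:
    """Return 1.0 if the last integer in the output equals the expected
    answer, else 0.0. Real verifiers are richer (units, formats, proofs)."""
    numbers = re.findall(r"-?\d+", model_output)
    return 1.0 if numbers and int(numbers[-1]) == expected else 0.0

print(verify_math_answer("Reasoning... so the answer is 42", expected=42))  # 1.0
```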

"How to make a smaller model function as well as a more powerful one—that's a lesson to learn, if other people want to learn from us," Xing said.

Xing adds that K2 Think was developed using several thousand GPUs (he declined to give a precise number), and the final training run involved 200 to 300 chips. The plan is to incorporate K2 Think into a full LLM in the coming months. MBZUAI has open-sourced the model and published a technical report that details how the different innovations were combined to create it.

Other nations in the Middle East, including Saudi Arabia, are also investing heavily in AI infrastructure and research. President Donald Trump traveled to the region in May to announce numerous AI deals involving US tech companies.

The UAE's leadership has invested billions to establish itself as a strategically important research hub. The country has already revealed some cutting-edge AI research and established an outpost in Silicon Valley. The UAE has lessened its ties to China in return for access to the US silicon needed to train frontier models.

Peng Xiao, CEO of G42 and an MBZUAI board member, said in a statement: "By proving that smaller, more resourceful models can rival the largest systems, this achievement shows how Abu Dhabi is shaping the next wave of global innovation."
