How an AI Agent Really Works: Loop, Tools, Memory and Planning Explained

Most people think an AI agent is a smarter chatbot. Wrong. The 4 building blocks that make an agent an agent: loop, tools, memory, planning.

Most executives think an AI agent is a smarter ChatGPT. That is like mistaking a car for a faster horse. The difference is not speed but a different operating principle: a chatbot answers, an agent acts, and it keeps acting in a loop until a goal is reached. Once you have understood that difference clearly, you can soberly judge what an agent can really do in 2026 and what is marketing promise. Here are the four building blocks that make an agent an agent: the loop, the tools, the memory and the planning, explained in plain language, technically correct, with examples from everyday Mittelstand operations.

Chatbot vs agent: the one difference

Picture two employees. The first sits at the reception desk and answers questions: precise, helpful, but he never stands up. You ask, he answers, that is it. The second gets a task, sets off, opens drawers, calls someone, comes back with an interim result, notices something is missing, sets off again, and finally delivers a finished result. The first is a chatbot. The second is an agent.

Put precisely: a classic chatbot does a single pass. Input in, one answer out, done. It has no access to your systems, it cannot execute anything, and it remembers nothing once you close the tab. An agent, by contrast, runs in a loop, can trigger real actions through tools, retains intermediate state and plans its own steps. These four capabilities (loop, tools, memory and planning) make the difference. They build on each other, and they are the reason an agent can complete a task instead of just answering a question.

A concrete counter-example makes it tangible. Ask a chatbot "Does this invoice match our purchase order?", and it explains what to watch for when checking an invoice. Helpful, but it has seen neither the invoice nor the order. Give an agent the same task, and it pulls the invoice from the inbox, finds the matching order in the ERP, compares line items and amounts, finds a discrepancy and reports it back. Same sentence, entirely different behavior. For clean definitions of the surrounding terms, see the 7 terms every executive should know; for the bigger picture, see the agentic AI crash course.

Building block 1: the agent loop

The loop is the heart of it; everything else hangs off it. An agent does not work in a single pass, but in a loop of four phases that repeat until the goal is reached: perceive, plan, act, observe. Then the cycle starts again from the top.

Perceive means: the agent captures the current state. What is the task, what do I already know, what was the result of my last action? Plan means: it decides what the next sensible step is. Act means: it executes that step, usually through a tool (more on that shortly). Observe means: it looks at the result of its action and carries it as a new perception into the next round. This feedback is the decisive point. The agent reacts to what actually happened, not to what it had planned beforehand.

That feedback changes everything. A single pass can only do what is achievable in one step. A loop can adapt to reality. When a step fails, the agent sees it during observe and re-plans. When an interim result delivers new information, it builds it into the next plan. A chatbot asked to check an invoice that notices a missing line item can only say "something is missing here". An agent can, on the next loop iteration, check in the system whether the item was cancelled, query the supplier or involve a human.
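To make the cycle concrete, here is a minimal Python sketch of the loop. It is an illustration under assumptions, not any framework's API: `plan` and `act` are hypothetical stand-ins for the model's decision and a tool call, and `max_rounds` is an assumed safety budget so the loop cannot run forever.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class State:
    goal: str
    observations: List[str] = field(default_factory=list)
    done: bool = False


def run_loop(state: State,
             plan: Callable[[State], str],
             act: Callable[[str], str],
             max_rounds: int = 10) -> State:
    """Perceive, plan, act, observe until the goal is reached or the budget runs out."""
    for _ in range(max_rounds):
        # Perceive: the state carries the task plus every earlier observation.
        step = plan(state)                 # Plan: choose the next step from what is known now
        if step == "DONE":
            state.done = True
            break
        result = act(step)                 # Act: in a real agent, usually a tool call
        state.observations.append(f"{step} -> {result}")  # Observe: feed the result back
    return state


# Toy usage (hypothetical): the first ERP query fails, the loop sees it and retries.
calls = {"n": 0}


def toy_act(step: str) -> str:
    calls["n"] += 1
    return "error" if calls["n"] == 1 else "ok"


def toy_plan(state: State) -> str:
    if state.observations and state.observations[-1].endswith("ok"):
        return "DONE"
    return "fetch_order"  # first attempt, or a retry after an observed error


final = run_loop(State(goal="get purchase order"), toy_plan, toy_act)
```

After this run, `final.observations` holds the failed attempt and the successful retry, and `final.done` is true. The retry happens because the observed error feeds into the next plan step, which is exactly what a single pass cannot do.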

The invoice example played through the loop: round one, perceive, the agent reads the incoming invoice. Plan, it decides to find the corresponding purchase order. Act, it queries the ERP. Observe, it has the order. Round two, perceive, now invoice and order are both present. Plan, compare. Act, reconcile line items and totals. Observe, one line item deviates by 240 EUR. Round three, plan, the deviation is above the threshold, so the agent does not correct it itself but escalates instead. Act, a query to the responsible person with evidence. Goal reached, loop ends. All four building blocks were in play, but the loop orchestrated them.
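The compare-and-escalate part of that run can be sketched in a few lines of Python. The data structures are hypothetical, amounts are kept in euro cents to avoid floating-point rounding, and the 100 EUR threshold is an assumption borrowed from the FAQ example further down; in practice each company sets its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

THRESHOLD_CENTS = 100_00  # assumption: 100 EUR escalation threshold


@dataclass(frozen=True)
class Line:
    sku: str
    cents: int  # amount in euro cents


def reconcile(invoice: List[Line], order: List[Line]) -> Dict[str, Optional[int]]:
    """Map each deviating invoice SKU to its difference in cents (None = not in the order)."""
    ordered = {line.sku: line.cents for line in order}
    deviations: Dict[str, Optional[int]] = {}
    for line in invoice:
        if line.sku not in ordered:
            deviations[line.sku] = None
        elif line.cents != ordered[line.sku]:
            deviations[line.sku] = line.cents - ordered[line.sku]
    return deviations


def decide(deviations: Dict[str, Optional[int]]) -> str:
    """Escalate instead of self-correcting when any deviation exceeds the threshold."""
    for diff in deviations.values():
        if diff is None or abs(diff) > THRESHOLD_CENTS:
            return "escalate"
    return "within_tolerance" if deviations else "match"


invoice = [Line("A-100", 1_200_00), Line("B-200", 540_00)]
order = [Line("A-100", 1_200_00), Line("B-200", 300_00)]
deviations = reconcile(invoice, order)   # {"B-200": 24000}, i.e. 240 EUR
decision = decide(deviations)            # "escalate"
```

The sketch only checks invoice lines against the order; ordered lines that never show up on the invoice would need a second pass.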

Building block 2: tools

A language model alone can only produce text. It cannot query anything, store anything, trigger anything. Tools are exactly what pulls the agent out of the chat window and docks it to your systems. A tool is a clearly defined action the agent can call: query a database, call an API, send an email, read a PDF, run a calculation in code, create a record in a system.

Technically it works like this: the agent gets a list of available tools with a description of what each one does and what inputs it needs. When the model decides in the plan step that it now needs information from the ERP, it does not output prose, but a structured tool call: "fetch order with number X". The system executes that call, returns the result, and the agent processes it in the next loop iteration. The model itself executes nothing, it only decides which tool with which inputs makes sense. Execution happens in a controlled way outside the model.
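A minimal Python sketch of that separation, assuming a simple JSON format for tool calls. Real model providers each define their own tool-calling schema, so the `name`/`arguments` layout, the `TOOLS` registry and the `fetch_order` tool here are hypothetical illustrations. The point is the division of labor: the model only emits the structured call, and ordinary code outside the model validates and executes it.

```python
import json
from typing import Any, Callable, Dict


def fetch_order(order_number: str) -> Dict[str, Any]:
    # Hypothetical tool; in production this would query the ERP.
    return {"order_number": order_number, "total_eur": 1740.0}


# What the model gets to see: name, description and required inputs per tool.
TOOLS: Dict[str, Dict[str, Any]] = {
    "fetch_order": {
        "description": "Fetch a purchase order from the ERP by its number.",
        "params": ["order_number"],
        "fn": fetch_order,
    },
}


def execute_tool_call(raw: str) -> Dict[str, Any]:
    """Validate and run a structured tool call that the model emitted as JSON."""
    call = json.loads(raw)
    spec = TOOLS.get(call.get("name", ""))
    if spec is None:
        return {"error": f"unknown tool {call.get('name')!r}"}
    args = call.get("arguments", {})
    missing = [p for p in spec["params"] if p not in args]
    if missing:
        return {"error": f"missing inputs: {missing}"}
    fn: Callable[..., Dict[str, Any]] = spec["fn"]
    # Pass only declared inputs; anything else the model invented is ignored.
    return fn(**{p: args[p] for p in spec["params"]})


result = execute_tool_call('{"name": "fetch_order", "arguments": {"order_number": "4711"}}')
```

The result goes back into the agent's context as the observation for the next loop iteration. An error dict is fed back the same way, so the model can correct its call on the next round.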

This separation is also security-relevant: you, as the operator, decide which tools an agent gets at all and what they are allowed to do. An agent without a send tool cannot send any mail, no matter how it argues. A large part of practical control sits exactly here. What attack surface arises in the process and how to secure it is detailed in the post on prompt injection and agent protection.
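In code, that control can be as simple as an allowlist per agent. A sketch in Python with hypothetical tool names and a hypothetical `ToolGate` class: only granted tools are described to the model, and any other call is refused before it reaches a real system, however the model argues.

```python
from typing import Callable, Dict, List, Set


def read_invoice(invoice_id: str) -> str:
    return f"invoice {invoice_id}"        # hypothetical read-only tool


def send_mail(to: str, body: str) -> str:
    return f"mail sent to {to}"           # hypothetical irreversible tool


ALL_TOOLS: Dict[str, Callable[..., str]] = {
    "read_invoice": read_invoice,
    "send_mail": send_mail,
}


class ToolGate:
    """Exposes only the tools the operator granted to this agent."""

    def __init__(self, granted: Set[str]) -> None:
        unknown = granted - set(ALL_TOOLS)
        if unknown:
            raise ValueError(f"unknown tools: {sorted(unknown)}")
        self.granted = granted

    def visible_tools(self) -> List[str]:
        # Only these names ever appear in the model's tool list.
        return sorted(self.granted)

    def call(self, name: str, **kwargs: str) -> str:
        if name not in self.granted:
            raise PermissionError(f"tool {name!r} is not granted to this agent")
        return ALL_TOOLS[name](**kwargs)


checker = ToolGate(granted={"read_invoice"})   # an invoice checker without a send tool
```

Here `checker.call("read_invoice", invoice_id="R-1")` returns the invoice, while any attempt to call `send_mail` raises `PermissionError`, because the operator never granted it.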

What tools concretely look like in Mittelstand reality: read access to the DMS so the agent can find contracts and documents. The ERP interface for order and stock data. The calendar and the mailbox for scheduling and correspondence tasks. SQL access to the data warehouse for reporting questions. Code execution for calculations that a language model would otherwise do unreliably. The better and cleaner the tools are connected, the more an agent can do; the more coarsely they are defined, the more error-prone it becomes. Tools are not a detail, they are half the battle.

Building block 3: memory

An agent without memory is a tool for one-off tasks. An agent with memory becomes an assistant that grows more useful over time. There are two kinds, and the difference matters because the two are often confused.

Short-term memory is the working memory of the running task. It is essentially the context, that is, what the agent has seen and done in the current job: the task, the steps so far, the results of the tool calls. This memory is finite. It has to fit into the model's context window, which for frontier models in production use ranges from roughly 100,000 to over a million tokens (order of magnitude as of May 2026, depending on model and tier). When a task gets very long, this memory fills up, and the agent has to summarize or discard older content. That is a real technical limit, not a detail.
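What "summarize or discard" can look like, as a Python sketch under loud assumptions: tokens are approximated by counting words (real systems use the model's own tokenizer), `summarize` is a hypothetical placeholder for asking the model to compress older steps, and pinning the original task is a design choice assumed here, not the only option.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: List[str]) -> str:
    # Hypothetical placeholder: a real agent would ask the model for a summary.
    return f"[summary of {len(messages)} earlier steps]"


def fit_context(messages: List[str], budget: int, reserve: int = 6) -> List[str]:
    """Pin the task, keep the newest steps verbatim, fold the middle into one summary.

    `reserve` leaves room for the summary line itself.
    """
    if sum(count_tokens(m) for m in messages) <= budget:
        return list(messages)                  # everything still fits
    task, rest = messages[0], messages[1:]
    room = budget - reserve - count_tokens(task)
    kept: List[str] = []
    used = 0
    for msg in reversed(rest):                 # newest first
        cost = count_tokens(msg)
        if used + cost > room:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    dropped = rest[: len(rest) - len(kept)]
    return [task, summarize(dropped)] + kept


history = [
    "task: check invoice R-1",
    "called fetch_order 4711",
    "order has 2 lines",
    "invoice line B-200 deviates by 240 EUR",
]
trimmed = fit_context(history, budget=17)
```

With a budget of 17 "tokens", the two middle steps collapse into one summary line. Whatever is folded into a summary is gone in detail, which is exactly why very long tasks degrade.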

Long-term memory is what persists beyond the single task. It lives outside the model, in a database or a document store, and the agent retrieves at runtime exactly what it needs. Mechanically this is the same approach as connecting knowledge via retrieval, and that comparison is worth making: the mechanics behind it are laid out in the post on RAG, fine-tuning and prompting. In long-term memory you find, for example: which suppliers typically produce which deviations, which escalation thresholds your company has set, or which corrections a human approved in the past.
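Because long-term memory lives outside the model, it is ordinary database code. A minimal Python sketch using the standard-library `sqlite3` module with an in-memory database; the `LongTermMemory` class, the subject keys and the notes are hypothetical. Lookup here is an exact match on a subject key for simplicity, whereas retrieval setups in practice often rank entries by semantic similarity.

```python
import sqlite3
from typing import List


class LongTermMemory:
    """External store: write what was learned after a task, read it back in the next one."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS notes (subject TEXT, note TEXT)")

    def remember(self, subject: str, note: str) -> None:
        self.db.execute("INSERT INTO notes (subject, note) VALUES (?, ?)", (subject, note))
        self.db.commit()

    def recall(self, subject: str, limit: int = 3) -> List[str]:
        # Newest entries first; only these few notes go into the model's context.
        rows = self.db.execute(
            "SELECT note FROM notes WHERE subject = ? ORDER BY rowid DESC LIMIT ?",
            (subject, limit),
        )
        return [row[0] for row in rows]


memory = LongTermMemory()
memory.remember("supplier:ACME", "often bills freight as a separate line")
memory.remember("policy:invoices", "escalate deviations above 100 EUR")
memory.remember("supplier:ACME", "correction approved by a human: freight is allowed")
notes = memory.recall("supplier:ACME")
```

Nothing in the model changes between tasks; only the table grows, and the agent reads from it at runtime.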

Here is the honest limit: an agent does not "learn" in the sense of a human who grows wiser from experience. The language model itself does not change through use. What builds up is an external store that the agent draws on as needed. That is enormously useful, but it is database mechanics, not magic. Anyone who promises "the agent remembers everything and gets better on its own" is selling an illusion. What is real: a well-built long-term memory makes the agent more consistent and reduces follow-up questions, because next time it does not start from zero.

Building block 4: planning

Planning is the ability to break a goal into a sensible sequence of steps and to adjust that sequence when reality intervenes. Not every task requires it: "read this one mail and summarize it" is a single step. Planning begins when a goal needs several dependent steps whose order is not fixed from the outset.

In practice the agent does this in the plan step of its loop. It reasons: to create a quote, I first need the customer data, then the requested line items, then the current prices, then the margin rules, then the document. Some steps depend on each other, some do not. Good planning recognizes these dependencies and works through them in the right order. And, the more important part, it adapts: when the price step reveals that a line item is no longer available, the agent re-plans instead of marching on stubbornly.
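The dependency reasoning in the quote example can be sketched with the topological sorter in Python's standard library (`graphlib`, Python 3.9+). The graph itself is a hypothetical reading of the example: it assumes the margin rules depend on the customer and the prices on the requested line items. Steps in the same wave have no open dependencies and could run in any order or in parallel; re-planning means editing the graph and sorting again.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical plan for "create a quote": step -> steps that must finish first.
QUOTE_PLAN: Dict[str, Set[str]] = {
    "customer_data": set(),
    "line_items": set(),
    "prices": {"line_items"},
    "margin_rules": {"customer_data"},
    "document": {"customer_data", "prices", "margin_rules"},
}


def waves(plan: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves whose dependencies are all satisfied."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()                      # raises graphlib.CycleError on circular plans
    result: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


initial = waves(QUOTE_PLAN)
# Re-plan after the price step reports an unavailable item:
# insert a hypothetical substitute search before pricing.
replanned = waves({**QUOTE_PLAN,
                   "find_substitute": {"line_items"},
                   "prices": {"find_substitute"}})
```

The initial plan runs in three waves (customer data and line items, then margin rules and prices, then the document); the re-planned graph adds the substitute search as an extra dependency before pricing without touching the rest.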

Where planning works well as of May 2026: on clearly scoped goals with a manageable number of steps, where each step delivers a verifiable result. Invoice checking, quote creation from clear inputs, data research with a defined endpoint, report assembly. In these cases, agents are dependable today.

Where planning still breaks, and this needs to be said honestly: on very long chains and on ambiguous goals. The more steps build on each other, the higher the probability that an early error propagates through the whole chain. And the fuzzier the goal ("optimize our procurement"), the more likely the agent gets lost or makes assumptions you do not share. The lesson is not "planning does not work", but "planning needs a sharply scoped goal and manageable chains". Ignore that, and you land in the usual disappointments that the post on what AI agents cannot do today covers in detail.
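A back-of-the-envelope calculation shows why long chains are fragile. Assume, purely for illustration, that each step succeeds independently with probability p; then a chain of n steps succeeds with probability p to the power of n. The 95 percent per-step rate below is an assumed number, not a measurement, and real errors are rarely independent, so treat this as intuition rather than a forecast.

```python
def chain_success(step_success: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    if not 0.0 <= step_success <= 1.0 or steps < 0:
        raise ValueError("step_success must be in [0, 1] and steps >= 0")
    return step_success ** steps


for n in (3, 10, 20):
    print(f"{n:>2} steps at 95% each: {chain_success(0.95, n):.0%} chain success")
```

Under these assumptions, three steps still succeed about 86 percent of the time, ten steps about 60 percent, and twenty steps only about 36 percent.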

How the 4 building blocks work together

Let us walk the four building blocks through one continuous example: an agent that creates a quote from a customer inquiry. It shows how tightly the blocks interlock, because none works alone.

The inquiry comes in by mail. The loop starts. Perceive: the agent reads the inquiry, which becomes its short-term memory for this task. Planning: it breaks the goal "create a quote" into steps: fetch customer data, clarify line items, pull prices, check margin, build the document. Act through tools: it asks the CRM about the customer (tool one) and the ERP about availability and list prices (tool two). Observe: one line item is not available.

Now the value of the interplay shows. The loop turns another round, and the planning adapts: do not abort, propose an alternative line item instead. Here long-term memory kicks in: the agent knows from earlier transactions which substitute item this customer accepted in the past. It builds the quote (tool three, document generation) and stops before the final step, the send. Sending is an irreversible action toward the customer, so it submits the finished quote to a human for approval. Loop, tools, memory, planning: all four were in play, and the one missing piece, the deliberately omitted send automation, is just as important as the four present ones.
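A compressed Python sketch of this run, with every system replaced by a hypothetical in-memory stand-in (`CRM`, `ERP`, `ACCEPTED_SUBSTITUTES`) and an assumed minimum margin of 15 percent. For brevity the loop is flattened into one pass over the line items. Note what is missing on purpose: there is no send tool, and the draft ends in a state that waits for a human.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical stand-ins for CRM, ERP and long-term memory.
CRM: Dict[str, Dict[str, Any]] = {"C-42": {"name": "Example GmbH", "discount": 0.05}}
ERP: Dict[str, Dict[str, Any]] = {
    "P-1": {"available": True, "price": 100.0, "cost": 70.0},
    "P-2": {"available": False, "price": 80.0, "cost": 55.0},
    "P-3": {"available": True, "price": 85.0, "cost": 60.0},
}
ACCEPTED_SUBSTITUTES = {("C-42", "P-2"): "P-3"}  # what this customer accepted before
MIN_MARGIN = 0.15                                # assumption: company margin rule


@dataclass
class QuoteDraft:
    customer: Dict[str, Any]
    lines: List[Tuple[str, float]] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    status: str = "draft"


def build_quote(customer_id: str, items: List[str]) -> QuoteDraft:
    customer = CRM[customer_id]                      # tool one: CRM lookup
    draft = QuoteDraft(customer=customer)
    for item in items:                               # plan: one step per requested item
        stock = ERP[item]                            # tool two: ERP availability and price
        if not stock["available"]:                   # observe: item not available
            sub = ACCEPTED_SUBSTITUTES.get((customer_id, item))   # long-term memory
            if sub is None or not ERP[sub]["available"]:
                draft.notes.append(f"{item}: unavailable, needs a human decision")
                continue
            draft.notes.append(f"{item}: replaced by {sub}, accepted by this customer before")
            item, stock = sub, ERP[sub]              # re-plan instead of aborting
        price = round(stock["price"] * (1 - customer["discount"]), 2)
        if (price - stock["cost"]) / price < MIN_MARGIN:   # margin check
            draft.notes.append(f"{item}: margin below {MIN_MARGIN:.0%}")
        draft.lines.append((item, price))
    draft.status = "awaiting_human_approval"         # stop before the irreversible send
    return draft


quote = build_quote("C-42", ["P-1", "P-2"])
```

The resulting draft contains P-1 and the substitute P-3, a note explaining the swap, and the status `awaiting_human_approval`; sending stays a human decision.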

What this means for your expectations

Once you have understood the four building blocks, a few sober consequences follow that save you expensive disappointments.

Agents need three things, otherwise they do not work: a clear goal, access to the right tools and a way to check their result. Without the goal, planning gets lost. Without the tools, the agent stays a chatbot. Without the check (the eval), you never know whether it works well. These three are not optional polish, they are the precondition.
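The eval does not need to be elaborate to be useful. A minimal Python sketch, assuming a hand-labelled set of cases and a hypothetical `agent_decision` function standing in for the agent under test (the 100 EUR threshold is again an assumption): run every case once, measure accuracy, and keep the failures so a human can read them.

```python
from typing import Callable, List, Tuple

# Hypothetical labelled cases: (deviation in EUR, expected decision).
CASES: List[Tuple[float, str]] = [
    (0, "match"),
    (40, "within_tolerance"),
    (240, "escalate"),
    (-150, "escalate"),
]


def agent_decision(deviation_eur: float) -> str:
    # Stand-in for the agent under test.
    if deviation_eur == 0:
        return "match"
    return "escalate" if abs(deviation_eur) > 100 else "within_tolerance"


def evaluate(agent: Callable[[float], str],
             cases: List[Tuple[float, str]]) -> Tuple[float, List[Tuple[float, str, str]]]:
    """Run each case once; return accuracy and (input, expected, actual) for every miss."""
    results = [(x, expected, agent(x)) for x, expected in cases]   # one call per case
    failures = [r for r in results if r[1] != r[2]]
    return 1 - len(failures) / len(cases), failures


accuracy, failures = evaluate(agent_decision, CASES)
```

Re-running the same cases after every prompt, tool or model change is what turns "it seems to work" into a number.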

Agents are not magic, they are orchestrated mechanics. Loop, tool calls, database store, step decomposition. That is impressively capable, but it is traceable and therefore also controllable. Anyone who treats it as magic can neither secure it nor sensibly constrain it.

Agents break on ambiguous goals, not on hard ones. A complex but clearly defined task they solve more reliably than a simple but vaguely phrased one. That is counterintuitive and the most common reason pilots disappoint. How this pattern translates into real project failures is shown in the 5 architecture failures from pilot to production and the pilot graveyard.

And the most important point: for irreversible actions, a human belongs in the loop. Transferring money, sending mails to customers, deleting records, signing contracts, these are actions you do not get back. Here human-in-the-loop is not distrust of the technology, but clean architecture. What this autonomy costs over 12 months and which recurring items sit behind it is worked through in the TCO post.
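Drawing that line per action type is a small piece of code rather than a philosophy. A Python sketch with an assumed, company-specific classification: the action names are hypothetical, and anything not explicitly classified as reversible falls back to human approval, which is the safer default.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    NEEDS_APPROVAL = "needs_approval"


# Assumed classification; each company maintains its own list.
REVERSIBLE = {"fetch_data", "compare", "build_draft"}
IRREVERSIBLE = {"transfer_money", "send_customer_mail", "delete_record", "sign_contract"}


def mode_for(action: str) -> Mode:
    if action in IRREVERSIBLE:
        return Mode.NEEDS_APPROVAL
    if action in REVERSIBLE:
        return Mode.AUTONOMOUS
    return Mode.NEEDS_APPROVAL          # unknown actions: a human decides


def execute(action: str, approved_by: Optional[str] = None) -> str:
    """Run reversible actions directly; hold irreversible ones until someone approves."""
    if mode_for(action) is Mode.NEEDS_APPROVAL and not approved_by:
        return f"{action}: queued for human approval"
    suffix = f" (approved by {approved_by})" if approved_by else ""
    return f"{action}: executed{suffix}"
```

Autonomy here is a gradation per action type, not a single switch: reading and drafting run freely, while money, customer mail, deletions and contracts wait for a named approver.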

FAQ

Is an AI agent the same as GPT with plugins?

No, but the confusion is understandable, because plugins are a precursor of the tool idea. A plugin gives a chatbot access to an external function, that is the tool building block. What is missing is the loop: a plugin call is usually a single pass, whereas the agent calls tools repeatedly, observes the results and re-plans. Only loop plus tools plus memory plus planning turn a tool-capable chatbot into an agent. Plugins are one building block of four, not the whole.

Does every use case need an agent?

No, and that is one of the most important insights. Many tasks are single passes with no need for a loop: summarize a mail, classify a text, answer a question on a given document. For these an agent is overkill, a simple model call is enough and is cheaper, faster and easier to control. The rule of thumb: an agent is worth it only when a task needs several dependent steps, real actions through tools and an adaptation to interim results. Otherwise the loop is just unnecessary complexity.

How autonomous should an agent be?

As autonomous as necessary, as controlled as possible, and the line runs along reversibility. Reading and preparatory steps (fetch data, compare, build a draft) an agent can do largely autonomously. Irreversible actions toward money, customers or data inventory belong behind a human approval. Autonomy is not a switch set to "all or nothing", but a gradation per action type. Anyone who regulates this in a blanket way builds either a useless or a dangerous agent.

What is the most common thinking error?

The belief that a vaguer goal makes the agent more flexible. The opposite is true. A sharply scoped goal ("check this invoice against the order and escalate deviations above 100 EUR") an agent solves reliably. A vague goal ("take care of our invoices") lets its planning run into the void, because it cannot derive verifiable steps. Clarity is not a corset, it is the condition for loop and planning to be able to work at all.

Does the agent learn from every task?

Not in the human sense. The language model itself does not change through use. What "learns" is the external long-term memory: a store into which results, decisions and patterns are written and retrieved again next time. That makes the agent more consistent over time, but it is database mechanics, not self-improvement of the model. Anyone who promises more is overselling.

Sources:

Sentient Dynamics workshop aggregates, 40 DACH workshops 2025-2026 (headcount 80 to 4,000)
Bitkom AI study 2025 (German companies with 20+ employees: 41 percent adoption; German companies from 500 employees: 89 percent adoption)
McKinsey State of AI, November 2025
Gartner Press Release, June 2025
MIT NANDA Report 2025: "GenAI Divide: State of AI in Business 2025"

Next step: If you want to find out for a concrete process whether an agent is the right approach or a simple model call is enough, book 30 minutes via our demo page. We bring the four building blocks, an honest assessment and three questions, not a vendor deck. If you want to understand the difference from classic automation, find it in the post on AI agent vs RPA vs automation, and if you want to know why most agent projects fail, in the overview of the anti-patterns behind 40 percent of failed projects.
