What is an Agent? Let's stop the speculation

I see a lot of confusion and speculation around what the term actually means.

So here is how I understand what it is:

An Agent can define, select and run workflows.

An Agent is not a personality (a very bad name for a prompt that defines the contextual appearance of a text-generation process; maybe “synthetic answer style” would be better).

But hey, I am not here to define it, I want to know. Please correct me if I am wrong.
And in the end (I’d say by the end of August) I’ll add a poll, and then we take whatever comes out of this democratic process?
Sounds good?

5 Likes

Two related earlier posts on this forum

and a related one on the Discourse forum

To make matters even worse, some also use the word assistant interchangeably with agent, while others insist on differentiating the two.

I saw a blog recently where the author compared agent and assistant by analogy to a movie star who has both an assistant and an agent.

Personally I am sticking with the definition from the AI book.

One of the classic books on AI, “Artificial Intelligence: A Modern Approach” by Stuart J. Russell and Peter Norvig (WorldCat), gives:

An agent is just something that acts (agent comes from the Latin agere, to do).

6 Likes

Sometimes I think we overcomplicate things too much.

4 Likes

The only thing worse than getting AI terminology incorrect… is policing AI terminology :rofl:

It ain’t the end of the world. A year ago, the term we were all arguing over was “AI” itself.

1 Like

This.
I mean, if we think about a human agent, what do they do? They act on someone else’s behalf. This is really the definition I’ve always gone with. I guess the bigger question to me is: if this is true, then does that make any model that can execute a tool suddenly an agent?
What does it mean for a model to “act”? :thinking:

Lol, we’re not policing, we’re just noticing nobody has a consensus, and trying to see if we can find an agreed-upon definition. It’s also discussed all over the news, in the media, in startups, etc., but they never fully answer the question “okay, but what is an agent?”, and now everyone is suddenly expected to just know what it means, when everybody also gives a different answer. AKA this is our attempt to describe, not prescribe, the meaning.

It’s also just kind of funny watching everyone flail around the definition. It’s basically this but with “agent” instead of “dogecoin/crypto” lol:

4 Likes

What’s a workflow? (rhetorical)

An agent is something or someone that acts on your behalf. An AI agent is an AI that can interact with software and services on your behalf.

That’s my understanding at least. Easy enough for a layperson to comprehend too.

1 Like

The Agents SDK defines it as:

An Agent is a Large Language Model (LLM) that has been configured with:

  • Instructions – the system prompt that tells the model who it is and how it should respond.
  • Model – which OpenAI model to call, plus any optional model tuning parameters.
  • Tools – a list of functions or APIs the LLM can invoke to accomplish a task.
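
For what it’s worth, those three pieces map directly onto code. Here is a minimal sketch using the Python Agents SDK; the weather tool, the agent name, and the prompt strings are invented for illustration:

```python
# Minimal sketch of an Agents SDK agent (pip install openai-agents).
# The tool below is a stub invented for illustration.
from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Return a (stubbed) weather report for a city."""
    return f"The weather in {city} is sunny."

agent = Agent(
    name="Weather assistant",
    instructions="You are a helpful assistant. Use tools when needed.",  # Instructions
    model="gpt-4o",                                                      # Model
    tools=[get_weather],                                                 # Tools
)

result = Runner.run_sync(agent, "What's the weather in Tokyo?")
print(result.final_output)
```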
5 Likes

Here’s my understanding of it:

tl;dr - this is a trust-level upgrade, from the earlier role of advisor to a higher-agency role, aka agent.

A few years ago, when GPT-3 was initially released through models such as Ada, Babbage, Curie, and Davinci, these models represented a significant advancement in artificial intelligence and demonstrated the extent of what was possible at the time. However, their primary applications were limited to tasks such as knowledge retrieval, natural language communication, and translation, where text generation was used simply to display content to the user. Applications at this time also required substantial human intervention, supervision, and validation, and had to pass a manual review in order to go live.

As LLMs improved, becoming increasingly reliable and predictable, significant advancements such as function calling emerged. This capability further developed into tool calling and structured outputs, markedly enhancing the dependability and accuracy of model outputs. With this improved reliability and enhanced performance, there emerged a clear potential to grant these AI models greater autonomy or “agency” in performing tasks.

Consequently, the role of AI shifted from merely an advisory capacity to one embodying greater autonomy, accountability, and agency, marking the rise of the concept known as “agentic AI” or “AI agents”.

Hence, IMO, this is a trust-level upgrade, from humans managing every aspect of AI performance to AI systems increasingly operating independently to achieve specific goals, directly linking the concepts of “agency” and “agents”.

And thus we reach the definition from the agents-sdk quoted above.

3 Likes

I define that like this:

A solution path from n problems to n solutions, which can be divided into n steps that should be atomic. A step is solved by a workflow transition.

An agent can initiate a chain of agents - e.g. select a suitable workflow (Orchestrator Agent).

It can define a workflow (Planner Agent).

It can solve an atomic step, e.g. write and send an email (Worker Agent).

But there might be a more suitable definition, e.g. in Business Process Modeling Notation.
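
A toy sketch of that three-role split, with every name and the hard-coded “workflow” invented purely for illustration:

```python
# Toy Orchestrator / Planner / Worker chain. In a real system each
# function would be an LLM call; here they are plain Python stubs.

def planner(problem: str) -> list[str]:
    """Planner Agent: define a workflow as a list of atomic steps."""
    return [f"write mail about {problem}", "send mail"]

def worker(step: str) -> str:
    """Worker Agent: solve one atomic step, e.g. write and send an email."""
    return f"done: {step}"

def orchestrator(problem: str) -> list[str]:
    """Orchestrator Agent: select a workflow and chain the other agents."""
    steps = planner(problem)                 # workflow definition
    return [worker(step) for step in steps]  # one workflow transition per step

print(orchestrator("invoice reminder"))
```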

“Let’s stop the speculation” (receives speculation)

How about what is not an agent:

  • a language completion call that can do no more than generate language, with no iteration loop of refined task prompting or sub-tasks.
  • a preparation for the completion, such as an AI determining parameters and model to run the task against, moderations, language rewriting.
  • no ability to take real-world action or receive programmatic services on demand.
4 Likes

I think there are a few pieces missing from the top-level agents-SDK definition:

  • Control over the LLM loop - how many times the LLM can be called, and what the stopping conditions are. This is very common in current implementations; for example, in Cursor, agents have a “session” and can make up to 25 tool calls per session. (Handoffs would fit in here; parallelization would also fit.)

  • Control over history - how the LLM consumes previous turns (extractive summary, LLM-based summary).

  • Control over context - particularly in coding agents, things like injecting files and other bits of data into the first turn; in many ways this could be called “context expansion”.

  • Control over LLM params - temperature / top_p, etc. An agent’s “personality” is defined by the system/dev prompt, params, and examples.

And then … how much of all of this do you need to be a Real Agent ™ ?

I guess taking this to “basic” in my mind - an agent in its simplest form is

LLM+Tools+Loop
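
A deliberately minimal sketch of that, using the OpenAI Python SDK, and folding in the loop cap and params from the list above; the get_time tool is invented for illustration:

```python
# LLM + Tools + Loop, in its simplest form (pip install openai).
import json
from openai import OpenAI

client = OpenAI()

def get_time(timezone: str) -> str:
    return f"12:00 in {timezone}"  # stub standing in for a real tool

tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time in a timezone.",
        "parameters": {
            "type": "object",
            "properties": {"timezone": {"type": "string"}},
            "required": ["timezone"],
        },
    },
}]

messages = [{"role": "user", "content": "What time is it in UTC?"}]

for _ in range(10):  # LOOP: hard cap on iterations as the stopping condition
    response = client.chat.completions.create(
        model="gpt-4o",     # LLM
        messages=messages,
        tools=tools,        # TOOLS
        temperature=0.2,    # params shaping the "personality"
    )
    msg = response.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:  # the model answered in plain text: stop
        print(msg.content)
        break
    for call in msg.tool_calls:  # run each requested tool, feed results back
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_time(**args),
        })
```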

5 Likes

For insight into human-in-the-loop (HITL):

“Digital Apollo: Human and Machine in Spaceflight” by David A. Mindell, published in 2008 (WorldCat)

Summary: As Apollo 11’s Lunar Module descended toward the moon under automatic control, a program alarm in the guidance computer’s software nearly caused a mission abort. Neil Armstrong responded by switching off the automatic mode and taking direct control. He stopped monitoring the computer and began flying the spacecraft, relying on skill to land it and earning praise for a triumph of human over machine. In Digital Apollo, engineer-historian David Mindell takes this famous moment as a starting point for an exploration of the relationship between humans and computers in the Apollo program. In each of the six Apollo landings, the astronaut in command seized control from the computer and landed with his hand on the stick. Mindell recounts the story of astronauts’ desire to control their spacecraft in parallel with the history of the Apollo Guidance Computer. From the early days of aviation through the birth of spaceflight, test pilots and astronauts sought to be more than “spam in a can” despite the automatic controls, digital computers, and software developed by engineers. Digital Apollo examines the design and execution of each of the six Apollo moon landings, drawing on transcripts and data telemetry from the flights, astronaut interviews, and NASA’s extensive archives. Mindell’s exploration of how human pilots and automated systems worked together to achieve the ultimate in flight, a lunar landing, traces and reframes the debate over the future of humans and automation in space. The results have implications for any venture in which human roles seem threatened by automated systems, whether it is the work at our desktops or the future of exploration.

Summary from WorldCat

1 Like

Whenever the “Apollo Guidance Computer” is mentioned, I have to add that it was written under the lead of Margaret Hamilton, who pioneered asynchronous priority scheduling and automated testing and is credited with coining the term “software engineering”.

The code is online on GitHub.

Take a few hours and ask ChatGPT to explain it to you. It’s worth it! :nerd_face:

I am a huge fanboy. Got a painting of her in my office.

6 Likes

OK we are going way off topic, but it is your topic.

I’m there with you.

Do a YouTube search for the Apollo Guidance Computer; I was amazed at some of the details and the years of work people have put into restoration projects.

2 Likes

You might find this relevant https://arxiv.org/pdf/2506.01438

1 Like

So he basically says there are 3 types of Agents, plus hybrid Agents. Hybrid? If you let an Agent solve a step by solving or creating a workflow, that is an Agentic System and not an Agent anymore.

A point worth considering.

With MCP being recent and the source of much code and many papers, it comes with some definitions:

  • MCP Hosts: Programs like Claude Desktop, IDEs, or AI tools that want to access data through MCP
  • MCP Clients: Protocol clients that maintain 1:1 connections with servers
  • MCP Servers: Lightweight programs that each expose specific capabilities through the standardized Model Context Protocol
  • Local Data Sources: Your computer’s files, databases, and services that MCP servers can securely access
  • Remote Services: External systems available over the internet (e.g., through APIs) that MCP servers can connect to
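
To ground the “MCP Servers” item above, here is a minimal server sketch using FastMCP from the official Python SDK; the server name and the add tool are invented for illustration:

```python
# Minimal MCP server sketch (pip install mcp).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```

A host such as Claude Desktop would spawn this script and connect to it through an MCP client, one client per server.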

Sometimes when MCP is mentioned, incorrect terminology is used, often compounding the confusion.

As @edwinarbus notes for OpenAI Agents SDK, there are definitions and those should be used and referenced when using OpenAI Agents SDK.


If you have ever read a Penguin math book, you’ll know that many come with their own math definitions. The symbols or names are often also used elsewhere in math, but within each book, one needs to use that book’s definitions.

e.g.

“Linear Algebra” by Jim Hefferon 4th edition (PDF)


From a question about terminology I answered on StackOverflow

What is the difference between a token and a lexeme?

While the answer is accurate and very technical, the most interesting part is actually a comment by Ira Baxter, an expert in this field (ref).

(screenshot of Ira Baxter’s comment)

Even when you win, the status quo does not agree.

1 Like

MCP is just a tool-calling wrapper. Plugins/Functions: mid-2023. Didn’t need the word Agent.

I think you are likewise showing that “agent” is used nowhere in association with simply MCP for internet tool use.

Also conflating things is “reasoning”, which can be a form of context-building that is agentic in having dynamic prompting and resources - but one not controlled by the developer.

The first paragraph in the link shows why OpenAI shouldn’t be relied on for precision in nomenclature.

  • Instructions – the system prompt (now, you might have to send “developer”, and cannot send a “prompt” parameter)
  • Tools – a list of functions (“functions” are one type of tool; you can’t place your own tools at the internal “tools” level)

There is no “the” singular system prompt when OpenAI is pushing yours to be the second system message and jamming more system-role messages in after tool returns.

Add to that

  • “Responses” …accepts input;
  • GPT-4-32k will be “deprecated” tomorrow, thus turned off;
  • the truncation parameter will remove data from the middle;
  • the “Prompts” playground, the new name after completions was removed, which did take “prompt”;
  • if you want to write a book, certainly don’t use the codex model (meaning book);
  • GPTs… don’t get me started;
  • calling Assistants, where you create an assistant and receive an assistant;
  • the GPT builder uses a “context” method to set the instructions box, which is not the initial system message.
    :shortcake:
1 Like

Hey, I’ve been following this thread and just wanted to jump in. I think part of the confusion around “AI agents” is that the term’s being used for way too many different things. People are calling everything from prompt-engineered chatbots to scripted tool loops an agent. And sure, those setups are useful. But calling them agents feels like a stretch. It’s kind of like calling a GPS a co-pilot. Helpful, but not really making decisions.

To me, an actual AI agent has to do more than just run tools. It has to bring in what AI is uniquely good at, like understanding language, handling unstructured input, generalizing across tasks, or adjusting based on context. That’s what separates it from regular automation. If your setup could be built with plain scripts and no AI model involved, it’s probably not an agent. It’s just a smart workflow.

I also think there are a few thresholds that need to be crossed. Memory, for one: it should remember what it did and be able to learn from it. Autonomy: it should make some of its own decisions, not just follow a hardcoded path. Consequence: it should be able to actually affect the world in some way, not just talk. And coherence: it should have some kind of internal logic or consistency over time. When all that comes together, and it’s powered by actual AI, that’s when I’d call it a real agent.

Last thing, it should be responsive. Like, it should notice what’s happening around it, take in feedback, and change what it does. That feedback loop is key. Without it, it’s just a fancy script running in circles.

Anyway, not trying to gatekeep here. I just think if we keep calling everything an agent, we’re going to blur the line between automation and something way more powerful. And with how fast this space is moving, that’s a line we really want to keep clear.

Also, just being real, there are more people on YouTube explaining how to “build AI agents” than people who have actually built one that does anything useful. It’s become a content trend more than a technical breakthrough. And honestly, until we start plugging AI into real infrastructure, like city traffic-light systems, energy grids, and logistics networks, most of what we’re calling agents is going to feel a little underwhelming. They might look cool on a demo page, but without real-world connection and consequence, it’s just simulated agency (pun intended).