Andrew Ng talked at length about AI Agents in a recent talk. He believes that running existing models in a carefully designed Agent workflow, with multiple Agents working together, can deliver “next-generation” performance from them. By that reasoning, an Agent workflow built on GPT-3.5 can outperform GPT-4 used on its own in some applications.
Many leading figures in AI see AI Agents as a major trend that will significantly change the way we work.
So what exactly is an AI Agent, and what changes can it bring for us?
1. LLM, RAG and AI Agent
Comparing AI Agents with LLMs and RAG is a good way to understand what Agents are.
An LLM is a large language model, used for understanding language and reasoning in text.
RAG (Retrieval-Augmented Generation) is roughly LLM + library: it addresses the limited range of content the LLM knows.
An LLM’s knowledge is whatever it learned during training. To make it work with more content, you supply the external content to the LLM and let it understand and express it.
For example, if an LLM was trained on data from a year ago, it only knows what existed up to then. Now that Xiaomi Auto has launched, if you want the LLM to talk about it, you first need to provide information about Xiaomi Auto; then it can answer.
Note, however, that this content always remains external. Handing it to the LLM once does not add it to the model’s knowledge: it stays outside the model and has to be supplied again whenever it is needed.
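To make “LLM + library” concrete, here is a minimal Python sketch of the retrieval step. It assumes a toy keyword-overlap retriever in place of the embedding search real RAG systems typically use, and it stops at building the prompt rather than calling an actual LLM; `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical names. The retrieved text only ever appears inside the prompt, which is why it stays external to the model.

```python
import re

# Hypothetical external "library". A real RAG system would usually search a
# document store with embeddings; keyword overlap keeps this sketch small.
DOCUMENTS = [
    "Xiaomi Auto has launched its first car model.",
    "GPT-4 is a large language model from OpenAI.",
    "Paris is the capital of France.",
]

def tokenize(text: str) -> set[str]:
    """Lowercased word set used for the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Put retrieved text into the prompt; the model itself is unchanged."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What car did Xiaomi Auto release?", DOCUMENTS))
```

Asking a second question later means calling `build_prompt` again: nothing from the first call is retained by the model.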
The main difference between LLM and RAG is the scope of content they can draw on. An AI Agent is a different kind of thing: it is a workflow, not another level on that same scale.
An AI Agent uses the LLM’s reasoning ability to break a problem down into smaller sub-problems and to define how they relate: which one must be handled first and which ones come later.
It then works through them in order, calling the LLM, RAG, or external tools to solve each sub-problem.
Finally, it combines these results to solve the original problem.
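The decompose, order, and dispatch loop can be sketched as follows. This is an illustration under stated assumptions, not a definitive design: the plan is hard-coded where a real Agent would ask the LLM to produce it, `handle` is a stub for calling the LLM, RAG, or a tool, and the task names are hypothetical. Python’s standard `graphlib.TopologicalSorter` takes care of the “which first, which later” ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition of "plan a weekend trip". A real Agent would ask
# the LLM for this plan; here it is hard-coded for illustration.
# Each sub-task maps to (who solves it, the sub-tasks it depends on).
PLAN = {
    "pick_destination": ("llm", set()),
    "check_weather": ("tool", {"pick_destination"}),
    "find_attractions": ("rag", {"pick_destination"}),
    "write_itinerary": ("llm", {"check_weather", "find_attractions"}),
}

def handle(kind: str, task: str, inputs: dict) -> str:
    """Stub for calling the LLM, RAG, or an external tool on one sub-task."""
    return f"{kind}:{task}<-{','.join(sorted(inputs))}"

def run_plan(plan: dict) -> dict:
    """Solve sub-tasks in dependency order, feeding earlier results forward."""
    graph = {task: deps for task, (_, deps) in plan.items()}
    results = {}
    for task in TopologicalSorter(graph).static_order():
        kind, deps = plan[task]
        inputs = {dep: results[dep] for dep in deps}
        results[task] = handle(kind, task, inputs)
    return results

results = run_plan(PLAN)
for task, result in results.items():
    print(task, "=>", result)
```

Because each sub-task receives the results of the tasks it depends on, the final step sees everything gathered earlier.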
2. Features of AI Agent
Let’s go over the main features of AI Agents:
1) Goal-directed behavior
LLM and RAG mainly reason over text and generate text.
They lack the ability to set and pursue specific goals in a flexible, intelligent way.
AI Agents can be designed with clear goals, and can plan and take actions to achieve them.
2) Memory and state tracking
LLM and RAG have no memory and no state tracking: each input is processed independently.
An AI Agent can maintain internal state, accumulate knowledge, and make decisions and take actions based on that state.
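A minimal sketch of the difference, assuming a toy rule-based Agent: the `StatefulAgent` class and its “remember X is Y” convention are hypothetical, not any framework’s API. The point is that state persists across calls, whereas a stateless call would treat every input on its own.

```python
from dataclasses import dataclass, field

@dataclass
class StatefulAgent:
    """Toy Agent that keeps state between inputs (hypothetical design)."""
    memory: dict[str, str] = field(default_factory=dict)  # accumulated facts
    history: list[str] = field(default_factory=list)      # every input seen

    def handle(self, message: str) -> str:
        self.history.append(message)
        # Toy rule: "remember X is Y" stores a fact in the Agent's state.
        if message.startswith("remember ") and " is " in message:
            key, value = message[len("remember "):].split(" is ", 1)
            self.memory[key] = value
            return f"Noted: {key} is {value}."
        # Later inputs are answered using the stored state.
        for key, value in self.memory.items():
            if key in message:
                return f"From memory: {key} is {value}."
        return "I have no stored information about that."

agent = StatefulAgent()
agent.handle("remember my budget is 5000 yuan")
print(agent.handle("what is my budget?"))
```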
3) Interaction with the environment
An LLM operates purely in the text domain and cannot interact with the physical world.
An AI Agent can connect to sensors and other devices to perceive its external environment.
4) Continuous learning
An LLM’s knowledge comes from its training data and is static once training is done.
AI Agents can continuously learn and adapt their knowledge and skills as they interact with new environments and situations.
5) Multi-tasking ability
An LLM handles specific language tasks.
AI Agents can be designed as general-purpose multi-tasking systems that can fluently combine various skills such as language, reasoning, perception, and control to solve complex multi-faceted problems.
3. AI Agent Example
Let’s say you need to book a complex trip.
An LLM can describe places to visit or provide general travel tips.
RAG can find richer, more current content about your destination.
Building on this, an AI Agent can also:
- Search for flights and hotels within your budget
- Make the bookings
- Add the trip to your calendar
- Send trip reminders
Put simply, an AI Agent goes beyond providing information: it can plan, break down tasks, and actually carry them out.
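The travel example can be sketched in Python as below. Everything here is hypothetical: the flight and hotel data are made up, prices are in yuan, and the booking, calendar, and reminder steps are represented as strings where a real Agent would call external services. The selection rule (highest-rated hotel among flight-and-hotel pairs within budget, cheaper total as tie-breaker) is just one plausible choice.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import product

@dataclass
class Option:
    name: str
    price: int            # hypothetical price in yuan
    rating: float = 0.0   # only meaningful for hotels here

# Made-up search results; a real Agent would fetch these from travel APIs.
FLIGHTS = [Option("Flight A", 1800), Option("Flight B", 1200)]
HOTELS = [Option("Hotel X", 2500, 4.8), Option("Hotel Y", 1500, 4.2),
          Option("Hotel Z", 900, 3.5)]

def plan_trip(budget: int, start: date, nights: int) -> list[str]:
    """Choose a flight+hotel pair within budget, then list the actions to take."""
    affordable = [(f, h) for f, h in product(FLIGHTS, HOTELS)
                  if f.price + h.price <= budget]
    if not affordable:
        return ["No option fits the budget."]
    # Best-rated hotel first; among equal ratings, the cheaper total wins.
    flight, hotel = max(affordable,
                        key=lambda p: (p[1].rating, -(p[0].price + p[1].price)))
    total = flight.price + hotel.price
    return [
        f"book {flight.name} and {hotel.name} (total {total} yuan)",    # booking stub
        f"calendar: trip {start} to {start + timedelta(days=nights)}",  # calendar stub
        f"reminder: {start - timedelta(days=1)} pack for the trip",     # reminder stub
    ]

for step in plan_trip(budget=4000, start=date(2024, 5, 1), nights=3):
    print(step)
```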
4. A closer look at the advantages of AI Agents
1) Task-oriented vs. general knowledge
LLMs excel at a broad range of language understanding and generation tasks. They are like huge repositories of information.
RAG improves the performance of large language models by retrieving relevant information. Even so, the focus remains on knowledge and text generation.
AI Agents are built with specific goals in mind, bridging the gap between understanding language and taking action.
2) Multi-step reasoning
LLM & RAG primarily process a single input and provide a response based on it.
An AI Agent can chain multiple steps together:
- Information retrieval (similar to RAG)
- Process information and make decisions
- Take actions such as sending emails, making appointments, controlling smart devices
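The three-step chain can be sketched as plain functions. This is a minimal illustration under assumptions: a dictionary stands in for the retrieval source, a simple rule stands in for the LLM’s decision, and appending to `OUTBOX` stands in for sending an email; all names are hypothetical.

```python
from typing import Optional

# Hypothetical data source standing in for the retrieval step.
FORECAST = {"2024-05-02": "rain", "2024-05-03": "sunny"}

# Stands in for an email API: "sent" messages are collected here.
OUTBOX: list[str] = []

def retrieve_forecast(day: str) -> str:
    """Step 1: information retrieval (similar to RAG)."""
    return FORECAST.get(day, "unknown")

def decide(day: str, forecast: str) -> Optional[str]:
    """Step 2: process the information and decide (a rule stands in for the LLM)."""
    if forecast == "rain":
        return f"Outdoor meeting on {day} moves indoors (rain forecast)."
    return None  # nothing to do

def act(message: str) -> None:
    """Step 3: take an action, here 'sending' an email."""
    OUTBOX.append(message)

def run_chain(day: str) -> None:
    decision = decide(day, retrieve_forecast(day))
    if decision is not None:
        act(decision)

run_chain("2024-05-02")
run_chain("2024-05-03")
print(OUTBOX)
```

Only the rainy day produces an action; the sunny day passes through the chain without sending anything.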
3) Be proactive
LLM & RAG only respond to prompts.
An AI Agent can take the initiative:
- Monitor data streams and raise alerts
- Initiate actions based on your preferences
- Adjust behavior based on accumulated knowledge
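A toy version of proactive behavior, assuming hypothetical alert rules: the `Monitor` class checks each new reading against a limit the user chose (a preference) and against the average of recent readings (accumulated knowledge), and raises alerts without being prompted. The rules are illustrative, not a standard algorithm.

```python
from collections import deque
from statistics import mean

class Monitor:
    """Toy proactive monitor: watches readings and raises alerts unprompted."""

    def __init__(self, user_limit: float, window: int = 5, jump: float = 1.5):
        self.user_limit = user_limit        # preference set by the user
        self.recent = deque(maxlen=window)  # accumulated recent readings
        self.jump = jump                    # alert if reading > jump * recent average

    def observe(self, value: float) -> list[str]:
        alerts = []
        if value > self.user_limit:
            alerts.append(f"ALERT: {value} exceeds your limit of {self.user_limit}")
        if self.recent and value > self.jump * mean(self.recent):
            alerts.append(f"ALERT: {value} is unusually high vs recent average")
        self.recent.append(value)  # baseline adapts as data accumulates
        return alerts

monitor = Monitor(user_limit=100)
for reading in [40, 42, 41, 90, 120]:
    for alert in monitor.observe(reading):
        print(alert)
```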
4) Integrate existing systems
LLM & RAG operate in their own environment.
An AI Agent can connect to different systems and APIs.
For example, it can access mailboxes and calendars, operate databases, and control smart hardware devices.
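A common way to connect an Agent to other systems is a tool registry: the LLM picks a tool by name and supplies arguments, and the Agent runs the matching function. The sketch below assumes the LLM’s choice arrives as JSON with `tool` and `args` fields; that format, the `tool` decorator, and both tools are hypothetical stand-ins (real LLM providers define their own function-calling formats, and real tools would call mail, calendar, or device APIs).

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the Agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

# Hypothetical integrations; real ones would call calendar or device APIs.
CALENDAR: list[dict] = []

@tool
def add_calendar_event(title: str, day: str) -> str:
    CALENDAR.append({"title": title, "day": day})
    return f"added '{title}' on {day}"

@tool
def set_light(room: str, on: bool) -> str:
    return f"{room} light {'on' if on else 'off'}"

def dispatch(llm_output: str) -> str:
    """Parse the action the LLM chose (assumed JSON) and run the matching tool."""
    action = json.loads(llm_output)
    fn = TOOLS.get(action["tool"])
    if fn is None:
        return f"unknown tool: {action['tool']}"
    return fn(**action.get("args", {}))

print(dispatch('{"tool": "add_calendar_event", "args": {"title": "Dentist", "day": "2024-05-02"}}'))
print(dispatch('{"tool": "set_light", "args": {"room": "kitchen", "on": false}}'))
```

Keeping the registry separate from the dispatch logic means adding a new integration is just one more decorated function.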
5. Architecture of an AI Agent
The architecture of an AI Agent usually includes:
- Inference engine
The core component. It uses a powerful large language model (LLM) to understand natural language, draw on its knowledge, and reason its way through complex problems.
- Knowledge base
Acts as the Agent’s memory bank, storing factual information, past experiences, and preferences related to its tasks.
- Tool integration
Allows the Agent to interact with various software applications and services through APIs, extending its ability to manipulate and control its environment.
- Sensory input
Gives the Agent the ability to perceive its surroundings and collect data from text, images, or various sensors.
- User interface
The bridge through which the Agent communicates and collaborates with human users.
Together, these elements form an intelligent system that can solve problems autonomously.
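One way to picture how these components fit together is the Python skeleton below. It is a hypothetical wiring, not a reference architecture: `toy_infer` is a rule-based placeholder for the LLM, the knowledge base and sensor readings are plain dictionaries, and the `chat` method plays the role of the user interface.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Hypothetical wiring of the five components described above."""
    # Inference engine: stands in for an LLM mapping (request, context) to a tool name.
    infer: Callable[[str, dict], str]
    # Knowledge base: facts, past experiences, preferences.
    knowledge: dict[str, str] = field(default_factory=dict)
    # Tool integration: callable tools keyed by name.
    tools: dict[str, Callable[[dict], str]] = field(default_factory=dict)
    # Sensory input: latest observations from sensors.
    sensors: dict[str, float] = field(default_factory=dict)

    def chat(self, request: str) -> str:
        """User interface: take a request, reason, act, and reply."""
        context = {**self.knowledge, **{k: str(v) for k, v in self.sensors.items()}}
        tool_name = self.infer(request, context)
        result = self.tools[tool_name](context)
        self.knowledge["last_action"] = tool_name  # remember what was done
        return result

def toy_infer(request: str, context: dict) -> str:
    """Rule-based placeholder for the LLM's reasoning step."""
    return "adjust_ac" if "hot" in request else "reply"

agent = Agent(
    infer=toy_infer,
    knowledge={"preferred_temp": "24"},
    tools={
        "adjust_ac": lambda ctx: f"AC set to {ctx['preferred_temp']}°C (room is {ctx['room_temp']}°C)",
        "reply": lambda ctx: "How can I help?",
    },
    sensors={"room_temp": 29.5},
)
print(agent.chat("It's hot in here"))
```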
AI Agents can analyze a problem, map out a step-by-step plan, and carry it out, which makes them a transformative force in the world of artificial intelligence.