How Agents Work: Concepts and the LangChain Implementation

Published: 26 Mar 2024 Category: llm

LLMs in the Autonomous Agent Space

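The use of LLMs (large language models) in autonomous agents has attracted a great deal of attention. You may have already seen them at work in popular applications such as Auto-GPT and BabyAGI, and new applications of this kind appear almost daily.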

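The basic principles behind these applications are not hard to understand, because most of these tools follow roughly the same workflow.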

The Intuition Behind Using Agents

Let's first look at the flow from a high level, introducing the relevant concepts as we need them.

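The idea behind agents is simple. We already have models that understand human language and reason fairly well, including models like GPT and open-source alternatives. Until now, these models have been limited to "talking about" how to get things done, whether that means writing the code for an application or listing the steps for something like setting up a charitable fund. What if we could let them act on that intelligence?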

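In software engineering, the traditional way to carry out such actions is through APIs (application programming interfaces). An API exposes the capabilities of an application or service that performs a specific operation to other software, or to a front end used by humans. This suggests that if a model could also use these APIs, it could act on its own knowledge. But how does the model know which APIs are available and, more importantly, what each one does?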

You just tell it!

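Because large language models are so good at understanding natural language, it is easy to explain to them what an API does and when it should be used.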

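In this setup, the LLM only needs to output the name of the API and the input it should receive to produce the desired output. We can then write logic that uses this information to call the downstream API with the right arguments and return the output to the LLM for further processing (the LLM either calls another API or returns the result to the user in natural language). This process repeats until the model produces a final, easily understood answer.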
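
To make the "name plus input" idea concrete, here is a minimal sketch of the dispatch step. Everything in it is an assumption made for illustration: the two toy functions stand in for real APIs, and the JSON shape with `api` and `input` keys is a made-up convention, not something LangChain prescribes.

```python
import json
from typing import Callable, Dict


# Hypothetical stand-ins for real APIs; names and behavior are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def add_numbers(expression: str) -> str:
    left, right = expression.split("+")
    return str(int(left) + int(right))


# Map the names the model may output to the functions that implement them.
API_REGISTRY: Dict[str, Callable[[str], str]] = {
    "get_weather": get_weather,
    "add_numbers": add_numbers,
}


def dispatch(model_output: str) -> str:
    """Parse the model's choice of API and call the matching function."""
    request = json.loads(model_output)
    api = API_REGISTRY.get(request["api"])
    if api is None:
        return f"Unknown API: {request['api']}"
    return api(request["input"])


# Pretend the model produced this text in response to "What is 2 + 3?".
result = dispatch('{"api": "add_numbers", "input": "2+3"}')
print(result)
```

In a real system the returned string would be handed back to the model, which then decides whether to call another API or answer the user.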

Diving Deeper

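You can think of the scenario above as a combination of several problems that can each be solved independently. Let's walk through the flow again, this time introducing the concepts in more technical terms.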

Roughly speaking, the flow above can be broken down into interactions between the following components:

LLM
API
Orchestrator

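Each of these components maps to an abstraction in LangChain (an open-source LLM framework). An abstraction is simply an interface that formally defines an entity's responsibilities and properties. We'll go through them one by one to see what this structure buys us.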

🤖 Agent

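An agent is essentially a wrapper around an LLM that takes the inputs to be passed to the model. Agents are actually built on LLM chains, that is, pipelines that contain a model plus some additional elements such as prompt templates. If these terms are unfamiliar, I recommend reading the LangChain documentation on the topic. That knowledge is not essential for understanding how agents work, though, so you can safely skip it for the purposes of the rest of this article.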

So what is special about agents, and why do we need them instead of just using LLM chains? Agents define some additional properties that make execution simpler.

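Predefined prompts that control the output format required from the model can be packaged into the agent abstraction. Below is an example prompt for a chat agent; the output format it specifies can later be parsed by the agent itself or by downstream functionality.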

# flake8: noqa
SYSTEM_MESSAGE_PREFIX = """Answer the following questions as best you can. You have access to the following tools:"""
FORMAT_INSTRUCTIONS = """The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are: {tool_names}

The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:

>>>
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
>>>
ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action:
>>>
$JSON_BLOB
>>>

Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SYSTEM_MESSAGE_SUFFIX = """Begin! Reminder to always use the exact characters `Final Answer` when responding."""
HUMAN_MESSAGE = "{input}\n\n{agent_scratchpad}"
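
To show what "parsing" means for the format instructions above, here is a rough sketch of a parser for the JSON blob. It assumes the blob is wrapped in the `>>>` delimiters exactly as the prompt shows; the function name `parse_action` and the sample model output are invented for this example and are not LangChain's own parser.

```python
import json
from typing import Any, Tuple


def parse_action(llm_output: str, fence: str = ">>>") -> Tuple[str, Any]:
    """Extract the first fenced JSON blob and return (tool name, tool input)."""
    parts = llm_output.split(fence)
    if len(parts) < 3:
        # Fewer than three parts means there is no opening and closing fence.
        raise ValueError(f"No fenced JSON blob found in: {llm_output!r}")
    blob = json.loads(parts[1].strip())
    return blob["action"], blob["action_input"]


# A made-up model response that follows the format instructions.
sample = """Thought: I need current information.
Action:
>>>
{
  "action": "Bing Search",
  "action_input": "LangChain latest release"
}
>>>"""

tool, tool_input = parse_action(sample)
print(tool, "|", tool_input)
```

Note that the model must emit valid JSON for this to work, so a production parser would also need to handle malformed output.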

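Agents include functions that take the intermediate steps and build a "scratchpad", or context, from them. The scratchpad is passed to the model together with the prompt above so that the model can "think" about the overall process or goal.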

def get_full_inputs(
    self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
) -> Dict[str, Any]:
    """Create the full inputs for the LLMChain from intermediate steps."""
    thoughts = self._construct_scratchpad(intermediate_steps)
    new_inputs = {"agent_scratchpad": thoughts, "stop": self._stop}
    full_inputs = {**kwargs, **new_inputs}
    return full_inputs

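The function above is defined in the Agent class. It builds the input passed to the model, which includes a scratchpad of the agent's thoughts so far, constructed in a way that matches the initial prompt specific to this agent.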
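
The excerpt calls `_construct_scratchpad` without showing it. The following is a standalone sketch of what such a function might do, not the library's code: the `Action` tuple is a hypothetical stand-in for AgentAction, and the `Observation: ` and `Thought: ` prefixes are assumed to match the prompt format shown earlier.

```python
from typing import List, NamedTuple, Tuple


class Action(NamedTuple):
    """Hypothetical stand-in for AgentAction."""

    tool: str
    tool_input: str
    log: str  # the raw model text that led to this action


def construct_scratchpad(
    steps: List[Tuple[Action, str]],
    observation_prefix: str = "Observation: ",
    llm_prefix: str = "Thought: ",
) -> str:
    """Replay each (action, observation) pair as text the model can continue from."""
    scratchpad = ""
    for action, observation in steps:
        scratchpad += action.log
        scratchpad += f"\n{observation_prefix}{observation}\n{llm_prefix}"
    return scratchpad


steps = [
    (
        Action("Search", "capital of France", "Thought: I should search.\nAction: Search"),
        "Paris",
    )
]
print(construct_scratchpad(steps))
```

Ending the scratchpad with the `Thought: ` prefix nudges the model to continue reasoning from where the last observation left off.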

Agents list the tools that a particular agent type can use, which helps with validation later.

@classmethod
def _validate_tools(cls, tools: Sequence[BaseTool]) -> None:
    if len(tools) != 2:
        raise ValueError(f"Exactly two tools must be specified, but got {tools}")
    tool_names = {tool.name for tool in tools}
    if tool_names != {"Lookup", "Search"}:
        raise ValueError(
            f"Tool names should be Lookup and Search, got {tool_names}"
        )

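This snippet comes from an agent definition in LangChain. It checks that the tools passed to the agent are named "Lookup" and "Search", which is what this particular agent type expects.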

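Concepts such as AgentFinish and AgentAction are part of LangChain and are used alongside agents to distinguish between kinds of responses. For example, a response of type AgentFinish indicates that the agent has reached a conclusion.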
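
As a rough illustration of why two response types are useful, the sketch below classifies raw model text into an "act" or "finish" result. The `ActionStep` and `FinishStep` classes are hypothetical stand-ins for AgentAction and AgentFinish, and the plain-text `Action:` / `Action Input:` layout is a simplified format assumed for this example.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class ActionStep:
    """Hypothetical stand-in for AgentAction: the model wants to call a tool."""

    tool: str
    tool_input: str


@dataclass
class FinishStep:
    """Hypothetical stand-in for AgentFinish: the model has its final answer."""

    output: str


FINAL_ANSWER_MARKER = "Final Answer:"
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def classify(llm_output: str) -> Union[ActionStep, FinishStep]:
    """Decide whether the model wants to act or has finished."""
    if FINAL_ANSWER_MARKER in llm_output:
        answer = llm_output.split(FINAL_ANSWER_MARKER)[-1].strip()
        return FinishStep(output=answer)
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse model output: {llm_output!r}")
    return ActionStep(tool=match.group(1).strip(), tool_input=match.group(2).strip())


acting = classify("Thought: I need data.\nAction: Search\nAction Input: weather in Paris")
done = classify("Thought: I now know the final answer\nFinal Answer: It is sunny in Paris.")
print(acting)
print(done)
```

The caller can then branch with `isinstance`, which is the same kind of check the executor code later in this article performs.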

👨‍💻 API

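This part needs little explanation. APIs make it possible to perform all kinds of tasks through external services. LangChain provides a Tool abstraction: you define a function that takes an input, calls an external service using whatever custom logic you need, and returns the output.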

class BingSearchRun(BaseTool):
    """Tool that adds the capability to query the Bing search API."""

    name = "Bing Search"
    description = (
        "A wrapper around Bing Search. "
        "Useful for when you need to answer questions about current events. "
        "Input should be a search query."
    )
    api_wrapper: BingSearchAPIWrapper

    def _run(self, query: str) -> str:
        """Use the tool."""
        return self.api_wrapper.run(query)

    async def _arun(self, query: str) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("BingSearchRun does not support async")

The code above is the Bing search tool. It has the following characteristics:

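A _run function containing the logic that is invoked when the tool is used. You can see that it calls the run function of a BingSearchAPIWrapper, which is where the call to the Bing API is made.
Each tool comes with a natural-language description of when it should be used. This helps the model decide when to pick one tool over another. Plugins in ChatGPT rely on a similar idea.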
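
To see how descriptions reach the model, here is a small sketch that renders tool metadata into prompt text. `SimpleTool` and `render_tools` are hypothetical names used only for this illustration, and the one-line-per-tool layout is an assumption about how the text might be assembled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical minimal tool: a name, a description, and a callable."""

    name: str
    description: str
    func: Callable[[str], str]


def render_tools(tools: List[SimpleTool]) -> str:
    """Turn tool metadata into the text block shown to the model."""
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)


tools = [
    SimpleTool(
        "Search",
        "Useful for questions about current events.",
        lambda query: f"results for {query}",
    ),
    SimpleTool(
        "WordCount",
        "Useful for counting the words in a text.",
        lambda text: str(len(text.split())),
    ),
]

# This text would follow a prefix such as "You have access to the following tools:".
print(render_tools(tools))
# The names alone can fill a placeholder such as {tool_names} in the format instructions.
tool_names = ", ".join(tool.name for tool in tools)
print(tool_names)
```

Because the model only ever sees the name and description, a vague description tends to lead to the wrong tool being picked.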

🧙 Orchestrator

The orchestrator here is the entity that controls the flow of execution between the agent, the user, and the tools. This includes:

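Receiving the user's input.
Passing the input to the agent (the model) along with the right inputs, prompt, and context from past memory.
Getting back an output that indicates which tool to use and with what input.
Calling the required tool or API with that input, getting the response, and returning it to the model.
Passing a natural-language response to the user based on the tool's output.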
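
The steps listed here can be sketched as a small loop. This is a toy version under stated assumptions: `scripted_model` is a hypothetical stand-in for an LLM that searches once and then answers, the text format is simplified, and `run_agent` is an invented name rather than a LangChain function.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, tool input, observation)


def scripted_model(question: str, steps: List[Step]) -> str:
    """Hypothetical stand-in for an LLM: search once, then answer."""
    if not steps:
        return f"Action: Search\nAction Input: {question}"
    return f"Final Answer: {steps[-1][2]}"


def run_agent(
    question: str,
    model: Callable[[str, List[Step]], str],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> str:
    """Alternate between the model and the tools until the model finishes."""
    steps: List[Step] = []
    for _ in range(max_iterations):
        output = model(question, steps)
        if output.startswith("Final Answer:"):
            return output[len("Final Answer:"):].strip()
        # Otherwise the output names a tool and its input.
        tool_line, input_line = output.split("\n", 1)
        tool_name = tool_line.split(":", 1)[1].strip()
        tool_input = input_line.split(":", 1)[1].strip()
        observation = tools[tool_name](tool_input)
        steps.append((tool_name, tool_input, observation))
    return "Stopped: iteration limit reached."


tools = {"Search": lambda query: "Paris is the capital of France."}
answer = run_agent("What is the capital of France?", scripted_model, tools)
print(answer)
```

The iteration cap matters in practice, because a model that never emits a final answer would otherwise loop forever.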

In LangChain, AgentExecutor is our orchestrator, and it performs all of the tasks above.

You initialize it with the type of agent you want and the tools you want the agent to choose from.

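from langchain.agents import AgentType, Tool, initialize_agent

# `search`, `llm`, and `manager` are assumed to be defined elsewhere.
tools = [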
    Tool(
        name = "Current Search",
        func=search.run,
        description="useful for when you need to answer questions about current events or the current state of the world. the input to this should be a single search term."
    ),
]

agent_executor = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    callback_manager=manager,
    verbose=True,
)

AgentExecutor does the following:

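Its functions drive the agent in a loop of interactions with the tools. The code below shows the _call function, which starts the loop, and the _take_next_step function, which returns the agent's output.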

def _call(self, inputs: Dict[str, str]) -> Dict[str, Any]:
    """Run text through and get agent response."""
    ...
    # We now enter the agent loop (until it returns something).
    while self._should_continue(iterations, time_elapsed):
        next_step_output = self._take_next_step(
            name_to_tool_map, color_mapping, inputs, intermediate_steps
        )


def _take_next_step(
    ...
) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:
    """Take a single step in the thought-action-observation loop.

    Override this to take control of how the agent makes and acts on choices.
    """
    # Call the LLM to see what to do.
    output = self.agent.plan(intermediate_steps, **inputs)
    # If the tool chosen is the finishing tool, then we end and return.
    if isinstance(output, AgentFinish):
        return output
    ...

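It saves the intermediate steps so that they are available on every request to the agent (the model), which answers the question based on those steps.
It validates that the tools provided are supported or required by the selected agent.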
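
The `_call` excerpt relies on `_should_continue(iterations, time_elapsed)` without showing its body. One plausible shape for such a guard is sketched below; the parameter names `max_iterations` and `max_execution_time` and their defaults are assumptions for illustration, so check the library source for the real behavior.

```python
import time
from typing import Optional


def should_continue(
    iterations: int,
    time_elapsed: float,
    max_iterations: Optional[int] = 15,
    max_execution_time: Optional[float] = None,
) -> bool:
    """Return False once either the step budget or the time budget is used up."""
    if max_iterations is not None and iterations >= max_iterations:
        return False
    if max_execution_time is not None and time_elapsed >= max_execution_time:
        return False
    return True


# Drive a dummy loop with a budget of three steps.
iterations = 0
start = time.monotonic()
while should_continue(iterations, time.monotonic() - start, max_iterations=3):
    iterations += 1  # a real loop would take one agent step here
print(iterations)
```

Passing `None` for a limit disables it, which keeps the step budget and the time budget independent of each other.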

Summary

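With these concepts in hand, you can now make sense of the many agent types LangChain provides and start experimenting with different tools to see how they work together.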

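You can also apply what you have learned to the new agent-based applications that have appeared recently. If you want to understand the features and implementation details of other tools, so that you can see how they work and modify them as needed, take a look at the LangChain blog, which describes the features and some implementation details of popular applications such as Auto-GPT and BabyAGI.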