Building an AI Agent with Ruby › Tool Calling - Rei

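In the previous section, on the message loop, we built a minimal working Chat CLI script. Its abilities are very limited: it can only answer the user's questions with text. In this section we give the Chat CLI the ability to call tools, so that it can fetch outside information and actually carry out tasks.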

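How it works

The OpenAI API provides a Tool Call feature. When tool definitions are included in the chat request parameters, the model decides on its own whether it needs to call a tool to get extra information or perform an action. The author of the AI Agent has to supply both the tool definitions and the tool implementations.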

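The flow is as follows:

The Agent sends a request that includes the definitions of the available Tools.
The LLM returns Tool Call instructions when it needs them.
On receiving a Tool Call instruction, the Agent decides whether to execute it and how.
After executing the Tool, the Agent creates a message from the result.
The Agent sends a new request with the Tool Call result message added to the conversation.
The LLM either outputs a final answer and ends the exchange, or returns further Tool Call instructions.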
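
The steps above can be sketched without any network access. In the minimal sketch below, fake_llm, DEMO_TOOLS and run_agent are hypothetical names made up for illustration; fake_llm stands in for the real API and simply asks for a tool once, then answers.

```ruby
require "json"

# Hypothetical stand-in for the LLM: it requests a tool on the first round
# and answers once a tool result is present in the conversation.
def fake_llm(messages, tools)
  tool_result = messages.find { |m| m[:role] == "tool" }
  if tool_result
    { role: "assistant", content: "Answer based on: #{tool_result[:content]}" }
  else
    {
      role: "assistant",
      content: nil,
      tool_calls: [
        {
          id: "call_1",
          type: "function",
          function: { name: tools.first[:name], arguments: JSON.generate({ city: "Beijing" }) }
        }
      ]
    }
  end
end

# Hypothetical tool registry: a name plus an implementation.
DEMO_TOOLS = [
  { name: "get_weather", run: ->(args) { "Sunny in #{args["city"]}" } }
].freeze

def run_agent(question)
  messages = [{ role: "user", content: question }]
  loop do
    # Steps 1-2: send the conversation with the tools, receive the reply
    reply = fake_llm(messages, DEMO_TOOLS)
    messages << reply
    # Step 6: no tool calls means the model has produced its final answer
    return reply[:content] unless reply[:tool_calls]

    # Steps 3-4: execute each requested tool and turn the result into a message
    reply[:tool_calls].each do |call|
      tool = DEMO_TOOLS.find { |t| t[:name] == call[:function][:name] }
      args = JSON.parse(call[:function][:arguments])
      messages << { role: "tool", tool_call_id: call[:id], content: tool[:run].call(args) }
    end
    # Step 5: the next loop iteration sends the extended conversation
  end
end

puts run_agent("How is the weather in Beijing?")
```

The real API replaces fake_llm; the shape of the loop stays the same.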

API example

When sending a request, add the Tools definitions to the request body:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "How is the weather in Beijing today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for the specified city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {
                "type": "string",
                "description": "City name, e.g. Beijing"
              }
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
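
The same request body can also be assembled as a Ruby hash and serialized, which avoids hand-written JSON mistakes such as stray commas. This is only a sketch: function_tool is a hypothetical helper, not part of any library, and the body is printed rather than sent.

```ruby
require "json"

# Hypothetical helper: wraps a name, a description and a JSON Schema
# into the tool definition shape used in the request.
def function_tool(name:, description:, parameters:)
  { type: "function", function: { name: name, description: description, parameters: parameters } }
end

weather_tool = function_tool(
  name: "get_weather",
  description: "Get the current weather for the specified city",
  parameters: {
    type: "object",
    properties: { city: { type: "string", description: "City name, e.g. Beijing" } },
    required: ["city"]
  }
)

request_body = {
  model: "gpt-4",
  messages: [{ role: "user", content: "How is the weather in Beijing today?" }],
  tools: [weather_tool],
  tool_choice: "auto"
}

# JSON.pretty_generate always emits valid JSON for this hash
puts JSON.pretty_generate(request_body)
```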

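Where:

"tools" lists the available tools; each tool's parameters are described with JSON Schema.
"name" is the tool's name.
"description" explains what the tool is for.
"parameters" defines the format of the tool's arguments.
"tool_choice": "auto" lets the LLM decide for itself when to use a Tool.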

After receiving the request, the API returns something like this:

{
  "id": "chatcmpl-abc123example",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Beijing\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 78,
    "completion_tokens": 20,
    "total_tokens": 98
  }
}

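In the example above, the LLM did not answer the user's question directly. Instead it returned "tool_calls", telling the Agent that this Tool must be called before it can continue answering. Note that "arguments" is itself a JSON-encoded string, so it has to be parsed before use.

When the Agent receives a response containing "tool_calls", it decides whether to call the corresponding Tool, then submits the result as a message in the next round of the conversation.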
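
As a rough illustration of reading such a response, the sketch below parses a trimmed copy of it and decodes each call's arguments. extract_tool_calls is a hypothetical helper, and the sample keeps only the fields it uses.

```ruby
require "json"

# A trimmed sample response; the single-quoted heredoc keeps the
# backslash escapes inside "arguments" exactly as the API sends them.
raw_response = <<~'JSON'
  {
    "choices": [
      {
        "message": {
          "role": "assistant",
          "content": null,
          "tool_calls": [
            {
              "id": "call_abc123",
              "type": "function",
              "function": { "name": "get_weather", "arguments": "{\"city\":\"Beijing\"}" }
            }
          ]
        },
        "finish_reason": "tool_calls"
      }
    ]
  }
JSON

# Hypothetical helper: returns [id, name, parsed_arguments] for each requested call.
def extract_tool_calls(response)
  message = response.dig("choices", 0, "message") || {}
  (message["tool_calls"] || []).map do |call|
    # "arguments" is a JSON string inside the JSON response, so parse it again
    [call["id"], call.dig("function", "name"), JSON.parse(call.dig("function", "arguments"))]
  end
end

response = JSON.parse(raw_response)
extract_tool_calls(response).each do |id, name, args|
  puts "#{id}: #{name} #{args}"
end
```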

For example, after calling the get_weather tool (implemented by the Agent), request the API again:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "How is the weather in Beijing today?"},
      {
        "role": "assistant",
        "content": null,
        "tool_calls": [{
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}"
          }
        }]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "The current weather in Beijing is sunny, 25 degrees Celsius."
      }
    ],
    "tools": [
      // same as above
    ]
  }'

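Where:

The first two messages are the conversation history so far.
The third message, with "role": "tool", submits the result of the tool call; its "tool_call_id" matches the "id" of the call it answers.
The "content" of the result has no required format. Any plain text works, since LLMs are very tolerant of their input.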
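
Because the content only needs to be text, a tool that returns structured data can serialize it first. A minimal sketch, where tool_message is a hypothetical helper name:

```ruby
require "json"

# Hypothetical helper: turns any tool return value into a "tool" message.
# Strings pass through unchanged; other values are serialized to JSON text.
def tool_message(tool_call_id, result)
  content = result.is_a?(String) ? result : JSON.generate(result)
  { role: "tool", tool_call_id: tool_call_id, content: content }
end

puts tool_message("call_abc123", "The current weather in Beijing is sunny.")[:content]
puts tool_message("call_abc123", { condition: "sunny", temperature_c: 25 })[:content]
```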

After receiving the Tool Call result, the API returns the following:

{
  "id": "chatcmpl-xyz789example",
  "object": "chat.completion",
  "created": 1712345680,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "It is sunny in Beijing today, with a temperature of 25℃. A great day for outdoor activities!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 18,
    "total_tokens": 138
  }
}

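That completes one round of the conversation: question -> tool call -> submit result -> answer.

Through tool calling, the LLM obtained external information that the model does not have on its own, and answered the user's question better.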

Ruby code

Next we modify our Chat CLI to implement the tool calling flow above.

First, add the tool definition for get_weather:

TOOLS = [
  {
    name: "get_weather",
    schema: {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for the specified city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name, e.g. Beijing"
            }
          },
          "required": ["city"]
        }
      }
    },
    run: lambda do |args|
      city = args["city"]
      # Replace this with a real weather API call; for now it returns mock data
      "The current weather in #{city} is sunny, 25°C."
    end
  }
]

Add a method to handle tool_calls:

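def execute_tool_calls(tool_calls)
  tool_calls.map do |tool_call|
    name = tool_call[:function][:name]
    # Find the tool that matches the tool call
    tool = TOOLS.find { |t| t[:name] == name }

    if tool
      # Print the tool call for the user
      puts "[Tool call] #{name} with arguments: #{tool_call[:function][:arguments]}"

      # Run the tool; the arguments arrive as a JSON string
      args = JSON.parse(tool_call[:function][:arguments])
      result = tool[:run].call(args)
    else
      # Every tool call needs a matching tool message, so report the
      # problem to the model instead of dropping the call
      result = "Error: unknown tool #{name}"
    end

    # Return the result as a tool message for the conversation history
    {
      role: "tool",
      tool_call_id: tool_call[:id],
      content: result
    }
  end
end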
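
The method above assumes that the arguments are valid JSON and that the tool itself does not raise. One possible way to harden this, sketched here with a hypothetical safe_run wrapper and a made-up echo tool, is to turn failures into text that can still be sent back to the model as the tool result:

```ruby
require "json"

# Hypothetical wrapper: runs one tool and always returns a string,
# so every tool call can be answered even when something fails.
def safe_run(tool, raw_arguments)
  args = JSON.parse(raw_arguments)
  tool[:run].call(args).to_s
rescue JSON::ParserError => e
  "Error: arguments were not valid JSON (#{e.message})"
rescue StandardError => e
  "Error: tool #{tool[:name]} failed (#{e.class}: #{e.message})"
end

# Made-up tool for the demonstration; fetch raises KeyError when "text" is missing
echo = { name: "echo", run: ->(args) { args.fetch("text") } }

puts safe_run(echo, '{"text":"hello"}') # normal call
puts safe_run(echo, '{"text":')         # malformed JSON from the model
puts safe_run(echo, '{}')               # the tool itself raises
```

Returning the error as the tool result lets the model see what went wrong and retry with corrected arguments.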

When calling the API, pass the Tools parameter and add the Tools handling flow:

  # Inner loop: keep sending tool results back to the model until it returns a text reply
  loop do
    response = openai.chat.completions.create(
      model: "gpt-4",
      messages: messages,
      tools: TOOLS.map { |t| t[:schema] }
    )

    choice = response[:choices][0]
    message = choice[:message]

    if message[:tool_calls]
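      # Store the assistant's tool call message in the history
      messages << message

      # Run the tools and append the results to the history
      tool_results = execute_tool_calls(message[:tool_calls])
      messages.concat(tool_results)

      # Keep looping so the model can produce a final reply from the tool results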
    else
      content = message[:content] || ""
      puts "AI> #{content}"
      messages << message
      break
    end
  end

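The full code is at the end of this article.

Run the script and ask about the weather. You can see that the Agent called the tool, and the LLM then gave a more complete answer based on the tool result: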

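Me> How is the weather in Beijing
[Tool call] get_weather with arguments: {"city": "Beijing"}
AI> Here is the weather in Beijing:

- **Conditions**: ☀️ Sunny
- **Current temperature**: 🌡️ 25°C

The weather is nice, good for going out! Is there anything else I can help with?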

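Security concerns

The example above used a mock weather tool, which looks of limited use. You can give your Agent more powerful tools, for example:

Network access and web search.
Reading and writing local files.
Executing shell commands.

As tools are added, the Agent becomes more capable, but security problems come with that:

Endless tool call loops that consume large amounts of tokens.
Leaking private information without the user's knowledge.
Executing dangerous code, such as deleting important files.
Prompt injection from external sources.

There are some common mitigations for these problems:

Limit the maximum number of tool call iterations.
Require user approval before a tool is used.
Run the Agent in a sandbox.
Only interact with trusted external systems.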
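
The first two mitigations can be sketched in a few lines. ToolGuard and its methods are hypothetical names, not part of the chat script or any library; a real Agent would likely ask the user interactively instead of using the fixed approval rule shown here.

```ruby
# Hypothetical guard: counts tool-call rounds and consults an approval
# callback before each tool runs.
class ToolGuard
  class LimitExceeded < StandardError; end

  def initialize(max_rounds:, approve:)
    @max_rounds = max_rounds
    @approve = approve
    @rounds = 0
  end

  # Call once per model response that contains tool calls.
  def next_round!
    @rounds += 1
    raise LimitExceeded, "more than #{@max_rounds} tool-call rounds" if @rounds > @max_rounds
  end

  # Runs the block only if the callback approves; otherwise returns a refusal
  # string that can be sent back to the model as the tool result.
  def run(name, args)
    return "Denied: the user did not approve #{name}" unless @approve.call(name, args)

    yield
  end
end

# Stand-in for a real prompt: approve the read-only tool, deny everything else
guard = ToolGuard.new(max_rounds: 2, approve: ->(name, _args) { name == "get_weather" })

guard.next_round!
puts guard.run("get_weather", { "city" => "Beijing" }) { "Sunny, 25°C" }
puts guard.run("run_shell", { "cmd" => "ls" }) { "should not run" }
```

In the chat loop, next_round! would be called each time the model returns tool calls, and run would wrap each tool invocation.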

AI Agents are still a fairly new field, though. Security problems and their countermeasures will keep evolving, so we need to keep learning.

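Summary

In this section we used a weather lookup example to learn how an AI Agent makes tool calls and thereby extends its capabilities.

In the next section we will look at system prompts (tentative).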

Full code

require "bundler/inline"
require "readline"
require "json"

gemfile do
  source "https://rubygems.org"

  gem "openai"
end

openai = OpenAI::Client.new(
  api_key: ENV["OPENAI_API_KEY"],
)

TOOLS = [
  {
    name: "get_weather",
    schema: {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for the specified city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name, e.g. Beijing"
            }
          },
          "required": ["city"]
        }
      }
    },
    run: lambda do |args|
      city = args["city"]
      # Replace this with a real weather API call; for now it returns mock data
      "The current weather in #{city} is sunny, 25°C."
    end
  }
]

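def execute_tool_calls(tool_calls)
  tool_calls.map do |tool_call|
    name = tool_call[:function][:name]
    # Find the tool that matches the tool call
    tool = TOOLS.find { |t| t[:name] == name }

    if tool
      # Print the tool call for the user
      puts "[Tool call] #{name} with arguments: #{tool_call[:function][:arguments]}"

      # Run the tool; the arguments arrive as a JSON string
      args = JSON.parse(tool_call[:function][:arguments])
      result = tool[:run].call(args)
    else
      # Every tool call needs a matching tool message, so report the
      # problem to the model instead of dropping the call
      result = "Error: unknown tool #{name}"
    end

    # Return the result as a tool message for the conversation history
    {
      role: "tool",
      tool_call_id: tool_call[:id],
      content: result
    }
  end
end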

messages = []

loop do
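  input = Readline.readline("Me> ")
  # Readline returns nil at end of input (Ctrl-D); leave the chat loop
  break if input.nil?

  messages << { role: "user", content: input }

  # Inner loop: keep sending tool results back to the model until it returns a text reply
  loop do
    response = openai.chat.completions.create(
      model: "gpt-4",
      messages: messages,
      tools: TOOLS.map { |t| t[:schema] }
    )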

    choice = response[:choices][0]
    message = choice[:message]

    if message[:tool_calls]
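      # Store the assistant's tool call message in the history
      messages << message

      # Run the tools and append the results to the history
      tool_results = execute_tool_calls(message[:tool_calls])
      messages.concat(tool_results)

      # Keep looping so the model can produce a final reply from the tool results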
    else
      content = message[:content] || ""
      puts "AI> #{content}"
      messages << message
      break
    end
  end
end