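AI Agent Interview Questions: Function Calling

Source: AiGC面试宝典 (AiGC Interview Handbook)

Just do it! May 17, 2024, 23:15

Function Calling

1. Why do we need function calling?
2. What is function calling?
3. What is the purpose of function calling?
4. How is function calling used?
5. How do you perform function calling through the API?
- 5.1 How do you build a news bot with function calling?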

1. Why do we need function calling?

Imagine this scenario: you build an AI assistant and tell it, "Buy me a train ticket to Beijing and remind me one day in advance!" Plain ChatGPT will flatly refuse.

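[Image description]: A screenshot of a chat interface.
User input: "Buy me a train ticket to Beijing and remind me one day in advance"
AI reply (green icon): I'm sorry, I can't buy train tickets for you, because I am a text model and cannot make online purchases or set reminders. However, I can tell you how to buy a train ticket yourself and set a reminder.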

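ChatGPT is one of the most capable models available, yet even though it understands that you want it to buy a ticket, it has no way to actually do so; that is the ceiling of what a text-only model can do. Fortunately, OpenAI added a powerful feature to its GPT models: function calling.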

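📝 In plain terms: think of ChatGPT as a very knowledgeable scholar who never lifts a finger. It can answer many questions, but if you ask it to book a ticket, check the weather, or control your appliances, it can't. Function calling is like handing this scholar a set of tools so it can direct other tools to get things done for you.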

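Some of you may already have used ChatGPT's Plugins feature; Plugins work on a similar principle to function calling.

In this article we will build a GPT that fetches the latest news, to show how function calling is used and to get a deeper understanding of the concept and the possibilities it opens up.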

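2. What is function calling?

Function calling is a feature OpenAI introduced with the gpt-4-0613 and gpt-3.5-turbo-0613 models (later models support it as well). These models have been trained to detect, based on the user's prompt, whether one of the functions supplied by the developer should be called, and if so to respond with a well-defined JSON structure instead of ordinary text.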

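📝 In plain terms: in an ordinary conversation the user says something and the AI answers directly. With function calling, the AI first decides whether it needs to "call in a helper"; if so, it tells the developer in a standard format which function it wants to call and with what arguments.

Function calling lets ChatGPT exchange information with other systems, so that it can answer questions it otherwise could not. For example, to report the current weather, ChatGPT needs real-time data it does not have, so that data has to come from another service. In other words, function calling gives us a way to teach an AI model how to interact with external systems.

📝 In plain terms: function calling is like giving the AI a "universal remote control" that lets it drive external systems (weather services, ticketing systems, databases, and so on) to fetch real-world information or carry out real actions.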

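3. What is the purpose of function calling?

Function calling extends what GPT models can do. There are two other common ways to extend a GPT model's capabilities:

📝 In plain terms: GPT on its own is a brain whose knowledge has a cutoff date, and it cannot browse the web or operate other software by itself. Function calling gives that brain "hands" and "eyes" so it can reach the outside world.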

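Fine-tuning: further training the model on labeled data. Fine-tuning, however, takes considerable time and effort to prepare the training data.
📝 In plain terms: like a master craftsman training an apprentice by hand, you teach the model a specific skill through many examples, which requires a lot of training data and compute.
Embeddings: build a knowledge base for the bot and retrieve the knowledge-base content relevant to the current context, so GPT can give richer answers.
📝 In plain terms: like giving the AI an encyclopedia to consult, so it can look up more specialized material when answering. This knowledge is static, though, and is not updated in real time.

Function calling is a third way to extend GPT, and it differs from the other two: it lets the model interact with external systems.

📝 In plain terms: the first two methods "feed knowledge" to the AI; the third gives it the ability to "make a phone call", so whenever it needs information it can ask an external system, in real time.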

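4. How is function calling used?

Without function calling, the pattern for building an AI application on GPT is very simple. The main steps are:

The user (client) sends a request to our service (chat server);
Our service (chat server) sends the prompt to GPT and returns GPT's response to the client;
Repeat.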

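[Diagram description: a simple interaction sequence diagram]
The client sends the message "what's weather like today?" to the chat server.
The chat server sends the prompt "what's weather like today?" to the GPT API.
The GPT API returns the response "I don't know..." to the chat server.
The chat server returns "I don't know..." to the client.

📝 In plain terms: this is the most basic question-and-answer pattern: the user asks, the AI answers. But the AI has no real-time information, so all it can say is "I don't know".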

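With function calling, the flow is a bit more involved:

Send the user's prompt together with the functions that can be called;
Based on the prompt, the GPT model decides whether to respond with ordinary text or with a function call;
If it responds with a function call, the chat server executes that function and sends the result back to GPT;
The model then uses the returned data to produce a coherent text response.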

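[Diagram description: a more complex sequence diagram that includes a function call]
The client sends the message "what's weather like today?" to the chat server.
The chat server sends the prompt "what's weather like today?" to the GPT API, together with the available functions: 1. get_location() 2. get_weather(location).
The GPT API returns the instruction "Call get_weather(location)" to the chat server.
The chat server executes the function call.
The chat server sends the result "Raining" to the GPT API.
The GPT API returns the response "It is raining today, don't forget your umbrella." to the chat server.
The chat server returns "It is raining today, don't forget your umbrella." to the client.

📝 In plain terms: the process works like a team with a clear division of labor: the user states a need → the AI decides which tool is required → the server runs the tool → the result goes back to the AI → the AI turns it into plain language for the user. The AI acts as both "commander" and "translator".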
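To make the four steps concrete, here is a minimal offline sketch. fake_model is a hypothetical stand-in for the GPT API that first asks for get_weather and then phrases the result; get_weather is likewise a hypothetical stub returning canned data. Only the message shapes mirror the format used later in this article; a real model decides for itself whether to call a function.

```python
import json


def get_weather(location: str) -> str:
    """Hypothetical local function; a real one would query a weather service."""
    return "Raining"


def fake_model(messages: list, functions: list) -> dict:
    """Hypothetical stand-in for the GPT API so the flow runs offline.

    A real model would choose among `functions`; this fake always picks get_weather.
    """
    last = messages[-1]
    if last["role"] == "function":
        # Step 4: turn the function result into a natural-language answer
        return {"role": "assistant",
                "content": f"It is {last['content'].lower()} today, don't forget your umbrella."}
    # Step 2: decide that a function call is needed and say which one
    return {"role": "assistant", "content": None,
            "function_call": {"name": "get_weather",
                              "arguments": json.dumps({"location": "New York, NY"})}}


def chat(user_text: str) -> str:
    # Functions the chat server advertises, and the Python code behind each name
    functions = [{"name": "get_weather",
                  "parameters": {"type": "object",
                                 "properties": {"location": {"type": "string"}},
                                 "required": ["location"]}}]
    local_functions = {"get_weather": get_weather}

    # Step 1: send the prompt together with the callable functions
    messages = [{"role": "user", "content": user_text}]
    reply = fake_model(messages, functions)
    messages.append(reply)

    if reply.get("function_call"):
        # Step 3: the chat server, not the model, runs the function ...
        call = reply["function_call"]
        result = local_functions[call["name"]](**json.loads(call["arguments"]))
        # ... and sends the result back with the "function" role
        messages.append({"role": "function", "name": call["name"], "content": result})
        reply = fake_model(messages, functions)
        messages.append(reply)
    return reply["content"]


print(chat("what's weather like today?"))
```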

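5. How do you perform function calling through the API?

When calling GPT we normally use the Chat Completions endpoint (https://api.openai.com/v1/chat/completions). It accepts a POST request whose body looks like this: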

```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What's weather like today?"
    }
  ]
}
```
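For reference, the same request body can be assembled in Python with only the standard library. build_chat_request is a hypothetical helper; the endpoint URL and the "Authorization: Bearer" header follow OpenAI's public Chat Completions API, but this sketch only prepares the request and does not send it.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key, messages, model="gpt-3.5-turbo", functions=None):
    """Hypothetical helper: prepare (but do not send) a Chat Completions POST request."""
    body = {"model": model, "messages": messages}
    if functions:
        # Only include "functions" when there is something for the model to call
        body["functions"] = functions
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_chat_request("sk-xxx", [{"role": "user", "content": "What's weather like today?"}])
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing the prepared request to urllib.request.urlopen would actually send it.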

GPT may return something like this:

```json
{
  "id": "chatcmpl-FWVo3hYwjrpAptzepU46JamvvgBzb",
  "object": "chat.completion",
  "created": 1687983115,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm sorry, but I don't have access to real-time information, including current weather conditions or forecasts. To find out the weather for your location today, I recommend using a weather website, app, or a voice-activated assistant like Siri, Google Assistant, or Alexa. Simply ask one of these services for the weather in your area, and they should be able to provide you with up-to-date information."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 44,
    "total_tokens": 59
  }
}
```

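📝 In plain terms: this response shows that GPT can only honestly tell you it doesn't know today's weather, because its training data has a cutoff date and it cannot go online by itself.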
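A sketch of how client code might read such a response: take the first choice and branch on finish_reason, which is "stop" for a plain text answer and "function_call" when the model wants a function run. read_reply is a hypothetical helper, and sample is an abridged copy of the response above.

```python
def read_reply(response: dict):
    """Hypothetical helper: return (finish_reason, message) from the first choice."""
    choice = response["choices"][0]
    return choice["finish_reason"], choice["message"]


# Abridged copy of the response shown above
sample = {
    "object": "chat.completion",
    "choices": [{"index": 0,
                 "message": {"role": "assistant",
                             "content": "I'm sorry, but I don't have access to real-time information..."},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 15, "completion_tokens": 44, "total_tokens": 59},
}

reason, message = read_reply(sample)
if reason == "stop":
    print(message["content"])  # ordinary text answer
elif reason == "function_call":
    print("model wants to call:", message["function_call"]["name"])
```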

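The API keeps no conversation state, so to preserve context we must send the entire message history with every request. For example, if we want to continue the previous conversation, the JSON would be: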

```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What's weather like today?"
    },
    {
      "role": "assistant",
      "content": "I'm sorry, but I don't have access to real-time information..."
    },
    {
      "role": "user",
      "content": "how do I access these apps?"
    }
  ]
}
```

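📝 In plain terms: this means sending the whole conversation log to the AI so it remembers what was said before. That keeps the conversation coherent instead of starting from scratch with every message.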
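Because the API is stateless, the client has to keep the history itself. A minimal sketch, assuming a hypothetical Conversation class that stores the messages and builds the request body:

```python
class Conversation:
    """Hypothetical helper that keeps the full message history for one chat."""

    def __init__(self, model="gpt-3.5-turbo"):
        self.model = model
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

    def payload(self):
        # Every request carries the whole history; copy it so callers can't mutate ours
        return {"model": self.model, "messages": list(self.messages)}


conv = Conversation()
conv.add_user("What's weather like today?")
conv.add_assistant("I'm sorry, but I don't have access to real-time information...")
conv.add_user("how do I access these apps?")
print([m["role"] for m in conv.payload()["messages"]])
```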

The example above makes it clear that ChatGPT cannot look up real-time information. Next, we add function calling so that it can:

```json
{
  "model": "gpt-3.5-turbo-0613",
  "messages": [
    {
      "role": "user",
      "content": "How is the weather in NYC?"
    }
  ],
  "functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
}
```

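📝 In plain terms: this request has an extra functions field, which tells GPT "here is a weather-lookup tool you can call if you need it". The function definition states the tool's name, what it does, and its parameters (described as a JSON Schema object).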
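The model generates arguments from this schema, but the 0613 models do not guarantee that they match it: a required key can be missing, an extra key can appear, or a value can fall outside the enum. A minimal sketch of a check the chat server could run before executing anything, assuming a hypothetical check_arguments helper. It covers only required keys, unknown keys, string types and enums; it is not a full JSON Schema validator.

```python
import json

weather_spec = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}


def check_arguments(spec: dict, arguments: str) -> dict:
    """Hypothetical minimal check of model-produced arguments against a function spec."""
    params = spec["parameters"]
    args = json.loads(arguments)  # "arguments" arrives as a JSON-encoded string
    missing = [k for k in params.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"missing required argument(s): {missing}")
    for key, value in args.items():
        prop = params["properties"].get(key)
        if prop is None:
            raise ValueError(f"unexpected argument: {key}")
        if prop.get("type") == "string" and not isinstance(value, str):
            raise ValueError(f"{key} must be a string")
        if "enum" in prop and value not in prop["enum"]:
            raise ValueError(f"{key} must be one of {prop['enum']}")
    return args


print(check_arguments(weather_spec, '{"location": "New York, NY", "unit": "celsius"}'))
```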

When the GPT model decides to call one of the functions we provided, we receive a response similar to this:

```json
{
  "id": "chatcmpl-7WWG94C1DCF1Ak5xmUwrZ900hFn0q",
  "object": "chat.completion",
  "created": 1687984857,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"New York, NY\"\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ]
}
```

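📝 In plain terms: this response is different! Instead of answering directly, GPT says "I need to call get_current_weather with the argument New York, NY". It is effectively handing out a task that the developer has to carry out. Note that arguments is a JSON-encoded string, not an object, so it has to be parsed before use.

get_current_weather is then called with the returned arguments. OpenAI does not execute the function; our service executes it, obtains the result, and sends it back to OpenAI.

📝 In plain terms: the key point is that the AI only issues the instruction; the code actually runs in the developer's server program. The AI does not look up the weather itself; the developer's code does, and then feeds the result back to the AI.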
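On the server side, a common pattern is a registry that maps each advertised function name to the Python function that implements it, so the arguments string is parsed and dispatched in one place. The sketch below assumes a hypothetical stub get_current_weather that returns canned data; errors are reported back as the function result instead of crashing the server.

```python
import json


def get_current_weather(location: str, unit: str = "fahrenheit") -> str:
    """Hypothetical stub; a real version would call a weather service."""
    return "Temperature: 57F, Condition: Raining"


# Map the names advertised in "functions" to the callables that implement them
FUNCTION_REGISTRY = {"get_current_weather": get_current_weather}


def execute_function_call(message: dict) -> dict:
    """Run the function requested in an assistant message and wrap the result
    as a "function" role message for the next request."""
    call = message["function_call"]
    name = call["name"]
    func = FUNCTION_REGISTRY.get(name)
    if func is None:
        content = f"Error: unknown function {name}"
    else:
        try:
            content = func(**json.loads(call["arguments"]))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or unexpected parameters: report back instead of crashing
            content = f"Error: {exc}"
    return {"role": "function", "name": name, "content": content}


assistant_msg = {
    "role": "assistant",
    "content": None,
    "function_call": {"name": "get_current_weather",
                      "arguments": "{\n  \"location\": \"New York, NY\"\n}"},
}
print(execute_function_call(assistant_msg))
```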

Once we have retrieved the weather data, we send it back to the model using a new role called function. For example:

```json
{
  "model": "gpt-3.5-turbo-0613",
  "messages": [
    {
      "role": "user",
      "content": "How is the weather in NYC?"
    },
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "get_current_weather",
        "arguments": "{\n  \"location\": \"New York, NY\"\n}"
      }
    },
    {
      "role": "function",
      "name": "get_current_weather",
      "content": "Temperature: 57F, Condition: Raining"
    }
  ],
  "functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
}
```

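📝 In plain terms: here we send the lookup result, "57°F, raining", back in a message with the function role. Once GPT receives it, it can phrase an answer for the user.

Note that we pass the entire message history to the API, including the original prompt, the model's function call, and the result of running the weather function in our code. This lets the model understand the context in which the function was called.

📝 In plain terms: this is like giving the AI a complete "task log" so it knows what the user asked, what it decided, which operation was executed, and what came back. Only then can it give a coherent answer.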
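Since the model reads the function result in light of the call that preceded it, a function message should directly follow the assistant message that requested it. A hypothetical sanity check along those lines, useful for example after old messages have been trimmed from the front of the history:

```python
def check_function_turns(messages: list) -> bool:
    """Hypothetical check: each "function" message must directly follow an
    assistant message that requested a function with the same name."""
    for i, msg in enumerate(messages):
        if msg["role"] != "function":
            continue
        prev = messages[i - 1] if i > 0 else {}
        call = prev.get("function_call") or {}
        if prev.get("role") != "assistant" or call.get("name") != msg.get("name"):
            return False
    return True


history = [
    {"role": "user", "content": "How is the weather in NYC?"},
    {"role": "assistant", "content": None,
     "function_call": {"name": "get_current_weather",
                       "arguments": "{\"location\": \"New York, NY\"}"}},
    {"role": "function", "name": "get_current_weather",
     "content": "Temperature: 57F, Condition: Raining"},
]
print(check_function_turns(history))      # complete history
print(check_function_turns(history[2:]))  # history trimmed from the front
```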

Finally, the model can reply with a properly phrased answer to our original question:

```json
{
  "id": "chatcmpl-7WWQUccvLUfjhbIcuvFrj2MDJVEiN",
  "object": "chat.completion",
  "created": 1687985498,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The weather in New York City is currently raining with a temperature of 57 degrees Fahrenheit."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 119,
    "completion_tokens": 19,
    "total_tokens": 138
  }
}
```

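📝 In plain terms: done! The AI can finally tell you "It's raining in New York right now, 57°F". The whole process is: the user asks → the AI decides it needs the weather → the server looks it up → the result goes to the AI → the AI phrases the answer.

That covers the format of the data exchanged during a function call. Next, we use a concrete example to build a simple function-calling application in Python.

5.1 How do you build a news bot with function calling?

To build this, we need an OpenAI API key, which you can get at https://platform.openai.com/account/api-keys. With the key in hand, install a few Python packages. The code in this section uses the legacy openai.ChatCompletion interface, which was removed in version 1.0 of the openai package, so pin an older version:

```bash
$ pip install "openai<1.0" tiktoken requests
```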

Then import the dependencies:

```python
import openai
import tiktoken
import json
import os
import requests
```

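📝 In plain terms: these libraries are openai (calls the GPT API), tiktoken (counts tokens so we stay under the limit), json (handles the data format), os (access to environment variables) and requests (calls the news API).

Next, define a few constants: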

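```python
llm_model = "gpt-3.5-turbo-16k"                 # GPT model to use
llm_max_tokens = 15500                          # token budget, kept below the model's 16k context window
llm_system_prompt = "You are an assistant that provides news and headlines to user requests. Always try to get the latest breaking stories using the available function calls."
encoding_model_messages = "gpt-3.5-turbo-0613"  # model name used to look up the tokenizer for messages
encoding_model_strings = "cl100k_base"          # fallback tokenizer encoding
function_call_limit = 3                         # max number of chained function calls per question
openai.api_key = os.environ.get("OPENAI_API_KEY", "sk-xxx")  # your OpenAI API key
zsxq_cookie = "zsxq_access_token=xxx"           # Knowledge Planet (zsxq) cookie; optional, only needed by get_zsxq_article, whose code the article does not show
news_key = os.environ.get("NEWS_API_KEY", "")   # newsapi.org API key
```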

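📝 In plain terms: these are configuration settings. The token limit matters because GPT can only process input up to a certain length; going over it causes an error. function_call_limit keeps the AI from getting stuck in an endless loop of function calls.

Every GPT model has a token limit. If a request exceeds it, the API returns an error instead of a response. So we need a function that counts tokens: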

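```python
def num_tokens_from_messages(messages):
    """Return an estimate of the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(encoding_model_messages)
    except KeyError:
        # Unknown model name: fall back to the cl100k_base encoding
        encoding = tiktoken.get_encoding(encoding_model_strings)

    num_tokens = 0
    for message in messages:
        # Per-message overhead of the chat format: <im_start>{role/name}\n{content}<im_end>\n
        num_tokens += 4
        for key, value in message.items():
            if value is None:  # e.g. "content": null on a function-call message
                continue
            num_tokens += len(encoding.encode(str(value)))
            if key == "name":
                # When a name is present the role is omitted, saving one token
                num_tokens -= 1
    # Every reply is primed with <im_start>assistant (counted once, not per message)
    num_tokens += 2
    return num_tokens
```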

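📝 In plain terms: this function counts how many "tokens" we are sending. GPT is billed and limited by tokens, so when we go over the limit we delete some of the oldest messages, much like clearing out an overly long chat history.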
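The complete function in this article trims history by dropping the oldest message until the count fits. One pitfall of that approach is that it can leave a function result at the front whose triggering call was already dropped. Below is a sketch of a trimming helper that also drops such orphans. trim_history and rough_count are hypothetical, and rough_count is only a word-based stand-in for num_tokens_from_messages so the example runs without tiktoken.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Hypothetical helper: drop the oldest messages until
    count_tokens(messages) < max_tokens, never leaving a function result
    at the front without the call that produced it."""
    trimmed = list(messages)  # work on a copy; the caller's list is untouched
    while trimmed and count_tokens(trimmed) >= max_tokens:
        trimmed.pop(0)
        # Drop any function results orphaned by the removal above
        while trimmed and trimmed[0]["role"] == "function":
            trimmed.pop(0)
    return trimmed


def rough_count(messages):
    """Stand-in counter for the demo: 4 per message plus whitespace-separated words."""
    return sum(4 + len(str(m.get("content") or "").split()) for m in messages)


history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": None,
     "function_call": {"name": "get_top_headlines", "arguments": "{}"}},
    {"role": "function", "name": "get_top_headlines", "content": "a b c d e"},
    {"role": "assistant", "content": "done"},
    {"role": "user", "content": "next question"},
]
trimmed = trim_history(history, 20, rough_count)
print([m["role"] for m in trimmed])
```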

Now we need a function to fetch the news. You can get a News API key at https://newsapi.org/:

```python
from typing import Optional


def get_top_headlines(query: Optional[str] = None, country: Optional[str] = None,
                      category: Optional[str] = None) -> str:
    """Retrieve top headlines from newsapi.org (API key required)."""

    base_url = "https://newsapi.org/v2/top-headlines"
    headers = {
        "x-api-key": news_key
    }
    # Default to general news; arguments supplied by the model override it below
    params = {"category": "general"}
    if query is not None:
        params['q'] = query
    if country is not None:
        params['country'] = country
    if category is not None:
        params['category'] = category

    # Fetch from newsapi.org - reference: https://newsapi.org/docs/endpoints/top-headlines
    response = requests.get(base_url, params=params, headers=headers, timeout=10)
    data = response.json()

    if data['status'] == 'ok':
        print(f"Processing {data['totalResults']} articles from newsapi.org")
        # The articles go back to the model as a JSON string
        return json.dumps(data['articles'])
    else:
        print("Request failed with message:", data.get('message'))
        return 'No articles found'
```

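📝 In plain terms: this function is the "worker" that actually fetches data from the news site. It takes three parameters (a search term, a country, and a category), calls the news API, and returns the results.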
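Building the query parameters separately from the HTTP request makes the tool easy to test offline. A sketch assuming a hypothetical build_headline_params helper that mirrors the parameters get_top_headlines sends, rejects categories outside the signature's enum, and lowercases the country code (an assumption based on the lowercase codes such as "us" shown in newsapi.org's documentation):

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://newsapi.org/v2/top-headlines"
CATEGORIES = {"business", "entertainment", "general", "health", "science", "sports", "technology"}


def build_headline_params(query: Optional[str] = None, country: Optional[str] = None,
                          category: Optional[str] = None) -> dict:
    """Hypothetical helper: the query parameters get_top_headlines would send,
    with light normalisation of what the model supplies."""
    params = {"category": "general"}  # same default as get_top_headlines
    if query:
        params["q"] = query
    if country:
        params["country"] = country.lower()  # e.g. "US" -> "us"
    if category:
        if category not in CATEGORIES:
            raise ValueError(f"unsupported category: {category}")
        params["category"] = category
    return params


params = build_headline_params(country="US", category="sports")
print(f"{BASE_URL}?{urlencode(params)}")
```

Unlike get_top_headlines, this sketch treats an empty string the same as a missing argument.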

To let the GPT model know that a get_top_headlines function is available, we describe the function with a JSON structure:

```python
signature_get_top_headlines = {
    "name": "get_top_headlines",
    "description": "Get top news headlines by country and/or category.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Freeform keywords or a phrase to search for.",
            },
            "country": {
                "type": "string",
                "description": "The 2-letter ISO 3166-1 code of the country to get headlines for.",
            },
            "category": {
                "type": "string",
                "description": "The category to get headlines for.",
                "enum": ["business", "entertainment", "general", "health", "science", "sports", "technology"]
            }
        },
        "required": [],
    }
}
```

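📝 In plain terms: the function signature is a "manual" for the AI: it tells the model the function's name, what it does, and which parameters it takes. The model relies on it to decide whether to call the function.

Next, we define the complete function, which handles the interaction with the GPT model. Its main steps are:

Append the system prompt to the end of the message list to give the conversation context;
If the total number of tokens exceeds the model's limit, delete the oldest messages;
Send the request to the GPT API;
Remove the system message from the end of the list and append the model's reply;
If the model requested a function, execute it and append the result as a function message.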

```python
def complete(messages, function_call: str = "auto"):
    """Fetch a completion from OpenAI's GPT and run any function it requests."""

    # Append the system prompt at the end of the list as context for this request
    messages.append({"role": "system", "content": llm_system_prompt})

    # Delete the oldest messages to keep the conversation under the token limit
    # (the system prompt is the last element, so it is never removed here)
    while num_tokens_from_messages(messages) >= llm_max_tokens and len(messages) > 1:
        messages.pop(0)

    print('Working...')
    # Legacy interface: requires openai<1.0
    res = openai.ChatCompletion.create(
        model=llm_model,
        messages=messages,
        # get_zsxq_article (Knowledge Planet posts) is omitted: its code is not shown in this article
        functions=[signature_get_top_headlines],
        function_call=function_call
    )

    # Remove the system message and append the model's reply
    messages.pop(-1)
    response = res["choices"][0]["message"]
    messages.append(response)

    # Call the function requested by the model, if any
    if response.get("function_call"):
        function_name = response["function_call"]["name"]
        if function_name == "get_top_headlines":
            args = json.loads(response["function_call"]["arguments"])
            headlines = get_top_headlines(
                query=args.get("query"),
                country=args.get("country"),
                category=args.get("category")
            )
            messages.append({"role": "function", "name": "get_top_headlines", "content": headlines})
        else:
            # Unknown name: report it back so the model can answer without it
            messages.append({"role": "function", "name": function_name,
                             "content": f"Unknown function: {function_name}"})
```

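📝 In plain terms: this function holds the core conversation logic: send the messages to GPT → check whether GPT wants to call a function → if it does, run the function and append the result, so that the next request sends it back to GPT → GPT produces the final answer. It also has a safeguard: when there are too many tokens, the oldest messages are deleted to avoid an error.

Finally, we add a run function that loops, reading our questions and sending them to the GPT API: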

```python
def run():
    print("\nHi, I'm your assistant. Feel free to ask me anything!")
    print("You could ask me:\n - What are the latest tech discoveries?\n - What's new in sports?\n")

    messages = []
    while True:
        prompt = input("\nWhat would you like to know? => ")
        messages.append({"role": "user", "content": prompt})
        complete(messages)

        # The LLM can chain function calls; this enforces a limit
        call_count = 0
        while messages[-1]['role'] == "function":
            call_count = call_count + 1
            if call_count < function_call_limit:
                complete(messages)
            else:
                # Forbid further calls so the model must answer in text
                complete(messages, function_call="none")

        # Print the last message (content may be None, so guard against it)
        print("\n\n==Response==\n")
        print((messages[-1].get("content") or "").strip())
        print("\n==End of response==")


if __name__ == "__main__":
    run()
```

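📝 In plain terms: this is the program's entry point, an endless loop that keeps reading user input, calling GPT, and printing the result. The call_count limit stops the AI from calling functions over and over.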
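The call limit in run can be exercised offline. In the sketch below, make_fake_complete is a hypothetical stand-in for complete whose fake model always asks for another function call unless function_call="none" forbids it; answer repeats the loop from run for a single question.

```python
function_call_limit = 3  # same value as in the constants above


def make_fake_complete():
    """Hypothetical stand-in for complete(); records how it is called."""
    calls = {"auto": 0, "none": 0}

    def fake_complete(messages, function_call="auto"):
        calls[function_call] += 1
        if function_call == "none":
            # Function calls forbidden: the model must answer in text
            messages.append({"role": "assistant", "content": "Here is a summary."})
        else:
            # The fake model always wants one more function call
            messages.append({"role": "assistant", "content": None,
                             "function_call": {"name": "get_top_headlines", "arguments": "{}"}})
            messages.append({"role": "function", "name": "get_top_headlines", "content": "[]"})

    return fake_complete, calls


def answer(prompt, complete):
    """Run one question through the same call-limiting loop as run()."""
    messages = [{"role": "user", "content": prompt}]
    complete(messages)
    call_count = 0
    while messages[-1]["role"] == "function":
        call_count += 1
        if call_count < function_call_limit:
            complete(messages)
        else:
            complete(messages, function_call="none")
    return messages, call_count


fake_complete, calls = make_fake_complete()
messages, call_count = answer("What's new in sports?", fake_complete)
print(messages[-1]["content"], calls, call_count)
```

With function_call_limit = 3, the fake model gets three "auto" requests (the first call plus two follow-ups), and the fourth request is forced to "none", which ends the loop with a text answer.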

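Now we can run the program and test it. Be sure to set the required keys first. Also note that the OpenAI API cannot be reached directly from mainland China, so the machine running this script needs network access to OpenAI (for example, a machine located outside mainland China):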

```bash
$ python client.py
```

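📝 In plain terms: because networks in mainland China cannot reach OpenAI's servers directly, you need a special network route (a proxy) or a domestic LLM service instead.

Description of the terminal output screenshot:

The screenshot shows the interaction after the program starts:

User prompt: What would you like to know? => What's new in sports?
Processing messages:
- Working...
- Processing 1000 articles from newsapi.org
- Working...
Response:
- ==Response==
- Recent sports news includes the following:
- 1. [Czechs plan to conquer Valencia] - The Czech team plans to compete in Valencia.
- 2. [Boca Juniors to face Almagro] - Guadalajara athletic club...
- 3. [Scaloni confirms Messi will travel to Bolivia] - The Argentina national team coach announced that Messi will travel to Bolivia for the match.
- 4. [Patrick Lange finishes with silver, ahead of 阿伦佐·弗罗德斯] - Through Patrick Lange's efforts, 弗罗德诺斯 took the bronze.
- ==End of response==

📝 In plain terms: this is a complete function-calling example! The user asks for sports news → GPT calls the news API → gets the results → summarizes them for the user in natural language. The whole process is automatic, and to the user it feels like chatting with an all-knowing assistant.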

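Summary

Function calling turns GPT from "a brain that can only talk" into "an assistant that can direct actions". With it, we can:

Let the AI look up real-time information (weather, news, stock prices, and so on)
Let the AI carry out actions (booking tickets, sending emails, controlling smart-home devices, and so on)
Let the AI connect to databases (querying company data, user information, and so on)

This is one of the foundational capabilities of an AI agent!

Source: Knowledge Planet (知识星球) - AiGC面试宝典, May 17, 2024, 23:15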
