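Chat Completions

Documentation version: v2.2.0 | Last updated: 2026-05-15

1. Overview

Chat Completions is the platform's core API. It supports multi-turn conversation, streaming output, tool calling, and vision understanding.

Item	Value
Endpoint	`POST https://platform.shuyanai.com/v1/chat/completions`
Content-Type	`application/json`
Authentication	`Authorization: Bearer sk-your-API-key`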

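2. Request Parameters

2.1 Top-Level Parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	n/a	Model identifier, e.g. `deepseek-v4-flash`, `glm-5.1`, `deepseek-v4-pro`
`messages`	array	Yes	n/a	List of messages; see 2.2 Message Format
`stream`	boolean	No	`false`	Whether to enable streaming output (SSE)
`stream_options`	object	No	`null`	Streaming options. Set `{"include_usage": true}` to receive usage at the end of the stream
`temperature`	number	No	model default	Sampling temperature, range `[0, 2]`. Lower values give more deterministic output
`top_p`	number	No	model default	Nucleus sampling probability, range `(0, 1]`
`max_tokens`	integer	No	model default	Maximum number of tokens to generate
`max_completion_tokens`	integer	No	`null`	Maximum number of completion tokens (including reasoning tokens); takes precedence over `max_tokens`
`stop`	string / array	No	`null`	Stop sequences, up to 4
`presence_penalty`	number	No	`0`	Presence penalty, range `[-2, 2]`
`frequency_penalty`	number	No	`0`	Frequency penalty, range `[-2, 2]`
`response_format`	object	No	`null`	Output format; see 2.4 Output Format Control
`tools`	array	No	`null`	List of tool definitions; see 5. Tool Calling
`tool_choice`	string / object	No	`"auto"`	Tool selection strategy; see 5.2 tool_choice Parameter
`seed`	integer	No	`null`	Random seed for reproducible output
`user`	string	No	`null`	End-user identifier, used for platform monitoring and abuse detection

Note: Models on the platform currently do not support n greater than 1 (number of candidates; only n=1 is accepted) or the logprobs/top_logprobs parameters (token log probabilities). Passing these parameters may return an error or be ignored.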
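The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not a platform-provided utility: the helper name `check_params` and its messages are hypothetical, and the limits are copied from the table.

```python
from typing import Any

# Parameters the note above lists as unsupported on this platform.
UNSUPPORTED = ("logprobs", "top_logprobs")


def check_params(payload: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in a request payload."""
    problems = []
    for required in ("model", "messages"):
        if required not in payload:
            problems.append(f"missing required parameter: {required}")
    t = payload.get("temperature")
    if t is not None and not 0 <= t <= 2:
        problems.append("temperature must be in [0, 2]")
    p = payload.get("top_p")
    if p is not None and not 0 < p <= 1:
        problems.append("top_p must be in (0, 1]")
    for name in ("presence_penalty", "frequency_penalty"):
        v = payload.get(name)
        if v is not None and not -2 <= v <= 2:
            problems.append(f"{name} must be in [-2, 2]")
    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    if payload.get("n", 1) != 1:
        problems.append("only n=1 is supported")
    for name in UNSUPPORTED:
        if name in payload:
            problems.append(f"{name} is not supported")
    return problems
```

An empty list means the payload passed these local checks; the server may still reject it for model-specific reasons.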

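2.2 Message Format (messages)

messages is an array of message objects. Each message contains a role field and a content field.

System message (system)

Sets the model's behavioral instructions and persona.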

{
  "role": "system",
  "content": "你是一位专业的技术文档撰写助手。"
}

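User message (user)

The conversation content supplied by the user. Both plain text and multimodal (mixed text and image) content are supported.

Plain text: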

{
  "role": "user",
  "content": "请解释什么是 Transformer 架构。"
}

Multimodal (mixed text and image):

{
  "role": "user",
  "content": [
    {"type": "text", "text": "请描述这张图片的内容。"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.jpg",
        "detail": "auto"
      }
    }
  ]
}

detail accepts auto (automatic), low (low resolution, saves tokens), or high (high resolution).

Base64 image:

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/png;base64,iVBORw0KGgo..."
  }
}
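A data URL like the one above can be produced from raw image bytes with the standard library. This is a small sketch; the helper name `image_part_from_bytes` is hypothetical, and the MIME type is assumed to be supplied by the caller.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

The returned dict can be placed directly in the content array of a user message, next to a text part.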

Assistant message (assistant)

A reply generated by the model, used as history context in multi-turn conversations.

{
  "role": "assistant",
  "content": "Transformer 是一种基于自注意力机制的深度学习架构……"
}

Tool message (tool)

Returns the result of a tool call to the model. It must include tool_call_id.

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 22, \"unit\": \"celsius\"}"
}

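2.3 Multimodal Support

Capability	Example supported models
Image understanding	`qwen2.5-vl-72b-instruct`, `glm-5.1`
Video understanding	`qwen2.5-vl-72b-instruct`
Document understanding	`qwen2.5-vl-72b-instruct`, `glm-5.1`

For the multimodal capabilities of a specific model, see the model capability matrix in the console, or query the model's supported_endpoint_types via GET /v1/models.

2.4 Output Format Control (response_format)

type	Description	Platform support
`text`	Default; free-form text output	Fully supported
`json_object`	Forces the output to be valid JSON	Verified to work
`json_schema`	Structured output following a given JSON Schema	Not supported by some models; may return an error

json_object example: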

{
  "model": "deepseek-v4-flash",
  "response_format": {"type": "json_object"},
  "messages": [
    {"role": "system", "content": "请以 json 格式回答用户的问题。"},
    {"role": "user", "content": "列出三种编程语言及其特点。"}
  ]
}

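Notes:
Some models (such as the DeepSeek series) require the word "json" to appear in the system or user message when json_object mode is used; otherwise they may return an error. Always state the JSON output requirement explicitly in the prompt.
Support for the json_schema type is currently limited on the platform, and some models return the error "This response_format type is unavailable now". For structured output, prefer json_object combined with prompt constraints on the output format.

3. Response Format

3.1 Non-Streaming Response

The following is an actual response example from the deepseek-v4-flash model: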

{
  "id": "chatcmpl-ffe850e6-3b4f-92b5-bead-63015931b73e",
  "object": "chat.completion",
  "created": 1778819669,
  "model": "deepseek-v4-flash",
  "system_fingerprint": null,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "你好！有什么我可以帮助你的吗？",
        "reasoning_content": null
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 35,
    "completion_tokens": 34,
    "total_tokens": 69,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 31
    }
  }
}

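3.2 Response Fields

Field	Type	Description
`id`	string	Unique request identifier
`object`	string	Always `"chat.completion"`
`created`	integer	Unix timestamp
`model`	string	Identifier of the model actually used
`system_fingerprint`	string | null	System fingerprint (returned by some models)
`choices`	array	List of generated results
`choices[].index`	integer	Candidate index
`choices[].message.role`	string	Always `"assistant"`
`choices[].message.content`	string | null	Generated text content
`choices[].message.reasoning_content`	string | null	Reasoning content (returned only by reasoning models; see 3.3)
`choices[].message.tool_calls`	array | null	List of tool calls (see 5. Tool Calling)
`choices[].finish_reason`	string	Reason generation ended; see the table below
`choices[].logprobs`	null	Reserved field; the platform currently always returns `null`
`usage.prompt_tokens`	integer	Number of input tokens
`usage.completion_tokens`	integer	Number of output tokens (including reasoning tokens)
`usage.total_tokens`	integer	Total number of tokens
`usage.prompt_tokens_details`	object | null	Input token details
`usage.prompt_tokens_details.cached_tokens`	integer	Number of input tokens served from cache
`usage.completion_tokens_details`	object | null	Output token details
`usage.completion_tokens_details.reasoning_tokens`	integer	Number of tokens consumed by reasoning

Note: The fields returned can differ between models. For example, the usage object of some models may include extension fields such as estimated_cost or prompt_cache_hit_tokens. Clients should tolerate unknown fields when parsing.

finish_reason values:

Value	Description
`stop`	Normal completion (a stop sequence was hit or the output ended naturally)
`length`	The `max_tokens` / `max_completion_tokens` limit was reached
`tool_calls`	The model requests a tool call
`content_filter`	Content was filtered by the safety policy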
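3.3 Reasoning Content (reasoning_content)

Several models on the platform support deep reasoning. Their responses include a reasoning_content field that exposes the model's reasoning process.

Models verified to support reasoning:

Model	Description
`deepseek-v4-flash`	DeepSeek reasoning model; `reasoning_content` contains a Chinese chain of thought
`deepseek-v4-pro`	DeepSeek advanced reasoning model
`glm-5.1`	Zhipu GLM reasoning model; `reasoning_content` contains an English chain of thought

Actual response example (deepseek-v4-flash, request: "1+1等于几？只回答数字", i.e. "What is 1+1? Answer with the number only"):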

{
  "id": "chatcmpl-ffe850e6-3b4f-92b5-bead-63015931b73e",
  "object": "chat.completion",
  "created": 1778819669,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2",
        "reasoning_content": "用户问的是1+1等于几，要求只回答数字。答案是2。"
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 35,
    "completion_tokens": 34,
    "total_tokens": 69,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 31
    }
  }
}

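Notes:

reasoning_content is a string containing the text of the model's reasoning chain. For non-reasoning models this field is null or omitted.

usage.completion_tokens_details.reasoning_tokens records the number of tokens consumed by reasoning. These tokens are also counted in completion_tokens and are billed.

Reasoning usually consumes more tokens than the final output, so set max_completion_tokens high enough to leave room for the reasoning process.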
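Since reasoning tokens are included in completion_tokens, the tokens spent on the visible answer can be derived by subtraction. A minimal sketch, with a hypothetical helper name and the assumption that a missing details object means zero reasoning tokens:

```python
def visible_output_tokens(usage: dict) -> int:
    """Completion tokens minus reasoning tokens, i.e. tokens in the final answer."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    return usage["completion_tokens"] - reasoning
```

For the usage object in the example above (completion_tokens 34, reasoning_tokens 31), this yields 3.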

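4. Streaming Response

4.1 Enabling Streaming

Set "stream": true to enable streaming output. The server pushes the generated content incrementally via Server-Sent Events (SSE).

4.2 Streaming Data Format

Each SSE event begins with data:, and the final one is data: [DONE].

First event (includes role):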

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1778819720,"model":"deepseek-v4-flash","system_fingerprint":null,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning_content":"让我思考"},"finish_reason":null,"logprobs":null}],"usage":null}

Reasoning-phase event (reasoning models):

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1778819720,"model":"deepseek-v4-flash","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":null,"reasoning_content":"这个问题的答案是"},"finish_reason":null,"logprobs":null}],"usage":null}

Content event:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1778819720,"model":"deepseek-v4-flash","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"你好","reasoning_content":null},"finish_reason":null,"logprobs":null}],"usage":null}

Final event:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1778819720,"model":"deepseek-v4-flash","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"","reasoning_content":null},"finish_reason":"stop","logprobs":null}],"usage":null}

Usage event (requires stream_options.include_usage: true):

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1778819720,"model":"deepseek-v4-flash","system_fingerprint":null,"choices":[],"usage":{"prompt_tokens":11,"completion_tokens":451,"total_tokens":462,"completion_tokens_details":{"reasoning_tokens":448},"prompt_tokens_details":{"cached_tokens":0}}}

End marker:

data: [DONE]

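4.3 Streaming Response Fields

Field	Type	Description
`choices[].delta.role`	string	Present only in the first chunk; the value is `"assistant"`
`choices[].delta.content`	string | null	Incremental text content
`choices[].delta.reasoning_content`	string | null	Incremental reasoning content (reasoning models only)
`choices[].delta.tool_calls`	array	Incremental tool calls
`choices[].finish_reason`	string | null	Set to the finish reason when generation ends
`choices[].logprobs`	null	Reserved field; currently always `null`
`usage`	object | null	Non-`null` only in the usage event

Note: Reasoning models usually emit a series of reasoning_content events first (content is null during this phase), followed by content events (reasoning_content is null during this phase). Clients should accumulate the two fields separately.

4.4 Python Streaming Example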
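The original example for this section is not present here. As a stand-in, the sketch below shows one way to consume the event format described in 4.2: it takes an iterable of already-decoded text lines and accumulates content and reasoning_content separately. The function name `collect_stream` and its return shape are hypothetical, and obtaining the lines from an HTTP connection is left to the caller.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate content, reasoning_content, and usage from SSE lines."""
    content, reasoning = [], []
    usage, finish_reason = None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end marker
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]  # only set on the usage event
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                content.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return {
        "content": "".join(content),
        "reasoning": "".join(reasoning),
        "finish_reason": finish_reason,
        "usage": usage,
    }
```

For interactive display, the same loop could yield each delta as it arrives instead of joining everything at the end.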

5. Tool Calling

5.1 Defining Tools

Use the tools parameter to define the functions the model may call.

{
  "model": "glm-5.1",
  "messages": [
    {"role": "user", "content": "北京今天的天气怎么样？"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "查询指定城市的当前天气信息",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "城市名称，如 北京"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "温度单位"
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}

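5.2 tool_choice Parameter

Value	Description	Platform support
`"auto"`	The model decides whether to call a tool (default)	Fully supported; recommended
`"none"`	Tool calls are disabled	Fully supported
`"required"`	At least one tool must be called	Not supported by some models (see the note below)
`{"type": "function", "function": {"name": "xxx"}}`	Forces a call to the specified tool	Supported by some models

Note: DeepSeek series models do not support tool_choice: "required" in reasoning mode; the call returns the error "deepseek-reasoner does not support this tool_choice". Using "auto" is recommended, as it is the most compatible option.

5.3 Tool Call Response

When the model decides to call a tool, the response's finish_reason is "tool_calls".

The following is an actual response example from glm-5.1: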

{
  "id": "chatcmpl-RJbq47MfzkorhtetXx3D7DDr",
  "object": "chat.completion",
  "created": 1778819728,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_e272",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"北京\"}"
            }
          }
        ],
        "reasoning_content": "The user is asking about the weather in Beijing. Let me call the weather function for this city."
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 167,
    "total_tokens": 199,
    "completion_tokens": 32
  }
}

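Notes:
During a tool call, the content field may be null, an empty string, or explanatory text, depending on the model. Clients should decide whether to execute tools based on whether the tool_calls field is present.
Some models (such as deepseek-v4-flash) may return both content (explanatory text) and tool_calls; this is normal behavior.

5.4 Complete Tool Calling Flow

1. Send the user message and the tool definitions.

2. The model returns tool_calls.

3. The client executes the corresponding function and obtains the result.

4. Append the assistant message (with tool_calls) and the tool result (role: "tool") to messages, then send another request.

5. The model generates the final reply based on the tool result.

Example second request: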

{
  "model": "glm-5.1",
  "messages": [
    {"role": "user", "content": "北京今天的天气怎么样？"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_e272",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"city\": \"北京\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_e272",
      "content": "{\"temperature\": 28, \"condition\": \"晴\", \"humidity\": 45}"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "查询指定城市的当前天气信息",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "城市名称"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["city"]
        }
      }
    }
  ]
}

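5.5 Parallel Tool Calls

The model may return multiple tool_calls in a single response, indicating that several tools should be called in parallel. The client should execute each tool and return all of the results: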

{
  "role": "assistant",
  "tool_calls": [
    {"id": "call_001", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"北京\"}"}},
    {"id": "call_002", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"上海\"}"}}
  ]
}

When returning results, provide a matching tool message for each tool_call_id.
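The execute-and-return step can be sketched as follows. This is an illustration under assumptions, not the platform's SDK: `run_tool_calls` and the `registry` mapping of tool names to local Python functions are hypothetical, and tool results are assumed to be JSON-serializable.

```python
import json
from typing import Callable


def run_tool_calls(assistant_message: dict,
                   registry: dict[str, Callable[..., object]]) -> list[dict]:
    """Execute every tool call in an assistant message and build the
    messages to append to the conversation before the follow-up request."""
    follow_up = [assistant_message]  # the assistant turn is sent back as-is
    for call in assistant_message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        # arguments arrives as a JSON-encoded string, not an object
        args = json.loads(call["function"]["arguments"] or "{}")
        result = fn(**args)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to its call
            "content": json.dumps(result, ensure_ascii=False),
        })
    return follow_up
```

Extending messages with the returned list and sending the request again corresponds to step 4 of the flow in 5.4. The loop runs the tools one after another; truly concurrent execution would need a thread pool or similar.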

6. Complete Request Examples

6.1 Basic Conversation
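The original example for this section is not present here. The following sketch shows how a basic request might be assembled with only the Python standard library, using the endpoint and headers from the overview. The function names `build_request` and `chat` are hypothetical, and the timeout value is an arbitrary assumption.

```python
import json
import urllib.request

ENDPOINT = "https://platform.shuyanai.com/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(api_key: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(api_key, payload), timeout=timeout) as resp:
        return json.load(resp)
```

A call might look like `chat(key, {"model": "deepseek-v4-flash", "messages": [{"role": "user", "content": "Hello"}]})`. HTTP error statuses raise `urllib.error.HTTPError`, which the caller would need to handle.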

6.2 Multi-Turn Conversation
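The original example for this section is not present here. A multi-turn conversation resends the accumulated history on every request, so one possible client-side pattern is to keep a history list and trim it before each call. The helper names and the `max_recent` default below are hypothetical.

```python
def trim_history(messages: list[dict], max_recent: int) -> list[dict]:
    """Keep all system messages plus the most recent non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    recent = others[-max_recent:] if max_recent > 0 else []
    return system + recent


def next_request(model: str, history: list[dict], user_text: str,
                 max_recent: int = 20) -> dict:
    """Append the new user turn and build the payload for the next request."""
    history.append({"role": "user", "content": user_text})
    return {"model": model, "messages": trim_history(history, max_recent)}


def record_reply(history: list[dict], response: dict) -> None:
    """Store the assistant reply so the next turn sees it as context."""
    message = response["choices"][0]["message"]
    history.append({"role": "assistant", "content": message.get("content") or ""})
```

This simple count-based trimming can separate a tool message from the assistant message that requested it; conversations that use tool calling would need a trimming rule that keeps those pairs together.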

6.3 Vision Understanding
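The original example for this section is not present here. The sketch below builds a request payload in the multimodal message format from 2.2; the function name `vision_payload` is hypothetical, and the model must be one that supports image understanding (see 2.3).

```python
def vision_payload(model: str, question: str, image_url: str,
                   detail: str = "auto") -> dict:
    """Build a payload with one text part and one image part."""
    if detail not in ("auto", "low", "high"):
        raise ValueError("detail must be 'auto', 'low', or 'high'")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url, "detail": detail},
                    },
                ],
            }
        ],
    }
```

The `image_url` argument can be either an HTTPS URL or a Base64 data URL.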

6.4 JSON Structured Output
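The original example for this section is not present here. The sketch below follows the guidance in 2.4: it requests json_object mode, makes sure the word "json" appears in the prompt, and parses the reply defensively. Both function names and the appended instruction text are hypothetical.

```python
import json
from typing import Any, Optional


def json_mode_payload(model: str, system_prompt: str, user_prompt: str) -> dict:
    """Build a json_object request, adding a JSON instruction if none is present."""
    if "json" not in (system_prompt + user_prompt).lower():
        system_prompt = (system_prompt + " Respond in JSON.").strip()
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }


def parse_json_content(response: dict) -> Optional[Any]:
    """Return the decoded JSON reply, or None if the content is not valid JSON."""
    content = response["choices"][0]["message"].get("content") or ""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None
```

Returning None on a decode failure lets the caller decide whether to retry; a truncated reply (finish_reason "length") is a common cause of invalid JSON.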

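7. Error Handling

7.1 Common Errors

HTTP status code	Error type	Description	Suggested handling
400	`invalid_request_error`	Invalid request parameters	Check parameter formats and value ranges
401	`authentication_error`	Authentication failed	Check that the API key is correct
403	`authorization_error`	Permission denied	Check the token's model allowlist and quota
429	`rate_limit_error`	Rate limit exceeded	Reduce the request rate or contact the administrator to raise the limit
500	`server_error`	Server error	Retry later; contact technical support if the problem persists
503	`server_error`	Upstream service error or no available channel	Retry later, or switch to another available model

Examples of actual platform error messages:

Scenario	Error message
Model does not support a parameter	`"Error from provider (DeepSeek): deepseek-reasoner does not support this tool_choice"`
Model does not support `n>1`	`"Error from provider (DeepSeek): Invalid n value (currently only n = 1 is supported)"`
response_format not supported	`"This response_format type is unavailable now"`
No available channel	`"获取分组 auto(auto) 下模型 xxx 的可用渠道失败"` (failed to get an available channel for model xxx in group auto(auto))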

7.2 Error Response Format

{
  "error": {
    "message": "A specific description of the error",
    "type": "invalid_request_error",
    "param": "",
    "code": "invalid_request_error"
  }
}

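7.3 Retry Recommendations

Error type	Retry?	Strategy
400 parameter error	No	Fix the parameters and send the request again
401/403 authentication	No	Check and replace the API key
429 rate limit	Yes	Exponential backoff (1s, 2s, 4s, ...)
500 server error	Yes	Exponential backoff, at most 3 retries
503 upstream error	Yes	Exponential backoff, or switch models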
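The retry table can be expressed as a small wrapper. This is a sketch under assumptions: `ApiError` is a hypothetical exception that the caller's HTTP layer would raise with the status code, and the limit of 3 retries is applied to all retryable statuses for simplicity, although the table only states it for 500.

```python
import time
from typing import Callable

RETRYABLE = {429, 500, 503}  # statuses the table marks as retryable


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_retry(send: Callable[[], dict], max_retries: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send(), retrying retryable errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ApiError as err:
            if err.status not in RETRYABLE or attempt == max_retries:
                raise  # 400/401/403, or retries exhausted
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting. Production clients often add random jitter to the delay as well.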

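8. Best Practices

Scenario	Recommendation
Cost control	Use `max_tokens` to limit output length; choose a cost-effective model (e.g. `deepseek-v4-flash`)
Quality	Provide a clear system prompt; use a lower temperature (e.g. 0.3) for more deterministic output
Streaming experience	Enable `stream: true` to reduce time to first token and improve perceived responsiveness
Structured output	Use the `json_object` type of `response_format`, combined with prompt constraints on the format
Reasoning tasks	Use a reasoning model (e.g. `deepseek-v4-flash`, `glm-5.1`) and raise `max_completion_tokens` as needed to accommodate the reasoning process
Tool calling	Give each function a clear `description`; `"auto"` is the recommended `tool_choice` for best compatibility
Multi-turn conversation	Trim history messages sensibly to avoid exceeding the model's context window
Production stability	Implement exponential backoff retries; monitor 429/5xx error rates; configure fallback models

数眼智能 Technical Team, documentation version v2.2.0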
