Chat and Message Streaming

The AI module supports explicit chat CRUD plus a streamed execution path that persists both the user message and the final assistant message.

Controller Surface

  • GET /chat
  • GET /chat/:id
  • POST /chat
  • PATCH /chat/:id
  • DELETE /chat/:id
  • GET /chat/:chatId/messages
  • POST /agent/:id/chat

These are controller-defined routes only. The AI module is not mounted yet, so they are not currently live.
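
Since the routes are controller-defined, the surface can be pictured as a NestJS-style controller sketch. All class, handler, and DTO names below are illustrative, and the pagination fields are assumptions; only the paths and verbs come from the list above.

```ts
import { Body, Controller, Delete, Get, Param, Patch, Post, Query } from '@nestjs/common';

// Placeholder DTO shapes; the real module defines its own validated DTOs.
class CreateChatDto { agentId!: string; name?: string; }
class UpdateChatDto { name?: string; }
class PaginationDto { page?: number; limit?: number; } // assumed fields of the shared pagination schema
class CreateMessageDto { chatId!: string; messageId!: string; parts!: unknown[]; }

@Controller()
export class AiChatController {
  @Get('chat') list(@Query() query: PaginationDto) {}
  @Get('chat/:id') findOne(@Param('id') id: string) {}
  @Post('chat') create(@Body() dto: CreateChatDto) {}
  @Patch('chat/:id') update(@Param('id') id: string, @Body() dto: UpdateChatDto) {}
  @Delete('chat/:id') remove(@Param('id') id: string) {}
  @Get('chat/:chatId/messages') messages(@Param('chatId') chatId: string, @Query() query: PaginationDto) {}
  @Post('agent/:id/chat') stream(@Param('id') agentId: string, @Body() dto: CreateMessageDto) {}
}
```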

Chat CRUD

  • POST /chat requires an agentId.
  • The create command loads the agent first so each chat inherits the agent's agencyId and orgId.
  • PATCH /chat/:id only updates the optional name field.
  • DELETE /chat/:id removes the chat, relies on message cascade deletion, invalidates Redis cache entries, and requests agent chat-stat resync.
  • GET /chat and GET /chat/:chatId/messages both use the shared pagination schema.
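
The load-agent-first ordering on create is the important detail. A minimal sketch, assuming repository-style dependencies (all names here are hypothetical):

```ts
interface Agent { id: string; agencyId: string; orgId: string; }

async function createChat(
  agents: { findById(id: string): Promise<Agent | null> },
  chats: { insert(row: object): Promise<{ id: string }> },
  input: { agentId: string; name?: string },
) {
  // Load the agent first; creation fails if it does not exist.
  const agent = await agents.findById(input.agentId);
  if (!agent) throw new Error('AI.AGENT_NOT_FOUND'); // 404 per the error contract

  // Each chat inherits the agent's tenant fields.
  return chats.insert({
    agentId: agent.id,
    agencyId: agent.agencyId,
    orgId: agent.orgId,
    name: input.name ?? null,
  });
}
```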

Streamed Message Flow

POST /agent/:id/chat executes the full assistant flow:

  1. Load the agent, cached chat, and cached messages in parallel.
  2. Create the chat on demand if the supplied chatId does not exist yet.
  3. Persist the incoming user message first.
  4. Convert message history into model messages for the Vercel AI SDK.
  5. Resolve tools for the agent, including the RAG tool when knowledge is enabled.
  6. Build the extra knowledge prompt context.
  7. Run the guardrail subagent against the latest user message.
  8. Check token budgets before starting the main model execution.
  9. Stream the assistant response to the HTTP response object.
  10. Persist the final assistant message, increment processed-message stats, and refresh cached history.
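
Condensed, the flow might look like the sketch below. The `Ctx` helpers are invented names standing in for the module's internals, and the model calls assume Vercel AI SDK v5 APIs (`convertToModelMessages`, `streamText`); the real implementation may differ.

```ts
import type { ServerResponse } from 'node:http';
import { convertToModelMessages, streamText, type LanguageModel, type ToolSet, type UIMessage } from 'ai';

interface CreateMessageDto {
  chatId: string;
  messageId: string;
  parts: UIMessage['parts'];
}

interface Ctx {
  agentId: string;
  model: LanguageModel;
  loadAgent(id: string): Promise<{ id: string; systemPrompt: string }>;
  loadChat(chatId: string): Promise<{ id: string } | null>;
  loadMessages(chatId: string): Promise<UIMessage[]>;
  createChat(agent: { id: string }, chatId: string): Promise<{ id: string }>;
  saveUserMessage(chatId: string, dto: CreateMessageDto): Promise<void>;
  resolveTools(agent: { id: string }): Promise<ToolSet>;          // includes the RAG tool when knowledge is enabled
  knowledgeContext(agent: { id: string }): Promise<string>;
  runGuardrail(dto: CreateMessageDto): Promise<void>;             // throws AI.GUARDRAIL_BLOCKED
  checkTokenBudget(agent: { id: string }): Promise<number | undefined>; // throws AI.TOKEN_LIMIT_EXCEEDED
  saveAssistantMessage(chatId: string, text: string): Promise<void>;
  incrementProcessedStats(agentId: string): Promise<void>;
  refreshMessageCache(chatId: string): Promise<void>;
}

export async function streamAssistantReply(ctx: Ctx, dto: CreateMessageDto, res: ServerResponse) {
  // 1. Load the agent, cached chat, and cached messages in parallel.
  const [agent, chat, history] = await Promise.all([
    ctx.loadAgent(ctx.agentId),
    ctx.loadChat(dto.chatId),
    ctx.loadMessages(dto.chatId),
  ]);

  // 2–3. Create the chat on demand, then persist the user message first.
  const liveChat = chat ?? (await ctx.createChat(agent, dto.chatId));
  await ctx.saveUserMessage(liveChat.id, dto);

  // 4–6. Convert history to model messages; resolve tools and knowledge context.
  const userMessage: UIMessage = { id: dto.messageId, role: 'user', parts: dto.parts };
  const messages = convertToModelMessages([...history, userMessage]);
  const tools = await ctx.resolveTools(agent);
  const system = agent.systemPrompt + (await ctx.knowledgeContext(agent));

  // 7–8. Guardrail the latest user message, then check token budgets.
  await ctx.runGuardrail(dto);
  const maxOutputTokens = await ctx.checkTokenBudget(agent);

  // 9. Stream the assistant response straight to the HTTP response object.
  const result = streamText({ model: ctx.model, system, messages, tools, maxOutputTokens });
  for await (const chunk of result.textStream) res.write(chunk);
  res.end();

  // 10. Persist the final assistant message, bump stats, refresh cached history.
  await ctx.saveAssistantMessage(liveChat.id, await result.text);
  await ctx.incrementProcessedStats(agent.id);
  await ctx.refreshMessageCache(liveChat.id);
}
```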

Automatic Title Generation

  • If this is the first message in a chat and the chat has no name, the module asynchronously invokes the title-generator subagent.
  • The generated title is based on the first user message and is expected to stay under six words in the user's language.
  • Title generation does not block the main assistant response.
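
A fire-and-forget sketch of that behavior, with hypothetical helper names; the deliberate absence of `await` is what keeps it off the hot path:

```ts
function maybeGenerateTitle(
  chat: { id: string; name: string | null },
  isFirstMessage: boolean,
  firstUserText: string,
  deps: {
    titleSubagent(text: string): Promise<string>; // expected to return under six words in the user's language
    renameChat(chatId: string, name: string): Promise<void>;
  },
): void {
  if (!isFirstMessage || chat.name) return; // only unnamed chats on their first message

  void deps
    .titleSubagent(firstUserText)
    .then((title) => deps.renameChat(chat.id, title))
    .catch(() => {
      // Title failures are swallowed so they never affect the assistant response.
    });
}
```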

Guardrails and Token Limits

  • The guardrail subagent checks prompt injection, harmful or illegal requests, attempts to reveal instructions, and parent-agent-specific prohibitions.
  • Guardrail failures return AI.GUARDRAIL_BLOCKED.
  • Token limits can block the execution entirely or cap the output token budget, depending on executionLimitAction.
  • Capped executions reserve 300 input tokens before calculating the allowed assistant output budget.
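
Only the 300-token reservation is documented; the surrounding function and parameter names in this sketch are assumptions about how the two modes could fit together:

```ts
const INPUT_TOKEN_RESERVE = 300; // reserved before computing the assistant output budget

function resolveOutputBudget(
  remainingTokens: number,
  executionLimitAction: 'block' | 'cap',
): number | undefined {
  if (executionLimitAction === 'block') {
    // Block mode: refuse the execution outright when the budget is exhausted.
    if (remainingTokens <= 0) throw new Error('AI.TOKEN_LIMIT_EXCEEDED');
    return undefined; // no per-response cap
  }

  // Cap mode: reserve input tokens, then hand the remainder to the output.
  const outputBudget = remainingTokens - INPUT_TOKEN_RESERVE;
  if (outputBudget <= 0) throw new Error('AI.TOKEN_LIMIT_EXCEEDED');
  return outputBudget;
}
```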

Message and Cache Behavior

  • Chats are cached in Redis for 20 minutes under ia:chat:<userId>:<chatId>.
  • Message history is cached for 5 minutes under ia:messages:<userId>:<chatId>.
  • The message-history query returns only messages with the user and assistant roles.
  • The streamed assistant message adds finish metadata so clients can tell whether the response was aborted or capped.
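
The keys and TTLs above translate directly into cache helpers. Using ioredis here is an assumption about the client, not a statement about the module's wiring:

```ts
import Redis from 'ioredis';

const redis = new Redis();

const chatKey = (userId: string, chatId: string) => `ia:chat:${userId}:${chatId}`;
const messagesKey = (userId: string, chatId: string) => `ia:messages:${userId}:${chatId}`;

const CHAT_TTL_SECONDS = 20 * 60;    // chats live for 20 minutes
const MESSAGES_TTL_SECONDS = 5 * 60; // message history lives for 5 minutes

async function cacheChat(userId: string, chatId: string, chat: unknown): Promise<void> {
  await redis.set(chatKey(userId, chatId), JSON.stringify(chat), 'EX', CHAT_TTL_SECONDS);
}

async function cacheMessages(userId: string, chatId: string, messages: unknown[]): Promise<void> {
  await redis.set(messagesKey(userId, chatId), JSON.stringify(messages), 'EX', MESSAGES_TTL_SECONDS);
}
```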

Expected Request Shape

The streamed entrypoint expects a CreateMessageDto payload:

  • chatId
  • parts
  • messageId
  • optional metadata
  • regenerate (boolean, default false) — triggers tail-delete regenerate flow
  • targetMessageId (uuid7 | null, default null) — required when regenerate === true
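
Rendered as a plain TypeScript interface, the payload looks like this; the real CreateMessageDto presumably adds validation on top of the same shape:

```ts
interface CreateMessageDto {
  chatId: string;
  parts: unknown[];                // message parts for the new user message
  messageId: string;
  metadata?: Record<string, unknown>;
  regenerate?: boolean;            // default false; true triggers the tail-delete regenerate flow
  targetMessageId?: string | null; // uuid7; default null, required when regenerate === true
}
```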

See Message Actions for the full regenerate semantics and tail-delete behavior.

Structured Error Contract

| Code | HTTP | Meaning |
| --- | --- | --- |
| AI.AGENT_NOT_FOUND | 404 | The agent does not exist or is disabled. |
| AI.CHAT_NOT_FOUND | 404 | The requested chat does not exist. |
| AI.ERROR_CREATING_CHAT | 500 | Auto-chat creation failed during streaming startup. |
| AI.GUARDRAIL_BLOCKED | 400 | The guardrail subagent rejected the prompt. |
| AI.TOKEN_LIMIT_EXCEEDED | 400 | Token budgets do not allow a new execution. |
| AI.ERROR_RUNNING_AGENT_STREAM | 500 | The streaming gateway failed. |
| AI.REGENERATE_MISSING_TARGET | 422 | regenerate: true sent without targetMessageId. |
| AI.REGENERATE_ROLE_MISMATCH | 400 | The referenced message is not an assistant message. |
| AI.MESSAGE_NOT_FOUND | 404 | Target message not found or belongs to a different tenant. |
| AI.CHAT_BUSY | 409 | A concurrent regenerate is already in progress for this chat. |
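
For client-side exhaustiveness checks, the codes above can be captured as a union type. This is a convenience rendering of the table, not a type exported by the module:

```ts
type AiErrorCode =
  | 'AI.AGENT_NOT_FOUND'            // 404
  | 'AI.CHAT_NOT_FOUND'             // 404
  | 'AI.ERROR_CREATING_CHAT'        // 500
  | 'AI.GUARDRAIL_BLOCKED'          // 400
  | 'AI.TOKEN_LIMIT_EXCEEDED'       // 400
  | 'AI.ERROR_RUNNING_AGENT_STREAM' // 500
  | 'AI.REGENERATE_MISSING_TARGET'  // 422
  | 'AI.REGENERATE_ROLE_MISMATCH'   // 400
  | 'AI.MESSAGE_NOT_FOUND'          // 404
  | 'AI.CHAT_BUSY';                 // 409
```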