Chat and Message Streaming

The AI module supports explicit chat CRUD plus a streamed execution path that persists both the user message and the final assistant message.

Controller Surface

  • GET /chat
  • GET /chat/:id
  • POST /chat
  • PATCH /chat/:id
  • DELETE /chat/:id
  • GET /chat/:chatId/messages
  • POST /agent/:id/chat

These are controller-defined routes only. The AI module is not mounted yet, so they are not currently live.
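
Since the routes are controller-defined, the surface can be pictured as a NestJS-style controller sketch. All class, handler, and DTO names below are illustrative, and the pagination fields are assumptions; only the paths and verbs come from the list above.

```ts
import { Body, Controller, Delete, Get, Param, Patch, Post, Query } from '@nestjs/common';

// Placeholder DTO shapes; the real module defines its own validated DTOs.
class CreateChatDto { agentId!: string; name?: string; }
class UpdateChatDto { name?: string; }
class PaginationDto { page?: number; limit?: number; } // assumed fields of the shared pagination schema
class CreateMessageDto { chatId!: string; messageId!: string; parts!: unknown[]; }

@Controller()
export class AiChatController {
  @Get('chat') list(@Query() query: PaginationDto) {}
  @Get('chat/:id') findOne(@Param('id') id: string) {}
  @Post('chat') create(@Body() dto: CreateChatDto) {}
  @Patch('chat/:id') update(@Param('id') id: string, @Body() dto: UpdateChatDto) {}
  @Delete('chat/:id') remove(@Param('id') id: string) {}
  @Get('chat/:chatId/messages') messages(@Param('chatId') chatId: string, @Query() query: PaginationDto) {}
  @Post('agent/:id/chat') stream(@Param('id') agentId: string, @Body() dto: CreateMessageDto) {}
}
```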

Chat CRUD

  • POST /chat requires an agentId.
  • The create command loads the agent first so each chat inherits the agent's agencyId and orgId.
  • PATCH /chat/:id only updates the optional name field.
  • DELETE /chat/:id removes the chat, relies on message cascade deletion, invalidates Redis cache entries, and requests agent chat-stat resync.
  • GET /chat and GET /chat/:chatId/messages both use the shared pagination schema.
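
The load-agent-first ordering on create is the important detail. A minimal sketch, assuming repository-style dependencies (all names here are hypothetical):

```ts
interface Agent { id: string; agencyId: string; orgId: string; }

async function createChat(
  agents: { findById(id: string): Promise<Agent | null> },
  chats: { insert(row: object): Promise<{ id: string }> },
  input: { agentId: string; name?: string },
) {
  // Load the agent first; creation fails if it does not exist.
  const agent = await agents.findById(input.agentId);
  if (!agent) throw new Error('AI.AGENT_NOT_FOUND'); // 404 per the error contract

  // Each chat inherits the agent's tenant fields.
  return chats.insert({
    agentId: agent.id,
    agencyId: agent.agencyId,
    orgId: agent.orgId,
    name: input.name ?? null,
  });
}
```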

Streamed Message Flow

POST /agent/:id/chat executes the full assistant flow:

  1. Load the agent, cached chat, and cached messages in parallel.
  2. Create the chat on demand if the supplied chatId does not exist yet.
  3. Persist the incoming user message first.
  4. Convert message history into model messages for the Vercel AI SDK.
  5. Resolve tools for the agent, including the RAG tool when knowledge is enabled.
  6. Build the extra knowledge prompt context.
  7. Run the guardrail subagent against the latest user message.
  8. Check token budgets before starting the main model execution.
  9. Stream the assistant response to the HTTP response object.
  10. Persist the final assistant message, increment processed-message stats, and refresh cached history.
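
Condensed, the flow might look like the sketch below. The `Ctx` helpers are invented names standing in for the module's internals, and the model calls assume Vercel AI SDK v5 APIs (`convertToModelMessages`, `streamText`); the real implementation may differ.

```ts
import type { ServerResponse } from 'node:http';
import { convertToModelMessages, streamText, type LanguageModel, type ToolSet, type UIMessage } from 'ai';

interface CreateMessageDto {
  chatId: string;
  messageId: string;
  parts: UIMessage['parts'];
}

interface Ctx {
  agentId: string;
  model: LanguageModel;
  loadAgent(id: string): Promise<{ id: string; systemPrompt: string }>;
  loadChat(chatId: string): Promise<{ id: string } | null>;
  loadMessages(chatId: string): Promise<UIMessage[]>;
  createChat(agent: { id: string }, chatId: string): Promise<{ id: string }>;
  saveUserMessage(chatId: string, dto: CreateMessageDto): Promise<void>;
  resolveTools(agent: { id: string }): Promise<ToolSet>;          // includes the RAG tool when knowledge is enabled
  knowledgeContext(agent: { id: string }): Promise<string>;
  runGuardrail(dto: CreateMessageDto): Promise<void>;             // throws AI.GUARDRAIL_BLOCKED
  checkTokenBudget(agent: { id: string }): Promise<number | undefined>; // throws AI.TOKEN_LIMIT_EXCEEDED
  saveAssistantMessage(chatId: string, text: string): Promise<void>;
  incrementProcessedStats(agentId: string): Promise<void>;
  refreshMessageCache(chatId: string): Promise<void>;
}

export async function streamAssistantReply(ctx: Ctx, dto: CreateMessageDto, res: ServerResponse) {
  // 1. Load the agent, cached chat, and cached messages in parallel.
  const [agent, chat, history] = await Promise.all([
    ctx.loadAgent(ctx.agentId),
    ctx.loadChat(dto.chatId),
    ctx.loadMessages(dto.chatId),
  ]);

  // 2–3. Create the chat on demand, then persist the user message first.
  const liveChat = chat ?? (await ctx.createChat(agent, dto.chatId));
  await ctx.saveUserMessage(liveChat.id, dto);

  // 4–6. Convert history to model messages; resolve tools and knowledge context.
  const userMessage: UIMessage = { id: dto.messageId, role: 'user', parts: dto.parts };
  const messages = convertToModelMessages([...history, userMessage]);
  const tools = await ctx.resolveTools(agent);
  const system = agent.systemPrompt + (await ctx.knowledgeContext(agent));

  // 7–8. Guardrail the latest user message, then check token budgets.
  await ctx.runGuardrail(dto);
  const maxOutputTokens = await ctx.checkTokenBudget(agent);

  // 9. Stream the assistant response straight to the HTTP response object.
  const result = streamText({ model: ctx.model, system, messages, tools, maxOutputTokens });
  for await (const chunk of result.textStream) res.write(chunk);
  res.end();

  // 10. Persist the final assistant message, bump stats, refresh cached history.
  await ctx.saveAssistantMessage(liveChat.id, await result.text);
  await ctx.incrementProcessedStats(agent.id);
  await ctx.refreshMessageCache(liveChat.id);
}
```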

Automatic Title Generation

  • If this is the first message in a chat and the chat has no name, the module asynchronously invokes the title-generator subagent.
  • The generated title is based on the first user message and is expected to stay under six words in the user's language.
  • Title generation does not block the main assistant response.
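
A fire-and-forget sketch of that behavior, with hypothetical helper names; the deliberate absence of `await` is what keeps it off the hot path:

```ts
function maybeGenerateTitle(
  chat: { id: string; name: string | null },
  isFirstMessage: boolean,
  firstUserText: string,
  deps: {
    titleSubagent(text: string): Promise<string>; // expected to return under six words in the user's language
    renameChat(chatId: string, name: string): Promise<void>;
  },
): void {
  if (!isFirstMessage || chat.name) return; // only unnamed chats on their first message

  void deps
    .titleSubagent(firstUserText)
    .then((title) => deps.renameChat(chat.id, title))
    .catch(() => {
      // Title failures are swallowed so they never affect the assistant response.
    });
}
```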

Guardrails and Token Limits

  • The guardrail subagent checks prompt injection, harmful or illegal requests, attempts to reveal instructions, and parent-agent-specific prohibitions.
  • Guardrail failures return AI.GUARDRAIL_BLOCKED.
  • Token limits can block the execution entirely or cap the output token budget, depending on executionLimitAction.
  • Capped executions reserve 300 input tokens before calculating the allowed assistant output budget.
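
Only the 300-token reservation is documented; the surrounding function and parameter names in this sketch are assumptions about how the two modes could fit together:

```ts
const INPUT_TOKEN_RESERVE = 300; // reserved before computing the assistant output budget

function resolveOutputBudget(
  remainingTokens: number,
  executionLimitAction: 'block' | 'cap',
): number | undefined {
  if (executionLimitAction === 'block') {
    // Block mode: refuse the execution outright when the budget is exhausted.
    if (remainingTokens <= 0) throw new Error('AI.TOKEN_LIMIT_EXCEEDED');
    return undefined; // no per-response cap
  }

  // Cap mode: reserve input tokens, then hand the remainder to the output.
  const outputBudget = remainingTokens - INPUT_TOKEN_RESERVE;
  if (outputBudget <= 0) throw new Error('AI.TOKEN_LIMIT_EXCEEDED');
  return outputBudget;
}
```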

Message and Cache Behavior

  • Chats are cached in Redis for 20 minutes under ia:chat:<userId>:<chatId>.
  • Message history is cached for 5 minutes under ia:messages:<userId>:<chatId>.
  • The message-history query returns only messages with the user and assistant roles.
  • The streamed assistant message adds finish metadata so clients can tell whether the response was aborted or capped.
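
The keys and TTLs above translate directly into cache helpers. Using ioredis here is an assumption about the client, not a statement about the module's wiring:

```ts
import Redis from 'ioredis';

const redis = new Redis();

const chatKey = (userId: string, chatId: string) => `ia:chat:${userId}:${chatId}`;
const messagesKey = (userId: string, chatId: string) => `ia:messages:${userId}:${chatId}`;

const CHAT_TTL_SECONDS = 20 * 60;    // chats live for 20 minutes
const MESSAGES_TTL_SECONDS = 5 * 60; // message history lives for 5 minutes

async function cacheChat(userId: string, chatId: string, chat: unknown): Promise<void> {
  await redis.set(chatKey(userId, chatId), JSON.stringify(chat), 'EX', CHAT_TTL_SECONDS);
}

async function cacheMessages(userId: string, chatId: string, messages: unknown[]): Promise<void> {
  await redis.set(messagesKey(userId, chatId), JSON.stringify(messages), 'EX', MESSAGES_TTL_SECONDS);
}
```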

Expected Request Shape

The streamed entrypoint expects a CreateMessageDto payload:

  • chatId
  • parts
  • messageId
  • optional metadata
  • regenerate (boolean, default false) — triggers tail-delete regenerate flow
  • targetMessageId (uuid7 | null, default null) — required when regenerate === true
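
Rendered as a plain TypeScript interface, the payload looks like this; the real CreateMessageDto presumably adds validation on top of the same shape:

```ts
interface CreateMessageDto {
  chatId: string;
  parts: unknown[];                // message parts for the new user message
  messageId: string;
  metadata?: Record<string, unknown>;
  regenerate?: boolean;            // default false; true triggers the tail-delete regenerate flow
  targetMessageId?: string | null; // uuid7; default null, required when regenerate === true
}
```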

See Message Actions for the full regenerate semantics and tail-delete behavior.

Structured Error Contract

| Code | HTTP | Meaning |
| --- | --- | --- |
| AI.AGENT_NOT_FOUND | 404 | The agent does not exist or is disabled. |
| AI.CHAT_NOT_FOUND | 404 | The requested chat does not exist. |
| AI.ERROR_CREATING_CHAT | 500 | Auto-chat creation failed during streaming startup. |
| AI.GUARDRAIL_BLOCKED | 400 | The guardrail subagent rejected the prompt. |
| AI.TOKEN_LIMIT_EXCEEDED | 400 | Token budgets do not allow a new execution. |
| AI.ERROR_RUNNING_AGENT_STREAM | 500 | The streaming gateway failed. |
| AI.REGENERATE_MISSING_TARGET | 422 | regenerate: true sent without targetMessageId. |
| AI.REGENERATE_ROLE_MISMATCH | 400 | The referenced message is not an assistant message. |
| AI.MESSAGE_NOT_FOUND | 404 | Target message not found or belongs to a different tenant. |
| AI.CHAT_BUSY | 409 | A concurrent regenerate is already in progress for this chat. |
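
For client-side exhaustiveness checks, the codes above can be captured as a union type. This is a convenience rendering of the table, not a type exported by the module:

```ts
type AiErrorCode =
  | 'AI.AGENT_NOT_FOUND'            // 404
  | 'AI.CHAT_NOT_FOUND'             // 404
  | 'AI.ERROR_CREATING_CHAT'        // 500
  | 'AI.GUARDRAIL_BLOCKED'          // 400
  | 'AI.TOKEN_LIMIT_EXCEEDED'       // 400
  | 'AI.ERROR_RUNNING_AGENT_STREAM' // 500
  | 'AI.REGENERATE_MISSING_TARGET'  // 422
  | 'AI.REGENERATE_ROLE_MISMATCH'   // 400
  | 'AI.MESSAGE_NOT_FOUND'          // 404
  | 'AI.CHAT_BUSY';                 // 409
```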