Chat and Message Streaming
The AI module supports explicit chat CRUD plus a streamed execution path that persists both the user message and the final assistant message.
Controller Surface
```
GET    /chat
GET    /chat/:id
POST   /chat
PATCH  /chat/:id
DELETE /chat/:id
GET    /chat/:chatId/messages
POST   /agent/:id/chat
```
These are controller-defined routes only. The AI module is not mounted yet, so they are not currently live.
Chat CRUD
- `POST /chat` requires an `agentId`.
- The create command loads the agent first so each chat inherits the agent's `agencyId` and `orgId`.
- `PATCH /chat/:id` only updates the optional `name` field.
- `DELETE /chat/:id` removes the chat, relies on message cascade deletion, invalidates Redis cache entries, and requests an agent chat-stat resync.
- `GET /chat` and `GET /chat/:chatId/messages` both use the shared pagination schema.
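The tenancy inheritance on create can be sketched as below. This is a minimal illustration, not the module's actual command code; the `Agent` and `Chat` shapes and the `createChat` name are assumptions.

```typescript
// Hypothetical sketch: a new chat inherits agencyId and orgId from its agent.
interface Agent { id: string; agencyId: string; orgId: string; }
interface Chat { id: string; agentId: string; agencyId: string; orgId: string; name?: string; }

function createChat(agent: Agent, chatId: string): Chat {
  // The agent is loaded first so the chat can copy its tenancy fields.
  return { id: chatId, agentId: agent.id, agencyId: agent.agencyId, orgId: agent.orgId };
}

const agent: Agent = { id: "agent-1", agencyId: "agency-9", orgId: "org-3" };
const chat = createChat(agent, "chat-42");
```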
Streamed Message Flow
POST /agent/:id/chat executes the full assistant flow:
- Load the agent, cached chat, and cached messages in parallel.
- Create the chat on demand if the supplied `chatId` does not exist yet.
- Persist the incoming user message first.
- Convert message history into model messages for the Vercel AI SDK.
- Resolve tools for the agent, including the RAG tool when knowledge is enabled.
- Build the extra knowledge prompt context.
- Run the guardrail subagent against the latest user message.
- Check token budgets before starting the main model execution.
- Stream the assistant response to the HTTP response object.
- Persist the final assistant message, increment processed-message stats, and refresh cached history.
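The steps above can be sketched as one orchestration function. Every dependency here is a stub and every name is an illustrative assumption; the real module's signatures (and its Vercel AI SDK streaming calls) will differ.

```typescript
// Illustrative orchestration of the streamed message flow with stubbed deps.
type Msg = { role: "user" | "assistant"; text: string };

interface Deps {
  loadAgent: () => Promise<{ id: string }>;
  loadChat: () => Promise<{ id: string } | null>;
  loadHistory: () => Promise<Msg[]>;
  createChat: () => Promise<{ id: string }>;
  persist: (m: Msg) => Promise<void>;
  guardrail: (text: string) => Promise<boolean>;
  checkBudget: () => Promise<boolean>;
  stream: (history: Msg[]) => Promise<string>;
}

async function runAgentChat(deps: Deps, userText: string): Promise<string> {
  // 1. Load agent, chat, and cached history in parallel.
  const [, chat, history] = await Promise.all([
    deps.loadAgent(), deps.loadChat(), deps.loadHistory(),
  ]);
  // 2. Create the chat on demand if it does not exist yet.
  if (!chat) await deps.createChat();
  // 3. Persist the incoming user message first.
  const userMsg: Msg = { role: "user", text: userText };
  await deps.persist(userMsg);
  // 4. Guardrail the latest user message, then check token budgets.
  if (!(await deps.guardrail(userText))) throw new Error("AI.GUARDRAIL_BLOCKED");
  if (!(await deps.checkBudget())) throw new Error("AI.TOKEN_LIMIT_EXCEEDED");
  // 5. Stream the assistant response, then persist the final message.
  const answer = await deps.stream([...history, userMsg]);
  await deps.persist({ role: "assistant", text: answer });
  return answer;
}
```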
Automatic Title Generation
- If this is the first message in a chat and the chat has no `name`, the module asynchronously invokes the title-generator subagent.
- The generated title is based on the first user message and is expected to stay under six words in the user's language.
- Title generation does not block the main assistant response.
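The trigger condition can be expressed as a small predicate. This is a sketch under assumed field names; the actual subagent invocation would be fired without `await` so it never delays the streamed response.

```typescript
// Hypothetical sketch: title generation only fires for the first message
// of a still-unnamed chat. Field names are illustrative assumptions.
interface ChatInfo { name?: string | null; }

function shouldGenerateTitle(chat: ChatInfo, priorMessageCount: number): boolean {
  // First message in the chat, and no name set yet.
  return priorMessageCount === 0 && !chat.name;
}
```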
Guardrails and Token Limits
- The guardrail subagent checks prompt injection, harmful or illegal requests, attempts to reveal instructions, and parent-agent-specific prohibitions.
- Guardrail failures return `AI.GUARDRAIL_BLOCKED`.
- Token limits can block the execution entirely or cap the output token budget, depending on `executionLimitAction`.
- Capped executions reserve 300 input tokens before calculating the allowed assistant output budget.
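The cap branch can be illustrated with simple budget math. The 300-token input reserve comes from the docs above; the exact formula and the function name are assumptions for illustration.

```typescript
// Illustrative budget math for the "cap" branch of executionLimitAction.
const INPUT_RESERVE = 300; // tokens reserved for input before granting output

function allowedOutputTokens(remainingBudget: number): number {
  // Whatever is left after the input reserve becomes the assistant's
  // output budget; never goes negative.
  return Math.max(0, remainingBudget - INPUT_RESERVE);
}
```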
Message and Cache Behavior
- Chats are cached in Redis for 20 minutes under `ia:chat:<userId>:<chatId>`.
- Message history is cached for 5 minutes under `ia:messages:<userId>:<chatId>`.
- The message-history query returns only `user` and `assistant` roles.
- The streamed assistant message adds finish metadata so clients can tell whether the response was aborted or capped.
Expected Request Shape
The streamed entrypoint expects a `CreateMessageDto` payload:

- `chatId`
- `parts`
- `messageId`
- optional `metadata`
- `regenerate` (boolean, default `false`): triggers the tail-delete regenerate flow
- `targetMessageId` (uuid7 | null, default `null`): required when `regenerate === true`
See Message Actions for the full regenerate semantics and tail-delete behavior.
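A minimal validation sketch for the regenerate fields, assuming a DTO shape like the one described above; the error code is taken from the documented contract.

```typescript
// Hypothetical validation for the regenerate portion of CreateMessageDto.
interface CreateMessageLike {
  regenerate?: boolean;
  targetMessageId?: string | null;
}

function validateRegenerate(dto: CreateMessageLike): string | null {
  const regenerate = dto.regenerate ?? false;
  if (regenerate && !dto.targetMessageId) {
    // Maps to HTTP 422 in the structured error contract.
    return "AI.REGENERATE_MISSING_TARGET";
  }
  return null; // valid
}
```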
Structured Error Contract
| Code | HTTP | Meaning |
|---|---|---|
| `AI.AGENT_NOT_FOUND` | 404 | The agent does not exist or is disabled. |
| `AI.CHAT_NOT_FOUND` | 404 | The requested chat does not exist. |
| `AI.ERROR_CREATING_CHAT` | 500 | Auto-chat creation failed during streaming startup. |
| `AI.GUARDRAIL_BLOCKED` | 400 | The guardrail subagent rejected the prompt. |
| `AI.TOKEN_LIMIT_EXCEEDED` | 400 | Token budgets do not allow a new execution. |
| `AI.ERROR_RUNNING_AGENT_STREAM` | 500 | The streaming gateway failed. |
| `AI.REGENERATE_MISSING_TARGET` | 422 | `regenerate: true` sent without `targetMessageId`. |
| `AI.REGENERATE_ROLE_MISMATCH` | 400 | The referenced message is not an assistant message. |
| `AI.MESSAGE_NOT_FOUND` | 404 | Target message not found or belongs to a different tenant. |
| `AI.CHAT_BUSY` | 409 | A concurrent regenerate is already in progress for this chat. |