
AI Sub-agents Observability

The AI module runs two sub-agents as side jobs of the main chat turn: GuardrailSubAgent (safety classifier, kill-switch for the parent stream) and TitleGeneratorSubAgent (generates the chat name from the first user message). Both call into IAgentRunner like a regular agent run, but they are not user-facing — until issue #179 they were invisible in Prometheus and Loki.

This page describes the canonical wide event, the three Prometheus instruments, and the parent-stream rollup that make them observable.

What and Why

Before #179, aiSubagentDuration, aiSubagentInvocationsTotal, and aiSubagentTokensTotal were declared in ai-metrics.ts but had zero call sites. There was also no canonical log line per sub-agent invocation — only the per-rule ai.guardrail.checked event, which captures rule outcomes, not the invocation as a whole.

The change closes that gap so on-call can answer:

  • How often is each sub-agent firing? Are errors creeping up?
  • How long does each sub-agent take? (The guardrail runs alongside the parent stream but acts as its kill-switch, so a slow guardrail delays the parent's tokens.)
  • How many tokens are sub-agents consuming, and at what cost?
  • For a given chat turn, which sub-agents fired? — surfaced as subagents_spawned on the parent canonical event.

Canonical Wide Event — ai.subagent.finished

Emitted exactly once per subAgent.execute() call, on every code path (success, runner failure, guardrail block).

| Field | Type | Notes |
| --- | --- | --- |
| `subagent` | enum | `'guardrail' \| 'title_generator'` |
| `parent_chat_id` | string? | From `trackingContext.chatId` |
| `parent_message_id` | string? | From `trackingContext.messageId` (added in this change) |
| `model` | string | Full `ModelId` (`provider/model`) used by the sub-agent |
| `provider` | string | Substring before `/` in the `ModelId` (`openai`, `anthropic`, …) |
| `tokens_input` | number | From `runResult.usage.inputTokens` (0 if runner failed) |
| `tokens_output` | number | From `runResult.usage.outputTokens` |
| `usd_cost` | number | From `runResult.totalCostUsd` |
| `duration_ms` | number | `performance.now()` anchor around the runner call |
| `result` | enum | `'ok' \| 'block' \| 'error'` |
| `decision` | enum? | `'allow' \| 'block'` (guardrail only) |
| `block_reason` | string? | Set when guardrail blocks (sourced from runner output) |

parent_chat_id and parent_message_id make it trivial to correlate a sub-agent run with the user turn that spawned it — useful when ai.message-stream.finished shows abnormal latency and you need to check whether the guardrail was the cause.
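The field table above can be summarized as a TypeScript shape. This is an illustrative sketch, not the actual type from the codebase, and the `providerOf` helper is a hypothetical stand-in showing how `provider` derives from the full `ModelId`:

```typescript
// Sketch of the ai.subagent.finished event body, built from the field table above.
type SubagentId = 'guardrail' | 'title_generator';
type SubagentResult = 'ok' | 'block' | 'error';

interface SubagentFinishedEvent {
  subagent: SubagentId;
  parent_chat_id?: string;      // from trackingContext.chatId
  parent_message_id?: string;   // from trackingContext.messageId
  model: string;                // full ModelId, e.g. 'openai/gpt-4o-mini'
  provider: string;             // substring before '/' in the ModelId
  tokens_input: number;         // 0 if the runner failed
  tokens_output: number;
  usd_cost: number;
  duration_ms: number;          // performance.now() anchor around the runner call
  result: SubagentResult;
  decision?: 'allow' | 'block'; // guardrail only
  block_reason?: string;        // set when the guardrail blocks
}

// Hypothetical helper: derive the provider label from the full ModelId.
function providerOf(modelId: string): string {
  return modelId.split('/')[0];
}
```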

Prometheus Instruments

All three were declared previously (ai-metrics.ts lines 202-294) and now have real call sites in the canonical recorder.

| Instrument | Type | Labels | Unit | Reason |
| --- | --- | --- | --- | --- |
| `ai.subagent.duration_seconds` | Histogram | `subagent`, `result` | s | Latency distribution per sub-agent and outcome |
| `ai.subagent_invocations_total` | Counter | `subagent`, `result` | — | Throughput + error rate |
| `ai.subagent_tokens_total` | Counter | `subagent`, `type` | — | Token economics; `type` is `'input' \| 'output'` |

Cardinality discipline: high-cardinality fields (chat_id, message_id, model, provider) are never emitted as labels — they live in the wide event log body. The recorder enforces this in code; a cardinality assertion lives in subagent-invocation.recorder.spec.ts.
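That discipline amounts to an allow-list guard along these lines. This is a sketch of the idea the spec file asserts; the names below are illustrative, not the recorder's actual API:

```typescript
// Only these labels may reach Prometheus; everything else stays in the log body.
const ALLOWED_METRIC_LABELS = new Set(['subagent', 'result', 'type']);

function assertLowCardinality(labels: Record<string, string>): void {
  for (const key of Object.keys(labels)) {
    if (!ALLOWED_METRIC_LABELS.has(key)) {
      throw new Error(`high-cardinality label "${key}" must stay in the wide event, not in metric labels`);
    }
  }
}
```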

When a token bucket is zero, the recorder skips the counter increment to avoid polluting the metric with no-op samples.
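The skip rule reduces to something like the following sketch (the real recorder calls `aiSubagentTokensTotal.add` directly; `tokenIncrements` is a hypothetical helper for illustration):

```typescript
type TokenType = 'input' | 'output';

// Only non-zero token buckets produce a counter increment.
function tokenIncrements(input: number, output: number): Array<{ type: TokenType; n: number }> {
  const increments: Array<{ type: TokenType; n: number }> = [];
  if (input > 0) increments.push({ type: 'input', n: input });
  if (output > 0) increments.push({ type: 'output', n: output });
  return increments;
}
```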

Parent Rollup — subagents_spawned[]

ai.message-stream.finished (the canonical event for each chat turn) now includes:

```ts
subagents_spawned: string[]   // sorted, deduplicated subagent ids that fired during the turn
```

The recorder appends an entry to LogContext.subagents_spawned per invocation. The stream-finished builder reads, deduplicates, and sorts that list before emitting the canonical event. Empty array when no sub-agents ran.

This lets dashboards answer "which sub-agents are routinely spawned per turn?" from a single log line — without joining across ai.subagent.finished events.
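The builder's read → deduplicate → sort step is essentially a one-liner (function name is hypothetical):

```typescript
// Deduplicate and sort the subagent ids accumulated in LogContext during the turn.
function rollupSubagents(spawned: string[]): string[] {
  return [...new Set(spawned)].sort();
}
```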

Recorder Architecture

```mermaid
graph TD
    GA["GuardrailSubAgent.execute()"] -->|all paths| R["recordSubagentInvocation()"]
    TG["TitleGeneratorSubAgent.execute()"] -->|all paths| R
    R -->|1| L["AppLoggerService.info('ai.subagent.finished', …)"]
    R -->|2| H["aiSubagentDuration.record(seconds, {subagent, result})"]
    R -->|3| C1["aiSubagentInvocationsTotal.add(1, {subagent, result})"]
    R -->|4| C2["aiSubagentTokensTotal.add(n, {subagent, type})"]
    R -->|5| LC["LogContext.append('subagents_spawned', …)"]
    LC -.->|read by| SF["stream-finished-event.builder.ts"]
    SF -->|surfaces| SP["subagents_spawned: string[] on ai.message-stream.finished"]
```

The order — log → metrics → LogContext — matches recordGuardrailCheck so downstream readers see the wide event first and metrics second.
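The fixed ordering can be sketched as a step list; the function and step names below are illustrative stand-ins, not the real recorder API. The `tokens` step only appears when a non-zero bucket exists, matching the zero-skip rule above:

```typescript
type Step = 'log' | 'duration' | 'invocations' | 'tokens' | 'log_context';

// Enumerate the recorder's side effects in the order they run:
// wide event log first, then the three metric instruments, then LogContext.
function recorderSteps(tokensInput: number, tokensOutput: number): Step[] {
  const steps: Step[] = ['log', 'duration', 'invocations'];
  if (tokensInput > 0 || tokensOutput > 0) steps.push('tokens');
  steps.push('log_context');
  return steps;
}
```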

Grafana

The AI dashboard (infra/observability/grafana-provisioning/dashboards/infra/infra-ai-observability-dashboard.json) now has a Sub-agents row with four panels:

  • Sub-agent invocations/min by result — throughput and error rate split by {subagent, result}
  • Sub-agent duration p50 / p95 by subagent — latency per sub-agent
  • Sub-agent tokens/min (input vs output) — token throughput per sub-agent and dimension
  • Sub-agent error rate (1h) by subagent — stat panel with green/yellow/red thresholds at <1% / <5% / ≥5%
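The error-rate stat panel can be backed by a ratio query of roughly this shape. The metric name assumes the usual OTel-to-Prometheus export (dots converted to underscores); check the dashboard JSON for the exact spelling:

```promql
sum by (subagent) (rate(ai_subagent_invocations_total{result="error"}[1h]))
  /
sum by (subagent) (rate(ai_subagent_invocations_total[1h]))
```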

For ad-hoc investigation of a specific sub-agent invocation, query Loki directly:

```logql
{service_name="daramex-api"} | json | event="ai.subagent.finished" | parent_chat_id="<id>"
```
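Slow guardrail runs can be isolated the same way by filtering on the duration field after `json` parsing (the 2000 ms threshold is illustrative):

```logql
{service_name="daramex-api"} | json | event="ai.subagent.finished" | subagent="guardrail" | duration_ms > 2000
```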

Privacy

The wide event carries token counts and cost — never the raw prompt or the sub-agent's response content. The block_reason field comes from the guardrail runner's parsed JSON output (the model's own justification), which is considered safe to log. If the guardrail prompt evolves to surface user-quoted text in reason, that field should be redacted before logging.

See Also

  • Chat and message streaming — parent stream lifecycle that hosts the sub-agents
  • Issue #179 — implementation tracking
  • Epic #174 — AI module observability roadmap (phase 4)