
AI Sub-agents Observability

The AI module runs two sub-agents as side jobs of the main chat turn: GuardrailSubAgent (safety classifier, kill-switch for the parent stream) and TitleGeneratorSubAgent (generates the chat name from the first user message). Both call into IAgentRunner like a regular agent run, but they are not user-facing — until issue #179 they were invisible in Prometheus and Loki.

This page describes the canonical wide event, the three Prometheus instruments, and the parent-stream rollup that make them observable.

What and Why

Before #179, aiSubagentDuration, aiSubagentInvocationsTotal, and aiSubagentTokensTotal were declared in ai-metrics.ts but had zero call sites. There was also no canonical log line per sub-agent invocation — only the per-rule ai.guardrail.checked event, which captures rule outcomes, not the invocation as a whole.

The change closes that gap so on-call can answer:

  • How often is each sub-agent firing? Are errors creeping up?
  • How long does each sub-agent take? (The guardrail runs alongside the parent stream but acts as its kill-switch, so a slow guardrail delays the parent's tokens.)
  • How many tokens are sub-agents consuming, and at what cost?
  • For a given chat turn, which sub-agents fired? — surfaced as subagents_spawned on the parent canonical event.

Canonical Wide Event — ai.subagent.finished

Emitted exactly once per subAgent.execute() call, on every code path (success, runner failure, guardrail block).

| Field | Type | Notes |
| --- | --- | --- |
| `subagent` | enum | `'guardrail' \| 'title_generator'` |
| `parent_chat_id` | string? | From `trackingContext.chatId` |
| `parent_message_id` | string? | From `trackingContext.messageId` (added in this change) |
| `model` | string | Full `ModelId` (`provider/model`) used by the sub-agent |
| `provider` | string | Substring before `/` in the `ModelId` (`openai`, `anthropic`, …) |
| `tokens_input` | number | From `runResult.usage.inputTokens` (0 if runner failed) |
| `tokens_output` | number | From `runResult.usage.outputTokens` |
| `usd_cost` | number | From `runResult.totalCostUsd` |
| `duration_ms` | number | `performance.now()` anchor around the runner call |
| `result` | enum | `'ok' \| 'block' \| 'error'` |
| `decision` | enum? | `'allow' \| 'block'` (guardrail only) |
| `block_reason` | string? | Set when guardrail blocks (sourced from runner output) |

parent_chat_id and parent_message_id make it trivial to correlate a sub-agent run with the user turn that spawned it — useful when ai.message-stream.finished shows abnormal latency and you need to check whether the guardrail was the cause.
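The field table above can be summarized as a TypeScript shape. This is an illustrative sketch, not the actual type from the codebase, and the `providerOf` helper is a hypothetical stand-in showing how `provider` derives from the full `ModelId`:

```typescript
// Sketch of the ai.subagent.finished event body, built from the field table above.
type SubagentId = 'guardrail' | 'title_generator';
type SubagentResult = 'ok' | 'block' | 'error';

interface SubagentFinishedEvent {
  subagent: SubagentId;
  parent_chat_id?: string;      // from trackingContext.chatId
  parent_message_id?: string;   // from trackingContext.messageId
  model: string;                // full ModelId, e.g. 'openai/gpt-4o-mini'
  provider: string;             // substring before '/' in the ModelId
  tokens_input: number;         // 0 if the runner failed
  tokens_output: number;
  usd_cost: number;
  duration_ms: number;          // performance.now() anchor around the runner call
  result: SubagentResult;
  decision?: 'allow' | 'block'; // guardrail only
  block_reason?: string;        // set when the guardrail blocks
}

// Hypothetical helper: derive the provider label from the full ModelId.
function providerOf(modelId: string): string {
  return modelId.split('/')[0];
}
```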

Prometheus Instruments

All three were declared previously (ai-metrics.ts lines 202-294) and now have real call sites in the canonical recorder.

| Instrument | Type | Labels | Unit | Reason |
| --- | --- | --- | --- | --- |
| `ai.subagent.duration_seconds` | Histogram | `subagent`, `result` | s | Latency distribution per sub-agent and outcome |
| `ai.subagent_invocations_total` | Counter | `subagent`, `result` | — | Throughput + error rate |
| `ai.subagent_tokens_total` | Counter | `subagent`, `type` | — | Token economics; `type` is `'input' \| 'output'` |

Cardinality discipline: high-cardinality fields (chat_id, message_id, model, provider) are never emitted as labels — they live in the wide event log body. The recorder enforces this in code; a cardinality assertion lives in subagent-invocation.recorder.spec.ts.
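That discipline amounts to an allow-list guard along these lines. This is a sketch of the idea the spec file asserts; the names below are illustrative, not the recorder's actual API:

```typescript
// Only these labels may reach Prometheus; everything else stays in the log body.
const ALLOWED_METRIC_LABELS = new Set(['subagent', 'result', 'type']);

function assertLowCardinality(labels: Record<string, string>): void {
  for (const key of Object.keys(labels)) {
    if (!ALLOWED_METRIC_LABELS.has(key)) {
      throw new Error(`high-cardinality label "${key}" must stay in the wide event, not in metric labels`);
    }
  }
}
```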

When a token bucket is zero, the recorder skips the counter increment to avoid polluting the metric with no-op samples.
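The skip rule reduces to something like the following sketch (the real recorder calls `aiSubagentTokensTotal.add` directly; `tokenIncrements` is a hypothetical helper for illustration):

```typescript
type TokenType = 'input' | 'output';

// Only non-zero token buckets produce a counter increment.
function tokenIncrements(input: number, output: number): Array<{ type: TokenType; n: number }> {
  const increments: Array<{ type: TokenType; n: number }> = [];
  if (input > 0) increments.push({ type: 'input', n: input });
  if (output > 0) increments.push({ type: 'output', n: output });
  return increments;
}
```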

Parent Rollup — subagents_spawned[]

ai.message-stream.finished (the canonical event for each chat turn) now includes:

```ts
subagents_spawned: string[]   // sorted, deduplicated subagent ids that fired during the turn
```

The recorder appends an entry to LogContext.subagents_spawned per invocation. The stream-finished builder reads, deduplicates, and sorts that list before emitting the canonical event. Empty array when no sub-agents ran.

This lets dashboards answer "which sub-agents are routinely spawned per turn?" from a single log line — without joining across ai.subagent.finished events.
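The builder's read → deduplicate → sort step is essentially a one-liner (function name is hypothetical):

```typescript
// Deduplicate and sort the subagent ids accumulated in LogContext during the turn.
function rollupSubagents(spawned: string[]): string[] {
  return [...new Set(spawned)].sort();
}
```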

Recorder Architecture

```mermaid
graph TD
    GA["GuardrailSubAgent.execute()"] -->|all paths| R["recordSubagentInvocation()"]
    TG["TitleGeneratorSubAgent.execute()"] -->|all paths| R
    R -->|1| L["AppLoggerService.info('ai.subagent.finished', …)"]
    R -->|2| H["aiSubagentDuration.record(seconds, {subagent, result})"]
    R -->|3| C1["aiSubagentInvocationsTotal.add(1, {subagent, result})"]
    R -->|4| C2["aiSubagentTokensTotal.add(n, {subagent, type})"]
    R -->|5| LC["LogContext.append('subagents_spawned', …)"]
    LC -.->|read by| SF["stream-finished-event.builder.ts"]
    SF -->|surfaces| SP["subagents_spawned: string[] on ai.message-stream.finished"]
```

The order — log → metrics → LogContext — matches recordGuardrailCheck so downstream readers see the wide event first and metrics second.
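The fixed ordering can be sketched as a step list; the function and step names below are illustrative stand-ins, not the real recorder API. The `tokens` step only appears when a non-zero bucket exists, matching the zero-skip rule above:

```typescript
type Step = 'log' | 'duration' | 'invocations' | 'tokens' | 'log_context';

// Enumerate the recorder's side effects in the order they run:
// wide event log first, then the three metric instruments, then LogContext.
function recorderSteps(tokensInput: number, tokensOutput: number): Step[] {
  const steps: Step[] = ['log', 'duration', 'invocations'];
  if (tokensInput > 0 || tokensOutput > 0) steps.push('tokens');
  steps.push('log_context');
  return steps;
}
```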

Grafana

The AI dashboard (infra/observability/grafana-provisioning/dashboards/infra/infra-ai-observability-dashboard.json) now has a Sub-agents row with four panels:

  • Sub-agent invocations/min by result — throughput and error rate split by {subagent, result}
  • Sub-agent duration p50 / p95 by subagent — latency per sub-agent
  • Sub-agent tokens/min (input vs output) — token throughput per sub-agent and dimension
  • Sub-agent error rate (1h) by subagent — stat panel with green/yellow/red thresholds at <1% / <5% / ≥5%
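The error-rate stat panel can be backed by a ratio query of roughly this shape. The metric name assumes the usual OTel-to-Prometheus export (dots converted to underscores); check the dashboard JSON for the exact spelling:

```promql
sum by (subagent) (rate(ai_subagent_invocations_total{result="error"}[1h]))
  /
sum by (subagent) (rate(ai_subagent_invocations_total[1h]))
```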

For ad-hoc investigation of a specific sub-agent invocation, query Loki directly:

```logql
{service_name="daramex-api"} | json | event="ai.subagent.finished" | parent_chat_id="<id>"
```
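Slow guardrail runs can be isolated the same way by filtering on the duration field after `json` parsing (the 2000 ms threshold is illustrative):

```logql
{service_name="daramex-api"} | json | event="ai.subagent.finished" | subagent="guardrail" | duration_ms > 2000
```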

Privacy

The wide event carries token counts and cost — never the raw prompt or the sub-agent's response content. The block_reason field comes from the guardrail runner's parsed JSON output (the model's own justification), which is considered safe to log. If the guardrail prompt evolves to surface user-quoted text in reason, that field should be redacted before logging.

See Also

  • Chat and message streaming — parent stream lifecycle that hosts the sub-agents
  • Issue #179 — implementation tracking
  • Epic #174 — AI module observability roadmap (phase 4)