AI Sub-agents Observability
The AI module runs two sub-agents as side jobs of the main chat turn: GuardrailSubAgent (safety classifier, kill-switch for the parent stream) and TitleGeneratorSubAgent (generates the chat name from the first user message). Both call into IAgentRunner like a regular agent run, but they are not user-facing — until issue #179 they were invisible in Prometheus and Loki.
This page describes the canonical wide event, the three Prometheus instruments, and the parent-stream rollup that make them observable.
What and Why
Before #179, aiSubagentDuration, aiSubagentInvocationsTotal, and aiSubagentTokensTotal were declared in ai-metrics.ts but had zero call sites. There was also no canonical log line per sub-agent invocation — only the per-rule ai.guardrail.checked event, which captures rule outcomes, not the invocation as a whole.
The change closes that gap so on-call can answer:
- How often is each sub-agent firing? Are errors creeping up?
- How long does each sub-agent take? (The guardrail runs in parallel with the stream, and a slow guardrail can hold up parent tokens.)
- How many tokens are sub-agents consuming, and at what cost?
- For a given chat turn, which sub-agents fired? — surfaced as `subagents_spawned` on the parent canonical event.
Canonical Wide Event — ai.subagent.finished
Emitted exactly once per subAgent.execute() call, on every code path (success, runner failure, guardrail block).
| Field | Type | Notes |
|---|---|---|
| `subagent` | enum | `'guardrail' \| 'title_generator'` |
| `parent_chat_id` | string? | From `trackingContext.chatId` |
| `parent_message_id` | string? | From `trackingContext.messageId` (added in this change) |
| `model` | string | Full `ModelId` (`provider/model`) used by the sub-agent |
| `provider` | string | Substring before `/` in the `ModelId` (`openai`, `anthropic`, …) |
| `tokens_input` | number | From `runResult.usage.inputTokens` (0 if the runner failed) |
| `tokens_output` | number | From `runResult.usage.outputTokens` |
| `usd_cost` | number | From `runResult.totalCostUsd` |
| `duration_ms` | number | `performance.now()` anchor around the runner call |
| `result` | enum | `'ok' \| 'block' \| 'error'` |
| `decision` | enum? | `'allow' \| 'block'` (guardrail only) |
| `block_reason` | string? | Set when the guardrail blocks (sourced from runner output) |
parent_chat_id and parent_message_id make it trivial to correlate a sub-agent run with the user turn that spawned it — useful when ai.message-stream.finished shows abnormal latency and you need to check whether the guardrail was the cause.
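As an illustration, the field list above can be sketched as a TypeScript shape, together with the provider derivation described in the table. The names `SubagentFinishedEvent` and `providerFromModelId` are illustrative, not the real identifiers, and the model id in the comment is an example, not a value from the codebase:

```typescript
// Hypothetical shape mirroring the field table above; the real event
// payload is built by the recorder, not declared like this.
type SubagentFinishedEvent = {
  subagent: 'guardrail' | 'title_generator';
  parent_chat_id?: string;
  parent_message_id?: string;
  model: string;          // full ModelId, e.g. "openai/gpt-4o-mini"
  provider: string;       // substring before "/" in the ModelId
  tokens_input: number;   // 0 if the runner failed
  tokens_output: number;
  usd_cost: number;
  duration_ms: number;
  result: 'ok' | 'block' | 'error';
  decision?: 'allow' | 'block'; // guardrail only
  block_reason?: string;
};

// Provider derivation, per the table: everything before the first "/".
function providerFromModelId(modelId: string): string {
  return modelId.split('/')[0];
}
```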
Prometheus Instruments
All three were declared previously (ai-metrics.ts lines 202-294) and now have real call sites in the canonical recorder.
| Instrument | Type | Labels | Unit | Reason |
|---|---|---|---|---|
| `ai.subagent.duration_seconds` | Histogram | `subagent`, `result` | s | Latency distribution per sub-agent and outcome |
| `ai.subagent_invocations_total` | Counter | `subagent`, `result` | — | Throughput + error rate |
| `ai.subagent_tokens_total` | Counter | `subagent`, `type` | — | Token economics; `type` is `'input' \| 'output'` |
Cardinality discipline: high-cardinality fields (chat_id, message_id, model, provider) are never emitted as labels — they live in the wide event log body. The recorder enforces this in code; a cardinality assertion lives in subagent-invocation.recorder.spec.ts.
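The kind of check the spec can enforce looks roughly like the sketch below. The allow-list is inferred from the instrument table above, and `assertLowCardinality` is an illustrative name, not the actual helper in `subagent-invocation.recorder.spec.ts`:

```typescript
// Only low-cardinality labels may reach Prometheus; everything else
// (chat_id, message_id, model, provider) belongs in the log body.
const ALLOWED_LABELS = new Set(['subagent', 'result', 'type']);

function assertLowCardinality(labels: Record<string, string>): void {
  for (const key of Object.keys(labels)) {
    if (!ALLOWED_LABELS.has(key)) {
      throw new Error(`High-cardinality label "${key}" must not be a metric label`);
    }
  }
}
```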
When a token bucket is zero, the recorder skips the counter increment to avoid polluting the metric with no-op samples.
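The zero-skip behavior can be sketched as follows. The `Counter` interface is a stand-in for the OpenTelemetry counter, and `recordTokenBuckets` is an illustrative name:

```typescript
interface Counter {
  add(value: number, attributes: Record<string, string>): void;
}

// Skip no-op increments: adding 0 would still create or refresh the
// labeled series without conveying any information.
function recordTokenBuckets(
  counter: Counter,
  subagent: string,
  tokensInput: number,
  tokensOutput: number,
): void {
  if (tokensInput > 0) counter.add(tokensInput, { subagent, type: 'input' });
  if (tokensOutput > 0) counter.add(tokensOutput, { subagent, type: 'output' });
}
```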
Parent Rollup — subagents_spawned[]
ai.message-stream.finished (the canonical event for each chat turn) now includes:
```ts
subagents_spawned: string[] // sorted, deduplicated sub-agent ids that fired during the turn
```

The recorder appends an entry to `LogContext.subagents_spawned` per invocation. The stream-finished builder reads, deduplicates, and sorts that list before emitting the canonical event. The array is empty when no sub-agents ran.
This lets dashboards answer "which sub-agents are routinely spawned per turn?" from a single log line — without joining across ai.subagent.finished events.
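The read-dedupe-sort step in the stream-finished builder amounts to the following (the function name is illustrative):

```typescript
// Collapse the per-invocation appends into the sorted, deduplicated
// list surfaced on ai.message-stream.finished.
function buildSubagentsSpawned(appends: string[]): string[] {
  return [...new Set(appends)].sort();
}
```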
Recorder Architecture
```mermaid
graph TD
  GA["GuardrailSubAgent.execute()"] -->|all paths| R["recordSubagentInvocation()"]
  TG["TitleGeneratorSubAgent.execute()"] -->|all paths| R
  R -->|1| L["AppLoggerService.info('ai.subagent.finished', …)"]
  R -->|2| H["aiSubagentDuration.record(seconds, {subagent, result})"]
  R -->|3| C1["aiSubagentInvocationsTotal.add(1, {subagent, result})"]
  R -->|4| C2["aiSubagentTokensTotal.add(n, {subagent, type})"]
  R -->|5| LC["LogContext.append('subagents_spawned', …)"]
  LC -.->|read by| SF["stream-finished-event.builder.ts"]
  SF -->|surfaces| SP["subagents_spawned: string[] on ai.message-stream.finished"]
```
The order — log → metrics → LogContext — matches recordGuardrailCheck so downstream readers see the wide event first and metrics second.
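Under those assumptions, the recorder's fixed ordering might look like the sketch below. All names are illustrative stand-ins for the real services, and the token counter is omitted for brevity:

```typescript
type Deps = {
  logInfo: (event: string, payload: object) => void;
  recordDuration: (seconds: number, attrs: object) => void;
  countInvocation: (attrs: object) => void;
  appendContext: (key: string, value: string) => void;
};

// Wide event first, metrics second, LogContext last, mirroring
// recordGuardrailCheck so downstream readers see a consistent order.
function recordSubagentInvocation(
  deps: Deps,
  subagent: string,
  result: string,
  durationMs: number,
): void {
  const attrs = { subagent, result };
  deps.logInfo('ai.subagent.finished', { subagent, result, duration_ms: durationMs });
  deps.recordDuration(durationMs / 1000, attrs);
  deps.countInvocation(attrs);
  deps.appendContext('subagents_spawned', subagent);
}
```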
Grafana
The AI dashboard (infra/observability/grafana-provisioning/dashboards/infra/infra-ai-observability-dashboard.json) now has a Sub-agents row with four panels:
- Sub-agent invocations/min by result — throughput and error rate split by `{subagent, result}`
- Sub-agent duration p50 / p95 by subagent — latency per sub-agent
- Sub-agent tokens/min (input vs output) — token throughput per sub-agent and dimension
- Sub-agent error rate (1h) by subagent — stat panel with green/yellow/red thresholds at <1% / <5% / ≥5%
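The stat-panel thresholds reduce to a simple boundary mapping; Grafana evaluates the thresholds itself, so the sketch below only restates the logic:

```typescript
// Error-rate thresholds from the dashboard: <1% green, <5% yellow, ≥5% red.
function errorRateColor(errorRate: number): 'green' | 'yellow' | 'red' {
  if (errorRate < 0.01) return 'green';
  if (errorRate < 0.05) return 'yellow';
  return 'red';
}
```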
For ad-hoc investigation of a specific sub-agent invocation, query Loki directly:
```logql
{service_name="daramex-api"} | json | event="ai.subagent.finished" | parent_chat_id="<id>"
```

Privacy
The wide event carries token counts and cost — never the raw prompt or the sub-agent's response content. The block_reason field comes from the guardrail runner's parsed JSON output (the model's own justification), which is considered safe to log. If the guardrail prompt evolves to surface user-quoted text in reason, that field should be redacted before logging.
Related
- Chat and message streaming — parent stream lifecycle that hosts the sub-agents
- Issue #179 — implementation tracking
- Epic #174 — AI module observability roadmap (phase 4)