Document Search

GET /storage/documents and GET /storage/documents/trash are the single entry points for any UI that needs to list, filter, search, sort, or paginate documents. The chat attachment picker, the agent knowledge (RAG) selector, and the storage browser all call this endpoint with different query params — the server, not the client, decides which documents are visible.

The endpoint follows the canonical Query Filters & Pagination DSL. This page documents only the resource-specific contract.

Request shape

GET /storage/documents
  ?page=1
  &limit=10
  &sort=-createdAt,name
  &mediaType[in]=image/png,image/jpeg,application/pdf
  &documentCategoryId[null]=true
  &uploaderId[in]=<uuidv7>,<uuidv7>
  &favorite=true
  &size[gte]=1000000
  &createdAt[between]=2026-01-01T00:00:00Z,2026-04-01T00:00:00Z
  &q=invoice
  &tags=legal,contract
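A minimal sketch of assembling this request in TypeScript with the built-in URL and URLSearchParams APIs. The base URL and the buildDocumentsUrl helper are hypothetical. URLSearchParams percent-encodes the brackets and commas, so this assumes the server's query parser decodes encoded keys before reading the field[op] brackets (qs, for example, does).

```typescript
// Hypothetical origin; substitute the real API base URL.
const BASE_URL = "https://api.example.com";

// Each entry is [key, value]; operator keys use the DSL's field[op] form.
type QueryEntry = [key: string, value: string];

// Hypothetical helper: builds an absolute URL for a storage list route.
function buildDocumentsUrl(path: string, entries: QueryEntry[]): string {
  const url = new URL(path, BASE_URL);
  for (const [key, value] of entries) {
    // append() keeps repeated keys; brackets, slashes and commas get percent-encoded.
    url.searchParams.append(key, value);
  }
  return url.toString();
}

const url = buildDocumentsUrl("/storage/documents", [
  ["page", "1"],
  ["limit", "10"],
  ["sort", "-createdAt,name"],
  ["mediaType[in]", "image/png,image/jpeg,application/pdf"],
  ["documentCategoryId[null]", "true"],
  ["favorite", "true"],
  ["size[gte]", "1000000"],
  ["createdAt[between]", "2026-01-01T00:00:00Z,2026-04-01T00:00:00Z"],
  ["q", "invoice"],
  ["tags", "legal,contract"],
]);
```

Because both routes share one query schema, the same entries can be passed with "/storage/documents/trash" as the path.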

Param	Type	Operators	Notes
`page`	int ≥ 1	—	default 1
`limit`	int 1–100	—	default 10, hard cap 100
`sort`	CSV	—	whitelist: `name`, `size`, `createdAt`, `updatedAt`, `favorite`. `-` prefix = DESC. Default order: `createdAt DESC`.
`mediaType`	string	`eq`, `in`	`?mediaType[in]=image/png,image/jpeg`. The API exposes `mediaType` for consistency with the chat `data-attachment` schema; internally maps to the `type` column.
`documentCategoryId`	uuid v7	`eq`, `in`, `null`	`[null]=true` to find uncategorised documents.
`uploaderId`	uuid v7	`eq`, `in`	non-nullable.
`favorite`	boolean	`eq`	string forms `"true"`/`"false"` are coerced.
`size`	int	`gt`, `gte`, `lt`, `lte`, `between`	bytes.
`createdAt`, `updatedAt`	ISO datetime	`gt`, `gte`, `lt`, `lte`, `between`	between is `[from, to]`.
`q`	string 1–100	(top-level extra)	ILIKE substring match on `name`. Wildcards (`%`, `_`, `\`) are escaped — user input cannot inject pattern operators.
`tags`	string CSV / repeated	(top-level extra)	Postgres jsonb array overlap (`?|` operator): matches documents tagged with any of the listed values.
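The `sort` row can be read as a small parser: split the CSV, treat a leading `-` as DESC, accept only whitelisted fields, and fall back to `createdAt DESC` when nothing is given. A sketch with a hypothetical parseSort helper; the table does not say whether a non-whitelisted field is rejected or dropped, so throwing here is an assumption.

```typescript
// Sortable fields per the table; `type` (mediaType) is deliberately absent.
const SORTABLE = ["name", "size", "createdAt", "updatedAt", "favorite"] as const;
type SortField = (typeof SORTABLE)[number];
type SortOrder = Partial<Record<SortField, "ASC" | "DESC">>;

// Hypothetical helper: "-createdAt,name" -> { createdAt: "DESC", name: "ASC" }.
function parseSort(csv: string | undefined): SortOrder {
  if (csv === undefined || csv.trim() === "") {
    return { createdAt: "DESC" }; // documented default order
  }
  const order: SortOrder = {};
  for (const raw of csv.split(",")) {
    const token = raw.trim();
    const desc = token.startsWith("-");
    const field = desc ? token.slice(1) : token;
    if (!(SORTABLE as readonly string[]).includes(field)) {
      // Assumption: reject rather than silently ignore a non-whitelisted field.
      throw new Error(`sort field not allowed: ${field}`);
    }
    order[field as SortField] = desc ? "DESC" : "ASC";
  }
  return order;
}
```

JavaScript preserves insertion order for non-numeric string keys, so the returned object keeps the CSV's precedence (primary key first) when it is handed to an ORM `order` option.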

q and tags live outside the DSL on purpose: the DSL operator set is frozen at eq/in/gt/gte/lt/lte/between and explicitly does not cover ILIKE or array overlap (per the DSL spec).
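Because `q` and `tags` bypass the DSL, they need their own parsing. A sketch with a hypothetical parseExtras helper, assuming raw values arrive as a string or, for repeated keys, a string array (the shape most query parsers produce); trimming and dropping empty tags are also assumptions.

```typescript
type RawValue = string | string[] | undefined;

interface SearchExtras {
  q?: string;
  tags?: string[];
}

// Hypothetical helper mirroring the `q` (1-100 chars) and `tags` (CSV or repeated) rows.
function parseExtras(rawQ: RawValue, rawTags: RawValue): SearchExtras {
  const extras: SearchExtras = {};

  if (typeof rawQ === "string") {
    if (rawQ.length < 1 || rawQ.length > 100) {
      throw new Error("q must be 1-100 characters");
    }
    extras.q = rawQ;
  }

  if (rawTags !== undefined) {
    // Repeated keys arrive as an array; each element may itself be a CSV.
    const tags = (Array.isArray(rawTags) ? rawTags : [rawTags])
      .flatMap((chunk) => chunk.split(","))
      .map((tag) => tag.trim())
      .filter((tag) => tag.length > 0);
    if (tags.length > 0) extras.tags = tags;
  }

  return extras;
}
```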

Unknown top-level keys are silently stripped by the Zod schema.
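Stripping unknown keys and mapping API names to columns can be pictured as a single whitelist lookup. A minimal sketch, assuming a simplified map from API key to column name; the real DOCUMENT_FIELD_MAP and toTypeOrmWhere also handle operators and value coercion, and toColumnFilters is hypothetical.

```typescript
// Simplified stand-in for the field map: API key -> database column.
// A Map avoids accidental hits on prototype keys such as "constructor".
const FIELD_MAP = new Map<string, string>([
  ["mediaType", "type"], // exposed as mediaType, stored in the `type` column
  ["documentCategoryId", "documentCategoryId"],
  ["uploaderId", "uploaderId"],
  ["favorite", "favorite"],
  ["size", "size"],
  ["createdAt", "createdAt"],
  ["updatedAt", "updatedAt"],
]);

// Keeps whitelisted keys (renamed to their column) and silently drops the rest.
function toColumnFilters(filters: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(filters)) {
    const column = FIELD_MAP.get(key);
    if (column !== undefined) out[column] = value;
  }
  return out;
}
```

For example, a request carrying an arbitrary extra key (say tenantId) loses it here, while mediaType reaches the query as type.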

Response shape

{
  items: IDocumentResponse[],
  total: number,
  page: number,
  limit: number,
}
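Clients can derive pager metadata from `total`, `page`, and `limit`. A sketch with hypothetical PageResult and pageInfo names; the `offset` field shows the usual offset arithmetic, on the assumption (not stated here) that the shared paginate helper skips `(page - 1) * limit` rows.

```typescript
// Mirrors the response shape above; T would be IDocumentResponse.
interface PageResult<T> {
  items: T[];
  total: number;
  page: number;
  limit: number;
}

interface PageInfo {
  totalPages: number;
  hasNext: boolean;
  hasPrev: boolean;
  offset: number; // rows before this page: (page - 1) * limit
}

// Hypothetical helper for pager UIs; limit is always >= 1 per the schema.
function pageInfo<T>(result: PageResult<T>): PageInfo {
  const totalPages = Math.ceil(result.total / result.limit);
  return {
    totalPages,
    hasNext: result.page < totalPages,
    hasPrev: result.page > 1,
    offset: (result.page - 1) * result.limit,
  };
}
```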

IDocumentResponse is the standard storage DTO (id, name, size, type, favorite, tags, s3Key, createdAt, etc. — see DocumentResponseSchema).

Trash variant

GET /storage/documents/trash accepts the same query schema (no need to learn a different contract). The only behavioural difference is the deletedAt predicate applied at the repository level — IS NOT NULL instead of IS NULL. Permission, sort, filter, search, and pagination behave identically.
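One way to picture "same schema, different predicate" is a shared builder that only switches the `deletedAt` condition. A sketch using plain-object stand-ins for IS NULL / IS NOT NULL (TypeORM would use its IsNull() and Not(IsNull()) operators); Scope and buildBaseWhere are hypothetical.

```typescript
type Scope = "active" | "trash";

// Plain-object stand-ins for IS NULL / IS NOT NULL.
type NullCheck = { op: "isNull" } | { op: "isNotNull" };

interface WhereRow {
  [column: string]: unknown;
  deletedAt: NullCheck;
}

// Same user filters for both routes; only the deletedAt predicate changes.
function buildBaseWhere(filters: Record<string, unknown>, scope: Scope): WhereRow {
  const deletedAt: NullCheck = scope === "trash" ? { op: "isNotNull" } : { op: "isNull" };
  // deletedAt is spread last, so a stray deletedAt key in filters cannot override it.
  return { ...filters, deletedAt };
}
```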

Tenancy

Every request resolves an access filter via AuthorizationService.buildAccessFilter('read', 'storage.document', { orgId, ... }). The repository layer merges the resolved filter into every row of the where clause, alongside the user-supplied DSL filters and the deletedAt predicate. There is no way to bypass tenancy from the URL: even mediaType[in]=* returns only documents in the caller's org.

If the user has no matching access rules, the endpoint returns an empty page (items: [], total: 0): never a 200 with foreign data and never a 403. This matches the behaviour of the previous non-paginated version.
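A sketch of the merge and the empty-page short-circuit, assuming the access filter resolves to an array of OR-ed where rows (the usual TypeORM `where: [...]` shape) and that an empty array means "no matching rules". mergeAccess, listOrEmpty, and the orgId keys are hypothetical; access keys are spread last so a user-supplied value can never overwrite them.

```typescript
type Row = Record<string, unknown>;

interface Page<T> {
  items: T[];
  total: number;
  page: number;
  limit: number;
}

// Every access row is combined with the same user filters; access keys win on conflict.
function mergeAccess(userRow: Row, accessRows: Row[]): Row[] {
  return accessRows.map((access) => ({ ...userRow, ...access }));
}

// No access rules: answer with an empty page instead of querying or returning 403.
async function listOrEmpty<T>(
  userRow: Row,
  accessRows: Row[],
  page: number,
  limit: number,
  query: (where: Row[]) => Promise<[T[], number]>, // e.g. a findAndCount wrapper
): Promise<Page<T>> {
  if (accessRows.length === 0) {
    return { items: [], total: 0, page, limit };
  }
  const [items, total] = await query(mergeAccess(userRow, accessRows));
  return { items, total, page, limit };
}
```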

Forcing filters per use case

The expected pattern: the client decides which filters to force based on context.

Context	Forced filters
Chat composer attachment picker	`mediaType[in]=<MIMEs the active agent's model supports>` (derived in the panel via `getSupportedMediaTypesForModalities(agent.brainConfig.model.architecture.input_modalities)` from `@repo/schemas`).
Agent knowledge (RAG) upload picker	`mediaType[in]=application/pdf,...` (whatever the RAG pipeline supports).
Free storage browser	none — user controls all filters.
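Forcing a filter amounts to merging the context's filters over whatever the user picked. A sketch with a hypothetical withForcedFilters helper, covering list-valued filters such as mediaType[in]; intersecting with the user's own selection (rather than replacing it) is a design assumption that keeps the user able to narrow further, never widen.

```typescript
// Filters keyed by "field[op]" or extra name, values as lists (e.g. mediaType[in]).
type ListFilters = Record<string, string[]>;

// Hypothetical helper: forced keys always apply; if the user also set that key,
// only values allowed by the context survive (an empty result matches nothing).
function withForcedFilters(user: ListFilters, forced: ListFilters): ListFilters {
  const merged: ListFilters = { ...user };
  for (const [key, allowed] of Object.entries(forced)) {
    const picked = Object.prototype.hasOwnProperty.call(user, key) ? user[key] : undefined;
    merged[key] = picked !== undefined ? picked.filter((v) => allowed.includes(v)) : [...allowed];
  }
  return merged;
}
```

In the chat picker, `forced` would carry the MIME list derived from the agent's input modalities; the free storage browser would pass an empty object.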

The frontend cannot widen what the API allows: tenancy is server-side, the field map is the security boundary, and the schema strips unknown keys. The frontend can only narrow results within the scope it has access to.

Why a single endpoint, not one per consumer

Earlier drafts of #296 considered a dedicated GET /agent/:id/attachable-documents for the chat use case. That was rejected because:

The capability filter (which MIMEs an agent can ingest) is derivable from data the panel already has (agent.brainConfig.model.architecture.input_modalities). No reason to query a second endpoint.
A dedicated endpoint would duplicate every storage browser feature (filters, sort, search, pagination) for the small benefit of one server-side computation.
The same modal must serve the agent RAG upload picker, future "share file in chat" flows, and any list-from-storage UI. One source of truth scales; many bespoke endpoints do not.

Centralising on the storage list endpoint also keeps the security boundary in one place: the DOCUMENT_FIELD_MAP whitelist plus tenancy resolution. New consumers do not introduce new attack surface — they only force a subset of filters.

Implementation notes

Field map: apps/api/src/modules/storage/infrastructure/field-maps/document-field-map.ts whitelists which top-level keys translate to TypeORM columns. Anything not in the map is silently dropped by toTypeOrmWhere.
Sort map: separate from the filter map. type (mediaType) is filterable but not sortable; this is deliberate — sorting by MIME type buckets results unexpectedly and we never wanted to expose it.
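q escaping: escapeLikePattern doubles \ and prefixes %/_ with \. Without this, % and _ in user input would act as wildcards (% matches any run of characters, _ any single character): typing % alone would match every file, and 50%_off would also match names such as 50 takeoff.pdf.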
tags overlap: implemented with Raw('tags ?| ARRAY[:...tagValues]', { tagValues }) because the column is jsonb, not a native Postgres array. TypeORM's ArrayOverlap operator targets native arrays — it does not work on jsonb.
Pagination: uses the shared paginate(...) helper which calls findAndCount under the hood. Wrapped in recordRepositoryOperation so the standard storage repository telemetry covers list operations the same way it covers single-row reads.
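The q escaping rule above fits in one regular expression. A sketch of an equivalent function (named escapeLike here, not the project's escapeLikePattern), assuming Postgres's default LIKE escape character `\` and that the term is wrapped in `%...%` for the substring match.

```typescript
// Prefixes \, % and _ with a backslash; for \ this doubles it.
function escapeLike(input: string): string {
  return input.replace(/[\\%_]/g, "\\$&");
}

// Wraps the escaped term for a substring match, e.g. name ILIKE :pattern.
function toSubstringPattern(q: string): string {
  return `%${escapeLike(q)}%`;
}
```

With escaping, 50%_off becomes %50\%\_off% and only matches names containing that literal text.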

Frontend contract

The api-client exposes:

listDocuments(client, query?: ISearchDocumentsQuery): Promise<PaginationResultDto<IDocumentResponse>>
listTrash(client,    query?: ISearchDocumentsQuery): Promise<PaginationResultDto<IDocumentResponse>>

sort is serialised to the canonical CSV form (-createdAt,name) before reaching axios — qs would otherwise nest the sort objects into keys the API does not accept (same workaround the operations search uses).
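A sketch of that serialisation step, assuming the client holds sort as an ordered list of `{ field, order }` objects; the real ISearchDocumentsQuery sort type is not shown on this page, so SortTerm and serializeSort are hypothetical.

```typescript
interface SortTerm {
  field: string;
  order: "ASC" | "DESC";
}

// [{ createdAt, DESC }, { name, ASC }] -> "-createdAt,name"
function serializeSort(terms: readonly SortTerm[]): string | undefined {
  if (terms.length === 0) return undefined; // omit the param so the server default applies
  return terms.map((t) => (t.order === "DESC" ? `-${t.field}` : t.field)).join(",");
}
```

Passing the resulting string as the `sort` param leaves the query-string serializer nothing to nest.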

The React hooks useDocuments(query) and useTrashDocuments(query) include the query in their queryKey, so React Query refetches when filters change. Mutations (move-to-trash, update-metadata, …) invalidate the bare ['storage', 'documents'] / ['storage', 'trash'] prefix so every dependent variant is matched and refetched.
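The key layout can be sketched as a small factory plus the prefix rule that lets one invalidation reach every filtered variant. documentKeys and trashKeys are hypothetical, and matchesPrefix is a simplified model of React Query's partial key matching (it compares segments by JSON serialisation, which is sensitive to object key order), not its actual implementation.

```typescript
type QueryKey = readonly unknown[];

// Hypothetical key factories: the query object is the last segment.
const documentKeys = {
  all: ["storage", "documents"] as const,
  list: (query: object): QueryKey => ["storage", "documents", query],
};

const trashKeys = {
  all: ["storage", "trash"] as const,
  list: (query: object): QueryKey => ["storage", "trash", query],
};

// Simplified prefix rule: each prefix segment must equal the key segment at the same index.
function matchesPrefix(key: QueryKey, prefix: QueryKey): boolean {
  return (
    prefix.length <= key.length &&
    prefix.every((part, i) => JSON.stringify(part) === JSON.stringify(key[i]))
  );
}
```

Invalidating `documentKeys.all` therefore refetches `documentKeys.list({ page: 2, q: "invoice" })` and every other documents variant, while leaving trash queries untouched.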

Out of scope

Surfacing search/sort/paginator UI in the panel storage browser. The API is ready; the UI work tracks separately.
Cursor-based pagination. The DSL is offset-only; revisit if scan windows on documents get expensive.
Full-text search beyond filename. Use Qdrant for semantic search (per the DSL spec — q here is intentionally a simple ILIKE).
