Document Search
GET /storage/documents and GET /storage/documents/trash are the single entry points for any UI that needs to list, filter, search, sort, or paginate documents. The chat attachment picker, the agent knowledge (RAG) selector, and the storage browser all call this endpoint with different query params — the server, not the client, decides which documents are visible.
The endpoint follows the canonical Query Filters & Pagination DSL. This page documents only the resource-specific contract.
Request shape
```
GET /storage/documents
?page=1
&limit=10
&sort=-createdAt,name
&mediaType[in]=image/png,image/jpeg,application/pdf
&documentCategoryId[null]=true
&uploaderId[in]=<uuidv7>,<uuidv7>
&favorite=true
&size[gte]=1000000
&createdAt[between]=2026-01-01T00:00:00Z,2026-04-01T00:00:00Z
&q=invoice
&tags=legal,contract
```

| Param | Type | Operators | Notes |
|---|---|---|---|
| `page` | int ≥ 1 | none | default 1 |
| `limit` | int 1–100 | none | default 10, hard cap 100 |
| `sort` | CSV | none | whitelist: `name`, `size`, `createdAt`, `updatedAt`, `favorite`. `-` prefix = DESC. Default order: `createdAt` DESC. |
| `mediaType` | string | eq, in | `?mediaType[in]=image/png,image/jpeg`. The API exposes `mediaType` for consistency with the chat data-attachment schema; internally it maps to the `type` column. |
| `documentCategoryId` | uuid v7 | eq, in, null | `[null]=true` finds uncategorised documents. |
| `uploaderId` | uuid v7 | eq, in | non-nullable. |
| `favorite` | boolean | eq | string forms `"true"`/`"false"` are coerced. |
| `size` | int | gt, gte, lt, lte, between | bytes. |
| `createdAt`, `updatedAt` | ISO datetime | gt, gte, lt, lte, between | `between` is `[from, to]`. |
| `q` | string 1–100 | (top-level extra) | ILIKE substring match on `name`. Wildcards (`%`, `_`, `\`) are escaped, so user input cannot inject pattern operators. |
| `tags` | string CSV / repeated | (top-level extra) | Postgres jsonb array overlap (`?\|`). |
q and tags live outside the DSL on purpose: the DSL operator set is frozen at eq/in/gt/gte/lt/lte/between and explicitly does not cover ILIKE or array overlap (per the DSL spec).
Unknown top-level keys are silently stripped by the Zod schema.
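As an illustration of the request shape above, here is a minimal sketch of how a client might serialise a typed query into the bracket form. `buildDocumentQuery` and `DocumentQuerySketch` are hypothetical names, not part of the real api-client, and writing `eq` as the bare key is an assumption taken from the `favorite=true` example.

```ts
// Hypothetical client-side helper; the real api-client may serialise differently.
type Operator = "eq" | "in" | "null" | "gt" | "gte" | "lt" | "lte" | "between";
type Scalar = string | number | boolean;
type FilterValue = Scalar | Scalar[];

interface DocumentQuerySketch {
  page?: number;
  limit?: number;
  sort?: string; // canonical CSV form, e.g. "-createdAt,name"
  q?: string;
  tags?: string[];
  filters?: Record<string, Partial<Record<Operator, FilterValue>>>;
}

function buildDocumentQuery(query: DocumentQuerySketch): string {
  const parts: string[] = [];
  // Keys are left as-is so the brackets stay readable; values are percent-encoded.
  const push = (key: string, value: string): void => {
    parts.push(`${key}=${encodeURIComponent(value)}`);
  };

  if (query.page !== undefined) push("page", String(query.page));
  if (query.limit !== undefined) push("limit", String(query.limit));
  if (query.sort !== undefined) push("sort", query.sort);

  for (const [field, ops] of Object.entries(query.filters ?? {})) {
    for (const [op, value] of Object.entries(ops)) {
      if (value === undefined) continue;
      // Multi-value operators (in, between) are sent as a comma-separated list.
      const text = Array.isArray(value) ? value.join(",") : String(value);
      // Assumption: eq uses the bare key (favorite=true); other operators use field[op].
      push(op === "eq" ? field : `${field}[${op}]`, text);
    }
  }

  if (query.q !== undefined) push("q", query.q);
  if (query.tags !== undefined && query.tags.length > 0) {
    push("tags", query.tags.join(","));
  }
  return parts.join("&");
}

const exampleQuery = buildDocumentQuery({
  page: 1,
  limit: 10,
  filters: { mediaType: { in: ["image/png", "application/pdf"] }, favorite: { eq: true } },
  q: "invoice",
});
console.log(`/storage/documents?${exampleQuery}`);
```

Because unknown keys are stripped server-side, a typo in a field name here would silently widen the result set rather than fail, which is a reason to keep the field names typed on the client.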
Response shape
```ts
{
  items: IDocumentResponse[],
  total: number,
  page: number,
  limit: number,
}
```

`IDocumentResponse` is the standard storage DTO (`id`, `name`, `size`, `type`, `favorite`, `tags`, `s3Key`, `createdAt`, etc.; see `DocumentResponseSchema`).
Trash variant
GET /storage/documents/trash accepts the same query schema (no need to learn a different contract). The only behavioural difference is the deletedAt predicate applied at the repository level — IS NOT NULL instead of IS NULL. Permission, sort, filter, search, and pagination behave identically.
Tenancy
Every request resolves an access filter via AuthorizationService.buildAccessFilter('read', 'storage.document', { orgId, ... }). The resolved filter is merged into every where row at the repository layer alongside the user-supplied DSL filters and the deletedAt predicate. There is no way to bypass tenancy from the URL — even mediaType[in]=* returns only documents in the caller's org.
If the user has no matching access rules, the endpoint returns an empty page (items: [], total: 0) — never a 200 with foreign data and never a 403 (this matches existing behaviour for the previous non-paginated version).
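The way the trash predicate and the tenancy filter combine can be sketched as follows. This is not the repository code: `mergeWhere`, the `WhereRow` shape, and the assumption that the access filter resolves to a list of OR'd rows are all hypothetical, and the `IS_NULL` / `IS_NOT_NULL` constants stand in for the ORM's null operators so the example runs without a database.

```ts
// Hypothetical sketch: shapes and names are illustrative, not the real repository code.
type WhereRow = Record<string, unknown>;

// Stand-ins for the ORM's null operators.
const IS_NULL = { op: "IS NULL" } as const;
const IS_NOT_NULL = { op: "IS NOT NULL" } as const;

function mergeWhere(
  userFilters: WhereRow,
  accessRows: WhereRow[], // assumption: one row per matching access rule, OR'd together
  trash: boolean,
): WhereRow[] {
  // The only difference between the list and trash endpoints.
  const deletedAt = trash ? IS_NOT_NULL : IS_NULL;
  // Access fields are spread last so a user-supplied filter can never override them.
  return accessRows.map((access) => ({ ...userFilters, ...access, deletedAt }));
}

const rows = mergeWhere({ type: "application/pdf" }, [{ orgId: "org-1" }], false);
console.log(rows.length); // one where row per access rule

// No access rules means no where rows; the caller would answer with an empty page.
const noAccess = mergeWhere({ type: "application/pdf" }, [], false);
console.log(noAccess.length === 0 ? { items: [], total: 0 } : "run query");
```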
Forcing filters per use case
The expected pattern: the client decides which filters to force based on context.
| Context | Forced filters |
|---|---|
| Chat composer attachment picker | mediaType[in]=<MIMEs the active agent's model supports> (derived in the panel via getSupportedMediaTypesForModalities(agent.brainConfig.model.architecture.input_modalities) from @repo/schemas). |
| Agent knowledge (RAG) upload picker | mediaType[in]=application/pdf,... (whatever the RAG pipeline supports). |
| Free storage browser | none — user controls all filters. |
The frontend cannot widen what the API allows: tenancy is server-side, the field map is the security boundary, and the schema strips unknown keys. The frontend can only narrow results within the scope it has access to.
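One way a picker could apply forced filters is to drop any user-supplied filter on the same field and then apply the forced values last. The sketch below assumes filters are held as a flat map of query-string keys; `withForcedFilters` and the hard-coded MIME list are hypothetical (in the chat composer the list would come from the agent's supported modalities).

```ts
// Hypothetical helper: filters are modelled as a flat map of query-string keys to values.
type Filters = Record<string, string>;

// "mediaType[in]" and "mediaType" both refer to the field "mediaType".
const fieldOf = (key: string): string => key.replace(/\[.*$/, "");

function withForcedFilters(user: Filters, forced: Filters): Filters {
  const forcedFields = new Set(Object.keys(forced).map(fieldOf));
  // Drop every user filter that targets a forced field, whatever operator it uses.
  const kept = Object.entries(user).filter(([key]) => !forcedFields.has(fieldOf(key)));
  return { ...Object.fromEntries(kept), ...forced };
}

// Illustrative values only.
const supportedMimes = ["image/png", "image/jpeg", "application/pdf"];
const pickerFilters = withForcedFilters(
  { q: "invoice", mediaType: "video/mp4" },
  { "mediaType[in]": supportedMimes.join(",") },
);
console.log(pickerFilters);
```

This only narrows what the user sees in the picker. It is a convenience, not a security control, since visibility is still decided on the server.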
Why a single endpoint, not one per consumer
Earlier drafts of #296 considered a dedicated GET /agent/:id/attachable-documents for the chat use case. That was rejected because:
- The capability filter (which MIMEs an agent can ingest) is derivable from data the panel already has (`agent.brainConfig.model.architecture.input_modalities`). No reason to query a second endpoint.
- A dedicated endpoint would duplicate every storage browser feature (filters, sort, search, pagination) for the small benefit of one server-side computation.
- The same modal must serve the agent RAG upload picker, future "share file in chat" flows, and any list-from-storage UI. One source of truth scales; many bespoke endpoints do not.
Centralising on the storage list endpoint also keeps the security boundary in one place: the DOCUMENT_FIELD_MAP whitelist plus tenancy resolution. New consumers do not introduce new attack surface — they only force a subset of filters.
Implementation notes
- Field map: `apps/api/src/modules/storage/infrastructure/field-maps/document-field-map.ts` whitelists which top-level keys translate to TypeORM columns. Anything not in the map is silently dropped by `toTypeOrmWhere`.
- Sort map: separate from the filter map. `type` (`mediaType`) is filterable but not sortable; this is deliberate: sorting by MIME type buckets results unexpectedly and we never wanted to expose it.
- `q` escaping: `escapeLikePattern` doubles `\` and prefixes `%` / `_` with `\`. Without this, the `%` and `_` in a search for `50%_off` would act as wildcards (`%` matches any run of characters, `_` any single character) and match far more than the literal text.
- `tags` overlap: implemented with `Raw('tags ?| ARRAY[:...tagValues]', { tagValues })` because the column is `jsonb`, not a native Postgres array. TypeORM's `ArrayOverlap` operator targets native arrays; it does not work on jsonb.
- Pagination: uses the shared `paginate(...)` helper, which calls `findAndCount` under the hood. It is wrapped in `recordRepositoryOperation` so the standard storage repository telemetry covers list operations the same way it covers single-row reads.
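The escaping rule described for `q` can be sketched in a few lines. This is an illustrative reimplementation based only on the description above, not the project's actual `escapeLikePattern`, and it assumes backslash is the active LIKE escape character (the Postgres default).

```ts
// Illustrative reimplementation of the described rule, not the project's source.
function escapeLikePattern(input: string): string {
  // Prefix each of \, % and _ with a backslash: \ becomes \\, % becomes \%, _ becomes \_.
  return input.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Build the substring pattern that would be bound to `name ILIKE :pattern`.
const userInput = "50%_off";
const pattern = `%${escapeLikePattern(userInput)}%`;
console.log(pattern); // %50\%\_off%
```

Only the outer `%` signs added by the server remain active wildcards; everything the user typed is matched literally.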
Frontend contract
The api-client exposes:
```ts
listDocuments(client, query?: ISearchDocumentsQuery): Promise<PaginationResultDto<IDocumentResponse>>
listTrash(client, query?: ISearchDocumentsQuery): Promise<PaginationResultDto<IDocumentResponse>>
```

`sort` is serialised to the canonical CSV form (`-createdAt,name`) before reaching axios; `qs` would otherwise nest the sort objects into keys the API does not accept (same workaround the operations search uses).
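A minimal sketch of the sort serialisation step, assuming the client holds sort as a list of `{ field, direction }` objects. That shape and the `serializeSort` name are hypothetical; only the output format comes from the contract described here.

```ts
// Hypothetical client-side sort shape; only the CSV output format is part of the API contract.
interface SortSpec {
  field: string;
  direction: "ASC" | "DESC";
}

function serializeSort(sort: SortSpec[]): string {
  // A leading "-" marks DESC; ASC fields are written bare. Order is preserved.
  return sort
    .map(({ field, direction }) => (direction === "DESC" ? `-${field}` : field))
    .join(",");
}

const sortParam = serializeSort([
  { field: "createdAt", direction: "DESC" },
  { field: "name", direction: "ASC" },
]);
console.log(sortParam); // -createdAt,name
```

Flattening to a single string before the HTTP layer sees it means the query-string serialiser has nothing to nest.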
The React hooks useDocuments(query) and useTrashDocuments(query) include the query in their queryKey, so React Query refetches when filters change. Mutations (move-to-trash, update-metadata, …) invalidate the bare ['storage', 'documents'] / ['storage', 'trash'] prefix so every dependent variant is matched and refetched.
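The prefix behaviour can be pictured with a small stand-alone model. The sketch below does not call React Query: `documentsKey` and `matchesPrefix` are hypothetical, and the `JSON.stringify` comparison is a simplification of the library's own partial key matching.

```ts
// Stand-alone model of prefix-based key matching; a simplification, not React Query itself.
type QueryKey = readonly unknown[];

// Hypothetical key factory: the query object is appended to the bare prefix.
const documentsKey = (query?: object): QueryKey =>
  query === undefined ? ["storage", "documents"] : ["storage", "documents", query];

function matchesPrefix(key: QueryKey, prefix: QueryKey): boolean {
  return (
    prefix.length <= key.length &&
    prefix.every((part, i) => JSON.stringify(part) === JSON.stringify(key[i]))
  );
}

const cachedKeys: QueryKey[] = [
  documentsKey({ page: 1, q: "invoice" }),
  documentsKey({ page: 2, favorite: true }),
  ["storage", "trash", { page: 1 }],
];

// Invalidating the bare prefix selects every filtered variant of the documents list.
const invalidated = cachedKeys.filter((key) => matchesPrefix(key, documentsKey()));
console.log(invalidated.length); // 2
```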
Out of scope
- Surfacing search/sort/paginator UI in the panel storage browser. The API is ready; the UI work tracks separately.
- Cursor-based pagination. The DSL is offset-only; revisit if scan windows on `documents` get expensive.
- Full-text search beyond filename. Use Qdrant for semantic search (per the DSL spec; `q` here is intentionally a simple ILIKE).