Agency Custom Domains

Overview

Agencies on DaraMex are identified by a native subdomain (agency-slug.daramex.org). This feature extends the platform so that agency admins can connect their own branded subdomain — for example, booking.kfc.com — so their end-users book appointments without ever seeing a daramex.org URL.

From the platform's perspective, a custom domain is an additional alias for an existing agency. All existing multi-tenancy mechanics (JWT-embedded agencyId, x-agency-id header, per-agency data isolation) remain unchanged. The only new concern is: at first page load, the frontend must resolve an arbitrary hostname to an agencyId. This is handled by a new lookup path on the existing public endpoint GET /identity/agency/id, backed by a dedicated table and a Redis cache.

Infrastructure TLS is provisioned by the platform operator in Dokploy (see Operator Runbook — Provision TLS for Custom Domain). The application layer is TLS-agnostic: the data model, verification flow, and API are identical regardless of which TLS strategy is used.

v1 constraints:

  • One custom domain per agency (hard limit enforced in the service layer)
  • Subdomains only — apex domains (kfc.com) are not supported in v1
  • No wildcard custom domains
  • TLS provisioning is a manual operator step

End-to-End Flow

1. Agency admin enters subdomain (e.g. booking.kfc.com)
   └── POST /identity/agency/custom-domain
        └── Validates hostname (RFC 1123, rejects apex/wildcard/reserved)
            └── Creates record with status = pending_dns
                └── Returns DNS instructions to the admin:
                      Step 1 (verify ownership):
                        TXT  _daramex-verify.booking.kfc.com  →  dmx_<64hex>
                      Step 2 (route traffic):
                        CNAME  booking.kfc.com  →  proxy.daramex.org

2. Admin adds both DNS records at their DNS provider
   (DNS propagation can take minutes to 48 hours)

3. Admin clicks "Verify Now" in the dashboard
   └── POST /identity/agency/custom-domain/:domainId/verify
        └── API fans out 4 DNS lookups in parallel (Promise.allSettled, 5 s overall budget):
             ├── TXT at _daramex-verify.booking.kfc.com  (ownership token)
             ├── CNAME at booking.kfc.com                (routing record)
             ├── NS at kfc.com                           (provider detection)
             └── A at booking.kfc.com                    (conflict / proxy check)

             ├── NS result → DnsProviderDetector → dnsProvider persisted

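             ├── TXT missing / mismatch  →  status = failed, failedReason = missing_txt / token_mismatch
             └── TXT ok → CNAME evaluated:
                  ├── CNAME missing              → status = failed, failedReason = cname_missing
                  ├── CNAME wrong target         → status = failed, failedReason = cname_wrong_target
                  ├── CNAME correct + CF proxy   → status = failed, failedReason = cname_proxied
                  ├── CNAME correct + A conflict → status = failed, failedReason = conflicting_a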
                  └── All ok                     → status = verified, Redis cache set (TTL 300 s)

4. Operator provisions TLS in Dokploy
   └── (Manual step — see runbook linked above)
       Traefik obtains a Let's Encrypt cert for booking.kfc.com automatically

5. End-user visits booking.kfc.com
   └── Panel AgencyProvider detects non-daramex hostname
        └── GET /identity/agency/id?hostname=booking.kfc.com
             ├── Redis hit  →  returns agencyId immediately (<5 ms)
             └── Redis miss  →  DB lookup → cache population → returns agencyId
                  └── Agency context bootstraps; all existing features work normally

6. Admin removes domain
   └── DELETE /identity/agency/custom-domain/:domainId
        └── status = removed, removed_at set
            Redis key deleted immediately
            48-hour cooldown before hostname can be reclaimed by any agency
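
The decision tree in step 3 can be sketched as a pure function that maps already-collected lookup results to an outcome. This is a minimal illustration, not the actual handler: the `LookupResults` shape, the `behindCloudflareProxy` flag, and the precedence of `cname_proxied` over `conflicting_a` are assumptions, and the token check uses a plain comparison for brevity where the security model calls for a timing-safe one.

```typescript
// Hypothetical shapes; the real handler's types are not shown in this document.
type FailedReason =
  | "missing_txt" | "token_mismatch" | "dns_timeout" | "dns_error"
  | "cname_missing" | "cname_wrong_target" | "cname_proxied" | "conflicting_a";

interface LookupResults {
  txt: string[] | null;           // TXT values at _daramex-verify.<hostname>; null if none
  cname: string | null;           // CNAME target of <hostname>; null if none
  aRecords: string[];             // A records published directly at <hostname>
  behindCloudflareProxy: boolean; // assumed to be derived elsewhere from resolved IPs
}

type Outcome =
  | { status: "verified" }
  | { status: "failed"; failedReason: FailedReason };

const PROXY_TARGET = "proxy.daramex.org";

function evaluateVerification(r: LookupResults, expectedToken: string): Outcome {
  const fail = (failedReason: FailedReason): Outcome => ({ status: "failed", failedReason });

  // Ownership comes first: without a matching TXT token nothing else is evaluated.
  if (!r.txt || r.txt.length === 0) return fail("missing_txt");
  if (!r.txt.includes(expectedToken)) return fail("token_mismatch");

  // Routing checks, in the order listed in the flow (order is an assumption).
  if (r.cname === null) return fail("cname_missing");
  const target = r.cname.toLowerCase().replace(/\.$/, "");
  if (target !== PROXY_TARGET) return fail("cname_wrong_target");
  if (r.behindCloudflareProxy) return fail("cname_proxied");
  if (r.aRecords.length > 0) return fail("conflicting_a");

  return { status: "verified" };
}
```

Keeping the decision logic free of I/O makes each branch easy to unit-test; the DNS fan-out itself would wrap this function.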

API Endpoints

All endpoints are under the /identity/ prefix. The four custom-domain management endpoints require JWT authentication and the agency-admin role on the authenticated agency; the target agency is derived from the session context, so no agencyId path parameter is needed. GET /identity/agency/id is public and requires no authentication.

| Method | Path | Purpose | Auth |
| --- | --- | --- | --- |
| POST | /identity/agency/custom-domain | Register a new custom domain; returns record + DNS instructions | Agency admin |
| POST | /identity/agency/custom-domain/:domainId/verify | Trigger on-demand DNS verification; rate-limited (5/hr per domain, 10/hr per agency) | Agency admin |
| DELETE | /identity/agency/custom-domain/:domainId | Remove domain; triggers cache invalidation and 48-h cooldown | Agency admin |
| GET | /identity/agency/custom-domain | Fetch current custom domain record for dashboard display | Agency admin |
| GET | /identity/agency/id?hostname=<hostname> | Resolve hostname → agencyId (cache-through; public endpoint, extended from existing ?slug=) | Public |

The GET /identity/agency/id endpoint accepts at most one of ?slug= or ?hostname= (omit both to resolve the default agency). Query validation uses the shared Zod schema getAgencyIdQuerySchema in @repo/schemas; providing both parameters or an invalid hostname shape returns 400 Bad Request with Zod issue details (see API Zod request validation).


Data Model

Table: identity.agency_custom_domains

| Column | Type | Constraints | Notes |
| --- | --- | --- | --- |
| id | UUID (v7) | PK | Sortable by creation time |
| agency_id | UUID | NOT NULL, FK → organizations.id | Cascade delete |
| hostname | VARCHAR(253) | NOT NULL | Lowercased, trailing-dot stripped |
| status | ENUM | NOT NULL, default pending_dns | See state machine below |
| verification_token | VARCHAR(72) | NOT NULL | dmx_ + 64-char hex; generated on creation |
| verified_at | TIMESTAMPTZ | NULL | Set when status → verified |
| failed_reason | VARCHAR(512) | NULL | Typed enum; see failedReason values below |
| dns_provider | VARCHAR(64) | NULL | Detected from NS records; see DnsProvider enum |
| removed_at | TIMESTAMPTZ | NULL | Set on soft delete; used for cooldown enforcement |
| created_at | TIMESTAMPTZ | NOT NULL, default now() | |
| updated_at | TIMESTAMPTZ | NOT NULL, default now() | |

Indexes:

| Index | Expression | Where clause | Purpose |
| --- | --- | --- | --- |
| uq_agency_custom_domain_hostname_active | hostname (UNIQUE) | status <> 'removed' | Prevents two active agencies from claiming the same hostname |
| uq_agency_custom_domain_agency_active | agency_id (UNIQUE) | status <> 'removed' | Enforces one-per-agency v1 limit at the DB level |
| ix_agency_custom_domain_status | status | | Status filter queries |
| ix_agency_custom_domain_hostname_removed | (hostname, removed_at) | status = 'removed' | Cooldown check queries |

Both uniqueness constraints use partial indexes: removing a domain frees the hostname for re-registration (after the 48-hour cooldown) and allows the same agency to add a new domain.
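
The cooldown check that the `(hostname, removed_at)` index supports could look roughly like the sketch below. It assumes the most recent `removed_at` for the hostname has already been fetched; the function name, the result shape, and the representation of `retry_after` as a `Date` are illustrative assumptions.

```typescript
const COOLDOWN_MS = 48 * 60 * 60 * 1000; // 48-hour cooldown from the v1 rules

interface CooldownCheck {
  active: boolean;
  retryAfter: Date | null; // earliest moment the hostname can be claimed again
}

// removedAt: latest removal timestamp for this hostname, or null if never removed.
function checkCooldown(removedAt: Date | null, now: Date): CooldownCheck {
  if (removedAt === null) return { active: false, retryAfter: null };

  const reclaimableAt = new Date(removedAt.getTime() + COOLDOWN_MS);
  return now.getTime() < reclaimableAt.getTime()
    ? { active: true, retryAfter: reclaimableAt }
    : { active: false, retryAfter: null };
}
```

When `active` is true, the service would respond with HOSTNAME_COOLDOWN_ACTIVE and pass `retryAfter` along to the client.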


State Machine

                  [create]
                      │
                      ▼
                 pending_dns ◄────────────────────────┐
                      │                               │
         [verify click: DNS lookup]                   │ [retry]
                      │                               │
         ┌────────────┴─────────────┐                 │
         │ TXT matches token        │ TXT missing or  │
         ▼                          ▼ mismatch        │
     verified                    failed ──────────────┘
         │                          │
         │ [remove]                 │ [remove]
         ▼                          ▼
      removed ◄─────────────────────┘

     pending_dns ──[remove]──► removed

Allowed transitions:

| From | To | Trigger |
| --- | --- | --- |
| pending_dns | verified | Verify click + DNS TXT matches token |
| pending_dns | failed | Verify click + DNS missing/mismatch |
| pending_dns | removed | Admin clicks Remove |
| failed | pending_dns | Admin clicks Retry |
| failed | verified | Retry click + DNS TXT matches token |
| failed | removed | Admin clicks Remove |
| verified | removed | Admin clicks Remove (with confirmation) |

Removed domains are not addressable. Any attempt to transition a removed record returns 404 Not Found.
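
The allowed transitions can be encoded as a lookup table, which keeps the guard a one-liner. This is a sketch: `DomainStatus`, `ALLOWED`, and `canTransition` are hypothetical names, and the real service may structure the check differently.

```typescript
type DomainStatus = "pending_dns" | "verified" | "failed" | "removed";

// Mirrors the "Allowed transitions" table; removed is terminal.
const ALLOWED: Record<DomainStatus, readonly DomainStatus[]> = {
  pending_dns: ["verified", "failed", "removed"],
  failed: ["pending_dns", "verified", "removed"],
  verified: ["removed"],
  removed: [],
};

function canTransition(from: DomainStatus, to: DomainStatus): boolean {
  return ALLOWED[from].includes(to);
}
```

A caller would typically map a `false` result to CUSTOM_DOMAIN_INVALID_STATE, except for records already in `removed`, which the document says surface as 404.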


Error Catalog

All errors follow the standard AppError / Result pattern used throughout the identity module. HTTP status codes map from the error type as shown.

| errorCode | HTTP | Meaning |
| --- | --- | --- |
| CUSTOM_DOMAIN_INVALID_HOSTNAME | 400 | Input does not pass RFC 1123 validation (empty, bad chars, >253 chars) |
| APEX_DOMAIN_NOT_SUPPORTED | 400 | Input is an apex domain (too few dots); not supported in v1 |
| WILDCARD_NOT_SUPPORTED | 400 | Input contains * |
| RESERVED_HOSTNAME | 400 | Input matches a reserved platform hostname (daramex.org, localhost, etc.) |
| AGENCY_ALREADY_HAS_CUSTOM_DOMAIN | 409 | Agency already has a non-removed custom domain (v1 limit: 1) |
| HOSTNAME_ALREADY_REGISTERED | 409 | Hostname is claimed by another agency in a non-removed state |
| HOSTNAME_COOLDOWN_ACTIVE | 409 | Hostname was removed less than 48 hours ago; includes retry_after |
| CUSTOM_DOMAIN_NOT_FOUND | 404 | Domain record does not exist or is in removed state |
| CUSTOM_DOMAIN_INVALID_STATE | 409 | Attempted transition not allowed by the state machine |
| CUSTOM_DOMAIN_DNS_LOOKUP_FAILED | 422 | DNS lookup returned timeout, NXDOMAIN, SERVFAIL, or transient error |
| CUSTOM_DOMAIN_TOKEN_MISMATCH | 422 | TXT record found but value does not match the stored token |
| CUSTOM_DOMAIN_VERIFY_RATE_LIMITED | 429 | Rate limit exceeded (5/hr per domain or 10/hr per agency) |

Invalid query combinations for GET /identity/agency/id (e.g. both slug and hostname, or a malformed hostname) are rejected at the controller with 400 and Zod validation issues — they do not emit a dedicated IDENTITY.* AppError code.
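
To make the four 400-level hostname errors concrete, here is a dependency-free sketch of registration-time validation. It is not the platform's validator: the reserved list contains only the two names the catalog mentions, treating every *.daramex.org name as reserved is an assumption, and the "fewer than three labels" apex heuristic misclassifies multi-part suffixes such as co.uk (a production check would need the Public Suffix List).

```typescript
type HostnameError =
  | "CUSTOM_DOMAIN_INVALID_HOSTNAME"
  | "APEX_DOMAIN_NOT_SUPPORTED"
  | "WILDCARD_NOT_SUPPORTED"
  | "RESERVED_HOSTNAME";

type HostnameResult =
  | { ok: true; hostname: string }
  | { ok: false; errorCode: HostnameError };

// Assumed reserved list; the document names only these two examples.
const RESERVED = ["daramex.org", "localhost"];

// RFC 1123 label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
const LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/;

function validateHostname(input: string): HostnameResult {
  const err = (errorCode: HostnameError): HostnameResult => ({ ok: false, errorCode });

  // Normalize first: lowercase and strip a single trailing dot.
  const hostname = input.toLowerCase().replace(/\.$/, "");

  if (hostname.includes("*")) return err("WILDCARD_NOT_SUPPORTED");
  if (hostname.length === 0 || hostname.length > 253) return err("CUSTOM_DOMAIN_INVALID_HOSTNAME");

  const labels = hostname.split(".");
  if (!labels.every((label) => LABEL.test(label))) return err("CUSTOM_DOMAIN_INVALID_HOSTNAME");

  if (RESERVED.some((r) => hostname === r || hostname.endsWith("." + r))) {
    return err("RESERVED_HOSTNAME");
  }

  // Heuristic only: "too few dots" means fewer than three labels.
  if (labels.length < 3) return err("APEX_DOMAIN_NOT_SUPPORTED");

  return { ok: true, hostname };
}
```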


failedReason Enum Values

The failedReason field is a typed enum (Zod z.enum). Any value outside this list is rejected at the schema boundary.

| Value | When set |
| --- | --- |
| missing_txt | TXT record not found (NODATA or NXDOMAIN) |
| token_mismatch | TXT record present but value does not match the stored token |
| dns_timeout | A DNS resolver call timed out (overall 5 s budget exceeded); a timeout of the best-effort NS lookup does not count |
| dns_error | DNS server error (SERVFAIL or unknown DNS error) |
| cname_missing | CNAME record not found after TXT validates |
| cname_wrong_target | CNAME exists but points to an unexpected target hostname |
| cname_proxied | CNAME target resolves to Cloudflare proxy IPs (suspected orange-cloud) |
| conflicting_a | A record exists alongside the CNAME (DNS misconfiguration) |

DNS Provider Detection

The verify command resolves NS records for the root domain (kfc.com from booking.kfc.com) in parallel with the TXT, CNAME, and A lookups. The DnsProviderDetector service matches each nameserver hostname against a regex table and returns the slug of the first provider that matches.

| Provider slug | Nameserver pattern |
| --- | --- |
| cloudflare | *.cloudflare.com |
| godaddy | *.domaincontrol.com |
| namecheap | *.registrar-servers.com |
| route53 | *.awsdns-N.(com… |
| digitalocean | *.digitalocean.com |
| hostgator | *.hostgator.com |

If no NS record matches, dnsProvider is null. The panel shows a generic tutorial in that case.

Provider detection is best-effort: if the NS lookup times out or errors, dnsProvider is null and verification continues normally. The NS timeout does not trigger dns_timeout as a failedReason.
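
A detector along these lines could be as small as the sketch below. It covers only a subset of the providers in the table, the regular expressions are an assumed translation of the glob-style patterns, and `detectDnsProvider` is a hypothetical name standing in for the DnsProviderDetector service.

```typescript
// Subset of the provider table, expressed as suffix regexes (assumed form).
const PROVIDER_PATTERNS: ReadonlyArray<[string, RegExp]> = [
  ["cloudflare", /\.cloudflare\.com$/],
  ["godaddy", /\.domaincontrol\.com$/],
  ["namecheap", /\.registrar-servers\.com$/],
  ["digitalocean", /\.digitalocean\.com$/],
  ["hostgator", /\.hostgator\.com$/],
];

// Returns the slug of the first nameserver that matches a known pattern, else null.
function detectDnsProvider(nameservers: string[]): string | null {
  for (const ns of nameservers) {
    // Resolvers may return fully qualified names with a trailing dot.
    const host = ns.toLowerCase().replace(/\.$/, "");
    for (const [slug, pattern] of PROVIDER_PATTERNS) {
      if (pattern.test(host)) return slug;
    }
  }
  return null;
}
```

Because detection is best-effort, a caller would pass an empty array when the NS lookup fails, which yields `null` and leaves verification unaffected.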


DTO now Field and Clock-Skew Mitigation

The GET and verify response DTOs include a now: ISO datetime field set to the server's current time at response generation. The panel uses now - updatedAt (both from the server) to determine how long the domain has been in its current state — for example, to show the troubleshooting checklist after 3 minutes in pending_dns.

This avoids client clock-skew and timezone bugs that would occur if the panel used Date.now().
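
On the panel side, the elapsed-time computation might look like the following sketch. The DTO is reduced to the three fields used here, the helper names are hypothetical, and treating "after 3 minutes" as "at least 180 000 ms" is an assumption.

```typescript
// Reduced DTO: only the fields this computation needs.
interface CustomDomainDto {
  status: string;
  updatedAt: string; // ISO datetime from the server
  now: string;       // ISO datetime from the server, set at response generation
}

const CHECKLIST_AFTER_MS = 3 * 60 * 1000;

// Both timestamps come from the server, so the client clock never enters the result.
function msInCurrentState(dto: CustomDomainDto): number {
  return Date.parse(dto.now) - Date.parse(dto.updatedAt);
}

function shouldShowTroubleshooting(dto: CustomDomainDto): boolean {
  return dto.status === "pending_dns" && msInCurrentState(dto) >= CHECKLIST_AFTER_MS;
}
```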


Panel Live Polling

The panel hooks into TanStack Query with conditional polling:

```ts
refetchInterval: (query) => {
  const status = query.state.data?.status;
  if (!status || status === 'verified' || status === 'removed') return false;
  return 15_000; // poll every 15s while pending_dns or failed
}
```

| Condition | Behavior |
| --- | --- |
| status === 'verified' | Polling stops immediately |
| status === 'removed' | Polling stops immediately |
| status === 'pending_dns' or 'failed' | Polls every 15 s |
| Window focus event | Always refetches (any status) |
| After POST verify | Query invalidated → immediate refetch |

GET /identity/agency/custom-domain is not rate-limited. The verify limits (5/hr per domain, 10/hr per agency) apply only to POST /identity/agency/custom-domain/:domainId/verify, so polling, even at 20 requests per minute, will not return 429.
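
For contrast, the two verify limits could be tracked with a pair of sliding-window counters, as in the sketch below. It keeps state in process memory purely for illustration (a shared store would be needed across instances), and `makeLimiter` and `tryVerify` are hypothetical names; the document does not describe how the real limiter is implemented.

```typescript
const HOUR_MS = 60 * 60 * 1000;

interface Limiter {
  allows(key: string, nowMs: number): boolean;
  record(key: string, nowMs: number): void;
}

// Sliding window: keeps the timestamps of attempts made within the last windowMs.
function makeLimiter(limit: number, windowMs: number): Limiter {
  const hits = new Map<string, number[]>();
  const recent = (key: string, nowMs: number): number[] => {
    const kept = (hits.get(key) ?? []).filter((t) => nowMs - t < windowMs);
    hits.set(key, kept);
    return kept;
  };
  return {
    allows: (key, nowMs) => recent(key, nowMs).length < limit,
    record: (key, nowMs) => {
      recent(key, nowMs).push(nowMs);
    },
  };
}

const perDomain = makeLimiter(5, HOUR_MS);  // 5 attempts/hr per domain
const perAgency = makeLimiter(10, HOUR_MS); // 10 attempts/hr per agency

// Returns false when either limit is exhausted (the API would answer 429).
function tryVerify(agencyId: string, domainId: string, nowMs: number): boolean {
  if (!perDomain.allows(domainId, nowMs) || !perAgency.allows(agencyId, nowMs)) return false;
  perDomain.record(domainId, nowMs);
  perAgency.record(agencyId, nowMs);
  return true;
}
```

Both limits are checked before either counter is updated, so a rejected attempt does not consume quota.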


Caching

Hostname → agencyId resolution is on the critical path of every page load on a custom domain. Redis caches the result to avoid a DB round-trip on each request.

| Key pattern | Value | TTL | Set when | Invalidated when |
| --- | --- | --- | --- | --- |
| agency:host:<hostname> | { id, name } (JSON) | 300 s | Domain transitions to verified (or first cache miss from DB) | Domain transitions away from verified or is removed |
| agency:host:<hostname> (negative sentinel) | "__none__" (string) | 30 s | Cache miss + no verified record in DB | TTL expiry only |

The negative sentinel (30 s TTL) prevents DB hammering on unresolvable hostnames. A 404 response is served immediately from Redis until the sentinel expires.
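
The read path with both positive and negative caching can be sketched as follows. `KeyValueStore` is a minimal stand-in for the Redis calls involved, and `findVerified` is a hypothetical DB lookup; neither name comes from the codebase.

```typescript
interface AgencyRef {
  id: string;
  name: string;
}

// Minimal stand-in for the Redis operations used here (GET, SET with expiry).
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const POSITIVE_TTL_S = 300;
const NEGATIVE_TTL_S = 30;
const NONE = "__none__"; // negative sentinel

async function resolveAgencyByHostname(
  hostname: string,
  cache: KeyValueStore,
  findVerified: (hostname: string) => Promise<AgencyRef | null>, // hypothetical DB lookup
): Promise<AgencyRef | null> {
  const key = `agency:host:${hostname}`;

  const cached = await cache.get(key);
  if (cached === NONE) return null; // negative hit: caller answers 404 without a DB query
  if (cached !== null) return JSON.parse(cached) as AgencyRef;

  // Cache miss: fall through to the DB and populate the cache either way.
  const agency = await findVerified(hostname);
  if (agency === null) {
    await cache.set(key, NONE, NEGATIVE_TTL_S);
    return null;
  }
  await cache.set(key, JSON.stringify(agency), POSITIVE_TTL_S);
  return agency;
}
```

Sharing one key for both the positive value and the sentinel means that writing the positive entry on verification also overwrites any lingering negative entry.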


Security Model

| Concern | Mitigation |
| --- | --- |
| Domain hijacking | TXT token is 32 cryptographically random bytes (hex); unguessable |
| Re-registration race | 48-hour cooldown after removal; pending/failed domains block re-registration by others |
| DNS verification abuse | Rate limit: 5 attempts/hr per domain, 10 attempts/hr per agency |
| Host header injection / SSRF | ?hostname= param is validated against the agency_custom_domains table (verified only); the Host HTTP header is never used for tenant resolution |
| Hostname normalization | Input is lowercased and trailing-dot stripped before any storage or comparison |
| Token comparison timing | VerificationToken.safeEquals wraps crypto.timingSafeEqual with a length guard |
| Apex domain fragility | Apex inputs are rejected at the API layer (see error catalog) |
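
Token generation and the timing-safe comparison can be illustrated with Node's built-in crypto module. This is a sketch of the idea rather than the actual VerificationToken class: the standalone function names are hypothetical, but `randomBytes` and `timingSafeEqual` are the real Node APIs.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// dmx_ prefix + 32 random bytes as hex = 4 + 64 = 68 characters.
function generateVerificationToken(): string {
  return "dmx_" + randomBytes(32).toString("hex");
}

// timingSafeEqual throws on inputs of different length, hence the guard.
function safeEquals(a: string, b: string): boolean {
  const bufA = Buffer.from(a, "utf8");
  const bufB = Buffer.from(b, "utf8");
  if (bufA.length !== bufB.length) return false;
  return timingSafeEqual(bufA, bufB);
}
```

The length guard reveals only whether the lengths differ, which is not sensitive here because every valid token has the same fixed length.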

v1 Limitations

One domain per agency

The application enforces a maximum of one active (non-removed) custom domain per agency. This is an intentional v1 simplification. The service layer rejects a second registration with AGENCY_ALREADY_HAS_CUSTOM_DOMAIN, and the partial unique index uq_agency_custom_domain_agency_active backs the rule at the database level. The table itself can hold multiple rows per agency_id (removed records are retained), so the limit can be lifted in a future iteration when the UI and business rules are ready for multi-domain management.

Subdomains only — no apex domain support

Apex domains (e.g. kfc.com) are rejected at the API layer. CNAME records are not allowed at the zone apex per RFC 1912; the agency would need an A record pointing to the server IP, which is fragile (IP changes break all apex-configured agencies) and requires provider-specific workarounds (ALIAS, ANAME, Cloudflare CNAME flattening). Apex support is deferred to v2 and is coupled to the TLS strategy decision (see Pending v2 — Apex Domain Support).

No wildcard custom domains

*.kfc.com subdomains require wildcard TLS provisioning per custom root domain and additional routing complexity. No use case was identified for v1.

Manual TLS provisioning by the platform operator

When an agency's domain reaches verified status, the platform operator must manually register the domain in Dokploy so Traefik can provision a Let's Encrypt certificate. This step does not scale beyond ~20 concurrent active domains. See Operator Runbook — Provision TLS for Custom Domain.

No background polling for auto-verification

Verification is on-demand only (the agency admin clicks a button). There is no background job that polls DNS and auto-transitions domains. This is intentional: no job infrastructure exists today, DNS propagation can take up to 48 hours, and on-demand verification is predictable.

No HTTP file challenge

HTTP-based domain validation (.well-known/acme-challenge/) has a chicken-and-egg problem: the domain must already route to DaraMex before the challenge works, but routing requires verification first. TXT-record challenge has no such dependency.


Pending v2 Items

These decisions were explicitly deferred from v1. The application layer (data model, handlers, API contract) was designed to be infra-agnostic so these can be implemented without changing the core domain logic.

| Topic | Summary |
| --- | --- |
| TLS automation | Replace manual Dokploy registration with an automated option (Cloudflare proxy, Caddy on-demand TLS, or Dokploy API integration triggered on AgencyCustomDomainVerified event). Revisit when manual provisioning becomes a bottleneck (>10 active domains or delayed registrations). Engram topic: sdd/agency-custom-domains/pending/tls-cloudflare-alternative. |
| Apex domain support | Allow agencies to connect root domains (e.g. kfc.com). Requires relaxing Hostname validation, branching DNS instructions (A record vs CNAME), and is strongly coupled to the TLS automation decision above (Cloudflare flattening simplifies apex support significantly). Engram topic: sdd/agency-custom-domains/pending/apex-domain-support. |
| Multiple domains per agency | Lift the one-per-agency limit; requires UI redesign for multi-domain management. |
| Wildcard custom domains | Support *.kfc.com; requires wildcard TLS per custom root domain. |
| Background auto-verification | Background job (e.g. BullMQ) that polls DNS and auto-transitions pending_dns domains without requiring admin action. |