Propagating Tenant Context Across Async Jobs

A background worker has no request, no token, and no session, so the tenant identity that scoped the enqueueing call evaporates the moment work crosses the queue boundary. This page is part of the tenant context injection strategies reference, and it covers how to serialize the tenant into the job payload, re-establish it on dequeue, and hard-fail any job that arrives without a bound tenant.

Problem Framing

Tenant context inside a synchronous request usually rides on request-local state: a value resolved by middleware and stashed in async-local storage, a thread local, or an immutable context object passed down the call stack. That state is alive only while the request is alive. Enqueue a job and the request returns; the worker that picks it up runs in a different process, often a different host, minutes or hours later. None of the request-local state survives.

The failure that follows is silent and dangerous. A worker that needs a tenant but cannot find one tends to do one of three wrong things: it reads "current tenant" from some leftover global and processes tenant B's invoice under tenant A; it falls back to a default or "system" tenant and writes rows nobody can see; or it queries unscoped and touches every tenant at once. Each is a tenant-isolation failure that still logs as a success, so nothing alerts on it.

The rule is blunt: the tenant must travel inside the job, never be inferred at the worker. The producer already knows the tenant — it resolved it for the originating request. Serialize that tenantId into the payload, treat it as a required field, re-bind it into the worker's context before any business logic runs, and refuse to execute a job whose payload carries no tenant.

This holds for every queue technology, because the queue is deliberately tenant-blind. BullMQ, Sidekiq, Celery, SQS, and Kafka all move an opaque body from producer to consumer; none of them knows or cares what a tenant is. That is the right design — the queue's job is delivery, not authorization — but it means the boundary is entirely your responsibility on both ends. The producer is the only place with a trustworthy tenant, and the worker is the only place that can re-establish it. The diagram below traces that handoff across the queue.

Figure: The producer owns the tenant and writes it into the payload; the worker re-binds it and rejects any job that arrives without one.
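To make the handoff concrete, here is a minimal sketch that replaces the real queue with an in-memory array of opaque strings. The names `produce`, `consume`, and `currentTenant` are illustrative and belong to no queue library, and the hand-written type checks stand in for a real schema validator.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical names throughout; the array stands in for a real, tenant-blind queue.
type JobBody = { tenantId: string; invoiceId: string };

const tenantStore = new AsyncLocalStorage<{ tenantId: string }>();
const queue: string[] = []; // the queue only ever sees opaque strings

function currentTenant(): string {
  const ctx = tenantStore.getStore();
  if (!ctx?.tenantId) throw new Error('no tenant bound');
  return ctx.tenantId;
}

// Producer side: runs inside a request scope where the tenant is already bound.
function produce(invoiceId: string): void {
  const body: JobBody = { tenantId: currentTenant(), invoiceId };
  queue.push(JSON.stringify(body));
}

// Worker side: no request scope exists, so the tenant comes from the body alone.
function consume(handler: (invoiceId: string) => string): string {
  const raw = queue.shift();
  if (raw === undefined) throw new Error('queue is empty');
  const { tenantId, invoiceId } = JSON.parse(raw) as Partial<JobBody>;
  if (typeof tenantId !== 'string' || tenantId === '') {
    throw new Error('job rejected: payload carries no tenant');
  }
  if (typeof invoiceId !== 'string') {
    throw new Error('job rejected: payload carries no invoiceId');
  }
  // Bind the tenant before the handler runs; the handler never sees the raw body.
  return tenantStore.run({ tenantId }, () => handler(invoiceId));
}

// Simulate a request for tenant-a enqueueing work, then a worker picking it up.
tenantStore.run({ tenantId: 'tenant-a' }, () => produce('inv-1'));
const result = consume((invoiceId) => `${currentTenant()} processed ${invoiceId}`);
console.log(result); // tenant-a processed inv-1
```

The handler never receives the tenant as an argument. It reads it from the bound scope, which is the same path request-time code uses.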

Step-by-Step Guide

1. Make `tenantId` a required field of the job payload

Define a payload type where the tenant is mandatory and validated at the boundary, not an optional extra. A schema validator rejects malformed or tenant-less payloads before they reach a handler. Resolve the tenant from request-local context at enqueue time — never let the producer guess.

import { z } from 'zod';
import { Queue } from 'bullmq';
import { getCurrentTenantId } from './tenant-context';

const InvoiceJob = z.object({
  tenantId: z.string().uuid(),
  invoiceId: z.string().uuid(),
});
export type InvoiceJob = z.infer<typeof InvoiceJob>;

const queue = new Queue('invoices', { connection: { host: 'redis' } });

export async function enqueueInvoice(invoiceId: string) {
  const payload: InvoiceJob = {
    tenantId: getCurrentTenantId(), // throws if no request-bound tenant
    invoiceId,
  };
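  InvoiceJob.parse(payload); // throws before anything malformed reaches the queue
  // BullMQ custom job IDs must not contain ':', so join the parts with '_'.
  await queue.add('generate', payload, { jobId: `${payload.tenantId}_${invoiceId}` });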
}

2. Re-establish tenant context on dequeue

The worker's first action is to lift tenantId out of the payload and bind it into the same async-local store the rest of the code reads from. Run the handler inside that bound scope so every downstream query, ORM call, and log line sees the correct tenant. Use AsyncLocalStorage so concurrent jobs never share state.

import { AsyncLocalStorage } from 'node:async_hooks';
import { Worker } from 'bullmq';

export const tenantStore = new AsyncLocalStorage<{ tenantId: string }>();

// InvoiceJob is the zod schema from step 1; generateInvoice is the application's own handler.
new Worker<InvoiceJob>('invoices', async (job) => {
  // Re-validate on dequeue and use only the parsed values, never raw job.data.
  const { tenantId, invoiceId } = InvoiceJob.parse(job.data);
  return tenantStore.run({ tenantId }, async () => {
    // every call inside here reads tenantId from tenantStore
    await generateInvoice(invoiceId);
  });
}, { connection: { host: 'redis' }, concurrency: 10 });

3. Refuse to run a job with no bound tenant

Make the absence of a tenant a hard error at every layer that could swallow it. The accessor that business code calls must throw, not return a default. A job that reaches a handler without a tenant is a bug or an attack, and the only safe response is to fail it.

export function getCurrentTenantId(): string {
  const ctx = tenantStore.getStore();
  if (!ctx?.tenantId) {
    throw new Error('No tenant bound to execution context — refusing to proceed');
  }
  return ctx.tenantId;
}

4. Scope every data access to the bound tenant

With the tenant in context, the data layer reads it implicitly so individual jobs cannot forget the filter. The same discipline used for synchronous requests applies here: set the tenant on the session for row-level security, or inject it through an ORM extension.

-- SET LOCAL lasts only until the end of the current transaction, so bind the
-- tenant inside the same transaction as the queries that depend on it.
BEGIN;
SET LOCAL app.current_tenant = '8f3c1b9e-7a40-4d2e-9c11-2b6e4f0a1d55';

SELECT id, total
FROM invoices
WHERE id = $1; -- RLS adds: AND tenant_id = current_setting('app.current_tenant')::uuid
COMMIT;
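One way to wire the session binding into a worker is a small helper that opens a transaction, binds the tenant with PostgreSQL's `set_config` (the function form of `SET LOCAL`, which accepts a bind parameter), and runs the queries inside it. The `SqlClient` interface, the `withTenantTransaction` name, and the recording stub are assumptions made for illustration; a real database driver would supply the client.

```typescript
// Hypothetical minimal client interface; a real driver's client would take its place.
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction whose RLS setting is bound to `tenantId`.
async function withTenantTransaction<T>(
  client: SqlClient,
  tenantId: string,
  work: (client: SqlClient) => Promise<T>,
): Promise<T> {
  if (!tenantId) throw new Error('refusing to open a transaction without a tenant');
  await client.query('BEGIN');
  try {
    // Third argument `true` makes the setting transaction-local, like SET LOCAL.
    await client.query("SELECT set_config('app.current_tenant', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}

// A recording stub so the sketch runs without a database.
const statements: string[] = [];
const stubClient: SqlClient = {
  async query(text) {
    statements.push(text);
    return [];
  },
};

const done = withTenantTransaction(stubClient, 'tenant-a', (c) =>
  c.query('SELECT id, total FROM invoices WHERE id = $1', ['inv-1']),
);
```

Because the setting is transaction-local, it disappears on commit or rollback, so a pooled connection cannot carry one job's tenant into the next job.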

5. Propagate the tenant through downstream emissions

Jobs frequently produce more work — they emit metering events, enqueue follow-up jobs, or publish to a stream. Each of those must carry the tenant forward the same way, so the chain never relies on inference. A scheduled or fan-out job that processes many tenants must establish a fresh bound scope per tenant rather than reusing one context across the whole batch, otherwise a single re-bind miss contaminates the rest of the run. Usage events in particular must stay tenant-stamped end to end; see usage metering event pipelines for the ingestion side.

import json

from kafka import KafkaProducer

# Create the producer once and reuse it; opening a connection per event is costly.
producer = KafkaProducer(bootstrap_servers="kafka:9092")

def emit_usage_event(tenant_id: str, metric: str, quantity: int) -> None:
    if not tenant_id:
        raise ValueError("refusing to emit usage event without tenant_id")
    producer.send(
        "usage-events",
        key=tenant_id.encode(),  # partition by tenant
        value=json.dumps({"tenantId": tenant_id, "metric": metric, "quantity": quantity}).encode(),
    )
    producer.flush()  # block until buffered events have been sent
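The per-tenant scoping for fan-out jobs can be sketched as a loop that opens a new `AsyncLocalStorage` scope for each tenant. `runForEachTenant` and the inline tenant list are hypothetical; in practice the list would come from a tenant registry. Recording a failure and continuing with the next tenant is a design choice of this sketch, not a requirement.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

const tenantStore = new AsyncLocalStorage<{ tenantId: string }>();

function currentTenant(): string {
  const ctx = tenantStore.getStore();
  if (!ctx?.tenantId) throw new Error('no tenant bound');
  return ctx.tenantId;
}

type Outcome = { tenantId: string; ok: boolean };

// Hypothetical fan-out helper: one fresh scope per tenant, never a shared one.
async function runForEachTenant(
  tenantIds: string[],
  work: () => Promise<void>,
): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const tenantId of tenantIds) {
    try {
      await tenantStore.run({ tenantId }, work);
      outcomes.push({ tenantId, ok: true });
    } catch {
      // A failure stays with its own tenant and does not abort the batch.
      outcomes.push({ tenantId, ok: false });
    }
  }
  return outcomes;
}

const processed: string[] = [];
// The empty string simulates a bad entry: its scope fails, the others still run.
const batch = runForEachTenant(['tenant-a', '', 'tenant-b'], async () => {
  await Promise.resolve(); // cross an async boundary before reading context
  processed.push(currentTenant());
});
```

Each iteration ends its scope before the next begins, so a tenant that fails to bind produces one failed outcome instead of leaking the previous tenant into later work.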

Verification

Two checks catch the regressions that matter: a tenant-less job must be rejected rather than run, and a handler must observe exactly the tenant its payload carried. The first proves the guard fires; the second proves the re-bind is isolated across concurrent jobs.

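import { expect, test } from 'vitest';
// Assumes both are exported from the module that owns the store.
import { getCurrentTenantId, tenantStore } from './tenant-context';

test('handler refuses a payload with no tenant', () => {
  // getCurrentTenantId throws synchronously, so assert with toThrow, not rejects.
  expect(() =>
    tenantStore.run({ tenantId: '' }, () => getCurrentTenantId()),
  ).toThrow(/No tenant bound/);
});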

test('concurrent jobs see their own tenant', async () => {
  // Yield before reading so the two scopes actually interleave.
  const readAfterYield = async () => {
    await Promise.resolve();
    return getCurrentTenantId();
  };
  const observed = await Promise.all([
    tenantStore.run({ tenantId: 'tenant-a' }, readAfterYield),
    tenantStore.run({ tenantId: 'tenant-b' }, readAfterYield),
  ]);
  expect(observed).toEqual(['tenant-a', 'tenant-b']);
});

A passing run confirms the guard and the per-job isolation:

$ npx vitest run async-tenant
 ✓ handler refuses a payload with no tenant (4 ms)
 ✓ concurrent jobs see their own tenant (2 ms)

 Test Files  1 passed (1)
      Tests  2 passed (2)

Failure Modes & Gotchas

Tenant inferred from a worker global. Symptom: a job writes rows under the wrong tenant under concurrency. Root cause: the worker reads a module-level "current tenant" instead of the payload. Fix: bind from job.data.tenantId per job and ban any worker-scoped tenant variable.
Silent default tenant fallback. Symptom: rows land under a system or default tenant nobody owns. Root cause: the accessor returns a fallback when context is empty. Fix: make getCurrentTenantId() throw and let the job dead-letter.
Lost context across await. Symptom: queries after the first await see no tenant. Root cause: the handler escaped the AsyncLocalStorage.run scope or used a callback API that breaks the async chain. Fix: keep all work inside the run callback, and bind any callback handed to an event emitter or callback-style API to the current context, for example with AsyncResource.bind.
Tenant deleted before the job runs. Symptom: a delayed job processes a tenant that has been offboarded. Root cause: no liveness check on dequeue. Fix: verify the tenant still exists and is active before executing, and discard the job if not.
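The lost-context entry above is the easiest of these to reproduce. The sketch below registers two listeners inside a tenant scope and emits from outside it: the plain listener runs in the emitter's context and sees no tenant, while the one wrapped with Node's `AsyncResource.bind` keeps the context it was created in. The event names and the `seen` array are illustrative only.

```typescript
import { AsyncLocalStorage, AsyncResource } from 'node:async_hooks';
import { EventEmitter } from 'node:events';

const store = new AsyncLocalStorage<{ tenantId: string }>();
const emitter = new EventEmitter();
const seen: Array<string | undefined> = [];

store.run({ tenantId: 'tenant-a' }, () => {
  // Plain listener: it will run in whatever context calls emit().
  emitter.on('plain', () => {
    seen.push(store.getStore()?.tenantId);
  });
  // Bound listener: AsyncResource.bind captures the context active right here.
  emitter.on(
    'bound',
    AsyncResource.bind(() => {
      seen.push(store.getStore()?.tenantId);
    }),
  );
});

// Both events fire outside the run() scope, as a late callback would.
emitter.emit('plain');
emitter.emit('bound');
console.log(seen); // [ undefined, 'tenant-a' ]
```

With the guard from step 3 in place, the plain listener would throw instead of silently running unscoped, which is the behavior to aim for.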

FAQ

Why not look up the tenant from the entity ID inside the worker? Because the lookup is itself a tenant-scoped query, and you have no trusted tenant to scope it with — so it runs unscoped and can return any tenant's row. Carrying tenantId in the payload keeps the boundary explicit and lets the data layer enforce it from the first query.

Should the tenant ID also go in message headers or metadata? Putting it in the validated body is sufficient and keeps it inside your schema check. Adding it to a header or metadata field is useful for routing, partitioning, and observability, but the body remains the source of truth the handler re-validates.
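As a rough illustration, the sketch below treats the body as authoritative and the header as a convenience copy. The `Envelope` shape and `resolveTenant` name are hypothetical, and failing on a header that disagrees with the body is an assumption of this sketch, not something every system needs.

```typescript
// Hypothetical message envelope: headers serve routing and logs, the body is authoritative.
type Envelope = { headers: { tenantId?: string }; body: string };

function resolveTenant(msg: Envelope): string {
  const parsed = JSON.parse(msg.body) as { tenantId?: unknown };
  const bodyTenant = parsed.tenantId;
  if (typeof bodyTenant !== 'string' || bodyTenant === '') {
    throw new Error('rejected: body carries no tenant');
  }
  const headerTenant = msg.headers.tenantId;
  // Assumption of this sketch: a header that contradicts the body is a hard error.
  if (headerTenant !== undefined && headerTenant !== bodyTenant) {
    throw new Error('rejected: header tenant does not match body tenant');
  }
  return bodyTenant;
}

const body = JSON.stringify({ tenantId: 'tenant-a', invoiceId: 'inv-1' });
const agreed = resolveTenant({ headers: { tenantId: 'tenant-a' }, body });
const headerless = resolveTenant({ headers: {}, body });
console.log(agreed, headerless); // tenant-a tenant-a
```

A header alone is never enough here: a message with a tenant header but no tenant in the body is rejected like any other tenant-less job.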

How do retries and dead-letter queues interact with tenant context? The original payload, including tenantId, is preserved across retries, so each attempt re-binds the same tenant. A job that fails validation should go straight to the dead-letter queue with the tenant recorded, so an operator can triage it without it ever executing unscoped.
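The retry behavior can be sketched without any queue library. In the hypothetical runner below, a payload that fails validation is dead-lettered before the handler is ever called, while a handler error is retried with the same payload. `runWithRetries`, `validate`, and the in-memory `deadLetters` array are illustrative names. In BullMQ, throwing `UnrecoverableError` from the processor serves a similar purpose by failing the job without further attempts.

```typescript
type Payload = { tenantId?: string; invoiceId?: string };
type ValidJob = { tenantId: string; invoiceId: string };
type DeadLetter = { tenantId: string | null; reason: string; payload: Payload };

const deadLetters: DeadLetter[] = [];

// Returns the validated job, or a reason string when the payload can never succeed.
function validate(payload: Payload): ValidJob | string {
  if (!payload.tenantId) return 'payload carries no tenant';
  if (!payload.invoiceId) return 'payload carries no invoiceId';
  return { tenantId: payload.tenantId, invoiceId: payload.invoiceId };
}

// Hypothetical runner; returns how many times the handler was attempted.
function runWithRetries(
  payload: Payload,
  handler: (job: ValidJob) => void,
  maxAttempts = 3,
): number {
  const job = validate(payload);
  if (typeof job === 'string') {
    // Invalid payloads skip the retry loop and the handler entirely.
    deadLetters.push({ tenantId: payload.tenantId ?? null, reason: job, payload });
    return 0;
  }
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(job); // same payload, so the same tenant, on every attempt
      return attempt;
    } catch (err) {
      if (attempt === maxAttempts) {
        const reason = err instanceof Error ? err.message : String(err);
        deadLetters.push({ tenantId: job.tenantId, reason, payload });
      }
    }
  }
  return maxAttempts;
}

const badAttempts = runWithRetries({ invoiceId: 'inv-1' }, () => {});
let calls = 0;
const flakyAttempts = runWithRetries({ tenantId: 'tenant-a', invoiceId: 'inv-2' }, () => {
  calls += 1;
  if (calls < 2) throw new Error('transient failure');
});
console.log(badAttempts, flakyAttempts, deadLetters.length); // 0 2 1
```

The dead-letter record keeps whatever tenant the payload carried, or `null` when it carried none, so an operator can see at a glance which failures were tenant-less.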