Invalidating Tenant Sessions on Role Change
When an admin demotes a user or strips a permission, every session that user already holds keeps the old authority until something forces a re-check — and the gap between the change and the enforcement is a privilege-escalation window. This guide sits within Session Isolation & State Management and shows how to drive a role change into live sessions deterministically, using a session epoch claim, a Redis revocation list, and pub/sub fan-out, while keeping the staleness window measured in seconds, not hours.
Problem Framing
Stateless access tokens are fast precisely because nobody asks the database on every request. The same property makes them stale on purpose: a JWT signed at 09:00 with roles: ["admin"] still asserts admin at 09:45, even after the role was revoked at 09:15. Three things break when a role changes mid-session.
First, long-lived tokens outlive the grant. A one-hour access token revoked five minutes after issue stays valid for fifty-five more minutes unless you add a revocation check. Second, revoking one device misses the others. A user with an active web session, a mobile app, and a CLI token has three independent credentials; deleting one server-side record leaves the rest. Third, the check itself must be cheap. If enforcing freshness means a database round-trip per request, you have thrown away the reason you chose stateless tokens at all.
The fix is a monotonic counter, a session epoch, stored per tenant and per user and embedded in every token. A role change bumps the counter; any token carrying an older epoch is rejected on its next request. The cheap check is a single read of the current epoch, which lives in Redis and is broadcast to every node the instant it changes. The write path (role change) and the read path (request validation) meet at this shared epoch.
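The idea can be sketched without Redis or a JWT library. The following is a minimal in-memory model, assuming a plain Map stands in for the shared store; the names EpochStore, epochKey, bumpEpoch, and isStale are illustrative and not part of the guide's code.

```typescript
// In-memory stand-in for the shared epoch store (an assumption for illustration;
// the guide itself keeps this value in Redis).
type EpochStore = Map<string, number>;

// Key is scoped by tenant and user, so tenants never share a counter.
function epochKey(tenantId: string, userId: string): string {
  return `${tenantId}:${userId}`;
}

// Write path: any authority change moves the counter forward by one.
function bumpEpoch(store: EpochStore, tenantId: string, userId: string): number {
  const key = epochKey(tenantId, userId);
  const next = (store.get(key) ?? 0) + 1;
  store.set(key, next);
  return next;
}

// Read path: a token is stale when it carries an epoch older than the current one.
function isStale(store: EpochStore, tenantId: string, userId: string, tokenEpoch: number): boolean {
  return tokenEpoch < (store.get(epochKey(tenantId, userId)) ?? 0);
}

const store: EpochStore = new Map();
const tokenEpoch = 0; // token minted before any role change
const staleBefore = isStale(store, "acme", "u-42", tokenEpoch);
bumpEpoch(store, "acme", "u-42"); // admin revokes a role
const staleAfter = isStale(store, "acme", "u-42", tokenEpoch);
console.log(staleBefore, staleAfter); // false true
```

The same token flips from accepted to rejected without being touched: only the shared counter moved.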
Step-by-Step Guide
1. Embed a session epoch in every token
When you mint an access token, read the user's current epoch and stamp it into a custom claim. The epoch is per tenant and per user so a role change in one tenant never invalidates a user's sessions elsewhere. Aligning these claims with your wider scheme is covered in JWT claims for tenant scoping.
import { SignJWT } from "jose";

// Assumes `redis` (an ioredis-style client), `currentKid`, and `privateKey` are initialized elsewhere.
async function mintAccessToken(tenantId: string, userId: string, roles: string[]) {
  // A user with no recorded change starts at epoch 0.
  const epoch = (await redis.get(`epoch:${tenantId}:${userId}`)) ?? "0";
  return new SignJWT({ tid: tenantId, roles, sep: Number(epoch) }) // sep = session epoch
    .setProtectedHeader({ alg: "RS256", kid: currentKid })
    .setSubject(userId)
    .setExpirationTime("15m")
    .sign(privateKey);
}
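On the verifying side, it helps to check the payload's shape before trusting the epoch. The sketch below assumes the payload has already passed signature verification; parseSessionClaims and the SessionClaims type are hypothetical names introduced for illustration.

```typescript
// Shape of the claims that mintAccessToken places in the payload.
interface SessionClaims {
  tid: string; // tenant id
  sub: string; // user id
  roles: string[];
  sep: number; // session epoch
}

// Hypothetical helper: narrows an already signature-verified payload to SessionClaims.
// Returns null instead of throwing so the caller decides how to respond.
function parseSessionClaims(payload: Record<string, unknown>): SessionClaims | null {
  const { tid, sub, roles, sep } = payload;
  if (typeof tid !== "string" || typeof sub !== "string") return null;
  if (!Array.isArray(roles) || !roles.every((r) => typeof r === "string")) return null;
  // A missing, fractional, or negative epoch is treated as invalid, not as epoch 0.
  if (typeof sep !== "number" || !Number.isInteger(sep) || sep < 0) return null;
  return { tid, sub, roles: roles as string[], sep };
}

const validClaims = parseSessionClaims({ tid: "acme", sub: "u-42", roles: ["admin"], sep: 3 });
const missingEpoch = parseSessionClaims({ tid: "acme", sub: "u-42", roles: ["admin"] });
```

Rejecting a token with no sep claim, rather than defaulting it to 0, keeps tokens minted before the scheme was introduced from slipping past the check.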
2. Bump the epoch atomically on any authority change
Every code path that alters a user's effective permissions — role assignment, role removal, group membership, account suspension — must increment the epoch. Use INCR so the operation is atomic and never collides under concurrent admin actions. Source the change from your RBAC layer; see role-based access control per tenant.
async function revokeRole(tenantId: string, userId: string, role: string) {
  await db.deleteUserRole(tenantId, userId, role);
  // Single source of truth for "every prior token for this user is now stale".
  const next = await redis.incr(`epoch:${tenantId}:${userId}`);
  await redis.publish("epoch:changed", JSON.stringify({ tenantId, userId, epoch: next }));
}
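Because every authority-changing path must bump the epoch, one option is to route all such changes through a single function so the bump cannot be forgotten. This is a sketch using in-memory Maps as stand-ins for the database and Redis; changeAuthority is a hypothetical name.

```typescript
// In-memory stand-ins for the role table and the epoch store (assumptions for illustration).
const rolesByUser = new Map<string, Set<string>>();
const epochs = new Map<string, number>();

// Hypothetical choke point: every mutation of a user's authority goes through here,
// so no code path can change roles without also bumping the epoch.
function changeAuthority(
  tenantId: string,
  userId: string,
  mutate: (roles: Set<string>) => void,
): number {
  const key = `${tenantId}:${userId}`;
  const roles = rolesByUser.get(key) ?? new Set<string>();
  mutate(roles);
  rolesByUser.set(key, roles);
  const next = (epochs.get(key) ?? 0) + 1; // with Redis this step would be INCR
  epochs.set(key, next);
  return next;
}

const afterGrant = changeAuthority("acme", "u-42", (r) => r.add("admin"));
const afterRevoke = changeAuthority("acme", "u-42", (r) => r.delete("admin"));
console.log(afterGrant, afterRevoke); // 1 2
```

Grants bump the epoch as well as revocations here, which is the conservative reading of "any authority change".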
3. Validate the epoch on each request, against a cached value
On every authenticated request, compare the token's sep claim to the user's current epoch. Reading Redis per request defeats the point of stateless tokens, so cache the epoch in process memory with a short TTL. That TTL is the upper bound on your staleness window: with a 5-second TTL, a node that never receives the pub/sub message honors a revoked token for at most 5 seconds after the change.
const epochCache = new Map<string, { value: number; expires: number }>();

async function currentEpoch(tenantId: string, userId: string): Promise<number> {
  const k = `${tenantId}:${userId}`;
  const hit = epochCache.get(k);
  if (hit && hit.expires > Date.now()) return hit.value;
  const value = Number(await redis.get(`epoch:${k}`) ?? "0");
  epochCache.set(k, { value, expires: Date.now() + 5_000 }); // 5s staleness bound
  return value;
}

async function assertFresh(claims: { tid: string; sub: string; sep: number }) {
  if (claims.sep < (await currentEpoch(claims.tid, claims.sub))) {
    throw new Error("token revoked: stale session epoch");
  }
}
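The staleness bound is easier to see with a clock you control. The sketch below is a synchronous variant of the cache above, assuming an injectable clock and a loader function in place of the Redis read; the EpochCache class and its constructor parameters are illustrative.

```typescript
// A TTL cache with an injectable clock so the staleness bound can be checked
// deterministically. The loader stands in for the Redis GET.
class EpochCache {
  private entries = new Map<string, { value: number; expires: number }>();
  private ttlMs: number;
  private load: (key: string) => number;
  private now: () => number;

  constructor(ttlMs: number, load: (key: string) => number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.load = load;
    this.now = now;
  }

  get(key: string): number {
    const hit = this.entries.get(key);
    if (hit && hit.expires > this.now()) return hit.value; // still within the TTL
    const value = this.load(key); // miss or expired: reload from the source of truth
    this.entries.set(key, { value, expires: this.now() + this.ttlMs });
    return value;
  }
}

let fakeNow = 0;
let storedEpoch = 0;
const cache5s = new EpochCache(5_000, () => storedEpoch, () => fakeNow);

const first = cache5s.get("acme:u-42"); // loads 0, cached until t = 5000
storedEpoch = 1; // role change lands right after the read
fakeNow = 4_999;
const withinTtl = cache5s.get("acme:u-42"); // cached 0 is still served
fakeNow = 5_000;
const afterTtl = cache5s.get("acme:u-42"); // entry expired, reloads 1
console.log(first, withinTtl, afterTtl); // 0 0 1
```

Without pub/sub, the node keeps answering with the old epoch right up to the TTL boundary and not a millisecond longer.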
4. Fan out epoch changes with pub/sub for instant invalidation
The cache TTL bounds the worst case, but you want the common case to be immediate. Subscribe every node to the epoch:changed channel and have it overwrite or evict its local cache entry the moment a change is published. Now the staleness window collapses to network latency plus message delivery — typically single-digit milliseconds — and the TTL only matters if a node missed the message.
const sub = redis.duplicate();
await sub.subscribe("epoch:changed");
sub.on("message", (_channel, raw) => {
  const { tenantId, userId, epoch } = JSON.parse(raw);
  const k = `${tenantId}:${userId}`;
  // Overwrite immediately; do not wait for the TTL to lapse.
  // Keep the larger value so an out-of-order message cannot roll the epoch back.
  const value = Math.max(epoch, epochCache.get(k)?.value ?? 0);
  epochCache.set(k, { value, expires: Date.now() + 5_000 });
});
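To see the fan-out across more than one node, the following sketch replaces Redis pub/sub with a toy in-process channel. The makeEpochHandler factory, the two node caches, and the publish function are assumptions for illustration; only the message shape comes from the guide.

```typescript
type EpochMessage = { tenantId: string; userId: string; epoch: number };
type CacheEntry = { value: number; expires: number };

// Hypothetical handler factory: each node applies published epochs to its own cache.
function makeEpochHandler(cache: Map<string, CacheEntry>, ttlMs: number) {
  return (raw: string): void => {
    const { tenantId, userId, epoch } = JSON.parse(raw) as EpochMessage;
    const key = `${tenantId}:${userId}`;
    // Keep the larger value so a late, out-of-order message cannot roll the epoch back.
    const value = Math.max(epoch, cache.get(key)?.value ?? 0);
    cache.set(key, { value, expires: Date.now() + ttlMs });
  };
}

// Two "nodes", each with a private cache, fed by a toy in-process channel.
const nodeA = new Map<string, CacheEntry>();
const nodeB = new Map<string, CacheEntry>();
const subscribers = [makeEpochHandler(nodeA, 5_000), makeEpochHandler(nodeB, 5_000)];

const publish = (msg: EpochMessage): void => {
  const raw = JSON.stringify(msg);
  subscribers.forEach((deliver) => deliver(raw));
};

publish({ tenantId: "acme", userId: "u-42", epoch: 2 });
publish({ tenantId: "acme", userId: "u-42", epoch: 1 }); // arrives late, must not win
console.log(nodeA.get("acme:u-42")?.value, nodeB.get("acme:u-42")?.value); // 2 2
```

Out-of-order delivery can happen when two admin actions run INCR and PUBLISH concurrently, since the two commands are not a single atomic step.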
5. Maintain a short-lived revocation list for blast-radius events
Epoch bumps handle per-user changes. For wider events — a leaked signing key, a compromised tenant admin, a "log everyone out" action — keep an explicit revocation list keyed by token ID (jti) or by a tenant-wide floor epoch. Store entries with a TTL equal to the access-token lifetime so the set self-prunes and never grows unbounded.
async function revokeToken(jti: string, ttlSeconds: number) {
  // TTL == access-token lifetime: once the token would expire anyway, drop the entry.
  await redis.set(`revoked:jti:${jti}`, "1", "EX", ttlSeconds);
}

async function isRevoked(jti: string): Promise<boolean> {
  return (await redis.exists(`revoked:jti:${jti}`)) === 1;
}
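The TTL passed to revokeToken can be derived from the token itself, so an entry never outlives the token it blocks. This is a small sketch assuming the standard exp claim (a Unix timestamp in seconds); revocationTtlSeconds is a hypothetical helper.

```typescript
// Hypothetical helper: how long a revocation entry needs to live, in whole seconds.
// `exp` is the token's expiry as a Unix timestamp in seconds (the standard JWT claim).
function revocationTtlSeconds(exp: number, nowMs: number = Date.now()): number {
  const remaining = Math.ceil(exp - nowMs / 1000);
  // A token that has already expired needs no entry; signal that with 0.
  return Math.max(remaining, 0);
}

const fixedNowMs = 1_700_000_000_000; // fixed clock for the example
const liveTtl = revocationTtlSeconds(1_700_000_000 + 600, fixedNowMs); // expires in 10 minutes
const expiredTtl = revocationTtlSeconds(1_700_000_000 - 1, fixedNowMs); // expired 1 second ago
console.log(liveTtl, expiredTtl); // 600 0
```

A result of 0 means the write can be skipped entirely; Redis rejects a non-positive EX value, so the caller should not pass it through.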
6. Force token re-mint at the refresh boundary
The access-token check stops a stale token from acting; you also need the next token to carry the new epoch. Have the refresh endpoint re-read the epoch and mint the new access token with it. If the refresh token's own epoch lags the current one by more than a fixed tolerance (5 bumps in the example), reject the refresh outright, which logs the device out and forces full re-authentication. The Redis mechanics behind this revocation are detailed in using Redis for tenant session isolation.
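// Refresh tokens more than this many epoch bumps behind must re-authenticate.
const MAX_EPOCH_LAG = 5;

async function refresh(refreshClaims: { tid: string; sub: string; sep: number }) {
  const epoch = await currentEpoch(refreshClaims.tid, refreshClaims.sub);
  if (refreshClaims.sep < epoch - MAX_EPOCH_LAG) {
    throw new Error("refresh denied: re-authentication required");
  }
  // mintAccessToken re-reads the epoch, so the new token carries the current value.
  return mintAccessToken(refreshClaims.tid, refreshClaims.sub, await loadRoles(refreshClaims));
}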
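The refresh rule reduces to a pure comparison, which makes the boundary cases easy to reason about. The sketch below restates the same condition as the code above; decideRefresh and the RefreshDecision type are hypothetical names, and the tolerance is passed in rather than fixed.

```typescript
type RefreshDecision = "remint" | "reauthenticate";

// Hypothetical pure form of the refresh rule: how far behind the current epoch
// a refresh token may be before the device must log in again.
function decideRefresh(tokenEpoch: number, currentEpoch: number, maxLag: number): RefreshDecision {
  return currentEpoch - tokenEpoch > maxLag ? "reauthenticate" : "remint";
}

const sameEpoch = decideRefresh(3, 3, 5); // no change since issue
const atTolerance = decideRefresh(3, 8, 5); // exactly 5 bumps behind
const pastTolerance = decideRefresh(3, 9, 5); // 6 bumps behind
const strictMode = decideRefresh(3, 4, 0); // zero tolerance, one bump behind
console.log(sameEpoch, atTolerance, pastTolerance, strictMode);
// remint remint reauthenticate reauthenticate
```

A tolerance of 0 forces a full login after any authority change; a larger value lets routine role edits flow through a silent re-mint. Which is appropriate depends on how sensitive the revoked roles are.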
Verification
Prove that a role change is enforced before the access token would naturally expire. The test bumps the epoch on a live session and asserts the next request is rejected without waiting for token expiry.
// verify() and waitForCacheEvict() are test helpers assumed to exist in the suite.
test("role change invalidates an outstanding token within the staleness bound", async () => {
  const token = await mintAccessToken("acme", "u-42", ["admin"]);
  const claims = await verify(token); // sep = 0
  await expect(assertFresh(claims)).resolves.toBeUndefined();

  await revokeRole("acme", "u-42", "admin"); // INCR -> epoch 1, publish
  await waitForCacheEvict("acme:u-42"); // pub/sub delivered
  await expect(assertFresh(claims)).rejects.toThrow(/stale session epoch/);
});

test("epoch is isolated per tenant", async () => {
  await revokeRole("acme", "u-42", "admin");
  // Same user id in a different tenant is untouched.
  expect(await currentEpoch("globex", "u-42")).toBe(0);
});
You can also confirm propagation directly against Redis: watch the channel in one terminal while triggering a revocation in another.
# Terminal 1: observe the fan-out
redis-cli SUBSCRIBE epoch:changed
# 1) "message" 2) "epoch:changed" 3) "{\"tenantId\":\"acme\",\"userId\":\"u-42\",\"epoch\":1}"
# Terminal 2: inspect the stored epoch after a revocation
redis-cli GET epoch:acme:u-42
# "1"
Failure Modes & Gotchas
- Stale tokens linger for the full token lifetime. Symptom: a demoted user keeps admin power for an hour. Root cause: no epoch check on requests, so only natural expiry enforces the change. Fix: validate sep against the cached epoch on every authenticated request.
- Only one device gets logged out. Symptom: revoking access on the web leaves the mobile app authorized. Root cause: revocation deleted a single session record instead of bumping the per-user epoch. Fix: bump the epoch, which invalidates every token regardless of device.
- Cross-tenant over-invalidation. Symptom: changing a user's role in one tenant logs them out everywhere. Root cause: a global epoch key (epoch:{userId}) shared across tenants. Fix: key the epoch as epoch:{tenant_id}:{user_id}.
- Redis read on every request. Symptom: latency and Redis load scale linearly with traffic. Root cause: no local cache, so each request hits Redis for the epoch. Fix: cache per process with a short TTL and refresh via pub/sub.
FAQ
How long can a revoked session stay valid? In the common case, only as long as pub/sub delivery takes, typically single-digit milliseconds. In the rare case that a node misses the message, at most one local cache TTL. With a 5-second TTL the worst-case staleness window is about 5 seconds; a shorter TTL narrows it at the cost of more frequent Redis reads.
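The three regimes can be put side by side as a rough calculation. This is an illustrative sketch only: the Enforcement shape and worstCaseStalenessMs are hypothetical, and the 8 ms delivery figure is an assumed example value, not a measurement.

```typescript
interface Enforcement {
  tokenLifetimeMs: number; // access-token lifetime
  cacheTtlMs?: number; // local epoch cache TTL, if an epoch check exists
  pubSubDeliveryMs?: number; // fan-out latency, if the node receives the message
}

// Rough upper bound on how long a revoked token keeps working on one node.
function worstCaseStalenessMs(e: Enforcement): number {
  if (e.cacheTtlMs === undefined) return e.tokenLifetimeMs; // no epoch check: wait for expiry
  const bound = e.pubSubDeliveryMs ?? e.cacheTtlMs; // message missed: wait out the TTL
  return Math.min(bound, e.tokenLifetimeMs); // natural expiry still caps everything
}

const fifteenMin = 15 * 60 * 1000;
const noCheck = worstCaseStalenessMs({ tokenLifetimeMs: fifteenMin });
const ttlOnly = worstCaseStalenessMs({ tokenLifetimeMs: fifteenMin, cacheTtlMs: 5_000 });
const withPubSub = worstCaseStalenessMs({ tokenLifetimeMs: fifteenMin, cacheTtlMs: 5_000, pubSubDeliveryMs: 8 });
console.log(noCheck, ttlOnly, withPubSub); // 900000 5000 8
```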
Why bump an epoch instead of deleting the session record? Stateless access tokens are not stored server-side, so there is no record to delete. The epoch is a single monotonic value that invalidates every outstanding token for a user across all devices at once, without enumerating them, and the per-request check is a single integer comparison.
Do I still need short token lifetimes if I have epoch checks? Yes. Short lifetimes bound your exposure if the epoch cache, pub/sub, or revocation list fails, and they cap how long the refresh path can carry a stale identity. Treat the epoch as the fast path and short expiry as the backstop.