Rotating Tenant-Specific JWT Signing Keys

Rotating the signing key behind a tenant's JWTs has to happen without invalidating every token already in flight, which means new and old keys must verify side by side for a bounded window. This page sits inside Tenant-Aware JWT & Token Management and covers how to rotate per-tenant keys with overlapping kid validity, where to keep the private material, and how to force an emergency rotation when a key leaks.

Problem Framing

A signing key is the root of trust for every token it has ever issued. If you replace it the naive way (generate a new keypair, swap it in, throw the old one away), every unexpired token signed by the old key fails verification the instant the swap lands. With 15-minute access tokens that is a fifteen-minute wave of 401s across the fleet; with refresh tokens it is worse. Rotation is therefore not a swap, it is an overlap: the new key takes over signing once its public half is published, while the old public key stays published long enough for the last token it signed to expire.

Per-tenant keys raise the stakes. If you isolate signing material by tenant — a separate keypair per tenant, or per tenant group — then a leaked key compromises only that tenant, and rotation can be scoped to that tenant alone instead of forcing a global re-issue. That isolation only pays off if the verification side can hold multiple keys per tenant at once and pick the right one per token. The mechanism that makes this work is the key id: every token header carries a kid, the verifier resolves kid to a public key, and rotation becomes "publish a new kid, keep the old kid until its tokens drain." Getting tenant_id into the verified payload is the prerequisite covered in JWT claims for tenant scoping best practices; this page assumes that scope is already correct and focuses on the key behind the signature.
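As a rough illustration of tenant-scoped kid resolution, the sketch below keeps a mapping from tenant to its published keys and looks a key up by tenant first, then by kid. The `Keyring` type, the `resolve_key` function, and the sample kids are hypothetical names chosen for this example, not part of any library.

```python
from typing import Dict, Optional

# tenant_id -> {kid -> public JWK}; a tenant may hold several kids at once
Keyring = Dict[str, Dict[str, dict]]

def resolve_key(keyring: Keyring, tenant_id: str, kid: str) -> Optional[dict]:
    """Return the public JWK for (tenant_id, kid), or None if it is not published.

    The lookup is scoped to the tenant first, so a kid that belongs to
    another tenant never resolves here.
    """
    return keyring.get(tenant_id, {}).get(kid)

# Hypothetical sample data: tenant-a is mid-rotation and holds two kids
keyring: Keyring = {
    "tenant-a": {
        "kid-A": {"kty": "RSA", "kid": "kid-A"},  # grace-window key
        "kid-B": {"kty": "RSA", "kid": "kid-B"},  # current key
    },
    "tenant-b": {"kid-C": {"kty": "RSA", "kid": "kid-C"}},
}
```

Scoping the lookup by tenant before kid is what keeps a leaked key for one tenant from being accepted on another tenant's tokens.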

The decision that matters is the grace window. Too short and you cut off live tokens; too long and a compromised key keeps verifying. The window should be exactly the maximum token lifetime plus a small margin for clock skew, and the private key should never live in application memory — it belongs in a KMS or HSM, which ties directly to per-tenant encryption and key management.
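The grace-window rule reduces to a small calculation. The sketch below assumes a 2-minute skew margin by default; the function name `retire_old_key_at` and its parameters are illustrative only.

```python
from datetime import datetime, timedelta, timezone

def retire_old_key_at(signer_switched_at: datetime,
                      max_token_ttl: timedelta,
                      clock_skew: timedelta = timedelta(minutes=2)) -> datetime:
    """Earliest time the old kid can be dropped from the JWKS.

    The last token signed by the old key was issued no later than the moment
    the signer switched, so it expires max_token_ttl after that; the skew
    margin covers verifiers whose clocks run slightly behind.
    """
    if max_token_ttl <= timedelta(0):
        raise ValueError("max_token_ttl must be positive")
    return signer_switched_at + max_token_ttl + clock_skew

# Example: signer switched at 12:00 UTC with 15-minute access tokens
switched = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(retire_old_key_at(switched, timedelta(minutes=15)))
```

If the same key also signs longer-lived tokens, `max_token_ttl` has to be the longest of those lifetimes.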

During the grace window both keys verify; only the new key signs. The old key is retired once every token it signed has expired.

Step-by-Step Guide

1. Generate the keypair inside KMS, export only the public key

The private key must be created and used inside the KMS or HSM so the raw bytes never touch your application. The app references the key by ARN/ID and asks KMS to sign; only the public half ever leaves. Tag each key with its tenant so rotation and audit can scope to one tenant.

# Create an asymmetric RSA-2048 signing key for one tenant in AWS KMS and capture its key ID
KEY_ID=$(aws kms create-key \
  --key-spec RSA_2048 \
  --key-usage SIGN_VERIFY \
  --tags TagKey=tenant_id,TagValue=3b7d4e21-tenant TagKey=purpose,TagValue=jwt-signing \
  --description "JWT signing key for tenant 3b7d4e21" \
  --query KeyMetadata.KeyId --output text)

# Create the alias that the signer and the later commands refer to
aws kms create-alias --alias-name alias/jwt-3b7d4e21 --target-key-id "$KEY_ID"

# Export only the public key (DER); never the private key
aws kms get-public-key --key-id alias/jwt-3b7d4e21 \
  --query PublicKey --output text | base64 -d > tenant-3b7d4e21-pub.der

2. Derive a stable `kid` and publish a per-tenant JWKS

The kid must be deterministic from the public key so the verifier and signer always agree. The RFC 7638 JWK thumbprint is the standard choice. Publish each tenant's active and grace-window public keys as a JWKS that verifiers fetch and cache.

import json, hashlib, base64

def b64u(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def jwk_thumbprint(n: bytes, e: bytes) -> str:
    # RFC 7638: required members only, lexicographically sorted, no whitespace
    canonical = json.dumps(
        {"e": b64u(e), "kty": "RSA", "n": b64u(n)},
        sort_keys=True, separators=(",", ":"),
    ).encode()
    return b64u(hashlib.sha256(canonical).digest())

def to_jwk(n: bytes, e: bytes) -> dict:
    kid = jwk_thumbprint(n, e)
    return {"kty": "RSA", "use": "sig", "alg": "RS256", "kid": kid,
            "n": b64u(n), "e": b64u(e)}

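# A tenant's JWKS holds both the current key and any key still in its grace window.
# n_* and e_* are the big-endian modulus and exponent bytes parsed from each exported public key.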
jwks = {"keys": [to_jwk(n_current, e_current), to_jwk(n_previous, e_previous)]}
print(json.dumps(jwks))
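The thumbprint only matches across implementations if the modulus and exponent are encoded as minimal big-endian byte strings, which is what RFC 7518 requires for the JWK `n` and `e` members. A minimal sketch of that conversion, assuming the values are available as Python integers; `uint_to_bytes` is a hypothetical helper name.

```python
import base64

def uint_to_bytes(value: int) -> bytes:
    """Minimal big-endian encoding of a non-negative integer (no leading zero bytes)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")

def b64u(data: bytes) -> str:
    """Unpadded base64url, matching the helper used for the JWK members."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

# The common RSA public exponent 65537 encodes to the familiar "AQAB"
e_bytes = uint_to_bytes(65537)
print(b64u(e_bytes))
```

A stray leading zero byte on `n` (common when the value comes straight out of a DER INTEGER) would change the thumbprint, so signer and JWKS would disagree on the kid.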

3. Sign with the current key and stamp the `kid` in the header

When issuing a token, set the JWT header kid to the current key's thumbprint and ask KMS to produce the signature. The verifier will use that kid to pick the matching public key.

package main

import (
	"context"
	"github.com/aws/aws-sdk-go-v2/service/kms"
	"github.com/aws/aws-sdk-go-v2/service/kms/types"
)

// signWithKMS returns the raw signature; the JWT header must carry currentKid.
func signWithKMS(ctx context.Context, c *kms.Client, keyID string, signingInput []byte) ([]byte, error) {
	out, err := c.Sign(ctx, &kms.SignInput{
		KeyId:            &keyID,
		Message:          signingInput, // base64url(header) + "." + base64url(payload)
		MessageType:      types.MessageTypeRaw,
		SigningAlgorithm: types.SigningAlgorithmSpecRsassaPkcs1V15Sha256,
	})
	if err != nil {
		return nil, err
	}
	return out.Signature, nil
}
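For orientation, here is a minimal sketch of how the signing input and the final compact token fit together around the signature call. It is written in Python with the standard library only; `build_signing_input` and `assemble_jwt` are hypothetical helper names, and the signature bytes are a placeholder standing in for whatever KMS returns.

```python
import base64
import json

def b64u(data: bytes) -> str:
    """Unpadded base64url, as compact JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def build_signing_input(kid: str, claims: dict) -> bytes:
    """base64url(header) + "." + base64url(payload): the bytes handed to the signer."""
    header = {"alg": "RS256", "typ": "JWT", "kid": kid}
    encoded_header = b64u(json.dumps(header, separators=(",", ":")).encode())
    encoded_payload = b64u(json.dumps(claims, separators=(",", ":")).encode())
    return f"{encoded_header}.{encoded_payload}".encode("ascii")

def assemble_jwt(signing_input: bytes, signature: bytes) -> str:
    """Append the base64url-encoded signature to form the compact JWT."""
    return f"{signing_input.decode('ascii')}.{b64u(signature)}"

# Placeholder signature bytes; in practice these come from the KMS Sign call.
token = assemble_jwt(
    build_signing_input("kid-B", {"tenant_id": "3b7d4e21-tenant"}),
    b"\x01\x02\x03",
)
```

The kid placed in the header has to be the thumbprint of the key that actually produced the signature, so the signer should read both from the same source of truth.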

4. Resolve `kid` to a key on the verify path, accepting any published key

The verifier fetches the tenant's JWKS, indexes by kid, and verifies against whichever key the token names. Both the current and grace-window keys are present, so a token signed before rotation still verifies. Reject the token if its kid is absent from the JWKS.

import { createRemoteJWKSet, jwtVerify } from "jose";

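// One JWKS resolver per tenant, created once so jose's in-memory key cache is reused
// across calls (jose refetches when it sees an unknown kid, subject to its cooldown)
const jwksByTenant = new Map<string, ReturnType<typeof createRemoteJWKSet>>();

const jwksFor = (tenantId: string) => {
  let jwks = jwksByTenant.get(tenantId);
  if (!jwks) {
    jwks = createRemoteJWKSet(
      new URL(`https://auth.example.com/tenants/${encodeURIComponent(tenantId)}/jwks.json`),
    );
    jwksByTenant.set(tenantId, jwks);
  }
  return jwks;
};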

export async function verify(tenantId: string, token: string) {
  const { payload, protectedHeader } = await jwtVerify(token, jwksFor(tenantId), {
    algorithms: ["RS256"],
    issuer: "auth.example.com",
  });
  if (payload.tenant_id !== tenantId) throw new Error("tenant_id mismatch");
  return { payload, kid: protectedHeader.kid };
}
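The key-selection step that a JWT library performs internally can be sketched by hand. The Python example below reads the kid from the still-unverified header, uses it only to pick a key, and rejects the token when the kid is not published. `UnknownKid`, `header_kid`, and `select_key` are hypothetical names; signature verification itself is left to a real JWT library.

```python
import base64
import json

class UnknownKid(Exception):
    """Raised when a token names no kid or a kid that is not in the JWKS."""

def header_kid(token: str) -> str:
    """Read kid from the unverified header; use it for key selection only."""
    segment = token.split(".")[0]
    padded = segment + "=" * (-len(segment) % 4)  # JWT segments are unpadded
    try:
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError as exc:
        raise UnknownKid("malformed token header") from exc
    kid = header.get("kid") if isinstance(header, dict) else None
    if not isinstance(kid, str) or not kid:
        raise UnknownKid("token header carries no kid")
    return kid

def select_key(token: str, jwks: dict) -> dict:
    """Return the published JWK the token names, or raise UnknownKid."""
    kid = header_kid(token)
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    raise UnknownKid(f"kid {kid!r} is not in the tenant's JWKS")
```

Nothing read from the header is trusted until the signature has been checked against the selected key.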

5. Run the rotation: add the new key before removing the old

Rotation is a sequence, not an event. Publish the new public key to the JWKS first and let verifier caches pick it up. Then flip the signer to the new key. Only after one full token lifetime — when nothing signed by the old key can still be valid — remove the old key from the JWKS.

# 1. Generate + publish new key, JWKS now lists BOTH kids
aws kms create-key --key-spec RSA_2048 --key-usage SIGN_VERIFY \
  --tags TagKey=tenant_id,TagValue=3b7d4e21-tenant
# (regenerate and deploy jwks.json including the new public key)

# 2. Point the signer alias at the new key — new tokens use kid-B
aws kms update-alias --alias-name alias/jwt-3b7d4e21 --target-key-id <new-key-id>

# 3. After grace window (max token TTL + skew), drop kid-A from JWKS
#    then schedule the old KMS key for deletion
aws kms schedule-key-deletion --key-id <old-key-id> --pending-window-in-days 7
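The ordering constraint in this sequence can be modeled as a tiny state object that refuses out-of-order steps. This is a sketch for reasoning about the sequence, not a deployment tool; `TenantKeyState` and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TenantKeyState:
    signing_kid: str                                   # kid the signer currently uses
    published_kids: List[str] = field(default_factory=list)  # kids listed in the JWKS

    def publish(self, kid: str) -> None:
        """Step 1: add the new public key to the JWKS."""
        if kid not in self.published_kids:
            self.published_kids.append(kid)

    def switch_signer(self, kid: str) -> None:
        """Step 2: only sign with a kid that verifiers can already resolve."""
        if kid not in self.published_kids:
            raise ValueError("publish the new kid before signing with it")
        self.signing_kid = kid

    def retire(self, kid: str) -> None:
        """Step 3: drop a kid after its grace window; never the active signer."""
        if kid == self.signing_kid:
            raise ValueError("cannot retire the kid that is still signing")
        self.published_kids.remove(kid)

state = TenantKeyState(signing_kid="kid-A", published_kids=["kid-A"])
state.publish("kid-B")        # JWKS lists both kids
state.switch_signer("kid-B")  # new tokens carry kid-B
state.retire("kid-A")         # after max token TTL + skew
```

The grace-window wait between the second and third step is not modeled here; in a real rollout it would gate the call to `retire`.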

6. Forced rotation on compromise: retire immediately, accept the fallout

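A scheduled rotation respects the grace window; a compromise cannot. If a private key leaks, remove its kid from the JWKS at once and disable the KMS key. Disabling the key stops new signatures; removing the kid stops verification, but only once each verifier refreshes its cached JWKS, so flush those caches as part of the response. From then on every token signed by the leaked key fails verification, forcing affected users to re-authenticate, which is the correct outcome. Scope it to the one tenant whose key leaked so the rest of the fleet is untouched.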

# Emergency: no grace window. Disable the key and purge its kid from JWKS now.
aws kms disable-key --key-id <compromised-key-id>
# Redeploy jwks.json for this tenant WITHOUT the compromised kid.
# Bump the tenant policy version atomically so refresh tokens are rejected too:
redis-cli INCR "policy_ver:3b7d4e21-tenant"
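How the policy version might gate a refresh can be sketched as follows. This assumes each refresh token carries the policy version that was current when it was issued, in a claim called `policy_ver`; that claim name, the function names, and the in-memory dict standing in for Redis are all assumptions made for the example.

```python
from typing import Dict

# In-memory stand-in for the Redis keys "policy_ver:<tenant_id>" (assumption for this sketch)
policy_versions: Dict[str, int] = {"3b7d4e21-tenant": 4}

def bump_policy_version(tenant_id: str) -> int:
    """Equivalent of INCR on the tenant's policy version."""
    policy_versions[tenant_id] = policy_versions.get(tenant_id, 0) + 1
    return policy_versions[tenant_id]

def refresh_allowed(refresh_claims: dict) -> bool:
    """Accept a refresh only if the token was issued under the current policy version."""
    current = policy_versions.get(refresh_claims.get("tenant_id", ""))
    return current is not None and refresh_claims.get("policy_ver") == current

# A refresh token issued before the incident, under policy version 4
old_refresh = {"tenant_id": "3b7d4e21-tenant", "policy_ver": 4}
allowed_before = refresh_allowed(old_refresh)
bump_policy_version("3b7d4e21-tenant")
allowed_after = refresh_allowed(old_refresh)
```

After the bump, every refresh token issued before the incident carries a stale version and is refused, regardless of which key signed it.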

Verification

Confirm that a token signed before rotation still verifies during the grace window, that the JWKS lists the expected kids, and that a retired kid is rejected.

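# The header kid must be present in the tenant's published JWKS
# JWT segments are unpadded base64url, so restore the alphabet and padding before decoding
HDR=$(printf '%s' "$JWT" | cut -d. -f1 | tr '_-' '/+')
while [ $(( ${#HDR} % 4 )) -ne 0 ]; do HDR="${HDR}="; done
KID=$(printf '%s' "$HDR" | base64 -d | jq -r .kid)
curl -s "https://auth.example.com/tenants/3b7d4e21-tenant/jwks.json" \
  | jq -e --arg kid "$KID" '.keys[] | select(.kid == $kid)' >/dev/null \
  && echo "kid is published" || echo "kid NOT in JWKS — token will fail"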

import pytest
from jose import jwt
from jose.exceptions import JWTError

def test_pre_rotation_token_verifies_during_grace(token_signed_with_old_key, jwks_with_both_keys):
    # During the grace window the old kid is still in the JWKS
    claims = jwt.decode(token_signed_with_old_key, jwks_with_both_keys,
                        algorithms=["RS256"], issuer="auth.example.com")
    assert claims["tenant_id"] == "3b7d4e21-tenant"

def test_token_with_retired_kid_is_rejected(token_signed_with_old_key, jwks_after_retirement):
    with pytest.raises(JWTError):
        jwt.decode(token_signed_with_old_key, jwks_after_retirement,
                   algorithms=["RS256"], issuer="auth.example.com")

Log the kid alongside tenant_id on every verification so an audit trail can show exactly when traffic moved from the old key to the new one.
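One possible shape for that log line is a single JSON object per verification, which makes it easy to count verifications per kid over time. The field names, the logger name, and the helper functions below are assumptions for illustration.

```python
import json
import logging

logger = logging.getLogger("jwt.verify")  # hypothetical logger name

def verification_record(tenant_id: str, kid: str, ok: bool) -> str:
    """Serialize one verification outcome as a single JSON log line."""
    return json.dumps(
        {"event": "jwt_verify", "tenant_id": tenant_id, "kid": kid, "ok": ok},
        sort_keys=True,
    )

def log_verification(tenant_id: str, kid: str, ok: bool) -> None:
    """Emit the record at INFO on success and WARNING on failure."""
    level = logging.INFO if ok else logging.WARNING
    logger.log(level, verification_record(tenant_id, kid, ok))

log_verification("3b7d4e21-tenant", "kid-B", True)
```

Grouping these records by kid shows the old key's share of traffic falling to zero, which is a useful confirmation before the old kid is removed.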

Failure Modes & Gotchas

Old key removed before tokens drained. Symptom: a burst of 401s right after rotation. Root cause: the grace window was shorter than the token lifetime, or the JWKS was redeployed too early. Fix: keep the old kid for at least max token TTL plus clock skew before removing it.
Verifier JWKS cache too aggressive. Symptom: tokens signed by the new key fail until caches expire. Root cause: verifiers cached the old JWKS past the new key's introduction. Fix: publish the new key before signing with it, and set a Cache-Control max-age shorter than the rotation lead time.
Missing or non-deterministic kid. Symptom: verifier cannot select a key and rejects valid tokens. Root cause: kid omitted from the header, or generated randomly so signer and JWKS disagree. Fix: derive kid from the RFC 7638 thumbprint and always stamp it in the header.
Private key in application memory. Symptom: a single host compromise leaks the signing key for a tenant. Root cause: the keypair was generated in the app and loaded from an env var. Fix: create and use the key inside KMS/HSM; export only the public half.
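The cache-related failure mode above comes down to a timing check: the signer should not switch until every verifier cache that could still hold the old JWKS has expired. A conservative sketch, assuming verifiers cache the JWKS for at most `jwks_max_age`; the function names and the 30-second default margin are illustrative.

```python
from datetime import datetime, timedelta, timezone

def earliest_signer_switch(published_at: datetime,
                           jwks_max_age: timedelta,
                           margin: timedelta = timedelta(seconds=30)) -> datetime:
    """Earliest time every verifier cache must have refetched the JWKS."""
    return published_at + jwks_max_age + margin

def safe_to_switch(now: datetime, published_at: datetime,
                   jwks_max_age: timedelta) -> bool:
    """True once a JWKS cached just before publication has certainly expired."""
    return now >= earliest_signer_switch(published_at, jwks_max_age)

published = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
max_age = timedelta(minutes=5)
print(safe_to_switch(published + timedelta(minutes=2), published, max_age))
print(safe_to_switch(published + timedelta(minutes=6), published, max_age))
```

Verifiers that refetch the JWKS when they meet an unknown kid tolerate an earlier switch, so this check is a worst-case bound.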

FAQ

How long should the grace window be? Exactly the maximum lifetime of any token the old key signed, plus a small margin for clock skew — typically the access-token TTL plus a minute or two. If refresh tokens are also signed by the same key, the window must cover the refresh-token lifetime, which is usually a reason to sign refresh tokens with a separate, longer-lived key.

Should every tenant get its own signing key, or share one? Per-tenant keys give you tenant-scoped blast radius: a leak compromises and forces rotation for only that tenant. The cost is more KMS keys and a per-tenant JWKS. A common middle ground is per-tenant-group keys for small tenants and dedicated keys for large or regulated ones, which pairs naturally with per-tenant encryption and key management.

What happens to refresh tokens during a forced rotation? A forced rotation invalidates access tokens immediately by pulling the kid, but refresh tokens may be checked against server-side state rather than only the signature. Bump the per-tenant policy version (or revoke the refresh-token family) so the next refresh is rejected and the user must re-authenticate.