Subscription & Plan Enforcement

Plan enforcement is the layer that turns a subscription record into runtime behavior: it grants or denies every API call, seat assignment, and feature toggle according to what the tenant pays for. It operates within the broader Tenant Billing & Usage Metering framework that ties metered consumption back to revenue. A plan that exists only on your pricing page and as a Stripe product is not enforcement; enforcement is the code that refuses the 5,001st row, the 11th seat, and the call to an endpoint the Starter tier never bought.

The premise is unforgiving: every limit a salesperson promises is a check an engineer must write, and every check that runs late, runs unscoped, or fails open is either a revenue leak or a customer-facing outage. The discipline here is to model entitlements once, evaluate them deterministically on the hot path, and degrade gracefully when a tenant crosses a boundary — never to scatter if (tenant.plan === 'pro') across a hundred call sites.

Prerequisites

Before wiring enforcement into request handling, confirm the surrounding plumbing exists. An entitlement check is only as trustworthy as the plan data and tenant context it reads.

[ ] A resolved tenant context on every request — a validated tenant_id bound to an async-local store, never a default fallback.
[ ] A source of truth for each tenant's current plan: a subscriptions table synced from your billing provider, not a hard-coded map.
[ ] An entitlements catalog that maps each plan to its limits and feature flags, versioned and loadable without a deploy.
[ ] A low-latency counter store for quota state — Redis 6+ or a Postgres table with atomic upserts — readable in under 5 ms.
[ ] A usage signal you trust, fed by the usage metering event pipelines that aggregate consumption per tenant.
[ ] A billing webhook listener so plan changes propagate within seconds, covered under billing sync with Stripe.
[ ] A test harness that drives requests under two plans in the same process to catch entitlement bleed.

Step-by-Step Implementation

The build proceeds in five stages: model entitlements, resolve them per tenant, check quota atomically, gate seats and features, and handle the overage boundary. Each step below is independently runnable.

Step 1 — Model entitlements as data, not code

An entitlement is the resolved answer to "what may this tenant do, and how much." Model it as a flat structure keyed by a stable identifier, so adding a plan or adjusting a limit is a data change, not a code change. Limits are numbers, features are booleans, and null means unlimited.

export interface Entitlements {
  planId: string;
  limits: Record<string, number | null>; // null = unlimited
  features: Record<string, boolean>;
}

export const PLAN_CATALOG: Record<string, Omit<Entitlements, 'planId'>> = {
  starter: {
    limits: { projects: 3, seats: 5, api_calls_per_day: 10_000 },
    features: { sso: false, audit_log: false, custom_domain: false },
  },
  pro: {
    limits: { projects: 50, seats: 25, api_calls_per_day: 1_000_000 },
    features: { sso: true, audit_log: true, custom_domain: false },
  },
  enterprise: {
    limits: { projects: null, seats: null, api_calls_per_day: null },
    features: { sso: true, audit_log: true, custom_domain: true },
  },
};

Keep the catalog identifiers identical to the metering dimensions and the billing provider's metadata. When api_calls_per_day is the limit key, the metered counter and the Stripe metadata field must use the same string — divergence here is the root of most "we billed them but never throttled them" incidents.
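One way to catch identifier drift before it reaches production is a startup or CI check over the catalog. The sketch below is an assumption-laden illustration, not part of the original design: validateCatalog and its meteredDimensions parameter are hypothetical names, and the check only confirms that every plan defines the same keys and that each metered dimension you pass in has a matching limit key.

```typescript
type PlanShape = {
  limits: Record<string, number | null>; // null = unlimited
  features: Record<string, boolean>;
};

// Returns a list of human-readable problems; an empty list means the catalog is consistent.
export function validateCatalog(
  catalog: Record<string, PlanShape>,
  meteredDimensions: string[], // hypothetical: names emitted by the metering pipeline
): string[] {
  const problems: string[] = [];
  const planIds = Object.keys(catalog);
  if (planIds.length === 0) return ['catalog is empty'];

  // Use the first plan as the reference set of keys.
  const referenceId = planIds[0];
  const referenceLimits = Object.keys(catalog[referenceId].limits).sort();
  const referenceFeatures = Object.keys(catalog[referenceId].features).sort();

  for (const planId of planIds) {
    const { limits, features } = catalog[planId];
    if (Object.keys(limits).sort().join(',') !== referenceLimits.join(',')) {
      problems.push(`${planId}: limit keys differ from ${referenceId}`);
    }
    if (Object.keys(features).sort().join(',') !== referenceFeatures.join(',')) {
      problems.push(`${planId}: feature keys differ from ${referenceId}`);
    }
    for (const [key, value] of Object.entries(limits)) {
      if (value !== null && (!Number.isInteger(value) || value < 0)) {
        problems.push(`${planId}.${key}: limit must be a non-negative integer or null`);
      }
    }
  }

  // Every metered dimension must have a limit key spelled exactly the same way.
  for (const dimension of meteredDimensions) {
    if (!referenceLimits.includes(dimension)) {
      problems.push(`${dimension}: metered dimension has no matching limit key`);
    }
  }
  return problems;
}
```

Running this against the catalog at deploy time turns a silent mismatch such as api_calls_daily versus api_calls_per_day into a failed build rather than an unthrottled tenant.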

Step 2 — Resolve a tenant's entitlements once per request

Read the tenant's current plan from the subscription record, fold it onto the catalog, and cache the result so that later checks in the same request, and later requests inside the TTL, reuse it. Resolve once; checking ten limits should not mean ten lookups.

import { tenantContext } from './tenant-middleware';
import { PLAN_CATALOG, Entitlements } from './entitlements';

const TTL_MS = 60_000;
const cache = new Map<string, { value: Entitlements; expires: number }>();

export async function resolveEntitlements(db: Db): Promise<Entitlements> {
  const ctx = tenantContext.getStore();
  if (!ctx?.tenantId) throw new Error('Entitlement check rejected: no tenant context');

  const hit = cache.get(ctx.tenantId);
  if (hit && hit.expires > Date.now()) return hit.value;

  const sub = await db.subscriptions.findActive(ctx.tenantId);
  const planId = sub?.planId ?? 'starter'; // no active subscription: lowest tier
  const plan = PLAN_CATALOG[planId];
  // A plan id missing from the catalog must not resolve to an empty entitlement object.
  if (!plan) throw new Error(`Entitlement check rejected: unknown plan "${planId}"`);
  const value: Entitlements = { planId, ...plan };

  cache.set(ctx.tenantId, { value, expires: Date.now() + TTL_MS });
  return value;
}

Bound the cache TTL tightly (60 seconds is a safe ceiling) and invalidate the key explicitly when a billing webhook reports a plan change, so a downgrade takes effect in seconds rather than a minute. Because this cache lives in process memory, the invalidation has to reach every process that may hold the key, not only the one that received the webhook. The cache must be keyed by tenantId; a cache that forgets the tenant key serves one tenant's plan to another, the same failure class as an unscoped query.
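The invalidation path can be sketched as a small cache class with an explicit invalidate method. This is a minimal illustration under stated assumptions: EntitlementCache, onPlanChanged, and the PlanChangedEvent shape are hypothetical names, and the injectable clock exists only to make expiry testable.

```typescript
interface CachedEntitlements {
  planId: string;
  limits: Record<string, number | null>;
  features: Record<string, boolean>;
}

export class EntitlementCache {
  private entries = new Map<string, { value: CachedEntitlements; expires: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number = 60_000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, used by tests
  }

  // Always keyed by tenantId, so one tenant's plan can never be served to another.
  get(tenantId: string): CachedEntitlements | undefined {
    const hit = this.entries.get(tenantId);
    if (!hit) return undefined;
    if (hit.expires <= this.now()) {
      this.entries.delete(tenantId); // drop expired entries so the map does not grow forever
      return undefined;
    }
    return hit.value;
  }

  set(tenantId: string, value: CachedEntitlements): void {
    this.entries.set(tenantId, { value, expires: this.now() + this.ttlMs });
  }

  // Returns true when an entry was actually removed.
  invalidate(tenantId: string): boolean {
    return this.entries.delete(tenantId);
  }
}

// Hypothetical event shape; a real billing webhook payload will differ.
interface PlanChangedEvent {
  tenantId: string;
  newPlanId: string;
}

// Called from the webhook listener: evict, and let the next request re-resolve from the database.
export function onPlanChanged(cache: EntitlementCache, event: PlanChangedEvent): void {
  cache.invalidate(event.tenantId);
}
```

Evicting rather than overwriting keeps the subscription row as the single authority: the next request reads the new plan from the database instead of trusting the webhook payload.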

Step 3 — Check quota atomically before the side effect

A counted limit must be reserved before the work happens, not measured after. Race conditions here are revenue: two concurrent requests that each read "4,999 used, limit 5,000" will both proceed if the check and the increment are separate operations. Use an atomic counter so the increment and the read of the new total happen in a single operation.

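from typing import Optional

import redis

r = redis.Redis()

def reserve_quota(tenant_id: str, dimension: str, limit: Optional[int], amount: int = 1) -> bool:
    """Reserve `amount` units against a daily limit; return False if it would overflow."""
    if limit is None:  # unlimited plan
        return True
    key = f"quota:{{{tenant_id}}}:{dimension}"  # renders as quota:{<tenant_id>}:<dimension>
    used = r.incrby(key, amount)  # atomic: returns the post-increment total
    if used == amount:  # first write in this window
        # INCRBY and EXPIRE are separate commands: if the process dies between them,
        # the key never expires. Combine both in a Lua script if that gap matters.
        r.expire(key, 86_400)  # daily window
    if used > limit:
        r.decrby(key, amount)  # roll back the reservation
        return False
    return True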

The hash-tag braces around tenant_id pin every dimension for one tenant to the same Redis Cluster slot, so a multi-key transaction stays on one node. The roll-back on overflow keeps the counter honest: a rejected request must not consume quota it never used. The deeper mechanics of feeding these counters from a durable stream — and surviving replays — are covered in enforcing plan limits with tenant quotas.
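For unit tests it can help to have an in-memory stand-in with the same reserve semantics, plus a release call for the case where the reservation succeeds but the work itself fails. The sketch below is an assumption, not a replacement for the Redis counter: InMemoryQuota is a hypothetical name, it uses a fixed window that starts at the first reservation, and its check-and-add is atomic only because synchronous JavaScript runs on one thread in one process.

```typescript
export class InMemoryQuota {
  private windows = new Map<string, { used: number; resetsAt: number }>();
  private windowMs: number;
  private now: () => number;

  constructor(windowMs: number = 86_400_000, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now; // injectable clock, used by tests
  }

  private key(tenantId: string, dimension: string): string {
    return `quota:{${tenantId}}:${dimension}`; // same key layout as the Redis version
  }

  // Reserve before the side effect. Returns false when the reservation would exceed the limit.
  reserve(tenantId: string, dimension: string, limit: number | null, amount: number = 1): boolean {
    if (limit === null) return true; // unlimited plan
    const key = this.key(tenantId, dimension);
    const t = this.now();
    let w = this.windows.get(key);
    if (!w || w.resetsAt <= t) {
      w = { used: 0, resetsAt: t + this.windowMs }; // start a fresh window
      this.windows.set(key, w);
    }
    if (w.used + amount > limit) return false; // nothing was reserved, so nothing to roll back
    w.used += amount;
    return true;
  }

  // Hand a reservation back when the work it guarded did not happen.
  release(tenantId: string, dimension: string, amount: number = 1): void {
    const w = this.windows.get(this.key(tenantId, dimension));
    if (w) w.used = Math.max(0, w.used - amount);
  }
}
```

A caller would reserve, attempt the work, and call release in the failure path, so a request that errored after the gate does not consume quota it never used.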

Step 4 — Gate seats and features at assignment time

Seats and features are slow-moving entitlements, so check them at the mutation that changes them — adding a user, toggling a capability — rather than on every request. A seat limit enforced only at login lets an admin over-provision and surfaces the failure to the wrong person.

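import { Entitlements, PLAN_CATALOG } from './entitlements';

export async function assignSeat(db: Db, ent: Entitlements, userId: string) {
  const cap = ent.limits.seats;
  if (cap !== null) {
    // Run the count and the activation in one transaction or under a per-tenant lock;
    // otherwise two concurrent invites can both claim the last free seat.
    const active = await db.members.countActive(); // already tenant-scoped
    if (active >= cap) {
      throw new PlanLimitError('seats', { used: active, limit: cap, planId: ent.planId });
    }
  }
  await db.members.activate(userId);
}

export function requireFeature(ent: Entitlements, feature: string) {
  if (!ent.features[feature]) {
    // Suggest the first catalog plan that includes the feature (the catalog is ordered
    // cheapest first); a fixed 'pro' would be wrong for enterprise-only features.
    const upgradeTo = Object.keys(PLAN_CATALOG).find((id) => PLAN_CATALOG[id].features[feature]);
    throw new FeatureGateError(feature, { planId: ent.planId, upgradeTo });
  }
}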

A feature flag scoped to a plan is distinct from a permission scoped to a role. Entitlements answer "did the tenant buy this," authorization answers "may this user do it," and both must pass; the role-based access control per tenant layer covers where entitlements meet permissions. An admin on the Starter plan has the role to configure SSO but not the entitlement, so the entitlement check, which runs first, refuses the request before the role check is consulted.
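The two gates can be sketched as one function that evaluates them in order and reports which one refused. Everything named here is an assumption for illustration: the Role type, the REQUIRED_ROLE map, the Decision shape, and the choice of 402 for a missing entitlement versus 403 for a missing role.

```typescript
type Role = 'member' | 'admin';

interface TenantEntitlements {
  planId: string;
  features: Record<string, boolean>;
}

type Decision =
  | { allowed: true }
  | { allowed: false; reason: 'not_entitled' | 'not_authorized'; status: 402 | 403 };

// Hypothetical mapping from feature to the role allowed to configure it.
const REQUIRED_ROLE: Record<string, Role> = {
  sso: 'admin',
  audit_log: 'admin',
  custom_domain: 'admin',
};

export function decide(ent: TenantEntitlements, role: Role, feature: string): Decision {
  // Gate 1: did the tenant buy this? Runs first, independent of who is asking.
  if (!ent.features[feature]) {
    return { allowed: false, reason: 'not_entitled', status: 402 };
  }
  // Gate 2: may this user do it? Features missing from the map default to the strictest role.
  const needed: Role = REQUIRED_ROLE[feature] ?? 'admin';
  if (needed === 'admin' && role !== 'admin') {
    return { allowed: false, reason: 'not_authorized', status: 403 };
  }
  return { allowed: true };
}
```

Returning the reason lets the client show an upgrade prompt for not_entitled and a plain permission error for not_authorized, instead of one ambiguous refusal.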

Step 5 — Decide the overage policy at the boundary

Crossing a limit is a product decision encoded as a strategy: hard-stop, throttle, or meter the overage for billing. Make it explicit per dimension so the boundary behavior is auditable rather than an accident of which exception bubbles up first.

type OveragePolicy = 'block' | 'throttle' | 'bill';

const POLICY: Record<string, OveragePolicy> = {
  seats: 'block',
  projects: 'block',
  api_calls_per_day: 'throttle',
  storage_gb: 'bill',
};

export function onLimitExceeded(dimension: string, ctx: LimitContext) {
  switch (POLICY[dimension] ?? 'block') {
    case 'block':
      throw new PlanLimitError(dimension, ctx); // 402 Payment Required
    case 'throttle':
      return { allowed: false, retryAfter: secondsUntilWindowReset(ctx) }; // 429 + Retry-After
    case 'bill':
      emitOverageEvent(dimension, ctx.amount); // record, allow, invoice later
      return { allowed: true };
  }
}

A metered-overage dimension must still emit the event idempotently or you will bill twice on a retry; route those events through the same pipeline that handles idempotent usage event ingestion. The HTTP semantics matter: a hard limit is 402 Payment Required (a status RFC 9110 reserves for future use, so document it for API clients), a rate boundary is 429 Too Many Requests with Retry-After, and a billed overage returns 200 and a quiet meter increment.
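Idempotent emission usually comes down to a key that is identical across retries of the same request. The sketch below assumes a hypothetical requestId that the client or gateway assigns once and resends on retry; overageKey and OverageLedger are illustrative names, and the in-memory ledger stands in for whatever deduplication the real ingestion pipeline performs.

```typescript
interface OverageEvent {
  idempotencyKey: string;
  tenantId: string;
  dimension: string;
  amount: number;
}

// The key is derived only from values that are identical on every retry of the same request.
export function overageKey(tenantId: string, dimension: string, requestId: string): string {
  return `overage:${tenantId}:${dimension}:${requestId}`;
}

export class OverageLedger {
  private seen = new Map<string, OverageEvent>();

  // Returns true when the event was recorded, false when it was a duplicate.
  record(event: OverageEvent): boolean {
    if (this.seen.has(event.idempotencyKey)) return false;
    this.seen.set(event.idempotencyKey, event);
    return true;
  }

  // Billable total for one tenant and dimension, counting each key once.
  total(tenantId: string, dimension: string): number {
    let sum = 0;
    for (const e of this.seen.values()) {
      if (e.tenantId === tenantId && e.dimension === dimension) sum += e.amount;
    }
    return sum;
  }
}
```

Do not derive the key from a timestamp or a freshly generated random id inside the handler: both change on retry, which defeats the deduplication.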

Choosing an Enforcement Strategy

Each limit type has a natural check point and overage policy. The table maps the decision so you are not improvising the response at the boundary.

Limit type	Check point	Storage	Typical overage policy	HTTP on breach
Rate (calls/sec, calls/day)	Per request, pre-handler	Redis atomic counter	Throttle	429 + Retry-After
Volume (projects, rows, storage)	At create/write	Postgres count or counter	Block or bill	402 / 200 metered
Seats	At member activation	Postgres count	Block	402
Feature access	At feature entry point	Cached entitlement flag	Block	403 / 402

The split between block and bill is where pricing strategy becomes code. Volume limits sold as "soft" must meter and invoice; the same dimension sold as "hard" must refuse the write. Encoding that in a policy map rather than scattered conditionals lets product change the boundary without a code review of every call site.
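The response side of the table can be encoded the same way, so the status code follows from the policy rather than from whichever exception surfaces. This is a sketch under assumptions: HttpOutcome, toHttpOutcome, and secondsUntilReset are hypothetical names, and the window reset time is passed in as a plain timestamp.

```typescript
type OveragePolicy = 'block' | 'throttle' | 'bill';

interface HttpOutcome {
  status: number;
  headers: Record<string, string>;
  proceed: boolean; // whether the handler should still run
}

// Whole seconds until the window resets, never less than 1 so clients do not retry immediately.
export function secondsUntilReset(nowMs: number, windowResetsAtMs: number): number {
  return Math.max(1, Math.ceil((windowResetsAtMs - nowMs) / 1000));
}

export function toHttpOutcome(
  policy: OveragePolicy,
  nowMs: number,
  windowResetsAtMs: number,
): HttpOutcome {
  switch (policy) {
    case 'block':
      // Hard limit: refuse and point the client at an upgrade.
      return { status: 402, headers: {}, proceed: false };
    case 'throttle':
      // Rate boundary: refuse for now and say when to come back.
      return {
        status: 429,
        headers: { 'Retry-After': String(secondsUntilReset(nowMs, windowResetsAtMs)) },
        proceed: false,
      };
    case 'bill':
      // Soft limit: let the request through; the overage is metered separately.
      return { status: 200, headers: {}, proceed: true };
  }
}
```

Because the function is pure, the policy table can be covered with a handful of unit tests and changed without touching any handler.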

How an entitlement decision flows through a request

The diagram traces a single request from ingress to a granted or refused action. The decisive moment is order: tenant context resolves first, entitlements load from cache, and the atomic quota reservation runs before any side effect — so a tenant over its limit never performs the work it cannot pay for.

Figure: Reservation precedes the side effect: a tenant over its limit is refused before any work runs, and the policy gate decides whether refusal means block, throttle, or a billed overage.
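The same ordering can be sketched as one function, with each collaborator injected so the sequence is visible and testable. All names here (handle, Deps, Req) are hypothetical, the collaborators are synchronous only to keep the sketch short, and the 401 and 403 codes for a missing tenant or an unknown dimension are assumptions rather than something the flow above prescribes.

```typescript
interface Plan {
  limits: Record<string, number | null>;
  features: Record<string, boolean>;
}
type Policy = 'block' | 'throttle' | 'bill';

interface Req {
  tenantId?: string;
  dimension: string;
}

interface Deps {
  planFor(tenantId: string): Plan; // cached entitlement lookup
  reserve(tenantId: string, dimension: string, limit: number | null): boolean; // atomic counter
  policyFor(dimension: string): Policy;
  recordOverage(tenantId: string, dimension: string): void;
  doWork(tenantId: string): void; // the side effect being protected
}

// Returns the HTTP status the request would receive.
export function handle(req: Req, deps: Deps): number {
  // 1. Tenant context first: no tenant, no decision.
  if (!req.tenantId) return 401;

  // 2. Entitlements for that tenant.
  const plan = deps.planFor(req.tenantId);
  const limit = plan.limits[req.dimension];
  if (limit === undefined) return 403; // dimension absent from the plan: refuse rather than guess

  // 3. Reserve quota before any work happens.
  if (!deps.reserve(req.tenantId, req.dimension, limit)) {
    // 4. Over the limit: the policy gate decides what refusal means.
    const policy = deps.policyFor(req.dimension);
    if (policy === 'block') return 402;
    if (policy === 'throttle') return 429;
    deps.recordOverage(req.tenantId, req.dimension); // 'bill': allow and meter
  }

  // 5. Only now perform the side effect.
  deps.doWork(req.tenantId);
  return 200;
}
```

The early returns are the point: every refusal path exits before doWork, so no branch can perform work for a tenant that failed an earlier gate.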

Dynamic Query Scoping & Connection Handling

Enforcement reads and writes counters on the hot path, so where that state lives determines both correctness and latency. Daily and per-second rate counters belong in Redis: an atomic INCRBY with a TTL-bounded window is a single round trip, and the hash-tag pin keeps a tenant's dimensions co-located for transactional roll-back. Volume counts — projects, members, rows — are better derived from the authoritative table with a tenant-scoped COUNT, because a Redis counter that drifts from the source of truth will eventually let a tenant exceed a hard limit or block one wrongly.

The query that backs a volume check must itself be tenant-scoped, or enforcement becomes its own leak: a SELECT count(*) FROM projects without a tenant predicate counts every tenant's rows and refuses everyone once any tenant fills up. Push the predicate into the data layer rather than the handler — the same discipline the tenant-aware data routing and query scoping layer enforces for every read. Cache the resolved entitlement object per request and the count is the only live query enforcement adds.
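Pushing the predicate into the data layer can be as simple as exposing no unscoped read at all. The sketch below uses an in-memory array in place of a table, and ProjectStore, forTenant, and canCreateProject are hypothetical names; the SQL equivalent of the scoped count would carry a WHERE tenant_id = $1 predicate.

```typescript
interface ProjectRow {
  tenantId: string;
  name: string;
}

export class ProjectStore {
  private rows: ProjectRow[] = [];

  insert(row: ProjectRow): void {
    this.rows.push(row);
  }

  // The only read path is a tenant-bound view; there is deliberately no unscoped count().
  forTenant(tenantId: string): { count: () => number } {
    if (!tenantId) throw new Error('tenant scope required');
    const rows = this.rows;
    return {
      count: (): number => rows.filter((r) => r.tenantId === tenantId).length,
    };
  }
}

// Volume check backed by the authoritative rows, not by a separate counter.
export function canCreateProject(
  store: ProjectStore,
  tenantId: string,
  limit: number | null,
): boolean {
  if (limit === null) return true; // unlimited plan
  return store.forTenant(tenantId).count() < limit;
}
```

With this shape a handler cannot forget the predicate, because the API gives it no way to ask for a count without naming a tenant.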

State	Store	Read latency	Authority	Drift risk
Rate counters	Redis (atomic, TTL window)	< 2 ms	Self	Resets on window expiry
Volume counts	Postgres tenant-scoped COUNT	3–10 ms	Source table	None
Entitlement object	In-process cache, 60s TTL	< 0.1 ms	Subscription row	Stale until invalidation
Seat count	Postgres members count	3–8 ms	Source table	None

Connection handling follows the metering load. Rate enforcement should never touch the primary database; if a per-request check fans out to Postgres under load, the enforcement layer becomes the bottleneck it was meant to protect against. Reserve database round trips for the slow-moving volume and seat checks, and keep the high-frequency rate path entirely in the counter store.

Security Enforcement & Access Control

Plan enforcement and authorization are independent controls that must both pass, and conflating them is a security bug in either direction. Granting a feature because the user's role permits it, while the tenant never bought the feature, leaks paid capability. Granting it because the tenant's plan includes the entitlement, while never checking the role, lets any tenant member trigger an action only an admin should. The two checks are orthogonal and run in sequence.

The first layer is tenant context: enforcement that reads the wrong tenant's plan is worse than no enforcement. The second is the entitlement check: did this tenant pay for this. The third is authorization: may this user, given their role, perform it. When the plan is downgraded, invalidate both the entitlement cache and any session state that cached the old plan (see session isolation and state management), or a downgraded tenant keeps premium access until a token or cache expires.

Layer	Mechanism	Enforces	Failure mode if absent
Tenant context	Resolved + validated `tenant_id`	Whose plan applies	Wrong tenant's limits enforced
Entitlement check	Plan catalog lookup	Did the tenant buy it	Paid feature given away
Authorization	Role / permission check	May this user act	Privilege escalation in tenant
Quota reservation	Atomic counter pre-side-effect	How much is left	Revenue leak under concurrency

Order is load-bearing. The entitlement check must run before the expensive work and before any cache write, so a refused request never warms a cache or performs a partial side effect. The quota reservation must precede the action, not follow it, so concurrent requests cannot both pass a stale read.

Operational Overhead & Scaling Metrics

Enforcement adds measurable cost on the hot path, and its failure signatures are specific to revenue and availability. Track these signals and act on the thresholds before they become incidents or invoices.

Metric	Healthy threshold	Mitigation when breached
Quota check latency	< 5 ms p99	Move rate checks off Postgres into Redis
Entitlement cache hit rate	> 95%	Warm cache on plan change; raise the TTL only if webhook invalidation is in place
Counter / source drift	< 0.1% of checks	Reconcile Redis counters against source nightly
Stale-plan window after downgrade	< 5 s	Invalidate cache on webhook, not on TTL alone
Fail-open grants (requests allowed because a check errored)	0 / min	Audit error handling; a store outage must fail closed
Overage events billed	matches metered usage	Dedup on idempotency key in the pipeline

The single most expensive mistake is failing open when the counter store is unreachable. A try/catch around the quota check that returns "allow" on a Redis timeout converts an outage into uncapped consumption — the busiest tenants get unlimited free usage exactly when you can least afford it. Fail closed on the dimensions where overage is revenue (seats, hard volume) and degrade to a conservative cached limit on rate dimensions, but never default the answer to "yes" because the check broke.
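One way to make the failure behavior explicit is a wrapper that owns the catch block and looks up a per-dimension failure mode. The sketch below is illustrative only: guardedReserve, FAILURE_MODE, and makeLocalBudget are hypothetical names, and the local budget is a crude per-process allowance standing in for a conservative cached limit.

```typescript
type FailureMode = 'closed' | 'degrade';

// Hypothetical map: revenue-bearing dimensions fail closed, rate dimensions degrade.
const FAILURE_MODE: Record<string, FailureMode> = {
  seats: 'closed',
  projects: 'closed',
  api_calls_per_day: 'degrade',
};

interface GuardResult {
  allowed: boolean;
  degraded: boolean; // true when the primary check could not complete
}

// A small per-process allowance used only while the counter store is unreachable.
export function makeLocalBudget(maxCalls: number): () => boolean {
  let remaining = maxCalls;
  return () => {
    if (remaining <= 0) return false;
    remaining -= 1;
    return true;
  };
}

export function guardedReserve(
  dimension: string,
  reserve: () => boolean, // primary check; may throw on a store outage
  fallback: () => boolean, // conservative local check for degrade mode
): GuardResult {
  try {
    return { allowed: reserve(), degraded: false };
  } catch {
    // Unknown dimensions fail closed: the answer is never "yes" just because the check broke.
    const mode: FailureMode = FAILURE_MODE[dimension] ?? 'closed';
    if (mode === 'degrade') {
      try {
        return { allowed: fallback(), degraded: true };
      } catch {
        return { allowed: false, degraded: true };
      }
    }
    return { allowed: false, degraded: true };
  }
}
```

Surfacing degraded in the result gives the caller something to count, which is what makes a fail-open rate of zero measurable.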

Pitfalls & Anti-Patterns

Checking the limit after the side effect. Reading the counter, doing the work, then incrementing leaves a race window where concurrent requests both pass a stale read. Two simultaneous writes against a limit of one both succeed. Reserve atomically before the side effect and roll back on failure; the reservation, not a later measurement, is the gate.

Failing open on a store outage. Wrapping the quota check so a Redis or database timeout returns "allow" turns every counter-store incident into unlimited free consumption for your highest-volume tenants. Fail closed on revenue-bearing limits; degrade to a conservative cached value on rate limits; never silently grant.

Hard-coding plan logic at call sites. Scattering if (plan === 'pro') across handlers means every pricing change is a code change touching dozens of files, and a missed site is a leak. Resolve entitlements once into a data object and check that object; pricing changes become catalog edits.

Conflating entitlements with permissions. Treating "the tenant bought SSO" and "this user may configure SSO" as one check leaks paid features to the unauthorized or blocks the authorized. They are independent gates; both must pass, in sequence, entitlement first.

Leaving the plan cache stale after a downgrade. Caching the entitlement object without invalidating it on a billing webhook lets a downgraded or cancelled tenant keep premium access until the TTL expires. Invalidate the cache key on the plan-change event, and treat TTL as a backstop, not the primary mechanism.

Frequently Asked Questions

Should plan enforcement fail open or fail closed when the counter store is down? Fail closed on revenue-bearing limits — seats, hard volume caps — because failing open hands unlimited free usage to your busiest tenants during the outage. For rate limits you may degrade to a conservative cached limit so availability survives, but the default answer when a check cannot complete must never be an unconditional grant.

Where do entitlements end and permissions begin? Entitlements answer "did this tenant pay for the capability"; permissions answer "may this specific user, given their role, use it." They are independent and both must pass. A Starter-plan admin has the role to configure SSO but not the entitlement, so the action is refused at the entitlement gate before the role check even matters.

How do I prevent a tenant from exceeding a limit under concurrent requests? Use an atomic reserve-then-check, not a separate read and write. An atomic INCRBY returns the post-increment value in one operation; if it exceeds the limit, roll the increment back and refuse. Separating the read from the write opens a race where two requests both see headroom that only one of them has.

How fast must a plan downgrade take effect? Within seconds, driven by the billing webhook rather than the cache TTL. Cache the entitlement object with a short TTL as a backstop, but invalidate the tenant's cache key explicitly when the plan-change event arrives. Relying on TTL alone leaves a downgraded tenant with premium access for the full cache window.