Subscription & Plan Enforcement
Plan enforcement is the layer that turns a subscription record into runtime behavior — granting or denying every API call, seat assignment, and feature toggle according to what the tenant pays for — and it operates within the broader Tenant Billing & Usage Metering framework that ties metered consumption back to revenue. A plan that exists only in your pricing page and a Stripe product is not enforcement; enforcement is the code that refuses the 5,001st row, the 11th seat, and the call to an endpoint the Starter tier never bought.
The premise is unforgiving: every limit a salesperson promises is a check an engineer must write, and every check that runs late, runs unscoped, or fails open is either a revenue leak or a customer-facing outage. The discipline here is to model entitlements once, evaluate them deterministically on the hot path, and degrade gracefully when a tenant crosses a boundary — never to scatter if (tenant.plan === 'pro') across a hundred call sites.
Prerequisites
Before wiring enforcement into request handling, confirm the surrounding plumbing exists. An entitlement check is only as trustworthy as the plan data and tenant context it reads.
- [ ] A resolved tenant context on every request: a validated tenant_id bound to an async-local store, never a default fallback.
- [ ] A source of truth for each tenant's current plan: a subscriptions table synced from your billing provider, not a hard-coded map.
- [ ] An entitlements catalog that maps each plan to its limits and feature flags, versioned and loadable without a deploy.
- [ ] A low-latency counter store for quota state — Redis 6+ or a Postgres table with atomic upserts — readable in under 5 ms.
- [ ] A usage signal you trust, fed by the usage metering event pipelines that aggregate consumption per tenant.
- [ ] A billing webhook listener so plan changes propagate within seconds, covered under billing sync with Stripe.
- [ ] A test harness that drives requests under two plans in the same process to catch entitlement bleed.
Step-by-Step Implementation
The build proceeds in five stages: model entitlements, resolve them per tenant, check quota atomically, gate seats and features, and handle the overage boundary. Each step below is independently runnable.
Step 1 — Model entitlements as data, not code
An entitlement is the resolved answer to "what may this tenant do, and how much." Model it as a flat structure keyed by a stable identifier, so adding a plan or adjusting a limit is a data change, not a code change. Limits are numbers, features are booleans, and null means unlimited.
export interface Entitlements {
  planId: string;
  limits: Record<string, number | null>; // null = unlimited
  features: Record<string, boolean>;
}

export const PLAN_CATALOG: Record<string, Omit<Entitlements, 'planId'>> = {
  starter: {
    limits: { projects: 3, seats: 5, api_calls_per_day: 10_000 },
    features: { sso: false, audit_log: false, custom_domain: false },
  },
  pro: {
    limits: { projects: 50, seats: 25, api_calls_per_day: 1_000_000 },
    features: { sso: true, audit_log: true, custom_domain: false },
  },
  enterprise: {
    limits: { projects: null, seats: null, api_calls_per_day: null },
    features: { sso: true, audit_log: true, custom_domain: true },
  },
};
Keep the catalog identifiers identical to the metering dimensions and the billing provider's metadata. When api_calls_per_day is the limit key, the metered counter and the Stripe metadata field must use the same string — divergence here is the root of most "we billed them but never throttled them" incidents.
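One way to catch identifier drift before it reaches production is a startup check that compares every limit key in the catalog against the dimension names the metering side emits. The sketch below is illustrative only: METERED_DIMENSIONS, findUnknownLimitKeys, and the trimmed sample catalog are hypothetical names, not part of the code above.

```typescript
// Hypothetical list of dimension names the metering pipeline emits.
const METERED_DIMENSIONS = new Set(['projects', 'seats', 'api_calls_per_day']);

type Catalog = Record<string, { limits: Record<string, number | null> }>;

// Returns "plan.key" for every limit key the metering side does not know.
function findUnknownLimitKeys(catalog: Catalog, known: Set<string>): string[] {
  const unknown: string[] = [];
  for (const [planId, plan] of Object.entries(catalog)) {
    for (const key of Object.keys(plan.limits)) {
      if (!known.has(key)) unknown.push(`${planId}.${key}`);
    }
  }
  return unknown;
}

// A camelCase typo in one plan is reported instead of silently never throttling.
const sampleCatalog: Catalog = {
  starter: { limits: { projects: 3, seats: 5, api_calls_per_day: 10_000 } },
  pro: { limits: { projects: 50, seats: 25, apiCallsPerDay: 1_000_000 } },
};
const drift = findUnknownLimitKeys(sampleCatalog, METERED_DIMENSIONS);
```

Running a check like this at boot, and refusing to start when the result is non-empty, turns a silent billing mismatch into a failed deploy.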
Step 2 — Resolve a tenant's entitlements once per request
Read the tenant's current plan from the subscription record, fold it onto the catalog, and cache the resolved object in process under a short TTL, so repeated checks within a request (and across requests inside the TTL) reuse it. Resolve once; checking ten limits should not mean ten lookups.
import { tenantContext } from './tenant-middleware';
import { PLAN_CATALOG, Entitlements } from './entitlements';
import type { Db } from './db'; // hypothetical path for the data-layer type

// Keyed by tenantId so one tenant's plan is never served to another.
const cache = new Map<string, { value: Entitlements; expires: number }>();

export async function resolveEntitlements(db: Db): Promise<Entitlements> {
  const ctx = tenantContext.getStore();
  if (!ctx?.tenantId) throw new Error('Entitlement check rejected: no tenant context');

  const hit = cache.get(ctx.tenantId);
  if (hit && hit.expires > Date.now()) return hit.value;

  const sub = await db.subscriptions.findActive(ctx.tenantId);
  // Fall back to the most restrictive plan when there is no active
  // subscription or the stored planId is missing from the catalog.
  const planId = sub?.planId && Object.hasOwn(PLAN_CATALOG, sub.planId) ? sub.planId : 'starter';
  const value: Entitlements = { planId, ...PLAN_CATALOG[planId] };
  cache.set(ctx.tenantId, { value, expires: Date.now() + 60_000 });
  return value;
}
Bound the cache TTL tightly — 60 seconds is a safe ceiling — and invalidate the key explicitly when a billing webhook reports a plan change, so a downgrade takes effect in seconds rather than a minute. The cache must be keyed by tenantId; a cache that forgets the tenant key serves one tenant's plan to another, the same failure class as an unscoped query.
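The invalidation path can be sketched as a small cache with an explicit invalidate method that the billing webhook handler would call. This is a minimal illustration under assumptions: EntitlementCache and its injected clock are hypothetical, and it stores only a plan id rather than the full entitlement object.

```typescript
interface CachedPlan {
  planId: string;
  expires: number;
}

class EntitlementCache {
  private entries = new Map<string, CachedPlan>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(tenantId: string): string | undefined {
    const hit = this.entries.get(tenantId);
    if (!hit || hit.expires <= this.now()) return undefined;
    return hit.planId;
  }

  set(tenantId: string, planId: string): void {
    this.entries.set(tenantId, { planId, expires: this.now() + this.ttlMs });
  }

  // Intended to be called from the webhook handler on a plan-change event.
  invalidate(tenantId: string): void {
    this.entries.delete(tenantId);
  }
}

let clock = 0;
const planCache = new EntitlementCache(60_000, () => clock);

planCache.set('t1', 'pro');
const beforeWebhook = planCache.get('t1'); // cached plan is served
planCache.invalidate('t1');                // plan-change webhook arrives
const afterWebhook = planCache.get('t1');  // miss: next read hits the database

planCache.set('t2', 'starter');
clock = 60_001;                            // TTL backstop elapses
const afterTtl = planCache.get('t2');
```

Note that an in-process map only clears the instance that received the webhook. With several application instances, the invalidation would need to be broadcast (for example over a pub/sub channel) or the TTL becomes the effective bound again.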
Step 3 — Check quota atomically before the side effect
A counted limit must be reserved before the work happens, not measured after. Race conditions here are revenue: two concurrent requests that each read "4,999 used, limit 5,000" will both proceed if the check and the increment are separate operations. Use an atomic counter so the read-and-reserve is a single operation.
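from typing import Optional

import redis

r = redis.Redis()

WINDOW_SECONDS = 86_400  # daily window


def reserve_quota(tenant_id: str, dimension: str, limit: Optional[int], amount: int = 1) -> bool:
    """Reserve `amount` units atomically; return False if the limit would be exceeded."""
    if limit is None:  # unlimited plan
        return True
    # Doubled braces emit literal braces: a Redis Cluster hash tag around the tenant id.
    key = f"quota:{{{tenant_id}}}:{dimension}"
    used = r.incrby(key, amount)  # atomic: returns the post-increment value
    if used == amount:  # first write in this window
        r.expire(key, WINDOW_SECONDS)
    if used > limit:
        r.decrby(key, amount)  # roll back the reservation
        return False
    return True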
The hash-tag braces around tenant_id pin every dimension for one tenant to the same Redis Cluster slot, so a multi-key transaction stays on one node. The roll-back on overflow keeps the counter honest: a rejected request must not consume quota it never used. The deeper mechanics of feeding these counters from a durable stream — and surviving replays — are covered in enforcing plan limits with tenant quotas.
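For services written in TypeScript, the same reserve-then-roll-back logic can be sketched against a minimal counter interface. Everything here is an assumption for illustration: AtomicCounter, InMemoryCounter, quotaKey, and reserveQuota are hypothetical names, and the interface is synchronous for brevity, whereas a real Redis client would return promises for the same operations.

```typescript
// Minimal counter interface; each call is assumed to be atomic in the store.
interface AtomicCounter {
  incrBy(key: string, amount: number): number;
  decrBy(key: string, amount: number): number;
}

// In-memory stand-in for the counter store, suitable only for tests.
class InMemoryCounter implements AtomicCounter {
  private values = new Map<string, number>();

  incrBy(key: string, amount: number): number {
    const next = (this.values.get(key) ?? 0) + amount;
    this.values.set(key, next);
    return next;
  }

  decrBy(key: string, amount: number): number {
    return this.incrBy(key, -amount);
  }
}

// Braces form a Redis Cluster hash tag, so a tenant's keys share one slot.
function quotaKey(tenantId: string, dimension: string): string {
  return `quota:{${tenantId}}:${dimension}`;
}

function reserveQuota(
  store: AtomicCounter,
  tenantId: string,
  dimension: string,
  limit: number | null,
  amount = 1,
): boolean {
  if (limit === null) return true; // unlimited plan
  const key = quotaKey(tenantId, dimension);
  const used = store.incrBy(key, amount); // reserve first
  if (used > limit) {
    store.decrBy(key, amount); // rejected requests must not consume quota
    return false;
  }
  return true;
}

const counterStore = new InMemoryCounter();
// Four reservations against a limit of three: the fourth is refused.
const reservations = [1, 2, 3, 4].map(() => reserveQuota(counterStore, 'acme', 'projects', 3));
const usedAfter = counterStore.incrBy(quotaKey('acme', 'projects'), 0);
```

The counter ends at the limit rather than above it, which is the property the roll-back exists to preserve.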
Step 4 — Gate seats and features at assignment time
Seats and features are slow-moving entitlements, so check them at the mutation that changes them — adding a user, toggling a capability — rather than on every request. A seat limit enforced only at login lets an admin over-provision and surfaces the failure to the wrong person.
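export async function assignSeat(db: Db, ent: Entitlements, userId: string) {
  const cap = ent.limits.seats;
  if (cap !== null) {
    // Run the count and the activation in one transaction (or under a lock);
    // otherwise two concurrent invites can both pass the check.
    const active = await db.members.countActive(); // already tenant-scoped
    if (active >= cap) {
      throw new PlanLimitError('seats', { used: active, limit: cap, planId: ent.planId });
    }
  }
  await db.members.activate(userId);
}

export function requireFeature(ent: Entitlements, feature: string) {
  if (!ent.features[feature]) {
    // upgradeTo is hard-coded for illustration; derive it from the catalog,
    // since custom_domain, for example, is only available on enterprise.
    throw new FeatureGateError(feature, { planId: ent.planId, upgradeTo: 'pro' });
  }
}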
A feature flag scoped to a plan is distinct from a permission scoped to a role. Entitlements answer "did the tenant buy this," authorization answers "may this user do it," and both must pass — a question the role-based access control per tenant layer resolves where entitlements meet permissions. An admin on the Starter plan has the role to configure SSO but not the entitlement; both checks fire, the entitlement check first.
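The two-gate sequence can be made explicit in a single decision function that reports which gate refused. This is a sketch under assumptions: canUseFeature, the Decision type, and the 'sso:configure' permission name are hypothetical and stand in for whatever the RBAC layer actually exposes.

```typescript
type Decision =
  | { allowed: true }
  | { allowed: false; reason: 'not_entitled' | 'not_authorized' };

// Entitlement gate first (did the tenant buy it), then authorization
// (may this user act). Both must pass.
function canUseFeature(
  tenantFeatures: Record<string, boolean>,
  userPermissions: Set<string>,
  feature: string,
  permission: string,
): Decision {
  if (!tenantFeatures[feature]) return { allowed: false, reason: 'not_entitled' };
  if (!userPermissions.has(permission)) return { allowed: false, reason: 'not_authorized' };
  return { allowed: true };
}

const starterFeatures = { sso: false };
const proFeatures = { sso: true };
const adminPerms = new Set(['sso:configure']); // hypothetical permission name
const memberPerms = new Set<string>();

// Starter admin: has the role, lacks the entitlement.
const starterAdmin = canUseFeature(starterFeatures, adminPerms, 'sso', 'sso:configure');
// Pro member: tenant bought SSO, user may not configure it.
const proMember = canUseFeature(proFeatures, memberPerms, 'sso', 'sso:configure');
// Pro admin: both gates pass.
const proAdmin = canUseFeature(proFeatures, adminPerms, 'sso', 'sso:configure');
```

Returning the refusing gate as data lets the caller show an upgrade prompt for not_entitled and a plain permission error for not_authorized.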
Step 5 — Decide the overage policy at the boundary
Crossing a limit is a product decision encoded as a strategy: hard-stop, throttle, or meter the overage for billing. Make it explicit per dimension so the boundary behavior is auditable rather than an accident of which exception bubbles up first.
type OveragePolicy = 'block' | 'throttle' | 'bill';

const POLICY: Record<string, OveragePolicy> = {
  seats: 'block',
  projects: 'block',
  api_calls_per_day: 'throttle',
  storage_gb: 'bill',
};

export function onLimitExceeded(dimension: string, ctx: LimitContext) {
  switch (POLICY[dimension] ?? 'block') {
    case 'block':
      throw new PlanLimitError(dimension, ctx); // 402 Payment Required
    case 'throttle':
      return { retryAfter: secondsUntilWindowReset(ctx) }; // 429 + Retry-After
    case 'bill':
      emitOverageEvent(dimension, ctx.amount); // record, allow, invoice later
      return { allowed: true };
  }
}
A metered-overage dimension must still emit the event idempotently or you will bill twice on a retry; route those events through the same pipeline that handles idempotent usage event ingestion. The HTTP semantics matter: a hard limit is 402 Payment Required, a rate boundary is 429 Too Many Requests with Retry-After, and a billed overage returns 200 and a quiet meter increment.
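The status-code mapping described above might be centralized in one function so handlers never pick a code ad hoc. The Breach and HttpOutcome shapes and the toHttpOutcome name are hypothetical; this is a sketch of the mapping, not a framework integration.

```typescript
type Breach =
  | { policy: 'block'; dimension: string }
  | { policy: 'throttle'; dimension: string; retryAfterSeconds: number }
  | { policy: 'bill'; dimension: string };

interface HttpOutcome {
  status: number;
  headers: Record<string, string>;
  proceed: boolean; // whether the handler should still perform the work
}

function toHttpOutcome(breach: Breach): HttpOutcome {
  switch (breach.policy) {
    case 'block':
      // Hard limit: refuse and signal that payment or an upgrade is needed.
      return { status: 402, headers: {}, proceed: false };
    case 'throttle':
      // Rate boundary: Retry-After carries whole seconds, rounded up.
      return {
        status: 429,
        headers: { 'Retry-After': String(Math.ceil(breach.retryAfterSeconds)) },
        proceed: false,
      };
    case 'bill':
      // Billed overage: the request succeeds; the meter records it separately.
      return { status: 200, headers: {}, proceed: true };
  }
}

const blocked = toHttpOutcome({ policy: 'block', dimension: 'seats' });
const throttled = toHttpOutcome({ policy: 'throttle', dimension: 'api_calls_per_day', retryAfterSeconds: 41.2 });
const billed = toHttpOutcome({ policy: 'bill', dimension: 'storage_gb' });
```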
Choosing an Enforcement Strategy
Each limit type has a natural check point and overage policy. The table maps the decision so you are not improvising the response at the boundary.
| Limit type | Check point | Storage | Typical overage policy | HTTP on breach |
|---|---|---|---|---|
| Rate (calls/sec, calls/day) | Per request, pre-handler | Redis atomic counter | Throttle | 429 + Retry-After |
| Volume (projects, rows, storage) | At create/write | Postgres count or counter | Block or bill | 402 / 200 metered |
| Seats | At member activation | Postgres count | Block | 402 |
| Feature access | At feature entry point | Cached entitlement flag | Block | 403 / 402 |
The split between block and bill is where pricing strategy becomes code. Volume limits sold as "soft" must meter and invoice; the same dimension sold as "hard" must refuse the write. Encoding that in a policy map rather than scattered conditionals lets product change the boundary without a code review of every call site.
How an entitlement decision flows through a request
The diagram traces a single request from ingress to a granted or refused action. The decisive moment is order: tenant context resolves first, entitlements load from cache, and the atomic quota reservation runs before any side effect — so a tenant over its limit never performs the work it cannot pay for.
Dynamic Query Scoping & Connection Handling
Enforcement reads and writes counters on the hot path, so where that state lives determines both correctness and latency. Daily and per-second rate counters belong in Redis: an atomic INCRBY with a TTL-bounded window is a single round trip, and the hash-tag pin keeps a tenant's dimensions co-located for transactional roll-back. Volume counts — projects, members, rows — are better derived from the authoritative table with a tenant-scoped COUNT, because a Redis counter that drifts from the source of truth will eventually let a tenant exceed a hard limit or block one wrongly.
The query that backs a volume check must itself be tenant-scoped, or enforcement becomes its own leak: a SELECT count(*) FROM projects without a tenant predicate counts every tenant's rows and refuses everyone once any tenant fills up. Push the predicate into the data layer rather than the handler — the same discipline the tenant-aware data routing and query scoping layer enforces for every read. Cache the resolved entitlement object per request and the count is the only live query enforcement adds.
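A data-layer helper can make the unscoped count impossible to write by accident: it refuses to build the query without a tenant id. The sketch below assumes a tenant_id column on each table and $1-style placeholders as used by common Postgres drivers; tenantScopedCount, SqlQuery, and COUNTABLE_TABLES are hypothetical names.

```typescript
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Allow-list of tables, because the table name is interpolated into the SQL.
const COUNTABLE_TABLES = new Set(['projects', 'members']);

function tenantScopedCount(table: string, tenantId: string): SqlQuery {
  if (!COUNTABLE_TABLES.has(table)) {
    throw new Error(`Refusing count on unknown table: ${table}`);
  }
  if (!tenantId) {
    throw new Error('Refusing unscoped count: no tenant id');
  }
  // The tenant predicate is part of the query itself, not left to the handler.
  return {
    text: `SELECT count(*) FROM ${table} WHERE tenant_id = $1`,
    values: [tenantId],
  };
}

const projectCountQuery = tenantScopedCount('projects', 'acme');
```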
| State | Store | Read latency | Authority | Drift risk |
|---|---|---|---|---|
| Rate counters | Redis (atomic, TTL window) | < 2 ms | Self | Resets on window expiry |
| Volume counts | Postgres tenant-scoped COUNT | 3–10 ms | Source table | None |
| Entitlement object | In-process cache, 60s TTL | < 0.1 ms | Subscription row | Stale until invalidation |
| Seat count | Postgres members count | 3–8 ms | Source table | None |
Connection handling follows the metering load. Rate enforcement should never touch the primary database; if a per-request check fans out to Postgres under load, the enforcement layer becomes the bottleneck it was meant to protect against. Reserve database round trips for the slow-moving volume and seat checks, and keep the high-frequency rate path entirely in the counter store.
Security Enforcement & Access Control
Plan enforcement and authorization are independent controls that must both pass, and conflating them is a security bug in either direction. Granting a feature because the user's role permits it — while the tenant never bought the feature — leaks paid capability. Refusing because the plan lacks an entitlement while never checking the role lets any tenant member trigger an action only an admin should. The two checks are orthogonal and run in sequence.
The first layer is tenant context: enforcement that reads the wrong tenant's plan is worse than no enforcement. The second is the entitlement check: did this tenant pay for this? The third is authorization: may this user, given their role, perform it? When the plan is downgraded, invalidate both the entitlement cache and any session state that cached the old plan (see session isolation and state management), or a downgraded tenant keeps premium access until a token or cache entry expires.
| Layer | Mechanism | Enforces | Failure mode if absent |
|---|---|---|---|
| Tenant context | Resolved + validated tenant_id | Whose plan applies | Wrong tenant's limits enforced |
| Entitlement check | Plan catalog lookup | Did the tenant buy it | Paid feature given away |
| Authorization | Role / permission check | May this user act | Privilege escalation in tenant |
| Quota reservation | Atomic counter pre-side-effect | How much is left | Revenue leak under concurrency |
Order is load-bearing. The entitlement check must run before the expensive work and before any cache write, so a refused request never warms a cache or performs a partial side effect. The quota reservation must precede the action, not follow it, so concurrent requests cannot both pass a stale read.
Operational Overhead & Scaling Metrics
Enforcement adds measurable cost on the hot path, and its failure signatures are specific to revenue and availability. Track these signals and act on the thresholds before they become incidents or invoices.
| Metric | Healthy threshold | Mitigation when exceeded |
|---|---|---|
| Quota check latency | < 5 ms p99 | Move rate checks off Postgres into Redis |
| Entitlement cache hit rate | > 95% | Raise TTL ceiling; warm cache on plan change |
| Counter / source drift | < 0.1% of checks | Reconcile Redis counters against source nightly |
| Stale-plan window after downgrade | < 5 s | Invalidate cache on webhook, not on TTL alone |
| Fail-open grants (requests allowed because the check errored) | 0 / min | Audit error handling; a store outage must fail closed |
| Overage events billed | matches metered usage | Dedup on idempotency key in the pipeline |
The single most expensive mistake is failing open when the counter store is unreachable. A try/catch around the quota check that returns "allow" on a Redis timeout converts an outage into uncapped consumption — the busiest tenants get unlimited free usage exactly when you can least afford it. Fail closed on the dimensions where overage is revenue (seats, hard volume) and degrade to a conservative cached limit on rate dimensions, but never default the answer to "yes" because the check broke.
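A per-dimension failure mode keeps the error path from defaulting to "yes". The wrapper below is a sketch under assumptions: FAILURE_MODE, checkWithFallback, and the conservativeCheck callback (for example, a reduced in-process limit) are hypothetical, and the checks are synchronous for brevity where a real counter-store call would be awaited.

```typescript
type FailureMode = 'closed' | 'conservative';

// Hypothetical mapping; unknown dimensions fall back to 'closed'.
const FAILURE_MODE: Record<string, FailureMode> = {
  seats: 'closed',
  projects: 'closed',
  api_calls_per_day: 'conservative',
};

function checkWithFallback(
  dimension: string,
  check: () => boolean,             // primary check against the counter store
  conservativeCheck: () => boolean, // degraded check, e.g. a reduced local limit
): boolean {
  try {
    return check();
  } catch {
    // The store is unreachable: never answer "allow" just because the check broke.
    if ((FAILURE_MODE[dimension] ?? 'closed') === 'conservative') {
      return conservativeCheck();
    }
    return false;
  }
}

const storeDown = (): boolean => {
  throw new Error('counter store timeout');
};

const seatsDuringOutage = checkWithFallback('seats', storeDown, () => true);
const rateDuringOutage = checkWithFallback('api_calls_per_day', storeDown, () => true);
const unknownDuringOutage = checkWithFallback('exports', storeDown, () => true);
```

Here the seat check and the unmapped dimension are refused during the outage, while the rate dimension defers to the degraded check.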
Pitfalls & Anti-Patterns
Checking the limit after the side effect. Reading the counter, doing the work, then incrementing leaves a race window where concurrent requests both pass a stale read. Two simultaneous writes against a limit of one both succeed. Reserve atomically before the side effect and roll back on failure; the reservation, not a later measurement, is the gate.
Failing open on a store outage. Wrapping the quota check so a Redis or database timeout returns "allow" turns every counter-store incident into unlimited free consumption for your highest-volume tenants. Fail closed on revenue-bearing limits; degrade to a conservative cached value on rate limits; never silently grant.
Hard-coding plan logic at call sites. Scattering if (plan === 'pro') across handlers means every pricing change is a code change touching dozens of files, and a missed site is a leak. Resolve entitlements once into a data object and check that object; pricing changes become catalog edits.
Conflating entitlements with permissions. Treating "the tenant bought SSO" and "this user may configure SSO" as one check leaks paid features to the unauthorized or blocks the authorized. They are independent gates; both must pass, in sequence, entitlement first.
Leaving the plan cache stale after a downgrade. Caching the entitlement object without invalidating it on a billing webhook lets a downgraded or cancelled tenant keep premium access until the TTL expires. Invalidate the cache key on the plan-change event, and treat TTL as a backstop, not the primary mechanism.
Frequently Asked Questions
Should plan enforcement fail open or fail closed when the counter store is down? Fail closed on revenue-bearing limits — seats, hard volume caps — because failing open hands unlimited free usage to your busiest tenants during the outage. For rate limits you may degrade to a conservative cached limit so availability survives, but the default answer when a check cannot complete must never be an unconditional grant.
Where do entitlements end and permissions begin? Entitlements answer "did this tenant pay for the capability"; permissions answer "may this specific user, given their role, use it." They are independent and both must pass. A Starter-plan admin has the role to configure SSO but not the entitlement, so the action is refused at the entitlement gate before the role check even matters.
How do I prevent a tenant from exceeding a limit under concurrent requests? Use an atomic reserve-then-check, not a separate read and write. An atomic INCRBY returns the post-increment value in one operation; if it exceeds the limit, roll the increment back and refuse. Separating the read from the write opens a race where two requests both see headroom that only one of them has.
How fast must a plan downgrade take effect? Within seconds, driven by the billing webhook rather than the cache TTL. Cache the entitlement object with a short TTL as a backstop, but invalidate the tenant's cache key explicitly when the plan-change event arrives. Relying on TTL alone leaves a downgraded tenant with premium access for the full cache window.