Tenant-Partitioned Time-Series for Metering
Billing a SaaS on usage means summing millions of raw events per tenant per billing period, and a flat append-only table makes that scan slower every month — so the metering store has to be partitioned by both time and tenant from day one. This guide is part of the Usage Metering Event Pipelines section of the broader Tenant Billing & Usage Metering reference.
Problem Framing
A metering pipeline emits one row per billable action: an API call, a gigabyte stored, a seat-second consumed. Each row carries a tenant id, a timestamp, a metric name, and a quantity. The volume is high and write-mostly, and the read pattern is narrow but unforgiving — at invoice time you ask "sum every api_call for tenant acme between the start and end of last month," and you ask it for every tenant at once. On a single ever-growing table, that query degrades linearly: the planner scans more heap and more index every billing cycle, and a late-arriving event can silently change a number a customer was already charged for.
Two axes of partitioning solve different halves of the problem. Time partitioning bounds how much data any one query touches: a billing-period query reads only the chunks that overlap the period instead of the whole history. Tenant partitioning bounds the blast radius of a single tenant's load and makes per-tenant retention and deletion a metadata operation rather than a full-table DELETE. TimescaleDB gives you both — a hypertable that automatically shards into time chunks, with optional space partitioning on tenant_id so each tenant's rows cluster together within each time range. This is the same partition-by-tenant instinct that drives schema-per-tenant architecture, applied to an append-only event store instead of mutable relational tables.
The third piece is precomputation. Even with good partitioning, re-summing a billing period from raw events at invoice time is wasteful when the same period is queried repeatedly — by the billing job, by an in-app usage dashboard, and by support. A continuous aggregate maintains a rolling, incrementally refreshed rollup (hourly or daily totals per tenant per metric) so the billing query reads thousands of pre-summed buckets instead of millions of raw rows. Retention then drops raw chunks once they are safely rolled up and past the dispute window, keeping the hot path small. The sequence below builds the store in that order: hypertable, rollup, retention, query.
Step-by-Step Guide
1. Create the hypertable partitioned by time and tenant
Define a plain table with a composite primary key that includes the time column, then convert it to a hypertable. Add space partitioning on tenant_id so a tenant's rows colocate within each time chunk.
CREATE TABLE usage_events (
tenant_id text NOT NULL,
event_time timestamptz NOT NULL,
metric text NOT NULL,
quantity numeric NOT NULL,
event_id uuid NOT NULL,
PRIMARY KEY (tenant_id, event_time, event_id)
);
SELECT create_hypertable(
'usage_events', 'event_time',
partitioning_column => 'tenant_id',
number_partitions => 16,
chunk_time_interval => INTERVAL '1 day'
);
The time column drives chunking, so create_hypertable must own event_time; the composite primary key includes it because TimescaleDB requires every unique constraint to cover the partitioning columns. Space partitioning by tenant_id hashes tenants across 16 partitions, which keeps one noisy tenant from dominating a chunk and lets the planner prune to the partitions a single-tenant query needs. Size chunk_time_interval so the most recent chunk plus its indexes fit comfortably in memory — one day suits high-volume metering; widen it for sparse workloads. Include event_id in the key now so ingestion can deduplicate on it, which is the foundation for idempotent usage event ingestion.
2. Build a continuous aggregate for per-tenant rollups
Create a materialized view over the hypertable that buckets events into daily totals per tenant per metric. TimescaleDB refreshes it incrementally as new data lands.
CREATE MATERIALIZED VIEW usage_daily
WITH (timescaledb.continuous) AS
SELECT
tenant_id,
metric,
time_bucket('1 day', event_time) AS bucket,
sum(quantity) AS total_quantity,
count(*) AS event_count
FROM usage_events
GROUP BY tenant_id, metric, time_bucket('1 day', event_time)
WITH NO DATA;
time_bucket is the time-series equivalent of date_trunc and aligns rows into fixed daily windows. WITH NO DATA creates the view without an expensive initial backfill, which a refresh policy then fills in the background. Group by tenant_id first so the rollup is itself tenant-partitioned and a per-tenant billing query scans only that tenant's buckets. Keep the aggregate at the finest grain you ever bill or report on — daily here — because you can always re-bucket daily totals up to monthly in a query, but you can never recover daily detail from a monthly rollup.
3. Add a refresh policy with a lag window
Schedule the aggregate to refresh on a cadence, but hold back the most recent window so late-arriving events still land in raw chunks before the bucket is finalized.
SELECT add_continuous_aggregate_policy('usage_daily',
start_offset => INTERVAL '3 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '30 minutes');
end_offset => INTERVAL '1 hour' means the policy never materializes the last hour, leaving a gap for events that arrive out of order from retries or clock skew. start_offset => INTERVAL '3 days' bounds how far back each run re-checks, so a late event up to three days old still corrects its bucket. Tune start_offset to your tolerated late-arrival window: wider catches stragglers but costs more refresh work each cycle. Run the schedule often enough (every 30 minutes) that an in-app usage dashboard stays close to real time without hammering the raw chunks.
4. Set per-tenant-aware retention and compression
Compress old chunks to shrink storage, then drop raw chunks once they are rolled up and past the dispute window. Retention runs on the raw hypertable, not the aggregate, so summarized history survives.
ALTER TABLE usage_events SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'tenant_id, metric'
);
SELECT add_compression_policy('usage_events', INTERVAL '7 days');
SELECT add_retention_policy('usage_events', INTERVAL '95 days');
compress_segmentby = 'tenant_id, metric' groups compressed rows by the columns you filter on, so a single-tenant query still skips other tenants' compressed batches. The compression policy compresses chunks older than 7 days — well past the active write window — and the retention policy drops raw chunks older than 95 days, comfortably beyond a typical 90-day billing dispute window. Crucially, retention only touches usage_events; the usage_daily aggregate keeps its rollups, so you can still answer "what did this tenant use last quarter" long after the raw events are gone. Set the retention interval from your longest legal or contractual obligation to retain raw evidence, not from convenience.
5. Run the per-tenant billing-period query against the rollup
At invoice time, sum the daily rollup for one tenant over the billing period. Re-bucket daily totals to whatever grain the invoice needs.
SELECT
metric,
sum(total_quantity) AS billable_quantity
FROM usage_daily
WHERE tenant_id = 'acme'
AND bucket >= date_trunc('month', timestamptz '2026-05-01')
AND bucket < date_trunc('month', timestamptz '2026-06-01')
GROUP BY metric;
This reads at most ~31 pre-summed buckets per metric instead of every raw event in the month, so it returns in milliseconds and its cost is flat as history grows. The half-open range (>= start, < next-month start) avoids double-counting the boundary day. Drive tenant_id and the period from the billing job per tenant; because the aggregate is partitioned by tenant, the planner reads only that tenant's slice. Hand these billable_quantity totals to the plan-enforcement and Stripe-sync stages of the pipeline as the authoritative usage figure for the period.
Verification
Confirm the rollup agrees with the raw events for a tenant and period before trusting it for billing. Sum both sides and compare.
WITH from_raw AS (
SELECT sum(quantity) AS n
FROM usage_events
WHERE tenant_id = 'acme' AND metric = 'api_call'
AND event_time >= timestamptz '2026-05-01'
AND event_time < timestamptz '2026-06-01'
),
from_rollup AS (
SELECT sum(total_quantity) AS n
FROM usage_daily
WHERE tenant_id = 'acme' AND metric = 'api_call'
AND bucket >= timestamptz '2026-05-01'
AND bucket < timestamptz '2026-06-01'
)
SELECT from_raw.n = from_rollup.n AS totals_match
FROM from_raw, from_rollup;
totals_match must be t. A mismatch usually means the refresh policy has not caught up to the latest events in the period — run CALL refresh_continuous_aggregate('usage_daily', NULL, NULL); for the window and re-check. Run this comparison only on a period that has fully passed the aggregate's end_offset; checking the current hour will always differ because the aggregate deliberately lags. Confirm partition pruning is working too — EXPLAIN a single-tenant query and verify the plan touches only that tenant's chunks:
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(total_quantity) FROM usage_daily
WHERE tenant_id = 'acme'
AND bucket >= timestamptz '2026-05-01'
AND bucket < timestamptz '2026-06-01';
The plan should show a small number of chunks scanned, not the full hypertable. If every chunk is read, the tenant_id predicate is not pruning — check that the literal type matches and that you are querying the aggregate, not the raw table.
Failure Modes & Gotchas
- Billing query reads raw events instead of the rollup. Symptom: invoice generation slows down every month. Cause: the query targets
usage_eventsdirectly. Fix: point billing at theusage_dailycontinuous aggregate and reserve the raw table for audit and reconciliation. - Late events miss their bucket. Symptom: a customer's total is short by events that arrived after the invoice ran. Cause:
end_offsetfinalized the bucket before the straggler landed. Fix: widenstart_offsetandend_offsetto cover your real late-arrival window, and bill only after the period clears that lag. - Retention drops raw chunks the aggregate had not rolled up. Symptom: a gap appears in historical rollups. Cause: the retention interval is shorter than the aggregate's
start_offsetreach. Fix: keep raw retention strictly longer than the refresh window so every chunk is materialized before it is dropped. - One tenant floods a single chunk. Symptom: queries for unrelated tenants slow during a noisy tenant's spike. Cause: too few space partitions, so the hot tenant's rows dominate each chunk. Fix: raise
number_partitionsand ensuretenant_idis the space-partitioning column so load spreads.
FAQ
Why not just add a monthly index on a normal table? An index speeds lookups but does not stop the heap from growing without bound, so scans and vacuum costs still rise every billing cycle. A hypertable with retention keeps the working set bounded by dropping old chunks as whole files, and a continuous aggregate removes the per-invoice scan entirely.
Should I partition by tenant or by time first? Both, in that order of priority: time is the primary partition because it bounds every query and enables retention, and tenant is the space dimension that isolates load and lets per-tenant queries prune. Make sure each raw event is also deduplicated on a stable key, as covered in idempotent usage event ingestion.
Can I keep raw events forever for audit? You can, but compress them and move them off the hot path. Apply a compression policy to old chunks and, if storage matters, tier them to cheaper object storage; the continuous aggregate still answers billing-period queries regardless of how the raw chunks are stored.