Per-Tenant Data Deletion Workflows

An erasure request is not a DELETE statement; it is a distributed transaction across a primary database, a search index, several caches, an analytics warehouse, and immutable backups that you cannot rewrite. This page shows how to orchestrate right-to-erasure for one tenant's subject so that every store converges on the same outcome and you can prove it. It operates within the broader GDPR Data Subject Requests workflow that governs how a SaaS processor serves Article 17.

Problem Framing

A naive deletion removes the row a user can see and leaves their personal data in five other systems. A typical SaaS write fans out: the primary row is mirrored into a search engine, denormalized into Redis, streamed onto a message bus, replicated into a columnar warehouse for analytics, and captured in nightly snapshots. Each of those is a separate copy of personal data, each with its own deletion semantics, and Article 17 obliges the controller to erase the data "without undue delay" everywhere it propagated. Miss one store and the request is unfulfilled regardless of how clean the primary looks.

The hard constraint is backups. You cannot surgically rewrite a write-once snapshot or a WAL archive to excise one subject without destroying the snapshot's integrity for every other tenant in it, and regulators accept that restoring, editing, and re-snapshotting petabytes for a single erasure is neither proportionate nor safe. The standard resolution is crypto-shredding: encrypt each subject's (or each tenant's) data under a key you control, and "delete" by destroying the key so the ciphertext in cold backups becomes permanently unrecoverable. That makes erasure depend directly on per-tenant encryption and key management, because the granularity of your keys sets the granularity of your erasure — a single key for an entire tenant cannot crypto-shred one of that tenant's users.

The second constraint is proof: a deletion you cannot evidence is, to an auditor, a deletion that did not happen, which is why every step writes to your tenant audit logging architecture. The third is ordering. Erasure is a multi-store transaction with no global commit, so partial failures are the norm rather than the exception: search may purge while the cache survives, or the key may be scheduled for destruction while a foreign-key row still pins the data live. The workflow must therefore be both idempotent — safe to replay any step — and convergent, retrying until every store reports the same terminal state. The flow below is what you are orchestrating.

The orchestrator is the single transaction boundary: it fans deletion to every live store, destroys the key that covers cold backups, and only then signs a proof record.

Step-by-Step Guide

1. Resolve the erasure scope from a data map

Before deleting anything, expand the subject into every primary-key and foreign-key reference that holds their personal data, scoped strictly to the requesting tenant so you never touch another tenant's rows. Scope resolution is the most error-prone part of the whole workflow: personal data hides in audit trails, free-text comment fields, denormalized JSON blobs, and join tables that no obvious foreign key reaches. Drive resolution from a maintained data inventory (a registry that names every table, column, and external store containing personal data, plus the predicate that selects it) rather than from ad-hoc joins, so that a table added last quarter is not silently missed and left full of PII.

-- Resolve all rows for one subject within one tenant.
SELECT 'orders' AS store, id FROM orders
  WHERE tenant_id = $1 AND subject_id = $2
UNION ALL
SELECT 'messages', id FROM messages
  WHERE tenant_id = $1 AND author_subject_id = $2
UNION ALL
SELECT 'attachments', id FROM attachments
  WHERE tenant_id = $1 AND owner_subject_id = $2;
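
One way to keep that query in step with the inventory is to generate it from the registry instead of writing it by hand. The sketch below assumes a hypothetical InventoryEntry type and BuildScopeQuery helper (neither is part of any library), and it assumes every table carries tenant_id and id columns as in the query above.

```go
package main

import (
	"fmt"
	"strings"
)

// InventoryEntry names one table that holds personal data and the column
// that ties a row to a subject. Both fields are illustrative.
type InventoryEntry struct {
	Table         string
	SubjectColumn string
}

// BuildScopeQuery renders a UNION ALL over every inventory entry, with $1 as
// the tenant id and $2 as the subject id. Table and column names must come
// from the trusted registry, never from request input, because they are
// interpolated into the SQL text.
func BuildScopeQuery(inv []InventoryEntry) string {
	if len(inv) == 0 {
		return ""
	}
	parts := make([]string, 0, len(inv))
	for _, e := range inv {
		parts = append(parts, fmt.Sprintf(
			"SELECT '%s' AS store, id FROM %s\n  WHERE tenant_id = $1 AND %s = $2",
			e.Table, e.Table, e.SubjectColumn))
	}
	return strings.Join(parts, "\nUNION ALL\n") + ";"
}

func main() {
	inv := []InventoryEntry{
		{Table: "orders", SubjectColumn: "subject_id"},
		{Table: "messages", SubjectColumn: "author_subject_id"},
		{Table: "attachments", SubjectColumn: "owner_subject_id"},
	}
	fmt.Println(BuildScopeQuery(inv))
}
```

Because the residual check in the Verification section walks the same tables, the same inventory could feed both queries, so adding a table to the registry extends deletion and verification together.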

2. Choose soft delete for the request lifecycle, hard delete for fulfillment

Mark the request as in-flight with a soft delete so the subject disappears from the application immediately and concurrent writes are blocked, but treat soft delete only as a quarantine state. The grace window between the two phases earns its keep: it absorbs accidental or fraudulent requests, gives downstream replicas time to converge, and lets a controller cancel before anything becomes irreversible. Erasure is not complete until the row is physically removed; a deleted_at flag still contains the personal data and does not satisfy Article 17, so the soft state must always carry a deadline that promotes it to a hard delete or a crypto-shred.

-- Phase A: quarantine instantly (reversible, within the grace window).
UPDATE orders SET deleted_at = now(), pii_state = 'pending_erasure'
  WHERE tenant_id = $1 AND subject_id = $2;
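-- Phase B: hard delete after the grace window (irreversible).
-- The 30-day interval is illustrative. GDPR Article 12(3) sets a one-month
-- response deadline (extendable in some cases), so size the grace window and
-- any key-destruction window to fit inside the deadline that applies to you.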
DELETE FROM orders
  WHERE tenant_id = $1 AND subject_id = $2
    AND pii_state = 'pending_erasure'
    AND deleted_at < now() - interval '30 days';
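
The promotion rule can also be expressed in application code, which is useful when a scheduler rather than a cron query decides what happens to a quarantined request. This is a minimal sketch under stated assumptions: the Action values, NextAction, and the idea of a recorded cancellation timestamp are hypothetical names introduced here, not part of the workflow above.

```go
package main

import (
	"fmt"
	"time"
)

// Action is what a scheduler should do with a quarantined erasure request.
type Action string

const (
	Wait    Action = "wait"        // grace window still open
	Restore Action = "restore"     // cancelled in time: undo the soft delete
	Promote Action = "hard_delete" // window elapsed: run Phase B
)

const day = 24 * time.Hour

var (
	requestedAt = time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)
	noCancel    time.Time // zero value means the request was never cancelled
)

// NextAction decides the fate of a soft-deleted subject. A cancellation only
// counts if it was recorded before the deadline; once the deadline passes
// without one, the request is promoted to a hard delete.
func NextAction(deletedAt, cancelledAt, now time.Time, grace time.Duration) Action {
	deadline := deletedAt.Add(grace)
	switch {
	case !cancelledAt.IsZero() && cancelledAt.Before(deadline):
		return Restore
	case now.Before(deadline):
		return Wait
	default:
		return Promote
	}
}

func main() {
	fmt.Println(NextAction(requestedAt, noCancel, requestedAt.Add(10*day), 30*day)) // wait
	fmt.Println(NextAction(requestedAt, noCancel, requestedAt.Add(31*day), 30*day)) // hard_delete
}
```

Comparing the cancellation time against the deadline, rather than checking a boolean at evaluation time, keeps a late-running scheduler from hard-deleting a request that was cancelled while the window was still open.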

Use the table below to decide which mechanism applies to each store, because no single technique fits all of them. Live transactional rows get a hard delete; immutable or non-deletable stores get crypto-shredding; only the request lifecycle uses soft delete.

Store	Mechanism	Why	Reversible?
Primary DB (live rows)	Hard delete after grace	Row is mutable and authoritative	No, after Phase B
Request lifecycle	Soft delete (`deleted_at`)	Needs a cancellable quarantine window	Yes, within window
Backups / WAL archives	Crypto-shred (destroy key)	Snapshots cannot be rewritten	No, once key gone
Analytics warehouse	Crypto-shred or scrub + rebuild	Columnar stores resist row deletes	No
Search index	Delete-by-query	Index is a rebuildable derivative	No
Cache	Key-prefix eviction	Entries are ephemeral copies	No

3. Orchestrate the fan-out as a durable, idempotent workflow

A network blip mid-fan-out must not leave search purged but the cache populated. Run each store deletion as an idempotent step of a durable workflow that retries until every step reports success, keyed on the request id so replays do not double-act. Idempotency is not optional here because retries are guaranteed, not exceptional: deleting an already-deleted row, evicting an absent cache key, or re-issuing a delete-by-query must all succeed silently rather than error and stall the workflow short of convergence.

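// Step pairs a store name with the idempotent function that erases from it.
type Step struct {
	name string
	fn   func(context.Context, EraseRequest) error
}

func (w *EraseWorkflow) Run(ctx context.Context, req EraseRequest) error {
	// Live stores run first, with the cache evicted after the primary delete.
	// Key destruction, which covers cold backups, comes last.
	steps := []Step{
		{"primary_db", w.deletePrimary},
		{"search", w.deleteSearch},
		{"cache", w.evictCache},
		{"analytics", w.scrubAnalytics},
		{"key_store", w.destroyKey},
	}
	for _, s := range steps {
		if err := retry(ctx, 5, func() error { return s.fn(ctx, req) }); err != nil {
			return fmt.Errorf("erase step %s failed: %w", s.name, err)
		}
		// Record each completed step so a replay can be audited.
		w.audit.Record(ctx, req.ID, s.name, "deleted")
	}
	return w.signProof(ctx, req)
}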
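
The retry helper that Run calls is not shown above. The sketch below is one possible shape for it, assuming exponential backoff and early exit on context cancellation; the backoff policy, the fakeStore type, and eraseWithRetry are illustrative assumptions rather than the workflow's actual implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry calls fn up to attempts times, doubling the pause between tries and
// stopping early if the context is cancelled.
func retry(ctx context.Context, attempts int, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	delay := 10 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("retry cancelled: %w (last error: %v)", ctx.Err(), err)
		case <-time.After(delay):
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// fakeStore stands in for one downstream store. Removing a missing key is a
// no-op that returns nil, which is what makes replays safe.
type fakeStore struct {
	rows     map[string]bool
	failures int // transient errors to inject before succeeding
	calls    int
}

func (s *fakeStore) remove(key string) error {
	s.calls++
	if s.failures > 0 {
		s.failures--
		return errors.New("transient: connection reset")
	}
	delete(s.rows, key) // deleting an absent key is not an error
	return nil
}

func eraseWithRetry(s *fakeStore, key string, attempts int) error {
	return retry(context.Background(), attempts, func() error { return s.remove(key) })
}

func main() {
	s := &fakeStore{rows: map[string]bool{"acme:42": true}, failures: 2}
	fmt.Println(eraseWithRetry(s, "acme:42", 5), s.calls) // <nil> 3
	fmt.Println(eraseWithRetry(s, "acme:42", 5), s.calls) // <nil> 4 (replay is harmless)
}
```

A bounded attempt count means a step can still fail permanently, so a durable workflow engine would normally persist the failed step and resume it later rather than rely on in-process retries alone.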

4. Purge search and evict caches by tenant-scoped predicate

Search engines and caches hold denormalized copies that no foreign key reaches. Delete from search by a query that pins the tenant, and evict cache entries by their tenant-namespaced key prefix rather than guessing individual keys.

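func (w *EraseWorkflow) deleteSearch(ctx context.Context, r EraseRequest) error {
	// Marshal the query with encoding/json rather than formatting it with %q:
	// Go string quoting is not JSON escaping, so an unusual id could yield
	// an invalid request body.
	body, err := json.Marshal(map[string]any{
		"query": map[string]any{"bool": map[string]any{"filter": []any{
			map[string]any{"term": map[string]any{"tenant_id": r.TenantID}},
			map[string]any{"term": map[string]any{"subject_id": r.SubjectID}},
		}}},
	})
	if err != nil {
		return fmt.Errorf("encode delete-by-query: %w", err)
	}
	return w.es.DeleteByQuery(ctx, "documents", string(body))
}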

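func (w *EraseWorkflow) evictCache(ctx context.Context, r EraseRequest) error {
	// The ids are assumed to be free of glob metacharacters (* ? [ ]);
	// otherwise the pattern could match keys outside this subject.
	pattern := fmt.Sprintf("t:%s:subj:%s:*", r.TenantID, r.SubjectID)
	return w.redis.ScanDelete(ctx, pattern)
}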
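
ScanDelete is a helper rather than a Redis command, so its body is worth sketching. The version below assumes a hypothetical keyStore interface shaped like the SCAN and DEL commands, plus an in-memory fake that uses path.Match as a stand-in for Redis glob matching; a real client would replace both.

```go
package main

import (
	"context"
	"fmt"
	"path"
	"sort"
)

// keyStore is the minimal surface ScanDelete needs (hypothetical interface).
type keyStore interface {
	Scan(ctx context.Context, cursor uint64, match string, count int64) ([]string, uint64, error)
	Del(ctx context.Context, keys ...string) error
}

// ScanDelete walks the keyspace with a cursor and deletes each matching
// batch. The scan is complete when the store hands back cursor 0.
func ScanDelete(ctx context.Context, ks keyStore, pattern string) (int, error) {
	var cursor uint64
	deleted := 0
	for {
		keys, next, err := ks.Scan(ctx, cursor, pattern, 100)
		if err != nil {
			return deleted, fmt.Errorf("scan %q: %w", pattern, err)
		}
		if len(keys) > 0 {
			if err := ks.Del(ctx, keys...); err != nil {
				return deleted, fmt.Errorf("del: %w", err)
			}
			deleted += len(keys)
		}
		if next == 0 {
			return deleted, nil
		}
		cursor = next
	}
}

// memStore is an in-memory fake that pages through a sorted key snapshot.
type memStore struct {
	data map[string]string
	page int      // keys examined per Scan call
	snap []string // snapshot taken when a scan starts at cursor 0
}

func (m *memStore) Scan(_ context.Context, cursor uint64, match string, _ int64) ([]string, uint64, error) {
	if cursor == 0 {
		m.snap = m.snap[:0]
		for k := range m.data {
			m.snap = append(m.snap, k)
		}
		sort.Strings(m.snap)
	}
	end := cursor + uint64(m.page)
	if end > uint64(len(m.snap)) {
		end = uint64(len(m.snap))
	}
	var out []string
	for _, k := range m.snap[cursor:end] {
		if ok, _ := path.Match(match, k); ok {
			out = append(out, k)
		}
	}
	if end == uint64(len(m.snap)) {
		end = 0 // signal that the iteration is finished
	}
	return out, end, nil
}

func (m *memStore) Del(_ context.Context, keys ...string) error {
	for _, k := range keys {
		delete(m.data, k)
	}
	return nil
}

func evictSubject(ks keyStore, tenantID, subjectID string) (int, error) {
	pattern := fmt.Sprintf("t:%s:subj:%s:*", tenantID, subjectID)
	return ScanDelete(context.Background(), ks, pattern)
}

func main() {
	m := &memStore{page: 2, data: map[string]string{
		"t:acme:subj:42:profile": "a", "t:acme:subj:42:orders": "b",
		"t:acme:subj:421:profile": "c", "t:other:subj:42:profile": "d",
	}}
	n, err := evictSubject(m, "acme", "42")
	fmt.Println(n, err, len(m.data)) // 2 <nil> 2
}
```

The trailing colon before the wildcard matters: it keeps subject 42 from matching subject 421, and the tenant segment keeps another tenant's subject 42 untouched.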

5. Crypto-shred for backups and analytics you cannot rewrite

Cold snapshots and WAL archives are immutable, so you erase by destroying the per-subject data key, rendering the ciphertext in every backup unreadable. This only works if the data was encrypted at write time under a key whose scope matches the erasure unit, which is why the encryption design and the deletion design are the same design. Schedule the key destruction rather than firing it instantly: a managed key store typically enforces a pending window so an erroneous request can be cancelled, and that same window lets the destruction line up behind any retention obligation or litigation hold that forbids deletion until it lifts. The same shred neutralizes any analytics extracts or data-lake parquet files encrypted under that key, which is the cleanest way to reach a warehouse that does not support row-level deletes.

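func (w *EraseWorkflow) destroyKey(ctx context.Context, r EraseRequest) error {
	// Placeholder name. AWS KMS expects a key ID or key ARN here, so map
	// this name to the real key identifier in practice.
	keyID := fmt.Sprintf("dek/%s/%s", r.TenantID, r.SubjectID)
	// Schedule deletion after retention; KMS prevents instant key loss.
	// Assumes aws-sdk-go-v2, where PendingWindowInDays is *int32; 7 days is
	// the shortest window AWS KMS accepts.
	_, err := w.kms.ScheduleKeyDeletion(ctx, &kms.ScheduleKeyDeletionInput{
		KeyId:               &keyID,
		PendingWindowInDays: aws.Int32(7),
	})
	return err
}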
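
To make the mechanism concrete, the sketch below shows crypto-shredding end to end with the standard library: data is sealed under a per-subject AES-256-GCM key, and once that key is destroyed the stored ciphertext can no longer be opened. The keyring type and its seal, open, and shred methods are hypothetical, and the in-memory map stands in for a KMS or HSM.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

var errShredded = errors.New("key destroyed: ciphertext is unrecoverable")

// keyring maps "tenant/subject" to a 256-bit data key (illustration only).
type keyring map[string][]byte

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext under the subject's key, creating the key on first
// use. The output is nonce || ciphertext, with the id bound as extra data.
func (k keyring) seal(id string, plaintext []byte) ([]byte, error) {
	key, ok := k[id]
	if !ok {
		key = make([]byte, 32)
		if _, err := rand.Read(key); err != nil {
			return nil, err
		}
		k[id] = key
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, []byte(id)), nil
}

// open decrypts a blob, or reports errShredded if the key no longer exists.
func (k keyring) open(id string, blob []byte) ([]byte, error) {
	key, ok := k[id]
	if !ok {
		return nil, errShredded
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, []byte(id))
}

// shred zeroes and removes the key; backups holding the blob stay untouched.
func (k keyring) shred(id string) {
	for i := range k[id] {
		k[id][i] = 0
	}
	delete(k, id)
}

func main() {
	kr := keyring{}
	blob, _ := kr.seal("acme/42", []byte("alice@example.com"))
	kr.shred("acme/42")
	_, err := kr.open("acme/42", blob)
	fmt.Println(err) // key destroyed: ciphertext is unrecoverable
}
```

Keying the map by tenant and subject is what lets one user be shredded without affecting the rest of the tenant; a keyring keyed by tenant alone could only erase the whole tenant at once.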

6. Emit a tamper-evident proof-of-deletion record

The final step writes an immutable record of what was deleted, when, and from which stores, signed so an auditor can verify it was not edited after the fact. Store the subject as a one-way hash, never the cleartext, so the proof itself does not re-introduce the personal data you just erased.

func (w *EraseWorkflow) signProof(ctx context.Context, r EraseRequest) error {
	proof := DeletionProof{
		RequestID:   r.ID,
		TenantID:    r.TenantID,
		SubjectHash: sha256Hex(r.TenantID + ":" + r.SubjectID),
		Stores:      []string{"primary_db", "search", "cache", "analytics", "key_store"},
		CompletedAt: time.Now().UTC(),
	}
	proof.Signature = w.signer.Sign(proof.Canonical())
	return w.proofs.Append(ctx, proof) // write-once store
}
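
The signer and the Canonical method are left abstract above. The sketch below fills them in under stated assumptions: Ed25519 for the signature, JSON in struct-field order as the canonical form, and a keyed HMAC-SHA-256 for the subject hash. The HMAC is a deliberate swap from the plain sha256Hex in the step above, on the reasoning that an unkeyed hash of a guessable identifier can be reversed by enumeration; all function names here are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

type DeletionProof struct {
	RequestID   string    `json:"request_id"`
	TenantID    string    `json:"tenant_id"`
	SubjectHash string    `json:"subject_hash"`
	Stores      []string  `json:"stores"`
	CompletedAt time.Time `json:"completed_at"`
	Signature   []byte    `json:"-"` // excluded from the signed bytes
}

// canonical returns the bytes that get signed. encoding/json emits struct
// fields in declaration order, so the output is stable for this type.
func (p DeletionProof) canonical() ([]byte, error) {
	return json.Marshal(p)
}

// subjectHash derives a keyed one-way identifier for the subject.
func subjectHash(pepper []byte, tenantID, subjectID string) string {
	mac := hmac.New(sha256.New, pepper)
	mac.Write([]byte(tenantID + ":" + subjectID))
	return hex.EncodeToString(mac.Sum(nil))
}

func sign(priv ed25519.PrivateKey, p *DeletionProof) error {
	msg, err := p.canonical()
	if err != nil {
		return err
	}
	p.Signature = ed25519.Sign(priv, msg)
	return nil
}

// verify reports whether the signature matches the proof's current contents.
func verify(pub ed25519.PublicKey, p DeletionProof) bool {
	msg, err := p.canonical()
	if err != nil {
		return false
	}
	return ed25519.Verify(pub, msg, p.Signature)
}

// demoProof builds and signs a sample proof with a throwaway key pair.
func demoProof() (ed25519.PublicKey, DeletionProof, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return nil, DeletionProof{}, err
	}
	p := DeletionProof{
		RequestID:   "req-123",
		TenantID:    "acme",
		SubjectHash: subjectHash([]byte("demo-pepper"), "acme", "42"),
		Stores:      []string{"primary_db", "search", "cache", "analytics", "key_store"},
		CompletedAt: time.Now().UTC(),
	}
	err = sign(priv, &p)
	return pub, p, err
}

func main() {
	pub, proof, err := demoProof()
	if err != nil {
		panic(err)
	}
	fmt.Println(verify(pub, proof)) // true
	proof.Stores = []string{"primary_db"}
	fmt.Println(verify(pub, proof)) // false: the record was altered
}
```

An auditor holding only the public key can run the verification step, which is why the private key should live apart from the store that holds the proofs.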

Verification

Prove the subject is gone from every live store, then prove the proof record is intact. The query below must return zero across the union, and the proof's signature must validate.

-- Must return 0; any non-zero row is an unfulfilled erasure.
SELECT count(*) AS residual FROM (
  SELECT id FROM orders      WHERE tenant_id = $1 AND subject_id = $2
  UNION ALL
  SELECT id FROM messages    WHERE tenant_id = $1 AND author_subject_id = $2
  UNION ALL
  SELECT id FROM attachments WHERE tenant_id = $1 AND owner_subject_id = $2
) residual_rows;
-- PASS criteria:
--   residual = 0
--   GET /_count on search with the tenant+subject filter returns 0
--   redis SCAN with MATCH t:<tenant>:subj:<subject>:* returns no keys
--     (EXISTS takes exact key names, not patterns)
--   KMS key state for dek/<tenant>/<subject> is PendingDeletion
--   proof signature verifies against the public key

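If residual is greater than zero, a delete step did not complete for a table the data map already lists, or new rows were written for the subject after the workflow ran; replay the workflow for that request. The query cannot detect the opposite failure, a table added after the data map was last updated, because it checks only the tables the map enumerates. Reconcile the inventory in step 1 against the live schema before trusting a zero, because the workflow only deletes what the map enumerates.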

Failure Modes & Gotchas

Soft delete treated as final. Symptom: an audit finds deleted_at rows still full of PII months later. Root cause: the quarantine phase was never promoted to a hard delete. Fix: schedule the Phase B DELETE to run automatically after the grace window.
Backups left readable. Symptom: erased data is recoverable from last week's snapshot. Root cause: deletion ran only against live stores, never against the key. Fix: make destroyKey a mandatory, non-skippable workflow step gated on retention expiry.
Cache repopulates after deletion. Symptom: the subject reappears seconds after erasure. Root cause: a read replica or in-flight request re-cached the row before the primary delete committed. Fix: evict the cache only after the primary delete has committed and replicas have caught up, and give entries under the subject's key prefix a short TTL while the request is in flight so that any stragglers expire on their own.
Cross-tenant over-deletion. Symptom: another tenant reports missing records. Root cause: the erasure predicate matched on subject_id without tenant_id. Fix: require tenant_id in every delete clause and assert it in the workflow before any store is touched.
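
The guard against cross-tenant over-deletion can be a few lines that run before any store is touched. This is a minimal sketch, assuming an EraseRequest with the three string fields used throughout this page; Validate and errInvalidRequest are hypothetical names, and the wildcard check is an extra assumption aimed at the cache key pattern.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type EraseRequest struct {
	ID        string
	TenantID  string
	SubjectID string
}

var errInvalidRequest = errors.New("invalid erase request")

// Validate rejects requests that could widen a delete predicate: an empty
// tenant or subject id, or an id containing glob metacharacters that would
// broaden a cache key pattern.
func (r EraseRequest) Validate() error {
	fields := []struct{ name, value string }{
		{"request id", r.ID},
		{"tenant id", r.TenantID},
		{"subject id", r.SubjectID},
	}
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			return fmt.Errorf("%w: %s is empty", errInvalidRequest, f.name)
		}
		if strings.ContainsAny(f.value, "*?[]\\") {
			return fmt.Errorf("%w: %s contains a wildcard character", errInvalidRequest, f.name)
		}
	}
	return nil
}

func main() {
	fmt.Println(EraseRequest{ID: "r1", TenantID: "acme", SubjectID: "42"}.Validate()) // <nil>
	fmt.Println(EraseRequest{ID: "r1", SubjectID: "42"}.Validate())
	// invalid erase request: tenant id is empty
}
```

Calling Validate at the top of Run, before the first step, would turn a missing tenant id into an immediate failure instead of a delete that matches every tenant.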

FAQ

Does soft delete satisfy GDPR right-to-erasure? No. A deleted_at flag hides the data from the application but the personal data still exists in the row, so it does not meet Article 17 on its own; soft delete is only valid as a short reversible quarantine that is always promoted to a physical hard delete or a crypto-shred within a defined grace window.

How do I erase data from immutable backups without rewriting them? Encrypt each subject's or tenant's data under a dedicated key and destroy that key — crypto-shredding — so the ciphertext in any snapshot, WAL archive, or analytics extract becomes permanently unrecoverable without ever touching the backup file itself.

What must a proof-of-deletion record contain? The request id, the tenant, a one-way hash of the subject (never cleartext), the list of stores erased, a UTC completion timestamp, and a signature over the canonical form, written to a write-once store so an auditor can confirm the record was not altered after deletion.