Generating SOC 2 Audit Artifacts Per Tenant
A SOC 2 audit demands concrete evidence — access reviews, change logs, and proof that records were not altered — and in a multi-tenant SaaS that evidence must be sliced cleanly to a single tenant without ever leaking another customer's activity. This page sits under Tenant Audit Logging Architecture, which defines how events are captured and chained; here we turn that stored log into the specific artifacts a SOC 2 auditor (or a security-conscious enterprise buyer) will ask for, scoped per tenant and exported on a schedule.
Problem Framing
SOC 2 is an attestation against the Trust Services Criteria — primarily the Common Criteria (CC) for security, plus optional categories like Availability and Confidentiality. An auditor does not accept "we have logs"; they sample specific time windows and ask you to produce the events that demonstrate a control operated effectively across the whole audit period. The artifacts that matter most map to CC6 (logical access): who had access, what privileges changed, and whether those changes were authorized. The raw material for nearly all of it is the audit log you already keep — but raw rows are not artifacts. An artifact is a control-mapped, time-bounded, tenant-scoped, tamper-evident report.
Three things break when you try to generate these on demand from a multi-tenant store:
- Cross-tenant contamination. An export query that forgets a tenant_id predicate hands one customer's access review to another, which is itself a confidentiality incident and a SOC 2 finding.
- Unmapped evidence. You can dump 40,000 log rows, but the auditor wants the rows that prove CC6.2 (access provisioning) and CC6.3 (access modification); undifferentiated logs force the auditor to do your mapping and erode their confidence.
- Unprovable integrity. A CSV of events proves nothing if you cannot show the events were not edited after the fact. SOC 2's whole point is the reliability of the control, so you must ship an immutability proof alongside the data.
The reason enterprise buyers increasingly want per-tenant artifacts (not just a company-wide SOC 2 report) is that their own auditors scope to the data they actually entrusted to you. A bank using your platform wants evidence about their tenant's access events, not aggregate platform statistics. Generating these cleanly requires the same tenant boundary you enforce everywhere else — the export path is just another query that must carry tenant context, and it is the query most likely to be written quickly under deadline pressure and shipped without the predicate. Treat the artifact generator as a first-class, tested code path, not an ad-hoc SQL session.
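One cheap way to keep the tenant predicate from being dropped is a CI tripwire that scans every evidence query before it ships. The sketch below assumes evidence queries are kept as SQL strings that use the same %(tenant_id)s bind as the step 2 query; TENANT_PREDICATE, assert_tenant_scoped, and check_evidence_queries are hypothetical names. A regex match is a tripwire, not a proof: it would still accept a predicate buried inside an OR clause, so pair it with fixture tests.

```python
import re

# Matches the bound tenant predicate used by this page's evidence queries,
# e.g. "WHERE tenant_id = %(tenant_id)s". Whitespace around "=" is tolerated;
# a literal value such as tenant_id = 'acme' deliberately does NOT match.
TENANT_PREDICATE = re.compile(r"\btenant_id\s*=\s*%\(tenant_id\)s")


def assert_tenant_scoped(name: str, sql: str) -> None:
    """Raise if an evidence query lacks the bound tenant_id predicate."""
    if not TENANT_PREDICATE.search(sql):
        raise ValueError(f"evidence query {name!r} has no bound tenant_id predicate")


def check_evidence_queries(queries: dict[str, str]) -> list[str]:
    """Return the names of queries that fail assert_tenant_scoped, in input order."""
    failures = []
    for name, sql in queries.items():
        try:
            assert_tenant_scoped(name, sql)
        except ValueError:
            failures.append(name)
    return failures
```

Run check_evidence_queries over every query the generator can issue and fail the build on a non-empty result.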
The flow below shows how stored audit events become control-mapped, verified artifacts delivered to a per-tenant destination.
The access-review artifact overlaps heavily with the role-change evidence covered in auditing RBAC changes across tenants; that page captures the mutations, and this one turns them into the CC6.3 access-modification artifact the auditor samples.
Step-by-Step Guide
1. Define a control-to-query mapping
Pin each Trust Services criterion to an exact, tenant-scoped query. Storing the mapping as data (not scattered SQL) makes the evidence reproducible and lets the auditor see precisely what each artifact represents.
```python
CONTROL_QUERIES = {
    "CC6.2": {  # access provisioning
        "label": "User access granted",
        "actions": ["USER_PROVISIONED", "ROLE_GRANTED"],
    },
    "CC6.3": {  # access modification / removal
        "label": "Access changes and de-provisioning",
        "actions": ["ROLE_MODIFIED", "ROLE_REVOKED", "USER_DEPROVISIONED"],
    },
    "CC7.2": {  # security event monitoring
        "label": "Authentication and security events",
        "actions": ["LOGIN_FAILED", "MFA_DISABLED", "API_KEY_CREATED"],
    },
}
```
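Because the mapping is plain data, it can be checked and inverted in CI. The sketch below copies the mapping so it runs standalone; validate_mapping and controls_for_action are hypothetical helpers. The first rejects controls with no label or no actions, and the second builds an action-to-control index, which answers the auditor's question "which criterion does this event support?" without anyone reading SQL.

```python
# Copy of the control mapping above, so this sketch runs standalone.
CONTROL_QUERIES = {
    "CC6.2": {"label": "User access granted",
              "actions": ["USER_PROVISIONED", "ROLE_GRANTED"]},
    "CC6.3": {"label": "Access changes and de-provisioning",
              "actions": ["ROLE_MODIFIED", "ROLE_REVOKED", "USER_DEPROVISIONED"]},
    "CC7.2": {"label": "Authentication and security events",
              "actions": ["LOGIN_FAILED", "MFA_DISABLED", "API_KEY_CREATED"]},
}


def validate_mapping(mapping):
    """Return a list of problems; an empty list means the mapping is usable."""
    problems = []
    for control, spec in mapping.items():
        if not spec.get("label"):
            problems.append(f"{control}: missing label")
        actions = spec.get("actions") or []
        if not actions:
            problems.append(f"{control}: empty action set")
        if len(set(actions)) != len(actions):
            problems.append(f"{control}: duplicate actions")
    return problems


def controls_for_action(mapping):
    """Invert the mapping: action -> sorted list of controls it evidences."""
    index = {}
    for control, spec in mapping.items():
        for action in spec["actions"]:
            index.setdefault(action, []).append(control)
    return {action: sorted(controls) for action, controls in index.items()}
```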
2. Write the tenant-scoped, control-mapped query
Every artifact query filters on tenant_id, the audit window, and the control's action set. Bind all three as parameters so the predicate can never be dropped by accident.
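```sql
-- tenant_id and prev_hash are read by the step 3 chain check, so select them too.
SELECT id, tenant_id, created_at, actor_id, action, target_user, diff,
       prev_hash, this_hash
FROM rbac_audit_event
WHERE tenant_id = %(tenant_id)s
  AND created_at >= %(period_start)s
  AND created_at < %(period_end)s
  AND action = ANY(%(actions)s)
-- id breaks timestamp ties so rows return in chain order (assumes ids increase with insertion).
ORDER BY created_at ASC, id ASC;
```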
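The run_control_query helper the export calls is not defined on this page. A minimal sketch of it follows, assuming a psycopg2-style DB-API connection (pyformat %(name)s parameters, Python lists adapted to Postgres arrays); build_control_params and CONTROL_SQL are hypothetical names, and the connection is passed as an explicit keyword argument, which is an assumption. Validating inputs before touching the database means an empty tenant_id or a naive timestamp fails loudly instead of producing a quietly wrong artifact.

```python
from datetime import datetime, timezone

# The step 2 query, selecting tenant_id and prev_hash so rows can be chain-verified.
CONTROL_SQL = """
SELECT id, tenant_id, created_at, actor_id, action, target_user, diff,
       prev_hash, this_hash
FROM rbac_audit_event
WHERE tenant_id = %(tenant_id)s
  AND created_at >= %(period_start)s
  AND created_at < %(period_end)s
  AND action = ANY(%(actions)s)
ORDER BY created_at ASC, id ASC
"""


def _as_utc(dt: datetime) -> datetime:
    # Naive datetimes are ambiguous at month boundaries, so refuse them outright.
    if dt.tzinfo is None:
        raise ValueError("period bounds must be timezone-aware")
    return dt.astimezone(timezone.utc)


def build_control_params(tenant_id, period, actions):
    """Validate inputs and build the bind parameters for CONTROL_SQL."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every evidence query")
    start, end = (_as_utc(dt) for dt in period)
    if not start < end:
        raise ValueError("period must be a half-open window with start < end")
    if not actions:
        raise ValueError("a control must map to at least one action")
    return {"tenant_id": tenant_id, "period_start": start,
            "period_end": end, "actions": list(actions)}


def run_control_query(tenant_id, period, actions, *, conn):
    """Return the control's events as dicts, oldest first.

    conn is assumed to be a psycopg2-style DB-API connection.
    """
    params = build_control_params(tenant_id, period, actions)
    cur = conn.cursor()
    try:
        cur.execute(CONTROL_SQL, params)
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

To get the three-argument form the export step calls, bind the connection once with functools.partial(run_control_query, conn=conn).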
3. Verify the hash chain before exporting
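Do not export rows you have not proven intact. Recompute each event hash from its payload and prev_hash; a mismatch means a row was altered or deleted, which must abort the export and raise an alert rather than ship corrupt evidence. Run the check over the tenant's complete event sequence for the window (every action, in chain order), not over one control's subset: assuming each tenant's events form a single chain, an event's prev_hash points at the tenant's previous event of any action, so a filtered subset cannot verify. The check anchors on the first row's stored prev_hash, which should equal the head recorded for the previous window.
```python
import hashlib
import json


class IntegrityError(Exception):
    """Raised when the audit hash chain does not verify."""


def verify_chain(rows):
    # rows: the tenant's complete, unfiltered events for the window, in chain order.
    # Anchor on the first row's stored prev_hash so a window that does not start
    # at the chain's genesis still verifies.
    prev = rows[0]["prev_hash"] if rows else None
    for r in rows:
        payload = json.dumps(
            {"tenant_id": r["tenant_id"], "actor_id": r["actor_id"],
             "action": r["action"], "target_user": r["target_user"],
             "diff": r["diff"], "created_at": r["created_at"].isoformat(),
             "prev_hash": prev},
            sort_keys=True, separators=(",", ":"),
        )
        computed = hashlib.sha256(payload.encode()).hexdigest()
        if computed != r["this_hash"]:
            # An edited row changes its own hash; a deleted row breaks the next link.
            raise IntegrityError(f"chain break at event {r['id']}")
        prev = r["this_hash"]
    return prev  # the verified chain head for this window
```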
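A small in-memory chain makes the check's behaviour concrete and can double as a CI fixture. The sketch below restates a verify_chain that anchors on the first row's stored prev_hash; event_hash reuses the same canonical payload, while build_chain, edited, without, is_intact, and sample_events are hypothetical test helpers, not production code.

```python
import hashlib
import json
from datetime import datetime, timezone


class IntegrityError(Exception):
    """Raised when the audit hash chain does not verify."""


def event_hash(event, prev_hash):
    # Same canonical payload as verify_chain: sorted keys, compact separators.
    payload = json.dumps(
        {"tenant_id": event["tenant_id"], "actor_id": event["actor_id"],
         "action": event["action"], "target_user": event["target_user"],
         "diff": event["diff"], "created_at": event["created_at"].isoformat(),
         "prev_hash": prev_hash},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(events, anchor=None):
    """Attach id, prev_hash, and this_hash to events in order (fixture helper)."""
    prev, chained = anchor, []
    for i, e in enumerate(events, start=1):
        row = dict(e, id=i, prev_hash=prev)
        row["this_hash"] = event_hash(row, prev)
        chained.append(row)
        prev = row["this_hash"]
    return chained


def verify_chain(rows):
    prev = rows[0]["prev_hash"] if rows else None
    for r in rows:
        if event_hash(r, prev) != r["this_hash"]:
            raise IntegrityError(f"chain break at event {r['id']}")
        prev = r["this_hash"]
    return prev


def is_intact(rows):
    try:
        verify_chain(rows)
        return True
    except IntegrityError:
        return False


def edited(rows, index, **changes):
    """Copy of rows with one event's fields changed, as a tamperer would."""
    out = [dict(r) for r in rows]
    out[index].update(changes)
    return out


def without(rows, index):
    """Copy of rows with one event removed."""
    return [dict(r) for i, r in enumerate(rows) if i != index]


def sample_events():
    t = datetime(2026, 5, 3, 9, 0, tzinfo=timezone.utc)
    return [
        {"tenant_id": "tenant_acme", "actor_id": "admin-1", "action": a,
         "target_user": "user-42", "diff": {"role": "editor"}, "created_at": t}
        for a in ("ROLE_GRANTED", "ROLE_MODIFIED", "ROLE_REVOKED")
    ]
```

An edit or a deletion in the middle of the window breaks the chain. Dropping the window's first event still verifies, because the check anchors on the new first row's stored prev_hash, and dropping the last event only yields a different head. Those boundary cases are caught by comparing the anchor and head with the values recorded for adjacent windows.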
4. Build the artifact and an integrity manifest
Emit the rows as a CSV and write a manifest that records the verified chain head, the row count, the audit window, and a SHA-256 of the CSV itself. The manifest is the immutability proof an auditor checks against.
```python
import csv
import hashlib
import io
import json


def build_artifact(control, rows, chain_head, period):
    # Serialize the evidence columns; diff is omitted from the CSV.
    buf = io.StringIO()
    w = csv.DictWriter(buf, fieldnames=["created_at", "actor_id",
                                        "action", "target_user", "this_hash"])
    w.writeheader()
    for r in rows:
        w.writerow({k: r[k] for k in w.fieldnames})
    csv_bytes = buf.getvalue().encode()
    # The manifest ties the exact CSV bytes to the verified chain head and window.
    manifest = {
        "control": control, "rows": len(rows),
        "period_start": period[0].isoformat(),
        "period_end": period[1].isoformat(),
        "verified_chain_head": chain_head,
        "csv_sha256": hashlib.sha256(csv_bytes).hexdigest(),
    }
    return csv_bytes, json.dumps(manifest, indent=2).encode()
```
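The manifest lets anyone re-check a bundle offline, without database access. A minimal auditor-side sketch, assuming the manifest layout above; check_bundle is a hypothetical name. It hashes the CSV bytes exactly as delivered, which matters because csv.DictWriter ends rows with "\r\n" by default.

```python
import csv
import hashlib
import io
import json


def check_bundle(csv_bytes: bytes, manifest_bytes: bytes) -> list[str]:
    """Recompute what the manifest claims about the CSV; return any mismatches."""
    manifest = json.loads(manifest_bytes)
    problems = []
    # Hash the raw bytes: re-saving the file can change line endings or quoting.
    if hashlib.sha256(csv_bytes).hexdigest() != manifest["csv_sha256"]:
        problems.append("csv_sha256 does not match the CSV bytes")
    rows = list(csv.DictReader(io.StringIO(csv_bytes.decode())))
    if len(rows) != manifest["rows"]:
        problems.append(f"manifest says {manifest['rows']} rows, CSV has {len(rows)}")
    return problems
```

A copy re-saved with "\n" line endings fails the digest check even though every value is unchanged, so auditors should hash the file as received rather than an edited or re-exported copy.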
5. Schedule the export to a per-tenant destination
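Run the generator on a schedule (a monthly cron is a common choice and gives the auditor discrete windows to sample) and write each bundle under a tenant-isolated prefix. The prefix is part of the isolation boundary: the generator derives it from tenant_id, and a shared bucket with an IAM policy scoped to each tenant's prefix keeps one tenant's credentials from reading another's evidence. Because the chain spans all of a tenant's events, verify the full window once, then check that every row a control query returns belongs to that verified window.
```python
import boto3

s3 = boto3.client("s3")


def export_for_tenant(tenant_id, period):
    # prev_hash links to the tenant's previous event of any action, so verify the whole
    # window once (fetch_tenant_window is hypothetical: step 2 minus the action filter).
    window = fetch_tenant_window(tenant_id, period)
    chain_head = verify_chain(window)
    verified = {r["this_hash"] for r in window}
    ym = period[0].strftime("%Y-%m")
    for control, spec in CONTROL_QUERIES.items():
        rows = run_control_query(tenant_id, period, spec["actions"])
        if any(r["this_hash"] not in verified for r in rows):  # never ship unverified rows
            raise IntegrityError(f"{control}: row outside the verified chain")
        csv_bytes, manifest = build_artifact(control, rows, chain_head, period)
        base = f"soc2/{tenant_id}/{ym}/{control}"
        s3.put_object(Bucket="audit-artifacts", Key=f"{base}.csv", Body=csv_bytes)
        s3.put_object(Bucket="audit-artifacts", Key=f"{base}.manifest.json", Body=manifest)
```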
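The scheduled job has to compute the window it exports. A minimal sketch, assuming monthly runs and UTC calendar-month boundaries; previous_month_period is a hypothetical helper that returns the half-open [start, end) pair the step 2 query expects.

```python
from datetime import datetime, timezone


def previous_month_period(now: datetime) -> tuple[datetime, datetime]:
    """Half-open [start, end) UTC window covering the calendar month before now."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    # End is midnight UTC on the first day of the current month (exclusive bound).
    end = datetime(now.year, now.month, 1, tzinfo=timezone.utc)
    if end.month == 1:
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end
```

A job firing shortly after midnight UTC on the 1st would pass previous_month_period(datetime.now(timezone.utc)) as the period for each tenant's export.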
Verification
Re-derive each artifact's CSV hash independently and confirm it matches the manifest, then confirm the export contains only the requested tenant. This is the same check an auditor will run, so wire it into CI against a seeded fixture that also contains events for a second tenant, giving a missing predicate something to leak.
```python
import csv
import hashlib
import io
import json
from datetime import UTC, datetime

# Hypothetical: replace with the actor IDs your fixture seeds only for a second tenant.
OTHER_TENANT_ACTORS = {"globex-admin-1", "globex-user-7"}


def test_artifact_is_tenant_scoped_and_intact():
    period = (datetime(2026, 5, 1, tzinfo=UTC), datetime(2026, 6, 1, tzinfo=UTC))
    export_for_tenant("tenant_acme", period)
    body = s3.get_object(Bucket="audit-artifacts",
                         Key="soc2/tenant_acme/2026-05/CC6.3.csv")["Body"].read()
    manifest = json.loads(s3.get_object(
        Bucket="audit-artifacts",
        Key="soc2/tenant_acme/2026-05/CC6.3.manifest.json")["Body"].read())
    assert hashlib.sha256(body).hexdigest() == manifest["csv_sha256"]
    rows = list(csv.DictReader(io.StringIO(body.decode())))
    # No actor seeded for the other tenant may appear in tenant_acme's evidence.
    assert not any(r["actor_id"] in OTHER_TENANT_ACTORS for r in rows)
    assert manifest["rows"] == len(rows)
```
After a passing run, print the verified chain head and record it in the audit binder. Because each event's hash folds in the hash before it, the auditor can later recompute the chain up to that head and confirm that no event inside the attested window was inserted, edited, or deleted after export:
```python
print(manifest["verified_chain_head"])
# 9f2c1a... (this_hash of the window's last event at export time)
```
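Recorded heads also let the auditor check that consecutive windows link up. A minimal sketch, assuming the binder (or the next export) also captures each window's anchor, the stored prev_hash of its first event; check_continuity and the entry layout are hypothetical. A window with no events carries the previous head forward.

```python
def check_continuity(binder):
    """binder: entries in month order, each a dict with "period", "anchor"
    (first event's stored prev_hash, or None if the window was empty), and
    "head" (the verified chain head, or None if the window was empty)."""
    problems = []
    carried = None
    for i, entry in enumerate(binder):
        # Every non-empty window after the first must start where the last one ended.
        if i > 0 and entry["anchor"] is not None and entry["anchor"] != carried:
            problems.append(f"{entry['period']}: anchor does not match prior head")
        if entry["head"] is not None:
            carried = entry["head"]
    return problems
```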
Failure Modes & Gotchas
- Missing tenant predicate in the export query. Symptom: an artifact for one customer contains another customer's actor IDs. Root cause: an ad-hoc query written without the tenant_id bind. Fix: route every export through the parameterized run_control_query helper and forbid raw SQL in the generator.
- Chain head not pinned to the window. Symptom: an auditor cannot tell whether events were back-dated into a closed period. Root cause: the manifest stores only a CSV hash, not the verified chain head. Fix: persist verified_chain_head and the row range so the chain is anchored to the exact attested boundary.
- Soft-deleted rows silently dropped. Symptom: a revocation that happened during the period is absent from the CC6.3 artifact. Root cause: the export filters on a deleted_at IS NULL flag inherited from operational queries. Fix: audit events are append-only, so never apply soft-delete predicates to the evidence path.
- Time zone mix-up at the period boundary. Symptom: an event near midnight UTC lands in the wrong month's artifact and appears missing. Root cause: mixing local time and UTC in created_at. Fix: store and compare all timestamps in UTC and use half-open [start, end) windows.
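The boundary failure is easiest to see with a concrete timestamp. A minimal sketch; month_key and in_window are hypothetical helpers that refuse naive timestamps and bucket events by their UTC instant.

```python
from datetime import datetime, timedelta, timezone


def month_key(created_at: datetime) -> str:
    """Bucket an event into its UTC calendar month, e.g. "2026-06"."""
    if created_at.tzinfo is None:
        # A naive timestamp could be local time or UTC; refuse to guess.
        raise ValueError("created_at must be timezone-aware")
    return created_at.astimezone(timezone.utc).strftime("%Y-%m")


def in_window(created_at: datetime, start: datetime, end: datetime) -> bool:
    """Half-open [start, end) membership; aware datetimes compare by instant."""
    return start <= created_at < end


# Example: 21:30 on May 31 in UTC-5 is 02:30 on June 1 in UTC.
late_may_local = datetime(2026, 5, 31, 21, 30, tzinfo=timezone(timedelta(hours=-5)))
print(month_key(late_may_local))  # 2026-06
```

Bucketing that event by its local wall-clock date would put it in May's artifact, while the UTC window query returns it in June's, which is exactly the "appears missing" symptom.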
FAQ
Can I generate one company-wide SOC 2 report instead of per-tenant artifacts?
Your SOC 2 Type II report is company-wide and covers the platform as a whole, but enterprise customers increasingly request evidence scoped to their own tenant during procurement or their own audits. Per-tenant artifacts reuse the same controls and queries with a tenant_id filter, so generating them is incremental work once the company-wide pipeline exists.
Do I need a hash chain, or is an append-only table enough?
An append-only table with restricted write grants raises the bar, but it does not by itself prove that a privileged operator did not edit a row. The hash chain lets you detect any insertion, edit, or deletion after the fact and produce a single verifiable chain head as evidence, provided that head is recorded outside the database (in the manifest and the audit binder). That is exactly the integrity assurance SOC 2's reliability requirement is looking for.
How does this relate to a data subject's right to erasure?
Erasure under privacy law and audit immutability pull in opposite directions, so reconcile them deliberately rather than per-export. The pattern is to redact the personal fields in the diff while preserving the event's hash inputs, which is covered in detail under GDPR data subject requests.