TTL-First Cache Table Patterns (FaceTheory ISR)

This document describes operationally safe patterns for TTL-based cache metadata tables used for ISR:

how to choose a TTL attribute and window
how to separate “freshness” from “expiry”
how to account for DynamoDB TTL deletion lag
operational recommendations to avoid surprises in production

Schema context: docs/facetheory/isr-cache-schema.md.

Choose a TTL attribute

Recommended:

Use a single numeric attribute named ttl that stores epoch seconds.
Treat it as garbage collection, not as correctness logic.

Why:

TTL is asynchronous and can delete later than the timestamp you set.
A stable ttl attribute name keeps Go/TypeScript/Python models aligned.
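
Because ttl stores epoch seconds, a common slip is writing milliseconds instead; such a value would be read as a date far in the future, so the item would effectively never become eligible for deletion. The sketch below shows one possible guard. The helper names and the cutoff constant are assumptions for illustration, not part of the FaceTheory schema.

```go
package main

import (
	"fmt"
	"time"
)

// ttlEpochSeconds converts an expiry time to the epoch-seconds value
// stored in the ttl attribute. Hypothetical helper name.
func ttlEpochSeconds(expiry time.Time) int64 {
	return expiry.Unix()
}

// looksLikeEpochSeconds is a heuristic sanity check that rejects
// non-positive values and values large enough to be milliseconds.
func looksLikeEpochSeconds(v int64) bool {
	// Assumption: 1e11 seconds is roughly the year 5138, far beyond any
	// realistic retention, while current epoch milliseconds exceed 1e12.
	const maxPlausibleSeconds = 100_000_000_000
	return v > 0 && v < maxPlausibleSeconds
}

func main() {
	expiry := time.Now().Add(24 * time.Hour)
	fmt.Println("seconds ok:", looksLikeEpochSeconds(ttlEpochSeconds(expiry)))
	fmt.Println("millis ok: ", looksLikeEpochSeconds(expiry.UnixMilli()))
}
```

A check like this fits best at the write path, where a bad value can be rejected before it reaches the table.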

Freshness vs expiry (two clocks)

Use two separate concepts:

Freshness window (correctness boundary):
- fresh_until = generated_at + revalidate_seconds
Expiry / GC horizon (storage boundary):
- ttl = generated_at + retention_seconds (+ safety_buffer)

Rules of thumb:

revalidate_seconds is usually minutes to hours.
retention_seconds is usually days to weeks (enough for debuggability and rollback).
retention_seconds SHOULD be much larger than revalidate_seconds, so that ttl falls well after fresh_until.
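
The two clocks above can be sketched as one pure function that derives both boundaries from generated_at. This is a minimal illustration rather than the FaceTheory implementation: the type cacheClocks, the function computeClocks, and the specific durations in main are assumptions chosen for the example.

```go
package main

import "fmt"

// cacheClocks holds the two independent boundaries for one cache entry.
// Hypothetical type for illustration; all values are epoch seconds.
type cacheClocks struct {
	FreshUntil int64 // correctness boundary
	TTL        int64 // storage / GC horizon
}

// computeClocks derives both boundaries from the generation time.
// All duration arguments are in seconds.
func computeClocks(generatedAt, revalidateSeconds, retentionSeconds, safetyBuffer int64) cacheClocks {
	return cacheClocks{
		FreshUntil: generatedAt + revalidateSeconds,
		TTL:        generatedAt + retentionSeconds + safetyBuffer,
	}
}

func main() {
	const (
		revalidateSeconds = 5 * 60           // 5 minutes of freshness
		retentionSeconds  = 7 * 24 * 60 * 60 // 7 days of retention
		safetyBuffer      = 24 * 60 * 60     // 1 extra day (assumed value)
	)
	c := computeClocks(1_700_000_000, revalidateSeconds, retentionSeconds, safetyBuffer)
	fmt.Printf("fresh_until=%d ttl=%d gap=%ds\n", c.FreshUntil, c.TTL, c.TTL-c.FreshUntil)
}
```

Keeping the two values in one struct makes it harder to reuse ttl as a freshness check by accident, since each field has a single stated purpose.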

DynamoDB TTL deletion lag

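TTL deletion is a best-effort background process; items can remain after their ttl passes (AWS documents deletion as typically happening within a few days of expiry). Until they are removed, expired items still appear in reads, queries, and scans.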

Implications:

✅ Treat ttl as “eligible for deletion”, not “will be deleted immediately”.
✅ Readers MUST decide fresh/stale based on generated_at and revalidate_seconds.
❌ Never treat “item missing” as “item never existed”; TTL may have removed it at an unpredictable time after its ttl passed.
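
The reader rules above can be sketched as follows. Freshness is decided only from generated_at and revalidate_seconds, while a separate check reports whether an item has outlived its ttl and is merely waiting for deletion. The type and function names (cacheMeta, classify, pastTTL) are hypothetical and mirror only the fields this document mentions.

```go
package main

import "fmt"

type entryState string

const (
	stateFresh entryState = "fresh"
	stateStale entryState = "stale"
)

// cacheMeta mirrors only the fields needed for the decision.
// Hypothetical type; all timestamps are epoch seconds.
type cacheMeta struct {
	GeneratedAt       int64
	RevalidateSeconds int64
	TTL               int64
}

// classify decides fresh/stale from generated_at and revalidate_seconds.
// It deliberately ignores TTL, which is a GC hint rather than correctness logic.
func classify(m cacheMeta, nowUnix int64) entryState {
	if nowUnix < m.GeneratedAt+m.RevalidateSeconds {
		return stateFresh
	}
	return stateStale
}

// pastTTL reports whether the item is eligible for deletion but was still
// returned by a read. Intended for logging or metrics only.
func pastTTL(m cacheMeta, nowUnix int64) bool {
	return m.TTL > 0 && nowUnix > m.TTL
}

func main() {
	m := cacheMeta{GeneratedAt: 1000, RevalidateSeconds: 60, TTL: 2000}
	for _, now := range []int64{1030, 1500, 5000} {
		fmt.Printf("now=%d state=%s pastTTL=%t\n", now, classify(m, now), pastTTL(m, now))
	}
}
```

An item read after its ttl is simply stale under this scheme, so deletion lag changes storage usage but not the reader's decision.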

Operational recommendations

Hot partitions: avoid putting raw URL paths directly in pk; use a stable hash to reduce common-prefix hotspots.
Capacity: for ISR tables, on-demand billing is often simplest; watch ThrottledRequests and adjust.
Alarms: alert on read/write throttles, elevated UserErrors, and high latency; consider tracking TimeToLiveDeletedItemCount as a sanity signal (not a correctness signal).
Retention: choose a retention long enough for investigations (and long enough to survive TTL lag), but not so long that storage cost grows unbounded.
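
For the hot-partition recommendation above, one way to derive a stable hash is sketched below. The key layout follows the example PK used later in this document ("TENANT#t1#CACHE#abc"); the choice of SHA-256, the 16-byte truncation, and the function name cachePK are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cachePK builds a partition key from a tenant ID and a raw URL path.
// The path is hashed so that keys sharing a long common prefix
// (for example "/blog/2024/...") no longer sort or cluster together.
func cachePK(tenantID, urlPath string) string {
	sum := sha256.Sum256([]byte(urlPath))
	// Assumption: 16 bytes (32 hex characters) is enough to make
	// collisions negligible for this sketch.
	return "TENANT#" + tenantID + "#CACHE#" + hex.EncodeToString(sum[:16])
}

func main() {
	fmt.Println(cachePK("t1", "/blog/2024/hello-world"))
	fmt.Println(cachePK("t1", "/blog/2024/hello-world-2"))
}
```

The hash is deterministic, so writers and readers compute the same key without coordination. Any path normalization (trailing slashes, query strings) would need to happen before hashing, since different inputs produce unrelated keys.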

Example: write metadata with TTL, then read and decide fresh/stale

This example uses the META row pattern (sk = "META").

Write

Compute generated_at = now_unix.
Set ttl = generated_at + retention_seconds (plus a safety buffer, if you use one).
Store revalidate_seconds alongside the metadata.

Read

Fetch the META item by primary key.
Compute fresh_until = generated_at + revalidate_seconds.
Consider it fresh iff now_unix < fresh_until.

Go example

// Write: record when the page was generated and set both clocks.
nowUnix := time.Now().Unix()

meta := &FaceTheoryCacheMetadata{
	PK:                "TENANT#t1#CACHE#abc",
	SK:                "META",
	S3Key:             "pages/t1/abc.html",
	GeneratedAt:       nowUnix,
	RevalidateSeconds: 60,              // freshness window
	TTL:               nowUnix + 86400, // GC horizon (1 day)
}

if err := db.Model(meta).CreateOrUpdate(); err != nil {
	return err
}

// Read: fetch the META item by its full primary key (PK + SK).
var got FaceTheoryCacheMetadata
if err := db.Model(&FaceTheoryCacheMetadata{}).
	Where("PK", "=", meta.PK).
	Where("SK", "=", "META").
	First(&got); err != nil {
	return err
}

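// Decide fresh/stale from generated_at and revalidate_seconds only;
// ttl is a GC hint and plays no part in this check.
freshUntil := got.GeneratedAt + got.RevalidateSeconds
isFresh := time.Now().Unix() < freshUntil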