openberth-infra · operator's glance · approved

A datacenter /
by scroll.

One VM per workspace. Containers inside. A four-state lifecycle, a handful of verbs, and one Postgres to hold it all — the control plane, at a glance.

4 states · 4 flavors · 7 tables · gRPC only · mTLS agents · 24h RPO · 4 specs · 04-18 → 04-19

Chapter I · lifecycle

Workspaces & Lifecycle.

approved 2026-04-18 · flavors · states · transitions · durations · covers 01 → 06

Each workspace is one VM. Applications live inside it as gVisor containers. Four persistent states, four transient markers — and no payment_failed to be seen.

01 Flavors

fixed tiers · one escape hatch

Four sizes, arbitrary headroom.

A flavor defines the VM's resource envelope. App count is a marketing approximation — not a contract. Sizing: 1 GB floor + apps × observed footprint + build headroom.

flavor	vcpu	ram	disk	~apps	~static
Hobby	2	4 GB	25 GB	5	40
Pro	4	8 GB	50 GB	10	100
Team	8	20 GB	100 GB	25	300
Custom	∞ · sales	neg.	neg.	n/a	n/a

Trial isn't a flavor — it's a Hobby VM with upstream metadata. maxDeploys is a safety valve, not a sold quantity. Ceilings: Hobby 500 · Pro 1000 · Team 2000.
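
The sizing rule above (1 GB floor + apps × observed footprint + build headroom) can be sketched as a small calculation. This is a minimal illustration, not the sizing tool itself: the function names are hypothetical, and the footprint and headroom figures a caller passes in are assumptions that would come from real measurements. RAM envelopes are taken from the flavor values listed above.

```python
# RAM envelope per fixed flavor, in MB (from the flavor list above).
FLAVOR_RAM_MB = {"hobby": 4 * 1024, "pro": 8 * 1024, "team": 20 * 1024}

FLOOR_MB = 1024  # the 1 GB floor from the sizing rule


def required_ram_mb(apps, footprint_mb, build_headroom_mb):
    """1 GB floor + apps x observed footprint + build headroom."""
    return FLOOR_MB + apps * footprint_mb + build_headroom_mb


def smallest_flavor(apps, footprint_mb, build_headroom_mb):
    """Smallest fixed flavor whose RAM covers the estimate, else 'custom'."""
    need = required_ram_mb(apps, footprint_mb, build_headroom_mb)
    for name, ram in sorted(FLAVOR_RAM_MB.items(), key=lambda kv: kv[1]):
        if need <= ram:
            return name
    return "custom"  # the escape hatch: negotiated with sales
```

With a hypothetical 400 MB footprint and 1 GB of build headroom, five apps need 4048 MB and fit Hobby, which is consistent with the ~5 app approximation for that tier.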

02 Lifecycle

state machine

active → suspended → archived → deleted.

Four persistent states; four transient markers — suspending, archiving, restoring, deleting. active covers everything running — trialing, paying, overdue-in-grace. Infra is blind to payment_failed.

active (persistent) → suspended (persistent) → archived (persistent) → deleted (terminal)

state	vm	local disk	snapshot	url response
active	running	attached	periodic 24h	openberth serves
suspended	powered off	retained	last periodic	edge: suspended page
archived	destroyed	wiped	fresh, object storage	edge: archived page
deleted	gone	gone	purged	404 / tombstone
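
The state machine can be written down as a lookup table. The sketch below is one possible encoding, not the control plane's code. The verbs, destinations, and transient markers come from this section; the allowed source states for archive and delete are assumptions (the text only confirms that restore is legal from suspended and archived, and illegal on active).

```python
# verb -> (allowed source states, destination state, transient marker)
TRANSITIONS = {
    "suspend": ({"active"}, "suspended", "suspending"),
    "archive": ({"suspended"}, "archived", "archiving"),  # source set assumed
    "restore": ({"suspended", "archived"}, "active", "restoring"),
    "delete": ({"active", "suspended", "archived"}, "deleted", "deleting"),  # assumed
}


class IllegalTransition(Exception):
    """Raised when a verb is not valid from the current persistent state."""


def destination(state, verb):
    """Return the persistent state a verb leads to, or raise if it is illegal."""
    sources, dest, _marker = TRANSITIONS[verb]
    if state not in sources:
        raise IllegalTransition(f"{verb} is not allowed from {state}")
    return dest
```

Note that deleted never appears as a source state, which is what makes it terminal.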

03 Transition API

four verbs · idempotent

Same request_id, same outcome.

All transitions accept a client-supplied request_id — same id, same outcome, no duplicate work. Conflicting in-flight transitions return 409. We do not queue, cancel-forward, or auto-retry.

POST /workspaces/{id}:suspend

POST /workspaces/{id}:archive

POST /workspaces/{id}:restore

POST /workspaces/{id}:delete

GET /workspaces/{id} → state · transient-op flag · ETA

archive is snapshot-then-wipe. Snapshot uploaded and checksum-verified before local disk is wiped. Mid-flight failure rolls back to suspended; partial artifacts cleaned up.

delete verifies purge completeness. Not marked deleted until VM gone, disk wiped, snapshots purged, row pseudonymized, audit-log entry written.

restore is scheduled. Picks a same-region host with capacity, pulls snapshot if from archived, boots, waits for healthcheck. Failure returns to prior state — no transient limbo.

409 on conflict. Caller retries after the transient op completes. Infra never silently blocks.
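
The request_id and 409 rules can be illustrated with an in-memory model. This is a sketch under stated assumptions: TransitionAPI, submit, and finish are hypothetical names, state lives in dictionaries rather than Postgres, and a replay with the same request_id but a different verb is not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    workspace_id: str
    verb: str
    request_id: str
    status: str = "running"  # stays running until finish() is called


@dataclass
class TransitionAPI:
    by_request: dict = field(default_factory=dict)  # (workspace_id, request_id) -> Operation
    in_flight: dict = field(default_factory=dict)   # workspace_id -> Operation

    def submit(self, workspace_id, verb, request_id):
        """Return (http_status, operation) for a transition request."""
        key = (workspace_id, request_id)
        if key in self.by_request:
            # Same request_id: same outcome, no duplicate work.
            op = self.by_request[key]
            return (200 if op.status == "ok" else 202), op
        if workspace_id in self.in_flight:
            # A different transition is already running: refuse, never queue.
            return 409, self.in_flight[workspace_id]
        op = Operation(workspace_id, verb, request_id)
        self.by_request[key] = op
        self.in_flight[workspace_id] = op
        return 202, op

    def finish(self, op):
        """Mark an operation durably complete and free the workspace."""
        op.status = "ok"
        self.in_flight.pop(op.workspace_id, None)
```

A caller that receives 409 waits for the in-flight operation to reach a terminal status and then resubmits; the model does nothing on the caller's behalf.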

04 Durations

operation timing

Archive is minutes, not instant.

Durations are published so the frontpage can size its policy timers realistically. The bars below are log-scaled against a 30-minute ceiling.

suspend · 10s · max 2m

Upstream owns when. Infra owns how.

Upstream owns when. Grace periods, trial expiries, retention windows, promotional extensions. The frontpage calls the transition API at the moment each transition should happen.

Infra owns how. Atomicity, ordering, verification, idempotency, audit trail, rollback. 200 = durably complete. 202 = in progress; poll.

Durations constrain policy. Archive is minutes, not instant. Upstream timers must account for this.

Account-deletion execution is ours. Decision upstream; VM-stop + disk-wipe + snapshot-purge + audit entry are ours. GDPR completeness is our obligation.

Billing records are not in our schema. We keep an infra audit log for incident post-mortems; PII pseudonymized on delete.
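
The 200 / 202 contract implies a polling loop on the caller's side. The following is a minimal sketch, assuming a hypothetical fetch callable that returns (status_code, body) for the transition being watched; the timeout would be sized from the published durations. The sleep and clock parameters exist only so the loop can be exercised without waiting.

```python
import time


def wait_for_transition(fetch, timeout_s, interval_s=1.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the transition is durably complete (200) or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        code, body = fetch()
        if code == 200:
            return body  # durably complete
        if code != 202:
            raise RuntimeError(f"unexpected status {code}")
        if clock() >= deadline:
            raise TimeoutError("transition still in progress")
        sleep(interval_s)  # 202: in progress, poll again
```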

06 Edge routing

during non-active states

The edge answers without the VM.

A host-level edge proxy (Caddy with a state-aware config, reloaded on every transition) knows the state of every workspace on its host. DNS is not manipulated per state. For archived workspaces whose VM has been destroyed, the edge retains a routing rule sufficient to serve the tombstone; restore may land on a different host — routing updates to point at the new host.

state	edge response	notes
suspended	edge → static page	tap down · disk retained
archived	edge → static page	slot freed · snapshot in S3
deleted	edge → 404 tombstone	row pseudonymized
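
A state-aware edge config boils down to a per-domain lookup that is regenerated on every transition. The sketch below models only that lookup; it is not Caddy configuration, and render_routes and its input shape are hypothetical.

```python
# state -> (who answers, what is served), following the state table in section 02
EDGE_BEHAVIOR = {
    "active": ("vm", "proxy to openberth"),
    "suspended": ("edge", "static suspended page"),
    "archived": ("edge", "static archived page"),
    "deleted": ("edge", "404 tombstone"),
}


def render_routes(workspaces):
    """Build domain -> behavior from {workspace_id: (state, [domains])}."""
    routes = {}
    for _workspace_id, (state, domains) in workspaces.items():
        for domain in domains:
            routes[domain] = EDGE_BEHAVIOR[state]
    return routes
```

Because the whole table is rebuilt from workspace state, a reload after each transition keeps the edge consistent without touching DNS.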

Chapter II · data

Postgres Schema.

approved 2026-04-19 · tables · encoding · scheduling · covers 07 → 09

Durable state in one database. A River-style job runner lives in-process; our tables own the logical record, River's own the execution mechanics.

07 Schema

postgres · 2026-04-19

All state, one database.

Long-running transitions are driven by an in-process River-style job runner — no separate workflow engine. Our tables own workspace state and operation records; River's tables own execution mechanics. They join via operations.river_job_id.

regions · catalog

◆ id	text · "us-west-1"
display_name	text
state	active / retiring
created_at	timestamptz

hosts · bare-metal

◆ id	uuid
→ region_id	regions
fqdn	text · unique
total_vcpu / ram / disk	int · int · int
state	healthy / draining / retired
last_heartbeat_at	timestamptz

workspaces · main entity

◆ id	uuid
external_workspace_id	text
external_user_id	text
display_name	text
→ region_id / host_id	regions / hosts
flavor	hobby · pro · team · custom
vcpu / ram_gb / disk_gb	envelope
state	active · suspended · archived · deleted
→ current_operation_id	operations (deferred)
snapshot_interval_seconds	default 86400
snapshot_retention_count	default 30

workspace_domains · edge routing

◆ id	uuid
→ workspace_id	workspaces
domain	text · unique
kind	default_subdomain / custom
cert_state	pending · issued · failed · expired
verified_at / cert_issued_at	timestamptz

operations · transitions

◆ id	uuid
→ workspace_id	workspaces
verb	suspend · archive · restore · delete
request_id	UNIQUE(ws, req)
status	pending · running · ok · failed · rolled_back
step_state	jsonb · resumable
river_job_id	bigint
requested/started/completed_at	timestamptz

snapshots · object store

◆ id	uuid
→ workspace_id	workspaces
object_uri	text
size_bytes / checksum	bigint · text
tool	kopia · restic · qemu-img
kind	periodic · pre_archive
verified_at	once checksum verified

audit_log · append-only · survives delete

◆ id	bigserial
workspace_id	nullable
former_workspace_id	uuid · set after delete
event_type	e.g. transition.archive.succeeded
event_data	jsonb · no PII
actor	system · api:<caller> · admin:<user>

Legend · ◆ primary key · → foreign key · pii: scrubbed atomically on delete.

08 State encoding

destination vs. in-flight

Never archiving in the state column.

workspaces.state only holds persistent destinations. Transient verbs live on operations.status. A denormalized pointer current_operation_id gives readers O(1) access to "is this workspace busy?" without scanning operations.

workspaces.state: active (persistent destination)
current_operation_id → operations row
operations.status: running · verb=archive (source of truth for in-flight)

Single transaction, two writes. current_operation_id and operations.status update atomically — never one without the other.

Idempotency at DB level. UNIQUE (workspace_id, request_id) catches double-submits in Postgres, not app code.
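
Both rules can be tried out against a throwaway database. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres, with a cut-down schema that keeps only the columns involved; begin_operation is a hypothetical helper, not part of the spec.

```python
import sqlite3

SCHEMA = """
CREATE TABLE workspaces (
    id TEXT PRIMARY KEY,
    state TEXT NOT NULL,
    current_operation_id TEXT
);
CREATE TABLE operations (
    id TEXT PRIMARY KEY,
    workspace_id TEXT NOT NULL REFERENCES workspaces(id),
    verb TEXT NOT NULL,
    request_id TEXT NOT NULL,
    status TEXT NOT NULL,
    UNIQUE (workspace_id, request_id)
);
"""


def begin_operation(db, op_id, workspace_id, verb, request_id):
    """Write the operations row and the pointer together; return the owning op id."""
    try:
        with db:  # one transaction: commit on success, roll back on any error
            db.execute(
                "INSERT INTO operations VALUES (?, ?, ?, ?, 'running')",
                (op_id, workspace_id, verb, request_id),
            )
            db.execute(
                "UPDATE workspaces SET current_operation_id = ? WHERE id = ?",
                (op_id, workspace_id),
            )
        return op_id
    except sqlite3.IntegrityError:
        # A double-submit tripped the UNIQUE constraint: return the original op.
        row = db.execute(
            "SELECT id FROM operations WHERE workspace_id = ? AND request_id = ?",
            (workspace_id, request_id),
        ).fetchone()
        if row is None:
            raise  # some other constraint failed; not a replay
        return row[0]
```

The second submit never reaches the UPDATE, so the pointer still refers to the first operation.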

Resumable workers. step_state JSONB carries crash-safe progress. Example for archive: {snapshot_id, uploaded:true, verified:false, step:"verifying"}.

On delete: PII scrub + state change in ONE UPDATE. Operations rows are deleted. audit_log rows have workspace_id copied to former_workspace_id, then set to NULL.
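
A resumable worker driven by step_state might look like the sketch below. Only the step name "verifying" and the example payload come from the text above; the other step names, the actions mapping, and the save callback are hypothetical, and each action is assumed to be safe to repeat after a crash.

```python
# Hypothetical step order for archive; only "verifying" is named in the spec.
ARCHIVE_STEPS = ["snapshotting", "uploading", "verifying", "wiping", "done"]


def run_archive(step_state, actions, save):
    """Resume archive at step_state["step"], persisting progress after each step."""
    start = ARCHIVE_STEPS.index(step_state.get("step", ARCHIVE_STEPS[0]))
    remaining = zip(ARCHIVE_STEPS[start:], ARCHIVE_STEPS[start + 1:])
    for current, following in remaining:
        actions[current](step_state)     # do the work for this step
        step_state["step"] = following   # record where to resume next time
        save(dict(step_state))           # e.g. write the JSONB column
    return step_state
```

Starting from the example payload, the worker skips snapshotting and uploading, runs verification, and wipes only after verification has been recorded.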

09 Capacity

scheduling query

Allocated capacity is derived, not stored.

No materialized allocated_* columns on hosts. Small fleet, cheap query, zero drift risk. Suspended workspaces still count — their disk is on the host.

-- Find one host in region $1 with room for $2 vcpu / $3 GB ram / $4 GB disk.
-- Allocation is derived per host from the workspaces that still occupy it
-- (active or suspended); archived and deleted workspaces hold no slot.
SELECT h.id
FROM hosts h
WHERE h.region_id = $1
  AND h.state = 'healthy'
  AND h.total_vcpu    - COALESCE((SELECT SUM(w.vcpu)    FROM workspaces w WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $2
  AND h.total_ram_gb  - COALESCE((SELECT SUM(w.ram_gb)  FROM workspaces w WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $3
  AND h.total_disk_gb - COALESCE((SELECT SUM(w.disk_gb) FROM workspaces w WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $4
LIMIT 1
FOR UPDATE OF h SKIP LOCKED;  -- lock only the chosen hosts row

FOR UPDATE SKIP LOCKED lets concurrent schedulers pick different hosts. The workspace INSERT in the same transaction reserves the slot.
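
The derivation is easy to check outside the database. The sketch below repeats the query's arithmetic over in-memory records; the dataclasses and pick_host are hypothetical, and row locking is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Host:
    id: str
    region_id: str
    state: str
    total_vcpu: int
    total_ram_gb: int
    total_disk_gb: int


@dataclass
class Workspace:
    host_id: str
    state: str
    vcpu: int
    ram_gb: int
    disk_gb: int


COUNTED = {"active", "suspended"}  # suspended workspaces still hold their disk


def free_capacity(host, workspaces):
    """Total minus everything allocated to counted workspaces on this host."""
    used = [w for w in workspaces if w.host_id == host.id and w.state in COUNTED]
    return (
        host.total_vcpu - sum(w.vcpu for w in used),
        host.total_ram_gb - sum(w.ram_gb for w in used),
        host.total_disk_gb - sum(w.disk_gb for w in used),
    )


def pick_host(hosts, workspaces, region_id, vcpu, ram_gb, disk_gb):
    """Return the id of the first healthy host in the region with room, else None."""
    for host in hosts:
        if host.region_id != region_id or host.state != "healthy":
            continue
        free_vcpu, free_ram, free_disk = free_capacity(host, workspaces)
        if free_vcpu >= vcpu and free_ram >= ram_gb and free_disk >= disk_gb:
            return host.id
    return None
```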

Chapter III · surface

Control Plane API.

approved 2026-04-19 · grpc · bearer · async · errors · obs · covers 10 → 12

Consumed only by backends. Every mutating RPC returns an Operation immediately; callers poll. A closed, versioned reason set lets clients branch.

10 Protocol

control plane api · 2026-04-19

gRPC only. No REST transcoding.

All consumers are backends — frontpage, admin tooling, host agents. No browsers, no third-party integrations. One schema/tooling stack north- and southbound. grpcurl covers terminal debugging. A single WorkspaceService hosts every RPC; admin RPCs are gated by credential scope.

bearer tokens over TLS
scope: standard · admin
perimeter: subnets, then tokens
rotation: manual · MVP
upgrade: → mTLS without schema change

Logging contract: the authorization metadata key MUST be redacted at every log site. Enforced via middleware; unit-tested.
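
The redaction rule is small enough to show whole. This is a sketch of the function such middleware could call before anything is logged; redact_metadata and the placeholder text are hypothetical, and metadata is modelled as a list of (key, value) pairs.

```python
REDACTED = "[REDACTED]"
SENSITIVE_KEYS = {"authorization"}


def redact_metadata(metadata):
    """Return a copy of the metadata pairs with sensitive values replaced."""
    return [
        (key, REDACTED if key.lower() in SENSITIVE_KEYS else value)
        for key, value in metadata
    ]
```

Logging the returned copy rather than the original keeps the bearer token out of every log site that goes through the helper.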

11 Verbs

one service · typed rpcs

Return an Operation. Callers poll.

Every mutating RPC returns an Operation immediately. No server-streaming progress in v1. request_id lives on the request message (not metadata) — DB enforces idempotency via UNIQUE (workspace_id, request_id).

Workspace lifecycle

rpc	notes	scope
CreateWorkspace	pick host · schedule · provision	standard
GetWorkspace	includes in-flight via current_operation_id	standard
ListWorkspaces	cursor · filter region, state, user, flavor	standard
UpdateWorkspace	snapshot policy · display_name · external ids	standard
SuspendWorkspace	→ Operation	standard
ArchiveWorkspace	snapshot-then-wipe	standard
RestoreWorkspace	scheduled onto any host with capacity	standard
DeleteWorkspace	GDPR-grade purge	standard
ResizeWorkspace	in-place cgroup OR archive+restore migration	standard

Domain / routing

rpc	notes	scope
AddCustomDomain	→ WorkspaceDomain	standard
RemoveCustomDomain	→ Empty	standard
ListDomains	cursor-paginated	standard

Operations

rpc	notes	scope
GetOperation	terminal status · step_state · error	standard
ListOperations	filter workspace · status · verb	standard

Fleet

rpc	notes	scope
RegisterHost	add bare-metal to fleet	admin
DrainHost	stop scheduling · migrate workspaces off	admin
RetireHost	hide from scheduler queries	admin
GetHost	derived capacity · heartbeat	admin
ListHosts	cursor-paginated	admin

import "google/protobuf/timestamp.proto";  // needed for google.protobuf.Timestamp

// OperationVerb and OperationStatus are enums defined elsewhere in the schema.
message Operation {
  string                    id           = 1;
  string                    workspace_id = 2;
  OperationVerb             verb         = 3;  // CREATE · SUSPEND · ARCHIVE · RESTORE · DELETE · RESIZE
  OperationStatus           status       = 4;  // PENDING · RUNNING · SUCCEEDED · FAILED · ROLLED_BACK
  map<string,string>        step_state   = 5;
  string                    error        = 6;
  google.protobuf.Timestamp requested_at = 7;
  google.protobuf.Timestamp started_at   = 8;
  google.protobuf.Timestamp completed_at = 9;
}

12 Errors · Obs

status codes · cursors · metrics

A closed, versioned reason set.

Domain semantics ride on google.rpc.ErrorInfo. reason values are a closed, versioned set — clients branch on them. New reasons are additive; existing reasons never change meaning.

NOT_FOUND

Referenced workspace / operation / host does not exist.

ALREADY_EXISTS

Idempotent resubmit with divergent payload, or external_workspace_id collision.

FAILED_PRECONDITION

Illegal state transition (e.g. Restore on active).

ABORTED

Conflicting in-flight op. ErrorInfo.metadata.current_operation_id set. Retry after terminal.

RESOURCE_EXHAUSTED

No host capacity in region for requested flavor.

INVALID_ARGUMENT

Malformed flavor / domain / region.

UNAUTHENTICATED

Missing or malformed bearer token.

PERMISSION_DENIED

Token scope insufficient for the RPC.

UNAVAILABLE

Temporary server issue; retry with backoff.

INTERNAL

Programming bug; not retryable.
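
A client can turn the list above into a small decision function. The sketch below covers only what the descriptions state: UNAVAILABLE is retried with backoff, ABORTED waits on the operation named in ErrorInfo metadata, and everything else is handed back to the caller. The Action names are hypothetical, and treating every remaining code as non-retryable is an assumption.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    WAIT_FOR_OPERATION = "wait_for_operation"
    SURFACE_TO_CALLER = "surface_to_caller"


def next_action(code: str, metadata: dict) -> Tuple[Action, Optional[str]]:
    """Map a gRPC status code name plus ErrorInfo metadata to a client action."""
    if code == "UNAVAILABLE":
        return Action.RETRY_WITH_BACKOFF, None
    if code == "ABORTED":
        # Retry only after the conflicting operation reaches a terminal status.
        return Action.WAIT_FOR_OPERATION, metadata.get("current_operation_id")
    return Action.SURFACE_TO_CALLER, None
```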

Pagination

Cursor-based on every List* RPC. Opaque base64 of (last_id, ordering_key). Forward-only; stable under concurrent mutation. No "go to page N". No total count.

WHERE id > :last_id ORDER BY id LIMIT :page_size   -- default 50, max 500
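
The cursor scheme can be sketched end to end over an in-memory list. The spec only says the cursor is opaque base64 of (last_id, ordering_key); serializing that pair as JSON before encoding is an assumption, as are the function names.

```python
import base64
import json


def encode_cursor(last_id, ordering_key):
    """Pack (last_id, ordering_key) into an opaque URL-safe string."""
    raw = json.dumps([last_id, ordering_key]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    """Inverse of encode_cursor."""
    last_id, ordering_key = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return last_id, ordering_key


def list_page(rows, cursor=None, page_size=50):
    """Forward-only page: rows with id > last_id, ordered by id."""
    page_size = max(1, min(page_size, 500))  # default 50, max 500
    last_id = decode_cursor(cursor)[0] if cursor else None
    ordered = sorted(rows, key=lambda row: row["id"])
    page = [row for row in ordered if last_id is None or row["id"] > last_id][:page_size]
    # A short page means the end was reached; otherwise hand back a cursor.
    next_cursor = encode_cursor(page[-1]["id"], "id") if len(page) == page_size else None
    return page, next_cursor
```

Because each page is defined by "id greater than the last one seen", rows inserted or deleted between calls cannot shift the window the way offset paging would.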

Observability · :9090/metrics

grpc_requests_total {rpc, code}
grpc_request_duration_seconds {rpc}
workspaces_total {state, region, flavor}
host_capacity_used_ratio {host_id, resource}
operation_duration_seconds {verb, status}
river_jobs_queued · in_progress · retries_total

OpenTelemetry spans propagate into River jobs — a single trace spans the full transition: API call → job start → step execution → completion.

Chapter IV · edge

Host Agent Contract.

approved 2026-04-19 · mtls · commands · events · reconcile · covers 13 → 15

One long-lived bidi gRPC session per host. Agents dial — no inbound holes. Semantic commands abstract qemu-img away from the plane.

13 Host agents

host agent contract · 2026-04-19

Agents dial. No inbound holes.

Each bare-metal host runs a long-lived daemon that owns everything local — hypervisor, snapshots, tap devices, edge proxy. The control plane issues semantic commands (ProvisionVM, TakeSnapshot) — never qemu-img. One long-lived bidi gRPC session per host. Hosts can sit behind NAT in different clouds; only the control plane needs inbound reachability.

control plane · WorkspaceService · mTLS
agent ca · 90-day certs · revocation list
agent a · host-01 · us-west-1
agent b · host-02 · eu-central-1
agent c · host-03 · us-east-1

Commands →

Control plane → agent dispatch. Ack confirms receipt and validation, not execution. Each command's command_id = operations.id.

Events ←

Agent → control plane telemetry. Monotonic seq per session, high-water-mark acks, durable replay buffer on reconnect.
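
The seq, ack, and replay mechanics on the agent side could be modelled as below. This is a sketch with hypothetical names: the buffer lives in memory rather than on disk, and it uses one running counter, so it does not show how sequence numbers restart in a new session.

```python
from collections import deque


class EventSender:
    """Agent-side outbox: numbered events kept until the control plane acks them."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = deque()  # (seq, payload), oldest first

    def emit(self, payload):
        """Assign the next sequence number and keep the event for replay."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.append((seq, payload))
        return seq

    def ack(self, high_water_mark):
        """Drop every event with seq <= the acknowledged high-water mark."""
        while self.unacked and self.unacked[0][0] <= high_water_mark:
            self.unacked.popleft()

    def replay(self):
        """Events to resend after a reconnect, in order."""
        return list(self.unacked)
```

A single high-water-mark ack releases every earlier event at once, so the control plane does not have to acknowledge events one by one.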

mTLS: from day one
bootstrap: token + enrollment URL
cert: CN=host_id · OU=region_id
rotate: at 60 days · 7-day overlap
heartbeat: 10s · stale @ 30s
reconnect: expo-backoff · cap 30s
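
The heartbeat and reconnect timings translate into two small helpers. The 10s interval, 30s staleness threshold, and 30s cap come from the line above; the 1-second base delay, plain doubling, and absence of jitter are assumptions made for the sketch.

```python
HEARTBEAT_INTERVAL_S = 10
STALE_AFTER_S = 30
BACKOFF_CAP_S = 30


def is_stale(last_heartbeat_at, now):
    """A host is stale once 30s pass without a heartbeat (three missed beats)."""
    return now - last_heartbeat_at >= STALE_AFTER_S


def reconnect_delay(attempt, base_s=1.0):
    """Exponential backoff for the given zero-based attempt, capped at 30s."""
    return min(BACKOFF_CAP_S, base_s * 2 ** attempt)
```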

14 Commands · Events

semantic verbs · structured telemetry

Two sub-streams, independent back-pressure.

A slow consumer on events can't stall command dispatch; a heavy command can't delay heartbeats. Both bidi RPCs ride the same mTLS connection.

Commands · control plane → agent

ProvisionVM

Create the VM. Inputs: workspace_id, envelope, network config, optional snapshot_uri. Outcome: VM running, openberth healthcheck passing.

StopVM

Graceful ACPI shutdown (default 60s timeout), then force-stop. For suspend. Leaves disk in place.

StartVM

Boot an existing stopped VM. For restore from suspended. Same healthcheck gate as provision.

DestroyVM

Stop, unmount, secure-wipe disk. For archive (after snapshot verified) and delete.

TakeSnapshot

Snapshot disk (quiesced if possible), upload to object storage, return URI + checksum + size. Kind: periodic or pre_archive.

RestoreFromSnapshot

Download snapshot, verify checksum, stage as new disk. Caller follows with ProvisionVM or StartVM.

UpdateEdgeRoute

Update host edge-proxy config for a workspace. State + domains + cert states. Atomic reload.

ResizeVM

In-place cgroup adjustments on a running VM. Migration path is archive + restore, not this command.

RotateAgentCert

Issue a new cert; old cert valid for 7 days. Triggered by agent at 60 days remaining.

Events · agent → control plane

Heartbeat

Every 10s. Free capacity (vcpu/ram/disk), uptime, agent version, counts of VMs by local state.

CommandProgress

Long-running command emits progress. command_id, step name, optional numeric progress (e.g. snapshot upload %).

CommandResult

Command terminates. command_id, status (succeeded/failed), payload or error.

WorkspaceLocalEvent

Unsolicited: VM crashed and was restarted, disk-usage threshold, custom-domain cert issued/failed/expired, healthcheck flapping.

AgentStopping

Planned maintenance shutdown. Carries reason.

Flow control: agent buffer caps at 10 000 events. On overflow, routine Heartbeat and CommandProgress drop first; CommandResult and WorkspaceLocalEvent are retained preferentially. On reconnect, Inventory + buffered events bring the control plane up to date.
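
The overflow policy can be sketched as a bounded buffer with a two-tier eviction rule. EventBuffer is a hypothetical name, and one detail is an assumption: when the buffer holds no routine events at all, the sketch drops the oldest retained event, which the text does not specify.

```python
DROP_FIRST = ("Heartbeat", "CommandProgress")  # routine events, shed first


class EventBuffer:
    """Bounded agent-side buffer that sheds routine events before important ones."""

    def __init__(self, cap=10_000):
        self.cap = cap
        self.events = []  # (kind, payload), oldest first

    def push(self, kind, payload):
        self.events.append((kind, payload))
        if len(self.events) > self.cap:
            self._evict()

    def _evict(self):
        # Drop the oldest routine event if there is one.
        for index, (kind, _payload) in enumerate(self.events):
            if kind in DROP_FIRST:
                del self.events[index]
                return
        # Nothing routine left: fall back to dropping the oldest event (assumed).
        del self.events[0]
```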

15 Reconciliation

at session open

Neither side trusts state after a disconnect.

On every session open, agent and control plane exchange ground truth and reconcile. No special "reconcile command" type — directives are composed of the same Command messages as normal operation. The agent's command handler stays trivially uniform.

AgentHello. host_id, agent version, uptime, hypervisor type + version, capacity summary.

Inventory. Full list of VMs on disk (workspace_id, running/stopped/missing, disk presence, observed envelope), in-flight local ops (snapshot running, provision in progress), custom domains in the edge proxy.

ControlHello. Session accepted, server time, per-command ack deadline config.

ReconcileDirective. Derived from comparing Inventory vs. Postgres:

Expected but missing: corrective ProvisionVM, or mark operations row failed if not resumable.
Orphans (agent has it, control plane doesn't): log + alert. Never auto-destroy on MVP — operator decides.
Stuck running ops: if operations.status = running but the agent has no record of the command, mark failed with reason agent_reconnected_without_completion. Upstream retries with a fresh request_id.

Normal command / event flow begins. All five reconciliation directives are ordinary Command messages on the same stream.
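
The three comparison cases can be expressed as set arithmetic. The sketch below is illustrative only: reconcile and its input shapes are hypothetical, and it assumes that active and suspended workspaces are the ones expected to be present on a host. Only the failure reason string is taken from the text.

```python
def reconcile(expected, inventory, running_ops, agent_ops):
    """Compare Postgres expectations with an agent's Inventory.

    expected:    {workspace_id: state} for this host, per Postgres
    inventory:   set of workspace ids the agent reports on disk
    running_ops: {operation_id: workspace_id} with operations.status = running
    agent_ops:   set of command ids the agent still knows about
    """
    should_exist = {ws for ws, state in expected.items() if state in ("active", "suspended")}
    missing = sorted(should_exist - inventory)
    orphans = sorted(inventory - set(expected))
    stuck = sorted(op for op in running_ops if op not in agent_ops)
    return {
        "provision": missing,        # corrective ProvisionVM candidates
        "alert_orphans": orphans,    # log + alert only, never auto-destroy
        "fail_ops": [(op, "agent_reconnected_without_completion") for op in stuck],
    }
```

The result is a plan, not an action: missing workspaces become ordinary ProvisionVM commands, while orphans only ever reach an operator.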

Invariants that make the contract boring on purpose: command_id = operations.id, ack means received not executed, reconcile uses normal commands, agents never auto-destroy, event seq is monotonic per session.
