openberth-infra · operator's glance · approved

A datacenter, by scroll.

One VM per workspace. Containers inside. A four-state lifecycle, a handful of verbs, and one Postgres to hold it all — the control plane, at a glance.

4 states · 4 flavors · 7 tables · gRPC only · mTLS agents · 24h RPO · 4 specs · 04-18 → 04-19
Chapter I · lifecycle

Workspaces & Lifecycle.

approved 2026-04-18 · flavors · states · transitions · durations · covers 01 → 06

Each workspace is one VM. Applications live inside it as gVisor containers. Four persistent states, four transient markers — and no payment_failed to be seen.

01 · Flavors
fixed tiers · one escape hatch

Four sizes, arbitrary headroom.

A flavor defines the VM's resource envelope. App count is a marketing approximation — not a contract. Sizing: 1 GB floor + apps × observed footprint + build headroom.

Hobby · 2 vCPU · 4 GB RAM · 25 GB disk · ~5 apps · ~40 static
Pro · 4 vCPU · 8 GB RAM · 50 GB disk · ~10 apps · ~100 static
Team · 8 vCPU · 20 GB RAM · 100 GB disk · ~25 apps · ~300 static
Custom · via sales · RAM and disk negotiated

Trial isn't a flavor — it's a Hobby VM with upstream metadata. maxDeploys is a safety valve, not a sold quantity. Ceilings: Hobby 500 · Pro 1000 · Team 2000.
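The sizing rule above can be written as arithmetic. This is a minimal sketch that assumes the rule describes RAM in megabytes. The per-app footprint and build headroom are operator-measured inputs the spec does not fix, so the numbers in `main` are hypothetical, as are the helper names `RequiredRAMMB` and `Fits`.

```go
package main

import "fmt"

// RequiredRAMMB applies the flavor sizing rule:
// 1 GB floor + apps × observed footprint + build headroom.
// footprintMB and headroomMB are measured by the operator (hypothetical here).
func RequiredRAMMB(apps, footprintMB, headroomMB int) int {
	const floorMB = 1024 // the 1 GB floor
	return floorMB + apps*footprintMB + headroomMB
}

// Fits reports whether a flavor's RAM envelope (in GB) covers the requirement.
func Fits(flavorRAMGB, apps, footprintMB, headroomMB int) bool {
	return RequiredRAMMB(apps, footprintMB, headroomMB) <= flavorRAMGB*1024
}

func main() {
	// Hypothetical: 400 MB per app, 1 GB build headroom, Hobby's ~5 apps in 4 GB.
	fmt.Println(RequiredRAMMB(5, 400, 1024), Fits(4, 5, 400, 1024)) // 4048 true
}
```

Because app count is an approximation rather than a contract, a check like `Fits` belongs in capacity planning, not in admission control.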

02 · Lifecycle
state machine

active → suspended → archived → deleted.

Four persistent states; four transient markers — suspending, archiving, restoring, deleting. active covers everything running — trialing, paying, overdue-in-grace. Infra is blind to payment_failed.

active (persistent) → suspended (persistent) → archived (persistent) → deleted (terminal)
state · VM · local disk · snapshot · URL response
active · running · attached · periodic, 24h · openberth serves
suspended · powered off · retained · last periodic · edge: suspended page
archived · destroyed · wiped · fresh, in object storage · edge: archived page
deleted · gone · gone · purged · 404 / tombstone
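The state machine can be expressed as a verb → source → destination table. This is a sketch under stated assumptions. Archive is accepted from both active and suspended, because the lifecycle chain runs through suspended but the schema chapter shows an archive in flight on an active workspace. Delete is accepted from any non-deleted state. `Next` and the `transitions` map are hypothetical names.

```go
package main

import "fmt"

type State string

const (
	Active    State = "active"
	Suspended State = "suspended"
	Archived  State = "archived"
	Deleted   State = "deleted" // terminal: no verb leaves it
)

// transitions maps verb -> source state -> destination state.
// Assumption: archive from active or suspended; delete from any non-deleted state.
var transitions = map[string]map[State]State{
	"suspend": {Active: Suspended},
	"archive": {Active: Archived, Suspended: Archived},
	"restore": {Suspended: Active, Archived: Active},
	"delete":  {Active: Deleted, Suspended: Deleted, Archived: Deleted},
}

// Next returns the destination state, or an error for an illegal transition
// (the control plane API reports this as FAILED_PRECONDITION).
func Next(from State, verb string) (State, error) {
	if to, ok := transitions[verb][from]; ok {
		return to, nil
	}
	return "", fmt.Errorf("illegal transition: %s from %s", verb, from)
}

func main() {
	fmt.Println(Next(Archived, "restore"))
	_, err := Next(Active, "restore")
	fmt.Println(err)
}
```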
03 · Transition API
four verbs · idempotent

Same request_id, same outcome.

All transitions accept a client-supplied request_id — same id, same outcome, no duplicate work. Conflicting in-flight transitions return 409. We do not queue, cancel-forward, or auto-retry.

POST /workspaces/{id}:suspend
POST /workspaces/{id}:archive
POST /workspaces/{id}:restore
POST /workspaces/{id}:delete
GET /workspaces/{id} → state · transient-op flag · ETA
i. archive is snapshot-then-wipe. Snapshot uploaded and checksum-verified before local disk is wiped. Mid-flight failure rolls back to suspended; partial artifacts cleaned up.
ii. delete verifies purge completeness. Not marked deleted until VM gone, disk wiped, snapshots purged, row pseudonymized, audit-log entry written.
iii. restore is scheduled. Picks a same-region host with capacity, pulls snapshot if from archived, boots, waits for healthcheck. Failure returns to the prior state; no transient limbo.
iv. 409 on conflict. Caller retries after the transient op completes. Infra never silently blocks.
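The idempotency and 409 rules above fit in a small in-memory model. This is a sketch, not the service: `Store`, `Begin`, and `Finish` are hypothetical names. The map keyed by (workspace, request_id) stands in for the database's UNIQUE constraint, and the in-flight map stands in for `current_operation_id`. A replayed request_id returns the original operation even while it is still running. A different request_id against a busy workspace is rejected, never queued.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict corresponds to the 409 returned when another transition is in flight.
var ErrConflict = errors.New("409: conflicting transition in flight")

// ErrReused is returned when a request_id is replayed with a different verb.
var ErrReused = errors.New("request_id reused for a different verb")

type reqKey struct{ workspace, requestID string }

type outcome struct{ opID, verb string }

// Store is an in-memory stand-in for the operations table and the
// workspaces.current_operation_id pointer.
type Store struct {
	mu       sync.Mutex
	seen     map[reqKey]outcome // plays the role of UNIQUE (workspace_id, request_id)
	inFlight map[string]string  // workspace -> current operation id
	nextID   int
}

func NewStore() *Store {
	return &Store{seen: map[reqKey]outcome{}, inFlight: map[string]string{}}
}

// Begin starts a transition, or returns the original operation for a repeated request_id.
func (s *Store) Begin(workspace, requestID, verb string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	k := reqKey{workspace, requestID}
	if o, ok := s.seen[k]; ok {
		if o.verb != verb {
			return "", ErrReused
		}
		return o.opID, nil // same request_id, same outcome, no duplicate work
	}
	if _, busy := s.inFlight[workspace]; busy {
		return "", ErrConflict // no queueing, no cancel-forward, no auto-retry
	}
	s.nextID++
	id := fmt.Sprintf("op-%d", s.nextID)
	s.seen[k] = outcome{id, verb}
	s.inFlight[workspace] = id
	return id, nil
}

// Finish clears the in-flight pointer once the operation is terminal.
func (s *Store) Finish(workspace string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.inFlight, workspace)
}

func main() {
	s := NewStore()
	first, _ := s.Begin("ws-1", "req-1", "suspend")
	again, _ := s.Begin("ws-1", "req-1", "suspend")
	_, err := s.Begin("ws-1", "req-2", "archive")
	fmt.Println(first == again, errors.Is(err, ErrConflict)) // true true
}
```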
04 · Durations
operation timing

Archive is minutes, not instant.

Durations are published so the frontpage can size its policy timers realistically. The bars below are log-scaled against a 30-minute ceiling.

suspend · 10s · max 2m
archive · 5–30m
restore ← suspended · 20s · max 2m
restore ← archived · 5–30m
delete · 1m · max 10m
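Log-scaling against a 30-minute ceiling means a bar's length is log(d)/log(ceiling). A log scale needs a nonzero origin, so this sketch assumes a 1-second floor. `BarFraction` is a hypothetical helper, not part of the page.

```go
package main

import (
	"fmt"
	"math"
)

// BarFraction places a duration (in seconds) on a log scale whose ceiling is
// 30 minutes. The 1-second floor is an assumption: log scales need a nonzero origin.
func BarFraction(seconds float64) float64 {
	const floor, ceiling = 1.0, 30 * 60.0
	switch {
	case seconds <= floor:
		return 0
	case seconds >= ceiling:
		return 1
	}
	return math.Log(seconds/floor) / math.Log(ceiling/floor)
}

func main() {
	// Figures from the durations list, in seconds.
	for _, d := range []struct {
		label string
		secs  float64
	}{
		{"suspend (typical)", 10},
		{"delete (typical)", 60},
		{"delete (max)", 600},
		{"archive (max)", 1800},
	} {
		fmt.Printf("%-18s %.2f\n", d.label, BarFraction(d.secs))
	}
}
```

On this scale a 10-second suspend already fills about 30% of the bar, while the long archive and restore-from-archived operations still stay visually distinct.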
24h: default snapshot cadence. Incremental, backed by a periodic full. Retention N matches the archive window; enterprise may opt into tighter cadence. RPO promise: up to 24 hours of data loss on host failure during active.

05 · Contracts
with upstream (frontpage)

Upstream owns when. Infra owns how.

i. Upstream owns when. Grace periods, trial expiries, retention windows, promotional extensions. The frontpage calls the transition API at the moment each transition should happen.
ii. Infra owns how. Atomicity, ordering, verification, idempotency, audit trail, rollback. 200 = durably complete. 202 = in progress; poll.
iii. Durations constrain policy. Archive is minutes, not instant. Upstream timers must account for this.
iv. Account-deletion execution is ours. Decision upstream; VM-stop + disk-wipe + snapshot-purge + audit entry are ours. GDPR completeness is our obligation.
v. Billing records are not in our schema. We keep an infra audit log for incident post-mortems; PII pseudonymized on delete.
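On the upstream side, the 200/202 contract reduces to "poll until the workspace is no longer busy." This is a minimal sketch that assumes the GET response can be decoded into a persistent state plus the transient-op flag. `WorkspaceView`, `AwaitTransition`, and the injected `poll` function are hypothetical names, and the fake poller in `runExample` only simulates the server.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// WorkspaceView is a hypothetical decoding of GET /workspaces/{id}.
type WorkspaceView struct {
	State string // persistent state
	Busy  bool   // transient-op flag
}

// AwaitTransition returns once a transition is durably complete. 200 means it
// already is (the first poll confirms it); 202 means in progress, so keep polling.
func AwaitTransition(ctx context.Context, status int, poll func(context.Context) (WorkspaceView, error), every time.Duration) (WorkspaceView, error) {
	if status != 200 && status != 202 {
		return WorkspaceView{}, fmt.Errorf("transition not accepted: HTTP %d", status)
	}
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		v, err := poll(ctx)
		if err != nil {
			return WorkspaceView{}, err
		}
		if !v.Busy {
			return v, nil
		}
		select {
		case <-ctx.Done():
			return WorkspaceView{}, ctx.Err() // upstream timer expired first
		case <-ticker.C:
		}
	}
}

// runExample simulates a 202 whose operation stays busy for busyPolls polls.
func runExample(busyPolls int) (WorkspaceView, int, error) {
	calls := 0
	poll := func(context.Context) (WorkspaceView, error) {
		calls++
		if calls <= busyPolls {
			return WorkspaceView{State: "active", Busy: true}, nil
		}
		return WorkspaceView{State: "suspended"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	v, err := AwaitTransition(ctx, 202, poll, time.Millisecond)
	return v, calls, err
}

func main() {
	v, calls, err := runExample(2)
	fmt.Println(v.State, calls, err)
}
```

Deriving the context deadline from the published durations (for example, 30 minutes for an archive) keeps upstream timers realistic.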
06 · Edge routing
during non-active states

The edge answers without the VM.

A host-level edge proxy (Caddy with a state-aware config, reloaded on every transition) knows the state of every workspace on its host. DNS is not manipulated per state. For archived workspaces whose VM has been destroyed, the edge retains a routing rule sufficient to serve the archived page. Restore may land on a different host, in which case routing is updated to point at the new host.

suspended · edge → static page · tap down, disk retained
archived · edge → static page · slot freed, snapshot in S3
deleted · edge → 404 tombstone · row pseudonymized
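The edge's decision table maps workspace state to a response without touching the VM. The real edge is Caddy. This Go `http.Handler` only models the decision and is not Caddy configuration. `EdgeHandler` and `stateOf` are hypothetical names. Returning 503 for the suspended and archived pages is an assumption, because the spec fixes a status code only for deleted (404).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// EdgeHandler routes by workspace state. stateOf is a hypothetical lookup
// into the edge's per-host state table, keyed by Host header.
func EdgeHandler(stateOf func(host string) string, vm http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch stateOf(r.Host) {
		case "active":
			vm.ServeHTTP(w, r) // openberth inside the VM serves
		case "suspended":
			w.WriteHeader(http.StatusServiceUnavailable) // assumed status code
			fmt.Fprintln(w, "This workspace is suspended.")
		case "archived":
			w.WriteHeader(http.StatusServiceUnavailable) // assumed status code
			fmt.Fprintln(w, "This workspace is archived.")
		default: // deleted or unknown host: tombstone
			http.NotFound(w, r)
		}
	})
}

// probe returns the status code the edge would send for a workspace in state.
func probe(state string) int {
	vm := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := EdgeHandler(func(string) string { return state }, vm)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "http://ws1.example.test/", nil))
	return rec.Code
}

func main() {
	for _, s := range []string{"active", "suspended", "archived", "deleted"} {
		fmt.Println(s, probe(s))
	}
}
```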
Chapter II · data

Postgres Schema.

approved 2026-04-19 · tables · encoding · scheduling · covers 07 → 09

Durable state in one database. A River-style job runner lives in-process; our tables own the logical record, River's own the execution mechanics.

07 · Schema
postgres · 2026-04-19

All state, one database.

Long-running transitions are driven by an in-process River-style job runner — no separate workflow engine. Our tables own workspace state and operation records; River's tables own execution mechanics. They join via operations.river_job_id.

regions · catalog

  • id · text · "us-west-1"
  • display_name · text
  • state · active / retiring
  • created_at · timestamptz

hosts · bare-metal

  • id · uuid
  • region_id → regions
  • fqdn · text · unique
  • total_vcpu / total_ram_gb / total_disk_gb · int · int · int
  • state · healthy / draining / retired
  • last_heartbeat_at · timestamptz

workspaces · main entity

  • id · uuid
  • external_workspace_id · text
  • external_user_id · text
  • display_name · text
  • region_id / host_id → regions / hosts
  • flavor · hobby · pro · team · custom
  • vcpu / ram_gb / disk_gb · envelope
  • state · active · suspended · archived · deleted
  • current_operation_id → operations (deferred)
  • snapshot_interval_seconds · default 86400
  • snapshot_retention_count · default 30

workspace_domains · edge routing

  • id · uuid
  • workspace_id → workspaces
  • domain · text · unique
  • kind · default_subdomain / custom
  • cert_state · pending · issued · failed · expired
  • verified_at / cert_issued_at · timestamptz

operations · transitions

  • id · uuid
  • workspace_id → workspaces
  • verb · suspend · archive · restore · delete
  • request_id · UNIQUE (workspace_id, request_id)
  • status · pending · running · ok · failed · rolled_back
  • step_state · jsonb · resumable
  • river_job_id · bigint
  • requested_at / started_at / completed_at · timestamptz

snapshots · object store

  • id · uuid
  • workspace_id → workspaces
  • object_uri · text
  • size_bytes / checksum · bigint · text
  • tool · kopia · restic · qemu-img
  • kind · periodic · pre_archive
  • verified_at · set once checksum verified

audit_log · append-only · survives delete

  • id · bigserial
  • workspace_id · nullable
  • former_workspace_id · uuid · set after delete
  • event_type · e.g. transition.archive.succeeded
  • event_data · jsonb · no PII
  • actor · system · api:<caller> · admin:<user>

Legend · primary key · foreign key (→) · PII scrubbed atomically on delete.

08 · State encoding
destination vs. in-flight

Never archiving in the state column.

workspaces.state only holds persistent destinations. Transient verbs live on operations.status. A denormalized pointer current_operation_id gives readers O(1) access to "is this workspace busy?" without scanning operations.

workspaces.state = active (persistent destination) → current_operation_id → operations.status = running, verb = archive (source of truth for in-flight)
i. Single transaction, two writes. current_operation_id and operations.status update atomically, never one without the other.
ii. Idempotency at DB level. UNIQUE (workspace_id, request_id) catches double-submits in Postgres, not app code.
iii. Resumable workers. step_state JSONB carries crash-safe progress. Example for archive: {snapshot_id, uploaded:true, verified:false, step:"verifying"}.
iv. On delete: PII scrub + state change in ONE UPDATE. operations rows are deleted. audit_log rows have workspace_id copied to former_workspace_id, then NULLed.
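A restarted worker can decide where to resume purely from `step_state`. This sketch decodes the archive example above with `encoding/json`. The boolean fields drive the decision, and the free-form `step` label is carried only for display. `NextAction` and the action names it returns are hypothetical. The one rule it encodes from the spec is that the local disk is wiped only after the snapshot is verified.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ArchiveStep mirrors the step_state example for archive.
type ArchiveStep struct {
	SnapshotID string `json:"snapshot_id"`
	Uploaded   bool   `json:"uploaded"`
	Verified   bool   `json:"verified"`
	Step       string `json:"step"` // informational; the booleans decide
}

// NextAction returns the next step for a resumed archive (hypothetical names).
func NextAction(raw []byte) (string, error) {
	var s ArchiveStep
	if len(raw) > 0 {
		if err := json.Unmarshal(raw, &s); err != nil {
			return "", err
		}
	}
	switch {
	case s.SnapshotID == "":
		return "take_snapshot", nil
	case !s.Uploaded:
		return "upload", nil
	case !s.Verified:
		return "verify_checksum", nil
	default:
		return "wipe_local_disk", nil // only reachable once the snapshot is verified
	}
}

func main() {
	a, err := NextAction([]byte(`{"snapshot_id":"snap-1","uploaded":true,"verified":false,"step":"verifying"}`))
	fmt.Println(a, err)
}
```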
09 · Capacity
scheduling query

Allocated capacity is derived, not stored.

No materialized allocated_* columns on hosts. Small fleet, cheap query, zero drift risk. Suspended workspaces still count — their disk is on the host.

-- find a healthy host in region $1 with room for $2 vcpu, $3 ram_gb, $4 disk_gb.
-- active and suspended workspaces count against capacity; archived ones do not.
SELECT h.id
FROM hosts h
WHERE h.region_id = $1
  AND h.state = 'healthy'
  AND h.total_vcpu    - COALESCE((SELECT SUM(w.vcpu)    FROM workspaces w WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $2
  AND h.total_ram_gb  - COALESCE((SELECT SUM(w.ram_gb)  FROM workspaces w WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $3
  AND h.total_disk_gb - COALESCE((SELECT SUM(w.disk_gb) FROM workspaces w WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $4
LIMIT 1
FOR UPDATE SKIP LOCKED;  -- lock the chosen host row until commit; concurrent schedulers skip it

FOR UPDATE SKIP LOCKED lets concurrent schedulers pick different hosts. The workspace INSERT in the same transaction reserves the slot.
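The "derived, not stored" rule can be mirrored in memory, which makes it easy to unit-test the counting semantics apart from Postgres. This sketch covers only the arithmetic: region and health filters and the row locking that `FOR UPDATE SKIP LOCKED` provides are deliberately left out. `Host`, `Workspace`, `Free`, and `PickHost` are hypothetical names.

```go
package main

import "fmt"

// Host and Workspace carry only the columns the capacity check reads.
type Host struct {
	ID                                 string
	TotalVCPU, TotalRAMGB, TotalDiskGB int
}

type Workspace struct {
	HostID, State       string
	VCPU, RAMGB, DiskGB int
}

// Free derives remaining capacity from workspace rows. As in the SQL, active
// and suspended workspaces count (suspended disks stay on the host); archived
// and deleted ones do not.
func Free(h Host, ws []Workspace) (vcpu, ramGB, diskGB int) {
	vcpu, ramGB, diskGB = h.TotalVCPU, h.TotalRAMGB, h.TotalDiskGB
	for _, w := range ws {
		if w.HostID != h.ID || (w.State != "active" && w.State != "suspended") {
			continue
		}
		vcpu -= w.VCPU
		ramGB -= w.RAMGB
		diskGB -= w.DiskGB
	}
	return vcpu, ramGB, diskGB
}

// PickHost returns the first host with room, or "" (RESOURCE_EXHAUSTED).
func PickHost(hosts []Host, ws []Workspace, vcpu, ramGB, diskGB int) string {
	for _, h := range hosts {
		v, r, d := Free(h, ws)
		if v >= vcpu && r >= ramGB && d >= diskGB {
			return h.ID
		}
	}
	return ""
}

func main() {
	hosts := []Host{{"h1", 8, 32, 200}, {"h2", 16, 64, 400}}
	ws := []Workspace{
		{"h1", "active", 4, 8, 50},
		{"h1", "suspended", 4, 20, 100},
		{"h2", "archived", 8, 20, 100},
	}
	fmt.Println(PickHost(hosts, ws, 2, 4, 25)) // h2
}
```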

Chapter III · surface

Control Plane API.

approved 2026-04-19 · grpc · bearer · async · errors · obs · covers 10 → 12

Consumed only by backends. Every mutating RPC returns an Operation immediately; callers poll. A closed, versioned reason set lets clients branch.

10 · Protocol
control plane api · 2026-04-19

gRPC only. No REST transcoding.

All consumers are backends — frontpage, admin tooling, host agents. No browsers, no third-party integrations. One schema/tooling stack north- and southbound. grpcurl covers terminal debugging. A single WorkspaceService hosts every RPC; admin RPCs are gated by credential scope.

bearer tokens over TLS · scopes: standard, admin · perimeter: subnets, then tokens · rotation: manual in the MVP · upgrade → mTLS without schema change

Logging contract: the authorization metadata key MUST be redacted at every log site. Enforced via middleware; unit-tested.
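The redaction contract reduces to one pure function that the logging middleware calls before any metadata reaches a log sink, which also makes it easy to unit-test. This sketch uses a plain `map[string][]string`, the same underlying type as grpc-go's `metadata.MD`, so it needs no dependency. `Redact` and the `[REDACTED]` placeholder are hypothetical choices.

```go
package main

import (
	"fmt"
	"strings"
)

// Redact returns a log-safe copy of request metadata with the authorization
// value masked. The input map is never modified.
func Redact(md map[string][]string) map[string][]string {
	out := make(map[string][]string, len(md))
	for k, vs := range md {
		if strings.EqualFold(k, "authorization") { // gRPC keys are lowercase; be lenient anyway
			out[k] = []string{"[REDACTED]"}
			continue
		}
		out[k] = append([]string(nil), vs...) // copy so the log path cannot alias request slices
	}
	return out
}

func main() {
	md := map[string][]string{"authorization": {"Bearer s3cret"}, "x-request-id": {"r-1"}}
	fmt.Println(Redact(md))
}
```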

11 · Verbs
one service · typed rpcs

Return an Operation. Callers poll.

Every mutating RPC returns an Operation immediately. No server-streaming progress in v1. request_id lives on the request message (not metadata) — DB enforces idempotency via UNIQUE (workspace_id, request_id).

Workspace lifecycle · scope: standard

  • CreateWorkspace · pick host · schedule · provision
  • GetWorkspace · includes in-flight via current_operation_id
  • ListWorkspaces · cursor · filter region, state, user, flavor
  • UpdateWorkspace · snapshot policy · display_name · external ids
  • SuspendWorkspace · → Operation
  • ArchiveWorkspace · snapshot-then-wipe
  • RestoreWorkspace · scheduled onto any host with capacity
  • DeleteWorkspace · GDPR-grade purge
  • ResizeWorkspace · in-place cgroup OR archive+restore migration

Domain / routing · scope: standard

  • AddCustomDomain · → WorkspaceDomain
  • RemoveCustomDomain · → Empty
  • ListDomains · cursor-paginated

Operations · scope: standard

  • GetOperation · terminal status · step_state · error
  • ListOperations · filter workspace · status · verb

Fleet · scope: admin

  • RegisterHost · add bare-metal to fleet
  • DrainHost · stop scheduling · migrate workspaces off
  • RetireHost · hide from scheduler queries
  • GetHost · derived capacity · heartbeat
  • ListHosts · cursor-paginated

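syntax = "proto3";

import "google/protobuf/timestamp.proto";

// Enum numbering is illustrative; only the value names come from the spec.
enum OperationVerb   { OPERATION_VERB_UNSPECIFIED = 0; CREATE = 1; SUSPEND = 2; ARCHIVE = 3; RESTORE = 4; DELETE = 5; RESIZE = 6; }
enum OperationStatus { OPERATION_STATUS_UNSPECIFIED = 0; PENDING = 1; RUNNING = 2; SUCCEEDED = 3; FAILED = 4; ROLLED_BACK = 5; }

message Operation {
  string                    id           = 1;
  string                    workspace_id = 2;
  OperationVerb             verb         = 3;
  OperationStatus           status       = 4;
  map<string,string>        step_state   = 5;  // resumable progress, mirrors operations.step_state
  string                    error        = 6;  // error detail, if any
  google.protobuf.Timestamp requested_at = 7;
  google.protobuf.Timestamp started_at   = 8;
  google.protobuf.Timestamp completed_at = 9;
}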
12 · Errors · Obs
status codes · cursors · metrics

A closed, versioned reason set.

Domain semantics ride on google.rpc.ErrorInfo. reason values are a closed, versioned set — clients branch on them. New reasons are additive; existing reasons never change meaning.

NOT_FOUND
Referenced workspace / operation / host does not exist.
ALREADY_EXISTS
Idempotent resubmit with divergent payload, or external_workspace_id collision.
FAILED_PRECONDITION
Illegal state transition (e.g. Restore on active).
ABORTED
Conflicting in-flight op. ErrorInfo.metadata.current_operation_id is set. Retry once that operation reaches a terminal status.
RESOURCE_EXHAUSTED
No host capacity in region for requested flavor.
INVALID_ARGUMENT
Malformed flavor / domain / region.
UNAUTHENTICATED
Missing or malformed bearer token.
PERMISSION_DENIED
Token scope insufficient for the RPC.
UNAVAILABLE
Temporary server issue; retry with backoff.
INTERNAL
Programming bug; not retryable.
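Clients can collapse the status-code table into three behaviors. In grpc-go the code would come from `status.Code(err)`. This sketch switches on the canonical code names as strings to stay dependency-free. `Action` and `Classify` are hypothetical. Treating every code except UNAVAILABLE and ABORTED as not automatically retryable is an assumption drawn from the table (RESOURCE_EXHAUSTED, for example, is left to the caller).

```go
package main

import "fmt"

// Action is a hypothetical client-side decision for a failed RPC.
type Action int

const (
	Fail           Action = iota // surface to the caller; no automatic retry
	RetryBackoff                 // UNAVAILABLE: transient server issue
	AwaitOperation               // ABORTED: wait for current_operation_id to finish, then retry
)

// Classify maps a canonical gRPC code name from the table to an Action.
func Classify(code string) Action {
	switch code {
	case "UNAVAILABLE":
		return RetryBackoff
	case "ABORTED":
		return AwaitOperation
	default:
		return Fail
	}
}

func main() {
	for _, c := range []string{"UNAVAILABLE", "ABORTED", "FAILED_PRECONDITION", "INTERNAL"} {
		fmt.Println(c, Classify(c))
	}
}
```

Finer branching belongs on `ErrorInfo.reason`, which is a closed, versioned set that clients can safely switch on.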

Pagination

Cursor-based on every List* RPC. Opaque base64 of (last_id, ordering_key). Forward-only; stable under concurrent mutation. No "go to page N". No total count.

WHERE id > :last_id ORDER BY id LIMIT :page_size   -- default 50, max 500
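The opaque cursor and page-size rules can be sketched with the standard library. This sketch assumes JSON inside URL-safe, unpadded base64 (the spec says only "base64"), and the JSON field names are hypothetical. A decode failure would naturally surface as INVALID_ARGUMENT.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// cursor is the decoded page token; field names are hypothetical.
type cursor struct {
	LastID      string `json:"last_id"`
	OrderingKey string `json:"ordering_key"`
}

// EncodeCursor produces the opaque token returned with each page.
func EncodeCursor(c cursor) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// DecodeCursor parses a client-supplied token.
func DecodeCursor(tok string) (cursor, error) {
	var c cursor
	b, err := base64.RawURLEncoding.DecodeString(tok)
	if err != nil {
		return c, fmt.Errorf("invalid page token: %w", err)
	}
	if err := json.Unmarshal(b, &c); err != nil {
		return c, fmt.Errorf("invalid page token: %w", err)
	}
	return c, nil
}

// PageSize applies the documented default (50) and maximum (500).
func PageSize(requested int) int {
	switch {
	case requested <= 0:
		return 50
	case requested > 500:
		return 500
	default:
		return requested
	}
}

func main() {
	tok, _ := EncodeCursor(cursor{LastID: "ws-42", OrderingKey: "2026-04-19T00:00:00Z"})
	c, err := DecodeCursor(tok)
	fmt.Println(tok, c.LastID, err, PageSize(0), PageSize(1000))
}
```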

Observability · :9090/metrics

  • grpc_requests_total {rpc, code}
  • grpc_request_duration_seconds {rpc}
  • workspaces_total {state, region, flavor}
  • host_capacity_used_ratio {host_id, resource}
  • operation_duration_seconds {verb, status}
  • river_jobs_queued · in_progress · retries_total

OpenTelemetry spans propagate into River jobs — a single trace spans the full transition: API call → job start → step execution → completion.

Chapter IV · edge

Host Agent Contract.

approved 2026-04-19 · mtls · commands · events · reconcile · covers 13 → 15

One long-lived bidi gRPC session per host. Agents dial — no inbound holes. Semantic commands abstract qemu-img away from the plane.

13 · Host agents
host agent contract · 2026-04-19

Agents dial. No inbound holes.

Each bare-metal host runs a long-lived daemon that owns everything local — hypervisor, snapshots, tap devices, edge proxy. The control plane issues semantic commands (ProvisionVM, TakeSnapshot) — never qemu-img. One long-lived bidi gRPC session per host. Hosts can sit behind NAT in different clouds; only the control plane needs inbound reachability.

control plane · WorkspaceService · mTLS
agent CA · 90-day certs · revocation list
agent a · host-01 · us-west-1
agent b · host-02 · eu-central-1
agent c · host-03 · us-east-1

Commands

Control plane → agent dispatch. An ack confirms receipt and validation, not execution. Each command's command_id = operations.id.

Events

Agent → control plane telemetry. Monotonic seq per session, high-water-mark acks, durable replay buffer on reconnect.

mTLS from day one · bootstrap token + enrollment URL · cert CN=host_id, OU=region_id · rotate at 60 days, 7-day overlap · heartbeat 10s, stale @ 30s · reconnect with exponential backoff, cap 30s
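The liveness numbers above fit in a few lines. The spec fixes the 10s heartbeat, the 30s staleness threshold, and the 30s backoff cap. The 1-second base delay, doubling per attempt, and the absence of jitter are assumptions, and `Backoff` and `Stale` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

const (
	heartbeatEvery = 10 * time.Second
	staleAfter     = 30 * time.Second // three missed heartbeats
	backoffCap     = 30 * time.Second
)

// Backoff returns the delay before reconnect attempt n (0-based): 1s, 2s, 4s,
// 8s, 16s, then 30s. Base and doubling are assumptions; only the cap is specified.
func Backoff(attempt int) time.Duration {
	d := time.Second
	for i := 0; i < attempt && d < backoffCap; i++ {
		d *= 2
	}
	if d > backoffCap {
		d = backoffCap
	}
	return d
}

// Stale reports whether a host's last heartbeat is past the staleness threshold.
func Stale(lastHeartbeat, now time.Time) bool {
	return now.Sub(lastHeartbeat) > staleAfter
}

func main() {
	fmt.Println("heartbeat every", heartbeatEvery, "stale after", staleAfter)
	for i := 0; i <= 6; i++ {
		fmt.Println(i, Backoff(i))
	}
	now := time.Now()
	fmt.Println(Stale(now.Add(-45*time.Second), now))
}
```

In production a random jitter on each delay would keep a fleet of agents from reconnecting in lockstep after a control-plane restart.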
14 · Commands · Events
semantic verbs · structured telemetry

Two sub-streams, independent back-pressure.

A slow consumer on events can't stall command dispatch; a heavy command can't delay heartbeats. Both bidi RPCs ride the same mTLS connection.

Commands · control plane → agent

ProvisionVM
Create the VM. Inputs: workspace_id, envelope, network config, optional snapshot_uri. Outcome: VM running, openberth healthcheck passing.
StopVM
Graceful ACPI shutdown (default 60s timeout), then force-stop. For suspend. Leaves disk in place.
StartVM
Boot an existing stopped VM. For restore from suspended. Same healthcheck gate as provision.
DestroyVM
Stop, unmount, secure-wipe disk. For archive (after snapshot verified) and delete.
TakeSnapshot
Snapshot disk (quiesced if possible), upload to object storage, return URI + checksum + size. Kind: periodic or pre_archive.
RestoreFromSnapshot
Download snapshot, verify checksum, stage as new disk. Caller follows with ProvisionVM or StartVM.
UpdateEdgeRoute
Update host edge-proxy config for a workspace. State + domains + cert states. Atomic reload.
ResizeVM
In-place cgroup adjustments on a running VM. Migration path is archive + restore, not this command.
RotateAgentCert
Issue a new cert; old cert valid for 7 days. Triggered by agent at 60 days remaining.

Events · agent → control plane

Heartbeat
Every 10s. Free capacity (vcpu/ram/disk), uptime, agent version, counts of VMs by local state.
CommandProgress
Long-running command emits progress. command_id, step name, optional numeric progress (e.g. snapshot upload %).
CommandResult
Command terminates. command_id, status (succeeded/failed), payload or error.
WorkspaceLocalEvent
Unsolicited: VM crashed and was restarted, disk-usage threshold, custom-domain cert issued/failed/expired, healthcheck flapping.
AgentStopping
Planned maintenance shutdown. Carries reason.

Flow control: agent buffer caps at 10 000 events. On overflow, routine Heartbeat and CommandProgress drop first; CommandResult and WorkspaceLocalEvent are retained preferentially. On reconnect, Inventory + buffered events bring the control plane up to date.
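The event-side guarantees combine into a small data structure: monotonic sequence numbers, high-water-mark acknowledgement, and priority eviction on overflow. This is an in-memory model only. The real buffer is durable, per-session renumbering is not modeled, and `Buffer`, `Push`, `Ack`, and `Replay` are hypothetical names. A dropped event still consumes its sequence number, so seq stays monotonic, though not necessarily contiguous.

```go
package main

import "fmt"

// Kind enumerates the agent event types named in the contract.
type Kind int

const (
	Heartbeat Kind = iota
	CommandProgress
	CommandResult
	WorkspaceLocalEvent
	AgentStopping
)

type Event struct {
	Seq  uint64
	Kind Kind
}

// Buffer models the agent's replay buffer (in memory; the real one is durable).
type Buffer struct {
	limit  int
	next   uint64
	events []Event // always ordered by Seq
}

func NewBuffer(limit int) *Buffer { return &Buffer{limit: limit, next: 1} }

// routine events are the first to be dropped on overflow.
func routine(k Kind) bool { return k == Heartbeat || k == CommandProgress }

// Push assigns the next sequence number and buffers the event, applying the
// overflow policy when the buffer is full.
func (b *Buffer) Push(k Kind) uint64 {
	seq := b.next
	b.next++ // consumed even if this event ends up dropped
	if len(b.events) >= b.limit {
		victim := -1
		for i, e := range b.events {
			if routine(e.Kind) {
				victim = i
				break
			}
		}
		switch {
		case victim >= 0:
			b.events = append(b.events[:victim], b.events[victim+1:]...) // evict oldest routine event
		case routine(k):
			return seq // buffer holds only high-value events: drop the incoming routine one
		default:
			b.events = b.events[1:] // last resort: evict the oldest retained event
		}
	}
	b.events = append(b.events, Event{Seq: seq, Kind: k})
	return seq
}

// Ack applies a high-water-mark acknowledgement: everything with Seq <= hwm is released.
func (b *Buffer) Ack(hwm uint64) {
	i := 0
	for i < len(b.events) && b.events[i].Seq <= hwm {
		i++
	}
	b.events = b.events[i:]
}

// Replay returns a copy of the unacknowledged events to resend after a reconnect.
func (b *Buffer) Replay() []Event { return append([]Event(nil), b.events...) }

func main() {
	b := NewBuffer(3)
	b.Push(CommandResult)
	b.Push(Heartbeat)
	b.Push(WorkspaceLocalEvent)
	b.Push(CommandResult) // buffer full: the Heartbeat (seq 2) is evicted
	b.Ack(3)
	fmt.Println(b.Replay()) // [{4 2}]
}
```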

15 · Reconciliation
at session open

Neither side trusts state after a disconnect.

On every session open, agent and control plane exchange ground truth and reconcile. No special "reconcile command" type — directives are composed of the same Command messages as normal operation. The agent's command handler stays trivially uniform.

i. AgentHello. host_id, agent version, uptime, hypervisor type + version, capacity summary.
ii. Inventory. Full list of VMs on disk (workspace_id, running/stopped/missing, disk presence, observed envelope), in-flight local ops (snapshot running, provision in progress), custom domains in the edge proxy.
iii. ControlHello. Session accepted, server time, per-command ack deadline config.
iv. ReconcileDirective. Derived from comparing Inventory vs. Postgres:
  • Expected but missing: corrective ProvisionVM, or mark operations row failed if not resumable.
  • Orphans (agent has it, control plane doesn't): log + alert. Never auto-destroy in the MVP; the operator decides.
  • Stuck running ops: if operations.status = running but the agent has no record of the command, mark failed with reason agent_reconnected_without_completion. Upstream retries with a fresh request_id.
v. Normal command / event flow begins. All reconciliation directives are ordinary Command messages on the same stream.
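The comparison in step iv is a set diff with three outcomes. This sketch assumes the control plane has already loaded what Postgres expects on the host, which operations rows are `running`, and which command IDs the agent reported in its Inventory. `Expected`, `Finding`, `Reconcile`, and the action labels are hypothetical names. Sorting keeps the output deterministic despite Go's randomized map order.

```go
package main

import (
	"fmt"
	"sort"
)

// Expected is a workspace Postgres places on this host.
type Expected struct {
	WorkspaceID string
	Resumable   bool // hypothetical: can a corrective ProvisionVM recover it?
}

// Finding records one reconciliation decision.
type Finding struct {
	WorkspaceID string
	Action      string // "ProvisionVM", "mark_failed", or "alert"
	Reason      string
}

// Reconcile compares the agent's Inventory with Postgres.
//   inventory:  workspace IDs with a VM on this host
//   runningOps: operations.id -> workspace_id for rows with status = running
//   agentOps:   command IDs (= operations.id) the agent reports in flight
func Reconcile(expected []Expected, inventory map[string]bool, runningOps map[string]string, agentOps map[string]bool) []Finding {
	var out []Finding
	known := make(map[string]bool, len(expected))
	for _, e := range expected {
		known[e.WorkspaceID] = true
		switch {
		case inventory[e.WorkspaceID]:
			// present on both sides: nothing to do
		case e.Resumable:
			out = append(out, Finding{e.WorkspaceID, "ProvisionVM", "expected but missing"})
		default:
			out = append(out, Finding{e.WorkspaceID, "mark_failed", "expected but missing, not resumable"})
		}
	}
	// Orphans: log and alert only. Never auto-destroy; an operator decides.
	var orphans []string
	for id := range inventory {
		if !known[id] {
			orphans = append(orphans, id)
		}
	}
	sort.Strings(orphans)
	for _, id := range orphans {
		out = append(out, Finding{id, "alert", "orphan: unknown to control plane"})
	}
	// Stuck ops: running in Postgres but unknown to the agent.
	var stuck []string
	for opID := range runningOps {
		if !agentOps[opID] {
			stuck = append(stuck, opID)
		}
	}
	sort.Strings(stuck)
	for _, opID := range stuck {
		out = append(out, Finding{runningOps[opID], "mark_failed", "agent_reconnected_without_completion"})
	}
	return out
}

func main() {
	exp := []Expected{{"ws-a", true}, {"ws-b", false}, {"ws-c", true}}
	inv := map[string]bool{"ws-c": true, "ws-x": true}
	ops := map[string]string{"op-1": "ws-c", "op-2": "ws-c"}
	for _, f := range Reconcile(exp, inv, ops, map[string]bool{"op-2": true}) {
		fmt.Println(f.WorkspaceID, f.Action, f.Reason)
	}
}
```

Only the ProvisionVM finding becomes a Command on the stream. The other two are bookkeeping on the control plane side, which keeps the agent's handler uniform.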
Five invariants make the contract boring on purpose: command_id = operations.id; ack means received, not executed; reconcile uses normal commands; agents never auto-destroy; event seq is monotonic per session.