A datacenter, by scroll.
One VM per workspace. Containers inside. A four-state lifecycle, a handful of verbs, and one Postgres to hold it all — the control plane, at a glance.
Workspaces & Lifecycle.
Each workspace is one VM. Applications live inside it as gVisor containers. Four persistent states, four transient markers — and no payment_failed to be seen.
Four sizes, arbitrary headroom.
A flavor defines the VM's resource envelope. App count is a marketing approximation — not a contract. Sizing: 1 GB floor + apps × observed footprint + build headroom.
Trial isn't a flavor — it's a Hobby VM with upstream metadata. maxDeploys is a safety valve, not a sold quantity. Ceilings: Hobby 500 · Pro 1000 · Team 2000.
active → suspended → archived → deleted.
Four persistent states; four transient markers — suspending, archiving, restoring, deleting. active covers everything running — trialing, paying, overdue-in-grace. Infra is blind to payment_failed.
| state | vm | local disk | snapshot | url response |
|---|---|---|---|---|
| active | running | attached | periodic 24h | openberth serves |
| suspended | powered off | retained | last periodic | edge: suspended page |
| archived | destroyed | wiped | fresh, object storage | edge: archived page |
| deleted | gone | gone | purged | 404 / tombstone |
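The lifecycle above can be sketched as a small transition map. This is an illustrative model, not the service's code — in particular, which source states each verb accepts beyond the main active → suspended → archived → deleted path is an assumption:

```python
# Hypothetical sketch of the four-state lifecycle. States, verbs, and transient
# markers come from the doc; the exact allowed-source sets are assumptions.
TRANSITIONS = {
    # verb: (allowed current states, transient marker, destination state)
    "suspend": ({"active"}, "suspending", "suspended"),
    "archive": ({"suspended"}, "archiving", "archived"),
    "restore": ({"archived"}, "restoring", "active"),
    "delete":  ({"active", "suspended", "archived"}, "deleting", "deleted"),
}

def apply(state: str, verb: str) -> str:
    """Return the destination state, or raise if the verb is illegal here."""
    allowed, _marker, dest = TRANSITIONS[verb]
    if state not in allowed:
        raise ValueError(f"cannot {verb} a {state} workspace")
    return dest
```

Note that the transient markers never appear as a workspace state — they only describe an in-flight operation.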
Same request_id, same outcome.
All transitions accept a client-supplied request_id — same id, same outcome, no duplicate work. Conflicting in-flight transitions return 409. We do not queue, cancel-forward, or auto-retry.
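The idempotency contract can be sketched with an in-memory stand-in for the database's uniqueness constraint. Names here are illustrative, and the dictionary plays the role of `UNIQUE (workspace_id, request_id)`:

```python
# Sketch: same (workspace_id, request_id) returns the existing operation;
# a different in-flight transition conflicts. Maps to HTTP 409 / gRPC ABORTED.
class Conflict(Exception):
    """A different transition is already in flight for this workspace."""

_ops: dict = {}    # (workspace_id, request_id) -> operation id
_busy: set = set() # workspace ids with an in-flight transition

def begin_transition(workspace_id: str, request_id: str, verb: str) -> str:
    key = (workspace_id, request_id)
    if key in _ops:
        return _ops[key]            # same id, same outcome — no duplicate work
    if workspace_id in _busy:
        raise Conflict(f"workspace {workspace_id} busy; cannot {verb}")
    op_id = f"op-{len(_ops) + 1}"   # illustrative id scheme
    _ops[key] = op_id
    _busy.add(workspace_id)
    return op_id
```

Retrying with the same `request_id` is therefore always safe; retrying with a fresh one while the first is still running is what earns the 409.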
Archive is minutes, not instant.
Durations are published so the frontpage can size its policy timers realistically. The bars below are log-scaled against a 30-minute ceiling.
Default snapshot cadence. Incremental, backed by a periodic full. Retention N matches the archive window; enterprise may opt into tighter cadence. RPO promise: up to 24 hours of data loss on host failure during active.
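Under the defaults (`snapshot_interval_seconds = 86400`, `snapshot_retention_count = 30`), retention pruning reduces to keeping the newest N snapshots. A minimal sketch, with a hypothetical helper name:

```python
# Illustrative retention pruning: keep the newest `retention` snapshots,
# return the timestamps to delete. Real pruning would also respect kind
# (periodic vs pre_archive) — omitted here as an assumption-free sketch.
def prune(snapshot_times: list, retention: int = 30) -> list:
    keep = set(sorted(snapshot_times, reverse=True)[:retention])
    return sorted(t for t in snapshot_times if t not in keep)
```

With a 24 h cadence and N = 30, the retained window spans roughly the 30-day archive window, and the worst-case RPO on host failure is one interval — the "up to 24 hours" promised above.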
Upstream owns when. Infra owns how.
The edge answers without the VM.
A host-level edge proxy (Caddy with a state-aware config, reloaded on every transition) knows the state of every workspace on its host. DNS is not manipulated per state. For archived workspaces whose VM has been destroyed, the edge retains a routing rule sufficient to serve the tombstone; restore may land on a different host — routing updates to point at the new host.
- suspend: tap down · disk retained
- archive: slot freed · snapshot in S3
- delete: row pseudonymized
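The per-state edge behavior can be sketched as a small dispatch, assuming the proxy consults a local state map for its host (the response strings are illustrative):

```python
# Illustrative state-aware edge decision, mirroring the lifecycle table:
# the edge answers for suspended/archived/deleted without touching a VM.
def edge_response(state):
    if state is None or state == "deleted":
        return "404 tombstone"
    return {
        "active": "proxy to workspace VM",          # openberth serves
        "suspended": "serve static suspended page", # VM powered off
        "archived": "serve static archived page",   # VM destroyed; edge alone
    }[state]
```

Only `active` requires a live upstream; every other answer is served from edge-local state, which is why the proxy config is reloaded on each transition.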
Postgres Schema.
Durable state in one database. A River-style job runner lives in-process; our tables own the logical record, River's own the execution mechanics.
All state, one database.
Long-running transitions are driven by an in-process River-style job runner — no separate workflow engine. Our tables own workspace state and operation records; River's tables own execution mechanics. They join via operations.river_job_id.
regions catalog
- id · text · "us-west-1"
- display_name · text
- state · active / retiring
- created_at · timestamptz
hosts bare-metal
- id · uuid
- region_id · → regions
- fqdn · text · unique
- total_vcpu / ram / disk · int · int · int
- state · healthy / draining / retired
- last_heartbeat_at · timestamptz
workspaces main entity
- id · uuid
- external_workspace_id · text
- external_user_id · text
- display_name · text
- region_id / host_id · → regions / hosts
- flavor · hobby · pro · team · custom
- vcpu / ram_gb / disk_gb · envelope
- state · active · suspended · archived · deleted
- current_operation_id · → operations (deferred)
- snapshot_interval_seconds · default 86400
- snapshot_retention_count · default 30
workspace_domains edge routing
- id · uuid
- workspace_id · → workspaces
- domain · text · unique
- kind · default_subdomain / custom
- cert_state · pending · issued · failed · expired
- verified_at / cert_issued_at · timestamptz
operations transitions
- id · uuid
- workspace_id · → workspaces
- verb · suspend · archive · restore · delete
- request_id · UNIQUE(ws, req)
- status · pending · running · ok · failed · rolled_back
- step_state · jsonb · resumable
- river_job_id · bigint
- requested / started / completed_at · timestamptz
snapshots object store
- id · uuid
- workspace_id · → workspaces
- object_uri · text
- size_bytes / checksum · bigint · text
- tool · kopia · restic · qemu-img
- kind · periodic · pre_archive
- verified_at · once checksum verified
audit_log append-only · survives delete
- id · bigserial
- workspace_id · nullable
- former_workspace_id · uuid · set after delete
- event_type · e.g. transition.archive.succeeded
- event_data · jsonb · no PII
- actor · system · api:<caller> · admin:<user>
Legend · ◆ primary key · → foreign key · PII scrubbed atomically on delete.
Never archiving in the state column.
workspaces.state only holds persistent destinations. Transient verbs live on operations.status. A denormalized pointer current_operation_id gives readers O(1) access to "is this workspace busy?" without scanning operations.
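A minimal sketch of the denormalized pointer, with a dict standing in for the `workspaces` row — in Postgres, setting and clearing `current_operation_id` happens in the same transaction as the `operations` insert/update:

```python
# Illustrative busy-pointer bookkeeping; names mirror the schema above.
workspaces = {"ws1": {"state": "active", "current_operation_id": None}}

def is_busy(ws_id):
    # O(1): read one column, no scan over the operations table
    return workspaces[ws_id]["current_operation_id"] is not None

def start_op(ws_id, op_id):
    ws = workspaces[ws_id]
    assert ws["current_operation_id"] is None, "conflicting transition"
    ws["current_operation_id"] = op_id

def finish_op(ws_id, dest_state):
    # On completion, the persistent destination lands in state and the
    # pointer is cleared — transient markers never touch workspaces.state.
    ws = workspaces[ws_id]
    ws["state"] = dest_state
    ws["current_operation_id"] = None
```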
Allocated capacity is derived, not stored.
No materialized allocated_* columns on hosts. Small fleet, cheap query, zero drift risk. Suspended workspaces still count — their disk is on the host.
-- find a host in region $1 with room for $2 vcpu / $3 ram / $4 disk
SELECT h.id
FROM hosts h
WHERE h.region_id = $1
  AND h.state = 'healthy'
  AND h.total_vcpu - COALESCE((SELECT SUM(w.vcpu) FROM workspaces w
        WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $2
  AND h.total_ram_gb - COALESCE((SELECT SUM(w.ram_gb) FROM workspaces w
        WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $3
  AND h.total_disk_gb - COALESCE((SELECT SUM(w.disk_gb) FROM workspaces w
        WHERE w.host_id = h.id AND w.state IN ('active','suspended')), 0) >= $4
FOR UPDATE SKIP LOCKED
LIMIT 1;
FOR UPDATE SKIP LOCKED lets concurrent schedulers pick different hosts. The workspace INSERT in the same transaction reserves the slot.
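The same derivation in plain code, for readers who prefer it to SQL — a sketch over in-memory rows (the locking semantics of `FOR UPDATE SKIP LOCKED` are of course not reproduced here):

```python
# Derived allocation: sum the envelopes of active + suspended workspaces
# per host; no stored allocated_* columns to drift.
def pick_host(hosts, workspaces, region, vcpu, ram_gb, disk_gb):
    for h in hosts:
        if h["region"] != region or h["state"] != "healthy":
            continue
        resident = [w for w in workspaces
                    if w["host_id"] == h["id"]
                    and w["state"] in ("active", "suspended")]
        used = lambda key: sum(w[key] for w in resident)
        if (h["total_vcpu"] - used("vcpu") >= vcpu
                and h["total_ram_gb"] - used("ram_gb") >= ram_gb
                and h["total_disk_gb"] - used("disk_gb") >= disk_gb):
            return h["id"]
    return None  # no healthy host with room
```

Suspended workspaces count against every dimension because their disk (and reserved slot) is still on the host.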
Control Plane API.
Consumed only by backends. Every mutating RPC returns an Operation immediately; callers poll. A closed, versioned reason set lets clients branch.
gRPC only. No REST transcoding.
All consumers are backends — frontpage, admin tooling, host agents. No browsers, no third-party integrations. One schema/tooling stack north- and southbound. grpcurl covers terminal debugging. A single WorkspaceService hosts every RPC; admin RPCs are gated by credential scope.
Logging contract: the authorization metadata key MUST be redacted at every log site. Enforced via middleware; unit-tested.
Return an Operation. Callers poll.
Every mutating RPC returns an Operation immediately. No server-streaming progress in v1. request_id lives on the request message (not metadata) — DB enforces idempotency via UNIQUE (workspace_id, request_id).
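The caller side reduces to a poll loop against `GetOperation`. A sketch, with a hypothetical `get_operation` stub standing in for the real RPC:

```python
import time

# Terminal statuses from the Operation message; anything else means keep polling.
TERMINAL = {"SUCCEEDED", "FAILED", "ROLLED_BACK"}

def wait_for(get_operation, op_id, interval_s=2.0, timeout_s=1800.0):
    """Poll an operation until it reaches a terminal status or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        op = get_operation(op_id)
        if op["status"] in TERMINAL:
            return op
        time.sleep(interval_s)
    raise TimeoutError(f"operation {op_id} not terminal after {timeout_s}s")
```

Because `request_id` rides on the request message, a caller that crashes mid-poll can re-issue the original mutating RPC with the same id and land on the same operation.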
RPC groups · Workspace lifecycle · Domain / routing · Operations · Fleet
message Operation {
  string id = 1;
  string workspace_id = 2;
  OperationVerb verb = 3;      // CREATE · SUSPEND · ARCHIVE · RESTORE · DELETE · RESIZE
  OperationStatus status = 4;  // PENDING · RUNNING · SUCCEEDED · FAILED · ROLLED_BACK
  map<string,string> step_state = 5;
  string error = 6;
  google.protobuf.Timestamp requested_at = 7;
  google.protobuf.Timestamp started_at = 8;
  google.protobuf.Timestamp completed_at = 9;
}
A closed, versioned reason set.
Domain semantics ride on google.rpc.ErrorInfo. reason values are a closed, versioned set — clients branch on them. New reasons are additive; existing reasons never change meaning.
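Client-side branching then looks like a lookup with a generic fallback. The reason strings below are illustrative assumptions, not the service's published set:

```python
# Hypothetical reason strings; the real closed set lives in the API's
# versioned definitions and rides on google.rpc.ErrorInfo.
KNOWN_REASONS = {
    "WORKSPACE_BUSY": "back off; retry after the in-flight operation completes",
    "INVALID_STATE": "refresh workspace state; this verb is illegal here",
    "CAPACITY_EXHAUSTED": "surface to operator; no healthy host had room",
}

def handle(reason):
    # Reasons are additive across versions: old clients must treat unknown
    # reasons generically instead of failing on them.
    return KNOWN_REASONS.get(reason, "generic failure handling")
```

The fallback line is the important one — it is what lets the set grow without breaking deployed clients.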
Pagination
Cursor-based on every List* RPC. Opaque base64 of (last_id, ordering_key). Forward-only; stable under concurrent mutation. No "go to page N". No total count.
WHERE id > :last_id ORDER BY id LIMIT :page_size -- default 50, max 500
Observability · :9090/metrics
OpenTelemetry spans propagate into River jobs — a single trace spans the full transition: API call → job start → step execution → completion.
Host Agent Contract.
One long-lived bidi gRPC session per host. Agents dial — no inbound holes. Semantic commands abstract qemu-img away from the plane.
Agents dial. No inbound holes.
Each bare-metal host runs a long-lived daemon that owns everything local — hypervisor, snapshots, tap devices, edge proxy. The control plane issues semantic commands (ProvisionVM, TakeSnapshot) — never qemu-img. One long-lived bidi gRPC session per host. Hosts can sit behind NAT in different clouds; only the control plane needs inbound reachability.
Commands →
Control plane → agent dispatch. Ack confirms receipt and validation, not execution. A command's command_id equals operations.id.
Events ←
Agent → control plane telemetry. Monotonic seq per session, high-water-mark acks, durable replay buffer on reconnect.
Two sub-streams, independent back-pressure.
A slow consumer on events can't stall command dispatch; a heavy command can't delay heartbeats. Both bidi RPCs ride the same mTLS connection.
Commands · control plane → agent
Events · agent → control plane
Flow control: agent buffer caps at 10 000 events. On overflow, routine Heartbeat and CommandProgress drop first; CommandResult and WorkspaceLocalEvent are retained preferentially. On reconnect, Inventory + buffered events bring the control plane up to date.
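The overflow policy reduces to a bounded buffer with a priority eviction rule. A sketch (the 10 000 cap is from the contract; the list-based buffer is an illustration, not the agent's data structure):

```python
# On overflow, evict the oldest droppable event first; only if none remain
# does the oldest event of any kind go. CommandResult and WorkspaceLocalEvent
# are therefore retained preferentially.
DROPPABLE = {"Heartbeat", "CommandProgress"}

def push(buffer, event, cap=10_000):
    buffer.append(event)
    if len(buffer) <= cap:
        return
    for i, e in enumerate(buffer):
        if e["type"] in DROPPABLE:
            del buffer[i]
            return
    del buffer[0]
```

Dropping heartbeats is safe because they are superseded by the next one; dropping a CommandResult would lose a fact the control plane cannot reconstruct without a reconcile.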
Neither side trusts state after a disconnect.
On every session open, agent and control plane exchange ground truth and reconcile. No special "reconcile command" type — directives are composed of the same Command messages as normal operation. The agent's command handler stays trivially uniform.
- Expected but missing: corrective ProvisionVM, or mark operations row failed if not resumable.
- Orphans (agent has it, control plane doesn't): log + alert. Never auto-destroy on MVP — operator decides.
- Stuck running ops: if operations.status = running but the agent has no record of the command, mark failed with reason agent_reconnected_without_completion. Upstream retries with a fresh request_id.
Invariants that make the contract boring on purpose: command_id = operations.id, ack means received not executed, reconcile uses normal commands, agents never auto-destroy, event seq is monotonic per session.
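The reconcile pass composes exactly these cases. A sketch over illustrative data shapes — `expected` is the control plane's view, `reported_vms` and `agent_cmds` come from the agent's Inventory on session open:

```python
# Illustrative reconcile: corrective actions are ordinary commands, never a
# special reconcile type, and orphans are alerted on — never auto-destroyed.
FAIL_REASON = "agent_reconnected_without_completion"

def reconcile(expected, reported_vms, running_ops, agent_cmds):
    """expected: ws_id -> resumable?; reported_vms: ws_ids the agent has;
    running_ops: command_id -> ws_id per operations.status = running;
    agent_cmds: command_ids the agent still knows about."""
    actions = []
    for ws, resumable in expected.items():
        if ws not in reported_vms:  # expected but missing
            actions.append(("ProvisionVM", ws) if resumable
                           else ("mark_failed", ws))
    for ws in reported_vms - expected.keys():  # orphans: log + alert only
        actions.append(("alert_orphan", ws))
    for cmd_id, ws in running_ops.items():
        if cmd_id not in agent_cmds:  # agent has no record of the command
            actions.append((f"mark_failed:{FAIL_REASON}", ws))
    return actions
```

Every corrective action here is a message type the agent already handles in normal operation, which is what keeps its command handler trivially uniform.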