Architecture
stratum is a Cargo workspace of seven crates. The flow is config → desired resources → diff against prior state → plan → apply via providers → new state → post-apply drift check.
Workspace layout
```text
crates/
  core/             stratum-core
  config/           stratum-config
  cli/              stratum-cli (the stratum binary)
  providers/
    ssh/            stratum-provider-ssh
    docker/         stratum-provider-docker
    system/         stratum-provider-system
    git/            stratum-provider-git
```
core has no provider dependencies. config depends on core only for its DesiredResource / ResourceAddr types. Providers depend on core for the trait. The CLI wires everything together.
Core types
All types live in crates/core/src/lib.rs.
ResourceAddr
```rust
struct ResourceAddr {
    kind: String,
    name: String,
}
```
Renders as <kind>.<name>. Used as the key in the state map and as the user-facing identifier in CLI output.
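A minimal sketch of that rendering, assuming key() and a Display impl both simply join the two fields with a dot. The derives, the new() constructor, and the Display impl are illustrative assumptions; only key() and the <kind>.<name> shape come from the text.

```rust
use std::fmt;

// Mirror of the struct above, with derives added for the sketch.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ResourceAddr {
    pub kind: String,
    pub name: String,
}

impl ResourceAddr {
    // Hypothetical convenience constructor for the example.
    pub fn new(kind: &str, name: &str) -> Self {
        ResourceAddr { kind: kind.to_string(), name: name.to_string() }
    }

    // State-map key: "<kind>.<name>".
    pub fn key(&self) -> String {
        format!("{}.{}", self.kind, self.name)
    }
}

impl fmt::Display for ResourceAddr {
    // Assumed: CLI output uses the same "<kind>.<name>" rendering as the key.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}", self.kind, self.name)
    }
}
```

Using one rendering for both the state key and user-facing output means an address copied from plan output can be searched for directly in the state file.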
ResourceState
```rust
struct ResourceState {
    addr: ResourceAddr,
    provider: String,
    attrs: serde_json::Value,
}
```
One per tracked resource. attrs is whatever the provider returned from its last create / update.
State
```rust
use std::collections::BTreeMap;

struct State {
    version: u32,                               // file format version, default 1
    resources: BTreeMap<String, ResourceState>, // keyed by addr.key()
}
```
On-disk JSON, loaded with State::load(path) and saved with State::save(path). Default path is .stratum/state.json. A missing file is treated as an empty state. The parent directory is created on save.
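The two file-handling rules (a missing file is an empty state; the parent directory is created on save) can be sketched with std alone. load_state_text, save_state_text, and EMPTY_STATE are hypothetical names, and the raw JSON text stands in for the serde step the real State::load / State::save would perform.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

// Hypothetical: an empty state in the on-disk shape (version defaults to 1).
pub const EMPTY_STATE: &str = r#"{"version":1,"resources":{}}"#;

// Sketch of State::load's file handling: a missing file means empty state.
// Real code would deserialize the text; here it is returned as-is.
pub fn load_state_text(path: &Path) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(EMPTY_STATE.to_string()),
        Err(e) => Err(e),
    }
}

// Sketch of State::save's file handling: create the parent dir, then overwrite
// the whole file.
pub fn save_state_text(path: &Path, json: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, json)
}
```

Only NotFound is mapped to the empty state; other I/O errors (permissions, a directory where the file should be) still propagate, which avoids silently planning against an empty state by accident.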
Action and FieldChange
```rust
use serde_json::Value;

enum Action {
    NoOp,
    Create,
    Update { changes: Vec<FieldChange> },
    Delete,
}

struct FieldChange {
    field: String,
    from: Value,
    to: Value,
}
```
Action::symbol() returns the two-character prefix used in plan output: blank for NoOp, + for Create, ~ for Update, and - for Delete.
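A sketch of symbol(), assuming each one-character marker is padded with a trailing space to reach two characters. The padding is an assumption, and String stands in for serde_json::Value so the example has no dependencies.

```rust
// Trimmed copies of the types above; String stands in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub field: String,
    pub from: String,
    pub to: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    NoOp,
    Create,
    Update { changes: Vec<FieldChange> },
    Delete,
}

impl Action {
    // Two-character plan prefix. Trailing-space padding is assumed.
    pub fn symbol(&self) -> &'static str {
        match self {
            Action::NoOp => "  ",
            Action::Create => "+ ",
            Action::Update { .. } => "~ ",
            Action::Delete => "- ",
        }
    }
}
```

A fixed width keeps resource addresses aligned in the plan listing regardless of action.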
Observed and Drift
```rust
enum Observed {
    Present(Value),  // resource exists; attrs normalized to state shape
    Absent,          // confirmed gone on the host
    Unknown(String), // provider can't tell (carries a reason)
}

struct Drift {
    changes: Vec<FieldChange>,
    missing: bool,              // state says exists but observed == Absent
    unreadable: Option<String>, // observed == Unknown OR read returned Err
}
```
Drift is per-resource, populated by refresh_plan. Drift::is_clean() is true when all three fields are empty/false/none.
PlannedResource and Plan
```rust
struct PlannedResource {
    addr: ResourceAddr,
    provider: String,
    desired: Value,
    prior: Value,
    action: Action,
    drift: Option<Drift>, // None unless refresh_plan was called
}

struct Plan {
    steps: Vec<PlannedResource>,
}
```
Plan::summary() returns a PlanSummary { create, update, delete, noop, drifted, missing, unreadable }. Plan::was_refreshed() is true iff any step has a Some(drift).
DesiredResource
```rust
struct DesiredResource {
    addr: ResourceAddr,
    provider: String,
    attrs: Value,
}
```
The output of the config evaluator and the input to build_plan.
Plan / apply flow
- Resolve sources. If -n NAME is set, the CLI loads the manifest, extracts the named namespace, and resolves the config list to [manifest, ...namespace.configs] and the state path to .stratum/<name>.json (or the namespace's explicit state = path). Otherwise the configs and state path come straight from -c / -s. See Manifest discovery.
- Parse config. stratum_config::load_files(paths) (or load_file for a single path) runs lex → parse on each file, tags each block with its source path, concatenates the results into one Document, and runs a multi-pass extract: hosts → secrets → namespaces → providers + resources. The result is an Extracted { hosts, providers, resources, secrets, namespaces, redaction_map }. Any system_file with content_file is inlined during this step; see content_file. Duplicate hosts, providers, secrets, resources, or namespaces across files are hard errors that name both paths.
- Cross-namespace check. In namespace mode only, re-load every sibling namespace's configs and check the current namespace's docker_container resources for port and container-name collisions. See Cross-namespace validator.
- Load state. In bundle mode, State::load(state_path). In namespace mode, State::load_merged(state_path, _shared.json); see Split state. A missing file yields a default empty state.
- Build plan. build_plan(extracted.resources, &state) -> Result<Plan>:
  - Run two planner-side validators before classification: a port-conflict check on every docker_container.ports, and a depends_on topo sort that orders create / update steps and rejects cycles and unknown refs.
  - For each desired resource (in topo order), look up the prior by addr key. For kinds in SECRET_CONTENT_TO_SHA, normalize desired before diffing (see Secret-content normalization on plan). Run diff_observed(prior.attrs, normalize_for_plan(kind, desired.attrs)). No diff → NoOp. Otherwise Create (no prior) or Update { changes }.
  - For each prior resource not in desired → Delete, in forward topo order over state-resident depends_on edges.
- Optional refresh. With plan --refresh, run refresh_plan(&mut plan, &registry) to annotate every non-create step with observed drift.
- Print plan. One symbol per resource, field-change lines for updates, drift annotations if refreshed, and a summary at the end.
- Confirmation gate. Without -y, exit here. apply without -y is identical to plan plus the "Apply? Re-run with -y to execute" line.
- Build registry. Instantiate all providers. (No shipped provider reads its provider { ... } block today.)
- Execute. For each plan step (in topo order), look up the provider by kind prefix and call create / update / delete. After every docker_container create or update, run the post-apply readiness wait before moving to the next step. Update state with the returned attrs (or remove the entry on delete).
- Save state. In bundle mode, state.save(state_path). In namespace mode, state.save_split(state_path, _shared.json): _stratum_* addresses route to the shared file, everything else to the namespace's file.
- Post-apply self-check. Reload the config, rebuild the plan against the new state, run refresh_plan again, and print one summary line: "post-apply drift: clean" or "post-apply drift: N differ, M missing, K unreadable — run 'stratum plan --refresh' to see details".
The Provider trait
```rust
use async_trait::async_trait;
use serde_json::Value;

#[async_trait]
trait Provider: Send + Sync {
    fn name(&self) -> &str;
    fn kinds(&self) -> &[&'static str];

    fn configure(&mut self, _attrs: &Value) -> Result<()> {
        Ok(())
    }

    async fn create(&self, kind: &str, name: &str, attrs: &Value) -> Result<Value>;
    async fn update(&self, kind: &str, name: &str, prior: &Value, attrs: &Value) -> Result<Value>;
    async fn delete(&self, kind: &str, name: &str, prior: &Value) -> Result<()>;

    async fn read(&self, _kind: &str, _name: &str, _prior: &Value) -> Result<Observed> {
        Ok(Observed::Unknown("provider does not implement read".into()))
    }
}
```
- name() is the lookup key in the registry.
- kinds() lists every kind the provider owns. The registry's for_kind scans providers and returns the first match.
- configure is called once at apply time with the provider "<name>" { ... } body. The default impl ignores it. No shipped provider implements it today.
- create, update, and delete return the new attrs to record in state (or () for delete). The returned value is what the next plan diffs against.
- read must be non-destructive: it's a query, not a side effect. The default impl returns Unknown. Implementations should normalize the returned Value to the same shape as state attrs.
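The first-match lookup can be sketched with a synchronous, trimmed trait that keeps only name() and kinds(). The Registry layout, Registry::new, and the Fixed toy provider are hypothetical; only for_kind's "scan providers, return the first match" rule comes from the text.

```rust
// Trimmed stand-in for the async Provider trait: only the methods lookup needs.
pub trait Provider {
    fn name(&self) -> &str;
    fn kinds(&self) -> &[&'static str];
}

// Hypothetical registry shape: providers kept in registration order.
pub struct Registry {
    providers: Vec<Box<dyn Provider>>,
}

impl Registry {
    pub fn new(providers: Vec<Box<dyn Provider>>) -> Self {
        Registry { providers }
    }

    // Scan providers in order; return the first one that owns `kind`.
    pub fn for_kind(&self, kind: &str) -> Option<&dyn Provider> {
        self.providers
            .iter()
            .find(|p| p.kinds().iter().any(|k| *k == kind))
            .map(|p| &**p)
    }
}

// Toy provider for the example.
pub struct Fixed {
    pub name: &'static str,
    pub kinds: Vec<&'static str>,
}

impl Provider for Fixed {
    fn name(&self) -> &str {
        self.name
    }
    fn kinds(&self) -> &[&'static str] {
        &self.kinds
    }
}
```

Because the scan stops at the first match, two providers claiming the same kind would not error; whichever was registered first wins.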
The diff algorithm
There are two diff functions in core. They serve different purposes.
diff (symmetric, used by Action::Update legacy path)
```rust
fn diff(prior: &Value, desired: &Value) -> Vec<FieldChange>
```
A recursive walk over JSON values:
- If prior == desired exactly, return no changes.
- If both are JSON objects, walk the union of their keys (sorted, deduplicated). For each key, recurse with the dotted path <prefix>.<key>.
- Otherwise, emit a single FieldChange { field: <path>, from: prior, to: desired }. The field is "<root>" when the diff lives at the document root.
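The three rules translate into a short recursive walk. This sketch uses a minimal Value enum as a dependency-free stand-in for serde_json::Value; treating a key missing on one side as Null, and using the bare key (no leading dot) at the top level, are assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Minimal stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Arr(Vec<Value>),
    Obj(BTreeMap<String, Value>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub field: String,
    pub from: Value,
    pub to: Value,
}

// Symmetric recursive diff following the three rules above.
pub fn diff(prior: &Value, desired: &Value) -> Vec<FieldChange> {
    let mut out = Vec::new();
    walk("", prior, desired, &mut out);
    out
}

fn walk(path: &str, prior: &Value, desired: &Value, out: &mut Vec<FieldChange>) {
    if prior == desired {
        return; // rule 1: exact equality, no changes
    }
    if let (Value::Obj(p), Value::Obj(d)) = (prior, desired) {
        // rule 2: sorted, deduplicated union of keys; a missing side is Null (assumed)
        let null = Value::Null;
        let keys: BTreeSet<&String> = p.keys().chain(d.keys()).collect();
        for k in keys {
            let child = if path.is_empty() { k.clone() } else { format!("{path}.{k}") };
            walk(&child, p.get(k).unwrap_or(&null), d.get(k).unwrap_or(&null), out);
        }
        return;
    }
    // rule 3: one change covering this whole subtree
    let field = if path.is_empty() { "<root>".to_string() } else { path.to_string() };
    out.push(FieldChange { field, from: prior.clone(), to: desired.clone() });
}
```

Arrays fall under rule 3, so any difference inside an array is reported as a single change for the whole array.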
diff_observed (one-sided, used by build_plan and refresh_plan)
```rust
fn diff_observed(prior: &Value, observed: &Value) -> Vec<FieldChange>
```
Used both by build_plan (comparing state-stored prior against desired config) and by refresh_plan (comparing state against live observation). Rules differ from diff:
- State-only fields are ignored. Only keys present in observed are walked. A field that's in prior but not in observed does not generate drift. This is what lets providers store extra fields (container_id, sha256, etc.) without polluting plans.
- Missing key vs empty container = no drift. If prior has no key k but observed has k: {} or k: [], that's not drift. The same holds for prior: null vs observed: {} / [].
- String arrays are compared as sets. ["a", "b"] and ["b", "a"] are equal. Non-string arrays are compared by order.
- Added keys in observed → flagged. A key in observed but not in prior shows up as from: null, to: <value>.
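A sketch of the one-sided walk, again over a dependency-free stand-in for serde_json::Value. The helper names is_empty_container and string_set are hypothetical, and the order in which the rules are checked is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Minimal stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Arr(Vec<Value>),
    Obj(BTreeMap<String, Value>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub field: String,
    pub from: Value,
    pub to: Value,
}

fn is_empty_container(v: &Value) -> bool {
    matches!(v, Value::Arr(a) if a.is_empty()) || matches!(v, Value::Obj(o) if o.is_empty())
}

// Some(set) only when every element is a string.
fn string_set(v: &Value) -> Option<BTreeSet<&str>> {
    match v {
        Value::Arr(items) => items
            .iter()
            .map(|i| match i {
                Value::Str(s) => Some(s.as_str()),
                _ => None,
            })
            .collect(),
        _ => None,
    }
}

pub fn diff_observed(prior: &Value, observed: &Value) -> Vec<FieldChange> {
    let mut out = Vec::new();
    walk("", prior, observed, &mut out);
    out
}

fn walk(path: &str, prior: &Value, observed: &Value, out: &mut Vec<FieldChange>) {
    // Missing / null prior vs empty {} or [] is not drift.
    if *prior == Value::Null && is_empty_container(observed) {
        return;
    }
    // String arrays compare as sets.
    if let (Some(p), Some(o)) = (string_set(prior), string_set(observed)) {
        if p == o {
            return;
        }
    }
    if prior == observed {
        return;
    }
    if let (Value::Obj(p), Value::Obj(o)) = (prior, observed) {
        let null = Value::Null;
        // Only observed keys are walked, so state-only fields never show up.
        for (k, o_v) in o {
            let child = if path.is_empty() { k.clone() } else { format!("{path}.{k}") };
            walk(&child, p.get(k).unwrap_or(&null), o_v, out);
        }
        return;
    }
    // Includes added keys: prior is Null, so the change reads from: null.
    let field = if path.is_empty() { "<root>".to_string() } else { path.to_string() };
    out.push(FieldChange { field, from: prior.clone(), to: observed.clone() });
}
```

The usage below shows a state-only container_id, a reordered string array, and an empty labels object all producing no drift, while a newly observed key is flagged.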
The provider's read implementation is responsible for trimming the observed value to a shape that mirrors state, so noise doesn't leak through. For example, docker_container strips com.docker.* labels and intersects with the state's label key set.
Drift detection
refresh_plan(&mut plan, &registry) annotates each plan step with observed drift from live reality.
```rust
async fn refresh_plan(plan: &mut Plan, registry: &Registry);
```
Rules:
- Action::Create is skipped: there's no prior state to read.
- Sequential per resource. SSH round-trips are I/O-bound, but at roughly 10 resources parallelism isn't justified yet.
- Per-resource errors are caught, not propagated. They become Drift::unreadable = Some("read failed: ..."). refresh_plan itself never returns Err.
- The provider's read is called with (kind, name, &step.prior). The returned Observed is mapped:
  - Present(observed) → drift.changes = diff_observed(&step.prior, &observed)
  - Absent → drift.missing = true
  - Unknown(reason) → drift.unreadable = Some(reason)
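A synchronous sketch of the loop and the Observed → Drift mapping, including Drift::is_clean(). The read and diff functions are injected as closures so the example needs no provider or async runtime; drift_from_read, refresh_steps, Step, and ActionKind are hypothetical names, and String stands in for both serde_json::Value and FieldChange.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Observed {
    Present(String),
    Absent,
    Unknown(String),
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct Drift {
    pub changes: Vec<String>,
    pub missing: bool,
    pub unreadable: Option<String>,
}

impl Drift {
    // True when all three fields are empty / false / None.
    pub fn is_clean(&self) -> bool {
        self.changes.is_empty() && !self.missing && self.unreadable.is_none()
    }
}

// Map one read() outcome to a Drift; a read error becomes `unreadable`, never Err.
pub fn drift_from_read(
    read: Result<Observed, String>,
    prior: &str,
    diff: impl Fn(&str, &str) -> Vec<String>,
) -> Drift {
    match read {
        Ok(Observed::Present(obs)) => Drift { changes: diff(prior, &obs), ..Drift::default() },
        Ok(Observed::Absent) => Drift { missing: true, ..Drift::default() },
        Ok(Observed::Unknown(reason)) => Drift { unreadable: Some(reason), ..Drift::default() },
        Err(e) => Drift { unreadable: Some(format!("read failed: {e}")), ..Drift::default() },
    }
}

#[derive(Debug, PartialEq)]
pub enum ActionKind {
    Create,
    Other,
}

pub struct Step {
    pub action: ActionKind,
    pub prior: String,
    pub drift: Option<Drift>,
}

// Sequential refresh: skip creates, annotate every other step.
pub fn refresh_steps(
    steps: &mut [Step],
    read: impl Fn(&str) -> Result<Observed, String>,
    diff: impl Fn(&str, &str) -> Vec<String>,
) {
    for step in steps.iter_mut() {
        if step.action == ActionKind::Create {
            continue;
        }
        let drift = drift_from_read(read(step.prior.as_str()), step.prior.as_str(), &diff);
        step.drift = Some(drift);
    }
}
```

Returning nothing from the loop mirrors the rule that refresh_plan never returns Err: every failure ends up on the step it belongs to.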
PlanSummary counts:
- drifted: count of steps where drift.changes is non-empty.
- missing: count of steps where drift.missing == true and the action is not Delete. (A Delete step whose resource is already gone is annotated "(already gone on host; delete will noop)" instead; that's not drift.)
- unreadable: count of steps where drift.unreadable.is_some().
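A sketch of Plan::summary() and Plan::was_refreshed() over trimmed step types. Update's change list is dropped from Action and String stands in for FieldChange; the free functions summary and was_refreshed are hypothetical stand-ins for the methods.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    NoOp,
    Create,
    Update, // change list elided for the sketch
    Delete,
}

#[derive(Debug, Default, Clone)]
pub struct Drift {
    pub changes: Vec<String>,
    pub missing: bool,
    pub unreadable: Option<String>,
}

pub struct Step {
    pub action: Action,
    pub drift: Option<Drift>,
}

#[derive(Debug, Default, PartialEq)]
pub struct PlanSummary {
    pub create: usize,
    pub update: usize,
    pub delete: usize,
    pub noop: usize,
    pub drifted: usize,
    pub missing: usize,
    pub unreadable: usize,
}

pub fn summary(steps: &[Step]) -> PlanSummary {
    let mut s = PlanSummary::default();
    for step in steps {
        match step.action {
            Action::Create => s.create += 1,
            Action::Update => s.update += 1,
            Action::Delete => s.delete += 1,
            Action::NoOp => s.noop += 1,
        }
        if let Some(d) = &step.drift {
            if !d.changes.is_empty() {
                s.drifted += 1;
            }
            // A Delete whose resource is already gone is not counted as missing.
            if d.missing && step.action != Action::Delete {
                s.missing += 1;
            }
            if d.unreadable.is_some() {
                s.unreadable += 1;
            }
        }
    }
    s
}

// True iff any step carries a Some(drift).
pub fn was_refreshed(steps: &[Step]) -> bool {
    steps.iter().any(|s| s.drift.is_some())
}
```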
Planner-side validators
Port-conflict validator
Before classifying steps, build_plan walks every docker_container.ports value across the desired set and checks for (host, ip, host_port) collisions. Two resources binding the same port on the same host is a hard error at plan time, naming both. A 0.0.0.0:N bind symmetrically collides with 127.0.0.1:N — the wildcard bind subsumes the loopback one.
Random ports ("5432" — docker picks the host port) are skipped silently. Port ranges ("8000-8010:8000-8010") get a warning but are not validated. Unrecognized port shapes are skipped to keep the validator forward-compatible.
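A sketch of the port parsing and collision rules. It assumes an omitted IP means 0.0.0.0 (docker's default bind-all behaviour) and that claims arrive as (addr, host, port spec) triples; parse_port, check_ports, and PortSpec are hypothetical names. IPv6 specs split into too many colon-separated parts and fall into the skipped "unrecognized" bucket.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum PortSpec {
    Bind { ip: String, port: u16 },
    Random,       // "5432": docker picks the host port
    Range,        // "8000-8010:8000-8010": warned, not validated
    Unrecognized, // skipped for forward compatibility
}

pub fn parse_port(spec: &str) -> PortSpec {
    let parts: Vec<&str> = spec.split(':').collect();
    let (ip, host) = match parts.as_slice() {
        [_container] => return PortSpec::Random,
        [host, _container] => ("0.0.0.0", *host), // assumed default bind
        [ip, host, _container] => (*ip, *host),
        _ => return PortSpec::Unrecognized,
    };
    if host.contains('-') {
        return PortSpec::Range;
    }
    match host.parse::<u16>() {
        Ok(port) => PortSpec::Bind { ip: ip.to_string(), port },
        Err(_) => PortSpec::Unrecognized,
    }
}

// The wildcard bind overlaps every specific address, in either direction.
fn overlaps(a: &str, b: &str) -> bool {
    a == b || a == "0.0.0.0" || b == "0.0.0.0"
}

// claims: (resource addr, host, port spec). Errors on the first collision.
pub fn check_ports(claims: &[(&str, &str, &str)]) -> Result<(), String> {
    let mut seen: Vec<(&str, &str, String, u16)> = Vec::new();
    for &(addr, host, spec) in claims {
        match parse_port(spec) {
            PortSpec::Bind { ip, port } => {
                for (other, o_host, o_ip, o_port) in &seen {
                    if *o_host == host && *o_port == port && overlaps(o_ip, &ip) {
                        return Err(format!("port {ip}:{port} on {host} claimed by both {other} and {addr}"));
                    }
                }
                seen.push((addr, host, ip, port));
            }
            PortSpec::Range => eprintln!("warning: {addr}: port range {spec} not validated"),
            PortSpec::Random | PortSpec::Unrecognized => {}
        }
    }
    Ok(())
}
```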
depends_on topo sort
The planner runs a stable Kahn's-algorithm topo sort over the docker_container.depends_on edges (see depends_on). Properties:
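- Stable. Resources without edges keep their input (file) order. Where ties exist, a BTreeSet ready-set picks them in lexicographic addr order.
- Implicit _stratum_* resources stay at the front. They carry no edges and have in_degree = 0, so they land first.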
_stratum_*resources stay at the front. They carry no edges and havein_degree = 0, so they land first. - Cycles are a hard error citing the cycle path.
- Unknown references are hard errors citing both the source and the missing target.
The topo order applies to Create and Update steps; Delete order is computed separately.
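A sketch of a stable Kahn's-algorithm sort with cycle and unknown-reference errors. One choice differs from what the text may describe: the ready set here is a BTreeSet of input positions, which keeps zero-edge resources (including implicit _stratum_* ones injected first) in file order; if the real ready-set keys by addr, tie-breaking will differ. topo_order and cycle_path are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

// resources: (addr, depends_on addrs) in input (file) order.
pub fn topo_order<'a>(resources: &[(&'a str, Vec<&'a str>)]) -> Result<Vec<String>, String> {
    let index: HashMap<&'a str, usize> =
        resources.iter().enumerate().map(|(i, (a, _))| (*a, i)).collect();
    let n = resources.len();
    let mut in_degree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (i, (addr, deps)) in resources.iter().enumerate() {
        for dep in deps {
            let j = *index
                .get(dep)
                .ok_or_else(|| format!("{addr} depends_on unknown resource {dep}"))?;
            in_degree[i] += 1; // dependency must be applied before `addr`
            dependents[j].push(i);
        }
    }
    // Ready set keyed by input position (assumption; see lead-in).
    let mut ready: BTreeSet<usize> = (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(i) = ready.pop_first() {
        order.push(resources[i].0.to_string());
        for &d in &dependents[i] {
            in_degree[d] -= 1;
            if in_degree[d] == 0 {
                ready.insert(d);
            }
        }
    }
    if order.len() < n {
        return Err(format!("depends_on cycle: {}", cycle_path(resources, &index, &in_degree)));
    }
    Ok(order)
}

// Every unprocessed node still has an unprocessed dependency, so following
// them from any unprocessed node must revisit a node; that loop is the cycle.
fn cycle_path<'a>(
    resources: &[(&'a str, Vec<&'a str>)],
    index: &HashMap<&'a str, usize>,
    in_degree: &[usize],
) -> String {
    let mut cur = (0..in_degree.len()).find(|&i| in_degree[i] > 0).unwrap();
    let mut path: Vec<usize> = Vec::new();
    while !path.contains(&cur) {
        path.push(cur);
        cur = resources[cur].1.iter().map(|d| index[d]).find(|&j| in_degree[j] > 0).unwrap();
    }
    let start = path.iter().position(|&i| i == cur).unwrap();
    // "a -> b" reads as "a depends_on b".
    let mut names: Vec<&str> = path[start..].iter().map(|&i| resources[i].0).collect();
    names.push(resources[cur].0);
    names.join(" -> ")
}
```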
Secret-content normalization on plan
For kinds where a content field carries a secret value, state stores only sha256 (the plaintext is unrecoverable from state) but desired carries the full plaintext at plan time. A naive diff_observed(prior, desired) would emit content: null -> "<plaintext>" on every plan, leaking the value into CLI output.
build_plan normalizes desired before diffing. The kinds that opt into this live in a const SECRET_CONTENT_TO_SHA: &[(&str, &str)]:
| kind | content field |
|---|---|
| system_secret_file | content |
For each entry, normalize_for_plan(kind, attrs) clones attrs, removes the named field, and inserts sha256: <hex> derived from its UTF-8 bytes. Diff then compares sha against sha — exactly the same shape state holds. Plaintext never reaches the diff.
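A sketch of normalize_for_plan over flattened attrs. Two stand-ins are assumed: attrs are a BTreeMap<String, String> instead of a JSON object, and the SHA-256 hex function is injected as a closure so the sketch needs no hashing crate (the usage below passes a fake hash, not SHA-256).

```rust
use std::collections::BTreeMap;

// Kinds whose named field carries a secret (from the table above).
pub const SECRET_CONTENT_TO_SHA: &[(&str, &str)] = &[("system_secret_file", "content")];

// Replace the secret field with sha256: <hex> so the diff never sees plaintext.
// `sha256_hex` is injected; real code would compute a SHA-256 hex digest.
pub fn normalize_for_plan(
    kind: &str,
    attrs: &BTreeMap<String, String>,
    sha256_hex: impl Fn(&[u8]) -> String,
) -> BTreeMap<String, String> {
    let mut out = attrs.clone();
    for (k, field) in SECRET_CONTENT_TO_SHA {
        if *k == kind {
            if let Some(plain) = out.remove(*field) {
                out.insert("sha256".to_string(), sha256_hex(plain.as_bytes()));
            }
        }
    }
    out
}
```

Kinds not listed in the table pass through unchanged, so the normalization only costs a clone for everything else.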
This is the inverse half of the kind's own apply-time unchanged check (which compares the same sha against prior state to decide whether to re-upload). The two together guarantee that a plaintext secret never appears in plan output, in state, or in apply logs.
Plan-level secret redaction
After build_plan returns and after refresh_plan runs, the CLI calls Extracted::redact_plan(&mut plan) once before printing. This walk does two things:
- Apply substring redaction to every step's desired, prior, and per-FieldChange from / to. A leaf string containing a known secret plaintext (introduced via ${...} interpolation) gets each occurrence replaced with the inline <secret:NAME:sha256:HEX> marker. Exact-match leaves are replaced with the object marker, same as everywhere else.
- Drop redaction-cancelled changes. When state holds the inline substring marker and observed returns plaintext, both sides collapse to the same marker after the walk. Any FieldChange where from == to post-redaction is dropped. If an Action::Update's changes list becomes empty, the step is downgraded to Action::NoOp, so drift that was only a substring-marker-vs-plaintext difference disappears entirely.
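A sketch of the substring path plus the cancel-and-downgrade step. It handles string leaves only (the exact-match object marker is out of scope), and secrets are passed as hypothetical (plaintext, marker) pairs; redact_leaf and redact_action are illustrative names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub field: String,
    pub from: String,
    pub to: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    NoOp,
    Update { changes: Vec<FieldChange> },
}

// Replace every occurrence of each known plaintext with its inline marker.
pub fn redact_leaf(s: &str, secrets: &[(&str, &str)]) -> String {
    let mut out = s.to_string();
    for (plain, marker) in secrets {
        if !plain.is_empty() {
            out = out.replace(*plain, marker);
        }
    }
    out
}

// Redact both sides, drop changes that became equal, and downgrade an
// Update with nothing left to NoOp.
pub fn redact_action(action: Action, secrets: &[(&str, &str)]) -> Action {
    match action {
        Action::Update { changes } => {
            let kept: Vec<FieldChange> = changes
                .into_iter()
                .map(|c| FieldChange {
                    from: redact_leaf(&c.from, secrets),
                    to: redact_leaf(&c.to, secrets),
                    field: c.field,
                })
                .filter(|c| c.from != c.to)
                .collect();
            if kept.is_empty() {
                Action::NoOp
            } else {
                Action::Update { changes: kept }
            }
        }
        other => other,
    }
}
```

In the usage below, a state-side URL holding the marker and an observed URL holding the plaintext cancel out, while a genuine image change survives.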
This is what stops plan --refresh from emitting spurious updates on every secret-bearing interpolated field. See Secrets: substring redaction.
Post-apply readiness wait
After every successful docker_container create or update, the planner pauses before moving on to the next step (which may be a dependent declared via depends_on). The wait lives in the CLI in post_apply_wait:
- If desired.healthcheck is present, poll docker inspect --format '{{.State.Health.Status}}' <name> once a second, up to 60 polls. Terminal statuses are healthy (proceed), unhealthy (fail the apply), or empty / none on the first poll (no health check at the docker level; proceed). starting and other interim values keep polling.
- Otherwise, sleep 500ms. This is cosmetic: docker often needs a beat to wire networks and volumes before something else pokes the container.
Non-docker_container steps return immediately. The provider's own create / update is synchronous: git_repo clones return when done, system_secret_file returns when the SSH upload completes.
The poll loop itself is in core (poll_container_health), separated from SSH plumbing so it's unit-testable with a mocked inspector.
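A sketch of a poll loop shaped for that kind of testing: the inspector and the sleep are injected closures, so a test can script statuses without docker or real delays. The signature, the Readiness enum, propagating inspector errors, and returning Err after max_polls are all assumptions; real code would pass a one-second std::thread::sleep.

```rust
#[derive(Debug, PartialEq)]
pub enum Readiness {
    Healthy,
    NoHealthcheck,
}

// `inspect` returns the output of the docker inspect health-status query;
// `sleep` runs between polls. Timeout and error handling are assumed.
pub fn poll_container_health(
    mut inspect: impl FnMut() -> Result<String, String>,
    mut sleep: impl FnMut(),
    max_polls: u32,
) -> Result<Readiness, String> {
    for poll in 0..max_polls {
        let status = inspect()?;
        match status.trim() {
            "healthy" => return Ok(Readiness::Healthy),
            "unhealthy" => return Err("container reported unhealthy".to_string()),
            // Empty / none on the first poll means docker has no health check.
            "" | "none" if poll == 0 => return Ok(Readiness::NoHealthcheck),
            _ => {} // "starting" and other interim values keep polling
        }
        if poll + 1 < max_polls {
            sleep();
        }
    }
    Err(format!("container not healthy after {max_polls} polls"))
}
```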
Delete ordering
build_plan emits delete steps in forward topo order over state-resident depends_on edges. For two resources X and Y where X depends_on Y at runtime, X is torn down before Y — the dependent goes first so the dependency is still serving while it shuts down.
Resources without recorded depends_on edges fall through with in_degree = 0 and end up before any edged resources, in reverse-iteration order of the state BTreeMap (which preserves the prior file-order-independent behavior). This keeps the heuristic close to "leaves before roots" for hand-written configs even when no depends_on is declared.
depends_on is recorded in state at create / update time and survives across apply runs, so a delete computed against state still knows the edges the resource was declared with — even when the resource is no longer in config.
Implicit per-host resources
For every host block in the merged document, extract injects three implicit resources before any user-declared ones, addressed under the _stratum_ prefix:
| addr | kind | purpose |
|---|---|---|
ssh_exec._stratum_swap_<host> | ssh_exec | Creates a 4 GB /swapfile, enables it, persists in fstab. |
system_file._stratum_sshd_oom_<host> | system_file | Drops /etc/systemd/system/ssh.service.d/oom.conf with OOMScoreAdjust=-1000. |
ssh_exec._stratum_sshd_reload_<host> | ssh_exec | systemctl daemon-reload && systemctl restart ssh. |
The first two exist so that under memory pressure the kernel does not kill sshd — which would lock the operator out of recovery. The third applies the drop-in. They are stable across versions and live at the front of the desired list (in_degree 0), so they apply before any user resource on the host.
In namespace mode they are routed to _shared.json so multiple namespaces sharing a host don't each try to recreate them. In bundle mode they share the single state file with everything else.
The _stratum_ prefix is reserved. User-declared resources should not use it.
Manifest discovery (namespace mode)
When -n NAME is set, the CLI resolves the config + state paths as follows. See Namespaces for the syntax.
- Locate the manifest. If --manifest PATH was passed, that path is used. Otherwise the CLI requires ./stratum.strat to exist; if it doesn't, the command errors.
- Load the manifest. Runs stratum_config::load_file(manifest), producing an Extracted with one or more namespace declarations.
- Look up the namespace. If the named namespace isn't in the manifest, error with the list of known namespaces.
- Resolve configs. The merged list is [manifest, ...ns.configs]. The manifest is always first so its top-level host / secret / provider blocks are visible to every per-namespace file. Each configs entry is absolutized at parse time against the manifest's directory.
- Resolve state. Priority order: explicit -s on the command line, then the namespace's body-level state =, then .stratum/<name>.json.
Passing -c together with -n is a hard error — the namespace's configs = [...] is the config list, and a -c override would silently shadow it. Bundle mode (no -n) is unchanged by namespace support.
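The resolution rules reduce to a few small functions. resolve_state_path, resolve_configs, and check_flags are hypothetical helpers illustrating the priority order, the manifest-first config list, and the -c / -n conflict; the error text is illustrative only.

```rust
use std::path::PathBuf;

// Priority: explicit -s, then the namespace's state =, then .stratum/<name>.json.
pub fn resolve_state_path(cli_state: Option<PathBuf>, ns_state: Option<PathBuf>, ns_name: &str) -> PathBuf {
    cli_state
        .or(ns_state)
        .unwrap_or_else(|| PathBuf::from(".stratum").join(format!("{ns_name}.json")))
}

// The manifest always comes first so its host / secret / provider blocks are visible.
pub fn resolve_configs(manifest: PathBuf, ns_configs: &[PathBuf]) -> Vec<PathBuf> {
    std::iter::once(manifest).chain(ns_configs.iter().cloned()).collect()
}

// -c with -n is rejected rather than silently shadowing configs = [...].
pub fn check_flags(has_c: bool, namespace: Option<&str>) -> Result<(), String> {
    match (has_c, namespace) {
        (true, Some(ns)) => Err(format!("-c cannot be combined with -n {ns}")),
        _ => Ok(()),
    }
}
```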
Cross-namespace validator
Namespace mode's plan and apply run a sibling-collision check before classification. The check exists because build_plan operates inside one namespace's view of the world — it has no visibility into what other namespaces declare — so two namespaces could each plan a docker_container binding the same (host, host_port) and only discover the conflict at apply time, when one fails over a port already taken by the other.
The check, in validate_cross_namespace:
- Re-loads the manifest (cheap; it has no resources).
- For each sibling namespace (every one except the current), loads its configs with LoadOptions::allow_unresolved_secrets = true so a missing env var in some unrelated namespace doesn't block planning the current one.
- Walks every docker_container in every sibling, collecting:
  - Port claims. Each ports entry is parsed for the host-port half of H:C or IP:H:C. Ranges (8000-8010:...) and bare-port shapes (where docker picks the host port) are skipped.
  - Name claims. The container's name attribute, falling back to the resource's label.
- Checks every docker_container in the current namespace against the collected claims, erroring on the first (host, port) or (host, name) collision and naming both the offending current-namespace address and the sibling that owns the claim.
The validator is skipped entirely in bundle mode. Within a single namespace, the existing planner-side port-conflict validator catches collisions within the same desired set. The cross-namespace validator is strictly the inter-namespace layer above it.
The sibling loader uses allow_unresolved_secrets = true defensively — it's only collecting addresses, ports, and names, none of which depend on secret plaintext. If a sibling load fails for any other reason, the error is logged and that sibling is skipped (the plan still proceeds), so a broken sibling doesn't gate apply of an unrelated namespace.
Split state (namespace mode)
In namespace mode the state on disk is two files instead of one:
```text
.stratum/
  <name>.json     # user-declared resources for namespace <name>
  _shared.json    # implicit per-host _stratum_* resources
```
State::save_split(ns_path, shared_path) walks self.resources and routes each entry by addr name: anything starting with _stratum_ goes to _shared.json, everything else to <name>.json. Both files are written every save (with parent dirs created), even when one side is empty — that keeps the next load predictable.
State::load_merged(ns_path, shared_path) is the inverse. It loads both files and unions their resources maps, with the namespace's entry winning any addr.key() collision (the more recently touched of the two, since the active scope just ran). Missing files become empty state (matches load).
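A sketch of the routing and merge rules over an in-memory map. String stands in for ResourceState, and since the sketch has only keys, it recovers the addr name by splitting "<kind>.<name>" at the first dot (assuming kinds contain no dot); the real code reads addr.name directly. split and merge are hypothetical names.

```rust
use std::collections::BTreeMap;

// Reserved prefix for implicit per-host resources.
pub const SHARED_PREFIX: &str = "_stratum_";

// addr key -> resource (String stands in for ResourceState).
pub type Resources = BTreeMap<String, String>;

// Route each entry by its addr name, like save_split.
pub fn split(resources: &Resources) -> (Resources, Resources) {
    let mut ns = Resources::new();
    let mut shared = Resources::new();
    for (key, v) in resources {
        // Assumed: kinds contain no '.', so the name starts after the first dot.
        let name = key.split_once('.').map(|(_, n)| n).unwrap_or(key.as_str());
        let target = if name.starts_with(SHARED_PREFIX) { &mut shared } else { &mut ns };
        target.insert(key.clone(), v.clone());
    }
    (ns, shared)
}

// Union of both files; the namespace entry wins a key collision, like load_merged.
pub fn merge(ns: Resources, shared: Resources) -> Resources {
    let mut out = shared;
    out.extend(ns); // later inserts overwrite, so namespace entries win
    out
}
```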
Bundle mode keeps using the single-file State::load(path) and State::save(path). The CLI picks the right pair via the -n flag — load_state / save_state in crates/cli/src/main.rs switch on whether a shared path is set.
The split is what lets two namespaces targeting the same host co-exist without each trying to own the per-host tuning resources. First namespace applies: _stratum_swap_*, _stratum_sshd_oom_*, _stratum_sshd_reload_* land in _shared.json. Second namespace plans: load_merged pulls them back from the shared file into its working state, so the new plan sees them as no-op. Without the split, the second apply would see them missing from its state file and recreate them, churning the swap file and restarting sshd on every cross-namespace apply.
State file shape
```json
{
  "version": 1,
  "resources": {
    "docker_container.traefik": {
      "addr": { "kind": "docker_container", "name": "traefik" },
      "provider": "docker",
      "attrs": {
        "host": "root@192.0.2.10",
        "image": "traefik:v2.11",
        "container_id": "abc123...",
        "...": "..."
      }
    },
    "system_package.docker": { "...": "..." }
  }
}
```
Resources are keyed by <kind>.<name> in a BTreeMap, so the on-disk order is deterministic (lexicographic). The file is overwritten in full on every successful apply.
Secret markers in state
When a resource attr resolves from a secret ref, the provider receives plaintext but state stores a redaction marker:
```json
{
  "env": {
    "POSTGRES_PASSWORD": {
      "__secret": "pg_password",
      "__secret_sha256": "sha256:f7c3bc1d808e04..."
    }
  }
}
```
The marker is written by Extracted::redact_into, called between every provider return and state.upsert. diff and diff_observed are marker-aware (see core::secret_compare): a marker compares equal to plaintext when the plaintext's hash matches the marker's __secret_sha256, and a marker-vs-marker compare uses only the hashes. This is what keeps --refresh from showing perpetual drift on secret-bearing fields. The CLI's render function prints markers as <secret:NAME sha:abc123> — six hex chars, enough to spot a rotation, not enough to attack offline.
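A sketch of the marker comparisons and the short render form. Marker, marker_eq_plain, marker_eq_marker, and render_marker are hypothetical; the hash function is injected (the usage passes a fake one), and the stored hash is assumed to carry a "sha256:" prefix in front of lowercase hex, as in the example above.

```rust
// Hypothetical view of the object marker's two fields.
pub struct Marker<'a> {
    pub name: &'a str,
    pub sha256: &'a str, // e.g. "sha256:f7c3bc..."
}

fn hex_part(s: &str) -> &str {
    s.strip_prefix("sha256:").unwrap_or(s)
}

// Marker vs plaintext: equal when the plaintext hashes to the stored hash.
pub fn marker_eq_plain(m: &Marker, plain: &str, sha256_hex: impl Fn(&[u8]) -> String) -> bool {
    hex_part(m.sha256) == sha256_hex(plain.as_bytes())
}

// Marker vs marker: only the hashes are compared.
pub fn marker_eq_marker(a: &Marker, b: &Marker) -> bool {
    hex_part(a.sha256) == hex_part(b.sha256)
}

// CLI rendering: <secret:NAME sha:abc123>, first six hex chars only.
pub fn render_marker(m: &Marker) -> String {
    let short: String = hex_part(m.sha256).chars().take(6).collect();
    format!("<secret:{} sha:{}>", m.name, short)
}
```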