Dispatch lifecycle events
What actually happens between pangolin dispatch run and the JSON you get
back. Useful for reading worker stdout, diagnosing failures, and
understanding which layer to blame when something breaks.
The authoritative source is packages/pangolin-worker/src/entrypoint.ts —
its 14-step prologue is the worker’s runbook. This doc is the readable
overview.
The two halves
Section titled “The two halves”A dispatch has two halves:
- Orchestrator side (your machine, where
pangolinruns): resolves names → registered hashes, picks aComputeProvider, asks the provider to start a worker container, then awaits its exit. - Worker side (inside the container): fetches the bundles, overlays
them onto a workspace, runs an optional
pangolin-setup.sh, hands off to theRuntimeAdapter(claude binary), and emits a terminal lifecycle event.
Most of what you see in stdout is the worker’s structured-log stream.
The 14 worker steps (collapsed)
Section titled “The 14 worker steps (collapsed)”1. parse env vars ← PANGOLIN_* env tells the worker what to fetch2. load runtime adapter ← .js plugin per `PANGOLIN_ADAPTER` (claude-code by default)3. fetch + integrity-verify bundles ← StorageProvider.get each ref, sha256-check4. wire callback HMAC + LifecycleEmitter5. emit `dispatch.started`6. overlay capability bundles ← writes files to <workspace>/, merge rules per §6.37. resolve env-bundle secrets ← Secrets Manager lookups for `secrets:` entries8. merge env ← base + bundles + per-dispatch secrets9. run pangolin-setup.sh ← if present at workspace root, bounded by timeout10. start channel subscription ← background poll for inbound channel messages11. invoke runtime adapter ← claude --print <prompt>, captures stdout/stderr12. stop channel subscription13. resolve needs_input sentinel ← stat <workspace>/.pangolin/needs_input.json14. emit terminal event ← dispatch.finished / .needs_input / .failed / .cancelledEverything after step 5 is bracketed by the appropriate lifecycle event so the orchestrator can attribute failures.
Steps 1–10 and 12–13 are the worker’s chassis — fetch, overlay, env, setup, channels. Step 11 plus the success-path capture/verify/seal work is the payload, and the payload is now executed by a block-pipeline runner (next section). The step list above still describes exactly what runs by default.
The block-pipeline runner
Section titled “The block-pipeline runner”The worker’s execution core is a runner of typed block-pipelines. The legacy hardcoded steps did not change behavior — they became the default pipeline, built per-dispatch:
[ agent → capture(patch) → script(verify lens, if subagent.verify) → capture(outputs) ] + sealflowchart LR
subgraph DEF["default pipeline — built per-dispatch"]
A["agent<br/>runtime adapter"] --> C1["capture<br/>what: patch"]
C1 --> V["script · verify lens<br/>only if subagent.verify —<br/>report-only, never fails the dispatch"]
V --> C2["capture<br/>what: outputs"]
end
C2 --> SEAL["seal — structural terminal step,<br/>always appended by the runner itself;<br/>never authored in a spec"]
DECL["declared pipeline<br/>pangolin pipeline register → pinned on inputs.pipeline<br/>re-validated by the worker · per-block blocks[] evidence"]
DECL -.->|"replaces the default"| DEF
This default is byte-identical to the pre-runner worker — golden tests pin the output sentinel’s bytes against the legacy path, so dev-pack consumers see no hash changes. Three block kinds exist:
| Kind | What it runs | Failure semantics |
|---|---|---|
agent | The registered subagent’s runtime adapter (claude --print …) | Non-zero exit aborts the pipeline (provider-failed); a throw stays worker-failed, as before. |
script | A shell command via the bounded-command primitive (time-bounded, output-capped, secret-redacted), in the workspace with the firewalled merged env | lens: 'gate' (default): non-zero exit / timeout aborts the pipeline → the dispatch fails with provider-failed, exit code carried. lens: 'verify': report-only — never fails the dispatch (the Self-verify contract, same primitive). |
capture | what: 'patch' captures the workspace diff; what: 'outputs' content-addresses files under outputs/ | n/a — capture is evidence collection. |
seal is structural, not a block: the runner itself always appends it as
the terminal step. It is never authored in a spec (validatePipelineSpec
rejects a literal seal block), and no caller can omit or reorder it — every
successful pipeline ends with the sealed output sentinel.
A declared pipeline replaces the default: register a spec with
pangolin pipeline register (or
client.pipeline.register),
then pin its ref on the work item’s reserved inputs.pipeline key. The pinned
spec rides the existing bundle channel (PANGOLIN_BUNDLE_REFS_JSON), is
integrity-verified against its content hash, and is re-validated by the
worker before running — a parse or validation failure routes through the
established integrity-failed path, like any malformed bundle. Declared
pipelines additionally write per-block evidence into the output sentinel as
blocks[] (kind, ordinal, status, exit code, duration per block); the implicit
default writes the legacy sentinel unchanged. The pipeline ref itself is sealed
into the dispatch manifest (pipelineRef), so “this exact pipeline ran” —
every block, command, and lens — is provable from the audit bundle.
Model selection and cost evidence
Section titled “Model selection and cost evidence”Requested model in the dispatch manifest
Section titled “Requested model in the dispatch manifest”Before firing the worker container, the orchestrator resolves which model was
requested and seals it into the dispatch manifest (executorManifest.model.id).
The precedence chain (highest to lowest):
- The subagent definition’s stored
modelfield (read at fire time from the subagent blob viaresolveLatest). - The
DispatchExecutor’s configureddefaultModel. - Unset —
executorManifest.model.idis sealed as''.
An empty string is treated the same as unset at every step. The resolution is
best-effort: if the subagent blob is unresolvable or unparseable, the executor
falls back to defaultModel (which may be undefined) and the fire proceeds.
The PANGOLIN_MODEL env var is only injected into the worker when the resolved
model is a non-empty string.
For the level vocabulary (fast / standard / max and pass-through
provider-native ids), see the
model field and level vocabulary table.
Actual usage in the output sentinel
Section titled “Actual usage in the output sentinel”After the runtime adapter exits, the worker seals an OutputSentinel to
.pangolin/output.json (and uploads it to the per-dispatch dispatch-record URI).
The sentinel optionally carries an additive usage block:
interface RuntimeUsage { models: string[]; // actual model ids that served the invocation costUsd?: number; turns?: number; durationMs?: number; // MODEL time as reported by the runtime CLI}usage is best-effort: it is absent whenever the runtime’s output cannot
be parsed. Its absence leaves the sentinel byte-identical to the pre-usage
shape; when present it is additive and backward-compatible — older readers
ignore the extra key.
Three durations distinguished
Section titled “Three durations distinguished”Three different durationMs values appear in the response chain; they measure
different things:
| Field | What it measures |
|---|---|
usage.durationMs (in the sentinel) | Model time as reported by the runtime CLI for that invocation. |
DispatchResult.durationMs | Worker wall time (container start to result collection). |
BlockOutcome.durationMs (declared pipelines only) | Wall time for a single block within the pipeline. |
usage is a capture-only boundary: it is sealed into the sentinel and
readable from the dispatch record, but it is not forwarded into
ExecutionResult (the orchestrator’s normalized result type). Use the
dispatch record directly to read cost evidence after a run.
Self-verify (optional)
Section titled “Self-verify (optional)”If the subagent declares a verify.command (via
client.subagent.register({ verify: { command, timeout } })), the worker runs
that command over the agent’s edit after the agent finishes and before the
workspace is sealed, and records { passed, report, durationMs } into the
output sentinel. It is surfaced on the dispatch result and on the orchestrator’s
status / watch for that item.
It is report-only: a failed verify does not change the dispatch outcome
(no dispatch.failed) — it is evidence so an operator reads green/red without
re-running by hand. The patch is captured before verify runs, so the verify
command’s build artifacts (node_modules, dist/, …) never pollute the sealed
patch. Registered secrets are redacted from the captured report.
The command is language-agnostic — whatever shell string the subagent declares
(npm test, dotnet test, cargo test, pytest, …), run in the workspace. Its
toolchain must be present in the worker image or installed by pangolin-setup.sh.
The worker emits a verify.ran event:
{"kind":"verify.ran","dispatchId":"...","passed":true,"durationMs":548}The 6 lifecycle events (closed vocabulary)
Section titled “The 6 lifecycle events (closed vocabulary)”| Event | Meaning | Worker exit code |
|---|---|---|
dispatch.accepted | Orchestrator validated names + resolved refs; worker has not started yet | n/a |
dispatch.started | Worker container booted, runtime adapter loaded, ready to overlay | n/a |
dispatch.finished | Adapter exited 0, no needs_input sentinel | 0 |
dispatch.needs_input | Adapter wrote a valid needs_input sentinel; orchestrator should re-dispatch with the answer | 0 |
dispatch.failed | Anything else — see failure reasons below | non-zero |
dispatch.cancelled | pangolin dispatch cancel <id> was honored mid-flight | n/a |
The vocabulary is intentionally closed. Future kinds would require an ADR amendment (see ADR-0004 — lifecycle vocabulary closed at six).
Ordered across the worker’s steps, the six events and the four
dispatch.failed reason branch points look like this:
stateDiagram-v2
[*] --> dispatch_accepted: orchestrator validated names + resolved refs
dispatch_accepted --> dispatch_started: step 5 — worker booted, adapter loaded
dispatch_started --> dispatch_finished: step 14 — adapter exit 0, no sentinel
dispatch_started --> dispatch_needs_input: step 13 — valid needs_input sentinel
dispatch_started --> dispatch_cancelled: cancelled by caller
dispatch_started --> dispatch_failed: reason → (below)
state dispatch_failed {
[*] --> integrity_failed: step 3 — bundle sha256 mismatch / overlay
[*] --> fetch_failed: step 4 / 7 — secret ref resolution failed
[*] --> worker_failed: step 1b/2/9/13 — storage/adapter/setup/sentinel
[*] --> provider_failed: step 11 — runtime adapter exited non-zero, no sentinel
}
dispatch_finished --> [*]
dispatch_needs_input --> [*]
dispatch_cancelled --> [*]
dispatch_failed --> [*]
The diagram follows the code (packages/pangolin-worker/src/entrypoint.ts):
fetch-failed covers both the step-4 callback-HMAC-key resolution and the
step-7 env-bundle secret resolution, and worker-failed is the catch-all for
several infra steps (storage construction 1b, adapter load 2, setup-script 9,
and a malformed/oversized needs_input sentinel 13) — the single-step mappings in
the table above are the most common case for each reason, not the only one.
What dispatch.failed.reason means
Section titled “What dispatch.failed.reason means”| Reason | Maps to | What it means |
|---|---|---|
integrity-failed | Step 3 | A bundle’s actual sha256 didn’t match its declared contentHash. Storage tampering or a backend bug. |
fetch-failed | Step 7 | A secret reference couldn’t be resolved (typo, missing IAM, AWS outage). |
worker-failed | Step 9 / 13 | pangolin-setup.sh exited non-zero or timed out; OR the needs_input sentinel was malformed (unparseable JSON, missing question, >1 MiB serialized). |
provider-failed | Step 11 | Runtime adapter (claude binary) exited non-zero with no sentinel. Most common cause in dev: missing ANTHROPIC_API_KEY. |
Each terminal event includes durationMs measured from worker start
(runWorker entry), which slightly precedes dispatch.started.
Reading worker stdout
Section titled “Reading worker stdout”The worker emits one JSON object per line. Typical successful dispatch:
{"kind":"worker.boot","dispatchId":"..."}{"kind":"setup-script.ran","exitCode":0,"durationMs":17,"stdout":"hello\n","stderr":""}{"kind":"runtime.adapter.ran","exitCode":0,"durationMs":23248,"stdout":"<agent output>","stderr":""}{"kind":"dispatch.finished","dispatchId":"...","exitCode":0}Event field semantics:
runtime.adapter.rancarries the runtime adapter’s captured stdout/stderr/exitCode/durationMs. For the Claude Code adapter,stdoutis whateverclaude --printwrote — the agent’s final text response (tool invocations and their results don’t appear in--printoutput; only the final synthesized reply does). This is the primary signal for “what did the agent actually do/say.” Symmetric in shape withsetup-script.ran.
Notable absences:
- No
setup-script.ranevent when there’s nopangolin-setup.sh. Absent is the success state; the worker just moves to step 10. runtime.adapter.ranis only emitted when the adapter returns (whether with exit 0 or non-zero). Ifadapter.invoke()THROWS — e.g., the binary is missing or the spawn fails — the dispatch goes straight todispatch.failedwithreason: 'worker-failed'and noruntime.adapter.ranevent is emitted.
Claude Code permission modes
Section titled “Claude Code permission modes”The Claude Code runtime adapter reads PANGOLIN_CLAUDE_PERMISSION_MODE from
the dispatch’s merged env to decide whether to pass
--dangerously-skip-permissions to the spawned claude --print:
| Mode | Behavior | Use case |
|---|---|---|
bypass (default) | Flag passed. Claude’s interactive tool-call gate is disabled. | Production default — the worker container IS the sandbox; there is no human inside to approve tool calls. Without this, every Bash/Edit/Write the agent attempts is silently denied. |
strict | Flag NOT passed. Claude’s default gate applies. With no approver, all tool calls are denied. | Read-only / analytical dispatches that should produce text but make no filesystem or process changes. |
Unrecognized values fall back to bypass with a console.warn so a typo
never silently leaves dispatches paralysed.
A scoped mode (an allow-list in .claude/settings.json plus the
needs-input helper teaching “denied → write sentinel”) is tracked as a
follow-up; not shipped today.
Where stdout / stderr end up in the result
Section titled “Where stdout / stderr end up in the result”The dispatch result JSON returned by pangolin dispatch run has both:
{ "stdout": "<the structured worker event stream>", "stderr": "<unstructured stderr — node warnings, adapter complaints>", "exitCode": 0, "durationMs": 14149, "resolved": { "subagent": {}, "capabilities": [], "env": [] }}The resolved block is the audit trail: exactly which contentHash of
each artifact actually ran. It’s what pangolin dispatch describe <id>
returns later.
Common diagnostic patterns
Section titled “Common diagnostic patterns”“exit 0 but I don’t see my work happening.” The adapter ran cleanly
but its output isn’t structured. Use a ResultSink to capture, or write
a setup script that produces visible diagnostics (ls, cat, etc.) —
its stdout DOES show up in the setup-script.ran event.
“provider-failed with runtime exited with code 1.” Almost always
missing ANTHROPIC_API_KEY in the dispatch’s env. Check
result.resolved.env for the env bundle that ran, then confirm the bundle
includes the key (pangolin env get <name> shows the ref; the actual values
require pangolin env get upgrades or a manual storage inspection).
“setup-script.ran shows only one of my N skills installed.” Multiple
capabilities each shipped a pangolin-setup.sh. Only one wins
(last-write-wins on the filename). See
Worker file layout — files at adapter-
reserved paths (.claude/skills/<name>/) compose; setup scripts don’t.
“runtime.adapter.ran stdout says ‘git commands are being denied’ / ‘requires approval’.”
You’re hitting Claude Code’s interactive permission gate inside a worker
with no human to approve. Either you’ve set
PANGOLIN_CLAUDE_PERMISSION_MODE=strict deliberately, or your worker image
predates the bypass-by-default change. Fix: leave the env var unset (or
set it to bypass) and rebuild the worker image if you’re running an old
one.
“dispatch.failed integrity-failed.” Something is corrupting storage.
For local FS storage, check disk space + permissions on the rootDir.
For S3, check that nothing else is writing to the same prefix.
See also
Section titled “See also”- ADR-0004 — why the lifecycle vocabulary is closed at six kinds.
- ADR-0008, ADR-0009 — the needs_input convention.
- MVP spec §6.2 (the 14-step lifecycle), §6.3 (overlay/merge), §5.7 (lifecycle event types).