ADR-008: Council execution paths share one budget helper
Status: Accepted.
Pairs with: ADR-007
(the per-turn and synthesis budget this helper enforces).
Source commit: dcfb0df (2026-05-11, B1).
Context
The council framework has two execution paths:
- CouncilOrchestrator: runs over NATS via the heddle.goals.incoming actor. The path the framework's own examples use.
- CouncilRunner: runs in-process. The path used by the CLI (heddle council run), the MCP council.run tool (src/heddle/mcp/council_bridge.py), and tournament harnesses in contrib/council/tournament.py.
Commit 436ab2a (ADR-007) enforced synthesis_timeout_seconds and
the 5s per-turn floor in CouncilOrchestrator only.
CouncilRunner had no asyncio.wait_for on either
backend.complete or _synthesize. A wedged local backend wedged
the CLI, MCP, and tournament paths indefinitely — every consumer
except the NATS one.
The review (REPOSITORY_REVIEW_2026-05-10.md §3.2, §8.2) flagged
this as a high-impact correctness gap and recommended unifying the
two paths rather than duplicating the wrapping logic.
The decision was how to share the timeout-enforcement code without having one path quietly drift again.
Decision
Both execution paths route their backend.complete,
bridge.send_turn, and _synthesize calls through a shared helper
in src/heddle/contrib/council/_budget.py:
- CouncilTimeoutError(TimeoutError) carries label ("agent:X", "synthesis", etc.) and timeout_seconds attributes for attribution. Subclassing the builtin TimeoutError means existing except TimeoutError: blocks still catch it without callsite changes.
- call_with_budget(coro, *, timeout_seconds, label) wraps the coroutine in asyncio.wait_for and raises CouncilTimeoutError on expiry.
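The two pieces described above could look roughly like the following. This is a minimal sketch, not the contents of _budget.py: the constructor signature, the error message wording, and the demo coroutine are assumptions. Only the names CouncilTimeoutError, call_with_budget, label, and timeout_seconds come from this ADR.

```python
import asyncio


class CouncilTimeoutError(TimeoutError):
    """Timeout that records which call expired and the budget it had."""

    def __init__(self, label: str, timeout_seconds: float) -> None:
        # Message wording is an assumption made for this sketch.
        super().__init__(f"{label} exceeded its {timeout_seconds}s budget")
        self.label = label
        self.timeout_seconds = timeout_seconds


async def call_with_budget(coro, *, timeout_seconds: float, label: str):
    """Await coro under a deadline; raise CouncilTimeoutError on expiry."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        # Re-raise with attribution so callers know what was being budgeted.
        raise CouncilTimeoutError(label, timeout_seconds) from exc


async def _wedged_backend() -> str:
    # Hypothetical stand-in for a backend that never answers in time.
    await asyncio.sleep(3600)
    return "unreachable"


def demo():
    """Run a wedged call under a small budget and return its attribution."""
    try:
        asyncio.run(
            call_with_budget(
                _wedged_backend(), timeout_seconds=0.05, label="agent:critic"
            )
        )
    except TimeoutError as exc:  # the builtin still catches the subclass
        return exc.label, exc.timeout_seconds
    raise AssertionError("expected a timeout")
```

Catching the builtin TimeoutError in demo() is the point of the subclassing choice: a caller that predates the helper keeps working, and one that knows about it can read label and timeout_seconds.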
CouncilRunner._execute_agent_turn / _execute_via_bridge and
CouncilOrchestrator.handle_message all route through the helper,
passing cfg.per_turn_timeout() for deliberation turns or
cfg.synthesis_timeout_seconds for synthesis. A timed-out turn
records [Timeout: <agent> did not respond within Ns] in the
transcript; a timed-out synthesis records [Synthesis timed out
after Ns]. The transcript entries are deliberately identical across
both paths.
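A call site following this pattern might look like the sketch below. run_turn, run_synthesis, and the list-based transcript are hypothetical stand-ins for the real runner and orchestrator methods, and the way N is formatted in the marker is an assumption. The helper is repeated in compact form only so the block runs on its own.

```python
import asyncio


class CouncilTimeoutError(TimeoutError):
    def __init__(self, label, timeout_seconds):
        super().__init__(f"{label} exceeded its {timeout_seconds}s budget")
        self.label = label
        self.timeout_seconds = timeout_seconds


async def call_with_budget(coro, *, timeout_seconds, label):
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        raise CouncilTimeoutError(label, timeout_seconds) from exc


async def run_turn(agent, complete, transcript, *, timeout_seconds):
    """Hypothetical deliberation turn: record the reply or a timeout marker."""
    try:
        entry = await call_with_budget(
            complete(), timeout_seconds=timeout_seconds, label=f"agent:{agent}"
        )
    except CouncilTimeoutError as exc:
        entry = f"[Timeout: {agent} did not respond within {exc.timeout_seconds:g}s]"
    transcript.append(entry)


async def run_synthesis(synthesize, transcript, *, timeout_seconds):
    """Hypothetical synthesis step using the same helper and marker style."""
    try:
        entry = await call_with_budget(
            synthesize(), timeout_seconds=timeout_seconds, label="synthesis"
        )
    except CouncilTimeoutError as exc:
        entry = f"[Synthesis timed out after {exc.timeout_seconds:g}s]"
    transcript.append(entry)


async def _answers():
    return "Proposal looks sound."


async def _wedged():
    # Stand-in for a backend that never responds within the budget.
    await asyncio.sleep(3600)


async def main():
    transcript = []
    await run_turn("proposer", _answers, transcript, timeout_seconds=5)
    await run_turn("critic", _wedged, transcript, timeout_seconds=0.05)
    await run_synthesis(_wedged, transcript, timeout_seconds=0.1)
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Because both functions go through the same helper, a change to how expiry is handled would only need to be made once, with the call sites deciding nothing beyond the label and the budget.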
See:
- src/heddle/contrib/council/_budget.py (the helper).
- src/heddle/contrib/council/runner.py:204,288,373 (runner sites).
- src/heddle/contrib/council/orchestrator.py:165,210 (orchestrator sites).
Alternatives considered
Duplicate asyncio.wait_for inline in both paths (rejected)
Copy the orchestrator's asyncio.wait_for(...) wrappers into
CouncilRunner without extracting a helper.
- Rejected because the timeout-attribution shape — building the transcript entry, structured-log key, and exception type consistently — wants to be one piece of code. Two copies diverge on the first cosmetic edit and the second consumer of the budget logic forgets the new shape.
- Reviewer's note: "two paths, one budget" is the design goal; two paths and two budgets would just create the second drift opportunity.
One concrete class with both code paths as methods (rejected)
Collapse CouncilOrchestrator and CouncilRunner into a single
class with two entry points (a run_over_nats and a run_in_process
method on the same class).
- Rejected because the two paths have genuinely different lifecycles: the orchestrator is a NATS subscriber actor with BaseActor's _wait_next_message loop; the runner is a one-shot async function with no subscription. Merging them conflates the message-loop responsibility with the deliberation-loop responsibility.
- The shared piece is the budget enforcement, not the loop shape. Extract only what's shared; leave the rest separated.
Decorator-based timeout (rejected)
Wrap _execute_agent_turn and _synthesize with a decorator that
reads timeout_seconds from self.config.
- Rejected because the timeout label varies per call ("agent:critic", "agent:proposer", "synthesis") and a decorator that closes over self can't see it without introspection. The helper takes the label explicitly, which keeps the call site honest about what's being budgeted.
Consequences
Enables:
- A wedged provider in any of CLI, MCP, tournament, or NATS paths surfaces as a CouncilTimeoutError with attribution rather than a hung process.
- The two paths can evolve budget semantics together: a future change to retry-on-timeout or partial-transcript-recovery lands in _budget.py once, not in two places.
- Subclassing TimeoutError means downstream code that catches TimeoutError (the MCP bridge's error reporting, tournament harnesses' result aggregation) continues to work without import changes.
Costs:
- _execute_agent_turn and _execute_via_bridge now require a timeout_seconds argument. Any future consumer that subclasses CouncilRunner and overrides these must pass it. No callers outside the module exist today; the cost is hypothetical but real.
- The label string is informal: "agent:<name>" and "synthesis" are conventions, not enum values. A future consumer that wants structured attribution would need to parse the string or accept the convention.
- One more module to read when investigating a council timeout (_budget.py). Mitigated by the cross-references from both call sites.
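If a consumer does need structured attribution, the string convention is simple enough to split at the boundary. The sketch below assumes only the two label shapes named in this ADR ("agent:<name>" and "synthesis"); parse_label and BudgetLabel are hypothetical names and not part of _budget.py.

```python
from typing import NamedTuple, Optional


class BudgetLabel(NamedTuple):
    kind: str                # "agent", "synthesis", or the raw label
    agent: Optional[str]     # agent name when kind == "agent", else None


def parse_label(label: str) -> BudgetLabel:
    """Split an informal budget label into a (kind, agent) pair."""
    kind, sep, name = label.partition(":")
    if kind == "agent" and sep and name:
        return BudgetLabel("agent", name)
    # Labels without an agent name ("synthesis" or anything unrecognized)
    # pass through unchanged as the kind.
    return BudgetLabel(label, None)
```

For example, parse_label("agent:critic") yields kind "agent" with agent "critic", while parse_label("synthesis") yields kind "synthesis" with no agent. Unrecognized labels are passed through instead of rejected, which suits a convention that is not enforced anywhere.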