ADR-007: Council synthesis budget and 5s per-turn floor
Status: Accepted.
Pairs with: council-howto — operator-facing
formula and tuning guidance.
Source commits: 436ab2a (2026-05-10, K3 — synthesis budget +
floor); dcfb0df (2026-05-11, B1 — apply the same budget to
CouncilRunner, see ADR-008).
Context
The council orchestrator runs a multi-agent deliberation:
max_rounds * len(agents) per-turn LLM calls, followed by a
facilitator synthesis call that produces the final position. The
original shape had two failure modes:
- Unbounded synthesis. Synthesis ran with no asyncio.wait_for wrapper. A wedged frontier-tier provider (token streaming stalled, TCP reset never reaching the client, rate-limit backoff misbehaving) would hang the entire council indefinitely. Operators saw a stuck goal that the recover-stuck-orchestrator-goal runbook had to clear by hand.
- Silently-shrinking per-turn budget. cfg.timeout_seconds was divided evenly across max_rounds * len(agents) per-turn slots. At 60s / 4 rounds / 3 agents the implied per-turn was 5s, and at 90s / 6 rounds / 4 agents it dropped to 3.75s, below the first-token cold-start latency of frontier providers. Configs accepted at load time produced silent per-turn timeouts at runtime.
Two related decisions: how much of the total council budget should be carved out for synthesis, and what minimum per-turn budget should the framework refuse to accept?
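The shrinkage in the second failure mode is plain division, so it can be checked in a few lines. This is a minimal sketch of the pre-ADR arithmetic as described above; the function name `legacy_per_turn` is hypothetical and not part of the codebase.

```python
def legacy_per_turn(timeout_seconds: float, max_rounds: int, n_agents: int) -> float:
    """Old shape as described in the text: the whole budget split evenly
    across max_rounds * len(agents) turn slots, with no synthesis carve-out."""
    return timeout_seconds / (max_rounds * n_agents)


# The two configurations quoted in the Context section.
small = legacy_per_turn(60, max_rounds=4, n_agents=3)   # 60 / 12
large = legacy_per_turn(90, max_rounds=6, n_agents=4)   # 90 / 24

print(small)  # 5.0
print(large)  # 3.75
```

Raising the total budget by half while doubling the slot count lowers the per-turn budget, which is why a config that looks more generous can behave worse.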
Decision
CouncilConfig.synthesis_timeout_seconds carves a dedicated
synthesis budget out of timeout_seconds. The remainder is divided
across per-turn slots via the shared CouncilConfig.per_turn_timeout()
helper. At load time, the framework rejects any config whose
implied per-turn budget falls below 5 seconds.
Numerics:
- synthesis_timeout_seconds: int = Field(default=60, ge=1).
- per_turn = (timeout_seconds - synthesis_timeout_seconds) / max(max_rounds * len(agents), 1).
- _PER_TURN_TIMEOUT_FLOOR_SECONDS = 5 (defined as a private module-level constant in contrib/council/config.py:27).
A config that violates the floor raises ValueError from the
Pydantic model_validator with a message that names every input to
the formula:
Implied per-turn timeout is 1.25s
(( 90 - 60 ) / ( 6 rounds * 4 agents )),
below the 5s floor. Raise timeout_seconds, lower
synthesis_timeout_seconds, or reduce max_rounds * agents.
See src/heddle/contrib/council/config.py:81-108 for the validator
and the helper.
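The formula and the load-time check can be sketched as follows. This is an illustration under stated assumptions, not the code in config.py: it uses a stdlib dataclass with `__post_init__` as a stand-in for the Pydantic model and its model_validator, and the class name `CouncilConfigSketch` and constant `PER_TURN_FLOOR_SECONDS` are hypothetical. Field names and the message wording follow the text above.

```python
from dataclasses import dataclass, field

# Stand-in for _PER_TURN_TIMEOUT_FLOOR_SECONDS described in the text.
PER_TURN_FLOOR_SECONDS = 5


@dataclass
class CouncilConfigSketch:
    """Hypothetical stand-in for CouncilConfig (the real model is Pydantic)."""

    timeout_seconds: int
    max_rounds: int
    agents: list = field(default_factory=list)
    synthesis_timeout_seconds: int = 60

    def per_turn_timeout(self) -> float:
        # Remainder after the synthesis carve-out, split across turn slots.
        # max(..., 1) avoids division by zero for an empty council.
        slots = max(self.max_rounds * len(self.agents), 1)
        return (self.timeout_seconds - self.synthesis_timeout_seconds) / slots

    def __post_init__(self) -> None:
        # Runs at construction, mirroring validation at config-load time.
        per_turn = self.per_turn_timeout()
        if per_turn < PER_TURN_FLOOR_SECONDS:
            raise ValueError(
                f"Implied per-turn timeout is {per_turn:.2f}s "
                f"(( {self.timeout_seconds} - {self.synthesis_timeout_seconds} ) / "
                f"( {self.max_rounds} rounds * {len(self.agents)} agents )), "
                f"below the {PER_TURN_FLOOR_SECONDS}s floor. Raise timeout_seconds, "
                "lower synthesis_timeout_seconds, or reduce max_rounds * agents."
            )


# Accepted: (180 - 60) / (4 * 3) = 10s per turn.
ok = CouncilConfigSketch(timeout_seconds=180, max_rounds=4, agents=["a", "b", "c"])
print(ok.per_turn_timeout())  # 10.0

# Rejected: (90 - 60) / (6 * 4) is below the floor.
try:
    CouncilConfigSketch(timeout_seconds=90, max_rounds=6, agents=["a", "b", "c", "d"])
except ValueError as exc:
    print(exc)
```

Keeping the arithmetic in one method means the validator and every runtime caller read the same number, which is the point of exposing per_turn_timeout() as a shared helper.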
Alternatives considered
No synthesis-specific budget (rejected)
Apply the per-turn budget to synthesis as well; remove
synthesis_timeout_seconds entirely.
- Rejected because synthesis and per-turn deliberation have different shapes: synthesis is one long completion against the full transcript, deliberation is many short turns. Forcing them to share a single budget either under-budgets synthesis (the long-context call gets the same 5-20s as a one-line turn) or over-budgets deliberation (the long completion's budget bloats every turn slot).
- A wedged synthesis with no dedicated timeout is the original failure mode — the rejection rationale and the bug are the same.
Higher floor (10s, rejected)
Set the floor at 10s to give frontier providers a comfortable generation budget on top of first-token latency.
- Rejected because legitimate local-tier rapid-fire configs (LM Studio with qwen3:0.6b for adversarial-challenge agents that emit single-line objections) finish per-turn well under 10s. A 10s floor would reject configs that work fine in production for the local tier; the framework would prefer safety over operator autonomy in a way that doesn't match the cost/risk profile.
- A frontier-tier user can raise timeout_seconds to clear the floor; a local-tier user can't lower it.
No floor at all (rejected)
Trust the operator. Accept any positive per-turn budget. Let runtime timeouts surface misconfigurations.
- Rejected because the silent-shrinkage failure mode is hard to recognise — the operator sees "council never produced a final result" and chases the wrong cause (provider outage, synthesizer bug). Surfacing the misconfiguration at config-load time turns a runtime mystery into a one-line error.
- Frontier-tier first-token latency is empirically 1-3s on cold-start. A floor above 5s rejects valid configs, while a floor below it leaves genuinely small budgets to fail silently. 5s is the smallest floor that excludes "must be wrong" without excluding "small but legitimate."
Validate per-agent rather than uniformly (rejected)
Different agents have different model tiers. A council with one frontier agent and three local agents arguably shouldn't apply a uniform 5s floor.
- Rejected because the per-turn timeout is enforced at the framework level, where asyncio.wait_for doesn't know which agent owns the current turn. Per-agent timeouts would require threading the agent context into the budget helper, which the current single-budget shape avoids.
- A future refactor could thread AgentConfig through call_with_budget and apply tier-specific floors. Out of scope for now; the uniform floor is the simpler invariant.
Consequences
Enables:
- Wedged providers cannot hang a council indefinitely: every agent turn and the synthesis are wrapped in call_with_budget (see ADR-008 for the shared helper).
- Misconfigured per-turn budgets surface as a ValueError from the Pydantic validator before the first task dispatches. Operators learn about a bad config at heddle council validate, not at "the council appears to have stopped responding."
- The formula is exposed via CouncilConfig.per_turn_timeout() so the orchestrator, the runner, and any future caller agree on the inputs and the arithmetic.
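To illustrate the first point, here is a minimal sketch of a budget wrapper built on asyncio.wait_for. The real call_with_budget is specified in ADR-008 and its signature is not shown in this document, so `call_with_budget_sketch`, its parameters, and the `wedged_provider` coroutine are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def call_with_budget_sketch(call: Awaitable[T], budget_seconds: float) -> T:
    """Hypothetical wrapper: bound a single LLM call by a time budget.

    asyncio.wait_for cancels the wrapped awaitable and raises a timeout
    error once budget_seconds elapse.
    """
    return await asyncio.wait_for(call, timeout=budget_seconds)


async def wedged_provider() -> str:
    # Simulates a provider whose stream has stalled.
    await asyncio.sleep(3600)
    return "never reached"


async def main() -> str:
    try:
        # A tiny budget keeps the demo fast; real budgets come from the config.
        return await call_with_budget_sketch(wedged_provider(), budget_seconds=0.05)
    except asyncio.TimeoutError:
        return "timed out"


result = asyncio.run(main())
print(result)  # timed out
```

The same wrapper shape would apply to both per-turn calls and the synthesis call, with only the budget differing.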
Costs:
- Configs that pre-date this validation may be rejected at load time without code changes. The error message lists every input so the fix is mechanical, but operators upgrading from older versions may need to raise timeout_seconds or lower max_rounds * agents.
- Synthesis with synthesis_timeout_seconds=60 times out on a cold-start frontier provider that takes longer than a minute to assemble the synthesis. Operators with very long transcripts may need to raise the synthesis budget separately from the total. The two fields are independent on purpose, but that independence is a tuning surface to learn.
- The floor is a magic number. Future contributors who try to lower it for "rejecting valid configs" need to find this ADR and the empirical claim about first-token latency rather than edit the constant blind.
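For operators fixing a rejected config, the per-turn formula can be inverted to find the smallest total budget, or the largest number of turn slots, that clears the floor. The helpers below are a hypothetical convenience sketch derived from that formula; neither function exists in the framework as far as this document states.

```python
def min_timeout_seconds(
    max_rounds: int,
    n_agents: int,
    synthesis_timeout_seconds: int = 60,
    floor_seconds: int = 5,
) -> int:
    """Smallest timeout_seconds for which the implied per-turn budget
    reaches the floor: synthesis + floor * slots."""
    slots = max(max_rounds * n_agents, 1)
    return synthesis_timeout_seconds + floor_seconds * slots


def max_turn_slots(
    timeout_seconds: int,
    synthesis_timeout_seconds: int = 60,
    floor_seconds: int = 5,
) -> int:
    """Largest max_rounds * len(agents) that still clears the floor."""
    remainder = max(timeout_seconds - synthesis_timeout_seconds, 0)
    return remainder // floor_seconds


print(min_timeout_seconds(max_rounds=6, n_agents=4))  # 180
print(max_turn_slots(timeout_seconds=90))             # 6
```

Note that raising synthesis_timeout_seconds without raising timeout_seconds shrinks the remainder, so a longer synthesis budget can itself push a config below the floor.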