Skip to content

Multi-Agent Deliberation with Councils

Councils let multiple AI agents discuss a topic iteratively, each with their own context, role, and perspective. Instead of one LLM processing everything, a council runs structured multi-round debates where agents react to each other's positions until they converge or hit a round limit.

When to use a council vs. a pipeline

Use case Approach
Steps A, B, C run in order, each feeding the next Pipeline
Multiple experts each weigh in, then a facilitator synthesizes Council
Blind review where reviewers shouldn't see each other's work Pipeline with parallel stages
Iterative refinement through debate and rebuttal Council
Fixed, repeatable data transformation Pipeline
Open-ended analysis where the best answer emerges from discussion Council

Councils are designed for tasks where the quality of the answer improves through iteration — architecture reviews, policy analysis, risk assessment, adversarial stress-testing, consensus-building.


Quick start

1. Define agents in a council config

# configs/councils/architecture_review.yaml
name: "architecture_review"
protocol: "round_robin"
max_rounds: 3
timeout_seconds: 300
# Budget reserved for the facilitator's synthesis call.  Subtracted
# from timeout_seconds to compute the per-turn budget:
#   per_turn = (timeout_seconds - synthesis_timeout_seconds) / (max_rounds * len(agents))
# The implied per-turn budget must be >= 5s or the config is rejected
# at load time.  Default: 60s.
synthesis_timeout_seconds: 60

convergence:
  method: "llm_judge"
  threshold: 0.8
  backend_tier: "standard"

agents:
  - name: "architect"
    worker_type: "reviewer"
    tier: "standard"
    role: "Senior software architect  focus on system design and scalability"
    sees_transcript_from: ["all"]

  - name: "security_expert"
    worker_type: "reviewer"
    tier: "standard"
    role: "Security specialist  focus on threat modeling and attack surface"
    sees_transcript_from: ["all"]

  - name: "critic"
    worker_type: "reviewer"
    tier: "frontier"
    role: "Devil's advocate  stress-test all proposals, find weaknesses"
    sees_transcript_from: ["architect", "security_expert"]

facilitator:
  tier: "standard"
  synthesis_prompt: |
    Synthesize the team's discussion into recommendations.
    Highlight agreements, unresolved tensions, and action items.
  convergence_prompt: |
    Rate agreement among participants from 0.0 to 1.0.
    Respond with JSON: {"score": 0.X, "reason": "..."}

2. Run without infrastructure (CouncilRunner)

from heddle.worker.backends import build_backends_from_env
from heddle.contrib.council.config import load_council_config
from heddle.contrib.council.runner import CouncilRunner

config = load_council_config("configs/councils/architecture_review.yaml")
runner = CouncilRunner(build_backends_from_env())

result = await runner.run("Should we migrate to microservices?", config=config)

print(result.synthesis)
print(f"Rounds: {result.rounds_completed}, Converged: {result.converged}")

3. Run via MCP tools

Add to your MCP gateway config:

tools:
  council:
    configs_dir: "configs/councils"
    enable: [start, status, transcript, intervene, stop]

Then from Claude Desktop or any MCP client:

  • council.start — start a discussion
  • council.status — check progress
  • council.transcript — read the full discussion
  • council.intervene — inject a human message mid-discussion
  • council.stop — stop early and synthesize

Core concepts

Agents

Each agent has:

  • name — unique identifier within the council
  • worker_type — which Heddle worker config to use (or bridge for external LLMs)
  • tier — which model to use (local, standard, frontier)
  • role — system-prompt-level instructions defining the agent's perspective
  • sees_transcript_from — visibility filter (which other agents' contributions this agent can see)
  • max_tokens_per_turn — token budget per response

Protocols

Protocols define who speaks when and what they see:

Protocol Behavior
round_robin All agents speak every round in config order
structured_debate Phase 1: opening statements. Phase 2+: rebuttals. Final: closing
delphi Anonymized positions (agents see "Participant A", not real names). Convergence score fed back each round to reduce anchoring bias

Convergence detection

Controls when to stop the discussion:

Method How it works
none Run all max_rounds, never stop early
position_stability Compare each agent's position across rounds using text similarity. Stop when average similarity exceeds threshold
llm_judge Ask an LLM to rate agreement 0-1 after each round. Stop when score exceeds threshold

Transcript management

The TranscriptStore maintains the full discussion history with:

  • Per-agent visibility filtering — agents only see what their sees_transcript_from config allows
  • Token-budget truncation — when the transcript exceeds the budget, the oldest entries are dropped (preserving recent context)
  • Convergence scores attached to each round

Audience participation

External participants can inject messages into a running council discussion. Agents see these as a separate [AUDIENCE REACTIONS] block and may choose to engage or ignore them.

Key components:

  • TranscriptEntry.entry_type"turn" (default, panelist) or "interjection" (audience). Backward compatible: existing code that omits entry_type gets "turn".
  • TranscriptStore.inject_interjection(agent_name, content, role) — add an audience contribution to the current round. Thread-safe.
  • CouncilRunner.inject(agent_name, content, role) — inject a spectator interjection while run() is executing. Safe to call from another thread or coroutine.
  • MCP council.intervene action — set as_spectator: true to tag the message as an interjection instead of a panelist turn.

Example — interactive Town Hall Debate:

python examples/town-hall/run.py \
    configs/councils/town_hall_debate.yaml \
    --topic "Remote work is better than office work" \
    --interactive

Type messages while the debate runs; your input appears in the next agent's context under [AUDIENCE REACTIONS].

Post-hoc scoring

Once a council finishes, two scorers can grade the result. Both implement the Scorer ABC; both return a ScoringResult.

Scorer Use case
JudgePanelScorer Adversarial / two-side debates. Each judge picks one winner; the panel aggregates by majority vote (ties → draws).
RubricScorer Independent per-participant evaluation (Q&A panels, blind taste tests). Judges grade every participant on every rubric dimension. Anonymizes the transcript at score time so judges grade content, not branding; the alias map (agent_name → "Participant A/B/C") lands in metadata for the caller's reveal step.

TournamentRunner schedules round-robin matchups across many models / topics, dispatches each through the configured scorer, and aggregates a leaderboard plus a head-to-head matchup matrix. Both the debate-arena and blind-taste-test examples are full working demonstrations.


ChatBridge — external LLM adapters

Not every council participant needs to be a standard Heddle worker. ChatBridge adapters let you bring in external LLM providers or human participants as full council members.

CouncilRunner dispatches each agent through its configured bridge when agent.bridge is set (cached per agent name across rounds, so session history is preserved). When agent.bridge is unset, the runner falls back to the tier-based LLMBackend path. Mixing both in one council is fine — agents are routed independently.

Available adapters

Adapter Provider Key feature
AnthropicChatBridge Claude API Session-aware, messages accumulate
OpenAIChatBridge OpenAI / ChatGPT GPT-4o, GPT-4, etc.; rescues thinking-model output from reasoning_content
OllamaChatBridge Ollama (local) Local models with conversation history
LMStudioChatBridge LM Studio OpenAI-compatible /v1 server (MLX or llama.cpp); subclass of OpenAIChatBridge
ManualChatBridge Human Callback or queue-based, with timeout

Using a ChatBridge agent in a council

agents:
  - name: "gpt_perspective"
    bridge: "heddle.contrib.chatbridge.openai.OpenAIChatBridge"
    bridge_config:
      model: "gpt-4o"
      api_key_env: "OPENAI_API_KEY"
    tier: "standard"  # ignored when ``bridge`` is set
    role: "External perspective  challenge assumptions from a different model's viewpoint"
    max_tokens_per_turn: 1500   # propagated to the bridge unless bridge_config overrides

For programmatic construction, chatbridge_spec(model_name) from heddle.contrib.chatbridge.discover returns (dotted_path, kwargs) ready to drop into an AgentConfig — the debate-arena and blind-taste-test examples use it to wire one bridge per debater from a CLI --models list.

Human-in-the-loop

from heddle.contrib.chatbridge.manual import ManualChatBridge

async def ask_human(message, context, session_id):
    print(f"\n--- Council asks you ({session_id}) ---")
    print(message[:500])
    return input("Your response: ")

bridge = ManualChatBridge(on_prompt=ask_human, timeout_seconds=300)

ChatBridge as a standard Heddle worker

Any ChatBridge can be wrapped as a ProcessingBackend for use in regular pipelines (not just councils):

name: "gpt4_processor"
processing_backend: "heddle.contrib.chatbridge.worker.ChatBridgeBackend"
processing_config:
  bridge_class: "heddle.contrib.chatbridge.openai.OpenAIChatBridge"
  model: "gpt-4o"
  api_key_env: "OPENAI_API_KEY"

Design patterns

Pattern 1: Architecture review council

Three agents with different expertise, full visibility, critic runs on a stronger model:

agents:
  - name: "architect"
    worker_type: "reviewer"
    tier: "standard"
    role: "System design and scalability"
    sees_transcript_from: ["all"]
  - name: "security"
    worker_type: "reviewer"
    tier: "standard"
    role: "Security and threat modeling"
    sees_transcript_from: ["all"]
  - name: "critic"
    worker_type: "reviewer"
    tier: "frontier"
    role: "Find weaknesses in every proposal"
    sees_transcript_from: ["architect", "security"]

Pattern 2: Delphi consensus

Anonymous positions to reduce anchoring bias, with convergence feedback:

protocol: "delphi"
convergence:
  method: "llm_judge"
  threshold: 0.85
agents:
  - name: "expert_a"
    worker_type: "analyst"
    tier: "standard"
    role: "Domain expert A"
    sees_transcript_from: ["all"]
  - name: "expert_b"
    worker_type: "analyst"
    tier: "standard"
    role: "Domain expert B"
    sees_transcript_from: ["all"]

Agents see "Participant A", "Participant B" instead of real names.

Pattern 3: Mixed-vendor deliberation

Use different LLM providers for diversity of perspective:

agents:
  - name: "claude_analyst"
    worker_type: "analyst"
    tier: "standard"
    role: "Analytical perspective (Claude)"
  - name: "gpt_analyst"
    bridge: "heddle.contrib.chatbridge.openai.OpenAIChatBridge"
    bridge_config:
      model: "gpt-4o"
      api_key_env: "OPENAI_API_KEY"
    role: "Alternative perspective (GPT-4)"
  - name: "local_analyst"
    bridge: "heddle.contrib.chatbridge.ollama.OllamaChatBridge"
    bridge_config:
      model: "llama3.2:3b"
    role: "Efficiency-focused perspective (local model)"

Installation

pip install heddle-ai[council]              # Council framework (no new deps)
pip install heddle-ai[chatbridge]           # ChatBridge adapters (adds openai)
pip install heddle-ai[council,chatbridge]   # Both

Or from source:

uv sync --extra council --extra chatbridge

API reference

See the Contrib API reference for class-level documentation of CouncilRunner, CouncilConfig, ChatBridge, and all adapter classes.