RFD 0004: Methodology hooks — default built-in + plugin enhancement¶

Summary¶

Every Samuel methodology (auto-mode / ralph, eventually others) is a default built-in workflow with named hook points that plugins can attach to. The framework ships a working methodology end-to-end; plugins enhance, override, or add steps without replacing the whole workflow.

The hook surface is small (13 named hooks) and deterministic: plugins register handlers via their manifest, hooks fire in a defined order, and multiple plugins attaching to one hook compose according to samuel.toml ordering.

This decouples the methodology shape (Samuel's job, baked into the framework) from the methodology content (plugins' job — custom quality checks, extra context generators, alternative agents, translator side-effects).

Problem statement¶

v1's auto-mode is monolithic ([[../../wiki/entities/auto-loop]]). The loop calls PrepareProgressContext, GenerateProjectSnapshot, GenerateTaskContext, then InvokeAgent, then sleeps. Every step is hardcoded. Extending requires forking Samuel.

But the methodology is almost the right shape for plugin extension. v1's pre-computed-context pattern ([[../../wiki/concepts/pre-computed-context]]) is exactly the kind of pluggable thing — a Python plugin could add a python-context.md (virtualenvs, requirements.txt, pip changes); a TDD-strict plugin could add red-green-refactor hooks; a translator plugin could mirror per-iteration logs to a tool-specific location.

v2 needs:

A built-in working methodology. Per [[../../wiki/concepts/methodology-default-plus-plugin]], the Samuel Way ships in the box. A user who installs no methodology plugins gets a coherent default flow.
Named hook points along the methodology's lifecycle. Plugins attach to them.
Deterministic firing order. Multiple plugins on one hook compose predictably, without surprises.
Default behaviors run when no plugin overrides. The framework provides the v1-equivalent logic for every hook.
Plugin failures don't crash the loop. A hook handler that errors logs a warning and the loop continues (configurable per hook, for strict semantics where a failure should abort).

Requirements¶

The ralph methodology (auto-mode) ships built-in. Works without any plugin installed.
Thirteen named hook points cover the lifecycle (before/after loop, before/after iteration, context generators, agent invocation, quality check, etc.).
Plugins register hook handlers via samuel-plugin.toml [provides] hooks = [...].
Hook execution order is deterministic and overridable via samuel.toml [hooks.<name>.order].
Hook handler errors are non-fatal by default; setting strict = true on a hook makes them fatal for must-not-fail cases.
Plugins can read state (samuel.toml config, prd.toon, etc.) and write to a scoped namespace (.samuel/run/plugins/<name>/).

Constraints¶

The hook surface must stay small (≤15 hooks at v2.0). Larger surfaces become hard to document and keep stable.
Hook signatures are stable within a plugin protocol major version ([[0003|RFD 0003]]).
Plugins running per-iteration affect token budget and latency — hook handlers should be fast (target < 50ms each).
Hook ordering across plugins must be predictable from a single config file (samuel.toml).

Background¶

v1's loop (recap)¶

From [[../../wiki/entities/auto-loop]]:

for i = 1..MaxIterations:
    prd = LoadAutoPRD(.claude/auto/prd.json)
    if prd.GetNextTask() == nil: stop

    PrepareProgressContext(projectDir)
    GenerateProjectSnapshot(projectDir)
    GenerateTaskContext(projectDir, prd, false)

    InvokeAgent(cfg)

    sleep PauseSecs

Five steps per iteration, all hardcoded. The agent is invoked once per iteration. Quality checks live inside the agent's prompt: config.quality_checks is listed there, and the agent runs them via Bash.

v1's loop is right at the methodology level but closed at the extension level: there is no way to plug in.

Hook patterns in the ecosystem¶

Git hooks: scripts in .git/hooks/ fire at lifecycle points (pre-commit, post-commit, pre-push). Simple, file-based, no programmatic registration.

npm scripts hooks: pre<script> and post<script> conventions in package.json automatically fire around named scripts. Minimal but useful.

WordPress actions & filters: add_action() / add_filter() with priority numbers. Plugins register handlers; WordPress fires them in priority order. Battle-tested at massive scale; the priority-number scheme is the standard for fine-grained ordering.

Kubernetes admission webhooks: validating + mutating webhooks fire at named admission points. Heavyweight but principled — each webhook is an HTTP endpoint, validation logic, well-defined request/response shapes.

Rails hooks (Rake tasks, before/after callbacks, Active Record callbacks): convention-over-configuration. Define before_save, after_create — Rails calls them automatically. Plugins extend via Concerns.

OpenTelemetry hooks: instrumentation libraries hook into method invocations declaratively. Minimal API; plugins compose well.

Samuel's hooks borrow from WordPress (priority/order), Git (named lifecycle points), and Rails (default-overridable conventions).

What changes from v1¶

The lifecycle stays — the loop's shape is right. What gets added is named lifecycle points where plugins can attach. v1's hardcoded PrepareProgressContext becomes the default handler for hook context.progress; plugins can register their own handlers; both run (or one replaces the other, per ordering rules).

Options considered¶

Option A: Named hook framework, default+plugin enhancement (chosen)¶

The methodology defines named hook points. Plugins declare handlers in their manifest. The framework fires hooks in declared order; default handlers run if no plugin handlers attach (or alongside, per ordering rules).

Hook surface (13 hooks):

  before:loop          → fires once at samuel run start
    before:iteration   → fires before each iteration
      iteration.gate   → returns "implementation" or "discovery" (pilot mode decision)
      context.snapshot → regenerate project-snapshot.toon
      context.progress → regenerate progress-context.md, rotate progress.md
      context.task     → regenerate task-context.toon
      context.extra    → plugin-only slot for additional pre-compute files
      before:agent.invoke
      agent.invoke     → run the agent (default: built-in adapter for configured agent)
      after:agent.invoke
      quality.check    → default: run config.quality_checks; plugins add checks
    after:iteration
  after:loop

Plugin manifest:

[provides]
hooks = ["context.extra", "quality.check", "after:iteration"]

[hook.context_extra]
handler = "generate_python_context"  # WASM export name

[hook.quality_check]
handler = "run_pytest"

[hook.after_iteration]
handler = "notify_slack"

Framework's per-hook config:

# samuel.toml
[hooks."context.extra".order]
samuel-python-plugin = 1     # runs first
samuel-rust-plugin = 2       # runs second

[hooks."quality.check"]
strict = true                # any handler erroring aborts the iteration

Pros:

- Plugin authors learn one mechanism, with the same shape regardless of which hook they target.
- The methodology shape is built in, so plugin authors don't deal with loop logic; only extensions live plugin-side.
- Adding a hook is a framework MINOR bump: purely additive, it doesn't break existing plugins.
- Default handlers preserve v1's behavior. Users with no plugins installed get v1-equivalent functionality.
- Composes well: multiple plugins on one hook run in declared order.

Cons:

- Hook surface is a public API contract: once shipped, it is hard to change without protocol-version bumps.
- Plugin authors can write slow hook handlers that bog down iterations.
- Cross-plugin coordination (plugin B's handler depends on plugin A's output) needs documentation.

Effort: Medium. Hook framework + default handlers in framework binary (~2 weeks). Per-hook tests with fake plugins.

Option B: Plugin-only methodology — no built-in¶

The framework provides only the hook plumbing. Every step is a plugin handler. No defaults.

Pros:

- Minimum framework code. Methodology authors have full control.
- Forces a clean separation.

Cons:

- A user with no plugins installed gets nothing; samuel run is useless until plugins are installed.
- New users have to figure out which plugins together constitute a working methodology.
- v1's flagship feature (auto-mode) becomes "install these 8 plugins to get it back", which is bad UX.
- Conflicts with [[../../wiki/concepts/methodology-default-plus-plugin]]'s explicit decision.

Effort: Lower in framework. Much higher in onboarding cost.

Option C: Built-in only — no extension¶

The methodology is monolithic, like v1. No hooks. No plugin extension.

Pros:

- Simplest framework code.
- Predictable.

Cons:

- Same problem v1 has: extension requires forking Samuel.
- Defeats the "framework + skills hub" thesis. Methodology behavior is the most-customized surface; closing it kills the value prop.

Effort: Lower. But the product loses.

Option D: Strategy pattern with one swap point¶

Define a Methodology interface; plugins implement it whole-cloth. A plugin replaces the entire methodology, not extends it.

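type Methodology interface {
    // RunOptions is a placeholder for loop settings (MaxIterations, PauseSecs, ...).
    Run(ctx context.Context, opts RunOptions) error
}

// ralph (built-in) and tdd-strict (plugin) both implement Methodology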

Pros:

- Clean interface. Easy to reason about.
- Methodology plugins are self-contained.

Cons:

- All-or-nothing. A plugin that wants to add one custom quality check must rewrite the entire loop.
- The common case (small additions to v1's behavior) becomes the hardest case.
- Discourages plugin authoring: the bar to entry is "write a complete methodology."

Effort: Lower in framework. Much higher per plugin.

Option E: Event bus (loose coupling, async)¶

The framework emits events; plugins subscribe; handlers are async. Pub/sub style.

Pros:

- Maximally loose coupling.
- Plugins can subscribe to multiple events.

Cons:

- Async handlers complicate the iteration loop. Did quality.check finish before we proceeded? Hard to tell without explicit barriers.
- The loop is sequential by design; asynchrony doesn't help here.
- Adds queue management to the framework.

Effort: Higher. Wrong fit for a sequential loop.

Decision¶

Adopt Option A: named hook framework with default+plugin enhancement.

The decision rests on three judgments:

The Samuel Way is the product. v1's methodology — Ralph Wiggum loop + pre-computed context + pilot mode — is the differentiator. Shipping it as default behavior anchors v2 in v1's strengths. Plugin extension is value-add; plugin replacement is rare.
WordPress-style hooks scale. WordPress has powered tens of millions of sites for two decades on this exact pattern, and it fits sequential workflows with extension points well. We borrow what works.
Hook surface stays minimal. 13 named hooks at v2.0 launch, each mapping to a clear lifecycle point. Adding a 14th hook is a framework minor bump, purely additive. No need to over-design.

Implementation plan¶

Phase 1 — define the hook surface (PRD 0004, week 1)¶

internal/methodology/hooks/hooks.go:

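package hooks

import (
    "context"
    "time"
)

// HookName is the canonical name of a methodology hook.
type HookName string

const (
    BeforeLoop        HookName = "before:loop"
    AfterLoop         HookName = "after:loop"
    BeforeIteration   HookName = "before:iteration"
    AfterIteration    HookName = "after:iteration"
    IterationGate     HookName = "iteration.gate" // impl vs discovery
    ContextSnapshot   HookName = "context.snapshot"
    ContextProgress   HookName = "context.progress"
    ContextTask       HookName = "context.task"
    ContextExtra      HookName = "context.extra"
    BeforeAgentInvoke HookName = "before:agent.invoke"
    AgentInvoke       HookName = "agent.invoke"
    AfterAgentInvoke  HookName = "after:agent.invoke"
    QualityCheck      HookName = "quality.check"
)

// Hook is a typed handler signature.
type Hook interface {
    Name() HookName
    Run(ctx context.Context, input HookInput) (HookOutput, error)
}

// PromptContext is specified in RFD 0006; this empty placeholder only keeps
// the file compilable on its own.
type PromptContext struct{}

type HookInput struct {
    PromptContext  *PromptContext // shared with prompt templates per RFD 0006
    Iteration      int
    Mode           string     // "implementation" | "discovery"
    PreviousOutput HookOutput // from prior handler in the chain (for chainable hooks)
}

type HookOutput struct {
    Modified bool
    Data     map[string]any // hook-specific payload
}

// registered is one entry in a hook's chain (sketch: field set assumed).
type registered struct {
    plugin  string // owning plugin, or "samuel-default" for built-ins
    order   int    // resolved order (built-ins register at 100)
    handler Hook
}

// HookSettings mirrors one samuel.toml [hooks."<name>"] table (sketch).
type HookSettings struct {
    Order   map[string]int // plugin name → order override
    Strict  bool
    Timeout time.Duration
}

// HookConfig maps each hook to its samuel.toml settings.
type HookConfig map[HookName]HookSettings

// Registry holds the ordered chain of handlers per hook.
type Registry struct {
    // handlers maps hook name → ordered list of (plugin name, handler) pairs
    handlers map[HookName][]registered
    config   HookConfig
}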

Phase 2 — built-in default handlers (PRD 0004, week 1-2)¶

Each hook gets a default handler implemented by the framework, ported from v1:

| Hook | Default behavior (ported from v1) |
|---|---|
| `before:loop` | Load + validate prd.toon |
| `before:iteration` | Acquire iteration sleep lock (PauseSecs) |
| `iteration.gate` | `ShouldRunDiscovery` from v1: gate impl vs discovery |
| `context.snapshot` | Regenerate project-snapshot.toon (file inventory, test gaps, TODOs, git log) |
| `context.progress` | Regenerate progress-context.md, rotate progress.md if > 500 lines |
| `context.task` | Regenerate task-context.toon (impl detail or discovery summary) |
| `context.extra` | No-op (plugin-only slot) |
| `before:agent.invoke` | Compose final prompt from template + samuel.toml |
| `agent.invoke` | Run the configured agent adapter (claude/codex/copilot/gemini/kiro) |
| `after:agent.invoke` | Log iteration timing, persist any output captured |
| `quality.check` | Run `config.quality_checks` commands; fail iteration if any fail |
| `after:iteration` | Reload prd.toon (the agent may have mutated it via CLI subcommands), increment iteration counter |
| `after:loop` | Emit summary, optionally trigger plugin handlers (e.g., notify-slack) |

The framework's built-in handlers are first-class hooks themselves, registered with the registry at startup and fired in declared order. That makes them swappable: a plugin can register a handler at a lower order (e.g., 0) to run before the default, or replace the default by registering at the same order (last-write-wins, with an explicit warning).

Phase 3 — plugin registration (PRD 0003, week 3)¶

Plugin manifest [provides] hooks = [...] lists hook names the plugin attaches to. Per-hook handler binding lives in tier-specific blocks:

For WASM plugins:

[wasm]
module = "plugin.wasm"

[wasm.hook_handlers]
"context.extra"   = "on_context_extra"
"quality.check"   = "on_quality_check"
"after:iteration" = "on_after_iteration"

The named exports (on_context_extra, etc.) get called when the corresponding hook fires.
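One check the plugin loader could run at install time is cross-validating the manifest: every hook in [provides] hooks should name a known hook and have a handler binding, and every binding should correspond to a declared hook. The sketch below assumes the manifest has already been decoded from TOML into a hypothetical Manifest struct; ValidateBindings is an illustrative name, and the same check would apply to an OCI plugin's [oci.hook_handlers] table.

```go
package main

import (
	"fmt"
	"sort"
)

// knownHooks mirrors the Phase 1 hook names.
var knownHooks = map[string]bool{
	"before:loop": true, "after:loop": true,
	"before:iteration": true, "after:iteration": true,
	"iteration.gate":   true,
	"context.snapshot": true, "context.progress": true,
	"context.task": true, "context.extra": true,
	"before:agent.invoke": true, "agent.invoke": true, "after:agent.invoke": true,
	"quality.check": true,
}

// Manifest is a hypothetical, already-decoded view of samuel-plugin.toml.
type Manifest struct {
	ProvidesHooks []string          // [provides] hooks
	HookHandlers  map[string]string // [wasm.hook_handlers]: hook name → export name
}

// ValidateBindings returns one message per problem; an empty result means
// the declared hooks and the handler bindings agree.
func ValidateBindings(m Manifest) []string {
	var problems []string
	declared := make(map[string]bool, len(m.ProvidesHooks))
	for _, h := range m.ProvidesHooks {
		declared[h] = true
		if !knownHooks[h] {
			problems = append(problems, fmt.Sprintf("unknown hook %q", h))
		}
		if _, ok := m.HookHandlers[h]; !ok {
			problems = append(problems, fmt.Sprintf("hook %q has no handler binding", h))
		}
	}
	// Sort bound names so the report is deterministic (map order is random).
	bound := make([]string, 0, len(m.HookHandlers))
	for h := range m.HookHandlers {
		bound = append(bound, h)
	}
	sort.Strings(bound)
	for _, h := range bound {
		if !declared[h] {
			problems = append(problems, fmt.Sprintf("binding for %q is not listed in [provides] hooks", h))
		}
	}
	return problems
}
```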

For OCI plugins:

[oci.hook_handlers]
"agent.invoke"    = "invoke"   # subcommand on the container's entrypoint
"after:iteration" = "post"

Phase 4 — execution order (PRD 0004, week 2)¶

Order resolution:

Built-in default handler registers at framework startup at order 100.
Plugin handlers register at the order given in their manifest (default 200 if unspecified), unless samuel.toml overrides it.
Per-hook execution: walk handlers in ascending order, calling each.
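The three rules above can be sketched as a pure function over one hook's attachments. Attachment and ResolveOrder are hypothetical names; the constants come from this section (built-in at 100, plugins default to 200) and from the samuel.toml override example, which uses "samuel-default" as the built-in's explicit name.

```go
package main

import "sort"

const (
	builtinPlugin = "samuel-default" // explicit built-in name from samuel.toml
	builtinOrder  = 100
	pluginDefault = 200
)

// Attachment is a hypothetical record of one plugin handler on one hook.
// ManifestOrder is nil when the plugin's manifest does not specify an order.
type Attachment struct {
	Plugin        string
	ManifestOrder *int
}

// ResolveOrder returns the firing order for one hook. overrides holds the
// hook's samuel.toml [hooks."<name>".order] table; a nil map means none.
func ResolveOrder(atts []Attachment, overrides map[string]int) []string {
	type entry struct {
		plugin string
		order  int
	}
	entries := []entry{{builtinPlugin, builtinOrder}}
	for _, a := range atts {
		order := pluginDefault
		if a.ManifestOrder != nil {
			order = *a.ManifestOrder
		}
		entries = append(entries, entry{a.Plugin, order})
	}
	// samuel.toml wins over both the built-in and manifest orders.
	for i := range entries {
		if o, ok := overrides[entries[i].plugin]; ok {
			entries[i].order = o
		}
	}
	// Stable sort: equal orders keep registration order here; the
	// same-order replacement rule is applied separately by the registry.
	sort.SliceStable(entries, func(i, j int) bool { return entries[i].order < entries[j].order })
	names := make([]string, len(entries))
	for i, e := range entries {
		names[i] = e.plugin
	}
	return names
}
```

Fed the override table from the samuel.toml example in this phase, the function yields samuel-python-plugin, samuel-default, samuel-rust-plugin.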

For chainable hooks (context.snapshot, context.progress, context.task, context.extra, quality.check), each handler's output is fed as input to the next, so a plugin can augment what the default produced.

For replace-style hooks (agent.invoke, iteration.gate), the last handler to register wins. Most plugins won't replace these.

samuel.toml override:

[hooks."context.snapshot".order]
"samuel-python-plugin" = 50           # runs before built-in default
"samuel-default"       = 100          # built-in (explicit name for override)
"samuel-rust-plugin"   = 150          # runs after built-in

[hooks."quality.check"]
strict = true                          # iteration aborts on any handler error

Phase 5 — error handling (PRD 0004, week 2)¶

Default: hook handler errors logged as structured warnings, loop proceeds. The handler's plugin name + hook name + error message persist to progress.md as a [hooks.warning] entry.

strict = true per-hook flips this: any handler error aborts the iteration (and the whole loop once MaxConsecFails is reached, per [[../../wiki/entities/auto-loop]]).

Default strict = false for all hooks except two:

quality.check: strict by default (a failing quality check should abort)
before:loop: strict (a setup failure should not produce a half-running loop)

after:loop stays explicitly non-strict: cleanup handlers shouldn't block the loop from concluding.

Phase 6 — context propagation (PRD 0004, week 2-3)¶

Hook handlers receive the PromptContext from [[0006|RFD 0006]] — the full set of template variables (Samuel, Project, Methodology, Iteration, Config, Guardrails, Paths, State, Mode, Hooks, Plugins).

Handlers can:

Read any field for decision-making.
Write to Plugins["<plugin-name>"] namespace to contribute data to subsequent hooks and the agent prompt.
Handlers cannot modify Project, Config, or State directly; those flow from framework state, not plugin output.

Plugin-namespaced data is how a python-plugin adds a Python block to the prompt template:

ctx.Plugins["python"] = map[string]any{
  "Venv":         ".venv",
  "Requirements": "requirements.txt",
  "PyVersion":    "3.12",
}

The prompt template accesses it as {{.Plugins.python.Venv}}.
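That access path can be checked with Go's standard text/template package, which resolves .Plugins.python.Venv by map key at each step. PromptContext here is a pared-down, hypothetical stand-in for the RFD 0006 type, and Contribute and RenderPrompt are illustrative helpers. Using the missingkey=error option is one possible design choice: a reference to an absent plugin namespace fails instead of silently rendering "<no value>".

```go
package main

import (
	"strings"
	"text/template"
)

// PromptContext is a pared-down, hypothetical stand-in for the RFD 0006
// type: only the Plugins namespace is modeled here.
type PromptContext struct {
	Iteration int
	Plugins   map[string]map[string]any
}

// Contribute writes a plugin's data under its own name. Isolation is by
// convention only; nothing stops a caller from passing another plugin's name.
func (c *PromptContext) Contribute(plugin string, data map[string]any) {
	if c.Plugins == nil {
		c.Plugins = map[string]map[string]any{}
	}
	c.Plugins[plugin] = data
}

// RenderPrompt executes a prompt template against the context; with
// missingkey=error, a missing map key is an execution error.
func RenderPrompt(tmpl string, ctx PromptContext) (string, error) {
	t, err := template.New("prompt").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, ctx); err != nil {
		return "", err
	}
	return b.String(), nil
}
```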

Phase 7 — testing (PRD 0004, week 3)¶

internal/methodology/hooks/testutil/ provides a FakePlugin for handler-chain testing. Tests cover:

The default handler firing alone produces v1-equivalent output.
Plugin handler at order < 100 modifies context before default sees it.
Plugin handler at order > 100 augments default's output.
Two plugins at the same order: last-registered wins, with warning logged.
Plugin handler errors with strict = false → warning logged, loop continues.
Plugin handler errors with strict = true → iteration aborts.
Plugin contributes to ctx.Plugins["plugin-name"] → prompt template sees it.
10 plugins all attaching to one hook compose correctly.

Acceptance criteria¶

Compatibility and migration¶

v1 users: no v1 plugin authors exist, so migration is purely conceptual: "the loop that ran your auto mode is now a sequence of hook handlers. Default behavior is the same. Plugins can extend."
v1 prompt content: ported into the default hook handlers. v1 users don't see a behavior change unless they install a plugin that overrides a hook.
Plugin protocol versioning: the hook surface is part of plugin protocol v1.0. New hooks are additive and ship in v1.x. Changing hook signatures or removing hooks requires protocol v2.0 (framework v3 territory).

Risks¶

| Risk | Likelihood | Mitigation |
|---|---|---|
| Slow hook handler (e.g., shelling out to an external tool) bogs down iterations | High | Document expected latency (< 50ms target, < 500ms hard ceiling). `samuel run start --profile` reports per-hook timing. Plugins with slow handlers run in the background where possible. |
| Two plugins on the same hook produce conflicting output (chainable case) | Medium | Plugin authors document compatibility. Test suites cover common pairings. v2.0 ships with known-compatible starter plugins. |
| Plugin handler accidentally reads sensitive state (other plugins' private data) | Low | The `Plugins["<name>"]` namespace is by convention; the framework doesn't enforce isolation. Documentation states "plugins should only read their own namespace." For v2.0 this is honor-system; v3 may add enforcement. |
| Hook surface needs to grow during v2.x | High | Additive hooks are minor bumps. Reserved hook name prefixes (`extension.`, `methodology.`) are documented for future use. |
| Plugin author confused by ordering rules | Medium | One-page hook authoring guide. Visual order-resolution example. `samuel doctor --hooks` prints the registered chain. |
| `strict = true` is misconfigured and aborts the loop unnecessarily | Medium | Default `strict = false` for all hooks except quality.check and before:loop. Plugin authors can document required strict settings. |

Resolved decisions (2026-05-12)¶

Hook execution timing budget: default 5-minute timeout per hook, configurable in samuel.toml [hooks.<name>.timeout]. quality.check legitimately runs long (full test suites); other hooks should finish in well under a minute, ideally sub-second. A timeout breach aborts the hook with a structured warning; the loop continues unless strict = true.
Per-iteration hook profiling: yes. Emit timing stats to progress.md as [hooks.timing] entries in verbose mode, enabled via the --profile flag on samuel run start.
Hook handler retries: the framework does not retry. A hook handler implements its own retry policy if needed (e.g., handlers using the network.outbound capability that hit transient failures).
Multi-methodology projects: methodologies share the hook framework but maintain separate runtime state. Single-methodology projects use .samuel/run/ by default; when multiple methodologies are installed in one project, state is auto-namespaced under .samuel/run/<methodology>/.
Hook handler observability (OpenTelemetry): deferred to post-v2.0. Plugins can self-instrument with their preferred library.
Privilege escalation through hooks: the framework verifies the plugin's capability grant before invoking its hook handler. A plugin without exec capability cannot register an agent.invoke handler that spawns subprocesses. Enforcement happens in the plugin loader at install time and again at handler invocation time.

Outcome¶

To be filled in post-v2.0 launch. Expected outcomes:

The 10 starter-pack plugins (per [[0007|RFD 0007]]) use ~3-4 hooks total, all at the default ordering — confirms simple cases stay simple.
One or two third-party plugins use 5+ hooks creatively in the first six months — confirms expressiveness.
Hook handler latency stays under target for ~95% of registered handlers.
The strict semantics catch real failures without producing false-positive aborts.

[[0001|RFD 0001]] — plugin tiers (WASM and OCI plugins implement hook handlers)
[[0003|RFD 0003]] — manifest schema, plugin protocol version
[[0005|RFD 0005]] — Plugin interface (Manifest method exposes hook bindings)
[[0006|RFD 0006]] — samuel run [methodology] and PromptContext (consumed by hooks)
[[../../wiki/concepts/methodology-default-plus-plugin]] — wiki concept this RFD ports
[[../../wiki/synthesis/auto-mode-v2-design]] — wiki synthesis of the hook surface
[[../../wiki/concepts/pre-computed-context]] — v1's innovation that lives in default handlers
[[../../wiki/entities/auto-loop]] — v1 reference implementation
PRD 0004 (Methodology) — implements the hook framework + default handlers