Skip to content

RFD 0006: samuel run [methodology] — Ralph as default, CLI-mutation pattern

Summary

v2's flagship workflow is samuel run [methodology] — runs the configured methodology, default ralph (the Ralph Wiggum methodology, Samuel's autonomous coding loop). The methodology positional accepts the methodology name or an alias (rw for Ralph).

Three changes from v1's samuel auto / samuel run:

  1. Methodology is explicit and pluggable. samuel run defaults to the methodology declared in samuel.toml; samuel run tdd-strict runs a plugin-provided methodology. Adds extensibility per [[0004|RFD 0004]].

  2. TOON-encoded runtime files. prd.toon, task-context.toon, project-snapshot.toon replace v1's prd.json + markdown. Token-efficient, malformation-tolerant per-row recovery, per [[../../wiki/concepts/toon-evaluation]].

  3. Agent never writes prd directly. Mutations go through CLI subcommands: samuel run done <id>, samuel run skip <id>, samuel run reset <id>, samuel run enqueue <title>. Samuel CLI is the only writer of runtime state. Decouples storage encoding from agent contract — load-bearing change.

Plus retained from v1: smart bare invocation (never silently starts a loop), samuel auto as a permanent alias, pilot mode (samuel run pilot, samuel run --discover-only), pre-computed context generators.

Problem statement

v1's samuel run is solid ([[../../wiki/entities/command-tree-v1]]). v1 already renamed samuel autosamuel run in its v3.0.0 release, kept auto as a permanent alias, deprecated the nested task list/complete/skip/reset forms in favor of flat verbs (tasks, done, skip, reset). v2 inherits all of that.

What v2 adds:

  1. Methodology selection. v1 has one methodology hardcoded. v2 makes it pluggable per [[0004|RFD 0004]]'s hooks framework. The CLI surface needs a way to say "run the methodology I configured."

  2. CLI-mutation pattern. v1's prompt instructs the agent: "Update prd.json directly for status changes." This works but fragiles when the agent emits malformed JSON ([[../../wiki/concepts/toon-evaluation]]). v2 moves mutations through CLI subcommands.

  3. TOON runtime files. Token efficiency wins for the agent's per-iteration reads. Storage format change is invisible to the agent if the agent uses CLI mutations exclusively.

Requirements

  • samuel run [methodology] runs the configured (or specified) methodology end-to-end.
  • samuel run (no methodology) uses the default from samuel.toml.
  • samuel run ralph and samuel run rw are equivalent (rw is the Ralph Wiggum alias).
  • samuel auto is a permanent alias for samuel run (v1 commitment, kept).
  • Smart bare invocation: samuel run with no args → status if initialized, actionable help otherwise.
  • samuel run init, samuel run start, samuel run status, samuel run pilot, samuel run tasks subcommands match v1's surface.
  • New mutation subcommands: samuel run done <id>, samuel run skip <id>, samuel run reset <id>, samuel run enqueue <title> operate atomically on prd.toon (already in v1; v2 makes them the canonical mutation interface).
  • Auto-mode prompt instructs the agent to use mutation subcommands, not edit prd.toon.
  • All commands support --json.

Constraints

  • samuel run is the user-facing flagship verb. Cannot break v1 muscle memory.
  • v1's samuel auto alias is permanent — cannot be deprecated.
  • Methodology must be pluggable (RFD 0004) — the CLI surface is the interaction point.

Background

v1's samuel run (recap)

From [[../../wiki/entities/command-tree-v1]]:

samuel run                  # smart bare invocation (status or help)
  ├── init                  # initialize loop runtime
  ├── convert <prd-path>    # PRD markdown → prd.json
  ├── status
  ├── start                 # run the loop
  ├── pilot                 # alternates discovery + impl
  ├── tasks                 # list tasks
  ├── done <id>             # mark task complete
  ├── skip <id>             # mark task skipped
  ├── reset <id>            # back to pending
  ├── enqueue <title>       # add task with auto-assigned ID
  └── task                  # preserved nested namespace
      ├── add <id> <title>  # explicit-id (CI scripts)
      ├── list              # [DEPRECATED] hidden, redirects to "tasks"
      ├── complete          # [DEPRECATED] hidden, redirects to "done"
      ├── skip              # [DEPRECATED] hidden, redirects to "skip"
      └── reset             # [DEPRECATED] hidden, redirects to "reset"

samuel auto is a permanent top-level alias. samuel auto task complete <id> still works through v3.x via the deprecated wrapper.

Smart bare invocation (recap)

v1's samuel run with no args ([[../../wiki/concepts/smart-bare-invocation]]):

  • If .claude/auto/prd.json exists → show status (read-only).
  • If missing → print actionable help to stderr, exit non-zero. Never silently starts a loop.

Comment in v1's source explicitly calls out: "preventing the v2-era footgun where a stray samuel auto could kick off pilot mode in a directory the user wasn't expecting."

v2 preserves this. The bare-invocation pattern is right.

v1's CLI mutation commands

v1 already ships samuel run done, samuel run skip, samuel run reset, samuel run enqueue. They mutate prd.json directly. The intention was clear — give the agent (and humans) explicit subcommands rather than free-form JSON editing.

But v1's auto-mode prompt still tells the agent: "Update prd.json directly for status changes." The CLI mutation commands are available but the agent doesn't use them.

v2 closes the gap: the agent uses the mutation commands. The prompt changes accordingly. Storage encoding becomes Samuel-internal.

Options considered

Option A: samuel run [methodology] with default in config (chosen)

Positional argument for methodology. Default from samuel.toml. Subcommands unchanged from v1 (with done, skip, reset, enqueue becoming the agent's mutation interface, not just human CLI).

samuel run                        # default methodology (ralph)
samuel run ralph                  # explicit
samuel run rw                     # alias
samuel run tdd-strict             # methodology from a plugin
samuel run my-custom-methodology  # methodology from a custom plugin

samuel run init --prd ...         # works for any methodology
samuel run start                  # works for any methodology
samuel run pilot                  # ralph-only (or plugin-defined)
samuel run done 1.2               # mutation, methodology-agnostic

Pros: - Extends naturally for plugin methodologies (RFD 0004's hook framework hosts them). - Bare samuel run keeps v1's behavior — no muscle-memory break. - Methodology name is visible in the command for clarity in shell history and CI logs. - One CLI surface, methodology-agnostic for subcommands.

Cons: - Methodology argument is optional → some users may not realize it's there. - Plugin methodologies need to be installed before they're invokable.

Effort: Low. v1's surface is mostly there; add the positional parsing + default resolution.

Option B: Keep v1's samuel auto exactly

No change. v2 ships the same CLI surface v1's v3.0.0 introduced.

Pros: - Zero migration burden for v1 users. - Conservative.

Cons: - Doesn't expose methodology extensibility. Plugin methodologies need somewhere to live; "where does samuel run my-tdd-methodology go?" without the positional. - Misses the chance to align CLI with the RFD 0004 plugin architecture.

Effort: Zero. But misses the v2 opportunity.

Option C: samuel autosamuel run, no methodology arg

v1's rename, no further changes.

Pros: - Already done in v1.

Cons: - Same as Option B — no methodology extensibility.

Option D: samuel <methodology> — methodology is top-level

samuel ralph         # runs ralph methodology
samuel tdd-strict    # runs tdd-strict methodology

Pros: - Shortest command for the flagship use case.

Cons: - Top-level namespace pollution. samuel ralph and samuel install and samuel ls mix verbs (run) with methodology names. Confusing. - Methodology names conflict with potential future commands. - v1 muscle memory expects samuel run.

Option E: samuel run --methodology X

Flag instead of positional.

Pros: - Explicit.

Cons: - Flags are less ergonomic than positionals for primary arguments. - samuel run --methodology ralph init --prd ... reads awkwardly compared to samuel run ralph init --prd .... - Cargo, npm, etc. use positionals for subcommand-ish arguments.

Decision

Adopt Option A: samuel run [methodology] with default methodology from samuel.toml.

The decision rests on three judgments:

  1. v1's surface is right; we extend, not replace. v1's v3.0.0 rename to samuel run was correct. v2 inherits the surface and adds one positional. Bare invocation stays. Subcommands stay. Aliases stay.

  2. Methodology pluggability needs a CLI surface. RFD 0004's hook framework makes alternative methodologies possible. The CLI must expose them. A positional argument is the conventional path (kubectl, cargo, npm all use this shape).

  3. samuel auto is permanent. v1 committed to this. v2 honors it.

Implementation plan

Phase 1 — CLI positional + default resolution (PRD 0004, week 1)

internal/commands/run/:

var runCmd = &cobra.Command{
    Use:     "run [methodology]",
    Aliases: []string{"auto"},
    Short:   "Run a methodology (default: ralph)",
    Args:    cobra.MaximumNArgs(1),  // 0 or 1 args
    RunE:    runRunBare,             // handles bare + with-positional
}

Resolution order for methodology name:

  1. Positional argument if provided (samuel run ralphralph).
  2. Alias map (rwralph; future aliases like tddtdd-strict).
  3. samuel.toml default_methodology if no positional.
  4. Hardcoded fallback: ralph if samuel.toml doesn't specify.

The chosen methodology name is resolved against installed methodologies — built-in (ralph) or plugin-provided. Error if not found.

Phase 2 — bare invocation behavior (PRD 0004, week 1)

runRunBare matches v1's logic:

cwd, _ := os.Getwd()
methodology := resolveMethodology(cmd, args)
runtimeDir := filepath.Join(cwd, ".samuel", "run")
prdPath := filepath.Join(runtimeDir, "prd.toon")

if _, err := os.Stat(prdPath); err == nil {
    return runStatus(cmd, args)  // read-only status
}

fmt.Fprintln(os.Stderr, "samuel: no methodology runtime initialized in this directory.")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, "  Initialize one:    samuel run init")
fmt.Fprintln(os.Stderr, "  From a PRD:        samuel run init --prd <path>")
fmt.Fprintln(os.Stderr, "  Zero-setup pilot:  samuel run pilot")
fmt.Fprintln(os.Stderr, "")
fmt.Fprintln(os.Stderr, "See 'samuel run --help' for the full subcommand list.")
return errors.New("no methodology runtime initialized")

Exit non-zero on bare-invocation-without-prd, exit 0 with status output otherwise.

Phase 3 — subcommands inherit methodology (PRD 0004, week 1-2)

Every samuel run subcommand reads the resolved methodology from the parent:

samuel run ralph init --prd ...     # init for ralph methodology
samuel run tdd-strict start         # start tdd-strict methodology
samuel run pilot                    # ralph or default methodology
samuel run done 1.2                 # mutates prd.toon (methodology-agnostic at the data layer)

The runtime directory is methodology-aware: .samuel/run/ for single-methodology projects, .samuel/run/<methodology>/ for projects running multiple methodologies. Configurable.

Phase 4 — TOON runtime files (PRD 0004, week 2)

Per [[../../wiki/concepts/toon-evaluation]]:

  • prd.toon — task state (was prd.json)
  • task-context.toon — current task brief / summary table
  • project-snapshot.toon — file inventory, test gaps, TODO counts, git log
  • progress.md — markdown append-only log (unchanged)
  • progress-context.md — markdown summary (unchanged)

Every .toon file has a version header: # toon v3. Read path rejects unknown major versions. Per-row malformation recovery: skip + structured warning, loop continues.

Loop driver reloads prd.toon every iteration (agent may have mutated via CLI subcommands during the prior iteration's agent invocation).

Phase 5 — CLI mutation commands (PRD 0004, week 2)

Already in v1; v2 elevates them to canonical:

samuel run done <task-id> [--commit-sha SHA] [--iteration N]
samuel run skip <task-id> [--reason TEXT]
samuel run reset <task-id>
samuel run enqueue <title> [--priority] [--complexity] [--source]
samuel run task add <id> <title> [...]              # explicit-id for CI

All mutations:

  • Acquire the file lock briefly (flock on .samuel/run/lock).
  • Read prd.toon.
  • Mutate in memory.
  • Write atomically (write-tmp-then-rename).
  • Release lock.
  • Emit JSON via --json.

The atomic write protects against partial state if the user ^Cs mid-mutation.

Phase 6 — prompt rewrite for CLI mutation (PRD 0004, week 2)

The Ralph implementation prompt ([[../../wiki/entities/auto-prompts]]) gets rewritten:

- Update the task's status to "completed" in prd.json
- Record the commit SHA in the task's commit_sha field
- Update progress.total_tasks and progress.completed_tasks
+ Run: samuel run done <task-id> --commit-sha $(git rev-parse HEAD)
+ The CLI updates prd.toon atomically and records the commit SHA.

Discovery prompt similarly:

- Read prd.json, add new tasks with status "pending", write prd.json back
+ Add new tasks via: samuel run enqueue "<title>" --priority <level> --source pilot-discovery
+ For explicit IDs (uncommon — prefer enqueue): samuel run task add <id> "<title>"
+ The CLI updates prd.toon atomically.

The agent invokes these via Bash tool. Samuel's CLI is the only mutator. Encoding becomes a Samuel-internal concern.

Phase 7 — runtime directory mount (PRD 0003, week 3)

When the agent runs in an OCI sandbox:

  • .samuel/run/ mounted read-only to /workspace/.samuel/run/ inside the container.
  • The agent's Bash tool invokes samuel run done <id> via the bridge socket back to the host samuel process, which has write access.
  • This prevents the agent from accidentally writing prd.toon directly (it's read-only from the agent's view).

Defense in depth: the prompt says "use CLI subcommands" AND the filesystem makes direct edits impossible.

Phase 8 — pilot mode and discovery-only (PRD 0004, week 2-3)

samuel run pilot                  # alternates discovery + impl iterations
samuel run pilot --focus testing  # discovery focused on testing
samuel run --discover-only        # shortcut: pilot mode, discovery iterations only, no implementation
samuel run pilot --discover-interval 5    # configurable per [[../../wiki/concepts/pilot-mode]]

samuel.toml:

[methodology.ralph.pilot]
enabled = true
discover_interval = 5
max_discovery_tasks = 10
focus = "testing"                 # optional

Phase 9 — samuel auto alias (PRD 0004, week 1)

Cobra alias: Aliases: []string{"auto"}. Every subcommand inherits.

samuel auto start         # equivalent to samuel run start
samuel auto done 1.2      # equivalent to samuel run done 1.2
samuel auto ralph init    # equivalent to samuel run ralph init
samuel auto rw            # equivalent to samuel run rw

Permanent. Never deprecated. v1 promised this; v2 honors it.

Acceptance criteria

  • samuel run (bare, no prd) → actionable help to stderr, exit 1.
  • samuel run (bare, with prd) → status output, exit 0.
  • samuel run ralph and samuel run rw are equivalent.
  • samuel.toml default_methodology = "tdd-strict" → bare samuel run start runs tdd-strict.
  • samuel run with positional overrides config default.
  • samuel run unknown-methodology errors with "methodology not found" + suggests samuel ls --type methodology.
  • samuel run init --prd <path> produces prd.toon (not prd.json).
  • prd.toon has # toon v3 version header.
  • Malformed row in prd.toon is skipped with warning, loop continues on other tasks.
  • samuel run done 1.2 --commit-sha abc atomically mutates prd.toon.
  • samuel run done, skip, reset, enqueue, task add all emit --json.
  • In OCI sandbox, .samuel/run/ is mounted read-only; agent's direct edits fail.
  • Agent invokes mutation subcommands via Bash tool; mutations land in host's prd.toon.
  • samuel auto is permanent alias; every samuel run * works as samuel auto *.
  • samuel run pilot --focus testing runs discovery + impl iterations.
  • samuel run --discover-only runs discovery iterations only, no impl.
  • Bare invocation pattern works across all methodology selections.
  • No .claude/ paths written by run subcommands (agnostic invariant).

Compatibility and migration

  • v1's samuel auto: works in v2 as alias for samuel run.
  • v1's samuel run (added in v1's v3.0.0): works identically in v2 with the addition of methodology positional.
  • v1's samuel run task list/complete/skip/reset: dropped in v2 (clean break). v1 had them deprecated already; v2 just doesn't carry the deprecated wrappers.
  • v1's prd.json: replaced by prd.toon. samuel run init --prd <md-path> produces prd.toon. Migration script for existing prd.json files: samuel run migrate-encoding (deferred to v2.1+; rare use case for v1 users).
  • v1's agent prompt: v2's prompt is different (uses CLI mutations). User's project-level prompt overrides at .samuel/templates/ralph/prompt.md.tmpl survive but should be updated to match.

Risks

Risk Likelihood Mitigation
Agent ignores CLI-mutation prompt and tries to edit prd.toon directly High Mount .samuel/run/ as :ro in OCI sandbox — direct edits fail at the filesystem layer.
User confused by methodology positional being optional Low samuel run --help shows it prominently. samuel.toml default_methodology is auto-set during init.
Plugin methodologies with name conflicts Low Plugin loader rejects duplicate names. samuel ls --type methodology lists what's installed.
TOON encoder bug corrupts prd.toon mid-mutation Medium Atomic write (write-tmp-then-rename). Per-row recovery on read. Lock file prevents concurrent mutations.
samuel auto users confused that it's an alias, not the canonical name Low samuel auto --help says "(alias for samuel run)".
Methodology positional clashes with subcommand names Medium Methodology names follow plugin naming rules ([a-z0-9-]+); subcommand names follow Cobra (also [a-z-]+). Collision detection: reject any methodology name matching a built-in subcommand.
Multi-methodology projects (running ralph + tdd-strict in same project) get confusing runtime dirs Medium Default .samuel/run/ for single methodology; auto-namespaced .samuel/run/<methodology>/ if multiple installed.
Pre-computed context generators that wrote markdown now write TOON — agent prompt needs updating Medium Update the Ralph prompt template (Phase 6). Per-project overrides may go stale; doctor flags them.

Resolved decisions (2026-05-12)

  1. Methodology argument autocompletion: Cobra shell completion + dynamic from installed methodologies. Lands in Milestone 6 polish.

  2. Default methodology auto-detection: framework always ships ralph built-in, so samuel run always has at least one methodology available. If a future user uninstalls all methodology plugins (only possible if ralph becomes uninstallable too, which isn't planned), samuel run errors with helpful message. Acceptable edge case.

  3. samuel run aliasing inside samuel.toml: deferred to v2.x. Useful pattern ([methodology.aliases] my-flow = "ralph --focus testing") but not v2.0.

  4. CLI mutation timestamps: yes — completed_at field on tasks, preserved from v1's prd.json shape into v2's prd.toon. Plus created_at on enqueued tasks.

  5. samuel run done --commit-sha: optional but strongly suggested in prompt. CLI accepts without it (some "done" cases legitimately don't involve commits — skipped-but-marked-done, manual sign-off, etc.). The prompt template instructs the agent to always pass --commit-sha $(git rev-parse HEAD) when a commit was made.

  6. Discovery-only mode and pilot mode interaction: --discover-only implies pilot enabled with discover_interval = 1 (every iteration is discovery) and zero implementation iterations.

Outcome

To be filled in post-v2.0 launch. Expected outcomes:

  • v1 users moving to v2 don't notice the encoding swap because the CLI surface is identical for the common case.
  • The CLI-mutation pattern catches the agent emitting bad output in production — measurable reduction in "manual prd repair" support reports.
  • 1-2 plugin methodologies appear within six months (likely tdd-strict or bdd or monorepo-coordinator).
  • Per-iteration token cost measurably lower than v1 due to TOON encoding (target: 15-25% reduction in agent context).
  • [[0001|RFD 0001]] — plugin tiers (methodology plugins are WASM-tier)
  • [[0003|RFD 0003]] — manifest schema (provides.methodology = [...])
  • [[0004|RFD 0004]] — hook framework (methodology = hook registrations)
  • [[0005|RFD 0005]] — plugin loader (resolves methodology plugin at install)
  • [[../../wiki/synthesis/auto-mode-v2-design]] — wiki synthesis this RFD ports
  • [[../../wiki/concepts/ralph-wiggum-methodology]] — Geoffrey Huntley's pattern Samuel ships as default
  • [[../../wiki/concepts/pre-computed-context]] — v1 innovation preserved in default hook handlers
  • [[../../wiki/concepts/pilot-mode]] — pilot mode design
  • [[../../wiki/concepts/smart-bare-invocation]] — bare-samuel run UX pattern
  • [[../../wiki/concepts/toon-evaluation]] — TOON encoding decision
  • [[../../wiki/concepts/prompt-template-variables]] — PromptContext variables consumed by methodology prompts
  • [[../../wiki/entities/auto-loop]] — v1 reference implementation
  • [[../../wiki/entities/auto-prompts]] — v1 prompt templates (rewritten for CLI-mutation)
  • PRD 0004 (Methodology) — implements this RFD