Pipeline & Verification

The core loop. Wire everything together into the finite-state machine (FSM) that drives task execution, and build the verification gates that make output from the 3B model reliable.

Scope:

  • Pipeline FSM with states: UNDERSTAND, LOCATE, PLAN, GENERATE, VERIFY, COMPLETE
  • State transitions driven by the harness, not the model
  • UNDERSTAND: classify user intent via grammar-constrained model call (edit / explain / fix / generate)
  • LOCATE: use code intelligence to find relevant symbols, then have the model narrow them via a constrained ranking prompt
  • GENERATE: build context (milestone 3), run inference (milestone 1) with N candidates
  • VERIFY gate: tree-sitter parse check, ruff lint, pytest for relevant tests, diff self-review via a second model call
  • Best-of-N selection: first candidate that clears all verification gates wins
  • Retry logic: on failure, inject structured error feedback into prompt and retry (max 3 attempts)
  • Diff generation and application
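
A minimal sketch of the harness-driven transitions described above, written under stated assumptions rather than as the project's actual design. The state names and the limit of 3 attempts come from the scope list. Everything else is hypothetical: the strictly linear order, the `step` function, the `RetriesExhausted` error, and the choice to send a failed VERIFY back to GENERATE. The scope does not say what happens once retries run out, so raising an error here is only a placeholder.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    UNDERSTAND = auto()
    LOCATE = auto()
    PLAN = auto()
    GENERATE = auto()
    VERIFY = auto()
    COMPLETE = auto()


# Assumed linear happy path. The harness owns this table; the model
# never chooses the next state.
NEXT_ON_SUCCESS = {
    State.UNDERSTAND: State.LOCATE,
    State.LOCATE: State.PLAN,
    State.PLAN: State.GENERATE,
    State.GENERATE: State.VERIFY,
    State.VERIFY: State.COMPLETE,
}

MAX_ATTEMPTS = 3  # from the scope list: "max 3 attempts"


class RetriesExhausted(RuntimeError):
    """Hypothetical error raised when every attempt failed verification."""


def step(state: State, verified: bool, attempt: int) -> tuple[State, int]:
    """Return the next state and attempt number (attempts start at 1).

    `verified` is only consulted in VERIFY; other states always advance.
    """
    if state is State.COMPLETE:
        raise ValueError("COMPLETE is terminal")
    if state is State.VERIFY and not verified:
        if attempt >= MAX_ATTEMPTS:
            raise RetriesExhausted(f"no candidate passed after {attempt} attempts")
        # Assumption: a failed verification loops back to GENERATE.
        return State.GENERATE, attempt + 1
    return NEXT_ON_SUCCESS[state], attempt
```

Keeping the transitions in a plain table makes the control flow auditable without running the model: the only branch is the VERIFY outcome, and that outcome is computed by the gates, not by model output.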

Demo: end-to-end "fix the bug in parse_config" — the FSM classifies intent, locates the function, generates 5 candidates, verifies each, picks the winner, and presents a diff. All automatic.
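
The generate, verify, and pick steps of the demo could look roughly like the sketch below. It is an illustration under assumptions, not the milestone's implementation. `solve`, `run_gates`, the `Gate` type, and the `generate(prompt, n)` callable are hypothetical names. The syntax gate uses Python's standard `ast` module as a stand-in for the tree-sitter parse check so the example runs without extra dependencies. The ruff, pytest, and self-review gates are omitted and would plug in as further `Gate` callables. The feedback wording in the retry prompt is also made up.

```python
import ast
from typing import Callable, List, Optional

# A gate returns an error message on failure, or None when the candidate passes.
Gate = Callable[[str], Optional[str]]


def syntax_gate(source: str) -> Optional[str]:
    """Stand-in for the tree-sitter parse check, using the stdlib ast module."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def run_gates(candidate: str, gates: List[Gate]) -> Optional[str]:
    """Run gates in order and return the first error, or None if all pass."""
    for gate in gates:
        error = gate(candidate)
        if error is not None:
            return error
    return None


def solve(
    task: str,
    generate: Callable[[str, int], List[str]],  # hypothetical inference call
    gates: List[Gate],
    n: int = 5,
    max_attempts: int = 3,
) -> Optional[str]:
    """Best-of-N with retries: return the first candidate that clears all gates."""
    prompt = task
    for _ in range(max_attempts):
        errors = []
        for candidate in generate(prompt, n):
            error = run_gates(candidate, gates)
            if error is None:
                return candidate  # first passing candidate wins
            errors.append(error)
        # Inject de-duplicated, structured error feedback into the next prompt.
        feedback = "\n".join(f"- {e}" for e in sorted(set(errors)))
        prompt = f"{task}\n\nPrevious attempt failed verification:\n{feedback}"
    return None  # assumption: the caller decides what to do when nothing passes
```

Ordering the gates from cheapest to most expensive (parse, lint, tests, model self-review) means most bad candidates are rejected before any test run or second model call is spent on them.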
