Pipeline & Verification
The core loop: wire everything together into the FSM that drives task execution, and build the verification gates that make the 3B model's output reliable.
Scope:
- Pipeline FSM with states: UNDERSTAND, LOCATE, PLAN, GENERATE, VERIFY, COMPLETE
- State transitions driven by the harness, not the model
- UNDERSTAND: classify user intent via grammar-constrained model call (edit / explain / fix / generate)
- LOCATE: use code intelligence to find relevant symbols; the model then narrows them via a constrained ranking prompt
- GENERATE: build context (milestone 3), run inference (milestone 1) with N candidates
- VERIFY gate: tree-sitter parse check, ruff lint, pytest for relevant tests, diff self-review via a second model call
- Best-of-N selection: first candidate that clears all verification gates wins
- Retry logic: on failure, inject structured error feedback into prompt and retry (max 3 attempts)
- Diff generation and application
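A minimal sketch of how the scope items above could fit together: a harness-owned transition table, first-pass best-of-N selection, and a bounded retry loop that feeds gate errors back into generation. The names `Task`, `run`, `generate`, and `gates` are hypothetical, the UNDERSTAND, LOCATE, and PLAN states are pass-through placeholders here, and "max 3 attempts" is assumed to mean three generation rounds in total.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Optional


class State(Enum):
    UNDERSTAND = auto()
    LOCATE = auto()
    PLAN = auto()
    GENERATE = auto()
    VERIFY = auto()
    COMPLETE = auto()


# Fixed forward order: the harness, not the model, picks the next state.
NEXT = {
    State.UNDERSTAND: State.LOCATE,
    State.LOCATE: State.PLAN,
    State.PLAN: State.GENERATE,
    State.GENERATE: State.VERIFY,
}

MAX_ATTEMPTS = 3  # assumption: three generation rounds in total


@dataclass
class Task:
    request: str
    candidates: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)
    winner: Optional[str] = None
    attempts: int = 0
    trace: list[State] = field(default_factory=list)


# A gate returns an error message on failure, or None on pass.
Gate = Callable[[str], Optional[str]]


def run(task: Task, generate: Callable[[Task, int], list[str]],
        gates: list[Gate], n: int = 5) -> Task:
    state = State.UNDERSTAND
    while state is not State.COMPLETE:
        task.trace.append(state)
        if state is State.GENERATE:
            task.attempts += 1
            # generate() can read task.feedback from the previous round.
            task.candidates = generate(task, n)
            state = NEXT[state]
        elif state is State.VERIFY:
            task.feedback = []
            for cand in task.candidates:
                errors = []
                for gate in gates:
                    err = gate(cand)
                    if err is not None:
                        errors.append(err)
                if not errors:
                    task.winner = cand  # first candidate to clear every gate
                    break
                task.feedback.extend(errors)
            if task.winner is not None or task.attempts >= MAX_ATTEMPTS:
                state = State.COMPLETE
            else:
                state = State.GENERATE  # retry with structured feedback
        else:
            state = NEXT[state]  # placeholder states simply advance
    task.trace.append(State.COMPLETE)
    return task
```

Keeping the transition logic in `run` rather than in model output means a malformed completion can fail a gate but cannot skip VERIFY or loop indefinitely.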
Demo: end-to-end "fix the bug in parse_config". The FSM classifies intent, locates the function, generates 5 candidates, verifies each, picks the winner, and presents a diff, all automatically.
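The last two demo steps, picking the winner and presenting a diff, might look roughly like the sketch below. It is illustrative only: the standard library's `ast.parse` stands in for the tree-sitter parse check (the ruff, pytest, and self-review gates are omitted), `difflib.unified_diff` produces the diff, and the function names, the `config.py` path, and the sample bug are all hypothetical.

```python
import ast
import difflib
from typing import Optional


def syntax_gate(source: str) -> Optional[str]:
    """Return None if the source parses, else a structured error string."""
    try:
        ast.parse(source)  # stand-in for the tree-sitter parse check
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def first_passing(candidates: list[str]) -> Optional[str]:
    """Best-of-N: return the first candidate that clears the gate."""
    for cand in candidates:
        if syntax_gate(cand) is None:
            return cand
    return None


def make_diff(original: str, updated: str, path: str) -> str:
    """Render a unified diff between the original and the winning candidate."""
    lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


# Hypothetical inputs: the first candidate has an unclosed parenthesis.
original = "def parse_config(text):\n    return text.split('=')\n"
candidates = [
    "def parse_config(text):\n    return text.split('=', 1\n",
    "def parse_config(text):\n    return text.split('=', 1)\n",
]
winner = first_passing(candidates)
if winner is not None:
    print(make_diff(original, winner, "config.py"))
```

The error string returned by `syntax_gate` is the kind of structured feedback that could be injected into the retry prompt when no candidate passes.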