Verification gate: diff self-review #24

Open
opened 2026-04-07 20:24:25 +00:00 by austin · 0 comments
Owner

Implement model-based semantic review of generated diffs.

  • After a candidate passes parse + lint + tests, show the model its own diff alongside the original task
  • Prompt: "Does this change correctly address: {task}? Answer YES or NO with a one-line reason."
  • Use grammar-constrained decoding so the output is always YES or NO followed by a short reason string
  • A "NO" response fails the candidate, and the reason is included in the retry feedback
  • Use a low temperature (0.1) for this call; we want conservative judgment
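
A minimal sketch of how the steps above could fit together, in Python. Everything named here is an assumption for illustration, not existing localcode code: `generate` is a hypothetical callable wrapping whatever inference backend is in use, `VERDICT_GRAMMAR` assumes a backend that accepts llama.cpp-style GBNF grammars, and the task/diff prompt layout around the question is a guess. Treating unparseable output as a failure is also an assumption; the issue does not say what should happen in that case.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumed GBNF-style grammar: YES or NO, a space, then a one-line reason.
VERDICT_GRAMMAR = r'root ::= ("YES" | "NO") " " [^\n]+'

# Defensive re-check of the raw output, in case the backend ignores the grammar.
VERDICT_RE = re.compile(r"^(YES|NO)\b[ \t:,.-]*(.*)$")

# The question is taken from the issue; the Task/Diff framing is hypothetical.
PROMPT_TEMPLATE = (
    "Task:\n{task}\n\n"
    "Diff:\n{diff}\n\n"
    "Does this change correctly address: {task}? "
    "Answer YES or NO with a one-line reason."
)


@dataclass
class Verdict:
    passed: bool
    reason: str


def parse_verdict(raw: str) -> Verdict:
    """Turn the model's raw answer into a pass/fail verdict plus reason."""
    match = VERDICT_RE.match(raw.strip())
    if match is None:
        # Assumption: fail closed when the answer is not YES/NO on one line.
        return Verdict(False, f"unparseable review output: {raw!r}")
    return Verdict(match.group(1) == "YES", match.group(2).strip())


def review_diff(task: str, diff: str, generate: Callable[..., str]) -> Verdict:
    """Ask the model to judge its own diff against the original task."""
    prompt = PROMPT_TEMPLATE.format(task=task, diff=diff)
    # Low temperature for conservative judgment, as specified in the issue.
    raw = generate(prompt, temperature=0.1, grammar=VERDICT_GRAMMAR)
    return parse_verdict(raw)


def retry_feedback(verdict: Verdict) -> str:
    """Build the feedback line passed to the next attempt after a NO."""
    return f"Self-review rejected the diff: {verdict.reason}"
```

The gate would run only after parse, lint, and tests have passed; a verdict with `passed` set to `False` fails the candidate, and `retry_feedback(verdict)` is one possible way to carry the reason into the next attempt.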

This is a cheap second opinion that catches semantic errors: code that parses and passes tests but doesn't actually do what was asked. It is surprisingly effective even with 3B models, because the task is narrowly scoped: judge, don't create.

Reference
austin/localcode#24