Verification Loop: Checkpoint & Evals
What You'll Learn
After learning the verification loop mechanism, you can:
- Use
/checkpointto save and restore working state - Use
/verifyto run comprehensive code quality checks - Master Eval-Driven Development (EDD) concept, define and run evals
- Establish continuous verification loop, maintain code quality during development
Your Current Challenges
You just completed a feature but不敢 submit a PR because:
- Unsure if existing functionality was broken
- Worried about declining test coverage
- Forgot the original goal, unsure if you've deviated from direction
- Want to rollback to a stable state but have no record
If there were a mechanism to "take a snapshot" at key moments and continuously verify during development, these problems would be solved.
When to Use This Workflow
- Before starting new features: Create checkpoint to record starting state
- After completing milestones: Save progress for easy rollback and comparison
- Before submitting PR: Run comprehensive verification to ensure code quality
- During refactoring: Frequent verification to avoid breaking existing functionality
- During team collaboration: Share checkpoints to sync working state
🎒 Prerequisites
Prerequisites
This tutorial assumes you have:
- ✅ Completed TDD Development Workflow learning
- ✅ Familiar with basic Git operations
- ✅ Understand how to use Everything Claude Code's basic commands
Core Concept
Verification loop is a quality assurance mechanism that turns "write code → test → verify" cycle into a systematic process.
Three-Layer Verification System
Everything Claude Code provides three layers of verification:
| Layer | Mechanism | Purpose | When to Use |
|---|---|---|---|
| Real-time verification | PostToolUse Hooks | Capture type errors, console.logs, etc. immediately | After every tool call |
| Periodic verification | /verify command | Comprehensive check: build, types, tests, security | Every 15 minutes or after major changes |
| Milestone verification | /checkpoint | Compare state differences, track quality trends | After milestones, before PR submission |
Checkpoint: Code's "Save Point"
Checkpoint "takes a snapshot" at key moments:
- Records Git SHA
- Records test pass rate
- Records code coverage
- Records timestamp
When verifying, you can compare the difference between current state and any checkpoint.
Evals: AI Development's "Unit Tests"
Eval-Driven Development (EDD) treats evals as unit tests for AI development:
- Define success criteria first (write evals)
- Write code next (implement features)
- Run evals continuously (verify correctness)
- Track regressions (ensure existing functionality not broken)
This aligns with TDD (Test-Driven Development) philosophy, but for AI-assisted development.
Follow Along: Using Checkpoint
Step 1: Create Initial Checkpoint
Before starting new features, create checkpoint first:
/checkpoint create "feature-start"Why Record starting state for later comparison.
What you should see:
VERIFICATION: Running quick checks...
Build: OK
Types: OK
CHECKPOINT CREATED: feature-start
Time: 2026-01-25-14:30
Git SHA: abc1234
Logged to: .claude/checkpoints.logCheckpoint will:
- First run
/verify quick(only check build and types) - Create git stash or commit (named:
checkpoint-feature-start) - Record to
.claude/checkpoints.log
Step 2: Implement Core Functionality
Start writing code to complete core logic.
Step 3: Create Milestone Checkpoint
After completing core functionality:
/checkpoint create "core-done"Why Record milestone for easy rollback.
What you should see:
CHECKPOINT CREATED: core-done
Time: 2026-01-25-16:45
Git SHA: def5678
Logged to: .claude/checkpoints.logStep 4: Verify and Compare
Verify if current state has deviated from goal:
/checkpoint verify "feature-start"Why Compare quality metric changes from start to now.
What you should see:
CHECKPOINT COMPARISON: feature-start
=====================================
Files changed: 12
Tests: +25 passed / -0 failed
Coverage: +15% / -2% (from 60% to 75%)
Build: PASS
Status: ✅ Quality improvedStep 5: View All Checkpoints
View checkpoint history:
/checkpoint listWhat you should see:
CHECKPOINTS HISTORY
===================
Name | Time | Git SHA | Status
---------------|------------------|----------|--------
feature-start | 2026-01-25-14:30 | abc1234 | behind
core-done | 2026-01-25-16:45 | def5678 | currentCheckpoint ✅: Verify understanding
- Does checkpoint automatically run
/verify quick? ✅ Yes - Which file records checkpoints? ✅
.claude/checkpoints.log - What metrics does
/checkpoint verifycompare? ✅ File changes, test pass rate, coverage
Follow Along: Using Verify Command
Step 1: Run Quick Verification
During development, run quick verification frequently:
/verify quickWhy Only check build and types, fastest speed.
What you should see:
VERIFICATION: PASS
Build: OK
Types: OK
Ready for next task: YESStep 2: Run Full Verification
Before submitting PR, run complete check:
/verify fullWhy Comprehensive check of all quality gates.
What you should see:
VERIFICATION: PASS
Build: OK
Types: OK
Lint: OK (2 warnings)
Tests: 120/125 passed, 76% coverage
Secrets: OK
Logs: 3 console.logs found in src/
Ready for PR: NO
Issues to Fix:
1. Remove console.log statements before PR
Found in: src/utils/logger.ts:15, src/api/client.ts:23, src/ui/button.ts:8
2. Increase test coverage from 76% to 80% (target)
Missing coverage in: src/components/Form.tsxStep 3: Run Pre-PR Verification
Strictest check, includes security scan:
/verify pre-prWhat you should see:
VERIFICATION: FAIL
Build: OK
Types: OK (1 error)
Lint: OK
Tests: 120/125 passed, 76% coverage
Secrets: ❌ FOUND (2 API keys)
Logs: 3 console.logs
Security Issues Found:
1. Hardcoded API key in src/api/config.ts:10
2. Secret key in .env.example
Ready for PR: NO
Critical Issues:
1. Remove hardcoded secrets
2. Fix type error in src/components/Form.tsx:45
3. Remove console.logs
4. Increase coverage to 80%Step 4: Re-verify After Fixing Issues
After fixing issues, run verification again:
/verify fullWhat you should see:
VERIFICATION: PASS
Build: OK
Types: OK
Lint: OK
Tests: 125/125 passed, 81% coverage
Secrets: OK
Logs: OK
Ready for PR: YESCheckpoint ✅: Verify understanding
- What does
/verify quickonly check? ✅ Build and types - What items does
/verify fullcheck? ✅ Build, types, Lint, tests, Secrets, Console.log, Git status - Which verification mode includes security scan? ✅
pre-pr
Follow Along: Using Evals (Eval-Driven Development)
Step 1: Define Evals (Before Writing Code)
Define success criteria before starting to code:
## EVAL: user-authentication
### Capability Evals
- [ ] User can register with email/password
- [ ] User can login with valid credentials
- [ ] Invalid credentials rejected with proper error
- [ ] Sessions persist across page reloads
- [ ] Logout clears session
### Regression Evals
- [ ] Public routes still accessible
- [ ] API responses unchanged
- [ ] Database schema compatible
### Success Metrics
- pass@3 > 90% for capability evals
- pass^3 = 100% for regression evalsWhy Define success criteria first, force thinking about "what does completion look like".
Save to: .claude/evals/user-authentication.md
Step 2: Implement Functionality
Write code based on evals.
Step 3: Run Capability Evals
Test if new features meet evals:
[CERTAIN CAPABILITY EVAL: user-authentication]
Test 1: User can register with email/password
Task: Call registration API with valid credentials
Expected: User account created, token returned
Actual: PASS
Test 2: User can login with valid credentials
Task: Call login API with registered credentials
Expected: JWT token returned
Actual: PASS
Test 3: Invalid credentials rejected
Task: Call login API with wrong password
Expected: 401 Unauthorized with error message
Actual: PASS
Overall: 3/3 passedStep 4: Run Regression Evals
Ensure existing functionality not broken:
npm test -- --testPathPattern="existing"What you should see:
PASS existing/auth.test.ts
PASS existing/api.test.ts
PASS existing/db.test.ts
All regression tests: 15/15 passedStep 5: Generate Eval Report
Summarize results:
EVAL REPORT: user-authentication
=================================
Capability Evals:
register-user: PASS (pass@1)
login-user: PASS (pass@2)
reject-invalid: PASS (pass@1)
session-persistence: PASS (pass@1)
logout-clears: PASS (pass@1)
Overall: 5/5 passed
Regression Evals:
public-routes: PASS
api-responses: PASS
db-schema: PASS
Overall: 3/3 passed
Metrics:
pass@1: 80% (4/5)
pass@3: 100% (5/5)
pass^3: 100% (3/3)
Status: READY FOR REVIEWCheckpoint ✅: Verify understanding
- When should evals be defined? ✅ Before writing code
- What's the difference between capability evals and regression evals? ✅ Former tests new features, latter ensures existing functionality not broken
- What's the meaning of pass@3? ✅ Probability of success within 3 attempts
Common Pitfalls
Pitfall 1: Forgetting to Create Checkpoint
Problem: After developing for a while, want to rollback to a state but have no record.
Solution: Create checkpoint before starting each new feature:
# Good practice: before new feature starts
/checkpoint create "feature-name-start"
# Good practice: every milestone
/checkpoint create "phase-1-done"
/checkpoint create "phase-2-done"Pitfall 2: Evals Defined Too Vaguely
Problem: Evals written too vaguely, cannot verify.
Wrong example:
- [ ] User can loginCorrect example:
- [ ] User can login with valid credentials
Task: POST /api/login with email="[email protected]", password="Test123!"
Expected: HTTP 200 with JWT token in response body
Actual: ___________Pitfall 3: Only Running Verification Before Submitting PR
Problem: Discovering issues only before PR, high cost to fix.
Solution: Establish continuous verification habit:
Every 15 minutes: /verify quick
After each feature: /checkpoint create "milestone"
Before PR: /verify pre-prPitfall 4: Evals Not Updated
Problem: After requirements change, evals are still old, causing verification to fail.
Solution: Evals are "first-class code", update synchronously when requirements change:
# Requirements change → update Evals → update code
1. Modify .claude/evals/feature-name.md
2. Modify code based on new evals
3. Re-run evalsSummary
Verification loop is a systematic approach to maintaining code quality:
| Mechanism | Purpose | Frequency |
|---|---|---|
| PostToolUse Hooks | Capture errors in real-time | Every tool call |
/verify | Periodic comprehensive check | Every 15 minutes |
/checkpoint | Milestone recording and comparison | Each feature phase |
| Evals | Feature verification and regression testing | Each new feature |
Core principles:
- Define first, implement second (Evals)
- Verify frequently, improve continuously (
/verify) - Record milestones, facilitate rollback (
/checkpoint)
Coming Up Next
In the next lesson, we'll learn Custom Rules: Building Project-Specific Standards.
You'll learn:
- How to create custom Rules files
- Rule file format and checklist writing
- Define project-specific security rules
- Integrate team standards into code review workflow
Appendix: Source Code Reference
Click to expand source code locations
Last updated: 2026-01-25
| Feature | File Path | Lines |
|---|---|---|
| Checkpoint command definition | commands/checkpoint.md | 1-75 |
| Verify command definition | commands/verify.md | 1-60 |
| Verification Loop Skill | skills/verification-loop/SKILL.md | 1-121 |
| Eval Harness Skill | skills/eval-harness/SKILL.md | 1-222 |
Key Flows:
- Checkpoint creation flow: Run
/verify quickfirst → Create git stash/commit → Record to.claude/checkpoints.log - Verify verification flow: Build Check → Type Check → Lint Check → Test Suite → Console.log Audit → Git Status
- Eval workflow: Define (define evals) → Implement (write code) → Evaluate (run evals) → Report (generate report)
Key Parameters:
/checkpoint [create|verify|list] [name]- Checkpoint operations/verify [quick|full|pre-commit|pre-pr]- Verification modes- pass@3 - Target of success within 3 attempts (>90%)
- pass^3 - 3 consecutive successes (100%, used for critical paths)