
Duration: 15 minutes | Block: 5


🧠 Pattern Overview

         ┌─────────────────────────────┐
         │                             │
         ▼                             │
    GENERATOR                          │
    "Create a draft"                   │
         │                             │
         ▼                             │
    EVALUATOR                          │
    "Is this good?"                    │
         │                             │
    ┌────┴────┐                        │
    │         │                        │
  GOOD     NEEDS WORK ─────────────────┘
    │
    ▼
  FINAL OUTPUT

The agent generates → evaluates its own output → improves → delivers. The quality loop is built in.


🔄 How It Works in Hermes

Step 1: Generator

Hermes receives a task → generates an initial output
Example: "Draft a follow-up email to customer Budi"
→ Initial draft generated

Step 2: Evaluator

Hermes evaluates its own output:
✓ Does the tone match SOUL.md?
✓ Is the information complete?
✓ Are there any typos or grammar issues?
✓ Does it follow the guidelines?
→ Score: 7/10, needs improvement

Step 3: Optimizer

Hermes improves the draft based on the evaluation:
"Fixes:
- Tone is too formal, should be more casual
- Missing: option for a follow-up call
- Add: urgency at the end of the email"
→ Improved draft

Step 4: Final Evaluation

Re-evaluate the improved draft:
✓ Tone: on target ✓ Info: complete ✓ Grammar: OK ✓ Guidelines: OK
→ Score: 9/10 → APPROVED
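
The four steps above can be sketched as a small loop. This is only an illustration, not how Hermes implements it internally: `run_quality_loop`, the `Evaluation` class, and the three `toy_*` functions are hypothetical stand-ins (in practice each would be a model call), and the threshold of 8 and cap of 3 cycles are assumed defaults.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    score: int                                   # 0-10, higher is better
    issues: List[str] = field(default_factory=list)


def run_quality_loop(
    task: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], Evaluation],
    optimize: Callable[[str, List[str]], str],
    threshold: int = 8,    # assumed "good enough" score
    max_cycles: int = 3,   # assumed cap on improvement rounds
) -> Tuple[str, Evaluation, int]:
    """Generate once, then evaluate and improve until the score passes or cycles run out."""
    draft = generate(task)                       # Step 1: Generator
    evaluation = evaluate(draft)                 # Step 2: Evaluator
    cycles = 0
    while evaluation.score < threshold and cycles < max_cycles:
        draft = optimize(draft, evaluation.issues)   # Step 3: Optimizer
        evaluation = evaluate(draft)                 # Step 4: re-evaluate
        cycles += 1
    return draft, evaluation, cycles


# Toy stand-ins so the sketch runs without a model (all hypothetical).
def toy_generate(task: str) -> str:
    return "Dear Sir, we are following up on your inquiry."


def toy_evaluate(draft: str) -> Evaluation:
    issues = []
    if "Dear Sir" in draft:
        issues.append("tone too formal")
    if "call" not in draft.lower():
        issues.append("missing follow-up call option")
    if "friday" not in draft.lower():
        issues.append("no urgency at the end")
    return Evaluation(score=10 - len(issues), issues=issues)


def toy_optimize(draft: str, issues: List[str]) -> str:
    if "tone too formal" in issues:
        draft = draft.replace("Dear Sir,", "Hi Budi,")
    if "missing follow-up call option" in issues:
        draft += " Happy to jump on a quick call if that helps."
    if "no urgency at the end" in issues:
        draft += " The offer ends this Friday."
    return draft


final_draft, final_eval, cycles_used = run_quality_loop(
    "Draft a follow-up email to customer Budi",
    toy_generate, toy_evaluate, toy_optimize,
)
print(final_eval.score, cycles_used)
```

Passing the generator, evaluator, and optimizer in as functions keeps the loop itself independent of whichever model or agent does the actual work.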

🛠️ Implementation in Hermes

Via Prompt Engineering

User: "Write a follow-up email to customer Budi, 
who inquired about product X 3 days ago.

When you are done, evaluate your own draft:
1. Does the tone match our brand voice?
2. Is there a clear call-to-action?
3. Is there anything that could be improved?

If anything needs fixing, fix it and 
give me the final version."
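
If you send this kind of prompt often, one option is to assemble it from a task and a checklist. The helper below is a minimal sketch; `build_self_eval_prompt` is a hypothetical name, not a Hermes function, and it only builds the text you would paste or send.

```python
from typing import Sequence


def build_self_eval_prompt(task: str, checks: Sequence[str]) -> str:
    """Combine a task with numbered self-evaluation questions and a fix instruction."""
    numbered = "\n".join(f"{i}. {check}" for i, check in enumerate(checks, start=1))
    return (
        f"{task}\n\n"
        "When you are done, evaluate your own draft:\n"
        f"{numbered}\n\n"
        "If anything needs fixing, fix it and give me the final version."
    )


prompt = build_self_eval_prompt(
    "Write a follow-up email to customer Budi, who inquired about product X 3 days ago.",
    [
        "Does the tone match our brand voice?",
        "Is there a clear call-to-action?",
        "Is there anything that could be improved?",
    ],
)
print(prompt)
```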

Via AGENTS.md Config

# AGENTS.md

## Quality Rules
For every generated output:
1. Always self-evaluate before delivering
2. Check against the SOUL.md guidelines
3. If score < 8/10, improve and re-evaluate
4. Maximum 3 improvement cycles
5. If still < 8/10 after 3 cycles, 
   flag for human review

## Evaluation Criteria
- Tone consistency: does it match SOUL.md?
- Completeness: is the information complete?
- Accuracy: is the data used correct?
- Actionability: is there a clear next step?
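
Rules 3 to 5 of the Quality Rules boil down to one decision after each evaluation. A rough sketch of that decision, assuming the score is an integer out of 10; `Action` and `next_action` are hypothetical names used only for illustration, since in Hermes the agent follows the rules from the config text itself.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    IMPROVE = "improve and re-evaluate"
    HUMAN_REVIEW = "flag for human review"


def next_action(score: int, cycles_done: int,
                threshold: int = 8, max_cycles: int = 3) -> Action:
    """Decide what to do after an evaluation, given how many improvement cycles already ran."""
    if score >= threshold:
        return Action.DELIVER            # good enough, stop here
    if cycles_done < max_cycles:
        return Action.IMPROVE            # rule 3: below threshold, cycles left
    return Action.HUMAN_REVIEW           # rule 5: still below threshold after max cycles


print(next_action(score=7, cycles_done=0).value)
print(next_action(score=7, cycles_done=3).value)
```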

📊 Use Cases

Email Drafting with Quality Loop

Task: Draft sales email
Generator → Draft email
Evaluator → Check: subject line menarik? CTA jelas? Personalized?
Optimizer → Improve weak points
Output → Refined email

Content Creation

Task: Write social media post
Generator → Draft post + caption
Evaluator → Check: on-brand? engaging? right length?
Optimizer → Adjust tone, add hooks, shorten if needed
Output → Ready-to-publish post

Research Report

Task: Compile competitor analysis
Generator → Initial report draft
Evaluator → Check: data accurate? gaps identified? actionable?
Optimizer → Add missing data, improve structure
Output → Comprehensive report
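
All three use cases share the same loop and differ only in the evaluator's checklist. One way to picture that is a rubric per task type, as in the sketch below. The `RUBRICS` keys and the `weak_points` helper are hypothetical, and the pass/fail answers would come from the evaluator step, not from this code.

```python
from typing import Dict, List

# Checklists taken from the three use cases above; the dictionary keys are made up.
RUBRICS: Dict[str, List[str]] = {
    "sales_email": ["Subject line compelling?", "CTA clear?", "Personalized?"],
    "social_post": ["On-brand?", "Engaging?", "Right length?"],
    "competitor_report": ["Data accurate?", "Gaps identified?", "Actionable?"],
}


def weak_points(task_type: str, answers: Dict[str, bool]) -> List[str]:
    """Return the checks that failed, in rubric order. Unanswered checks count as failed."""
    return [check for check in RUBRICS[task_type] if not answers.get(check, False)]


to_fix = weak_points("social_post", {"On-brand?": True, "Engaging?": False})
print(to_fix)
```

The list returned by `weak_points` is what the optimizer step would work through.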

⚠️ Anti-Patterns

  1. Infinite loop: without a max iteration count, the agent can loop forever
    • Fix: Set a max of 3 cycles
  2. Over-optimization: improving again and again until the output is over-engineered
    • Fix: Set a "good enough" threshold (8/10)
  3. Evaluating without criteria: an evaluator with no rubric gives arbitrary scores
    • Fix: Define evaluation criteria in AGENTS.md
  4. Skipping evaluation: delivering right away without a check
    • Fix: Make evaluation mandatory in the config
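
The four anti-patterns can also be read as a checklist for your own loop settings. A minimal sketch, assuming you keep the settings in a small structure; `LoopConfig` and `find_anti_patterns` are hypothetical names and not part of Hermes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopConfig:
    max_cycles: Optional[int] = None          # None means no limit was set
    threshold: Optional[int] = None           # None means no "good enough" score
    criteria: List[str] = field(default_factory=list)
    evaluation_required: bool = False


def find_anti_patterns(cfg: LoopConfig) -> List[str]:
    """List which of the four anti-patterns a configuration is exposed to."""
    problems = []
    if cfg.max_cycles is None:
        problems.append("infinite loop: no max cycles set")
    if cfg.threshold is None:
        problems.append("over-optimization: no 'good enough' threshold")
    if not cfg.criteria:
        problems.append("evaluating without criteria: no rubric defined")
    if not cfg.evaluation_required:
        problems.append("skipping evaluation: evaluation is not mandatory")
    return problems


safe = LoopConfig(max_cycles=3, threshold=8,
                  criteria=["tone", "completeness"], evaluation_required=True)
print(find_anti_patterns(safe))
print(find_anti_patterns(LoopConfig()))
```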

⚡ Hands-On: Build Quality Loop

Task

Set up an evaluator-optimizer loop for one of your own Hermes use cases.

Steps

  1. [ ] Define task (email/content/report)
  2. [ ] Write evaluation criteria (3-5 checklist items)
  3. [ ] Add them to AGENTS.md or your prompt
  4. [ ] Test: generate → evaluate → improve → final
  5. [ ] Compare: before vs after quality loop

Expected Result

Initial output: "Hi, following up on your inquiry..."
[Evaluation: 6/10 - too generic, no CTA]

Improved output: "Hi Budi! 😊 Still interested 
in our Kopi Arabica Premium? Stock is limited, 
shall I place an order for you? Thank you! 🙏"
[Evaluation: 9/10 - on-brand, personalized, clear CTA]
→ DELIVERED
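
For the before-versus-after comparison in the last step, it can help to log both attempts in the same shape. A small sketch with hypothetical names (`Attempt`, `summarize`), assuming the 8/10 threshold from the Quality Rules:

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    text: str
    score: int     # out of 10
    notes: str


def summarize(before: Attempt, after: Attempt, threshold: int = 8) -> str:
    """One-line comparison of the initial and improved attempts."""
    delta = after.score - before.score
    verdict = "DELIVERED" if after.score >= threshold else "NEEDS WORK"
    return f"{before.score}/10 -> {after.score}/10 ({delta:+d}) {verdict}"


before = Attempt("Hi, following up on your inquiry...", 6, "too generic, no CTA")
after = Attempt("Hi Budi! Still interested in our Kopi Arabica Premium?", 9,
                "on-brand, personalized, clear CTA")
print(summarize(before, after))
```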

Bootcamp AI Automation — akala.id