Administrator
Published on 2026-03-28

What is an AI Agent? OpenCode's Multi-Agent System Explained

This article explains how AI Agents work and the multi-agent collaboration mechanism through the real-world case of OpenCode/Oh My OpenCode.


1. Plain English: What is an AI Agent?

Traditional AI vs AI Agent

Traditional AI (e.g., ChatGPT):
  • You ask: "Help me analyze this project's code quality"
  • AI says: "Please send me the code"
  • You copy and paste the code
  • AI gives you the analysis results

AI Agent:
  • You say: "Help me analyze this project's code quality"
  • The Agent goes and:
    1. Reads project files
    2. Runs code inspection tools
    3. Checks test coverage
    4. Generates a report
  • Gives you the result directly

Core difference: Traditional AI is "customer service" — you ask, it answers. An Agent is an "assistant" — you state your needs and it gets things done on its own.

Three Key Capabilities of an Agent

  1. Tool usage: Can read files, run commands, search the web, call APIs
  2. Memory: Remembers what was done before, avoids redundant work
  3. Planning: Knows how to break large tasks into smaller steps and complete them one by one
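The three capabilities can be sketched as a minimal agent loop. Everything below (the `Agent` class, the `echo` tool, the `plan` method) is a hypothetical illustration, not OpenCode's actual implementation:

```python
# Minimal agent loop showing the three capabilities. Every name here
# (Agent, the "echo" tool, plan) is an illustrative assumption, not
# OpenCode's real API.

class Agent:
    def __init__(self, tools):
        self.tools = tools          # capability 1: tool usage
        self.memory = []            # capability 2: memory of past steps

    def plan(self, task):
        # capability 3: break a large task into smaller steps
        # (a real agent would ask an LLM to produce this list)
        return [f"step: {part.strip()}" for part in task.split(",")]

    def run(self, task):
        results = []
        for step in self.plan(task):
            if step in self.memory:      # skip work already done
                continue
            tool = self.tools["echo"]    # a real agent would pick a tool per step
            results.append(tool(step))
            self.memory.append(step)
        return results

agent = Agent(tools={"echo": lambda s: f"done: {s}"})
print(agent.run("read files, run linter, check coverage"))
```

Running the same task a second time returns an empty list: the memory prevents redundant work.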

2. Why Do We Need Multiple Agents?

The Problem with Single Agent: Context Competition

Imagine asking one person to do all of the following simultaneously:
  • Architecture design (requires macro-level thinking)
  • Writing code (requires attention to detail)
  • Debugging bugs (requires state tracking)
  • Writing documentation (requires clear expression)

This person will suffer from:
  • Cognitive overload: planning and execution details compete for attention in the same brain
  • Confusion: while working on Bug B, they reintroduce Bug A that was just fixed
  • Memory loss: after several rounds of debugging, they forget the original planning goal

This is not a capability issue, it's an information architecture problem — mixing things that shouldn't be mixed together.

Core Value of Multi-Agent: Information Domain Isolation

The leverage of multi-agent comes from information domain isolation, not from imitating corporate organizational structures.

Key insight (from axiom T03):
  • Isolation is not for division of labor, but to let each Agent make better decisions in a clean information environment
  • The planner focuses on global decisions without being overwhelmed by execution details
  • The executor focuses on low-level implementation without being distracted by planning discussions

Analogy:
  • Single Agent = one person acting as both boss and employee, thinking about strategy and details simultaneously
  • Multi-Agent = the boss focuses on strategy, the employee focuses on execution, coordinated through shared documents


3. OpenCode's Agent Team

Team Structure

OpenCode has a "Project Manager" (Sisyphus) managing 11 "specialists".

Project Manager: Sisyphus

Why this name: In Greek mythology, Sisyphus pushes a boulder up a mountain every day, only for it to roll back down, and he starts again the next day. This symbolizes the AI handling repetitive tasks every day, never stopping.

Core responsibilities:
  • Receive your requirements and understand intent
  • Decide which specialist Agents are needed
  • Assign tasks to specialist Agents
  • Aggregate results and verify quality

Decision logic: How does Sisyphus know who to dispatch?

User Request
    ↓
Analyze Task Type
    ↓
┌─────────────────────────────────────┐
│ Need to search code?                │
│   → Explore                         │
│                                     │
│ Need external information?          │
│   → Librarian                       │
│                                     │
│ Need architecture advice?           │
│   → Oracle                          │
│                                     │
│ Need to write code?                 │
│   → Select Category by complexity   │
└─────────────────────────────────────┘
    ↓
Evaluate if tasks can run in parallel
    ↓
Dispatch Agents (possibly multiple at once)
    ↓
Wait for results → Aggregate → Verify

Practical example: You say "Fix the login bug"

Sisyphus's thinking process:
  1. This is a fix task, not a new feature
  2. It involves authentication and may span multiple files
  3. Need to understand existing code first → dispatch Explore
  4. May need to check common issues → dispatch Librarian
  5. The two tasks are independent and can run in parallel
  6. After results come back, decide on the fix plan
  7. The fix is a simple config change → use the quick category
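This dispatch decision can be sketched as a plain routing function. This is a deliberate simplification: the real routing is driven by an LLM, not fixed keyword rules, and the function below is entirely hypothetical:

```python
# Hypothetical sketch of Sisyphus's routing as simple keyword rules.
# A real orchestrator reasons about the task; this only illustrates
# the shape of the decision.
def route(task: str) -> list[str]:
    t = task.lower()
    agents = []
    if "bug" in t or "fix" in t:
        # understand existing code + check known issues, in parallel
        agents += ["Explore", "Librarian"]
    if "architecture" in t or "design" in t:
        agents.append("Oracle")
    if not agents:
        agents.append("quick")   # fall back to a simple execution category
    return agents

print(route("Fix the login bug"))   # ['Explore', 'Librarian']
```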


11 Specialist Agents

According to Oh My OpenCode's official documentation, they are divided into 4 categories:

1. Communication & Coordination

  • Metis (Pre-Analyst)
    • Responsibility: Identify hidden pitfalls before a task begins
    • When to use: Requirements are vague or may have multiple interpretations
    • Recommended model: Claude Opus 4.6 (requires deep reasoning)
    • Example: You say "Add user authentication", and Metis will ask:
      • Need database migration?
      • JWT or Session?
      • Is there an existing authentication pattern in the code?

  • Momus (Quality Reviewer)
    • Responsibility: Check whether work plans are feasible
    • When to use: Complex tasks; have it review after the plan is made
    • Recommended model: Claude Opus 4.6 (requires critical thinking)
    • Example: Sisyphus made a 5-step plan, and Momus checks:
      • Are the steps complete?
      • Any missing dependencies?
      • Are the verification criteria clear?

2. Exploration & Research

  • Explore (Codebase Search Expert)
    • Responsibility: Find files and code patterns in your project
    • When to use: Unfamiliar with the codebase; need to locate relevant files
    • Recommended model: MiniMax-M2.1 (lightweight and fast, frequently used)
    • Example: Find all authentication-related code
    • Cannot do: search external sources (only searches the local project)

  • Librarian (External Resource Retrieval Expert)
    • Responsibility: Search the web for documentation, GitHub examples, and best practices
    • When to use: Need external knowledge (official docs, open source examples)
    • Recommended model: MiniMax-M2.5 (medium model, balances performance and cost)
    • Example: Look up JWT authentication security best practices
    • Cannot do: search local code (only searches external sources)

3. Advisory

  • Oracle (Architecture Consultant)
    • Responsibility: Gives advice but does not modify code (read-only)
    • When to use: Need architecture advice, debugging complex issues, or after 3 consecutive failures
    • Recommended model: Claude Opus 4.6 (strongest reasoning capability)
    • Example: Design a distributed locking scheme; analyze performance bottlenecks
    • Characteristic: High cost but high quality; used only at critical moments

4. Execution (Categorized by Task Type)

These are not specific Agent names, but execution modes automatically selected based on task type:

Category            Name             Use Case                            Recommended Model  Why
visual-engineering  Visual Engineer  Frontend, UI, styles, animations    MiniMax-M2.5       Strong visual understanding
ultrabrain          Ultra Brain      Complex logic, architecture design  Claude Opus 4.6    Strongest reasoning
deep                Deep Thinker     Tasks requiring deep analysis       MiniMax-M2.7       Balances performance and cost
artistry            Artist           Creativity, brainstorming           Gemini 3 Pro       Unconventional thinking
quick               Quick Hand       Simple edits, typos                 MiniMax-M2.1       Fast and cheap
writing             Writer           Documentation, reports              MiniMax-M2.5       Optimized for text generation

Key design: Same task framework, automatically switches to the most suitable model based on type.
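As a sketch, that mapping could be expressed as a plain lookup table. The dictionary and the `model_for` helper below are illustrative only; OpenCode's real configuration format may differ:

```python
# Sketch: map execution categories to models, following the table above.
# The default fallback model is an assumption for illustration.
CATEGORY_MODELS = {
    "visual-engineering": "MiniMax-M2.5",
    "ultrabrain": "Claude Opus 4.6",
    "deep": "MiniMax-M2.7",
    "artistry": "Gemini 3 Pro",
    "quick": "MiniMax-M2.1",
    "writing": "MiniMax-M2.5",
}

def model_for(category: str) -> str:
    # unknown categories fall back to a mid-tier model (assumed default)
    return CATEGORY_MODELS.get(category, "MiniMax-M2.5")

print(model_for("quick"))   # MiniMax-M2.1
```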

Why this configuration?

  • Oracle uses the strongest model: Architecture advice requires the strongest reasoning, high cost but used only at critical moments
  • Librarian uses a medium model: Searching external resources requires intent understanding, used frequently
  • Explore uses a lightweight model: Searching local code only needs pattern matching, used very frequently
  • Ultrabrain uses the strongest model: Complex logic requires deep reasoning, high task quality requirements
  • Quick uses a lightweight model: Simple edits prioritize speed, low cost

Design principle: Choose the most cost-effective model based on task difficulty and frequency.


4. Real Case: How Do Agents Collaborate?

Case 1: Fix Login Bug

Your requirement: "There's a bug in the login feature, fix it"

Workflow:

Step 1: Dispatch Explorer Agents in Parallel
┌─────────────────────────────────────────────────┐
│ Sisyphus dispatches two Agents simultaneously   │
│                                                 │
│ ┌─────────────────┐    ┌──────────────────┐   │
│ │ Explore         │    │ Librarian        │   │
│ │ Searches local  │    │ Searches external│   │
│ │ code            │    │ resources        │   │
│ └─────────────────┘    └──────────────────┘   │
│         ↓                      ↓               │
│   Find auth-related   Check JWT common issues   │
│   code                                           │
└─────────────────────────────────────────────────┘

Step 2: Wait for Results
┌─────────────────────────────────────────────────┐
│ Explore reports:                                 │
│   Found login.ts, auth.ts, token.ts            │
│                                                 │
│ Librarian reports:                               │
│   Common issue is token expiration time        │
│   configuration error                           │
└─────────────────────────────────────────────────┘

Step 3: Dispatch Execution Agent to Fix
┌─────────────────────────────────────────────────┐
│ Sisyphus decides:                               │
│   This is a simple config modification          │
│   → Dispatch Quick to execute                   │
│   → Fix expiration time in token.ts            │
└─────────────────────────────────────────────────┘

Step 4: Verify
┌─────────────────────────────────────────────────┐
│ - Run code inspection tools                     │
│ - Confirm no new errors                         │
│ - Report completion                             │
└─────────────────────────────────────────────────┘

Key point: Explore and Librarian truly run simultaneously, not one after another.
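The parallel step can be sketched with a thread pool. The two agent functions below are stand-ins that return canned strings, not real search logic:

```python
# Sketch: dispatch Explore and Librarian concurrently, then aggregate.
# Both agent functions are stand-ins returning canned results.
from concurrent.futures import ThreadPoolExecutor

def explore(query):       # stand-in for the Explore agent
    return f"local files for '{query}': login.ts, auth.ts, token.ts"

def librarian(query):     # stand-in for the Librarian agent
    return f"external notes on '{query}': check token expiration config"

with ThreadPoolExecutor() as pool:
    f1 = pool.submit(explore, "auth")                 # both start immediately
    f2 = pool.submit(librarian, "JWT common issues")
    results = [f1.result(), f2.result()]              # wait for both

for r in results:
    print(r)
```

The key point the sketch mirrors: both tasks start before either finishes, rather than running one after another.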


Case 2: Write a Technical Research Report

Your requirement: "Research best practices for React Server Components"

Sisyphus's strategy:

Dispatch 3 Librarians, each responsible for a different angle, but with 30-50% overlap (for cross-validation):

Parallel Research (3 Librarians running simultaneously)
┌─────────────────────────────────────────────────┐
│ Agent 1: Official docs + Community discussions  │
│ Agent 2: Community discussions + Production     │  ← Overlap: Community discussions
│         cases                                    │
│ Agent 3: Production cases + Comparative         │  ← Overlap: Production cases
│         analysis                                 │
└─────────────────────────────────────────────────┘
         ↓
Cross-validate overlapping areas
         ↓
┌─────────────────────────────────────────────────┐
│ If Agent 2 and Agent 3 agree on "Production    │
│ cases" information                              │
│   → High credibility                            │
│                                                 │
│ If they disagree                                │
│   → Sisyphus further verifies                  │
└─────────────────────────────────────────────────┘
         ↓
Aggregate and generate comprehensive report

Why overlap?

  • Agent 2 and Agent 3 both look at "production cases"
  • If they find consistent information → High credibility
  • If inconsistent → Sisyphus further verifies

Result: the 3 Agents run simultaneously, roughly 3x faster than one Agent working serially, and the information gathered is more comprehensive.
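The cross-validation step can be sketched as follows. The findings data is invented for illustration, and the rule (a topic counts as confirmed when two or more agents report it identically) is a simplified stand-in for Sisyphus's actual judgment:

```python
# Sketch: three research agents cover overlapping angles; topics that
# two or more agents report identically count as cross-validated.
# All findings below are invented for illustration.
findings = {
    "agent1": {"official_docs": "use RSC for data fetching",
               "community": "avoid client state in RSC"},
    "agent2": {"community": "avoid client state in RSC",
               "production": "stream responses early"},
    "agent3": {"production": "stream responses early",
               "comparison": "RSC trades client bundle size for server load"},
}

def cross_validate(findings):
    claims, counts = {}, {}
    for report in findings.values():
        for topic, claim in report.items():
            claims.setdefault(topic, set()).add(claim)
            counts[topic] = counts.get(topic, 0) + 1
    # confirmed: >=2 agents reported the topic and their claims agree
    confirmed = sorted(t for t in claims if counts[t] >= 2 and len(claims[t]) == 1)
    # disputed: agents reported conflicting claims -> needs further verification
    disputed = sorted(t for t in claims if len(claims[t]) > 1)
    return confirmed, disputed

confirmed, disputed = cross_validate(findings)
print(confirmed)   # ['community', 'production']
```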


5. Key Design Mechanisms

1. Task Routing: Category System

Sisyphus automatically selects the most suitable model based on task type:

Task Type Determination
    ↓
┌─────────────────────────────────────────┐
│ Frontend, UI, styles?                   │
│   → visual-engineering                  │
│   → Use MiniMax-M2.5 (strong visual     │
│     understanding)                      │
│                                         │
│ Complex logic, architecture design?     │
│   → ultrabrain                          │
│   → Use Claude Opus 4.6 (strongest      │
│     reasoning)                          │
│                                         │
│ Simple edits, typos?                    │
│   → quick                               │
│   → Use MiniMax-M2.1 (fast and cheap)   │
└─────────────────────────────────────────┘

Benefit: Same framework, automatically switches to the most suitable model based on task.


2. Session Reuse: Avoid Redundant Work

If an Agent fails the first time, you can continue the conversation without starting over:

First Attempt
    ↓
Agent executes task
    ↓
Returns session_id (e.g., "ses_abc123")
    ↓
Failed?
    ↓
Continue the same session
    ↓
Agent remembers:
  - Which files were read
  - Which approaches were tried
  - Which problems were encountered
    ↓
Saves 70% of redundant work

Value: Agent retains complete context, no need to re-explore.
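Session reuse can be sketched as a session store keyed by id. The `run_agent` function, the session structure, and the id format are illustrative assumptions, not OpenCode's real API:

```python
# Sketch: session reuse. A failed run can be resumed by session_id so
# accumulated context is kept. Names and structure are illustrative.
import uuid

SESSIONS = {}   # session_id -> accumulated context

def run_agent(task, session_id=None):
    sid = session_id or f"ses_{uuid.uuid4().hex[:6]}"
    ctx = SESSIONS.setdefault(sid, {"attempts": [], "files_read": []})
    ctx["attempts"].append(task)          # remember what was tried
    ctx["files_read"] = ["token.ts"]      # pretend the agent read files
    return sid, ctx

sid, _ = run_agent("fix token expiry")                       # first attempt fails
_, ctx = run_agent("retry with new config", session_id=sid)  # resume, don't restart
print(len(ctx["attempts"]))   # 2 -- both attempts share one context
```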


3. The 6 Elements of Delegation Prompt

When Sisyphus dispatches a task, it must clearly state 6 things:

  1. TASK: What specifically to do
  2. EXPECTED OUTCOME: What counts as success
  3. REQUIRED TOOLS: What tools can be used
  4. MUST DO: Things that must be done
  5. MUST NOT DO: Things that are prohibited
  6. CONTEXT: Relevant files, existing patterns, constraints
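A delegation prompt built from these six elements might be rendered like this. The template and all the example values are a hypothetical rendering, not Sisyphus's literal prompt format:

```python
# Sketch: render the six delegation elements into one prompt string.
# Field names follow the list above; the values are invented examples.
def delegation_prompt(task, outcome, tools, must_do, must_not_do, context):
    return "\n".join([
        f"TASK: {task}",
        f"EXPECTED OUTCOME: {outcome}",
        f"REQUIRED TOOLS: {', '.join(tools)}",
        f"MUST DO: {must_do}",
        f"MUST NOT DO: {must_not_do}",
        f"CONTEXT: {context}",
    ])

print(delegation_prompt(
    task="Fix token expiration in token.ts",
    outcome="Login succeeds; existing tests pass",
    tools=["read_file", "edit_file", "run_tests"],
    must_do="Run the test suite after editing",
    must_not_do="Touch files outside token.ts",
    context="Auth uses JWT; see auth.ts for the existing pattern",
))
```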

Why so strict? (from axiom A08)

Prompt quality is the decisive factor in whether the AI correctly understands intent. A vague prompt forces the Agent to guess your intent across a huge search space, with a high probability of failure.


4. Failure Recovery: The 3-Strike Rule

If an Agent fails 3 times consecutively:

3 failures
    ↓
Immediately stop all edits
    ↓
Rollback to the last working version
    ↓
Consult Oracle (architecture consultant)
    ↓
Oracle can't solve it either?
    ↓
Ask the user

Why: Prevent Agents from endless trial-and-error that wastes time and money.
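The escalation ladder above can be sketched in a few lines. Every function here is a stand-in; a real system would perform edits, git rollbacks, and an actual Oracle consultation:

```python
# Sketch of the 3-strike escalation ladder. All functions are stand-ins.
def attempt_fix():
    return False          # pretend every attempt fails

def rollback_to_last_working_version():
    pass                  # e.g. revert to the last good commit

def consult_oracle():
    return False          # pretend Oracle is stuck too

def three_strike_run(max_failures=3):
    for _ in range(max_failures):
        if attempt_fix():
            return "fixed"
    # three consecutive failures: stop editing, roll back, escalate
    rollback_to_last_working_version()
    if consult_oracle():
        return "fixed with Oracle's advice"
    return "ask the user"

print(three_strike_run())   # ask the user
```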


6. Comparison with Other Frameworks

OpenCode vs Traditional Frameworks

Comparison           OpenCode                       LangChain                AutoGPT
Architecture         Multi-agent division of labor  Single agent + tools     Single agent + loop
Parallel capability  Native support                 Must implement yourself  Not supported
Model selection      Auto-switch based on task      One fixed model          One fixed model
Specialization       11 specialist Agents           General-purpose Agent    General-purpose Agent

Core difference (from axiom T03):

OpenCode's multi-agent value comes from information domain isolation:
  • Traditional frameworks = one generalist; planning and execution compete in the same context
  • OpenCode = a professional team; each Agent makes decisions in a clean information environment

For simple tasks, a generalist may be faster (no coordination cost). For complex tasks, a professional team is significantly stronger.

Cost and Performance

Based on community data:
  • Request count: Oh My OpenCode makes roughly 3x the requests of the regular version (96 vs 27)
  • Time: about 10 minutes more (55 vs 45 minutes)
  • Success rate: 4 percentage points lower (69% vs 73%)

However:
  • Oh My OpenCode handles more complex tasks
  • It includes more verification and quality checks
  • It provides more detailed intermediate results

Selection advice:
  • Simple tasks (fixing a typo) → use the regular version
  • Complex tasks (multi-module refactoring) → use Oh My OpenCode
  • Cost-sensitive → limit parallelism


7. Summary

Core Points

What an AI Agent is:
  • Not just answering questions; it can complete tasks on its own
  • It can use tools, has memory, and can plan

OpenCode's innovations:
  • Multi-agent division of labor with information domain isolation
  • Automatic model selection based on task type
  • True parallel execution
  • Session reuse to avoid redundant work

Key principles:
  • If a task can be delegated, delegate it rather than doing it yourself
  • Parallelize whenever possible; don't serialize
  • Every operation has verification
  • Stop immediately after 3 failures

Applicable Scenarios

OpenCode excels at:
  • Complex multi-module tasks
  • Tasks requiring deep research
  • Exploring unfamiliar codebases

It is less suited to:
  • Simple single-file edits
  • Highly serial tasks
  • Extreme cost control


Closing note: This article is based on practical experience using OpenCode/Oh My OpenCode, combined with guidance from the axiom system (T03 Context Isolation, A08 Prompt Quality, M05 Simplicity). The system is still rapidly iterating, and details may change with version updates.

