This article explains how AI Agents work and the multi-agent collaboration mechanism through the real-world case of OpenCode/Oh My OpenCode.
1. Plain English: What is an AI Agent?
Traditional AI vs AI Agent
Traditional AI (e.g., ChatGPT):

- You ask: "Help me analyze this project's code quality"
- AI says: "Please send me the code"
- You copy and paste the code
- AI gives you the analysis results
AI Agent:

- You say: "Help me analyze this project's code quality"
- The Agent goes and:
  1. Reads project files
  2. Runs code inspection tools
  3. Checks test coverage
  4. Generates a report
- Gives you the result directly
Core difference: Traditional AI is "customer service" — you ask, it answers. An Agent is an "assistant" — you state your needs and it gets things done on its own.
Three Key Capabilities of an Agent
- Tool usage: Can read files, run commands, search the web, call APIs
- Memory: Remembers what was done before, avoids redundant work
- Planning: Knows how to break large tasks into smaller steps and complete them one by one
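The three capabilities above can be sketched as a minimal agent loop: the model picks a tool, the tool runs, and the result is fed back into the context until the model declares it is done. This is an illustrative sketch, not OpenCode's actual implementation; `callModel` and the tool names are hypothetical stand-ins.

```typescript
// Minimal agent loop. All names here (callModel, tools) are
// illustrative, not real OpenCode APIs.

type ToolCall = { tool: string; args: string } | { done: string };

// Hypothetical tool registry: each tool takes a string and returns one.
const tools: Record<string, (args: string) => string> = {
  read_file: (path) => `contents of ${path}`,
  run_command: (cmd) => `output of ${cmd}`,
};

// Stand-in for a real LLM call; a real agent would call a model API here.
function callModel(history: string[]): ToolCall {
  return history.length < 2
    ? { tool: "read_file", args: "src/login.ts" }
    : { done: "analysis complete" };
}

function runAgent(task: string): string {
  const memory: string[] = [task]; // memory: the growing context
  for (let step = 0; step < 10; step++) { // planning bound on the loop
    const next = callModel(memory);
    if ("done" in next) return next.done;
    const result = tools[next.tool](next.args); // tool usage
    memory.push(`${next.tool}(${next.args}) -> ${result}`);
  }
  return "step limit reached";
}
```

The key structural point is the feedback loop: each tool result lands back in `memory`, which is what separates an agent from a single question-answer exchange.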
2. Why Do We Need Multiple Agents?
The Problem with Single Agent: Context Competition
Imagine asking one person to do all of the following simultaneously:

- Architecture design (requires macro-level thinking)
- Writing code (requires attention to detail)
- Debugging bugs (requires state tracking)
- Writing documentation (requires clear expression)
This person will suffer from:

- Cognitive overload: planning and execution details compete for attention in the same brain
- Confusion: after fixing Bug A, they reintroduce it while working on Bug B
- Memory loss: after several rounds of debugging, they forget the original planning goal
This is not a capability issue, it's an information architecture problem — mixing things that shouldn't be mixed together.
Core Value of Multi-Agent: Information Domain Isolation
The leverage of multi-agent comes from information domain isolation, not from imitating corporate organizational structures.
Key insight (from axiom T03):

- Isolation is not for division of labor, but to let each Agent make better decisions in a clean information environment
- The Planner focuses on global decisions without being overwhelmed by execution details
- The Executor focuses on low-level implementation without being distracted by planning discussions
Analogy:

- Single Agent = one person acting as both boss and employee, thinking about strategy and details simultaneously
- Multi-Agent = the boss focuses on strategy, employees focus on execution, coordinated through shared documents
3. OpenCode's Agent Team
Team Structure
OpenCode has a "Project Manager" (Sisyphus) managing 11 "specialists".
Project Manager: Sisyphus
Why this name: In Greek mythology, Sisyphus pushes a boulder up a mountain every day, only for it to roll back down, and he starts again the next day. This symbolizes the AI handling repetitive tasks every day, never stopping.
Core responsibilities:

- Receive your requirements and understand intent
- Decide which specialist Agents are needed
- Assign tasks to the specialist Agents
- Aggregate results and verify quality
Decision logic: How does Sisyphus know who to dispatch?
User Request
↓
Analyze Task Type
↓
┌─────────────────────────────────────┐
│ Need to search code? │
│ → Explore │
│ │
│ Need external information? │
│ → Librarian │
│ │
│ Need architecture advice? │
│ → Oracle │
│ │
│ Need to write code? │
│ → Select Category by complexity │
└─────────────────────────────────────┘
↓
Evaluate if tasks can run in parallel
↓
Dispatch Agents (possibly multiple at once)
↓
Wait for results → Aggregate → Verify
Practical example: You say "Fix the login bug"
Sisyphus's thinking process:

1. This is a fix task, not a new feature
2. It involves authentication and may span multiple files
3. Need to understand the existing code first → dispatch Explore
4. May need to check common issues → dispatch Librarian
5. The two tasks are independent, so they can run in parallel
6. After the results come back, decide on the fix plan
7. The fix is a simple config change → use the quick category
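The dispatch decision above can be sketched as a simple routing function. The keyword matching and complexity heuristic here are deliberately simplified illustrations, not Sisyphus's real logic.

```typescript
// Sketch of Sisyphus-style routing: map a task description to the
// specialist(s) to dispatch. Keywords are simplified assumptions.

type AgentName = "Explore" | "Librarian" | "Oracle" | "quick" | "ultrabrain";

function route(task: string): AgentName[] {
  const t = task.toLowerCase();
  const dispatch: AgentName[] = [];
  if (/bug|fix|where|find/.test(t)) dispatch.push("Explore"); // search local code
  if (/best practice|docs|library/.test(t)) dispatch.push("Librarian"); // external info
  if (/architecture|design/.test(t)) dispatch.push("Oracle"); // advice only
  // Fall through to an execution category by a rough complexity guess.
  if (dispatch.length === 0) dispatch.push(t.length > 80 ? "ultrabrain" : "quick");
  return dispatch;
}
```

Note that a single request can yield several agents at once, which is what makes the later parallel-dispatch step possible.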
11 Specialist Agents
According to Oh My OpenCode's official documentation, they are divided into 4 categories:
1. Communication & Coordination
- Metis (Pre-Analyst)
  - Responsibility: identify hidden pitfalls before a task begins
  - When to use: requirements are vague or have multiple possible interpretations
  - Recommended model: Claude Opus 4.6 (requires deep reasoning)
  - Example: you say "Add user authentication", and Metis asks:
    - Is a database migration needed?
    - JWT or Session?
    - Is there an existing authentication pattern in the code?
- Momus (Quality Reviewer)
  - Responsibility: check whether work plans are feasible
  - When to use: complex tasks; have it review after the plan is made
  - Recommended model: Claude Opus 4.6 (requires critical thinking)
  - Example: Sisyphus made a 5-step plan, and Momus checks:
    - Are the steps complete?
    - Are any dependencies missing?
    - Are the verification criteria clear?
2. Exploration & Research
- Explore (Codebase Search Expert)
  - Responsibility: find files and code patterns in your project
  - When to use: unfamiliar with the codebase, need to locate relevant files
  - Recommended model: MiniMax-M2.1 (lightweight and fast; used frequently)
  - Example: find all authentication-related code
  - Cannot do: search external sources (it only searches the local project)
- Librarian (External Resource Retrieval Expert)
  - Responsibility: search the web for documentation, GitHub examples, and best practices
  - When to use: need external knowledge (official docs, open-source examples)
  - Recommended model: MiniMax-M2.5 (medium model; balances performance and cost)
  - Example: look up JWT authentication security best practices
  - Cannot do: search local code (it only searches external sources)
3. Advisory
- Oracle (Architecture Consultant)
  - Responsibility: gives advice but does not modify code (read-only)
  - When to use: need architecture advice, debugging complex issues, or after 3 consecutive failures
  - Recommended model: Claude Opus 4.6 (strongest reasoning capability)
  - Example: design a distributed lock scheme, analyze performance bottlenecks
  - Characteristic: high cost but high quality; used only at critical moments
4. Execution (Categorized by Task Type)
These are not specific Agent names, but execution modes automatically selected based on task type:
| Category | Name | Use Case | Recommended Model | Why |
|---|---|---|---|---|
| visual-engineering | Visual Engineer | Frontend, UI, styles, animations | MiniMax-M2.5 | Strong visual understanding |
| ultrabrain | Ultra Brain | Complex logic, architecture design | Claude Opus 4.6 | Strongest reasoning |
| deep | Deep Thinker | Tasks requiring deep analysis | MiniMax-M2.7 | Balances performance and cost |
| artistry | Artist | Creativity, brainstorming | Gemini 3 Pro | Unconventional thinking |
| quick | Quick Hand | Simple edits, typos | MiniMax-M2.1 | Fast and cheap |
| writing | Writer | Documentation, reports | MiniMax-M2.5 | Optimized for text generation |
Key design: Same task framework, automatically switches to the most suitable model based on type.
Why this configuration?
- Oracle uses the strongest model: Architecture advice requires the strongest reasoning, high cost but used only at critical moments
- Librarian uses a medium model: Searching external resources requires intent understanding, used frequently
- Explore uses a lightweight model: Searching local code only needs pattern matching, used very frequently
- Ultrabrain uses the strongest model: Complex logic requires deep reasoning, high task quality requirements
- Quick uses a lightweight model: Simple edits prioritize speed, low cost
Design principle: Choose the most cost-effective model based on task difficulty and frequency.
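The category table above is, in effect, a lookup from category to model. The model names come from the table; the mapping structure and the cheap default are an illustrative sketch, not OpenCode's configuration format.

```typescript
// Category-to-model lookup: same task framework, different model per
// category. Falling back to the cheapest model is an assumed default.

const categoryModels: Record<string, string> = {
  "visual-engineering": "MiniMax-M2.5",
  ultrabrain: "Claude Opus 4.6",
  deep: "MiniMax-M2.7",
  artistry: "Gemini 3 Pro",
  quick: "MiniMax-M2.1",
  writing: "MiniMax-M2.5",
};

function modelFor(category: string): string {
  return categoryModels[category] ?? categoryModels.quick; // cheap default
}
```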
4. Real Case: How Do Agents Collaborate?
Case 1: Fix Login Bug
Your requirement: "There's a bug in the login feature, fix it"
Workflow:
Step 1: Dispatch Explorer Agents in Parallel
┌─────────────────────────────────────────────────┐
│ Sisyphus dispatches two Agents simultaneously │
│ │
│ ┌─────────────────┐ ┌──────────────────┐ │
│ │ Explore │ │ Librarian │ │
│ │ Searches local │ │ Searches external│ │
│ │ code │ │ resources │ │
│ └─────────────────┘ └──────────────────┘ │
│ ↓ ↓ │
│ Find auth-related Check JWT common issues │
│ code │
└─────────────────────────────────────────────────┘
Step 2: Wait for Results
┌─────────────────────────────────────────────────┐
│ Explore reports: │
│ Found login.ts, auth.ts, token.ts │
│ │
│ Librarian reports: │
│ Common issue is token expiration time │
│ configuration error │
└─────────────────────────────────────────────────┘
Step 3: Dispatch Execution Agent to Fix
┌─────────────────────────────────────────────────┐
│ Sisyphus decides: │
│ This is a simple config modification │
│ → Dispatch Quick to execute │
│ → Fix expiration time in token.ts │
└─────────────────────────────────────────────────┘
Step 4: Verify
┌─────────────────────────────────────────────────┐
│ - Run code inspection tools │
│ - Confirm no new errors │
│ - Report completion │
└─────────────────────────────────────────────────┘
Key point: Explore and Librarian truly run simultaneously, not one after another.
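"Truly run simultaneously" maps naturally onto `Promise.all`: both sub-tasks start before either is awaited. The `dispatch` function is a hypothetical stand-in for sending a task to a sub-agent.

```typescript
// Parallel dispatch sketch: independent research tasks are started
// together rather than awaited one by one. dispatch() is illustrative.

async function dispatch(agent: string, task: string): Promise<string> {
  // A real implementation would send the task to the sub-agent here.
  return `${agent}: result for "${task}"`;
}

async function fixLoginBug(): Promise<string[]> {
  // Step 1: both agents start at the same time.
  const [local, external] = await Promise.all([
    dispatch("Explore", "find auth-related code"),
    dispatch("Librarian", "check common JWT issues"),
  ]);
  // Step 2: aggregate once both have reported.
  return [local, external];
}
```

Awaiting each `dispatch` call separately would serialize the work; wrapping them in `Promise.all` is what makes the two searches overlap in time.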
Case 2: Write a Technical Research Report
Your requirement: "Research best practices for React Server Components"
Sisyphus's strategy:
Dispatch 3 Librarians, each responsible for a different angle, but with 30-50% overlap (for cross-validation):
Parallel Research (3 Librarians running simultaneously)
┌─────────────────────────────────────────────────┐
│ Agent 1: Official docs + Community discussions │
│ Agent 2: Community discussions + Production │ ← Overlap: Community discussions
│ cases │
│ Agent 3: Production cases + Comparative │ ← Overlap: Production cases
│ analysis │
└─────────────────────────────────────────────────┘
↓
Cross-validate overlapping areas
↓
┌─────────────────────────────────────────────────┐
│ If Agent 2 and Agent 3 agree on "Production │
│ cases" information │
│ → High credibility │
│ │
│ If they disagree │
│ → Sisyphus further verifies │
└─────────────────────────────────────────────────┘
↓
Aggregate and generate comprehensive report
Why overlap?
- Agent 2 and Agent 3 both look at "production cases"
- If they find consistent information → High credibility
- If inconsistent → Sisyphus further verifies
Result: 3 Agents run simultaneously, 3x faster than one Agent running serially, and the information is more comprehensive.
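The cross-validation step can be sketched as a comparison over the overlapping angles: agreement raises credibility, disagreement flags the item for further checking. The `Finding` shape is a hypothetical simplification of what agents actually report.

```typescript
// Cross-validation sketch over overlapping research angles.

type Finding = { angle: string; claim: string };

function crossValidate(
  a: Finding[],
  b: Finding[],
): { confirmed: string[]; disputed: string[] } {
  const confirmed: string[] = []; // both agents agree -> high credibility
  const disputed: string[] = [];  // agents disagree -> verify further
  for (const fa of a) {
    const fb = b.find((f) => f.angle === fa.angle);
    if (!fb) continue; // no overlap on this angle
    (fb.claim === fa.claim ? confirmed : disputed).push(fa.angle);
  }
  return { confirmed, disputed };
}
```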
5. Key Design Mechanisms
1. Task Routing: Category System
Sisyphus automatically selects the most suitable model based on task type:
Task Type Determination
↓
┌─────────────────────────────────────────┐
│ Frontend, UI, styles? │
│ → visual-engineering │
│ → Use MiniMax-M2.5 (strong visual │
│ understanding) │
│ │
│ Complex logic, architecture design? │
│ → ultrabrain │
│ → Use Claude Opus 4.6 (strongest │
│ reasoning) │
│ │
│ Simple edits, typos? │
│ → quick │
│ → Use MiniMax-M2.1 (fast and cheap) │
└─────────────────────────────────────────┘
Benefit: Same framework, automatically switches to the most suitable model based on task.
2. Session Reuse: Avoid Redundant Work
If an Agent fails the first time, you can continue the conversation without starting over:
First Attempt
↓
Agent executes task
↓
Returns session_id (e.g., "ses_abc123")
↓
Failed?
↓
Continue the same session
↓
Agent remembers:
- Which files were read
- Which approaches were tried
- Which problems were encountered
↓
Saves 70% of redundant work
Value: Agent retains complete context, no need to re-explore.
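A minimal sketch of session reuse: retrying with the same `session_id` resumes the accumulated history instead of starting from scratch. The session-id format and the `run` function are illustrative assumptions, not OpenCode's API.

```typescript
// Session reuse sketch: a retry in the same session keeps prior context
// (files read, approaches tried). Names here are hypothetical.

interface Session { id: string; history: string[] }

const sessions = new Map<string, Session>();

function run(task: string, sessionId?: string): { sessionId: string; history: string[] } {
  // Resume the old session if an id is given, otherwise start fresh.
  const session = sessionId
    ? sessions.get(sessionId)!
    : { id: `ses_${sessions.size + 1}`, history: [] };
  session.history.push(task); // prior attempts stay in the history
  sessions.set(session.id, session);
  return { sessionId: session.id, history: session.history };
}
```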
3. The 6 Elements of Delegation Prompt
When Sisyphus dispatches a task, it must clearly state 6 things:
- TASK: What specifically to do
- EXPECTED OUTCOME: What counts as success
- REQUIRED TOOLS: What tools can be used
- MUST DO: Things that must be done
- MUST NOT DO: Things that are prohibited
- CONTEXT: Relevant files, existing patterns, constraints
Why so strict? (from axiom A08)
Prompt quality is the decisive factor in whether the AI correctly understands intent. A vague prompt forces the Agent to guess your intent in a huge search space, with a high probability of failure.
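One way to enforce the six elements is to make every field required in a typed structure, so a dispatcher cannot omit any of them. The field names mirror the list above; the rendering format is an illustrative sketch.

```typescript
// The six delegation elements as a typed structure: the compiler
// rejects any delegation that leaves one out.

interface Delegation {
  task: string;            // TASK
  expectedOutcome: string; // EXPECTED OUTCOME
  requiredTools: string[]; // REQUIRED TOOLS
  mustDo: string[];        // MUST DO
  mustNotDo: string[];     // MUST NOT DO
  context: string;         // CONTEXT
}

function renderPrompt(d: Delegation): string {
  return [
    `TASK: ${d.task}`,
    `EXPECTED OUTCOME: ${d.expectedOutcome}`,
    `REQUIRED TOOLS: ${d.requiredTools.join(", ")}`,
    `MUST DO: ${d.mustDo.join("; ")}`,
    `MUST NOT DO: ${d.mustNotDo.join("; ")}`,
    `CONTEXT: ${d.context}`,
  ].join("\n");
}
```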
4. Failure Recovery: The 3-Strike Rule
If an Agent fails 3 times consecutively:
3 failures
↓
Immediately stop all edits
↓
Rollback to the last working version
↓
Consult Oracle (architecture consultant)
↓
Oracle can't solve it either?
↓
Ask the user
Why: it prevents Agents from burning time and money on endless trial and error.
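The escalation path above fits in a few lines of control flow. `attempt`, `rollback`, and `consultOracle` are hypothetical stand-ins for the real operations.

```typescript
// 3-strike escalation sketch: stop editing after three consecutive
// failures, roll back, then escalate. Callbacks are illustrative.

function solveWithEscalation(
  attempt: () => boolean,
  rollback: () => void,
  consultOracle: () => boolean,
): string {
  for (let failures = 0; failures < 3; ) {
    if (attempt()) return "solved";
    failures++; // strike recorded
  }
  rollback(); // back to the last working version
  if (consultOracle()) return "solved with Oracle's advice";
  return "escalated to user"; // Oracle could not solve it either
}
```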
6. Comparison with Other Frameworks
OpenCode vs Traditional Frameworks
| Comparison | OpenCode | LangChain | AutoGPT |
|---|---|---|---|
| Architecture | Multi-agent division of labor | Single agent + tools | Single agent + loop |
| Parallel capability | Native support | Need to write yourself | Not supported |
| Model selection | Auto-switch based on task | Fixed one model | Fixed one model |
| Specialization | 11 specialist Agents | General-purpose Agent | General-purpose Agent |
Core difference (from axiom T03):
OpenCode's multi-agent value comes from information domain isolation:

- Traditional frameworks = one generalist; planning and execution compete in the same context
- OpenCode = a professional team; each Agent makes decisions in a clean information environment
Analogy: - Traditional frameworks = One generalist - OpenCode = A professional team
For simple tasks, a generalist may be faster (no coordination cost). For complex tasks, a professional team is significantly stronger.
Cost and Performance
Based on community data:

- Request count: Oh My OpenCode makes roughly 3.5x as many requests as the regular version (96 vs 27)
- Time: about 10 minutes more (55 vs 45 minutes)
- Success rate: slightly lower, by 4 percentage points (69% vs 73%)
However:

- Oh My OpenCode handles more complex tasks
- It includes more verification and quality checks
- It provides more detailed intermediate results
Selection advice:

- Simple tasks (e.g., fixing a typo) → use the regular version
- Complex tasks (e.g., a multi-module refactoring) → use Oh My OpenCode
- Cost-sensitive → control the degree of parallelism
7. Summary
Core Points
What is an AI Agent:

- It doesn't just answer questions; it completes tasks on its own
- It can use tools, has memory, and can plan

OpenCode's innovation:

- Multi-agent division of labor with information domain isolation
- Automatic model selection based on task type
- True parallel execution
- Session reuse to avoid redundant work

Key principles:

- Delegate whenever possible
- Parallelize where you can; don't serialize
- Every operation is verified
- Stop immediately after 3 failures
Applicable Scenarios
OpenCode excels at:

- Complex multi-module tasks
- Tasks requiring deep research
- Exploring unfamiliar codebases

It is not good at:

- Simple single-file edits
- Highly serial tasks
- Extreme cost control
References
- Oh My OpenCode GitHub
- Official Documentation
- Agent Architecture Deep Dive - Rost Glukhov, 2026-03
Closing note: This article is based on practical experience using OpenCode/Oh My OpenCode, combined with guidance from the axiom system (T03 Context Isolation, A08 Prompt Quality, M05 Simplicity). The system is still rapidly iterating, and details may change with version updates.