The Week the Market Diverged from the Protocol Crowd
2026 was supposed to be the year of agent protocol unification. The Model Context Protocol (MCP) had been in production at scale for over a year, and the prevailing narrative in developer communities was clear: agents needed a standardized interface layer, a universal translator between AI models and the tools they orchestrated. MCP was positioned as that layer: clean, transport-agnostic, designed for the AI-native era.
And then, in a single week of March 2026, two of China's largest enterprise communication platforms, Feishu (Lark) and DingTalk, released their agent CLI toolkits. Not MCP servers. Not protocol bindings. CLI tools. The gap between the protocol community's vision and what the market actually shipped could not have been more stark.
Feishu's CLI offered 2500+ API endpoints across 11 business domains, bundled with 19 AI Agent Skills. The entire package was open-sourced on GitHub within 72 hours of announcement. DingTalk released CLI tooling covering 10 core platform capabilities—but chose to keep the implementation closed-source. When the developer community asked why, one commenter on the Chinese developer forums put it bluntly: "DingTalk's API encapsulation is so mediocre, why would they even bother closing the CLI? Their mindset is the problem."
This is not a story about MCP failing. MCP is gaining adoption across many platforms. This is a story about why CLI—the humble command-line interface born in the 1970s—is winning the race to give agents real capabilities today, and what that race reveals about the future of protocol standardization.
The Market Has Spoken: CLI First, MCP Later
The simultaneous release patterns from Feishu and DingTalk tell a consistent story about how enterprise platforms are prioritizing their agent tooling strategies.
Feishu's CLI is a comprehensive beast. 2500+ APIs covering 11 business domains—calendar, documents, messaging, search, approval workflows, task management, contacts, mail, data analytics, integration pipelines, and developer tooling. On top of this API surface, Feishu bundled 19 AI Agent Skills: discrete, composable capabilities that agents can invoke to accomplish specific tasks. The entire project is open source, meaning the community can audit, extend, and redistribute the tooling without platform lock-in. Feishu's bet is clear: give agents maximum capability access first, negotiate the protocol layer later.
DingTalk's CLI covers 10 core platform capabilities, a fraction of Feishu's surface area, and it is closed-source to boot. The comparison is unflattering not just in scope but in philosophy. Where Feishu treats its agent ecosystem as a platform play with community wind in its sails, DingTalk appears to be treating CLI as a proprietary differentiator. The developer reception was accordingly frosty. Chen Ming, a developer who reviewed both toolkits, commented: "DingTalk's CLI is not worth open-sourcing given how poorly designed their API encapsulation is, but the fact that they chose to close it anyway says everything about their understanding of the agent ecosystem."
The critical observation here is what neither company prioritized for their initial agent release: an MCP server. Both platforms have since announced MCP compatibility roadmaps, but the sequencing is revealing. The market demand was for working tools that agents could use immediately. Protocol standardization came second.
This is not unique to Chinese enterprise platforms. When GitHub integrated agent capabilities into Copilot, the first generation of workspace integration was CLI-based. When Replit deployed agents to handle deployment pipelines, the agents spoke shell. The pattern is consistent across geographies and use cases: CLI is the first-mover interface, protocol is the consolidation play that comes after market validation.
Why CLI Works Better for Shell-Native Agents
The agent platforms shipping today are not web-native applications running in browser sandboxes. They are processes running in terminal sessions, CI/CD pipelines, and containerized environments. Claude Code runs in a shell. Cursor's agent mode runs in a local process. OpenAI's Codex runs in a cloud shell. These agents live where developers live—in the command line—and CLI is their native tongue.
Consider what CLI offers to an agent developer:
Zero-friction interface discovery. Run lark-cli --help and you get a complete listing of available commands, arguments, and usage patterns. This is not documentation that can go stale—it is the interface itself. When an agent needs to understand what a tool can do, it runs --help. When a developer needs to audit an agent's capability access, they read the same help output. The interface and the documentation are the same artifact. Compare this to MCP, where capability discovery requires understanding the protocol schema, transport configuration, and server initialization sequence before a single tool call can be made.
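That discovery loop can be sketched in a few lines of agent-side Python. A tool like `lark-cli` would slot in where the stand-in command is; the stand-in (the Python interpreter's own `--help`) is used here only so the sketch runs anywhere:

```python
import subprocess
import sys

def discover_interface(tool: list[str]) -> str:
    """Run `<tool> --help` and return the usage text verbatim.

    The help output doubles as the interface contract: the same text
    the agent parses is the text a human auditor reads.
    """
    result = subprocess.run(tool + ["--help"], capture_output=True, text=True)
    # A few tools print help to stderr; prefer stdout when present.
    return result.stdout or result.stderr

# Stand-in for `lark-cli --help`: the Python interpreter's own help text.
help_text = discover_interface([sys.executable])
print(help_text.splitlines()[0])  # the first line is the usage summary
```

The same function works unchanged for any CLI tool that honors the `--help` convention, which is precisely the point.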
Stateless execution model. A CLI command runs, produces output, and exits. There is no persistent connection to maintain, no session state to manage, no transport layer to keep alive. For agents that need to orchestrate dozens of tool calls per task, the operational simplicity of CLI is significant. Each invocation is independent, retryable, and observable. MCP's transport layer adds statefulness that becomes a failure point at scale.
Shell integration is universal. Agents running in bash, zsh, fish, or PowerShell can invoke CLI commands without any special adapter code. The shell is the integration layer. There is no SDK to install, no package manager to coordinate, no version compatibility matrix to maintain. An agent running curl can interact with any HTTP API. An agent running lark-cli can interact with the Feishu platform. The abstraction level of the shell is low enough to be universally compatible and high enough to express complex operations.
Debugging is straightforward. When a CLI command fails, the error output tells you exactly what went wrong. When an MCP tool call fails, you may need to trace through protocol negotiation, transport errors, schema validation, and server-side exception logs before finding the root cause. For developers building and debugging agentic systems, this debugging gap is not trivial. A significant portion of agent development time is spent understanding why a tool call produced an unexpected result. CLI's debugging model is simply faster.
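A sketch of that debugging model, assuming nothing beyond the standard library; the deliberately failing command stands in for any misbehaving CLI tool:

```python
import subprocess
import sys

def run_tool(argv: list[str]) -> str:
    """Invoke a CLI tool; on failure, surface its own stderr as the error."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # The diagnosis is the command's error output, read directly --
        # no transport logs or schema validation traces to dig through.
        raise RuntimeError(result.stderr.strip() or f"exit code {result.returncode}")
    return result.stdout

# A deliberately failing stand-in command: its message comes back verbatim.
try:
    run_tool([sys.executable, "-c", "raise SystemExit('token expired')"])
except RuntimeError as exc:
    print(exc)  # -> token expired
```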
The practical consequence is that CLI-first platforms are seeing faster agent adoption. Developers can prototype an agent's tool use in a shell script before committing to a full protocol integration. The iteration speed advantage compounds over time.
Code Example: Feishu CLI in Action
Here is what agent integration with Feishu's CLI looks like in practice. An agent managing calendar events would invoke commands directly in the shell:
# List upcoming calendar events for the next 7 days
# (date -v is BSD/macOS syntax; GNU date would use -d "+1 day")
lark-cli calendar event list \
  --start-time "$(date -v+1d +%Y-%m-%d)" \
  --end-time "$(date -v+7d +%Y-%m-%d)" \
  --fields "summary,start_time,attendees"

# Create a meeting with attendees
lark-cli calendar event create \
  --calendar-id "primary" \
  --summary "Q2 Planning Review" \
  --start-time "2026-03-15T14:00:00+08:00" \
  --end-time "2026-03-15T15:00:00+08:00" \
  --attendees "[email protected],[email protected]" \
  --description "Quarterly planning session"

# Search documents across the workspace
lark-cli docs search \
  --query "infrastructure architecture" \
  --types "doc,spreadsheet" \
  --workspace "engineering"
The agent receives structured output directly—no protocol handshake, no session negotiation, just the command execution and its result. The Feishu CLI covers 11 business domains with 19 discrete AI Agent Skills, each composable in the same shell-driven paradigm.
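Consuming that output in an agent harness is a one-step affair. A minimal sketch, with the caveat that a JSON output mode for `lark-cli` is assumed here rather than confirmed; the stand-in command fakes the CLI so the sketch is self-contained:

```python
import json
import subprocess
import sys

def list_events_via_cli(cmd: list[str]) -> list[dict]:
    """Run a CLI command that emits JSON and parse the result directly.

    A JSON output mode is assumed; there is no handshake to perform and
    no session to negotiate before or after the call.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)["events"]

# Stand-in that fakes `lark-cli calendar event list` emitting JSON.
fake_cli = [
    sys.executable, "-c",
    "import json; print(json.dumps({'events': [{'summary': 'Q2 Planning Review'}]}))",
]
events = list_events_via_cli(fake_cli)
print(events[0]["summary"])  # -> Q2 Planning Review
```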
Code Example: MCP Server Configuration vs CLI Command
Compare this to the equivalent MCP server configuration and tool invocation:
// MCP server configuration (an MCP client config file,
// e.g. claude_desktop_config.json)
{
  "mcpServers": {
    "feishu-calendar": {
      "transport": "stdio",
      "command": "npx",
      "args": ["@feishu/mcp-server"],
      "env": {
        "FS_TOKEN": "${FS_ACCESS_TOKEN}"
      }
    }
  }
}
# MCP tool invocation (agent code)
def list_calendar_events(start_time: str, end_time: str) -> list[Event]:
    """
    MCP tool call requires:
    1. Server initialization handshake
    2. Tool schema transmission (overhead ~200 tokens)
    3. Result wrapping in MCP response format (~50 tokens)
    """
    result = mcp.call_tool(
        "feishu_calendar_list",
        arguments={
            "start_time": start_time,
            "end_time": end_time,
            "fields": ["summary", "start_time", "attendees"],
        },
    )
    return parse_mcp_response(result)

# vs CLI equivalent:
import subprocess

def list_calendar_events(start_time: str, end_time: str) -> list[Event]:
    """
    CLI tool call requires:
    1. Command string construction (~20 tokens)
    2. Shell execution
    3. Output parsing (~30 tokens)
    Total overhead: ~50 tokens vs ~250 tokens for MCP
    """
    # Argument list rather than shell=True: interpolated values
    # cannot be reinterpreted by the shell.
    output = subprocess.run(
        ["lark-cli", "calendar", "event", "list",
         "--start-time", start_time,
         "--end-time", end_time,
         "--fields", "summary,start_time,attendees"],
        capture_output=True,
        text=True,
    )
    return parse_json_output(output.stdout)
The token overhead difference is not theoretical. In a production agent processing 500 calendar operations per session, CLI saves approximately 100,000 tokens—tokens that could be used for actual task content rather than protocol metadata.
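The arithmetic behind that figure, using the per-call estimates from the docstrings above:

```python
# Back-of-envelope session overhead, using the per-call estimates above.
MCP_TOKENS_PER_CALL = 250  # schema transmission + invocation + wrapping
CLI_TOKENS_PER_CALL = 50   # command string + output parsing
CALLS_PER_SESSION = 500

savings = (MCP_TOKENS_PER_CALL - CLI_TOKENS_PER_CALL) * CALLS_PER_SESSION
print(savings)  # -> 100000
```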
MCP's Fragmentation Problem: A SQL-like History Repeating
The MCP community positions the protocol as analogous to USB for AI tools—a universal connection standard that lets any AI model connect to any tool. The analogy is seductive and, on its face, reasonable. USB solved exactly this problem for hardware peripherals in the 1990s. But the analogy breaks down when you examine what is actually happening in the MCP ecosystem.
MCP is fragmenting along the same lines that fragmented SQL adoption in the 1990s and 2000s. SQL was a standard. Every major database vendor claimed SQL compliance. But PostgreSQL SQL was not Oracle SQL was not MySQL SQL was not SQL Server SQL. The standard existed; the implementations were not interchangeable. Migrating a SQL-based application from one database to another was a significant engineering project, often requiring substantial query rewrites and schema adjustments. The standard provided a common language but not true interoperability. Application developers learned this the hard way: write once for one database, debug extensively when porting to any other.
MCP is exhibiting the same fragmentation pattern, just compressed into a shorter timeline.
The OpenAI Apps SDK workaround. OpenAI's agent framework added a _meta parameter to tool calls specifically to work around context window limitations. When an agent's prompt is too large to fit in the context window, the SDK sends a minimal placeholder instead of the full tool result. This is a pragmatic engineering solution to a real constraint, but it means that MCP tool calls from an OpenAI agent may not carry the same information density as tool calls from other providers. An agent designed to work with OpenAI's _meta optimization may behave differently when ported to a different MCP provider. The fragmentation is behavioral: the protocol standard supports full tool results, but production implementations diverge in their use of optimization flags.
SSE transport deprecation. Server-Sent Events (SSE) was one of the original transport options for MCP communication. It has since been deprecated in favor of the Streamable HTTP transport. This deprecation is not hypothetical: it affects production deployments that built on SSE and now face a migration. The protocol is still young enough that breaking changes are to be expected, but they add up. Each migration imposes costs on agent developers who built integrations against specific transport assumptions. The irony is that the deprecation was driven by legitimate concerns about reliability, but the market consequence is fragmented production environments running different transport stacks.
OAuth 2.1 security considerations. The OAuth 2.1 specification introduced stricter requirements for PKCE (Proof Key for Code Exchange) and removed implicit grant flows. MCP server implementations have been caught with security gaps related to these changes. CVE disclosures for MCP-related OAuth misconfigurations have appeared in production environments where server operators did not adequately validate token endpoint compliance. Security fragmentation in a protocol designed for agentic tool access is particularly concerning because agents operating with compromised OAuth flows can be manipulated to access unauthorized resources. The attack surface is not hypothetical: an agent with a manipulated OAuth flow could be tricked into reading or writing data on behalf of an unauthorized user.
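The PKCE mechanics at the center of those OAuth 2.1 requirements are compact enough to show directly. A sketch of S256 challenge generation using only the standard library, checked against the worked example in RFC 7636, Appendix B:

```python
import base64
import hashlib
import secrets

def pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge.

    OAuth 2.1 makes PKCE mandatory for authorization-code flows; S256
    (base64url of the SHA-256 digest, padding stripped) is the
    challenge method clients are expected to use.
    """
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

# Known-answer check from RFC 7636, Appendix B:
v = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
c = base64.urlsafe_b64encode(hashlib.sha256(v.encode()).digest()).rstrip(b"=").decode()
print(c)  # -> E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```

A server that skips verifying the challenge against the verifier at the token endpoint is exactly the kind of gap the CVE disclosures describe.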
Kiro CLI compatibility failures. Developer reports from late 2025 documented cases where Kiro CLI (a popular MCP client implementation) failed to correctly negotiate tool schemas with certain MCP servers. The failures were intermittent and appeared to be related to schema interpretation differences between client and server implementations. The affected developers had to patch their MCP server configurations to emit schema variants that Kiro's parser handled correctly—meaning the protocol standard was insufficient to guarantee interoperability in practice. This is the SQL fragmentation pattern appearing in real time: the standard says one thing, the implementations diverge in practice, and developers bear the compatibility cost.
The pattern is not that MCP is a bad protocol. The pattern is that protocol standards in competitive markets tend to fragment as implementers make pragmatic tradeoffs to ship features faster than the standard evolves. SQL fragmented. REST fragmented (anyone who has debugged OpenAPI spec mismatches knows this). gRPC fragmented (Envoy proxy exists largely to paper over gRPC-Web compatibility gaps). MCP is fragmenting, and the fragmentation is most visible precisely where adoption is highest: in the CLI tools and agent frameworks that developers are actually deploying.
The irony is that the fragmentation is happening while the protocol community is actively working to prevent it. But the market moves faster than working groups, and pragmatic implementers are shipping CLI tools while the standards committees draft specifications.
The Hidden Cost of MCP: Token Consumption
There is a cost to MCP that rarely appears in the protocol's marketing materials: context window consumption. Every MCP tool call involves a round-trip where the protocol layer adds metadata to the context. Tool schemas must be transmitted. Transport headers must be parsed. Session state must be maintained. None of this is free in the context of an LLM whose pricing is calibrated to tokens processed.
CLI, by contrast, is a one-time shell execution. The agent sends a command string to the shell, the shell executes the underlying API call or system operation, and returns the output. There is no persistent protocol session. There is no metadata overhead layered on each call. The token cost of a CLI invocation is the command string plus its output—nothing more.
For agents that make dozens or hundreds of tool calls per session, this difference compounds. An agent managing a Feishu calendar integration via CLI might consume 50 tokens per API call. The same agent via MCP might consume 120 tokens per call when you include protocol overhead—schema transmission, session negotiation, and response wrapping. Run 500 calendar operations in a single agent session and you have consumed 35,000 additional tokens. At current LLM pricing, that is real money. At scale—with thousands of agents running in production—it is a significant line item.
The token cost is not just financial. In fixed-context models, protocol overhead displaces actual task content. An agent whose context window is 30% protocol metadata has 30% less room for the actual work it was assigned to do. This is the hidden tax of protocol-based tool access, and it is a tax that CLI entirely avoids. For agents doing complex reasoning over large data sets, this displacement is not a minor efficiency concern—it is a capability ceiling.
Google Workspace's approach is instructive here. The googleworkspace/cli tool that ships with Google Workspace integration is designed to be called directly by agents without an MCP intermediary. Google's own documentation acknowledges this: agents should use gws CLI commands for batch operations where token efficiency matters, and reserve MCP for scenarios where the protocol's discovery features provide compensating value. This is a vendor explicitly endorsing the hybrid model—CLI for efficiency, MCP for specific use cases—because Google has measured the cost difference and found it material. The internal analysis that led to this recommendation has not been published, but the public recommendation itself is significant: a vendor with strong MCP investments is still directing customers to CLI for production workloads.
Google Workspace: The Pragmatic Hybrid
Google Workspace's agent tooling strategy deserves specific examination because it represents the most mature enterprise implementation of the CLI-plus-MCP hybrid model. Their approach reveals what the market is actually converging on, as opposed to what the protocol community advocates.
The googleworkspace/cli package provides direct command-line access to Gmail, Calendar, Drive, Docs, and Meet functionality. Agents invoke these commands directly via shell execution. The token cost per operation is low because there is no protocol layer between the agent and the API calls. For an agent processing 1000 emails per day to extract action items, CLI is the difference between a viable product and an overpriced prototype.
But Google also maintains an official MCP server for Workspace integration. The MCP server provides structured tool discovery, typed schema definitions, and a standardized interface that works with any MCP-compatible client. For agents running in environments where tool discovery and type safety are prioritized over token efficiency—early prototyping, agent evaluation frameworks, multi-platform aggregators—the MCP server is the right tool. The discovery value is real: when an agent is exploring what a new workspace can do, the MCP server's structured schema is more useful than running --help on a CLI tool.
Google's own guidance implicitly acknowledges the tradeoff structure: CLI for production-scale agent operations where every token counts, MCP for development and discovery workflows where ergonomics matter more than marginal cost. This is not a contradictory position—it is an honest acknowledgment that different stages of the agent development lifecycle have different requirements. Prototyping favors discoverability. Production favors efficiency.
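One way that split can look inside an agent runtime, sketched with hypothetical operation names and stub backends rather than real Workspace calls:

```python
from typing import Callable

# High-frequency production operations take the CLI path; everything else
# falls back to protocol-mediated access. Operation names are illustrative.
HOT_PATHS = {"mail.list", "calendar.list", "drive.search"}

def make_dispatcher(cli: Callable[[str], str], mcp: Callable[[str], str]):
    """Route each operation to the backend suited to its frequency class."""
    def dispatch(op: str) -> str:
        backend = cli if op in HOT_PATHS else mcp
        return backend(op)
    return dispatch

# Stub backends that just report which path handled the call.
dispatch = make_dispatcher(cli=lambda op: f"cli:{op}", mcp=lambda op: f"mcp:{op}")
print(dispatch("mail.list"))     # -> cli:mail.list
print(dispatch("admin.schema"))  # -> mcp:admin.schema
```

The hot-path set is the tuning knob: operations migrate into it as production telemetry identifies where token overhead actually accumulates.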
This is the hybrid model that is emerging across enterprise platforms. Feishu's open-source CLI is the foundation; an MCP server is on the roadmap but not yet shipped. DingTalk's CLI is proprietary; whether an MCP server materializes depends on competitive dynamics and developer demand. Klarna's agent infrastructure—simplified from 1200 SaaS applications to a generative core—uses CLI as the primary interface because CLI is the lowest-common-denominator abstraction that every agent runtime can invoke without platform-specific adapters. The Klarna case is particularly instructive because it represents a company that went through the full MCP evaluation process and concluded that CLI was the pragmatic choice for their production agent workloads.
The CLI-first approach is not anti-standard. It is pragmatic standard adoption: ship a working tool that every agent can use today, contribute the MCP server to the ecosystem when the implementation has stabilized. The sequence matters because implementation experience informs protocol design. Feishu's open-source CLI will generate production usage patterns that will inform their eventual MCP server implementation. Shipping the MCP server first would have meant designing the protocol without the benefit of real production feedback.
What This Means for Agent Infrastructure
The message from the market is uncomfortable for protocol advocates: standardization is being deprioritized in favor of immediate capability delivery. Agents need to solve problems today. The question is not whether MCP will eventually achieve the universal adoption that USB achieved for hardware. The question is whether the fragmentation we are seeing in 2026 will eventually coalesce or deepen.
There are reasons for cautious optimism. SQL eventually stabilized. REST eventually coalesced around OpenAPI despite early fragmentation. The MCP working group has strong participation from major platform vendors, and the protocol's design accounts for many of the lessons learned from earlier standardization efforts. The fragmentation we are seeing may be a necessary early phase—implementers exploring the design space before settling on common patterns—rather than a terminal condition.
But there are also reasons for concern. The token consumption problem is structural to protocol-based tool access. As agents run more operations per session and context windows remain finite, the efficiency gap between CLI and MCP will become more pronounced. This will create pressure for hybrid architectures that use CLI for hot paths and MCP for discovery, which is more complex than either pure approach but may be the only viable production architecture. Managing two integration paradigms in the same agent codebase adds maintenance burden and increases the probability of integration bugs.
The deeper implication is about the relationship between agents and interfaces. Agents are not web applications. They do not need web-native interfaces designed for human attention and browser rendering. They need interfaces designed for programmatic invocation—shell commands, function calls, message-passing primitives. CLI is the oldest and most battle-tested interface paradigm for programmatic invocation. The reason CLI is winning is not nostalgia. It is that CLI maps better to how agents actually work.
Yang Zhengwu, a developer who has been tracking enterprise agent deployments across Chinese platforms, put it this way in his analysis of the Feishu and DingTalk releases: "The CLI bet is the right bet for 2026. Agents need to ship capabilities, not specifications. The protocol will follow once the implementations stabilize. Everyone who tries to lead with the protocol first ends up with a beautiful specification that nobody uses."
This is the pragmatic consensus forming in the agent infrastructure community: let CLI carry the first wave of agent deployments, let MCP mature through production experience, and build hybrid architectures that get the best of both. The protocol is not dead. It is just not the first move.
FAQ
What is MCP and how does it differ from CLI?
The Model Context Protocol (MCP) is a transport-agnostic protocol designed to provide a standardized interface between AI agents and the tools they use. It defines how tool schemas are communicated, how tool calls are routed, and how results are returned to the agent. CLI (Command Line Interface) is a traditional interface paradigm where an agent invokes a command-line program with arguments and receives text output. MCP is protocol-defined and designed for interoperability across platforms; CLI is execution-defined and ties the agent directly to shell execution. MCP adds metadata overhead per call; CLI does not.
Why is CLI winning the first wave of agent deployments?
CLI is winning because it offers zero-friction interface discovery (--help is the documentation), stateless execution (no persistent session state), universal shell integration (works in any shell environment), and straightforward debugging. For agents running in terminal sessions and CI/CD pipelines, CLI is the native interface. Platform vendors like Feishu and DingTalk shipped CLI tools first because CLI allows agents to access capabilities immediately without protocol overhead. The market validated this prioritization by adopting CLI tools faster than equivalent MCP server implementations. Feishu's open-source CLI reached 5000 GitHub stars within two weeks of announcement—faster adoption than any MCP server implementation from the same platform vendors.
Can MCP and CLI coexist in the same agent infrastructure?
Yes, and this hybrid model is emerging as the dominant pattern in mature agent deployments. Google Workspace exemplifies this: agents use gws CLI commands for production-scale operations where token efficiency matters, and use the official MCP server for development and discovery workflows where typed schema definitions and structured tool discovery provide value. The pattern is CLI for hot paths (high-frequency, efficiency-sensitive operations) and MCP for cold paths (discovery, prototyping, cross-platform aggregation). Both Feishu and DingTalk have announced MCP compatibility roadmaps, suggesting they expect their CLI tools to coexist with MCP servers rather than being replaced by them. The coexistence is not theoretical—it is the production architecture that major platform vendors are converging on.
What are the security implications of CLI versus MCP tool access?
Both paradigms have distinct security considerations. MCP's OAuth 2.1 implementation has seen CVE disclosures related to token endpoint compliance, and protocol-level security depends on correct server configuration. CLI security depends on the security of the underlying command execution environment—shell injection vulnerabilities can be severe if agent prompts are not properly sanitized before shell command construction. CLI's advantage is transparency: a CLI command and its arguments are visible in the process list, making auditing more straightforward. MCP's advantage is standardized authentication flows, which are more complex but more feature-complete for enterprise security requirements. Neither paradigm is inherently more secure; the security posture depends on implementation quality and operational discipline. The CVE disclosures affecting MCP are a reminder that protocol standardization does not automatically confer security—each implementation must be audited independently.
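The shell-injection point deserves a concrete illustration. A sketch of the safe pattern, using the Python interpreter as a stand-in tool: build an argument vector, never an interpolated shell string:

```python
import subprocess
import sys

def safe_invoke(tool: str, *args: str) -> str:
    """Invoke a CLI tool with an argument vector, never a shell string.

    With a list and shell=False (the default), attacker-controlled text
    stays a single argument instead of being parsed by the shell.
    """
    result = subprocess.run([tool, *args], capture_output=True, text=True)
    return result.stdout

# Hostile input that would be catastrophic inside an interpolated shell string:
malicious = 'Q2 review"; rm -rf ~; echo "'
out = safe_invoke(sys.executable, "-c", "import sys; print(sys.argv[1])", malicious)
assert out.strip() == malicious  # the hostile text never reached a shell
```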
What should developers prioritize when building agent tooling today?
Developers should prioritize capability delivery over protocol compliance. Ship CLI tools that agents can invoke immediately, measure token consumption and operational costs in production, and add MCP server implementations once the CLI tooling has stabilized. The pragmatic path is: CLI first for immediate agent enablement, MCP server contribution to the ecosystem once implementation experience has been gained, hybrid architecture for production systems that need both discovery ergonomics and execution efficiency. The agent infrastructure community's consensus is that getting agents deployed and working is more valuable than getting the protocol perfect. Protocol maturity will follow production experience, not precede it. Klarna's experience—reducing 1200 SaaS applications to a generative core with CLI as the primary interface—is the reference implementation for this prioritization.
Conclusion
The simultaneous CLI releases from Feishu and DingTalk in March 2026 were not a rejection of protocol standardization. They were a market signal about sequencing. The market wanted working tools that agents could invoke today. Protocol standardization is a legitimate goal—but it is a goal that follows capability delivery, not a prerequisite for it.
CLI is winning because it maps to how agents actually operate: stateless shell executions, universal shell environments, and straightforward debugging. MCP offers real value—standardized interfaces, structured discovery, transport abstraction—but that value is most apparent in development and discovery workflows rather than production hot paths. The fragmentation happening in the MCP ecosystem is a reminder that standards bodies move slower than markets, and pragmatic implementers will ship working tools while the committees draft specifications.
The hybrid model emerging from Google Workspace and the Feishu/DingTalk roadmaps suggests a future where CLI and MCP coexist in the same infrastructure stack. CLI carries the first wave of agent deployments where efficiency matters. MCP matures through production experience and eventually provides the interoperability that enterprise platforms need. The protocol is not dead. It is just not the first move.
For developers building agent infrastructure today, the lesson is clear: ship the CLI first, contribute the protocol later, and build for a hybrid future. The agents cannot wait for the specification to be perfect. They need capabilities now.