Administrator
Published on 2026-05-17

"Codex Windows Sandbox: Inside OpenAI's Secure Coding Environment Architecture"

When an AI agent can run arbitrary shell commands on your machine, the question is not whether it will make a mistake, but how you contain the blast radius. OpenAI's Codex gives its coding agent exactly this power: reading files, writing code, running tests, installing packages, and executing shell commands. The agent needs real system access to do useful work, which means real system access to do real damage if something goes wrong.

In May 2026, OpenAI published a detailed engineering blog post by David Wiesen, a member of the technical staff, documenting how the team built a sandbox for Codex on Windows. The post is a rare window into the security architecture of an AI coding agent, and it reveals something important: sandboxing an AI agent is a fundamentally different problem from sandboxing a traditional application. The solution OpenAI arrived at, after iterating through two distinct architectures, offers lessons for anyone building systems that let AI agents interact with real operating systems.

The Problem: Why Codex Needs a Sandbox

Codex runs with the permissions of a real user by default. The coding model tells its harness to run commands locally: running tests, reading and editing files, creating Git branches. This is powerful and potentially dangerous. A malicious prompt injection, a hallucinated command, or a model that decides to "clean up" the wrong directory could delete source code, modify system configurations, or exfiltrate data to an external server.

OpenAI's default mode attempts to balance effectiveness and safety. Codex can read files almost anywhere and write files within your workspace (the directory where you are running Codex), with no internet access unless you explicitly request it. To enforce these constraints automatically, Codex needs a sandbox environment, one that the operating system actually enforces rather than merely suggests.

On macOS, Codex uses Seatbelt, Apple's built-in sandboxing mechanism. On Linux, it uses a combination of seccomp and bwrap (bubblewrap). These are mature, well-understood tools that provide kernel-enforced isolation. Windows, however, does not provide an equivalent capability out of the box. When Wiesen joined the Codex engineering team in September 2025, Windows users faced two bad options: approve nearly every command manually, which defeats the purpose of an autonomous agent, or enable Full Access mode, which gives the agent unrestricted control with no oversight.

Why Existing Windows Sandboxing Didn't Fit

Before building their own solution, the Codex team evaluated existing Windows isolation mechanisms. Each fell short for the same underlying reason: they were designed for applications with known, fixed capabilities, not for an agent that dynamically decides what tools to use.

AppContainer is Windows' native sandbox, a capability-based isolation model designed for apps that know upfront exactly what resources they need. The problem is that Codex is not one tightly scoped app. It drives open-ended developer workflows: shells, Git, Python, package managers, build tools, and whatever other binaries the agent decides it needs. AppContainer provides strong isolation, but for a much narrower class of workloads than "let an agent operate like a developer."

Windows Sandbox (the Hyper-V-based feature) creates a full disposable virtual machine. It offers strong isolation through hardware virtualization, but at a cost: every command incurs VM startup overhead, file sharing between host and guest is complex, and the VM needs its own copy of the development environment. For an agent that might run dozens of short-lived commands in quick succession, the latency penalty was too high.

The Codex team needed something in between: strong enough isolation to prevent the agent from escaping its boundaries, but lightweight enough to support real developer workflows. They built it themselves, in two iterations.

Iteration 1: The Unelevated Sandbox

The first working prototype used a combination of existing Windows concepts to implement isolation without requiring administrator privileges. The design goal was explicit: Codex should never need to prompt the user for admin rights just to set up or run the sandbox.

The unelevated sandbox focused on two constraints: file writes and network access.

For file system isolation, the team used restricted tokens derived from the current user's token. They applied a restricted SID list of [Everyone, Logon, Synthetic] to create a write_restricted token. Combined with Windows ACLs (Access Control Lists), this prevented the sandboxed process from writing to directories outside the workspace while still allowing reads.
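To make the double check behind a write-restricted token concrete, here is a toy model in Python. It is a sketch, not the Codex implementation: it ignores deny entries and inheritance, the user name alice is hypothetical, and it assumes the workspace ACL grants write access to the synthetic SID, which the source implies but does not spell out. The rule it models is the documented Windows behavior: with a write-restricted token, a write must pass both the normal check against the user's SIDs and a second check against the restricting SIDs, while reads only need the normal check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    user_sids: frozenset        # SIDs the principal normally holds
    restricted_sids: frozenset  # restricting SIDs attached to the token


def can_access(token, acl, want_write):
    """Toy access check. acl maps a SID name to a set of rights."""
    right = "write" if want_write else "read"
    normal_ok = any(right in acl.get(sid, set()) for sid in token.user_sids)
    if not want_write:
        # Write-restricted: the restricting SIDs are not consulted for reads.
        return normal_ok
    restricted_ok = any(
        right in acl.get(sid, set()) for sid in token.restricted_sids
    )
    # A write must pass both checks.
    return normal_ok and restricted_ok


# Hypothetical principal and ACLs, for illustration only.
token = Token(
    user_sids=frozenset({"alice", "Everyone"}),
    restricted_sids=frozenset({"Everyone", "Logon", "Synthetic"}),
)
workspace_acl = {"alice": {"read", "write"}, "Synthetic": {"read", "write"}}
documents_acl = {"alice": {"read", "write"}, "Everyone": {"read"}}

print(can_access(token, workspace_acl, want_write=True))   # write inside workspace
print(can_access(token, documents_acl, want_write=True))   # write outside workspace
print(can_access(token, documents_acl, want_write=False))  # read outside workspace
```

In this model the user's own write permission on the documents folder is not enough, because none of the restricting SIDs is granted write access there.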

For network isolation, the approach was more improvised. Without Windows Firewall as an option (it requires elevation), the team made the child environment fail-closed for common networked tools. They set environment variables to poison proxy-aware traffic:

HTTPS_PROXY=http://127.0.0.1:1
ALL_PROXY=http://127.0.0.1:1
GIT_HTTPS_PROXY=http://127.0.0.1:1
NO_PROXY=localhost,127.0.0.1,::1
GIT_SSH_COMMAND=cmd /c exit 1

They prepended a small denybin directory to PATH and reordered PATHEXT so that stub SSH and SCP scripts would resolve before the real binaries.

This caught a lot of normal tool-driven traffic, but it was still only advisory. A process could ignore environment variables, bypass PATH, or open sockets directly. The unelevated sandbox worked as a reasonable default, but the network isolation was fundamentally best-effort. As Wiesen noted, it was "too risky" for a security boundary that needed to hold under adversarial pressure.
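As a rough illustration of the environment poisoning described above, the following Python sketch builds a child environment from a parent one. The variable values come from the list above. The function names, the denybin path, the default PATHEXT value, and the assumption that the stub scripts are .CMD files are all hypothetical.

```python
DEAD_PROXY = "http://127.0.0.1:1"  # nothing should be listening on port 1


def prioritize_ext(pathext, ext=".CMD"):
    """Move one extension to the front of a PATHEXT-style list."""
    parts = [p for p in pathext.split(";") if p]
    rest = [p for p in parts if p.upper() != ext.upper()]
    return ";".join([ext] + rest)


def build_offline_env(base_env, denybin_dir):
    """Return a copy of base_env that fails closed for proxy-aware tools."""
    env = dict(base_env)
    env.update({
        "HTTPS_PROXY": DEAD_PROXY,
        "ALL_PROXY": DEAD_PROXY,
        "GIT_HTTPS_PROXY": DEAD_PROXY,
        "NO_PROXY": "localhost,127.0.0.1,::1",
        "GIT_SSH_COMMAND": "cmd /c exit 1",
    })
    # Put the stub directory first so stub ssh/scp scripts shadow real binaries.
    old_path = env.get("PATH", "")
    env["PATH"] = denybin_dir + (";" + old_path if old_path else "")
    # Assumption: stubs are .CMD scripts, so .CMD must resolve before .EXE.
    env["PATHEXT"] = prioritize_ext(env.get("PATHEXT", ".COM;.EXE;.BAT;.CMD"))
    return env


parent = {"PATH": r"C:\Windows\System32", "PATHEXT": ".COM;.EXE;.BAT;.CMD"}
child = build_offline_env(parent, r"C:\codex\denybin")
print(child["PATH"])
print(child["PATHEXT"])
```

A dictionary like this could be passed as the env argument of subprocess.run. Nothing in it stops a process that ignores these variables or opens a socket directly.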

Iteration 2: The Elevated Sandbox

The current implementation requires elevated admin permissions at setup time, which is why it is called the elevated sandbox. The key architectural insight is that elevation happens only during setup, not during normal operation. The user approves administrator access once, and all subsequent sandbox operations run without any elevated privileges.

At the boundary where Codex spawns a command, the elevated sandbox looks similar to the unelevated one. It still runs child processes under a restricted token with the same SID list. The critical difference is that the principal of this token is no longer the actual Windows user but one of two local users created by Codex itself:

  • CodexSandboxOffline: targeted by firewall rules that block all outbound traffic
  • CodexSandboxOnline: not targeted by firewall rules, used when the user explicitly enables network access

This design is elegant because it converts the network isolation problem from "can we poison enough environment variables" to "does Windows Firewall work," and Windows Firewall is a battle-tested, kernel-enforced security boundary. Creating dedicated local users means the sandbox has its own security identity, distinct from the user's real account. ACLs, firewall rules, and logon rights can all be scoped to this identity.
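The identity-based approach can be modeled in a few lines of Python. This is only a sketch: the two user names come from the list above, while the function names and the set standing in for the firewall rules are hypothetical. On a real system the decision is made by Windows Firewall, not by application code.

```python
OFFLINE_USER = "CodexSandboxOffline"
ONLINE_USER = "CodexSandboxOnline"

# Hypothetical stand-in for "firewall rules block outbound traffic for this user".
BLOCKED_OUTBOUND_USERS = {OFFLINE_USER}


def sandbox_user_for(network_enabled):
    """Pick the local account a command should run as."""
    return ONLINE_USER if network_enabled else OFFLINE_USER


def outbound_allowed(user):
    """Model of the firewall decision, keyed on the process identity."""
    return user not in BLOCKED_OUTBOUND_USERS


default_user = sandbox_user_for(network_enabled=False)
print(default_user, outbound_allowed(default_user))
```

Network policy here follows from which account the process runs as, so a child process cannot opt out by ignoring environment variables.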

The Four-Layer Architecture

The final architecture has four layers: three dedicated binaries, each with a clear responsibility, plus the sandboxed child process they ultimately launch:

codex.exe is the main process, the unelevated harness that orchestrates the agent loop. It never handles sandbox setup or elevated operations directly.

codex-windows-sandbox-setup.exe handles all elevated setup work: creating the two local sandbox users, configuring firewall rules, setting local security policy, and creating a private desktop for the sandbox session. Encapsulating setup in its own binary serves multiple purposes: it crosses the UAC (User Account Control) boundary only when needed, keeps codex.exe as a normal unelevated process, prevents Windows-only setup machinery from bloating codex.exe on other platforms, decouples longer-running setup work from the main process lifetime, and provides a single place to handle the different setup paths the sandbox needs.

codex-command-runner.exe has one job: mint a restricted token and spawn the requested command. Instead of having codex.exe do the entire flow itself (real user to sandbox user to restricted token to child process), the team split the flow in two. The command runner bridges the gap between the sandbox user context and the restricted token environment where the actual command executes.

The child process runs in the sandbox environment, constrained by the restricted token, the dedicated user's firewall rules, and ACL-based file system boundaries.

Wiesen cites Einstein: "Everything should be made as simple as possible, but not simpler." Each layer does exactly one thing, and the security properties of the system emerge from the composition of these layers rather than from any single component.

What This Architecture Actually Prevents

The elevated sandbox enforces three boundaries simultaneously:

File system writes are restricted to the workspace directory and any explicitly configured writable roots. The restricted token combined with ACLs means the sandboxed process literally lacks the Windows permissions to write elsewhere, even if the agent tries.

Network access is controlled at the firewall level, tied to the dedicated sandbox user identity. When Codex runs in its default offline mode, the firewall blocks all outbound connections for CodexSandboxOffline. When the user approves network access for a specific task, the process runs as CodexSandboxOnline, which has a different firewall profile.

Process privilege is constrained because the sandbox user is a standard local user with minimal rights. It cannot install system-wide software, modify registry hives outside its scope, or access other users' files.

The system also supports a read-only mode where Codex can inspect files but cannot edit anything or run commands without approval, and a danger-full-access mode that removes all boundaries entirely. The configurable approval policy (untrusted, on-request, never) determines when Codex must pause and ask the user before crossing a boundary.
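The write boundary can be pictured as a path check against the configured writable roots. The Python sketch below is illustrative only: the function name is hypothetical, and the mode name workspace-write is an assumption for the default mode, which the text above does not name. The check is purely lexical and ignores symlinks and junctions. In the real sandbox, enforcement comes from the restricted token and ACLs, not from a check like this.

```python
from pathlib import PureWindowsPath


def write_inside_boundary(target, writable_roots, mode="workspace-write"):
    """Return True if a write to target falls inside the sandbox boundary."""
    if mode == "danger-full-access":
        return True   # all boundaries removed
    if mode == "read-only":
        return False  # every edit needs approval
    path = PureWindowsPath(target)
    if ".." in path.parts:
        # Pure paths do not collapse "..", so refuse anything that could climb out.
        return False
    return any(path.is_relative_to(PureWindowsPath(r)) for r in writable_roots)


roots = [r"C:\repo"]
print(write_inside_boundary(r"C:\repo\src\main.py", roots))
print(write_inside_boundary(r"C:\Windows\system.ini", roots))
```

The comparison works on whole path components, so C:\repository is not treated as being inside C:\repo.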

What This Design Teaches About AI Agent Security

The Codex Windows Sandbox reveals three principles that generalize beyond one product.

First, sandboxing an AI agent is a different problem from traditional application security. The agent's workload is open-ended by design. It might need to run a test suite, install a package, compile code, or execute a custom build script. You cannot enumerate all the capabilities the agent will need in advance, which rules out static capability-based models like AppContainer. The sandbox must be strong enough to contain unknown behavior while flexible enough to support real developer workflows. This tension between compatibility and enforcement shaped every design decision.

Second, elevation should be a setup-time event, not a runtime requirement. The Codex team made the deliberate choice to require admin privileges only during initial configuration. Once the sandbox users, firewall rules, and security policies are in place, the agent runs entirely within standard user privileges. This is a good pattern for any system that needs to bootstrap security boundaries: pay the privilege cost once, then operate within the constrained environment.

Third, OS-native enforcement beats application-level advisory controls. The shift from the unelevated sandbox (environment variable poisoning, PATH manipulation) to the elevated sandbox (firewall rules, dedicated users, ACLs) represents a move from "hope the process follows our rules" to "the operating system enforces our rules." For AI agents that can run arbitrary code, this distinction is the difference between a security boundary and a suggestion.

The Windows Sandbox is one layer in a multi-layered safety architecture. In Codex's cloud environment, agents run in isolated OpenAI-managed containers with a two-phase runtime model: a setup phase that can access the network to install dependencies, followed by an agent phase that runs offline by default. Secrets configured for cloud environments are available only during setup and are removed before the agent phase starts.
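The two-phase model can be sketched as follows. This is a minimal illustration that assumes secrets are exposed as environment variables, which the text above does not state. The function name, the PHASES table, and the variable NPM_TOKEN are hypothetical.

```python
# Capabilities per phase, as described above.
PHASES = {
    "setup": {"network": True, "secrets": True},
    "agent": {"network": False, "secrets": False},  # offline by default
}


def agent_phase_env(setup_env, secret_names):
    """Return the environment for the agent phase with secrets removed."""
    return {k: v for k, v in setup_env.items() if k not in secret_names}


setup_env = {"PATH": "/usr/bin", "NPM_TOKEN": "example-value"}
agent_env = agent_phase_env(setup_env, secret_names={"NPM_TOKEN"})
print(sorted(agent_env))
```

Dependencies that need credentials are installed while the network and secrets are available, and the agent then runs without either.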

The GPT-5.2-Codex System Card, published on OpenAI's Deployment Safety Hub, describes additional model-level mitigations: specialized safety training for harmful tasks and prompt injections, combined with product-level mitigations like the sandboxing described here. The card notes that GPT-5.2-Codex is OpenAI's strongest-performing model on Professional CTF (capture-the-flag) evaluations, partly because of context compaction that allows coherent work across multiple context windows.

For organizations running Codex locally, the sandbox provides the safety boundary. For cloud deployments, container isolation provides it. In both cases, the principle is the same: the agent operates within a bounded environment where routine tasks can run autonomously inside clear limits, and crossing those limits requires explicit user approval.

How Other AI Coding Tools Handle This Problem

Codex is not the only AI coding tool that needs to execute code. Here is how the competition approaches sandboxing:

Claude Code (Anthropic) runs commands locally with user confirmation prompts by default. It does not implement OS-level sandboxing on Windows, instead relying on approval gates where the user must confirm before commands execute. This keeps a human in the loop for each command, but it slows down workflows that need many small commands.

Cursor runs code execution in isolated backend environments for its agent mode, keeping the local machine largely untouched. The trade-off is that the agent cannot interact with the user's local development environment as seamlessly.

GitHub Copilot operates primarily as a code suggestion tool rather than an autonomous agent, so it faces a different risk profile. When it does execute code (in features like Copilot Workspace), it uses sandboxed cloud environments.

The contrast highlights the unique position Codex occupies: it needs both real local system access and strong containment. Other tools solve the problem by limiting one or the other.

Practical Implications for Development Teams

If you are running Codex on Windows, the elevated sandbox is the recommended configuration. If your organization blocks local user creation or firewall rule changes through enterprise policy, you will need to work with your IT team to allow these specific operations for the Codex setup binary. The fallback unelevated sandbox works but provides weaker network isolation.

The key configuration lives in config.toml:

[windows]
sandbox = "elevated"
sandbox_private_desktop = true

If neither native Windows sandbox mode works in your environment, you can run Codex inside WSL2, which uses the Linux sandbox implementation (seccomp + bwrap) instead.

For teams managing Codex deployments at scale, the sandbox architecture means that security policy is enforced at the OS level, not at the application level. You can audit the sandbox users, firewall rules, and ACLs using standard Windows administration tools. The security boundaries are inspectable and testable, which is essential for organizations that need to validate AI tool safety before deployment.

FAQ

What happens if Codex tries to write outside the workspace? The restricted token and ACLs prevent the write operation at the OS level. Codex receives a permission denied error and can report the failure to the user.

Can the elevated sandbox be compromised by a sufficiently clever prompt injection? The sandbox operates at the OS level, below the application layer. Even if the model is tricked into generating malicious commands, those commands execute within the sandbox user context, which lacks the privileges to escape the configured boundaries. The primary remaining risk is social engineering: tricking the user into approving an action that crosses the sandbox boundary.

Why not just use Docker containers for isolation? Docker on Windows typically relies on WSL2 or Hyper-V, which introduces virtualization overhead similar to that of Windows Sandbox. The Codex team needed a solution that adds minimal latency to individual commands while still enforcing real OS-level boundaries.

Does the sandbox work on all Windows versions? The elevated sandbox requires Windows 10 Pro or later with support for local user creation and firewall configuration. Windows Home edition may not support all the required features.
