claude-code with --dangerously-skip-permissions, minus the danger

Christopher Mühl 72dfde91a8 feat!: thin layer over Claude /sandbox + nftables CIDR block Drops bwrap orchestration, history overlay, forced --dangerously-skip-permissions, SANDBOX.md injection, env-file loading. claude --sandbox handles kernel isolation; claudebox manages settings.local.json sandbox.* keys and installs nftables rules matched on claude-sandbox.slice cgroup membership. New flake outputs: nixosModules.default + checks.wrapper-syntax. Docs updated to reflect the layered (not structural) FS guarantee. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-11 12:19:40 +02:00
.planning	docs(quick-260505-le7): Add harness config file support to claudebox	2026-05-05 15:34:33 +00:00
modules	feat!: thin layer over Claude /sandbox + nftables CIDR block	2026-05-11 12:19:40 +02:00
redteam	feat!: thin layer over Claude /sandbox + nftables CIDR block	2026-05-11 12:19:40 +02:00
CLAUDE.md	docs: create roadmap (3 phases)	2026-04-09 10:32:35 +02:00
claudebox.sh	feat!: thin layer over Claude /sandbox + nftables CIDR block	2026-05-11 12:19:40 +02:00
flake.lock	fix: SHELL path, PATH isolation, --shell flag, nix-claude-code input	2026-04-09 14:59:43 +02:00
flake.nix	feat!: thin layer over Claude /sandbox + nftables CIDR block	2026-05-11 12:19:40 +02:00
GUARANTEES.md	feat!: thin layer over Claude /sandbox + nftables CIDR block	2026-05-11 12:19:40 +02:00
README.md	feat!: thin layer over Claude /sandbox + nftables CIDR block	2026-05-11 12:19:40 +02:00
test-gc.sh	test(05-02): add GC integration test covering stale removal, valid preservation, empty-dir safety	2026-04-13 10:02:31 +00:00
THREAT-MODEL.md	docs: add scope/limits section, GUARANTEES and THREAT-MODEL	2026-05-11 09:21:47 +02:00

README.md

claudebox

A thin layer over Claude Code's built-in /sandbox that adds CIDR-level egress blocking (Tailscale, RFC1918, MagicDNS) and hardened credential-path denyRead defaults. NixOS-distributed.

When to use this

claudebox is worth it if any of these apply:

You have internal/private-network services reachable from your machine that you don't want a prompt-injected agent to touch — anything on a mesh VPN (Tailscale, Headscale, Nebula, ZeroTier, WireGuard), anything on RFC1918 LAN (router admin, NAS, homelab, internal dashboards), or cloud metadata services (169.254.169.254).
You're on NixOS and want hardened sandbox defaults (denyRead trifecta, opinionated allowedDomains) shipped as a flake input rather than hand-rolled per-project.

The gap claudebox closes over plain /sandbox: the built-in /sandbox only does hostname-based egress allowlisting. It cannot block address ranges such as 100.64.0.0/10 (CGNAT, used by Tailscale and some ISPs) or 192.168.0.0/16, or single addresses such as 169.254.169.254. If the agent resolves a name to one of those IPs (e.g. via MagicDNS), the hostname allowlist won't catch the connection.
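To make the gap concrete, here is a minimal bash sketch of the question a CIDR filter answers and a hostname allowlist cannot: "is this destination address inside a blocked range?" It is illustrative only. The function names are made up for this example, it covers IPv4 only, and the real enforcement happens in nftables, not in shell.

```shell
#!/usr/bin/env bash
# Illustrative only: function names are hypothetical, IPv4 only.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if address $1 falls inside CIDR $2 (e.g. 100.64.0.0/10).
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# IPv4 ranges named in this README.
BLOCKED_CIDRS=(100.64.0.0/10 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16)

# Return 0 if address $1 is inside any blocked range.
is_blocked() {
  local cidr
  for cidr in "${BLOCKED_CIDRS[@]}"; do
    in_cidr "$1" "$cidr" && return 0
  done
  return 1
}
```

For example, `is_blocked 100.100.100.100` succeeds (that address, which Tailscale uses for its MagicDNS resolver, sits inside 100.64.0.0/10), while `is_blocked 8.8.8.8` fails. A hostname allowlist never sees this question, because it only compares names.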

When not to use this

Skip claudebox and just use claude with /sandbox enabled if:

No internal network exposure. Your machine doesn't reach anything you wouldn't put on the public internet anyway. A hostname allowlist (api.anthropic.com, github.com, etc.) covers your exfil concern.
Not on NixOS. This is distributed as a NixOS flake with a NixOS module for the nftables rules. The wrapper-only piece works elsewhere but you'd reinvent the network policy by hand.
You only need hostname filtering. /sandbox does that natively via sandbox.network.allowedDomains in .claude/settings.json, and claudebox adds nothing there.

Put bluntly: if you took your laptop to a coffee shop and never noticed anything was missing, you probably don't need claudebox.

Quick start

nix run git+https://git.toph.so/toph/claudebox

Or add to your flake:

{
  inputs.claudebox.url = "git+https://git.toph.so/toph/claudebox";
}

Then add inputs.claudebox.packages.${system}.default to your environment.systemPackages or home-manager packages, and import the NixOS module to install the nftables rules:

{
  imports = [ inputs.claudebox.nixosModules.default ];
  services.claudebox.enable = true;
}

Without the module, claudebox still runs but the CIDR block won't be enforced — you'll get only the hardened denyRead defaults on top of /sandbox.

What it does

Writes a hardened sandbox.* config into ./.claude/settings.local.json (merge: your other top-level keys are preserved, the sandbox subtree is replaced wholesale).
Launches claude in a systemd user scope under claude-sandbox.slice so nftables rules can match by cgroup.
The NixOS module installs an nftables output chain that drops egress to private/internal ranges, but only for processes inside that slice: CGNAT (100.64.0.0/10, used by Tailscale/Headscale/some ISPs), RFC1918 (10/8, 172.16/12, 192.168/16), link-local (169.254/16, which includes cloud metadata services), Tailscale's IPv6 ULA prefix (fd7a:115c:a1e0::/48), generic IPv6 ULA (fc00::/7), and IPv6 link-local (fe80::/10). The CIDRs are configurable via the module.
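The settings merge in the first item can be sketched with jq. This is a hedged illustration, not the code in claudebox.sh: the function name `merge_sandbox` is hypothetical, and the JSON values are placeholders rather than claudebox's actual defaults. Only the key names sandbox.filesystem.denyRead and sandbox.network.allowedDomains come from this README.

```shell
#!/usr/bin/env bash
# Sketch: replace the sandbox subtree, keep every other top-level key.
# Requires jq. merge_sandbox is a hypothetical name for this example.

# $1: current settings JSON (pass '{}' if the file does not exist yet)
# $2: JSON object to install as the new sandbox subtree
merge_sandbox() {
  local settings_json=$1 sandbox_json=$2
  # Assigning to .sandbox overwrites the whole subtree; siblings are untouched.
  jq -c --argjson sandbox "$sandbox_json" '.sandbox = $sandbox' <<<"$settings_json"
}

# Placeholder values, not claudebox's real defaults.
existing='{"model":"example-model","sandbox":{"network":{"allowedDomains":["old.example"]}}}'
hardened='{"filesystem":{"denyRead":["~/.ssh"]},"network":{"allowedDomains":["api.anthropic.com"]}}'
```

Running `merge_sandbox "$existing" "$hardened"` keeps the `model` key and drops `old.example`, because the old sandbox subtree is discarded rather than merged key by key. That is the "replaced wholesale" behavior: a stale, weaker sandbox setting cannot survive a relaunch.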

What it doesn't do (anymore, post-rewrite): no bwrap orchestration of its own, no SANDBOX.md injection, no per-project history overlay, no forced --dangerously-skip-permissions. Claude's built-in /sandbox handles the kernel-isolation primitives; claudebox does network policy + Nix glue.

Scope and limits

Right now, there are likely files on your machine you'd rather an attacker not exfiltrate: an unencrypted SSH key, an agenix age key, mail server credentials, your ~/.aws/credentials. This section describes what the sandbox does and does not protect them from. The defaults are not a no-op; they protect against things you may not have catalogued.

Explicitly in scope:

Reducing blast radius of model misbehavior to "I lost an hour of work" rather than "my SSH keys are on pastebin."
Making the easy path the safer one — fewer footguns, less to remember.
Knowing which tier I'm in for any given session, and switching deliberately. See THREAT-MODEL.md for the posture ladder.

What this sandbox protects:

Reads of well-known credential paths are denied. SSH keys, GPG keys, AWS/GCP creds, agenix/sops secrets, Tailscale state — the standard list of dotfiles and runtime secret locations. Enforced by sandbox.filesystem.denyRead at the syscall layer. (Reasoning, including the list-drift caveat.)
Writes outside the working directory are denied by default (Claude Code's /sandbox default policy). The agent cannot overwrite your ~/.bashrc, drop a hook into ~/.claude/hooks/, or touch anything else in $HOME without being explicitly allowed.
The agent cannot reach internal-network hosts (with the NixOS module enabled). CGNAT (Tailscale, etc.), RFC1918, link-local, and the addresses MagicDNS names resolve to are all dropped by nftables rules matched on cgroup membership. This one is structural: kernel-enforced, won't drift, fires at packet emit time. (Why this holds.)

The network block is the strongest claim — kernel rules matched on slice membership, no configuration list to forget. The credential-read denial is a hardened preset; the list is opinionated but finite, and unusual credential locations on your machine won't be covered unless you add them.

What it does not guarantee:

Anything in the working directory that you wouldn't want public (.env files, hardcoded credentials, customer-data test fixtures, database dumps) can be exfiltrated through allowed network destinations (GitHub, npm, Anthropic API, anything you've permitted). Source code itself is rarely the worry; LLMs have made code largely a commodity. The issue is what sits next to the code in the same directory. The sandbox confines the session; it does not protect what flows out of it. Code review at commit/push time is the control for that leg. (CWD exfil reasoning. · Code review as control.)
Defense against an attacker with specific knowledge of your setup. claudebox is good for untargeted attacks (random injections, generic exfil payloads). It is not sufficient against someone actively targeting you who knows your dotfile layout, dependency stack, CI pipeline, or homelab topology. For higher-risk work, escalate to a remote VM or managed sandbox — see THREAT-MODEL.md.

If you want to skip the sandbox for a session — you trust this task, you need full homelab access, you're decrypting agenix locally — run bare claude instead. The choice happens at the binary name. No flag inside the wrapper turns the sandbox off; that would be a false-safety footgun.

Flags

Flag	Description
`--yes`, `-y`	Skip the audit prompt and launch immediately
`--dry-run`	Print the launch command without executing
`--check`	Verify prerequisites (claude, jq, systemd-run, nftables chain) and exit
`--no-slice`	Skip the systemd slice scope (CIDR block won't apply — for debugging)
`--`	Pass remaining args to Claude Code

How it works

project root/
└── .claude/
    └── settings.local.json   # managed by claudebox (sandbox.* keys),
                              # user keys preserved on merge

On launch the wrapper:

Computes the canonical project root (worktree-aware via git rev-parse --git-common-dir).
Merges the hardened sandbox.* config into .claude/settings.local.json. Existing top-level keys (model, env, MCP servers, etc.) are kept; the sandbox subtree is replaced wholesale.
Shows an audit of what's being applied, asks for confirmation.
Execs systemd-run --user --scope --slice=claude-sandbox.slice -- claude "$@".
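The first and last steps above can be sketched in bash as follows. This is an approximation under stated assumptions, not the contents of claudebox.sh. The function names `project_root` and `launch_cmd` are hypothetical, the fallback to the current directory outside a git repository is an assumption, and so is the idea that --no-slice simply runs claude directly.

```shell
#!/usr/bin/env bash
# Sketch of steps 1 and 4. Function names and fallbacks are assumptions.

# Resolve the main checkout root even when invoked from a linked worktree.
# --git-common-dir points at the shared .git directory; its parent is taken
# as the project root. Outside a git repo, fall back to the current directory.
# (Bare repositories and submodules are not handled in this sketch.)
project_root() {
  local common
  common=$(git rev-parse --git-common-dir 2>/dev/null) || { pwd -P; return; }
  (cd "$common/.." && pwd -P)
}

# Build the launch command as an array so arguments survive quoting.
# $1: 1 to run under the slice, 0 to skip it (assumed --no-slice behavior).
# Remaining arguments are passed through to claude.
launch_cmd() {
  local use_slice=$1; shift
  if (( use_slice )); then
    LAUNCH=(systemd-run --user --scope --slice=claude-sandbox.slice -- claude "$@")
  else
    LAUNCH=(claude "$@")
  fi
}

launch_cmd 1 "$@"
printf '%q ' "${LAUNCH[@]}"; echo   # roughly what a dry run would print
# exec "${LAUNCH[@]}"               # a real wrapper would exec at this point
```

Using exec means the wrapper process is replaced by systemd-run, so no extra shell stays behind as the parent of the session.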

Inside that slice, two things happen in parallel:

Claude Code reads settings.local.json and activates its built-in /sandbox — bwrap + seccomp + namespace isolation + hostname-allowlisted proxy.
The kernel nftables rules (installed by the NixOS module) are evaluated for every outgoing packet from a socket owned by a process inside claude-sandbox.slice, and drop packets bound for internal CIDRs.
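For a rough picture of what such rules could look like, the bash sketch below prints an nftables ruleset that uses the `socket cgroupv2` match. Everything specific here is an assumption for illustration: the table name `claudebox_sketch`, the function name `gen_ruleset`, the uid, and the cgroup path layout under the user manager. The ruleset the NixOS module actually installs may be structured differently. The script only prints text and loads nothing into the kernel.

```shell
#!/usr/bin/env bash
# Sketch: print an nftables ruleset that drops egress to the given CIDRs for
# sockets whose cgroup sits under a given path. Names and paths are assumptions.

# $1: cgroup path relative to the cgroup2 root; remaining args: CIDRs to drop.
gen_ruleset() {
  local cgroup_path=$1; shift
  # socket cgroupv2 compares the ancestor cgroup at a given depth,
  # so the level is the number of components in the path.
  local parts level
  IFS=/ read -r -a parts <<<"$cgroup_path"
  level=${#parts[@]}

  printf 'table inet claudebox_sketch {\n'
  printf '  chain output {\n'
  printf '    type filter hook output priority filter; policy accept;\n'
  local cidr family
  for cidr in "$@"; do
    # Anything containing ":" is treated as IPv6, everything else as IPv4.
    [[ $cidr == *:* ]] && family=ip6 || family=ip
    # One rule per CIDR avoids overlapping-interval issues in a single set.
    printf '    socket cgroupv2 level %d "%s" %s daddr %s drop\n' \
      "$level" "$cgroup_path" "$family" "$cidr"
  done
  printf '  }\n}\n'
}

uid=1000  # hypothetical user id
gen_ruleset "user.slice/user-$uid.slice/user@$uid.service/claude-sandbox.slice" \
  100.64.0.0/10 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16 \
  fc00::/7 fe80::/10
```

Because the match is on the socket's cgroup rather than on a process name or port, it applies to every child process started inside the slice, not just to claude itself.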

Together: kernel-isolated process for the session, kernel-enforced CIDR block for the network, hostname allowlist on top.

Requirements

NixOS with flakes enabled (the NixOS module is the value-add; without it, claudebox only adds the hardened denyRead defaults on top of plain /sandbox).
jq, systemd-run, and claude on PATH (bundled via the flake's runtimeInputs).
cgroup v2 (default on every modern systemd setup).
A kernel with the nftables socket cgroupv2 match (default on NixOS).

License

MIT