claudebox/.planning/phases/01-minimal-viable-sandbox/01-02-PLAN.md

---
phase: 01-minimal-viable-sandbox
plan: 02
type: execute
wave: 2
depends_on: ["01-01"]
files_modified: []
autonomous: false
requirements:
  - NIX-03
  - SAND-02
  - SAND-03
  - SAND-04
  - SAND-05
  - SAND-06
  - SAND-09
  - SAND-10
  - SAND-12
  - SAND-13
  - SAND-14
  - TOOL-01
  - TOOL-02

must_haves:
  truths:
    - "`nix build` succeeds and produces a claudebox binary"
    - "claudebox launches and env inside sandbox contains only allowlisted vars"
    - "Secret paths are invisible inside the sandbox"
    - "DNS and SSL work (curl https succeeds)"
    - "comma and nix shell can install packages"
    - "Exit code passes through from claude to caller"
  artifacts: []
  key_links:
    - from: "nix build result"
      to: "claudebox binary"
      via: "result/bin/claudebox symlink"
      pattern: "result/bin/claudebox"
---

<objective>
Build the claudebox flake and verify the sandbox works end-to-end through automated smoke tests and manual verification.

Purpose: Confirm the sandbox actually isolates secrets, passes through tools, and runs Claude Code successfully.
Output: Verified working claudebox command.
</objective>

<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-minimal-viable-sandbox/01-CONTEXT.md
@.planning/phases/01-minimal-viable-sandbox/01-01-SUMMARY.md
@flake.nix
@claudebox.sh
</context>

<tasks>

<task type="auto">
  <name>Task 1: Build flake and run automated smoke tests</name>
  <files></files>
  <read_first>
    flake.nix
    claudebox.sh
  </read_first>
  <action>
Run the following commands sequentially, fixing any issues that arise:

**Step 1: Build the flake**
```bash
cd /home/toph/code/tools/claudebox
nix build
```
If this fails, read the error and fix `flake.nix` or `claudebox.sh` as needed. Common issues:
- shellcheck errors in claudebox.sh (fix the shell code)
- Missing flake.lock (nix build will create it on first run)
- Package name mismatches (verify against nixpkgs)

**Step 2: Verify the binary exists**
```bash
ls -la result/bin/claudebox
```

**Step 3: Run a minimal start-up test**
The built script execs `claude` inside bwrap, so this step does not exercise the sandbox in isolation. It confirms only that the script starts; mount and env isolation are verified manually in Task 2.

```bash
# Start the built script. If claude is on the host PATH it launches;
# if not, the script fails at the claude lookup, which is acceptable here.
result/bin/claudebox 2>&1 || true
```

Since claudebox requires `claude` in PATH and will exec into it, automated testing is limited. The key automated checks are:

1. `nix build` succeeds (shellcheck passes, all deps resolve)
2. `result/bin/claudebox` exists and is executable
3. The script content in the Nix store passes basic sanity: `cat result/bin/claudebox` shows the wrapper with correct PATH setup
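
Checks 2 and 3 can be bundled into one helper. This is a minimal sketch, not part of the plan: `check_wrapper` is a hypothetical name, and it assumes the generated wrapper sets PATH on a line containing `PATH=` (typical for wrappers built with `runtimeInputs`, but worth confirming against the actual output).

```bash
# Hypothetical helper: returns 0 only if the wrapper is executable and sets PATH.
check_wrapper() {
  local wrapper="$1"
  if [ ! -x "$wrapper" ]; then
    echo "FAIL: $wrapper is missing or not executable" >&2
    return 1
  fi
  # Assumption: the wrapper contains a line such as `export PATH="...:$PATH"`.
  if ! grep -q 'PATH=' "$wrapper"; then
    echo "FAIL: no PATH setup found in $wrapper" >&2
    return 1
  fi
  echo "PASS: $wrapper"
}
```

After a successful build, `check_wrapper result/bin/claudebox` should print a PASS line and return 0.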

Run:
```bash
# Check the built wrapper contains expected runtimeInputs in PATH
head -20 result/bin/claudebox
```

If `nix build` fails due to shellcheck issues in claudebox.sh, fix them. Common shellcheck fixes:
- SC2086: Double-quote variable expansions
- SC2034: Unused variables (may need `# shellcheck disable=SC2034` if intentional)
- SC2155: Declare and assign separately
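
For reference, a small sketch of what SC2086 and SC2155 guard against. The function and variable names are illustrative only and do not come from claudebox.sh.

```bash
# SC2086: an unquoted expansion is word-split, so one path becomes two arguments.
count_args() { echo "$#"; }
path_with_space="my project/dir"
count_args $path_with_space      # prints 2 (this is what shellcheck flags)
count_args "$path_with_space"    # prints 1

# SC2155: `local var="$(cmd)"` reports the status of `local`, hiding cmd's failure.
masked() {
  local out="$(false)"
  echo "$?"                      # prints 0: the failure of `false` is lost
}

# Fix: declare first, then assign, so the command's status is observable.
separate() {
  local out status=0
  out="$(false)" || status=$?
  echo "$status"                 # prints 1
}
```

If a warning is intentional, a targeted `# shellcheck disable=SCxxxx` comment directly above the line keeps the rest of the check active.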

After build succeeds, if `claude` is available on the host PATH, run a quick sandbox test:
```bash
# Quick test: launch claudebox with --help to verify it starts and exits cleanly
result/bin/claudebox --help 2>&1 | head -5 || true
```
This should show Claude Code's help output if everything is wired correctly, or show a meaningful error.
  </action>
  <verify>
    <automated>test -x /home/toph/code/tools/claudebox/result/bin/claudebox && echo "PASS: binary exists" || { echo "FAIL: binary missing"; false; }</automated>
  </verify>
  <acceptance_criteria>
    - `nix build` exits 0 (no shellcheck errors, all deps resolve)
    - `result/bin/claudebox` exists and is executable
    - `flake.lock` exists (created by first build)
    - The built wrapper script in the Nix store contains runtimeInputs PATH entries (visible in `cat result/bin/claudebox`)
  </acceptance_criteria>
  <done>nix build succeeds and produces an executable claudebox binary</done>
</task>

<task type="checkpoint:human-verify" gate="blocking">
  <name>Task 2: Manual sandbox verification</name>
  <files></files>
  <action>Present the verification checklist below to the user and wait for their confirmation that each check passes.</action>
  <what-built>Complete claudebox sandbox wrapping Claude Code with environment isolation, filesystem isolation, secret hiding, git support, and tool provisioning</what-built>
  <how-to-verify>
1. Launch claudebox from a project directory:
   ```bash
   cd ~/some-project
   /home/toph/code/tools/claudebox/result/bin/claudebox
   ```

2. Inside the Claude session, verify environment isolation:
   - Ask Claude to run `env | sort` -- should show ONLY allowlisted vars (HOME, PATH, TERM, USER, SHELL, TMPDIR, etc.)
   - Confirm that none of the following appear: SSH_AUTH_SOCK, AWS_PROFILE, GITHUB_TOKEN, or any other secret vars

3. Verify filesystem isolation:
   - Ask Claude to run `ls ~/.ssh` -- should fail (directory not found)
   - Ask Claude to run `ls ~/.gnupg` -- should fail
   - Ask Claude to run `ls ~/.aws` -- should fail
   - Ask Claude to run `ls ~/.claude` -- should succeed (mapped from ~/.claudebox)

4. Verify tools work:
   - Ask Claude to run `git status` -- should work in the project dir
   - Ask Claude to run `curl -s https://example.com | head -5` -- should return HTML (DNS + SSL work)
   - Ask Claude to run `, jq --help | head -3` -- should install and run jq via comma
   - Ask Claude to run `rg --version` -- should show ripgrep version

5. Exit Claude (Ctrl+C or /exit) and verify:
   - The shell returns to your normal prompt
   - `echo $?`, run as the very next command, shows the exit code from Claude (typically 0); any command run in between overwrites `$?`
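
The env check in step 2 can be made less error-prone by filtering `env` output against the allowlist instead of reading it by eye. This is a minimal sketch: `unexpected_vars` is a hypothetical helper, and its allowlist contains only the names listed in step 2, so it would need to be extended to match the real allowlist in `claudebox.sh`.

```bash
# Hypothetical helper: read `env`-style lines on stdin and print the names of
# variables that are not in the allowlist. Empty output means nothing leaked.
# Limitation: assumes no variable has a multi-line value.
unexpected_vars() {
  # Assumption: partial allowlist taken from step 2; extend to match claudebox.sh.
  local allowlist=" HOME PATH TERM USER SHELL TMPDIR "
  local line name
  while IFS= read -r line; do
    name="${line%%=*}"           # keep everything before the first '='
    case "$allowlist" in
      *" $name "*) ;;            # allowlisted: ignore
      *) echo "$name" ;;         # anything else is reported
    esac
  done
}
```

Inside the sandbox, `env | unexpected_vars` should then print nothing once the allowlist is complete.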
  </how-to-verify>
  <verify>
    <automated>echo "CHECKPOINT: requires human verification"</automated>
  </verify>
  <done>User confirms all sandbox isolation and tool provisioning checks pass</done>
  <resume-signal>Type "approved" if all checks pass, or describe any issues found</resume-signal>
</task>

</tasks>

<threat_model>
## Trust Boundaries

| Boundary | Description |
|----------|-------------|
| Build output -> Runtime | Nix build produces the sandbox script; verification confirms it behaves as designed |

## STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-01-08 | Information Disclosure | Env leak in built binary | mitigate | Manual verification (Task 2 step 2) confirms only allowlisted vars appear in `env` output inside sandbox |
| T-01-09 | Information Disclosure | Secret path accessible | mitigate | Manual verification (Task 2 step 3) confirms ~/.ssh, ~/.gnupg, ~/.aws are not visible |
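
The T-01-09 check could be run as a single command inside the sandbox. This is a sketch under assumptions: `assert_paths_hidden` is a hypothetical helper that is not part of claudebox, and it only tests existence, not whether a path is readable through some other mount.

```bash
# Hypothetical helper: succeed only if none of the given paths exist.
assert_paths_hidden() {
  local status=0 path
  for path in "$@"; do
    if [ -e "$path" ]; then
      echo "LEAK: $path is visible" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run inside the sandbox as `assert_paths_hidden ~/.ssh ~/.gnupg ~/.aws`; a zero exit status with no output matches the expectation in Task 2 step 3.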
</threat_model>

<verification>
1. `nix build` exits 0
2. Human confirms env isolation (only allowlisted vars visible)
3. Human confirms filesystem isolation (secret paths invisible)
4. Human confirms tools work (git, curl, comma, ripgrep)
5. Human confirms clean exit behavior
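
Item 5 depends on the exit code passing through from claude to the caller. Since claudebox execs into `claude`, the child's status becomes the wrapper's status with no extra handling. The sketch below illustrates that mechanism with a hypothetical stand-in, `run_wrapped`, not the real `claudebox.sh`.

```bash
# Hypothetical stand-in for a wrapper that ends in `exec`: the subshell is
# replaced by the command, so the command's exit status is what the caller sees.
run_wrapped() {
  ( exec "$@" )
}

run_wrapped sh -c 'exit 7' || echo "exit code: $?"   # prints "exit code: 7"
```

The same property is why `echo $?` after leaving claudebox reports Claude's own exit code.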
</verification>

<success_criteria>
- claudebox builds from the Nix flake without errors
- Human verifies the sandbox isolates secrets and provides working tools
- Phase 1 success criteria from ROADMAP.md are met
</success_criteria>

<output>
After completion, create `.planning/phases/01-minimal-viable-sandbox/01-02-SUMMARY.md`
</output>