toph/claudebox

Fork 0

Christopher Mühl 28798102d0

docs(02): research phase domain

2026-04-09 15:39:57 +02:00

16 KiB

Raw Blame History

Phase 2: Env Audit and CLI Polish - Research

Researched: 2026-04-09 Domain: Bash CLI UX -- flag parsing, ANSI formatting, TTY detection Confidence: HIGH

Summary

This phase adds pre-launch transparency and diagnostic CLI flags to the existing claudebox.sh script. All work is pure bash -- no new dependencies, no new Nix packages, no external tools. The existing code already has the data structures (ENV_ARGS, HOST_ALLOWLIST, CLAUDEBOX_EXTRA_ENV parsing) and the flag parsing skeleton (case/esac). The phase extends these with display logic, confirmation prompts, and two new early-exit modes (--dry-run, --check).

The main technical concerns are: (1) correctly iterating ENV_ARGS for display without breaking the bwrap invocation, (2) ANSI escape code portability, (3) non-interactive stdin detection, and (4) shellcheck compliance since writeShellApplication enforces it at build time.

Primary recommendation: Build parallel data structures for audit display rather than parsing the ENV_ARGS array. Track env vars in associative arrays by category during construction, then format for display separately from the bwrap --setenv pairs.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

D-01: Group env vars by source in three labeled sections: "Sandbox-generated", "Host (allowlisted)", "Extra (CLAUDEBOX_EXTRA_ENV)". Display to stderr.
D-02: PATH is split by : and displayed one entry per line (indented under PATH=) for readability.
D-03: Use plain ANSI escape codes for color/formatting -- no external dependency like gum. Bold section headers, colored section labels.
D-04: Auto-mask values where the variable name matches *KEY*, *TOKEN*, *SECRET*, *PASSWORD*, *CREDENTIAL* (case-insensitive). Show first 7 + last 4 characters with ... in between.
D-05: Use Proceed? [Y/n] prompt -- default is proceed (Enter launches). User must type n or no to abort.
D-06: If stdin is not a TTY (piped input, CI, scripts), abort with error telling user to pass --yes/-y. Do NOT auto-proceed.
D-07: All audit output and the prompt go to stderr. Keeps stdout clean.
D-08: --yes / -y skips the env audit display and confirmation entirely -- launches immediately.
D-09: --dry-run prints the full bwrap command without executing. Claude's discretion on format.
D-10: --check verifies prerequisites: bwrap exists, required Nix packages available, ~/.claudebox exists. Claude's discretion on depth/format.

Claude's Discretion

--dry-run output format (single-line vs multiline, annotated vs raw)
--check diagnostic depth (existence-only vs version checks vs connectivity tests)
Exact ANSI color choices and spacing
Flag parsing order and error messages for invalid flag combinations

Deferred Ideas (OUT OF SCOPE)

None -- discussion stayed within phase scope. </user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
UX-01	Pre-launch env audit displays all env vars being passed into the sandbox on stderr	Audit display patterns, ANSI formatting, ENV_ARGS iteration approach
UX-02	Pre-launch env audit prompts for confirmation before proceeding	TTY detection, read prompt pattern, non-interactive abort
UX-03	`--yes` / `-y` flag skips the env audit confirmation	Flag parsing extension to existing case/esac
UX-04	`--dry-run` flag prints the full bwrap command without executing	Command reconstruction from arrays, printf %q quoting
UX-05	`--check` flag verifies bwrap exists, required Nix packages available, and ~/.claudebox exists	command -v checks, exit code conventions
</phase_requirements>

Standard Stack

No new packages required. Everything is bash builtins and existing runtimeInputs.

Tool	Source	Purpose	Already Available
bash builtins	runtimeInputs	ANSI output, read, test, printf	Yes
jq	runtimeInputs	Not needed for this phase	Yes (unused)
coreutils	runtimeInputs	tput alternative -- but ANSI codes are simpler	Yes

Architecture Patterns

Integration Points in claudebox.sh

The current script flow is linear:

1. Parse flags (--shell)
2. Resolve binaries
3. Record CWD, ensure ~/.claudebox
4. Generate gitconfig
5. Build ENV_ARGS array
6. Build SANDBOX_CMD
7. exec bwrap

Phase 2 inserts into this flow:

1. Parse flags (--shell, --yes/-y, --dry-run, --check)  [EXTEND]
2. --check: early exit                                    [NEW]
3. Resolve binaries
4. Record CWD, ensure ~/.claudebox
5. Generate gitconfig
6. Build ENV_ARGS array
7. Build SANDBOX_CMD
8. Env audit display + confirmation (unless --yes)        [NEW]
9. --dry-run: print command and exit                      [NEW]
10. exec bwrap

Pattern: Parallel Display Data

Rather than parsing ENV_ARGS (which is --setenv key value triplets), maintain separate display-oriented arrays during construction. This avoids fragile parsing of the bwrap args array.

# During env construction, also track for display
declare -A SANDBOX_VARS    # sandbox-generated vars
declare -A HOST_VARS       # host allowlisted vars
declare -A EXTRA_VARS      # CLAUDEBOX_EXTRA_ENV vars

[VERIFIED: reading claudebox.sh -- the three categories already have distinct code blocks that can populate these]

Pattern: ANSI Escape Codes

# Color constants -- define once at top
BOLD=$'\033[1m'
RESET=$'\033[0m'
DIM=$'\033[2m'
CYAN=$'\033[36m'
YELLOW=$'\033[33m'
GREEN=$'\033[32m'
RED=$'\033[31m'

[ASSUMED] These are standard VT100/ECMA-48 sequences supported by all modern terminals. No tput dependency needed.

Pattern: Value Masking (D-04)

mask_value() {
  local name="$1" value="$2"
  # Case-insensitive match on var name
  if [[ "${name^^}" == *KEY* || "${name^^}" == *TOKEN* || "${name^^}" == *SECRET* || "${name^^}" == *PASSWORD* || "${name^^}" == *CREDENTIAL* ]]; then
    local len=${#value}
    if (( len > 11 )); then
      echo "${value:0:7}...${value: -4}"
    else
      echo "***"
    fi
  else
    echo "$value"
  fi
}

Note: ${name^^} converts to uppercase in bash 4+. NixOS ships bash 5.x, so this is safe. [VERIFIED: NixOS uses bash 5.x from nixpkgs]

Pattern: PATH Display (D-02)

display_path() {
  echo "  PATH="
  IFS=':' read -ra path_entries <<< "$1"
  for entry in "${path_entries[@]}"; do
    echo "    $entry"
  done
}

Pattern: TTY Detection (D-06)

if [[ -t 0 ]]; then
  # Interactive -- show prompt
  read -r -p "Proceed? [Y/n] " response < /dev/tty
  # ...
else
  echo "Error: stdin is not a terminal. Pass --yes or -y to skip confirmation." >&2
  exit 1
fi

Important: Use read < /dev/tty rather than plain read because stdin may be consumed by pipes even when /dev/tty exists. The [[ -t 0 ]] check catches the non-interactive case. [ASSUMED]

Pattern: --dry-run Output

Recommend multiline format with one flag per line, matching the existing exec bwrap layout in the script. This makes it easy to diff against the actual invocation and spot issues.

if [[ "$DRY_RUN" == true ]]; then
  echo "bwrap \\" >&2
  echo "  --clearenv \\" >&2
  # ... each flag on its own line
  exit 0
fi

Use printf '%q ' for values that may contain special characters (though in practice, env values and paths are clean).

Pattern: --check Diagnostics

Recommend checking:

command -v bwrap -- is bwrap on PATH
command -v claude -- is claude on PATH
Key runtimeInputs: git, curl, nix, bash
~/.claudebox directory exists
ANTHROPIC_API_KEY is set (warn if missing, don't fail)

Output: one line per check with pass/fail indicator. Exit 0 if all required checks pass, exit 1 if any required check fails.

Pattern: Flag Parsing Extension

The existing parser uses for arg in "$@" with shift. Extend with additional cases. Important: --check should be checked first (early exit before env construction), but parsing order can collect all flags first, then branch.

SKIP_AUDIT=false
DRY_RUN=false
CHECK_MODE=false
SHELL_MODE=false
CLAUDE_ARGS=()

for arg in "$@"; do
  case "$arg" in
    --yes|-y) SKIP_AUDIT=true ;;
    --dry-run) DRY_RUN=true ;;
    --check) CHECK_MODE=true ;;
    --shell) SHELL_MODE=true ;;
    --) shift; break ;;
    *) CLAUDE_ARGS+=("$arg") ;;
  esac
done

Note: Current parsing uses shift and break which is problematic for multi-flag support. The refactored approach collects all flags in one pass and stores remaining args in CLAUDE_ARGS. [VERIFIED: current claudebox.sh only handles --shell with shift+break]

Anti-Patterns to Avoid

Parsing ENV_ARGS for display: The array contains --setenv key value triplets interleaved with bwrap flags. Iterating it for display is fragile. Track display data separately.
Using tput for colors: Adds an ncurses dependency. ANSI escape codes are sufficient and have no dependency.
Auto-proceeding in non-interactive mode: D-06 explicitly requires aborting. Don't silently proceed.
Echoing sensitive values without masking: D-04 requires masking KEY/TOKEN/SECRET/PASSWORD/CREDENTIAL patterns.

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
Color output	ncurses/tput wrapper	Raw ANSI escapes	D-03 mandates plain ANSI, zero deps
Argument parsing	getopt/getopts	Simple case/esac loop	Only 4 flags, no complex options, matches existing pattern

Common Pitfalls

Pitfall 1: Shellcheck Violations

What goes wrong: writeShellApplication runs shellcheck at build time. Bash-isms like ${name^^} or declare -A may trigger warnings. Why it happens: shellcheck defaults may flag associative arrays or uppercase expansion. How to avoid: Use # shellcheck disable=SCXXXX directives only when necessary. Test with shellcheck claudebox.sh locally before nix build. Associative arrays and ${var^^} are valid bash and shellcheck-clean in bash mode. Warning signs: nix build fails with shellcheck errors.

Pitfall 2: Flag Parsing Breaking Passthrough

What goes wrong: Claudebox consumes a flag meant for claude, or fails to pass remaining args. Why it happens: Current parsing uses shift+break which only handles one flag. How to avoid: Refactor to collect all known flags, accumulate unknown args in CLAUDE_ARGS array, pass CLAUDE_ARGS to claude. Warning signs: claudebox --model sonnet silently drops --model.

Pitfall 3: read Prompt in Non-TTY

What goes wrong: read hangs or reads garbage when stdin is piped. Why it happens: No TTY check before prompting. How to avoid: Check [[ -t 0 ]] before read. Read from /dev/tty explicitly. Warning signs: Script hangs in CI or when piped.

Pitfall 4: Masking Short Values

What goes wrong: Masking "first 7 + last 4" on a 5-character value reveals the whole thing. Why it happens: No length check before substring extraction. How to avoid: If value length <= 11, show *** instead of partial mask. Warning signs: Short API keys fully visible in audit output.

Pitfall 5: ANSI Codes in Redirected Output

What goes wrong: If stderr is redirected to a file, ANSI escape codes pollute the output. Why it happens: Colors are sent regardless of terminal capability. How to avoid: Optional: check [[ -t 2 ]] and suppress colors if stderr is not a terminal. This is discretionary per D-03, but good practice. Warning signs: Garbled text in log files.

Code Examples

Complete Flag Parsing (recommended)

# Parse claudebox flags -- collect our flags, pass the rest to claude
SKIP_AUDIT=false
DRY_RUN=false
CHECK_MODE=false
SHELL_MODE=false
CLAUDE_ARGS=()

while (( $# > 0 )); do
  case "$1" in
    --yes|-y) SKIP_AUDIT=true ;;
    --dry-run) DRY_RUN=true ;;
    --check) CHECK_MODE=true ;;
    --shell) SHELL_MODE=true ;;
    --) shift; CLAUDE_ARGS+=("$@"); break ;;
    *) CLAUDE_ARGS+=("$1") ;;
  esac
  shift
done

[ASSUMED] This is standard bash argument parsing. The while/shift pattern is more robust than the current for/shift/break.

Env Audit Display Function

print_audit() {
  local bold=$'\033[1m' reset=$'\033[0m'
  local cyan=$'\033[36m' yellow=$'\033[33m' green=$'\033[32m'

  echo "${bold}${cyan}=== Sandbox Environment ===${reset}" >&2
  echo "" >&2

  echo "${bold}Sandbox-generated:${reset}" >&2
  for var in HOME USER PATH SHELL TMPDIR XDG_RUNTIME_DIR NIX_SSL_CERT_FILE SSL_CERT_FILE; do
    if [[ "$var" == "PATH" ]]; then
      echo "  ${green}PATH=${reset}" >&2
      IFS=':' read -ra entries <<< "$SANDBOX_PATH"
      for entry in "${entries[@]}"; do
        echo "    $entry" >&2
      done
    else
      # Look up value from the sandbox vars
      echo "  ${green}${var}=${reset}${SANDBOX_DISPLAY[$var]}" >&2
    fi
  done
  # ... similar for host and extra sections
}

--check Implementation

run_check() {
  local pass=true
  local green=$'\033[32m' red=$'\033[31m' reset=$'\033[0m'

  check_cmd() {
    if command -v "$1" &>/dev/null; then
      echo "${green}OK${reset}  $1" >&2
    else
      echo "${red}FAIL${reset}  $1 -- not found" >&2
      pass=false
    fi
  }

  echo "claudebox prerequisites:" >&2
  check_cmd bwrap
  check_cmd claude
  check_cmd git
  check_cmd nix

  if [[ -d "$HOME/.claudebox" ]]; then
    echo "${green}OK${reset}  ~/.claudebox exists" >&2
  else
    echo "${red}FAIL${reset}  ~/.claudebox -- not found (will be created on first run)" >&2
  fi

  if [[ -v ANTHROPIC_API_KEY ]]; then
    echo "${green}OK${reset}  ANTHROPIC_API_KEY is set" >&2
  else
    echo "${yellow}WARN${reset}  ANTHROPIC_API_KEY is not set" >&2
  fi

  if [[ "$pass" == true ]]; then
    exit 0
  else
    exit 1
  fi
}

Assumptions Log

#	Claim	Section	Risk if Wrong
A1	ANSI VT100 escape codes work in all target terminals	Architecture Patterns	Low -- NixOS terminals universally support ANSI
A2	`read < /dev/tty` is the correct pattern for prompting when stdin may be piped	Architecture Patterns	Low -- standard Unix practice
A3	`${var^^}` uppercase expansion is shellcheck-clean	Pitfalls	Low -- shellcheck knows bash, would only flag if shell directive is sh
A4	while/shift is more robust than for/shift/break for multi-flag parsing	Code Examples	Very low -- well-established pattern

Open Questions

Should --dry-run also show the env audit?
- What we know: D-09 says print the full bwrap command. The env audit is separate (D-01).
- What's unclear: Whether --dry-run implies --yes (skip audit) or shows audit then command.
- Recommendation: --dry-run implies --yes -- the user wants to see the command, not be prompted. Show the command and exit.
Should --check be combinable with other flags?
- What we know: --check is an early-exit diagnostic.
- What's unclear: What if user passes --check --dry-run.
- Recommendation: --check takes priority, exits before other flags matter. No need to error on combinations.
Color disable for non-terminal stderr?
- What we know: D-03 says use ANSI codes. Doesn't mention disabling.
- What's unclear: Whether to add NO_COLOR or [[ -t 2 ]] detection.
- Recommendation: Respect the NO_COLOR env var convention if set. Otherwise, always emit ANSI. This is low-effort and follows modern CLI conventions.

Sources

Primary (HIGH confidence)

claudebox.sh -- current implementation, read directly
flake.nix -- current derivation structure, read directly
02-CONTEXT.md -- locked decisions D-01 through D-10
REQUIREMENTS.md -- UX-01 through UX-05 definitions

Secondary (MEDIUM confidence)

Bash reference manual (training data) -- builtins, parameter expansion, test operators
VT100/ECMA-48 escape codes (training data) -- ANSI color sequences

Metadata

Confidence breakdown:

Standard stack: HIGH -- no new packages, pure bash
Architecture: HIGH -- clear insertion points in existing script, straightforward patterns
Pitfalls: HIGH -- well-known bash gotchas, verified against existing code

Research date: 2026-04-09 Valid until: No expiry -- bash and ANSI codes are stable

16 KiB Raw Blame History