What This Post Covers
This is a practical guide to running multiple Claude Code instances in parallel on a shared codebase, fully autonomous, with git as the synchronization layer. You’ll get:
- A loop that keeps Claude Code running autonomously
- Docker containers for isolation
- Git-based synchronization between agents
- A simple task-locking mechanism so agents don’t step on each other
- Scripts to manage the whole thing
The approach is intentionally minimal. No orchestration framework, no custom tooling beyond bash scripts and Docker.
How It Works
The architecture has four parts:
- A bash loop restarts Claude Code after every session. Each session picks up one task, completes it (or documents why it couldn’t), and exits. The loop gives it a fresh context window for the next task.
- Docker containers isolate each agent. Every agent gets its own container with its own clone of the repo. No shared filesystem, no stepping on each other’s files.
- A bare git repo acts as the synchronization layer. Agents push and pull just like developers would. Every container mounts this repo as a volume.
- Lock files in git prevent two agents from working on the same task. Before starting work, an agent commits and pushes a lock file. If another agent pushed a claim first, git rejects the push. After pulling, the agent sees the existing lock.
Project Structure
Here’s everything you’ll create. The numbered steps below walk through each file.
```
your-project/
├── run-agent.sh          # Step 1: Agent loop with backoff
├── AGENT_PROMPT.md       # Step 2: What Claude does each session
├── TODO.md               # Task list agents read and update
├── current_tasks/        # Lock files for active tasks
│   └── .gitkeep
├── session_logs/         # Per-session summaries
│   └── .gitkeep
├── Dockerfile            # Step 3: Container image
├── entrypoint.sh         # Step 3: Container startup
├── setup-upstream.sh     # Step 4: Bare repo initialization
├── .env                  # Step 5: API key
└── docker-compose.yml    # Step 5: Multi-agent orchestration
```

Step 1: The Agent Loop
Create run-agent.sh in the root of your project. This is the foundation — a bash loop that restarts Claude Code every time it finishes.
```bash
#!/bin/bash

PROMPT_FILE="${PROMPT_FILE:-AGENT_PROMPT.md}"
BACKOFF=0
MAX_BACKOFF=300

# Create the log directory up front so every write below has somewhere to go
mkdir -p agent_logs

while true; do
  # After a failed session, wait before starting the next one
  if [ "$BACKOFF" -gt 0 ]; then
    echo "[$(date)] Waiting ${BACKOFF}s before retry..." >> agent_logs/loop.log
    sleep "$BACKOFF"
  fi

  # Name each session's log after the start time and the current commit
  COMMIT=$(git rev-parse --short=6 HEAD)
  TIMESTAMP=$(date +%Y%m%d-%H%M%S)
  LOGFILE="agent_logs/agent_${TIMESTAMP}_${COMMIT}.log"

  claude --dangerously-skip-permissions \
    -p "$(cat "$PROMPT_FILE")" \
    --model claude-opus-4-6 &> "$LOGFILE"

  EXIT_CODE=$?
  echo "[$(date)] Session ended with code $EXIT_CODE" >> agent_logs/loop.log

  # Exponential backoff on failure (5s, 10s, 20s, ... capped at MAX_BACKOFF); reset on success
  if [ $EXIT_CODE -ne 0 ]; then
    BACKOFF=$(( BACKOFF == 0 ? 5 : BACKOFF * 2 ))
    BACKOFF=$(( BACKOFF > MAX_BACKOFF ? MAX_BACKOFF : BACKOFF ))
  else
    BACKOFF=0
  fi
done
```

```
$ chmod +x run-agent.sh
```

--dangerously-skip-permissions is what makes it autonomous. Claude won’t stop to ask for confirmation on file edits, shell commands, or anything else. That is why you run it in a container and not on your actual machine.
The --model flag is optional. If you leave it off, Claude Code uses whatever model you have configured. Opus is the most capable for complex tasks but also the most expensive. Sonnet works fine for smaller, well-scoped work.
Each session gets logged with a timestamp and the current commit hash, so you can trace what Claude did and when.
The backoff logic handles API outages and rate limits gracefully. If Claude exits with an error, the loop waits 5 seconds, then 10, 20, 40… up to 5 minutes. On a successful session, the delay resets to zero. Without this, a loop can burn through hundreds of failed sessions in minutes during an outage.
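If you want to check that schedule without sitting through real failures, you can pull the backoff arithmetic out of run-agent.sh into a small function. This is a sketch. next_backoff is a hypothetical helper name that the loop above doesn’t define. It takes the current delay, the session’s exit code, and an optional cap, which defaults to the loop’s 300 seconds.

```bash
#!/bin/bash

# Hypothetical helper mirroring the backoff arithmetic in run-agent.sh.
# Usage: next_backoff CURRENT_DELAY EXIT_CODE [MAX_DELAY]
next_backoff() {
  local current="$1" code="$2" max="${3:-300}" next
  # A successful session resets the delay
  if [ "$code" -eq 0 ]; then
    echo 0
    return
  fi
  # First failure waits 5s; each further failure doubles the wait
  next=$(( current == 0 ? 5 : current * 2 ))
  # Never wait longer than the cap
  echo $(( next > max ? max : next ))
}

# Demo: delays after eight consecutive failed sessions
delay=0
for _ in 1 2 3 4 5 6 7 8; do
  delay=$(next_backoff "$delay" 1)
  printf '%s ' "$delay"
done
echo
```

Eight failures in a row print 5 10 20 40 80 160 300 300. During a long outage the loop settles at one attempt every five minutes.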
Step 2: The Agent Prompt
Create AGENT_PROMPT.md in the root of your project. This is the prompt Claude receives at the start of every session.
```markdown
# Task

You are an autonomous agent working on [project description].

## Current State

Read TODO.md for the list of remaining tasks. Check session_logs/
for recent session summaries to understand what other agents have
been doing and what the current state of the project looks like.

## Instructions

1. Pull the latest changes from upstream
2. Read TODO.md and recent session logs
3. Pick ONE task that isn't locked by another agent (check current_tasks/)
4. Create a lock file: current_tasks/your-task-name.txt with your agent ID (the AGENT_ID environment variable) and timestamp
5. Commit and push the lock file immediately so other agents can see it. If the push is rejected, pull; if another agent has locked the same task, delete your lock and pick a different one
6. Work on the task until it's done or you're stuck
7. Run the test suite to make sure nothing is broken
8. Update TODO.md
9. Write a session summary to session_logs/ (see below)
10. Commit your changes, pull from upstream, resolve any conflicts, push
11. Remove your lock file, commit and push the removal, then stop

## Session Summary

Before finishing, write a short summary to
session_logs/YYYY-MM-DD-HHMMSS-agent-id.md containing:

- What task you worked on
- What you changed
- Whether it was completed or if it's still in progress
- Any issues you ran into or things the next agent should know

## Rules

- One task per session. Complete it or document why you couldn't, then stop.
- Do not break existing tests
- Keep commits small and focused
- If you're stuck on something for more than 3 attempts, document the issue in STUCK.md and move on
- Always pull before pushing
```

Each session handles exactly one task. Claude picks it up, works on it, writes a summary, and exits. The loop then restarts it with a fresh context window for the next task. This prevents context degradation. Claude gets less effective the longer a session runs, so short, focused sessions produce better results than long ones.
The key elements:
- TODO.md tracks what needs to be done. Every session starts by reading it, so each Claude instance knows what’s left.
- session_logs/ captures what happened in each session — including failed attempts and current project state. The next agent reads recent logs to understand context, learn from what didn’t work, and see how the project has evolved.
- current_tasks/ is the locking mechanism. Before starting a task, an agent creates a file here and pushes it. If two agents try to lock the same task simultaneously, the second agent’s push gets rejected — it pulls, sees the lock, and picks something else.
- The instruction to document failures in STUCK.md prevents agents from spinning on the same problem forever.
You’ll want to tune this prompt for your specific project. The more concrete your instructions, the better Claude stays on track.
Now create the directories agents expect and add a TODO.md with your tasks:
```
$ mkdir -p current_tasks session_logs
$ touch current_tasks/.gitkeep session_logs/.gitkeep
$ git add run-agent.sh AGENT_PROMPT.md current_tasks/.gitkeep session_logs/.gitkeep
$ git commit -m "init: add agent loop, prompt, and coordination directories"
```

Create a TODO.md with tasks for the agents to work on:
```markdown
# TODO

- [ ] Task one description
- [ ] Task two description
- [ ] Task three description
```

```
$ git add TODO.md
$ git commit -m "init: add task list"
```

Step 3: Docker Setup
Each agent runs in its own container with a clone of the repo. Create the Dockerfile:
```dockerfile
FROM node:22-bookworm

RUN apt-get update && apt-get install -y \
    git \
    curl \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Claude Code
RUN curl -fsSL https://claude.ai/install.sh | bash
# The native installer puts the claude binary in ~/.local/bin, which is not on root's default PATH here
ENV PATH="/root/.local/bin:${PATH}"

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]
```

And the entrypoint.sh that clones the repo and starts the loop:
```bash
#!/bin/bash
set -e

AGENT_ID="${AGENT_ID:-agent-$(hostname | head -c 8)}"
# Export so the Claude sessions started by the loop can read it
export AGENT_ID

# Clone from the shared bare repo
git clone /upstream /work
cd /work

git config user.name "$AGENT_ID"
git config user.email "${AGENT_ID}@agents.local"

echo "[$AGENT_ID] Starting agent loop..."
exec ./run-agent.sh
```

```
$ chmod +x entrypoint.sh
```

The base image is node:22-bookworm because Claude Code requires Node.js. The entrypoint clones from a shared bare git repo (mounted at /upstream) into /work. It then configures the git identity using the agent ID and hands off to the loop.
Step 4: The Shared Repository
The bare git repo is how agents share code. Every container mounts it as a volume. Create setup-upstream.sh:
```bash
#!/bin/bash

UPSTREAM_DIR="./upstream.git"

if [ ! -d "$UPSTREAM_DIR" ]; then
  git init --bare "$UPSTREAM_DIR"
  # Point the bare repo's HEAD at main so containers check out main when they clone
  git --git-dir="$UPSTREAM_DIR" symbolic-ref HEAD refs/heads/main
  echo "Created bare repo at $UPSTREAM_DIR"
fi

# Push your project into it (assumes your local branch is named main)
git remote add upstream "$UPSTREAM_DIR" 2>/dev/null
git push upstream main
echo "Pushed main branch to upstream"
```

```
$ chmod +x setup-upstream.sh
```

This creates a bare repository and pushes your current project into it. That includes run-agent.sh, AGENT_PROMPT.md, TODO.md, and the coordination directories. Every container will clone from this repo, and every push goes back to it.
Step 5: Docker Compose
Create a .env file with your Anthropic API key:
```
ANTHROPIC_API_KEY=sk-ant-...
```

Then the docker-compose.yml that ties everything together:
```yaml
services:
  agent-1:
    build: .
    environment:
      - AGENT_ID=agent-1
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream

  agent-2:
    build: .
    environment:
      - AGENT_ID=agent-2
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream

  agent-3:
    build: .
    environment:
      - AGENT_ID=agent-3
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream

  agent-4:
    build: .
    environment:
      - AGENT_ID=agent-4
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream
```

Each service gets its own container, its own agent ID, your API key, and a shared mount to the bare repo.
Task Locking
Worth understanding how the locking works before you run this. When an agent picks a task, it creates a file in current_tasks/ and pushes it immediately:
```
current_tasks/
├── fix-parser-error.txt     # locked by agent-1
├── add-type-checking.txt    # locked by agent-2
└── .gitkeep
```

Each lock file contains the agent ID and a timestamp:
```
agent: agent-1
started: 2026-02-16T14:30:00Z
description: Fixing the parser error on nested function calls
```

When the agent finishes, it removes the lock file, commits, and pushes. The next agent that pulls will see the task is available again if it wasn’t completed, or gone from TODO.md if it was.
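This isn’t bulletproof. If two agents try to lock the same task simultaneously, git rejects whichever push arrives second, because it is no longer a fast-forward of upstream. That agent then has to pull, at which point it sees the other lock and picks a different task. This only works if both agents choose the same lock filename. Suppose one writes fix-parser-error.txt and the other writes parser-fix.txt for the same task: the pull merges cleanly and both carry on. Git’s push rejection handles most cases, and Claude can usually sort out the rest from the lock files and session logs.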
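The prompt asks Claude to follow a claim-and-retry sequence. You can also script it, which is handy for testing the protocol or for giving agents a helper to call. This is a minimal sketch with some assumptions:

- The clone’s remote is named origin, as it is inside the containers, which clone from /upstream.
- The shared branch is main.
- The working tree is clean when the function runs.

claim_task is a hypothetical name. It is not part of Claude Code or git.

```bash
#!/bin/bash

# Hypothetical helper: claim_task TASK_NAME AGENT_ID [DESCRIPTION]
# Run from the root of a clean clone whose "origin" is the shared bare repo,
# ideally right after a pull. Returns 0 if this agent now holds the lock,
# 1 if the task is already taken (or the claim could not be pushed).
claim_task() {
  local task="$1" agent="$2" desc="${3:-}"
  local lock="current_tasks/${task}.txt"
  local attempt
  for attempt in 1 2 3; do
    # Someone already holds this lock (as of our last pull)
    if [ -f "$lock" ]; then
      echo "task '$task' already locked ($(grep '^agent:' "$lock"))" >&2
      return 1
    fi
    mkdir -p current_tasks
    printf 'agent: %s\nstarted: %s\ndescription: %s\n' \
      "$agent" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$desc" > "$lock"
    git add "$lock"
    git commit --quiet -m "lock: $task ($agent)"
    # A successful push means our lock commit is now on upstream main
    if git push --quiet origin HEAD:main 2>/dev/null; then
      return 0
    fi
    # Rejected: another commit landed first. Drop our lock commit,
    # fast-forward to upstream, and re-check on the next attempt.
    echo "push rejected on attempt $attempt, re-syncing" >&2
    git reset --quiet --hard HEAD~1
    git pull --quiet --ff-only origin main || return 1
  done
  return 1
}
```

A rejected push only means another commit landed first, not necessarily a competing lock. So the function drops its own lock commit, fast-forwards, and checks again before retrying. Like the prompt-driven version, it only detects a collision when both agents pick the same lock filename.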
Running It
With all the files in place, the full startup is three commands:
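```
$ ./setup-upstream.sh
$ docker compose build
$ docker compose up
```

To watch what’s happening:

```
# Follow a specific agent's container output (mostly startup messages;
# the loop writes session output to agent_logs/ inside the container)
$ docker compose logs -f agent-1

# Follow the loop log inside a container
$ docker compose exec agent-1 tail -F /work/agent_logs/loop.log

# See what tasks are currently locked
$ git --git-dir=upstream.git show HEAD:current_tasks/

# Find the latest commit that touched session_logs/
$ git --git-dir=upstream.git log -1 --oneline -- session_logs/
```

Scale down to fewer agents by commenting out services in the compose file, or scale up with the script in the next section.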
Everything below this line is optional. The core setup above is complete — you have agents running, coordinating, and pushing code. The sections that follow cover scaling, specialized roles, testing strategies, guardrails, and monitoring.
Scaling with a Script
If you want more than a handful of agents, generating the compose file by hand gets tedious. A script handles it:
```bash
#!/bin/bash

NUM_AGENTS="${1:-4}"
COMPOSE_FILE="docker-compose.yml"

cat > "$COMPOSE_FILE" <<EOF
services:
EOF

for i in $(seq 1 "$NUM_AGENTS"); do
  cat >> "$COMPOSE_FILE" <<EOF
  agent-${i}:
    build: .
    environment:
      - AGENT_ID=agent-${i}
      - ANTHROPIC_API_KEY=\${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream

EOF
done

echo "Generated $COMPOSE_FILE with $NUM_AGENTS agents"
echo "Run: docker compose up --build"
```

```
$ chmod +x spawn-agents.sh
$ ./spawn-agents.sh 16
Generated docker-compose.yml with 16 agents
Run: docker compose up --build
```

Specialized Agents
Not every agent needs the same prompt. You can dedicate agents to different roles — some fixing bugs, others refactoring, one maintaining documentation — using different prompt files. The PROMPT_FILE environment variable in run-agent.sh controls which prompt each agent uses.
```bash
#!/bin/bash

# Generate a compose file with specialized roles
cat > docker-compose.yml <<EOF
services:
EOF

# Main workers
for i in $(seq 1 8); do
  cat >> docker-compose.yml <<EOF
  worker-${i}:
    build: .
    environment:
      - AGENT_ID=worker-${i}
      - ANTHROPIC_API_KEY=\${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream

EOF
done

# Specialist agents with custom prompts
cat >> docker-compose.yml <<EOF
  quality:
    build: .
    environment:
      - AGENT_ID=quality
      - PROMPT_FILE=prompts/quality.md
      - ANTHROPIC_API_KEY=\${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream

  docs:
    build: .
    environment:
      - AGENT_ID=docs
      - PROMPT_FILE=prompts/docs.md
      - ANTHROPIC_API_KEY=\${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream

  tests:
    build: .
    environment:
      - AGENT_ID=tests
      - PROMPT_FILE=prompts/tests.md
      - ANTHROPIC_API_KEY=\${ANTHROPIC_API_KEY}
    volumes:
      - ./upstream.git:/upstream
EOF

echo "Generated docker-compose.yml with 8 workers + 3 specialists"
echo "Run: docker compose up --build"
```

A quality agent prompt might look like:
```markdown
# Task

You are a code quality agent. Your job is to review recent commits
and improve code quality.

## Instructions

1. Pull latest changes
2. Look at the last 10 commits for code smells, duplication, or bugs
3. If you find issues, fix them
4. Do not change functionality; only improve structure and clarity
5. Run tests to make sure nothing breaks
6. Commit and push
```

Commit the prompt files to your repo so agents can access them after cloning.
Writing Good Tests
This is the most important part. Autonomous agents will solve whatever the tests tell them to solve. If your tests are wrong or incomplete, Claude will confidently produce the wrong thing.
Some things that help:
Keep test output short. Claude’s context window is finite. A test suite that dumps 10,000 lines of output will drown out useful information. Print a summary, log details to a file.
```bash
#!/bin/bash

LOGFILE="test_results/$(date +%Y%m%d-%H%M%S).log"
mkdir -p test_results

# Run tests, capture full output to log
./test-suite.sh > "$LOGFILE" 2>&1

# Print only the summary
TOTAL=$(grep -c "^TEST" "$LOGFILE")
PASSED=$(grep -c "^TEST.*PASS" "$LOGFILE")
FAILED=$(grep -c "^TEST.*FAIL" "$LOGFILE")

echo "Tests: $PASSED/$TOTAL passed, $FAILED failed"

if [ "$FAILED" -gt 0 ]; then
  echo ""
  echo "Failures:"
  grep "^TEST.*FAIL" "$LOGFILE" | head -20
  echo ""
  echo "Full log: $LOGFILE"
  # Exit non-zero so callers (agents, git hooks) can tell the run failed
  exit 1
fi
```

Include a fast mode. A full test suite might take 30 minutes, and Claude doesn’t need to run the whole thing every iteration. A --fast flag that runs a random 10% sample keeps things moving while still catching regressions.
Use deterministic sampling. Each agent should run the same subset consistently (so it can identify regressions), but different agents should cover different subsets. Seed the random sample with the agent ID.
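Here is one way to get a sample that is stable for each agent but differs across agents. It’s a sketch, and it assumes test names arrive one per line on stdin. Instead of seeding a random number generator, it keeps a test when a checksum of "agent ID:test name" falls under the chosen percentage. That is hash-based sampling swapped in for a seeded RNG. It gives the same two properties without depending on any particular shuf or RANDOM behavior. sample_tests is a hypothetical helper.

```bash
#!/bin/bash

# Hypothetical helper: keep roughly PERCENT% of the test names read from stdin.
# Each decision depends only on (agent ID, test name): the same agent picks the
# same subset on every run, different agents pick different subsets, and adding
# new tests does not reshuffle the existing choices.
# Usage: <list of test names> | sample_tests AGENT_ID PERCENT
sample_tests() {
  local agent_id="$1" percent="$2" name hash
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    # cksum prints "<crc> <byte count>"; keep just the CRC as the hash
    hash=$(printf '%s:%s' "$agent_id" "$name" | cksum | cut -d' ' -f1)
    if [ $(( hash % 100 )) -lt "$percent" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

cksum is not a cryptographic hash, but here it only needs to spread names across buckets. A --fast mode could pipe its list of test names through `sample_tests "$AGENT_ID" 10` and run only what comes out. How your suite lists its tests is up to you.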
Prefix errors consistently. ERROR: description on a single line makes it easy for Claude to grep for problems. Don’t scatter error information across multiple lines or formats.
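A small wrapper can enforce that convention at the point where each check runs. This sketch assumes your harness can run each check as one command. It also emits the TEST ... PASS/FAIL lines that the summary script above counts. run_check is a hypothetical name, and keeping only the first line of a failing check’s output is a deliberate simplification.

```bash
#!/bin/bash

# Hypothetical helper: run one check and report it in a greppable format.
# Prints "TEST <name> PASS", or "TEST <name> FAIL" followed by a single
# "ERROR: <name>: <first line of the check's output>" line.
# Usage: run_check NAME COMMAND [ARGS...]
run_check() {
  local name="$1"
  shift
  local out
  if out=$("$@" 2>&1); then
    echo "TEST $name PASS"
  else
    echo "TEST $name FAIL"
    echo "ERROR: $name: $(printf '%s\n' "$out" | head -n 1)"
    return 1
  fi
}
```

With every failure on one line, `grep '^ERROR:' "$LOGFILE"` is enough for Claude to see everything that broke.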
Guarding the Repo
With multiple agents pushing code autonomously, it’s easy for one bad commit to cascade. A pre-receive hook on the bare repo runs the fast test suite before accepting any push. If tests fail, the push is rejected and the agent has to fix it first.
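```bash
#!/bin/bash
# Save as upstream.git/hooks/pre-receive. For pushes to a local path like
# /upstream, git runs this hook inside the pushing agent's container.

WORK_DIR=$(mktemp -d)
trap 'rm -rf "$WORK_DIR"' EXIT

while read -r oldrev newrev refname; do
  # Nothing to test when a branch is being deleted
  [[ "$newrev" =~ ^0+$ ]] && continue

  # Export the incoming tree without touching the bare repo's HEAD or index
  TREE_DIR=$(mktemp -d "$WORK_DIR/tree.XXXXXX")
  git archive "$newrev" | tar -x -C "$TREE_DIR"

  # Run the fast test suite, if the project has one
  if [ -f "$TREE_DIR/run-tests.sh" ]; then
    if ! (cd "$TREE_DIR" && ./run-tests.sh --fast) > "$TREE_DIR.log" 2>&1; then
      echo "ERROR: Tests failed on $refname. Push rejected."
      echo ""
      tail -20 "$TREE_DIR.log"
      exit 1
    fi
  fi
done
```

```
$ chmod +x upstream.git/hooks/pre-receive
```

This is the single highest-value addition to the whole setup. Without it, one agent pushes broken code and three other agents pull it and build on top of it. Now all four agents are working on a broken foundation.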
Stale Lock Cleanup
If an agent crashes mid-task — container dies, API error, whatever — its lock file stays in current_tasks/ forever. Other agents see the lock and avoid the task, so it never gets picked up again.
A simple cleanup script removes locks older than a threshold:
```bash
#!/bin/bash

MAX_AGE_HOURS="${1:-2}"
REPO_DIR="${2:-./upstream.git}"
WORK_DIR=$(mktemp -d)
trap 'rm -rf "$WORK_DIR"' EXIT

git clone "$REPO_DIR" "$WORK_DIR" 2>/dev/null
cd "$WORK_DIR" || exit 1

CLEANED=0

for lock in current_tasks/*.txt; do
  [ -f "$lock" ] || continue

  # Extract the timestamp from the lock file
  STARTED=$(grep "^started:" "$lock" | awk '{print $2}')
  if [ -z "$STARTED" ]; then
    continue
  fi

  # Compare with current time (GNU date); skip timestamps it can't parse
  LOCK_EPOCH=$(date -d "$STARTED" +%s 2>/dev/null) || continue
  NOW_EPOCH=$(date +%s)
  AGE_HOURS=$(( (NOW_EPOCH - LOCK_EPOCH) / 3600 ))

  if [ "$AGE_HOURS" -ge "$MAX_AGE_HOURS" ]; then
    echo "Removing stale lock: $lock (${AGE_HOURS}h old)"
    git rm --quiet "$lock"
    CLEANED=$((CLEANED + 1))
  fi
done

if [ "$CLEANED" -gt 0 ]; then
  git commit -m "cleanup: remove $CLEANED stale lock(s)"
  git push origin main
fi
```

This script uses GNU date -d, so run it on a Linux host or inside a container. Run it from cron every 30 minutes or so:
```
$ crontab -e
*/30 * * * * /path/to/cleanup-locks.sh 2 /path/to/upstream.git
```

Two hours is a reasonable default, and most tasks finish well within that. If you have longer-running tasks, bump the threshold.
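If you’d rather not depend on GNU date, you can skip the started: field and ask git when the lock file was committed. This is a sketch, and it assumes it runs inside a clone such as the cleanup script’s working copy. lock_age_seconds is a hypothetical helper. It uses the author timestamp, which survives rebases, but it still reflects the committing agent’s clock.

```bash
#!/bin/bash

# Hypothetical helper: seconds since the lock file was last committed.
# Uses the commit's author timestamp (%at) instead of parsing "started:",
# so it works with both GNU and BSD date.
# Usage: lock_age_seconds LOCK_PATH [NOW_EPOCH]
lock_age_seconds() {
  local lock="$1" now="${2:-$(date +%s)}" ts
  ts=$(git log -1 --format=%at -- "$lock")
  # No commit touches this path: nothing to report
  [ -n "$ts" ] || return 1
  echo $(( now - ts ))
}
```

In the cleanup loop, `AGE=$(lock_age_seconds "$lock") || continue` followed by a comparison against `$(( MAX_AGE_HOURS * 3600 ))` would replace the grep and date lines.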
Slack Notifications
A post-receive hook on the bare repo posts to Slack whenever an agent pushes. Lets you passively monitor progress from your phone without watching logs.
```bash
#!/bin/bash
# Save as upstream.git/hooks/post-receive

SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
REPO_NAME=$(basename "$(pwd)" .git)

# Escape backslashes and double quotes so commit text can't break the JSON
json_escape() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

while read -r oldrev newrev refname; do
  # Skip branch deletions
  [[ "$newrev" =~ ^0+$ ]] && continue

  BRANCH=${refname#refs/heads/}
  AUTHOR=$(git log -1 --format="%an" "$newrev" | json_escape)
  MESSAGE=$(git log -1 --format="%s" "$newrev" | json_escape)
  CHANGED=$(git diff --stat "$oldrev" "$newrev" 2>/dev/null | tail -1)

  PAYLOAD=$(cat <<EOF
{"text": "*${REPO_NAME}*: ${AUTHOR} pushed to \`${BRANCH}\`\n>${MESSAGE}\n\`\`\`${CHANGED}\`\`\`"}
EOF
)

  curl -s -X POST -H "Content-Type: application/json" \
    -d "$PAYLOAD" "$SLACK_WEBHOOK" > /dev/null
done
```

```
$ chmod +x upstream.git/hooks/post-receive
```

You get one message per push with the agent name, the commit message, and a quick diffstat. That’s enough to know things are moving without opening a terminal.
Monitoring
A monitoring script gives you the full picture at a glance:
```bash
#!/bin/bash

UPSTREAM="upstream.git"

echo "=== Agent Status ==="
echo ""

for container in $(docker compose ps -q); do
  NAME=$(docker inspect --format '{{.Name}}' "$container" | sed 's|^/||')
  STATUS=$(docker inspect --format '{{.State.Status}}' "$container")
  # Session activity goes to the loop log inside the container, not to container stdout
  LAST_LOG=$(docker exec "$container" tail -n 1 /work/agent_logs/loop.log 2>/dev/null)

  echo "$NAME [$STATUS]: ${LAST_LOG:-no sessions logged yet}"
done

echo ""
echo "=== Git Status ==="
echo "Commits in last hour: $(git --git-dir="$UPSTREAM" log --since='1 hour ago' --oneline | wc -l | tr -d ' ')"
echo "Latest commit: $(git --git-dir="$UPSTREAM" log -1 --oneline)"
echo ""
echo "=== Active Tasks ==="
git --git-dir="$UPSTREAM" ls-tree --name-only HEAD:current_tasks 2>/dev/null \
  | grep -v '\.gitkeep$' || echo "None"
```

Run it in a watch loop:
```
$ watch -n 30 ./monitor.sh
```

Cost
This burns through API credits fast. Some rough numbers based on Opus 4.6 pricing:
- A single agent session runs maybe 10-30 minutes depending on the task
- Each session uses roughly 50-200k input tokens and 5-20k output tokens
- With 4 agents running continuously for 8 hours, expect $200-800 depending on task complexity
Sonnet is significantly cheaper if your tasks don’t need Opus-level reasoning. For straightforward bug fixes, refactoring, or test writing, Sonnet handles it fine at a fraction of the cost.
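The arithmetic behind numbers like these is easy to redo with your own assumptions. First, sessions = agents × hours × 60 / minutes per session. Then cost = sessions × (input tokens × input price + output tokens × output price) / 1,000,000, with prices in dollars per million tokens. This sketch takes the prices as parameters. Take them from Anthropic’s current pricing page, since the function doesn’t know them. It also ignores prompt caching and any discounts. estimate_cost is a hypothetical helper.

```bash
#!/bin/bash

# Rough cost estimate for a fleet of agents (hypothetical helper).
# Usage: estimate_cost AGENTS HOURS MINUTES_PER_SESSION \
#          INPUT_TOKENS_PER_SESSION OUTPUT_TOKENS_PER_SESSION \
#          PRICE_IN_PER_MTOK PRICE_OUT_PER_MTOK
# Prices are dollars per million tokens; supply current values yourself.
estimate_cost() {
  awk -v agents="$1" -v hours="$2" -v mins="$3" \
      -v tin="$4" -v tout="$5" -v pin="$6" -v pout="$7" 'BEGIN {
    sessions = agents * hours * 60 / mins
    printf "%.2f\n", sessions * (tin * pin + tout * pout) / 1000000
  }'
}
```

For example, `estimate_cost 4 8 20 100000 10000 "$PRICE_IN" "$PRICE_OUT"` models the 4-agents-for-8-hours scenario above. It assumes mid-range session length and token counts, which works out to 96 sessions.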
Limitations
Worth being honest about where this falls apart:
- Merge conflicts get messy. Claude handles simple conflicts fine. Complex ones — especially in the same function — can produce broken merges. More agents means more conflicts.
- No real communication between agents. The shared files (TODO.md, session logs) are a rough approximation. Agents can’t discuss a design decision or coordinate on an approach.
- Claude can get stuck in loops. Without a human to course-correct, an agent might spend an entire session trying the same failing approach repeatedly. The STUCK.md convention helps, but doesn’t fully solve this.
- Tests are the bottleneck. The quality of autonomous output is directly limited by the quality of your test suite. No tests, no guardrails.
- Context window resets. Each new session starts fresh. Claude has to re-read project files every time, which wastes tokens and time. Session logs mitigate this but don’t eliminate it.
Wrapping Up
The core setup is five files: an agent loop, a prompt, a Dockerfile, a setup script, and a compose file. Everything else — scaling scripts, specialized agents, pre-receive hooks, monitoring — builds on top of that foundation.
The hard part isn’t the infrastructure — it’s writing good tests and prompts that keep agents productive without supervision. Most of your time should go into the test harness and the agent prompt, not the scaffolding.
Start small. Run one agent on a well-tested project and watch what it does. Add more agents once you’re confident the tests catch regressions. Scale up the ambition from there.