# Night-Orch Deployment Guide

This guide covers deploying night-orch on a Debian server behind Tailscale.
## Prerequisites

- Debian 13+ server accessible only via Tailscale
- Root access for one-time system setup (`useradd`, systemd, packages)
- Node.js 24+ installed (via `mise`, `nvm`, or a system package)
- Docker installed (`apt install docker.io docker-cli docker-compose`)
- Caddy installed (`apt install caddy`)
## Initial Setup

### 1. Create Dedicated Runtime User (Required)

Do not run night-orch as root. Create and use a dedicated service user:

```bash
useradd --create-home --shell /bin/bash orch
usermod -aG docker orch
sudo -iu orch
```

If you already switched users but `echo $HOME` does not print `/home/orch`, re-enter with:

```bash
sudo -iu orch
```

### 2. Install

```bash
npm install -g night-orch
```

Verify: `night-orch --help`
### 3. Install Agent CLIs (as orch)

Install both worker CLIs for the non-root user:

```bash
npm install -g @openai/codex @anthropic-ai/claude-code
codex --version
claude --version
```

Authenticate interactively (one time):

```bash
codex login
claude auth
```

If any command still references `/root/...`, your session is not a clean orch login shell. Re-enter with `sudo -iu orch` and retry.
### 4. Configuration

Run the interactive setup wizard:

```bash
night-orch init   # copies the example config and walks through setup
```

Or configure manually:

```bash
mkdir -p ~/.night-orch
# then create ~/.night-orch/config.yaml by hand
```

Create the environment file with secrets:

```bash
cat > ~/.night-orch/.env << 'EOF'
GITHUB_TOKEN=ghp_...
GRAFANA_ADMIN_PASSWORD=changeme
EOF
chmod 0600 ~/.night-orch/.env
```

### 5. Monitoring Stack (Prometheus + Grafana)
Night-orch bundles Prometheus and Grafana configs. Extract them:

```bash
night-orch monitoring init
```

This creates `~/.config/night-orch/monitoring/` with a Docker Compose file, a Prometheus scrape config, and Grafana dashboards.

For production, edit the compose file to bind ports to localhost only:

```yaml
services:
  prometheus:
    ports:
      - "127.0.0.1:9091:9090"  # not "9091:9090"
  grafana:
    ports:
      - "127.0.0.1:3001:3000"
```

If using `network_mode: host` for Prometheus (to scrape host-local metrics without firewall issues), set `--web.listen-address=127.0.0.1:9091` in the container command.

Start the monitoring stack:

```bash
night-orch monitoring up
```

### 6. AppArmor Fix for Docker
On Debian 13, the runc AppArmor profile breaks Docker containers:

```bash
aa-disable runc
```

This must be re-applied after `aa-enforce /etc/apparmor.d/*` or after reboots that re-enforce profiles.
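Since the workaround does not survive re-enforcement, one option is a small oneshot unit that re-applies it at boot. This is a sketch, not part of night-orch; the unit name and the `aa-disable` path are assumptions (verify the path with `command -v aa-disable`):

```ini
# /etc/systemd/system/disable-runc-apparmor.service (hypothetical name)
[Unit]
Description=Disable the runc AppArmor profile (Docker workaround)
After=apparmor.service
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/aa-disable runc

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable disable-runc-apparmor`.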
## Running

### Supervisor Mode (Recommended)

The `night-orch serve` command runs a supervisor that manages both the poller and the web server as child processes, with self-update support:

```bash
night-orch serve --allowed-host night-orch.hllg.eu
```

This spawns:

- `night-orch run` — headless poller + MCP + metrics
- `night-orch web` — web UI (attach mode)

The supervisor handles:

- Auto-respawn on child crash (with exponential backoff)
- Graceful drain and restart on self-update
- Rollback if an update fails or if post-update health checks fail
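The respawn backoff can be pictured with a small sketch. The base delay, cap, and doubling schedule below are illustrative assumptions, not night-orch's actual constants:

```bash
# Hypothetical exponential backoff: the wait doubles per consecutive crash,
# capped at 60s. Real night-orch constants may differ.
backoff_delay() {
  local attempt=$1 base=2 cap=60
  local delay=$(( base << (attempt - 1) ))   # base * 2^(attempt-1)
  (( delay > cap )) && delay=$cap
  echo "$delay"
}

for i in 1 2 3 4 5 6; do
  printf 'crash %d -> wait %ss\n' "$i" "$(backoff_delay "$i")"
done
```

A supervisor using this schedule waits 2s after the first crash and settles at the 60s cap from the sixth consecutive crash onward.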
### Systemd Unit

Create `/etc/systemd/system/night-orch.service`:

```ini
[Unit]
Description=Night-Orch Supervisor
After=network-online.target wait-for-tailscale.service docker.service
Wants=network-online.target
Requires=wait-for-tailscale.service

[Service]
Type=simple
User=orch
Group=orch
WorkingDirectory=/home/orch
Environment=HOME=/home/orch
Environment=XDG_CONFIG_HOME=/home/orch/.config
Environment=XDG_DATA_HOME=/home/orch/.local/share
Environment=XDG_CACHE_HOME=/home/orch/.cache
Environment=XDG_STATE_HOME=/home/orch/.local/state
Environment=PATH=/home/orch/.local/bin:/usr/local/bin:/usr/bin:/bin
ExecStartPre=/home/orch/.local/bin/night-orch monitoring up
ExecStart=/home/orch/.local/bin/night-orch serve --allowed-host night-orch.hllg.eu --config /home/orch/.night-orch/config.yaml
EnvironmentFile=-/home/orch/.night-orch/.env
Restart=on-failure
RestartSec=5
MemoryMax=6G
CPUQuota=300%

[Install]
WantedBy=multi-user.target
```

Enable and start it:

```bash
systemctl daemon-reload
systemctl enable --now night-orch
```

### Ports (all localhost except Caddy)
| Port | Service | Binding |
|---|---|---|
| 3200 | night-orch web UI | 127.0.0.1 |
| 3100 | MCP HTTP/SSE | 127.0.0.1 |
| 9090 | night-orch metrics | 0.0.0.0 (for Prometheus) |
| 9091 | Prometheus | 127.0.0.1 |
| 3001 | Grafana | 127.0.0.1 |
| 443 | Caddy (HTTPS) | Tailscale IP |
## Reverse Proxy (Caddy)

Caddy terminates TLS and provides basic auth. Configure `/etc/caddy/Caddyfile`:

```caddyfile
{
    auto_https off
}

night-orch.example.com {
    bind <TAILSCALE_IP>
    tls /etc/caddy/certs/cert.crt /etc/caddy/certs/cert.key
    reverse_proxy 127.0.0.1:3200
    basicauth {
        admin <BCRYPT_HASH>
    }
}

monitoring.example.com {
    bind <TAILSCALE_IP>
    tls /etc/caddy/certs/cert.crt /etc/caddy/certs/cert.key
    reverse_proxy 127.0.0.1:3001
    basicauth {
        admin <BCRYPT_HASH>
    }
}
```

Generate the password hash with `caddy hash-password`.
### TLS Certificates

Since the server has no public IP (Tailscale only), the standard HTTP-01 ACME challenge won't work. Options:

- **Own CA:** issue certs from your own CA and place them at `/etc/caddy/certs/`
- **DNS-01 ACME:** build Caddy with a DNS plugin (e.g., cloudflare) and use a scoped API token for automatic Let's Encrypt certs
- **Tailscale HTTPS:** use `tailscale cert` for `*.ts.net` domains
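For a quick smoke test before wiring up one of the options above, a throwaway self-signed certificate works (the CN and output paths below are placeholders; browsers will warn about it):

```bash
# Throwaway self-signed cert, valid one year, no passphrase on the key.
# Not for production: use one of the TLS options described in this guide.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=night-orch.example.com" \
  -keyout cert.key -out cert.crt

# Confirm the subject before copying into /etc/caddy/certs/:
openssl x509 -in cert.crt -noout -subject
```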
### Caddy Systemd Drop-in

Caddy must wait for Tailscale, since it binds to the Tailscale IP:

```bash
mkdir -p /etc/systemd/system/caddy.service.d
cat > /etc/systemd/system/caddy.service.d/tailscale.conf << 'EOF'
[Unit]
After=wait-for-tailscale.service
Requires=wait-for-tailscale.service

[Service]
MemoryMax=512M
CPUQuota=50%
EOF
systemctl daemon-reload
```

## Self-Update
Night-orch supports a one-button self-update that installs the latest version and restarts all services without downtime.

### From the Web UI

Click the **Pull & Restart** button in the Deploy section of the Operations panel. The button shows the current update state (draining, pulling, building, restarting, health-checking).

### From the CLI

```bash
night-orch update
```

If running under the supervisor, this sends an IPC message. Otherwise, it creates a trigger file that the supervisor picks up.

### From MCP (Claude Code)

Use the `night-orch-update` tool.
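The trigger-file fallback amounts to a simple handshake, sketched below. The directory and filename are illustrative assumptions, not night-orch's actual paths:

```bash
# Hypothetical trigger-file handshake: the CLI drops a marker file, the
# supervisor polls for it and consumes it exactly once.
STATE_DIR=$(mktemp -d)                 # stand-in for the real state directory
TRIGGER="$STATE_DIR/update-requested"

touch "$TRIGGER"                       # the CLI side

# The supervisor side: remove the marker before acting, so a crash during
# the update cannot retrigger it on the next poll.
if [ -f "$TRIGGER" ]; then
  rm -f "$TRIGGER"
  echo "starting update"
fi
```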
### Update Flow

The update mechanism auto-detects how night-orch was installed.

**npm global install (default):**

- Supervisor receives the update trigger
- Sends SIGTERM to both children (waits up to 5 min for active runs to finish)
- Checks the npm registry for the latest version
- Runs `npm install -g night-orch@latest`
- Respawns both children with the new code
- Runs health checks (see below)
- On failure: rolls back via `npm install -g night-orch@<previous-version>`

**Git checkout (development):**

- Supervisor receives the update trigger
- Sends SIGTERM to both children
- Runs `git pull --ff-only`
- Runs `pnpm install && pnpm build && pnpm install-global`
- Respawns both children with the new code
- Runs health checks (see below)
- On failure: rolls back to the previous commit, rebuilds, respawns

**Health checks (both modes):**

- run server: `/health` on the MCP HTTP endpoint when MCP is enabled, otherwise process liveness stabilization
- web API: `/api/health`
- web frontend: `/`

### Update Status

Status is tracked at `~/.config/night-orch/update-status.json` and is available via `GET /api/update-status`. States: `idle`, `draining`, `pulling`, `building`, `restarting`, `health-checking`, `rolling-back`, `failed`.
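For scripting against the status file, something like this works. The JSON below is a fabricated sample; only the `state` field and its values come from this guide, and the real file may carry additional fields:

```bash
# Extract the current state from a sample status file without jq.
STATUS_FILE=$(mktemp)
printf '%s\n' '{"state":"health-checking","error":null}' > "$STATUS_FILE"

# Pull out the "state" value: match the key/value pair, then take the
# fourth double-quote-delimited field.
state=$(grep -o '"state":"[^"]*"' "$STATUS_FILE" | cut -d'"' -f4)
echo "$state"
```

In a watch loop, polling this (or `GET /api/update-status`) until the state returns to `idle` or hits `failed` is a reasonable way to script around updates.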
## Troubleshooting

### Port already in use

If you are switching from separate night-orch-server + night-orch-web services to the supervisor, stop the old units first:

```bash
systemctl stop night-orch-server night-orch-web
systemctl disable night-orch-server night-orch-web
```

### Docker containers won't start
Check whether the runc AppArmor profile is enforced:

```bash
aa-status | grep runc
# If it shows "enforce", disable it:
aa-disable runc
systemctl restart docker
```

### Web UI shows "Forbidden host"
Add `--allowed-host <your-domain>` to the `night-orch serve` command. The web server validates the Host header against the allowed hostnames.
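The check amounts to an exact-match allow-list; a minimal sketch of that behavior (the function is hypothetical, the real validation lives inside the web server):

```bash
# Hypothetical exact-match Host allow-list, mirroring the behavior described
# above: any Host header not in the list is rejected as "Forbidden host".
host_allowed() {
  local host=$1; shift
  local allowed
  for allowed in "$@"; do
    [ "$host" = "$allowed" ] && return 0
  done
  return 1
}

host_allowed "night-orch.example.com" "night-orch.example.com" \
  && echo "allowed"
host_allowed "evil.example.com" "night-orch.example.com" \
  || echo "forbidden"
```

Note this implies subdomains and `host:port` variants do not match implicitly; each served hostname needs its own `--allowed-host`.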
### Grafana shows "no data"

Use this order so you can quickly distinguish "no runs yet" from a broken scrape:

```bash
night-orch doctor
```

Doctor probes the metrics `/healthz` endpoint and reports one of:

- `ok` (endpoint alive)
- `not-ready` (startup in progress)
- `connection-refused` (likely no `night-orch run` process)
- `timeout` (handler/network stall)
- `disabled-runtime` (runtime override disabled metrics)

Then verify the process model:

- `night-orch run` always owns metrics.
- `night-orch web` in default attach mode does not bind `:9090`.
- `night-orch web --standalone` binds metrics itself.

If you run attach-mode web without a companion run daemon, Grafana panels stay empty.

Next, verify Prometheus target health:

```bash
curl http://127.0.0.1:9091/api/v1/targets
```

If the target is down:

- Confirm the Prometheus scrape target matches the daemon metrics port (default `9090`).
- Confirm `metrics.host` is reachable from Prometheus. For the default Docker stack this should be `0.0.0.0`.
- If your scrape config uses `host.docker.internal`, ensure your runtime supports it. On Linux setups without Docker Desktop you may need an explicit `extra_hosts` mapping or a concrete host IP instead.
- Re-check your compose port mapping in `docker-compose.example.yaml` (line 5): `127.0.0.1:9091:9090` maps host `9091` to container `9090` (the Prometheus UI), not to night-orch metrics.
- Ensure reverse proxies do not intercept or rewrite `:9090`; Prometheus should scrape night-orch directly, not through Caddy/nginx routes meant for web UI traffic.
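Putting the checklist above together, a scrape job consistent with the ports table might look like this. The job name and file layout are assumptions; compare against the generated config under `~/.config/night-orch/monitoring/`:

```yaml
# prometheus.yml fragment (illustrative). On Linux without Docker Desktop,
# replace host.docker.internal with the host's IP, or add an extra_hosts
# mapping for it in the compose file.
scrape_configs:
  - job_name: night-orch
    static_configs:
      - targets: ["host.docker.internal:9090"]   # night-orch metrics, not :9091
```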