diff --git a/playbooks/service-golive.md b/playbooks/service-golive.md
new file mode 100644
index 0000000..54be4b4
--- /dev/null
+++ b/playbooks/service-golive.md
@@ -0,0 +1,145 @@
+# Playbook: Service Go-Live Review
+
+Use this playbook before exposing any service to external access through Nginx Proxy Manager (NPM).
+When invoked, read the project directory in the current working directory and work through each section as an interactive checklist.
+
+---
+
+## How to Use
+
+Tell the AI: _"Use the service-golive playbook to review this project"_
+
+The AI will:
+1. Read the project files in the current directory
+2. Work through each section below
+3. For each item, report PASS, FAIL, or WARN with specific findings
+4. At the end, give a go/no-go recommendation
+
+Do not proceed to the next section until the current one is resolved or explicitly deferred.
+
+---
+
+## Section 1: Feature & Improvement Review
+
+Goal: Catch missing functionality before users find it.
+
+- [ ] Does the service have a health check endpoint (e.g. `/health` or `/ping`)?
+- [ ] Are all intended routes/endpoints implemented and reachable?
+- [ ] Is there a meaningful error response for bad input (not raw stack traces)?
+- [ ] Is the UI (if applicable) free of obvious UX gaps and incomplete flows?
+- [ ] Is there logging in place to capture errors and key events?
+- [ ] Is the code free of TODO/FIXME/HACK comments that indicate unfinished work?
+- [ ] Does the service handle its own startup failures gracefully (exits cleanly and logs the reason)?
+
+**AI Action:** List any gaps found with file and line references. Ask the user whether to fix now or defer.
+
+---
+
+## Section 2: Performance Review
+
+Goal: Ensure the service won't collapse under real load.
+
+- [ ] Are database queries using indexes on columns used in WHERE/JOIN/ORDER BY clauses?
+- [ ] Is the code free of N+1 query patterns (a loop that fires a query per item)?
+- [ ] Is connection pooling configured for the database?
+- [ ] Are large responses paginated?
+- [ ] Are blocking operations (file I/O, external API calls) kept off the event loop in async code?
+- [ ] Are static assets (if any) served through Nginx, not the app?
+- [ ] Is all data loaded into memory bounded (e.g. no `SELECT *` without a limit)?
+- [ ] Are background tasks or scheduled jobs using a proper queue/worker model (not threading hacks)?
+- [ ] Is Gzip/Brotli compression enabled in Nginx for text responses?
+
+**AI Action:** Flag any issues with specific file references. Suggest fixes. Ask the user to confirm or defer.
+
+---
+
+## Section 3: Security Audit
+
+Goal: Do not put a vulnerable service on the internet. Be thorough.
+
+### 3a. Secrets & Credentials
+- [ ] No hardcoded passwords, tokens, API keys, or secrets in any source file
+- [ ] `.env` file is in `.gitignore` and not committed
+- [ ] `.env.example` exists with placeholder values only
+- [ ] No secrets in Docker Compose files (use `env_file` or environment variable references, not literal values)
+- [ ] No secrets in Nginx config files
+
+### 3b. Authentication & Authorization
+- [ ] All non-public endpoints require authentication
+- [ ] Authentication tokens/sessions have an expiry
+- [ ] Password hashing uses bcrypt, argon2, or scrypt (not MD5/SHA1)
+- [ ] There is no default admin password that ships with the service
+- [ ] Role/permission checks exist if the app has multiple access levels
+- [ ] Failed login attempts are rate-limited, or the account is locked after N failures
+
+### 3c. Input Validation & Injection
+- [ ] All user input is validated server-side (not just client-side)
+- [ ] SQL queries use parameterized statements or an ORM, with no string concatenation
+- [ ] File upload paths are sanitized so that path traversal is not possible
+- [ ] HTML output is escaped to prevent XSS (or a framework handles this automatically)
+- [ ] Redirects only go to allowed/relative URLs, so there is no open redirect
+- [ ] JSON deserialization does not allow arbitrary object instantiation
+
+### 3d. HTTP & Nginx Security Headers
+Verify the Nginx config for the proxy host includes:
+- [ ] `X-Frame-Options: DENY` or `SAMEORIGIN`
+- [ ] `X-Content-Type-Options: nosniff`
+- [ ] `X-XSS-Protection: 0`, or the header omitted (the legacy `1; mode=block` filter is deprecated; rely on `Content-Security-Policy` instead)
+- [ ] `Referrer-Policy: strict-origin-when-cross-origin`
+- [ ] `Content-Security-Policy` header defined (even if broad to start)
+- [ ] `Strict-Transport-Security` (HSTS) with `max-age` >= 31536000
+- [ ] Server version header suppressed (`server_tokens off`)
+- [ ] Unnecessary HTTP methods disabled (e.g. TRACE, DELETE if not used)
+
+### 3e. TLS / HTTPS
+- [ ] TLS certificate is valid and not self-signed for production
+- [ ] HTTP traffic redirects to HTTPS (not served in parallel)
+- [ ] TLS 1.0 and 1.1 are disabled; only TLS 1.2+ is allowed
+- [ ] Weak cipher suites disabled
+- [ ] Certificate expiry is monitored (NPM auto-renews, but verify it's configured)
+
+### 3f. Docker & Container Security
+- [ ] Containers do not run as root (check `user:` in Compose or Dockerfile `USER` instruction)
+- [ ] No container has `privileged: true` unless there is a documented reason
+- [ ] No unnecessary host volume mounts (especially `/var/run/docker.sock` unless intentional)
+- [ ] Container images are not using `latest` tag in production
+- [ ] Docker socket is not exposed to the external network
+- [ ] Resource limits (`mem_limit`, `cpus`) are set on containers
+
+**AI Action:** Run the following tools if available:
+- `bandit -r . -ll`: Python static security analysis
+- `trivy fs . --severity HIGH,CRITICAL`: dependency and filesystem CVE scan
+- `docker scout cves `: container image vulnerability scan
+
+Report all FAIL/WARN findings. Do not proceed to the go-live recommendation until critical issues are resolved.
+
+### 3g. Network & Exposure
+- [ ] Only ports 80 and 443 are exposed publicly; no app ports (e.g. 8000, 3000) are directly open to the internet
+- [ ] NPM proxy host has an access list or basic auth if the service is internal-only
+- [ ] Rate limiting is configured in Nginx or the app for API endpoints
+- [ ] The service does not expose an admin panel (e.g. `/admin`, `/dashboard`) without additional auth
+- [ ] Database ports (3306, 5432, 6379) are NOT exposed beyond the Docker network
+- [ ] SSH is not running inside any container
+
+### 3h. Dependency & Supply Chain
+- [ ] Dependencies are pinned to specific versions (not `*` or `latest`)
+- [ ] No known CVEs in dependencies (run `trivy fs .` or `pip-audit` / `npm audit`)
+- [ ] No abandoned or unmaintained packages with known issues
+- [ ] Docker base images are from official/verified sources
+
+---
+
+## Section 4: Go-Live Decision
+
+After all sections are complete:
+
+- List all unresolved findings grouped by severity: **CRITICAL / HIGH / MEDIUM / LOW**
+- **CRITICAL or HIGH unresolved = NO GO.** These must be fixed before external access.
+- **MEDIUM/LOW unresolved** = user decides whether to defer with documented acceptance
+- Provide a final summary:
+  - Total checks: X
+  - Passed: X
+  - Failed (critical): X
+  - Failed (non-critical): X
+  - Deferred: X
+  - **Recommendation: GO / NO GO / GO WITH CONDITIONS**
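
The Section 4 decision rule can be sketched in Python to make the tally and the recommendation concrete. This is a minimal illustration under stated assumptions, not part of the playbook: the `Finding` class, the `summarize` function, and the dictionary keys are hypothetical names; "Failed (critical)" is assumed to mean failures rated CRITICAL or HIGH; and WARN results are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# Assumption: these two severities are what the summary calls "critical".
BLOCKING = {Severity.CRITICAL, Severity.HIGH}


@dataclass
class Finding:
    """One checklist result (hypothetical structure for illustration)."""
    check: str
    passed: bool
    severity: Severity = Severity.LOW  # only meaningful when passed is False
    deferred: bool = False             # user accepted the risk (MEDIUM/LOW only)


def summarize(findings):
    """Tally the checklist and derive GO / NO GO / GO WITH CONDITIONS."""
    failed = [f for f in findings if not f.passed]
    critical = [f for f in failed if f.severity in BLOCKING]
    non_critical = [f for f in failed if f.severity not in BLOCKING]

    if critical:
        # Any unresolved CRITICAL or HIGH finding blocks go-live.
        recommendation = "NO GO"
    elif non_critical:
        # Only MEDIUM/LOW findings remain; the user decides on deferral.
        recommendation = "GO WITH CONDITIONS"
    else:
        recommendation = "GO"

    return {
        "total": len(findings),
        "passed": len(findings) - len(failed),
        "failed_critical": len(critical),
        "failed_non_critical": len(non_critical),
        "deferred": sum(1 for f in non_critical if f.deferred),
        "recommendation": recommendation,
    }
```

For example, `summarize([Finding("HSTS header", False, Severity.HIGH)])` yields a `NO GO` recommendation, while a list containing only passed checks yields `GO`. Setting `deferred` on a CRITICAL or HIGH failure has no effect in this sketch, which mirrors the rule that those findings must be fixed rather than accepted.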