# Playbook: Service Go-Live Review
Use this playbook before exposing any service to external access through Nginx Proxy Manager (NPM).
When invoked, read the project directory in the current working directory and work through each section as an interactive checklist.

---
## How to Use
Tell the AI: _"Use the service-golive playbook to review this project"_

The AI will:
1. Read the project files in the current directory
2. Work through each section below
3. For each item — report PASS, FAIL, or WARN with specific findings
4. At the end, give a go/no-go recommendation

Do not proceed to the next section until the current one is resolved or explicitly deferred.

---
## Section 1: Feature & Improvement Review
Goal: Catch missing functionality before users find it.
- [ ] Does the service have a health check endpoint (e.g. `/health` or `/ping`)?
- [ ] Are all intended routes/endpoints implemented and reachable?
- [ ] Is there a meaningful error response for bad input (not raw stack traces)?
- [ ] No obvious UX gaps or incomplete flows remain in the UI (if applicable)
- [ ] Is there logging in place to capture errors and key events?
- [ ] No TODO/FIXME/HACK comments remain in the code that indicate unfinished work
- [ ] Does the service handle its own startup failures gracefully (exits cleanly, logs reason)?

**AI Action:** List any gaps found with file and line references. Ask the user whether to fix now or defer.

---
## Section 2: Performance Review
Goal: Ensure the service won't collapse under real load.
- [ ] Are database queries using indexes on columns used in WHERE/JOIN/ORDER BY clauses?
- [ ] No N+1 query patterns are present (a loop that fires a query per item)
- [ ] Is connection pooling configured for the database?
- [ ] Are large responses paginated?
- [ ] No blocking operations (file I/O, external API calls) run synchronously in an async context
- [ ] Are static assets (if any) being served through Nginx, not the app?
- [ ] No unbounded data is loaded into memory (e.g. `SELECT *` with no limit)
- [ ] Are background tasks or scheduled jobs using a proper queue/worker model (not threading hacks)?
- [ ] Is Gzip/Brotli compression enabled in Nginx for text responses?

**AI Action:** Flag any issues with specific file references. Suggest fixes. Ask user to confirm or defer.

---
## Section 3: Security Audit
Goal: Do not put a vulnerable service on the internet. Be thorough.
### 3a. Secrets & Credentials
- [ ] No hardcoded passwords, tokens, API keys, or secrets in any source file
- [ ] `.env` file is in `.gitignore` and not committed
- [ ] `.env.example` exists with placeholder values only
- [ ] No secrets in Docker Compose files (use `env_file` or environment variable references, not literal values)
- [ ] No secrets in Nginx config files
### 3b. Authentication & Authorization
- [ ] All non-public endpoints require authentication
- [ ] Authentication tokens/sessions have an expiry
- [ ] Password hashing uses bcrypt, argon2, or scrypt — not MD5/SHA1
- [ ] There is no default admin password that ships with the service
- [ ] Role/permission checks exist if the app has multiple access levels
- [ ] Failed login attempts are rate-limited or account-locked after N failures
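As a rough illustration of the password hashing item above, here is a minimal Python sketch using the standard library's `hashlib.scrypt`. The helper names `hash_password` and `verify_password` and the cost parameters are assumptions chosen for illustration, not values this playbook prescribes; a maintained bcrypt or argon2 library may well be the better choice in a real service.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumption); tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage, using a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
```

When reviewing a project, the thing to look for is the same shape: a per-user salt, a deliberately slow key derivation function, and a constant-time comparison, rather than a bare `md5` or `sha1` call.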
### 3c. Input Validation & Injection
- [ ] All user input is validated server-side (not just client-side)
- [ ] SQL queries use parameterized statements or ORM — no string concatenation
- [ ] File upload paths are sanitized — no path traversal possible
- [ ] HTML output is escaped to prevent XSS (or a framework handles this automatically)
- [ ] Redirects only go to allowed/relative URLs — no open redirect
- [ ] JSON deserialization does not allow arbitrary object instantiation
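To make the parameterized-statement item concrete, the sketch below contrasts a placeholder query with string concatenation, using Python's built-in `sqlite3` module and an in-memory database. The `users` table and the `find_user` helper are hypothetical; the same pattern applies to other drivers, though their placeholder syntax may differ.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    # The ? placeholder sends the value separately from the SQL text,
    # so the input is treated as data and never parsed as SQL.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

injection = "' OR '1'='1"

# Parameterized: the payload is just an unusual username that matches nothing.
safe_rows = find_user(conn, injection)

# Concatenated (the pattern to FAIL in review): the payload rewrites the
# WHERE clause and every row comes back.
unsafe_rows = conn.execute(
    f"SELECT id, username FROM users WHERE username = '{injection}'"
).fetchall()
```

In a review, any query built with f-strings, `%` formatting, or `+` on user-controlled values is worth flagging even if it looks harmless in context.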
### 3d. HTTP & Nginx Security Headers
Verify the Nginx config for the proxy host includes:
- [ ] `X-Frame-Options: DENY` or `SAMEORIGIN`
- [ ] `X-Content-Type-Options: nosniff`
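- [ ] `X-XSS-Protection: 0`, or the header omitted (the legacy browser XSS filter is deprecated and `1; mode=block` is no longer recommended; rely on `Content-Security-Policy` instead)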
- [ ] `Referrer-Policy: strict-origin-when-cross-origin`
- [ ] `Content-Security-Policy` header defined (even if broad to start)
- [ ] `Strict-Transport-Security` (HSTS) with `max-age` >= 31536000
- [ ] Server version header suppressed (`server_tokens off`)
- [ ] Unnecessary HTTP methods disabled (e.g. TRACE, DELETE if not used)
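One way to turn part of the header list above into a repeatable check is sketched below in Python. It covers only a subset of the items (it does not inspect `server_tokens` or allowed methods), and the names `CHECKS`, `hsts_ok`, and `check_headers` are hypothetical helpers, not part of NPM or Nginx.

```python
import re


def hsts_ok(value: str) -> bool:
    """True if the HSTS header carries max-age of at least one year."""
    match = re.search(r"max-age=(\d+)", value, re.IGNORECASE)
    return match is not None and int(match.group(1)) >= 31536000


# Header name -> validator for its value (subset of the checklist above).
CHECKS = {
    "X-Frame-Options": lambda v: v.upper() in {"DENY", "SAMEORIGIN"},
    "X-Content-Type-Options": lambda v: v.lower() == "nosniff",
    "Referrer-Policy": lambda v: v.lower() == "strict-origin-when-cross-origin",
    "Content-Security-Policy": lambda v: bool(v),
    "Strict-Transport-Security": hsts_ok,
}


def check_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return PASS or FAIL per expected header; header names are case-insensitive."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    results = {}
    for name, is_valid in CHECKS.items():
        value = normalized.get(name.lower())
        results[name] = "PASS" if value is not None and is_valid(value) else "FAIL"
    return results


# Sample response headers: HSTS is present but too short, and two headers are missing.
sample = {
    "x-frame-options": "SAMEORIGIN",
    "X-Content-Type-Options": "nosniff",
    "Strict-Transport-Security": "max-age=3600; includeSubDomains",
}
report = check_headers(sample)
```

The input dictionary could be built from a live response, for example from the `headers` attribute of a `urllib.request.urlopen` result, so that the check runs against what the proxy actually serves rather than what the config file appears to say.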
### 3e. TLS / HTTPS
- [ ] TLS certificate is valid and not self-signed for production
- [ ] HTTP traffic redirects to HTTPS (not served in parallel)
- [ ] TLS 1.0 and 1.1 disabled — only TLS 1.2+ allowed
- [ ] Weak cipher suites disabled
- [ ] Certificate expiry is monitored (NPM auto-renews, but verify it's configured)
### 3f. Docker & Container Security
- [ ] Containers do not run as root (check `user:` in Compose or Dockerfile `USER` instruction)
- [ ] No container has `privileged: true` unless there is a documented reason
- [ ] No unnecessary host volume mounts (especially `/var/run/docker.sock` unless intentional)
- [ ] Container images are not using `latest` tag in production
- [ ] Docker socket is not exposed to the external network
- [ ] Resource limits (`mem_limit`, `cpus`) are set on containers

**AI Action:** Run the following tools if available:
- `bandit -r . -ll`: Python static security analysis
- `trivy fs . --severity HIGH,CRITICAL`: dependency and filesystem CVE scan
- `docker scout cves <image>`: container image vulnerability scan

Report all FAIL/WARN findings. Do not proceed to go-live recommendation until critical issues are resolved.
### 3g. Network & Exposure
- [ ] Only ports 80 and 443 are exposed publicly; no app ports (e.g. 8000, 3000) are directly open to the internet
- [ ] NPM proxy host has an access list or basic auth if the service is internal-only
- [ ] Rate limiting is configured in Nginx or the app for API endpoints
- [ ] The service does not expose an admin panel (e.g. `/admin`, `/dashboard`) without additional auth
- [ ] Database ports (3306, 5432, 6379) are NOT exposed beyond the Docker network
- [ ] SSH is not running inside any container
### 3h. Dependency & Supply Chain
- [ ] Dependencies are pinned to specific versions (not `*` or `latest`)
- [ ] No known CVEs in dependencies (run `trivy fs .` or `pip-audit` / `npm audit`)
- [ ] No abandoned or unmaintained packages with known issues
- [ ] Docker base images are from official/verified sources
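For Python projects, the pinning item above can be approximated with a short script. This is a heuristic sketch that assumes a pip-style `requirements.txt`; the function name `unpinned_requirements` is hypothetical, and it does not handle every requirement syntax (for example direct URL references).

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned with an exact == version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Sample file contents (illustrative only).
sample = """\
flask==3.0.0
requests>=2.0  # range, not a pin
# a comment line

gunicorn
"""
flagged = unpinned_requirements(sample)
```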
---
## Section 4: Go-Live Decision
After all sections are complete:
- List all unresolved findings grouped by severity: **CRITICAL / HIGH / MEDIUM / LOW**
- **CRITICAL or HIGH unresolved = NO GO.** These must be fixed before external access.
- **MEDIUM/LOW unresolved** = user decides whether to defer with documented acceptance
- Provide a final summary:
  - Total checks: X
  - Passed: X
  - Failed (critical): X
  - Failed (non-critical): X
  - Deferred: X
- **Recommendation: GO / NO GO / GO WITH CONDITIONS**
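The decision rule above can be sketched in a few lines of Python. This is one possible reading, not a definitive implementation: it assumes the input contains only unresolved findings, and that any remaining MEDIUM or LOW finding has been deferred with documented acceptance, which maps to GO WITH CONDITIONS. The `Finding` class and both function names are hypothetical.

```python
from dataclasses import dataclass

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")
BLOCKING = {"CRITICAL", "HIGH"}


@dataclass
class Finding:
    check: str     # short description of the failed check
    severity: str  # one of SEVERITIES


def group_by_severity(unresolved: list[Finding]) -> dict[str, list[str]]:
    """Group unresolved findings under each severity, highest first."""
    grouped = {severity: [] for severity in SEVERITIES}
    for finding in unresolved:
        grouped[finding.severity].append(finding.check)
    return grouped


def recommend(unresolved: list[Finding]) -> str:
    """Apply the rule: any CRITICAL or HIGH finding blocks go-live."""
    if any(f.severity in BLOCKING for f in unresolved):
        return "NO GO"
    if unresolved:  # only MEDIUM/LOW remain (assumed deferred by the user)
        return "GO WITH CONDITIONS"
    return "GO"


findings = [
    Finding("HSTS max-age below one year", "MEDIUM"),
    Finding("Container runs as root", "HIGH"),
]
decision = recommend(findings)
```

With the sample findings, the single HIGH item is enough to block go-live; removing it would leave only the MEDIUM item and change the outcome to GO WITH CONDITIONS.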