Playbook: Service Go-Live Review

Use this playbook before exposing any service to external access through Nginx Proxy Manager (NPM). When invoked, read the project files in the current working directory and work through each section as an interactive checklist.


How to Use

Tell the AI: "Use the service-golive playbook to review this project"

The AI will:

  1. Read the project files in the current directory
  2. Work through each section below
  3. For each item, report PASS, FAIL, or WARN with specific findings
  4. At the end, give a go/no-go recommendation

Do not proceed to the next section until the current one is resolved or explicitly deferred.


Section 1: Feature & Improvement Review

Goal: Catch missing functionality before users find it.

  • Does the service have a health check endpoint (e.g. /health or /ping)?
  • Are all intended routes/endpoints implemented and reachable?
  • Is there a meaningful error response for bad input (not raw stack traces)?
  • Are there any obvious UX gaps or incomplete flows in the UI (if applicable)?
  • Is there logging in place to capture errors and key events?
  • Are there any TODO/FIXME/HACK comments in the code that indicate unfinished work?
  • Does the service handle its own startup failures gracefully (exits cleanly, logs reason)?

AI Action: List any gaps found with file and line references. Ask the user whether to fix now or defer.
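
The TODO/FIXME/HACK item above lends itself to a quick automated pass. The following is a minimal sketch, not a definitive scanner: the function name `find_markers` and the `SKIP_DIRS` list are assumptions made for illustration, and the result is a list of file and line references in the shape the AI Action asks for.

```python
import re
from pathlib import Path

# Markers named in the checklist above.
MARKER = re.compile(r"\b(TODO|FIXME|HACK)\b")

# Hypothetical skip list; adjust it to the project layout.
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}


def find_markers(root="."):
    """Return (path, line_number, marker, line_text) for each marker found."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything below a skipped directory.
        if not path.is_file() or SKIP_DIRS & set(path.relative_to(root).parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            match = MARKER.search(line)
            if match:
                hits.append((str(path), number, match.group(1), line.strip()))
    return hits
```

Calling `find_markers(".")` from the project root yields one tuple per marker, which can be reported as WARN items for the user to fix or defer.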


Section 2: Performance Review

Goal: Ensure the service won't collapse under real load.

  • Are database queries using indexes on columns used in WHERE/JOIN/ORDER BY clauses?
  • Is the code free of N+1 query patterns (a loop that fires one query per item)?
  • Is connection pooling configured for the database?
  • Are large responses paginated?
  • Are blocking operations (file I/O, external API calls) kept out of async code paths rather than run synchronously inside them?
  • Are static assets (if any) being served through Nginx, not the app?
  • Is all data loaded into memory bounded (e.g. no SELECT * without a limit)?
  • Are background tasks or scheduled jobs using a proper queue/worker model (not threading hacks)?
  • Is Gzip/Brotli compression enabled in Nginx for text responses?

AI Action: Flag any issues with specific file references. Suggest fixes. Ask user to confirm or defer.
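
To make the N+1 item concrete, here is a small sketch using the standard-library `sqlite3` module. The `authors` and `posts` tables and both function names are hypothetical; the point is only the contrast between one query per item and a single indexed JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX idx_posts_author_id ON posts (author_id);
    INSERT INTO authors VALUES (1, 'ana'), (2, 'ben');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")


def titles_n_plus_one(conn):
    """Anti-pattern: one query for the authors, then one more per author."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn):
    """Same result from a single JOIN that can use the author_id index."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result
```

Both functions return the same mapping here; the first issues a number of queries that grows with the number of authors, which is the pattern to flag during review.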


Section 3: Security Audit

Goal: Do not put a vulnerable service on the internet. Be thorough.

3a. Secrets & Credentials

  • No hardcoded passwords, tokens, API keys, or secrets in any source file
  • .env file is in .gitignore and not committed
  • .env.example exists with placeholder values only
  • No secrets in Docker Compose files (use env_file or environment variable references, not literal values)
  • No secrets in Nginx config files
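
A rough heuristic for the Compose and .env checks above can be sketched as follows. It is an assumption-laden illustration, not a replacement for a dedicated secret scanner: the key names, the placeholder list, and the function `find_literal_secrets` are all hypothetical, and it is aimed at Compose and .env-style `KEY: value` or `KEY=value` lines rather than application source code.

```python
import re

# Keys that look like credentials (assumed list; extend as needed).
SECRET_KEY = re.compile(
    r"\b(\w*(?:password|passwd|secret|token|api_?key)\w*)\s*[:=]\s*['\"]?([^\s'\"]+)",
    re.IGNORECASE,
)
# Values treated as references or placeholders rather than literal secrets.
PLACEHOLDER = re.compile(
    r"\$\{?\w+.*|<.*>|changeme|example|placeholder|x+", re.IGNORECASE
)


def find_literal_secrets(text):
    """Return (line_number, key) for keys that appear to hold a literal value."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_KEY.search(line)
        if match and not PLACEHOLDER.fullmatch(match.group(2)):
            # Report the key only, so the secret itself is never echoed.
            hits.append((number, match.group(1)))
    return hits
```

Expect false positives (for example keys that hold a file path); each hit is a prompt for a manual look, not an automatic FAIL.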

3b. Authentication & Authorization

  • All non-public endpoints require authentication
  • Authentication tokens/sessions have an expiry
  • Password hashing uses bcrypt, argon2, or scrypt — not MD5/SHA1
  • There is no default admin password that ships with the service
  • Role/permission checks exist if the app has multiple access levels
  • Failed login attempts are rate-limited or account-locked after N failures
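
For reference when reviewing password handling, this is roughly what salted scrypt hashing looks like with only the Python standard library. The cost parameters and function names are assumptions for this sketch; a real service may use a maintained bcrypt or argon2 library instead.

```python
import hashlib
import hmac
import os

# Cost parameters are assumptions for this sketch; tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password):
    """Return (salt, digest); store both alongside the user record."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)
```

Code that instead calls `hashlib.md5` or `hashlib.sha1` directly on a password is what the hashing item above should flag as FAIL.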

3c. Input Validation & Injection

  • All user input is validated server-side (not just client-side)
  • SQL queries use parameterized statements or ORM — no string concatenation
  • File upload paths are sanitized — no path traversal possible
  • HTML output is escaped to prevent XSS (or a framework handles this automatically)
  • Redirects only go to allowed/relative URLs — no open redirect
  • JSON deserialization does not allow arbitrary object instantiation
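
As an illustration of the path traversal item, the sketch below resolves a user-supplied file name against an upload directory and rejects anything that escapes it. The function name and the decision to raise `ValueError` are assumptions; frameworks often ship an equivalent helper that should be preferred when available.

```python
from pathlib import Path


def resolve_upload_path(base_dir, user_supplied_name):
    """Return the resolved target path, or raise if it leaves base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_supplied_name).resolve()
    if base not in candidate.parents:
        raise ValueError(f"path escapes upload directory: {user_supplied_name!r}")
    return candidate
```

Names such as `../../etc/passwd` or an absolute path like `/etc/passwd` are rejected, while `reports/q1.pdf` resolves to a location under the base directory.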

3d. HTTP & Nginx Security Headers

Verify the Nginx config for the proxy host includes:

  • X-Frame-Options: DENY or SAMEORIGIN
  • X-Content-Type-Options: nosniff
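  • X-XSS-Protection: 0, or the header omitted (the legacy 1; mode=block filter is deprecated in modern browsers; rely on Content-Security-Policy instead)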
  • Referrer-Policy: strict-origin-when-cross-origin
  • Content-Security-Policy header defined (even if broad to start)
  • Strict-Transport-Security (HSTS) with max-age >= 31536000
  • Server version header suppressed (server_tokens off)
  • Unnecessary HTTP methods disabled (e.g. TRACE, DELETE if not used)
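
Several of the header checks above can be verified mechanically once a response has been captured. The sketch below takes a plain mapping of response headers and returns a list of problems; the function name is hypothetical, it covers only a subset of the list, and the version check on the Server header is a simple digit heuristic.

```python
import re


def check_security_headers(headers):
    """Return a list of problems found in a mapping of response headers."""
    h = {key.lower(): value.strip() for key, value in headers.items()}
    problems = []
    if h.get("x-frame-options", "").upper() not in {"DENY", "SAMEORIGIN"}:
        problems.append("X-Frame-Options should be DENY or SAMEORIGIN")
    if h.get("x-content-type-options", "").lower() != "nosniff":
        problems.append("X-Content-Type-Options should be nosniff")
    if "referrer-policy" not in h:
        problems.append("Referrer-Policy is missing")
    if "content-security-policy" not in h:
        problems.append("Content-Security-Policy is missing")
    hsts = re.search(r"max-age=(\d+)", h.get("strict-transport-security", ""))
    if not hsts or int(hsts.group(1)) < 31536000:
        problems.append("HSTS is missing or max-age is below 31536000")
    if re.search(r"\d", h.get("server", "")):
        problems.append("Server header appears to reveal a version")
    return problems
```

The input can come from any HTTP client, for example the header items of a response fetched with `urllib.request.urlopen`; an empty list means this subset of checks passed.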

3e. TLS / HTTPS

  • TLS certificate is valid and not self-signed for production
  • HTTP traffic redirects to HTTPS (not served in parallel)
  • TLS 1.0 and 1.1 disabled — only TLS 1.2+ allowed
  • Weak cipher suites disabled
  • Certificate expiry is monitored (NPM auto-renews, but verify it's configured)
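
Certificate expiry can be checked independently of NPM's renewal job. This is a minimal sketch using the standard-library `ssl` module; the function names are assumptions, and the 30-day threshold mentioned afterwards is only an example value.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter timestamp."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect with TLS 1.2 or newer and return the peer's notAfter string."""
    context = ssl.create_default_context()  # verifies the chain and hostname
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A review could call `days_until_expiry(fetch_not_after(host))` for the public hostname and report WARN when the result drops below, say, 30 days. Because the default context verifies the chain, a self-signed certificate makes `fetch_not_after` raise `ssl.SSLCertVerificationError`, which also covers the first item in this subsection.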

3f. Docker & Container Security

  • Containers do not run as root (check user: in Compose or Dockerfile USER instruction)
  • No container has privileged: true unless there is a documented reason
  • No unnecessary host volume mounts (especially /var/run/docker.sock unless intentional)
  • Container images are not using latest tag in production
  • Docker socket is not exposed to the external network
  • Resource limits (mem_limit, cpus) are set on containers
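
The Compose-level items above can be checked against a parsed Compose file. The sketch below assumes the file has already been loaded into a dict (for example with a YAML parser) and inspects a single service definition; `audit_service` is a hypothetical helper, and it does not look at the Dockerfile or at `deploy.resources.limits`.

```python
def audit_service(name, service):
    """Return finding strings for one Compose service definition (a dict)."""
    findings = []
    user = str(service.get("user", "")).split(":")[0]
    if user in {"", "root", "0"}:
        findings.append(f"{name}: no non-root user set (check Dockerfile USER too)")
    if service.get("privileged") is True:
        findings.append(f"{name}: privileged: true")
    image = service.get("image", "")
    last = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if image and (":" not in last or last.endswith(":latest")):
        findings.append(f"{name}: image is untagged or uses latest")
    for volume in service.get("volumes", []):
        # Volumes may use the short string form or the long mapping form.
        source = volume.get("source", "") if isinstance(volume, dict) else str(volume)
        if "docker.sock" in source:
            findings.append(f"{name}: mounts the Docker socket")
    if "mem_limit" not in service or "cpus" not in service:
        findings.append(f"{name}: mem_limit and cpus are not both set")
    return findings
```

Running it over every entry under `services` gives a starting list of WARN/FAIL items; a missing `user:` key is only a warning until the image's own USER instruction has been checked.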

AI Action: Run the following tools if available:

  • bandit -r . -ll — Python static security analysis
  • trivy fs . --severity HIGH,CRITICAL — dependency and filesystem CVE scan
  • docker scout cves <image> — container image vulnerability scan

Report all FAIL/WARN findings across subsections 3a to 3h. Do not proceed to the go-live recommendation until critical issues are resolved.
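
The "if available" condition can be handled with a small wrapper that skips tools that are not installed. This is a sketch under assumptions: the `SCANNERS` table and `run_scanner` are hypothetical names, `docker scout` is left out because it needs an image name, and exit codes are returned raw because their meaning differs between tools.

```python
import shutil
import subprocess

# Commands copied from the list above.
SCANNERS = {
    "bandit": ["bandit", "-r", ".", "-ll"],
    "trivy": ["trivy", "fs", ".", "--severity", "HIGH,CRITICAL"],
}


def run_scanner(command, cwd="."):
    """Run one scanner if present; return (ran, returncode, output)."""
    if shutil.which(command[0]) is None:
        return False, None, f"{command[0]} is not installed; skipped"
    completed = subprocess.run(
        command, cwd=cwd, capture_output=True, text=True, check=False
    )
    # Exit-code semantics vary by tool, so the caller should read the output
    # rather than treat a zero exit code as a PASS.
    return True, completed.returncode, completed.stdout + completed.stderr
```

A review loop could iterate over `SCANNERS.values()`, record skipped tools as WARN, and parse the output of the ones that ran.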

3g. Network & Exposure

  • Only ports 80 and 443 are exposed publicly; no app ports (e.g. 8000, 3000) are directly open to the internet
  • NPM proxy host has access list or basic auth if the service is internal-only
  • Rate limiting is configured in Nginx or the app for API endpoints
  • The service does not expose an admin panel (e.g. /admin, /dashboard) without additional auth
  • Database ports (3306, 5432, 6379) are NOT exposed beyond the Docker network
  • SSH is not running inside any container
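
Published ports in a Compose file are a common source of accidental exposure. The sketch below assumes the Compose file is already parsed into a dict of services and only understands the short `HOST:CONTAINER` and `IP:HOST:CONTAINER` forms; the function name is hypothetical, and host firewalls or router rules are outside its view.

```python
ALLOWED_PUBLIC = {80, 443}


def unexpected_public_ports(services):
    """Return (service, port_spec) pairs published on all interfaces
    with a host port other than 80 or 443."""
    findings = []
    for name, service in services.items():
        for spec in service.get("ports", []):
            parts = str(spec).split("/")[0].split(":")  # drop "/tcp" or "/udp"
            if len(parts) == 3:
                host_ip, host_port = parts[0], parts[1]
            elif len(parts) == 2:
                host_ip, host_port = "0.0.0.0", parts[0]
            else:
                continue  # other forms are not handled in this sketch
            if host_ip == "127.0.0.1":
                continue  # bound to loopback only
            if not host_port.isdigit() or int(host_port) not in ALLOWED_PUBLIC:
                findings.append((name, str(spec)))
    return findings
```

A database service listed with `5432:5432` would be reported, while the same service bound to `127.0.0.1:5432:5432` or with no `ports:` entry at all would not.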

3h. Dependency & Supply Chain

  • Dependencies are pinned to specific versions (not * or latest)
  • No known CVEs in dependencies (run trivy fs . or pip-audit / npm audit)
  • No abandoned or unmaintained packages with known issues
  • Docker base images are from official/verified sources
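
For Python projects, the pinning item can be checked directly against a requirements file. This is a simplified sketch: `unpinned_requirements` is a hypothetical helper, it treats only `==` or `===` with a concrete version as pinned, and it does not handle hash-pinned multi-line entries or URL requirements.

```python
import re

# name, optional extras, then == or === and a version without wildcards
PINNED = re.compile(r"[A-Za-z0-9._-]+(\[[^\]]*\])?\s*===?\s*[A-Za-z0-9._+!-]+")


def unpinned_requirements(text):
    """Return requirement specs that are not pinned to an exact version."""
    unpinned = []
    for line in text.splitlines():
        # Drop comments and environment markers before matching.
        spec = line.split("#", 1)[0].split(";", 1)[0].strip()
        if not spec or spec.startswith("-"):
            continue  # blank line or pip option such as -r / --hash
        if not PINNED.fullmatch(spec):
            unpinned.append(spec)
    return unpinned
```

Each returned entry is a candidate WARN; a lock file produced by the project's package manager, if one exists, is the better source of truth.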

Section 4: Go-Live Decision

After all sections are complete:

  • List all unresolved findings grouped by severity: CRITICAL / HIGH / MEDIUM / LOW
  • CRITICAL or HIGH unresolved = NO GO. These must be fixed before external access.
  • MEDIUM/LOW unresolved = user decides whether to defer with documented acceptance
  • Provide a final summary:
    • Total checks: X
    • Passed: X
    • Failed (critical): X
    • Failed (non-critical): X
    • Deferred: X
    • Recommendation: GO / NO GO / GO WITH CONDITIONS
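
The decision rules above can be expressed as a small function. This is a sketch under stated assumptions: each check is a dict with a `status` (PASS, FAIL, or WARN) and, for anything other than PASS, a `severity` and an optional `deferred` flag; "Failed (critical)" is taken to mean CRITICAL or HIGH; and GO WITH CONDITIONS is taken to mean that only MEDIUM/LOW items remain.

```python
BLOCKING = {"CRITICAL", "HIGH"}


def summarize(checks):
    """Build the final summary and recommendation from a list of checks."""
    unresolved = [c for c in checks if c["status"] != "PASS"]
    blocking = [c for c in unresolved if c.get("severity") in BLOCKING]
    if blocking:
        recommendation = "NO GO"  # CRITICAL or HIGH must be fixed first
    elif unresolved:
        recommendation = "GO WITH CONDITIONS"  # assumed mapping, see above
    else:
        recommendation = "GO"
    return {
        "total": len(checks),
        "passed": len(checks) - len(unresolved),
        "failed_critical": len(blocking),
        "failed_non_critical": len(unresolved) - len(blocking),
        "deferred": sum(1 for c in unresolved if c.get("deferred")),
        "recommendation": recommendation,
    }
```

The user still makes the final call on MEDIUM/LOW items; the function only mirrors the rules listed in this section.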