Add a service go-live playbook to run against services before go-live.
145
playbooks/service-golive.md
Normal file
@@ -0,0 +1,145 @@
# Playbook: Service Go-Live Review

Use this playbook before exposing any service to external access through Nginx Proxy Manager (NPM).
When invoked, read the project directory in the current working directory and work through each section as an interactive checklist.

---

## How to Use

Tell the AI: _"Use the service-golive playbook to review this project"_

The AI will:
1. Read the project files in the current directory
2. Work through each section below
3. For each item, report PASS, FAIL, or WARN with specific findings
4. At the end, give a go/no-go recommendation

Do not proceed to the next section until the current one is resolved or explicitly deferred.

---

## Section 1: Feature & Improvement Review

Goal: Catch missing functionality before users find it.

- [ ] Does the service have a health check endpoint (e.g. `/health` or `/ping`)?
- [ ] Are all intended routes/endpoints implemented and reachable?
- [ ] Is there a meaningful error response for bad input (not raw stack traces)?
- [ ] Is the UI (if applicable) free of obvious UX gaps and incomplete flows?
- [ ] Is there logging in place to capture errors and key events?
- [ ] Is the code free of TODO/FIXME/HACK comments that indicate unfinished work?
- [ ] Does the service handle its own startup failures gracefully (exits cleanly, logs reason)?
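
The TODO/FIXME/HACK item above can be checked mechanically. The following is a minimal sketch, not part of the playbook itself: the function names `find_markers` and `scan_tree` and the default file suffixes are assumptions made for illustration.

```python
import re
from pathlib import Path

# Markers the checklist treats as signs of unfinished work (case-sensitive).
MARKER = re.compile(r"\b(TODO|FIXME|HACK)\b")


def find_markers(text: str, filename: str = "<input>") -> list[tuple[str, int, str]]:
    """Return (filename, line_number, stripped_line) for each line with a marker."""
    return [
        (filename, number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if MARKER.search(line)
    ]


def scan_tree(root: str, suffixes: tuple[str, ...] = (".py", ".js", ".ts")) -> list[tuple[str, int, str]]:
    """Walk a project directory and collect markers from matching source files."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            findings.extend(find_markers(path.read_text(errors="ignore"), str(path)))
    return findings
```

Calling `scan_tree(".")` from the project root yields file and line references that can be listed as gaps.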

**AI Action:** List any gaps found with file and line references. Ask the user whether to fix now or defer.

---

## Section 2: Performance Review

Goal: Ensure the service won't collapse under real load.

- [ ] Are database queries using indexes on columns used in WHERE/JOIN/ORDER BY clauses?
- [ ] Is the code free of N+1 query patterns (a loop that fires one query per item)?
- [ ] Is connection pooling configured for the database?
- [ ] Are large responses paginated?
- [ ] Are blocking operations (file I/O, external API calls) kept out of async code paths, or offloaded to a thread or worker?
- [ ] Are static assets (if any) being served through Nginx, not the app?
- [ ] Is all data loaded into memory bounded (e.g. no `SELECT *` without a `LIMIT`)?
- [ ] Are background tasks or scheduled jobs using a proper queue/worker model (not threading hacks)?
- [ ] Is Gzip/Brotli compression enabled in Nginx for text responses?
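
To make the N+1 item above concrete, here is a small sketch using the standard-library `sqlite3` module. The `authors`/`posts` schema and both function names are invented for illustration; the same shape applies to any database or ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX idx_posts_author ON posts(author_id);
    INSERT INTO authors VALUES (1, 'ana'), (2, 'ben');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")


def titles_n_plus_one(conn):
    """Anti-pattern: one query for the authors, then one more query per author."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors ORDER BY id"):
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn):
    """One JOIN, so the query count stays constant as the author table grows.

    Note: an inner JOIN omits authors without posts; use LEFT JOIN to keep them.
    """
    result = {}
    query = """
        SELECT a.name, p.title
        FROM authors a JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """
    for name, title in conn.execute(query):
        result.setdefault(name, []).append(title)
    return result
```

Both functions return the same mapping for this sample data; the first issues a query per author, the second issues exactly one.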

**AI Action:** Flag any issues with specific file references. Suggest fixes. Ask user to confirm or defer.

---

## Section 3: Security Audit

Goal: Do not put a vulnerable service on the internet. Be thorough.

### 3a. Secrets & Credentials
- [ ] No hardcoded passwords, tokens, API keys, or secrets in any source file
- [ ] `.env` file is in `.gitignore` and not committed
- [ ] `.env.example` exists with placeholder values only
- [ ] No secrets in Docker Compose files (use `env_file` or environment variable references, not literal values)
- [ ] No secrets in Nginx config files
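
A rough first pass at the hardcoded-secrets item can be sketched with a regular expression. This is only an illustration under stated assumptions: the pattern, the `PLACEHOLDERS` set, and the function name `find_hardcoded_secrets` are hypothetical, and dedicated secret scanners use far richer rules.

```python
import re

# Assumed heuristic: a secret-looking name assigned a quoted literal value.
SECRET_ASSIGNMENT = re.compile(
    r"""(?ix)
    \b(\w*(?:password|passwd|secret|token|api_?key)\w*)   # secret-looking name
    \s*[:=]\s*                                            # assignment or YAML colon
    ["']([^"']+)["']                                      # non-empty literal value
    """
)

# Values that are acceptable in files such as .env.example (assumed list).
PLACEHOLDERS = {"changeme", "example", "placeholder", "your-value-here"}


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, variable_name) for literal, non-placeholder secrets."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match and match.group(2).lower() not in PLACEHOLDERS:
            hits.append((number, match.group(1)))
    return hits
```

Values read from the environment, such as `os.environ["DB_PASSWORD"]`, are not flagged because no quoted literal follows the assignment.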

### 3b. Authentication & Authorization
- [ ] All non-public endpoints require authentication
- [ ] Authentication tokens/sessions have an expiry
- [ ] Password hashing uses bcrypt, argon2, or scrypt, not MD5/SHA1
- [ ] There is no default admin password that ships with the service
- [ ] Role/permission checks exist if the app has multiple access levels
- [ ] Failed login attempts are rate-limited or account-locked after N failures
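
For the password-hashing item, the sketch below uses `hashlib.scrypt` from the Python standard library (it requires a Python build linked against OpenSSL 1.1 or newer). The cost parameters and the helper names `hash_password` and `verify_password` are assumptions for illustration, not values taken from this playbook.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a 32-byte scrypt digest; returns (salt, digest)."""
    salt = salt or os.urandom(16)  # fresh random salt per password
    # Assumed cost parameters; tune n, r, p for the target hardware.
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Store the salt alongside the digest; `hmac.compare_digest` avoids leaking timing information during comparison.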

### 3c. Input Validation & Injection
- [ ] All user input is validated server-side (not just client-side)
- [ ] SQL queries use parameterized statements or an ORM; no string concatenation
- [ ] File upload paths are sanitized; no path traversal possible
- [ ] HTML output is escaped to prevent XSS (or a framework handles this automatically)
- [ ] Redirects only go to allowed/relative URLs; no open redirect
- [ ] JSON deserialization does not allow arbitrary object instantiation
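
One common way to satisfy the path traversal item is to resolve the requested path and confirm it still sits inside the upload directory. A minimal sketch, assuming Python 3.9+ for `Path.is_relative_to`; the function name `safe_upload_path` is hypothetical.

```python
from pathlib import Path


def safe_upload_path(upload_dir: str, filename: str) -> Path:
    """Resolve filename inside upload_dir, rejecting anything that escapes it."""
    base = Path(upload_dir).resolve()
    candidate = (base / filename).resolve()  # collapses ".." and follows symlinks
    # Reject paths outside the base directory, and the base directory itself.
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError(f"rejected upload path: {filename!r}")
    return candidate
```

An absolute `filename` replaces the base when joined, so it fails the same check as a `../` sequence.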

### 3d. HTTP & Nginx Security Headers
Verify the Nginx config for the proxy host includes:
- [ ] `X-Frame-Options: DENY` or `SAMEORIGIN`
- [ ] `X-Content-Type-Options: nosniff`
- [ ] `X-XSS-Protection: 0`, or the header omitted (the legacy `1; mode=block` value is deprecated now that major browsers have removed their XSS filters; rely on `Content-Security-Policy` instead)
- [ ] `Referrer-Policy: strict-origin-when-cross-origin`
- [ ] `Content-Security-Policy` header defined (even if broad to start)
- [ ] `Strict-Transport-Security` (HSTS) with `max-age` >= 31536000
- [ ] Server version header suppressed (`server_tokens off`)
- [ ] Unnecessary HTTP methods disabled (e.g. TRACE, DELETE if not used)
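
Several of the header items above can be verified against a captured response. The sketch below audits a plain dictionary of response headers; the names `CHECKS` and `audit_headers` are hypothetical, it covers only a subset of the list, and the digit test for a leaked server version is a rough heuristic.

```python
import re


def _hsts_max_age(value: str) -> int:
    """Extract max-age (seconds) from a Strict-Transport-Security value."""
    match = re.search(r"max-age=(\d+)", value, re.IGNORECASE)
    return int(match.group(1)) if match else 0


# Header name (lowercase) -> predicate that accepts the header value.
CHECKS = {
    "x-frame-options": lambda v: v.upper() in {"DENY", "SAMEORIGIN"},
    "x-content-type-options": lambda v: v.lower() == "nosniff",
    "referrer-policy": lambda v: v.lower() == "strict-origin-when-cross-origin",
    "content-security-policy": lambda v: bool(v),
    "strict-transport-security": lambda v: _hsts_max_age(v) >= 31536000,
}


def audit_headers(headers: dict) -> dict:
    """Return PASS/FAIL per required header, plus WARN if Server leaks a version."""
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    report = {}
    for name, is_ok in CHECKS.items():
        value = normalized.get(name)
        report[name] = "PASS" if value is not None and is_ok(value) else "FAIL"
    # With server_tokens off, Nginx sends "nginx" without a version number.
    report["server"] = "WARN" if re.search(r"\d", normalized.get("server", "")) else "PASS"
    return report
```

The input dictionary could come from any HTTP client; how the headers are fetched is left out of the sketch.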

### 3e. TLS / HTTPS
- [ ] TLS certificate is valid and not self-signed for production
- [ ] HTTP traffic redirects to HTTPS (not served in parallel)
- [ ] TLS 1.0 and 1.1 disabled; only TLS 1.2+ allowed
- [ ] Weak cipher suites disabled
- [ ] Certificate expiry is monitored (NPM auto-renews, but verify it's configured)
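
Certificate expiry monitoring can be approximated with the standard-library `ssl` module. This is a sketch only: the function names `days_until_expiry` and `fetch_not_after` are hypothetical, and any alert threshold is left to the reader.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` and a certificate's notAfter string (OpenSSL text format)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the peer certificate's notAfter."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Because the default context verifies the certificate, a self-signed or expired certificate raises `ssl.SSLCertVerificationError` rather than returning a date.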

### 3f. Docker & Container Security
- [ ] Containers do not run as root (check `user:` in Compose or Dockerfile `USER` instruction)
- [ ] No container has `privileged: true` unless there is a documented reason
- [ ] No unnecessary host volume mounts (especially `/var/run/docker.sock` unless intentional)
- [ ] Container images are not using the `latest` tag in production
- [ ] Docker socket is not exposed to the external network
- [ ] Resource limits (`mem_limit`, `cpus`) are set on containers
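
The Compose-level checks above lend themselves to a simple audit over an already parsed service mapping. The sketch assumes the Compose file has been loaded into a dictionary by a YAML parser of your choice; `audit_service` is a hypothetical helper, and it does not inspect the Dockerfile or `deploy.resources` limits.

```python
def audit_service(name: str, service: dict) -> list:
    """Return human-readable findings for one parsed Compose service mapping."""
    findings = []

    # An image counts as pinned if it has a digest or a tag other than "latest".
    if "image" in service:
        last_segment = service["image"].rsplit("/", 1)[-1]
        tag = last_segment.partition(":")[2]
        if "@" not in last_segment and tag in ("", "latest"):
            findings.append(f"{name}: image is not pinned to a version tag or digest")

    if service.get("privileged") is True:
        findings.append(f"{name}: runs with privileged: true")

    # Only the Compose 'user:' key is visible here; a Dockerfile USER may still apply.
    user = str(service.get("user", "")).split(":")[0]
    if user in ("", "0", "root"):
        findings.append(f"{name}: no non-root 'user:' configured in Compose")

    for volume in service.get("volumes", []):
        source = volume.get("source", "") if isinstance(volume, dict) else str(volume).split(":")[0]
        if source == "/var/run/docker.sock":
            findings.append(f"{name}: mounts the Docker socket")

    missing = [key for key in ("mem_limit", "cpus") if key not in service]
    if missing:
        findings.append(f"{name}: missing resource limits: {', '.join(missing)}")

    return findings
```

An empty list means none of these specific checks fired; it is not proof that the container is hardened.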

**AI Action:** Run the following tools if available:
- `bandit -r . -ll`: Python static security analysis
- `trivy fs . --severity HIGH,CRITICAL`: dependency and filesystem CVE scan
- `docker scout cves <image>`: container image vulnerability scan

Report all FAIL/WARN findings. Do not proceed to go-live recommendation until critical issues are resolved.

### 3g. Network & Exposure
- [ ] Only ports 80 and 443 are exposed publicly; no app ports (e.g. 8000, 3000) are directly open to the internet
- [ ] NPM proxy host has access list or basic auth if the service is internal-only
- [ ] Rate limiting is configured in Nginx or the app for API endpoints
- [ ] The service does not expose an admin panel (e.g. `/admin`, `/dashboard`) without additional auth
- [ ] Database ports (3306, 5432, 6379) are NOT exposed beyond the Docker network
- [ ] SSH is not running inside any container
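
A quick external probe can confirm that app and database ports are closed. The following sketch attempts plain TCP connections over IPv4; the port tuple mirrors the examples in the checklist above, and the function names are hypothetical. Run it from a machine outside the Docker host's network, and only against hosts you are authorized to test.

```python
import socket

# Ports from the checklist that should not be reachable from outside.
SHOULD_BE_CLOSED = (3000, 8000, 3306, 5432, 6379)


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (IPv4 only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def unexpected_open_ports(host: str, ports=SHOULD_BE_CLOSED) -> list:
    """Return the subset of `ports` that accept connections on `host`."""
    return [port for port in ports if is_port_open(host, port)]
```

A non-empty result from `unexpected_open_ports` is a finding for this subsection; an empty result only shows the ports were unreachable from where the probe ran.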

### 3h. Dependency & Supply Chain
- [ ] Dependencies are pinned to specific versions (not `*` or `latest`)
- [ ] No known CVEs in dependencies (run `trivy fs .`, `pip-audit`, or `npm audit`)
- [ ] No abandoned or unmaintained packages with known issues
- [ ] Docker base images are from official/verified sources
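
For Python projects, the pinning item can be checked by scanning `requirements.txt` for entries that lack an exact `==` version. This is a simplified sketch: the `PINNED` pattern and the function name `unpinned_requirements` are assumptions, and it does not handle URL requirements or hash options.

```python
import re

# name, optional [extras], "==", a version without wildcards, optional ; marker
PINNED = re.compile(r"[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s*;]+\s*(;.*)?")


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r / -e
        if not PINNED.fullmatch(line):
            loose.append(line)
    return loose
```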

---

## Section 4: Go-Live Decision

After all sections are complete:

- List all unresolved findings grouped by severity: **CRITICAL / HIGH / MEDIUM / LOW**
- **CRITICAL or HIGH unresolved = NO GO.** These must be fixed before external access.
- **MEDIUM/LOW unresolved** = user decides whether to defer with documented acceptance
- Provide a final summary:
  - Total checks: X
  - Passed: X
  - Failed (critical): X
  - Failed (non-critical): X
  - Deferred: X
  - **Recommendation: GO / NO GO / GO WITH CONDITIONS**
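
The decision rules above can be expressed as a small function. This is a sketch under stated assumptions: each unresolved finding is a dictionary with a `severity` and an `accepted` flag (deferred with documented acceptance), both invented for illustration, and MEDIUM/LOW findings that the user has not yet accepted are treated as NO GO until a decision is made.

```python
from collections import Counter

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")
BLOCKING = {"CRITICAL", "HIGH"}


def severity_counts(unresolved: list) -> dict:
    """Count unresolved findings per severity, in CRITICAL-to-LOW order."""
    counts = Counter(finding["severity"] for finding in unresolved)
    return {severity: counts.get(severity, 0) for severity in SEVERITIES}


def recommend(unresolved: list) -> str:
    """Map unresolved findings to GO, NO GO, or GO WITH CONDITIONS."""
    if any(finding["severity"] in BLOCKING for finding in unresolved):
        return "NO GO"  # CRITICAL/HIGH cannot be deferred
    if any(not finding["accepted"] for finding in unresolved):
        return "NO GO"  # assumption: MEDIUM/LOW block until explicitly accepted
    return "GO WITH CONDITIONS" if unresolved else "GO"
```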