PluginBench
Skill
Pass
Audit score 90

deployment-pipeline-design

wshobson/agents

Design multi-stage CI/CD pipelines with approval gates, security checks, and progressive delivery strategies.

What is deployment-pipeline-design?

This skill helps you architect robust deployment pipelines that balance speed with safety through proper stage organization, automated quality gates, and progressive delivery strategies like canary and blue-green deployments. Use it when designing zero-downtime pipelines, implementing rollout strategies, setting up multi-environment promotion workflows, or debugging failed deployment gates.

  • Design multi-stage pipeline architecture with job dependencies and parallelism
  • Configure deployment strategies including canary, blue-green, and rolling updates
  • Set up health checks and automated quality gates between environments
  • Define approval workflows and security scanning requirements
  • Establish automated rollback triggers based on metrics and manual runbook steps
  • Optimize Docker layer caching and dependency management in CI/CD

How to install deployment-pipeline-design

npx skills add https://github.com/wshobson/agents --skill deployment-pipeline-design
Prerequisites
  • Understanding of your application type (language, containerized vs bare-metal, monolith vs microservices)
  • Knowledge of your deployment target (Kubernetes, ECS, VMs, serverless, or PaaS)
  • Defined environment topology (dev/staging/prod, regions, air-gap requirements)
  • Access to your monitoring stack (Prometheus, Datadog, CloudWatch, etc.)
  • Familiarity with your CI/CD platform (GitHub Actions, GitLab CI, Azure Pipelines, etc.)

How to use deployment-pipeline-design

  1. Provide application type, deployment target, and environment topology
  2. Define rollout requirements (acceptable downtime, rollback SLA, traffic splitting preference)
  3. Specify gate constraints (approval teams, test coverage thresholds, compliance scans)
  4. Configure health checks with deep readiness probes that verify actual dependencies
  5. Set up automated metric thresholds for promotion gates and rollback triggers
  6. Test pipeline stages with shallow health checks before production deployment
  7. Implement backward-compatible database migrations to support safe rollbacks

Use cases

Good for
  • Design CI/CD architecture for a new service or platform migration
  • Implement deployment gates with mandatory security scanning between environments
  • Configure canary deployments with Prometheus-based promotion decisions
  • Debug pipelines where stages succeed but production behavior fails
  • Reduce mean time to recovery by automating rollback on metric degradation
Who it's for
  • Platform engineers designing CI/CD infrastructure
  • DevOps engineers implementing deployment strategies
  • Backend engineers setting up production pipelines
  • SREs establishing reliability and rollback procedures
  • Teams migrating to Kubernetes or container-based deployments

deployment-pipeline-design FAQ

Why does the health check pass in the pipeline when the service is unhealthy in production?

The pipeline health check is likely hitting a shallow endpoint (e.g., /ping) that returns 200 even when dependencies like databases are unreachable. Use a deep readiness check that verifies actual dependencies instead.

Why does a canary deployment never promote to 100%?

Argo Rollouts requires a valid AnalysisTemplate with working Prometheus queries to auto-promote. If the metric name changed or the query returns no data, the analysis stays inconclusive and promotion stalls. Add inconclusiveLimit to fail fast rather than hang indefinitely.

Why does the production job never start after the staging deploy succeeds?

Check that production environment protection rules are configured with required reviewers assigned. A missing reviewer assignment means the approval gate waits indefinitely with no notification.

How do I prevent Docker layer cache from being busted on every run?

Reorder your Dockerfile to copy dependency manifests (package.json, requirements.txt) before source code, then run dependency installation, then copy source. This keeps the dependency layer cached separately from source changes.

How do I safely roll back when database migrations have been applied?

Make migrations backward-compatible (additive only) for at least one release cycle. Keep undo scripts versioned alongside migrations and never run destructive or constraint-tightening migrations (dropping a column, adding a NOT NULL constraint) until old code is fully retired from all environments.

Full instructions (SKILL.md)

Source of truth, from wshobson/agents.


name: deployment-pipeline-design
description: Design multi-stage CI/CD pipelines with approval gates, security checks, and deployment orchestration. Use this skill when designing zero-downtime deployment pipelines, implementing canary rollout strategies, setting up multi-environment promotion workflows, or debugging failed deployment gates in CI/CD.

Deployment Pipeline Design

Architecture patterns for multi-stage CI/CD pipelines with approval gates, deployment strategies, and environment promotion workflows.

Purpose

Design robust, secure deployment pipelines that balance speed with safety through proper stage organization, automated quality gates, and progressive delivery strategies. This skill covers both the structural design of pipeline architecture and the operational patterns for reliable production deployments.

Input / Output

What You Provide

  • Application type: Language/runtime, containerized or bare-metal, monolith or microservices
  • Deployment target: Kubernetes, ECS, VMs, serverless, or platform-as-a-service
  • Environment topology: Number of environments (dev/staging/prod), region layout, air-gap requirements
  • Rollout requirements: Acceptable downtime, rollback SLA, traffic splitting needs, canary vs blue-green preference
  • Gate constraints: Approval teams, required test coverage thresholds, compliance scans (SAST, DAST, SCA)
  • Monitoring stack: Prometheus, Datadog, CloudWatch, or other metrics sources used for automated promotion decisions

What This Skill Produces

  • Pipeline configuration: Stage definitions, job dependencies, parallelism, and caching strategy
  • Deployment strategy: Chosen rollout pattern with annotated configuration (canary weights, blue-green switchover, rolling parameters)
  • Health check setup: Shallow vs deep readiness probes, post-deployment smoke test scripts
  • Gate definitions: Automated metric thresholds and manual approval workflows
  • Rollback plan: Automated rollback triggers and manual runbook steps
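
The "job dependencies and parallelism" part of a pipeline configuration can be sketched as a dependency graph. The following is a minimal illustration, not output of this skill: the stage names are hypothetical, and the grouping into "waves" is just one way to show which stages could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each stage maps to the stages it depends on.
PIPELINE = {
    "build": set(),
    "unit-tests": {"build"},
    "security-scan": {"build"},
    "deploy-staging": {"unit-tests", "security-scan"},
    "smoke-tests": {"deploy-staging"},
    "deploy-production": {"smoke-tests"},
}


def execution_waves(stages: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into waves; stages in the same wave can run in parallel."""
    sorter = TopologicalSorter(stages)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every stage whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Here the unit tests and the security scan share a wave because both depend only on the build, while the staging deploy waits for both of them.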

When to Use

  • Design CI/CD architecture for a new service or platform migration
  • Implement deployment gates between environments
  • Configure multi-environment pipelines with mandatory security scanning
  • Establish progressive delivery with canary or blue-green strategies
  • Debug pipelines where stages succeed but production behavior is wrong
  • Reduce mean time to recovery by automating rollback on metric degradation
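
As a rough sketch of what an automated promotion gate with a rollback trigger might look like, the function below turns one observation window into a decision. The threshold values, the `GateThresholds` class, and the `gate_decision` function are assumptions made for illustration; real gates would read these numbers from your monitoring stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    # Illustrative values only; tune them to your service's objectives.
    max_error_rate: float = 0.01       # promote only at or below this rate
    rollback_error_rate: float = 0.05  # roll back above this rate
    min_requests: int = 100            # with less traffic, do not judge yet


def gate_decision(errors: int, requests: int,
                  thresholds: GateThresholds = GateThresholds()) -> str:
    """Return 'promote', 'hold', or 'rollback' for one observation window."""
    if requests < max(thresholds.min_requests, 1):
        return "hold"  # too little data to promote or roll back
    error_rate = errors / requests
    if error_rate > thresholds.rollback_error_rate:
        return "rollback"
    if error_rate <= thresholds.max_error_rate:
        return "promote"
    return "hold"  # degraded, but not badly enough to roll back


if __name__ == "__main__":
    print(gate_decision(errors=0, requests=1000))
    print(gate_decision(errors=80, requests=1000))
```

Keeping a separate "hold" outcome avoids promoting on thin traffic and avoids rolling back on a borderline window.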

Detailed patterns and worked examples

Detailed pattern documentation lives in references/details.md. Read that file when the summary above does not go deep enough.

Troubleshooting

Health check passes in pipeline but service is unhealthy in production

The pipeline health check is hitting a shallow /ping endpoint that returns 200 even when the database is unreachable. Use a deep readiness check that verifies actual dependencies (detailed pattern documentation lives in references/details.md).
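
The difference between a shallow and a deep check can be sketched as follows. This is a minimal illustration under assumptions: the `readiness` function and the stand-in probe are hypothetical, and a real service would plug in actual probes (for example, a trivial query against its database) and expose the result over HTTP.

```python
from typing import Callable


def readiness(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict[str, str]]:
    """Run every dependency check; return (HTTP status, per-dependency report)."""
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failed"
        except Exception as exc:  # a check that raises counts as a failed dependency
            report[name] = f"error: {exc}"
    status = 200 if all(result == "ok" for result in report.values()) else 503
    return status, report


def database_unreachable() -> bool:
    # Stand-in for a real probe; here it simulates an unreachable database.
    raise ConnectionError("connection refused")


if __name__ == "__main__":
    # A /ping-style check verifies nothing, so it always reports 200.
    print(readiness({}))
    # A deep check surfaces the unreachable database as a 503.
    print(readiness({"database": database_unreachable, "cache": lambda: True}))
```

Returning a per-dependency report alongside the status makes a failed gate easier to diagnose from pipeline logs.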

Canary deployment never promotes to 100%

Argo Rollouts requires a valid AnalysisTemplate to auto-promote. If the Prometheus query returns no data (e.g., metric name changed), the analysis stays inconclusive and promotion stalls. Add inconclusiveLimit so the rollout fails fast rather than hanging:

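apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check   # placeholder name
spec:
  metrics:
  - name: error-rate
    failureCondition: "result[0] > 0.05"
    inconclusiveLimit: 2   # fail after 2 inconclusive results, not hang indefinitely
    provider:
      prometheus:
        address: http://prometheus.example.svc:9090   # placeholder; point at your Prometheus
        query: |
          sum(rate(http_requests_total{status=~"5.."}[2m]))
          / sum(rate(http_requests_total[2m]))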
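
To make the "fail fast" idea concrete, here is a simplified model of an analysis loop with an inconclusive limit. It is not Argo Rollouts' implementation and its result names are invented for this sketch; it only assumes that a measurement with no data is inconclusive and that the run gives up once the count of inconclusive measurements exceeds the limit.

```python
from typing import Optional

Measurement = tuple[Optional[float], Optional[float]]  # (5xx rate, total rate)


def classify(errors: Optional[float], total: Optional[float],
             failure_threshold: float = 0.05) -> str:
    """Classify one measurement of the error-rate ratio."""
    if errors is None or not total:
        return "inconclusive"  # no data, or a zero denominator
    return "failed" if errors / total > failure_threshold else "successful"


def run_analysis(measurements: list[Measurement], inconclusive_limit: int) -> str:
    """Stop at the first failure, or once too many measurements are inconclusive."""
    inconclusive = 0
    for errors, total in measurements:
        result = classify(errors, total)
        if result == "failed":
            return "failed"
        if result == "inconclusive":
            inconclusive += 1
            if inconclusive > inconclusive_limit:
                return "inconclusive-limit-exceeded"
    return "successful"


if __name__ == "__main__":
    # A renamed metric returns no data on every measurement.
    print(run_analysis([(None, None)] * 5, inconclusive_limit=2))
```

Without a limit, the same sequence of empty results would never produce a verdict, which is the stall described above.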

Staging deploy succeeds but production job never starts

Check that production environment protection rules are configured — a missing reviewer assignment means the approval gate waits indefinitely with no notification. In GitHub Actions, ensure Required reviewers is set to an existing user or team in Settings → Environments → production.

Docker layer cache busted on every run causing slow builds

If COPY . . appears before dependency installation, any source file change invalidates the dependency layer. Reorder to copy dependency manifests first:

# Good: dependencies cached separately from source code
# Base image and working directory are illustrative
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
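
Why the ordering matters can be shown with a toy model of layer caching. This is a deliberate simplification and not how Docker computes cache keys internally: it assumes each layer's key depends on its parent's key, the instruction text, and the contents of any files the instruction copies. File names and contents are hypothetical.

```python
import hashlib


def layer_keys(steps: list[tuple[str, list[str]]], files: dict[str, str]) -> list[str]:
    """Return one cache key per step, chaining the parent key, the instruction
    text, and the contents of the files that step copies."""
    keys, parent = [], ""
    for instruction, copied in steps:
        digest = hashlib.sha256(parent.encode())
        digest.update(instruction.encode())
        for path in sorted(copied):
            digest.update(path.encode())
            digest.update(files[path].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


MANIFEST_FIRST = [
    ("COPY package*.json ./", ["package.json"]),
    ("RUN npm ci", []),
    ("COPY . .", ["package.json", "src/app.js"]),
    ("RUN npm run build", []),
]
SOURCE_FIRST = [
    ("COPY . .", ["package.json", "src/app.js"]),
    ("RUN npm ci", []),
    ("RUN npm run build", []),
]

before = {"package.json": '{"dependencies": {}}', "src/app.js": "console.log(1)"}
after = {**before, "src/app.js": "console.log(2)"}  # only source code changed


def reused_layers(steps: list[tuple[str, list[str]]]) -> int:
    """Count leading layers whose key is unchanged between the two builds."""
    count = 0
    for old, new in zip(layer_keys(steps, before), layer_keys(steps, after)):
        if old != new:
            break
        count += 1
    return count


if __name__ == "__main__":
    print("manifest first:", reused_layers(MANIFEST_FIRST))
    print("source first:", reused_layers(SOURCE_FIRST))
```

In this model the manifest-first ordering keeps the manifest copy and the dependency install cached after a source-only change, while the source-first ordering reuses nothing.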

Rollback leaves database migrations applied to old code

A service rollback without a migration rollback causes schema/code mismatch errors. Always make migrations backward-compatible (additive only) for at least one release cycle, and keep undo scripts versioned alongside the migration:

# migrations/V20240315__add_nullable_column.sql       (forward)
# migrations/V20240315__add_nullable_column.undo.sql  (backward)

Never run destructive or constraint-tightening migrations (dropping a column, adding a NOT NULL constraint) until the old code version is fully retired from all environments.
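
The effect of an additive migration on old code can be demonstrated with an in-memory SQLite database. This is a small sketch under assumptions: the table and column names are hypothetical, and SQLite stands in for whatever database you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


def old_code_insert(table: str, email: str) -> bool:
    """Insert the way the previous release does: it only knows about email."""
    try:
        conn.execute(f"INSERT INTO {table} (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


# Forward migration: additive and nullable, so the old release keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
additive_ok = old_code_insert("users", "a@example.com")

# The same column with NOT NULL and no default breaks the old release.
conn.execute(
    "CREATE TABLE strict_users ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, display_name TEXT NOT NULL)"
)
strict_ok = old_code_insert("strict_users", "b@example.com")

if __name__ == "__main__":
    print("old code against additive schema:", additive_ok)
    print("old code against NOT NULL schema:", strict_ok)
```

After a service rollback, the old release behaves like `old_code_insert` here: it succeeds against the additive schema and fails against the stricter one.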

Advanced Topics

For platform-specific pipeline configurations, multi-region promotion workflows, and advanced Argo Rollouts patterns, see:

  • references/advanced-strategies.md — Extended YAML examples, platform-specific configs (GitHub Actions, GitLab CI, Azure Pipelines), multi-region canary patterns, and database migration rollback strategies

Related Skills

  • github-actions-templates - For GitHub Actions implementation patterns and reusable workflows
  • gitlab-ci-patterns - For GitLab CI/CD pipeline implementation
  • secrets-management - For secrets handling in CI/CD pipelines