How to install airflow-dag-patterns
npx skills add https://github.com/wshobson/agents --skill airflow-dag-patternsFull instructions (SKILL.md)
Source of truth, from wshobson/agents.
name: airflow-dag-patterns description: Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.
Apache Airflow DAG Patterns
Production-ready patterns for Apache Airflow including DAG design, operators, sensors, testing, and deployment strategies.
When to Use This Skill
- Creating data pipeline orchestration with Airflow
- Designing DAG structures and dependencies
- Implementing custom operators and sensors
- Testing Airflow DAGs locally
- Setting up Airflow in production
- Debugging failed DAG runs
Core Concepts
1. DAG Design Principles
| Principle | Description |
|---|---|
| Idempotent | Running twice produces same result |
| Atomic | Tasks succeed or fail completely |
| Incremental | Process only new/changed data |
| Observable | Logs, metrics, alerts at every step |
2. Task Dependencies
# Linear
task1 >> task2 >> task3
# Fan-out
task1 >> [task2, task3, task4]
# Fan-in
[task1, task2, task3] >> task4
# Complex
task1 >> task2 >> task4
task1 >> task3 >> task4
Quick Start
# dags/example_dag.py
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.empty import EmptyOperator
default_args = {
'owner': 'data-team',
'depends_on_past': False,
'email_on_failure': True,
'email_on_retry': False,
'retries': 3,
'retry_delay': timedelta(minutes=5),
'retry_exponential_backoff': True,
'max_retry_delay': timedelta(hours=1),
}
with DAG(
dag_id='example_etl',
default_args=default_args,
description='Example ETL pipeline',
schedule='0 6 * * *', # Daily at 6 AM
start_date=datetime(2024, 1, 1),
catchup=False,
tags=['etl', 'example'],
max_active_runs=1,
) as dag:
start = EmptyOperator(task_id='start')
def extract_data(**context):
execution_date = context['ds']
# Extract logic here
return {'records': 1000}
extract = PythonOperator(
task_id='extract',
python_callable=extract_data,
)
end = EmptyOperator(task_id='end')
start >> extract >> end
Detailed patterns and worked examples
Detailed pattern documentation lives in references/details.md. Read that file when the navigation tier above is insufficient.
Best Practices
Do's
- Use TaskFlow API - Cleaner code, automatic XCom
- Set timeouts - Prevent zombie tasks
- Use
mode='reschedule'- For sensors, free up workers - Test DAGs - Unit tests and integration tests
- Idempotent tasks - Safe to retry
Don'ts
- Don't use
depends_on_past=True- Creates bottlenecks - Don't hardcode dates - Use
{{ ds }}macros - Don't use global state - Tasks should be stateless
- Don't skip catchup blindly - Understand implications
- Don't put heavy logic in DAG file - Import from modules
Related skills
More from wshobson/agents and the wider catalog.
tailwind-design-system
Build production-ready design systems with Tailwind CSS v4, design tokens, and component libraries.
typescript-advanced-types
Master TypeScript's advanced type system: generics, conditional types, mapped types, and utility types for type-safe applications.
nodejs-backend-patterns
Build production-ready Node.js backends with Express/Fastify, middleware patterns, auth, and database integration.
python-performance-optimization
Profile and optimize Python code using cProfile, memory profilers, and performance best practices.
brand-landingpage
Brand-first landing page designer with guided interviews and Stitch-powered iteration.
python-testing-patterns
Implement comprehensive testing strategies with pytest, fixtures, mocking, and test-driven development.