Workflow Automation
Automate responses to task failures with event-driven workflows.
Create workflows that respond automatically to task failures, orphan detection, or execution time thresholds. No code required — configure triggers and actions through the UI.
Why workflows?
If you've ever manually retried a batch of failed tasks at 2am or wished Slack would just tell you when things break, workflows solve that.
Example use cases:
- Auto-retry transient failures (network timeouts, rate limits)
- Send Slack alerts when critical tasks fail
- Recover orphaned tasks automatically
- Notify on-call engineers when failure rates spike
- Escalate to humans when retry limits are exceeded
How workflows work
Workflows consist of three parts:
- Trigger: What event kicks off the workflow (task failure, orphan detected, etc.)
- Conditions: Optional filters to narrow when the workflow runs (specific task names, queues, error patterns)
- Actions: What happens when triggered (retry task, send Slack message, webhook call)
Event → Conditions Check → Actions Execute → Result Logged
Each workflow runs independently with full execution history and rollback support.
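The flow above can be sketched in a few lines of Python. This is only an illustration of the trigger, conditions, actions, and logging stages, not the product's implementation. The `Workflow` class, the event field names (`type`, `task_name`, `error`), and the use of regular expressions for every condition are assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workflow:
    """Hypothetical model of one workflow: trigger, conditions, actions, history."""
    name: str
    trigger: str                                                # e.g. "task.failed"
    conditions: dict[str, str] = field(default_factory=dict)   # event field -> regex (assumed)
    actions: list[Callable[[dict], str]] = field(default_factory=list)
    history: list[dict] = field(default_factory=list)

    def handle(self, event: dict) -> str:
        """Run one event through the trigger check, conditions, actions, and logging."""
        if event.get("type") != self.trigger:
            return "ignored"  # a different trigger; nothing is logged
        # Conditions are optional: with none configured, every event matches.
        matched = all(
            re.search(pattern, str(event.get(key, "")))
            for key, pattern in self.conditions.items()
        )
        results = [action(event) for action in self.actions] if matched else []
        outcome = "success" if matched else "skipped"
        self.history.append(
            {"event": event, "matched": matched, "results": results, "outcome": outcome}
        )
        return outcome


wf = Workflow(
    name="Auto-retry network errors",
    trigger="task.failed",
    conditions={"error": r"ConnectionError|Timeout"},
    actions=[lambda e: f"retry {e['task_name']}"],
)
outcome = wf.handle(
    {"type": "task.failed", "task_name": "myapp.tasks.send_email", "error": "Timeout after 30s"}
)
print(outcome)  # success
```

Keeping the history on the workflow object mirrors the idea that each workflow runs independently and keeps its own execution log.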
Creating a workflow
Navigate to Workflows
From the dashboard, click Workflows in the sidebar, then Create Workflow.
Choose a trigger
Select what event should start this workflow:
- Task Failed: When any task transitions to FAILURE state
- Task Orphaned: When orphan detection flags an abandoned task
- Execution Time Exceeded: When a task runs longer than a threshold
- Worker Offline: When a worker stops sending heartbeats
Add conditions (optional)
Narrow when the workflow runs:
# Only trigger for specific task
task_name_pattern: "myapp.tasks.send_email"

# Only trigger for high-priority queue
queue: "priority-high"

# Only trigger for specific errors
error_pattern: "ConnectionError|Timeout"

# Combine multiple conditions
task_name_pattern: "myapp.tasks.*"
queue: "email"
error_pattern: "SMTPException"
Configure actions
Choose what happens when the workflow triggers:
Retry Task
- Retry with same arguments
- Configure delay between attempts
- Set maximum retry count
- Enable exponential backoff
Send Slack Notification
- Configure webhook URL
- Customize message template
- Include task context (name, args, error)
Call Webhook
- POST task details to external endpoint
- Include custom headers
- Configure timeout and retries
Set circuit breaker limits
Prevent infinite loops:
- Max executions: Stop after N workflow runs
- Time window: Reset counter after X hours
- On circuit open: Send notification instead of executing actions
Save and enable
Review your workflow configuration, then click Save and Enable.
The workflow activates immediately and begins monitoring for matching events.
Circuit breaker protection
Every workflow includes a circuit breaker to prevent runaway automation.
How it works:
- Workflow tracks execution count within a rolling time window (default: 1 hour)
- When count exceeds threshold (default: 100 executions), circuit opens
- While open, workflow sends notifications instead of executing actions
- Circuit auto-resets after time window elapses
Example scenario:
Time Window: 1 hour
Max Executions: 50
Hour 1: 45 executions → Circuit CLOSED, actions run normally
Hour 1: 51 executions → Circuit OPEN, notifications sent instead
Hour 2: Counter resets → Circuit CLOSED again
Circuit breakers are critical for production deployments. A misconfigured retry workflow could create thousands of duplicate tasks if not limited.
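To make the counting behaviour concrete, here is a minimal sketch of a circuit breaker in Python. It assumes a fixed window whose counter resets once the window has elapsed, which matches the example scenario above; the real implementation may track a true rolling window instead. The `CircuitBreaker` class and its parameter names are hypothetical.

```python
import time


class CircuitBreaker:
    """Fixed-window counter (assumed): opens when executions exceed the limit."""

    def __init__(self, max_executions=100, window_seconds=3600, clock=time.monotonic):
        self.max_executions = max_executions
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so the sketch can be tested
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        """Record one workflow run. True means run the actions, False means the circuit is open."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window has elapsed: reset the counter, which closes the circuit.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.max_executions


breaker = CircuitBreaker(max_executions=50, window_seconds=3600)
if breaker.allow():
    print("circuit closed: run actions")
else:
    print("circuit open: send notification instead")
```

With `max_executions=50`, the first 50 runs in a window are allowed and the 51st opens the circuit, as in the scenario above.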
Slack integration
Native Slack action for workflow notifications.
Create a Slack incoming webhook
- Go to Slack API: Incoming Webhooks
- Create a new webhook for your workspace
- Choose the channel for notifications
- Copy the webhook URL
Configure Slack action in workflow
In the workflow editor:
- Select Slack Notification action
- Paste your webhook URL
- Customize the message template
Available variables:
Task {task_name} failed on worker {worker}
Error: {error_message}
Args: {task_args}
Time: {timestamp}
Workflow: {workflow_name}
Test the integration
Use the Test Workflow button to send a sample notification.
Verify it appears in your Slack channel with the correct formatting.
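If you want to preview a message template outside the UI, the sketch below renders the variables listed above and builds the JSON body that Slack incoming webhooks accept (an object with a `text` field). It assumes the template placeholders behave like Python `str.format` fields, which may not match the product's templating exactly. The function names and sample values are hypothetical, and `send_to_slack` is defined but not called here.

```python
import json
import urllib.request


def build_slack_payload(template: str, context: dict) -> bytes:
    """Fill the template with task context and wrap it in Slack's {"text": ...} body."""
    text = template.format(**context)
    return json.dumps({"text": text}).encode("utf-8")


def send_to_slack(webhook_url: str, payload: bytes) -> int:
    """POST the payload to an incoming webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


template = "Task {task_name} failed on worker {worker}\nError: {error_message}"
context = {
    "task_name": "myapp.tasks.send_email",   # sample values for illustration only
    "worker": "worker-1",
    "error_message": "ConnectionError: connection refused",
    "workflow_name": "Alert on email failures",
}
payload = build_slack_payload(template, context)
print(json.loads(payload)["text"])
```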
Retry orchestration
Built-in retry action with configurable strategies.
Simple retry (fixed delay):
action:
  type: retry
  delay_seconds: 60      # Wait 60s before retry
  max_attempts: 3        # Stop after 3 retries
  same_arguments: true   # Use original task args
Exponential backoff:
action:
  type: retry
  delay_seconds: 10        # Initial delay
  max_attempts: 5
  exponential_backoff: true
  backoff_multiplier: 2    # Delays: 10s, 20s, 40s, 80s, 160s
  max_delay_seconds: 300   # Cap at 5 minutes
Conditional retry (only for specific errors):
trigger:
  event: task.failed
conditions:
  error_pattern: "ConnectionError|Timeout|503"
action:
  type: retry
  delay_seconds: 30
  max_attempts: 5
Retry workflows track parent-child task relationships. If a task is retried multiple times, the full retry chain is visible in the task detail view.
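The delay schedule produced by the exponential backoff settings can be checked with a short calculation. The sketch below assumes each delay is the initial delay multiplied by `backoff_multiplier` once per previous attempt and then capped at `max_delay_seconds`. The function `retry_delays` is a hypothetical helper, not part of the product.

```python
def retry_delays(delay_seconds, max_attempts, backoff_multiplier=2, max_delay_seconds=None):
    """Return the wait before each retry attempt, in seconds."""
    delays = []
    for attempt in range(max_attempts):
        delay = delay_seconds * backoff_multiplier ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)  # apply the cap
        delays.append(delay)
    return delays


print(retry_delays(10, 5, backoff_multiplier=2, max_delay_seconds=300))
# [10, 20, 40, 80, 160]
```

With these settings the cap only starts to matter from a sixth attempt onward, where the uncapped delay would be 320 seconds.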
Workflow templates
Pre-built workflows for common scenarios.
Auto-retry transient failures:
name: "Auto-retry network errors"
trigger:
  event: task.failed
conditions:
  error_pattern: "ConnectionError|Timeout|ConnectionRefusedError"
action:
  type: retry
  delay_seconds: 60
  max_attempts: 3
circuit_breaker:
  max_executions: 100
  time_window_hours: 1
Alert on critical task failures:
name: "Alert on payment processing failures"
trigger:
  event: task.failed
conditions:
  task_name_pattern: "payments.process_*"
action:
  type: slack
  webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK"
  message: "💳 Payment task {task_name} failed: {error_message}"
circuit_breaker:
  max_executions: 50
  time_window_hours: 1
Orphan task recovery:
name: "Recover orphaned tasks"
trigger:
  event: task.orphaned
action:
  type: retry
  delay_seconds: 300   # Wait 5 minutes before retry
  max_attempts: 1      # Only retry once
circuit_breaker:
  max_executions: 200
  time_window_hours: 24
Escalation on repeated failures:
name: "Escalate after 3 failures"
trigger:
  event: task.failed
conditions:
  failure_count_threshold: 3   # Only trigger after 3rd failure
action:
  type: slack
  webhook_url: "https://hooks.slack.com/services/ONCALL/WEBHOOK"
  message: "🚨 Task {task_name} failed 3 times. Manual intervention needed."
Workflow history and debugging
Every workflow execution is logged with:
- Timestamp: When the workflow ran
- Trigger: Which event caused it to run
- Conditions: Whether conditions matched
- Actions: What actions executed and their results
- Circuit breaker status: Whether the circuit was open/closed
- Outcome: Success, failure, or skipped
To access workflow history, open the Workflows page, click the workflow's name, then open the Execution History tab.
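If you export or mirror execution history for your own analysis, a record shaped like the fields listed above is a reasonable starting point. The sketch below is a hypothetical data model, not the product's storage format. The `ExecutionRecord` class and the `needs_attention` helper are illustrative names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExecutionRecord:
    """One logged workflow execution (assumed shape, mirroring the fields above)."""
    timestamp: datetime
    trigger: str                      # e.g. "task.failed"
    conditions_matched: bool
    actions: list[str] = field(default_factory=list)   # action results
    circuit_open: bool = False
    outcome: str = "success"          # "success", "failure", or "skipped"


def needs_attention(history):
    """Return the records where an action failed or the circuit breaker was open."""
    return [r for r in history if r.outcome == "failure" or r.circuit_open]


history = [
    ExecutionRecord(datetime.now(timezone.utc), "task.failed", True, ["retry queued"]),
    ExecutionRecord(datetime.now(timezone.utc), "task.failed", True, [], circuit_open=True, outcome="skipped"),
    ExecutionRecord(datetime.now(timezone.utc), "task.orphaned", False, [], outcome="skipped"),
]
print(len(needs_attention(history)))  # 1
```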
Best practices
Start conservative:
# Good: Limited retries with circuit breaker
circuit_breaker:
  max_executions: 50
  time_window_hours: 1
action:
  max_attempts: 3

# Bad: Unlimited retries, no circuit breaker
action:
  max_attempts: 999   # Don't do this
Use specific task patterns:
# Good: Narrow scope
task_name_pattern: "myapp.tasks.send_email"

# Bad: Too broad, may trigger unexpectedly
task_name_pattern: "*"
Test workflows before enabling:
Use the Test Workflow button to simulate execution without affecting production tasks.
Monitor workflow executions:
Check execution history regularly to ensure workflows aren't triggering more often than expected.
Combine workflows for escalation:
Create multiple workflows for the same trigger with different conditions:
- First failure → Auto-retry
- Third failure → Send Slack alert
- Fifth failure → Call PagerDuty webhook
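The escalation ladder above can be reasoned about as a set of independent workflows, each with its own failure-count threshold. The sketch below assumes a threshold means "at least N failures", so lower-tier workflows keep firing as the count rises; the product may interpret `failure_count_threshold` differently. The workflow list and action names are hypothetical.

```python
# Each entry stands for one workflow on the same trigger (task.failed).
ESCALATION_WORKFLOWS = [
    {"failure_count_threshold": 1, "action": "retry"},
    {"failure_count_threshold": 3, "action": "slack_alert"},
    {"failure_count_threshold": 5, "action": "pagerduty_webhook"},
]


def fired_actions(failure_count: int) -> list[str]:
    """Return the actions of every workflow whose threshold has been reached."""
    return [
        wf["action"]
        for wf in ESCALATION_WORKFLOWS
        if failure_count >= wf["failure_count_threshold"]
    ]


for count in (1, 3, 5):
    print(count, fired_actions(count))
```

If you would rather stop retrying once a task has been escalated, add a matching upper bound to the lower-tier workflow, or cap it with `max_attempts`.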
Limitations
No conditional branching:
Workflows execute actions sequentially. For complex logic (if/else, loops), use webhooks to external services.
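As an illustration of moving branching logic into an external service, the sketch below shows the decision function such a service might run on the body posted by a Call Webhook action. The payload field names (`task_name`, `error_message`, `retry_count`) and the returned decision labels are assumptions; check the actual webhook payload before relying on them.

```python
import json


def decide(raw_body: bytes) -> str:
    """Choose a follow-up step from a posted task-failure payload (hypothetical fields)."""
    payload = json.loads(raw_body)
    error = payload.get("error_message", "")
    retries = payload.get("retry_count", 0)

    if "Timeout" in error and retries < 3:
        return "retry"          # transient error with retry budget left
    if payload.get("task_name", "").startswith("payments."):
        return "page_oncall"    # critical task family: escalate
    return "log_only"


body = json.dumps(
    {"task_name": "payments.process_refund", "error_message": "ValueError: bad amount", "retry_count": 0}
).encode("utf-8")
print(decide(body))  # page_oncall
```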
No task cancellation:
Workflows can retry or notify, but cannot cancel running tasks. Cancellation must be handled in your Celery application.
No cross-task dependencies:
Workflows operate on individual task events. For complex task chains or DAGs, use Celery's built-in primitives (chain, chord, group).
Rate limits:
Circuit breakers prevent infinite loops, but external services (Slack, webhooks) may have their own rate limits. Configure accordingly.