Workflow Automation

Create workflows that respond automatically to task failures, orphan detection, or execution time thresholds. No code is required: you configure triggers and actions through the UI.

Why workflows?

If you've ever manually retried a batch of failed tasks at 2am or wished Slack would just tell you when things break, workflows solve that.

Example use cases:

Auto-retry transient failures (network timeouts, rate limits)
Send Slack alerts when critical tasks fail
Recover orphaned tasks automatically
Notify on-call engineers when failure rates spike
Escalate to humans when retry limits are exceeded

How workflows work

Workflows consist of three parts:

Trigger: What event kicks off the workflow (task failure, orphan detected, etc.)
Conditions: Optional filters to narrow when the workflow runs (specific task names, queues, error patterns)
Actions: What happens when triggered (retry task, send Slack message, webhook call)

Event → Conditions Check → Actions Execute → Result Logged

Each workflow runs independently with full execution history and rollback support.
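
The event flow above can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation: the `Workflow` class, the `handle_event` function, and the shape of the event dictionary are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical in-memory model of a workflow (not the real schema)."""
    name: str
    trigger: str                                     # e.g. "task.failed"
    conditions: list = field(default_factory=list)   # predicates over the event
    actions: list = field(default_factory=list)      # callables run in order
    history: list = field(default_factory=list)      # one record per evaluation


def handle_event(workflow, event):
    """Run one event through trigger -> conditions -> actions -> log."""
    if event["type"] != workflow.trigger:
        return None  # not this workflow's trigger; nothing is logged
    matched = all(cond(event) for cond in workflow.conditions)
    results = [action(event) for action in workflow.actions] if matched else []
    record = {
        "event": event["type"],
        "conditions_matched": matched,
        "results": results,
        "outcome": "success" if matched else "skipped",
    }
    workflow.history.append(record)
    return record


wf = Workflow(
    name="retry-email",
    trigger="task.failed",
    conditions=[lambda e: e["queue"] == "email"],
    actions=[lambda e: f"retry {e['task_name']}"],
)
record = handle_event(
    wf,
    {"type": "task.failed", "queue": "email", "task_name": "myapp.tasks.send_email"},
)
```

An event that matches the trigger but fails a condition is still logged, with the outcome `skipped`, which mirrors the "Conditions" and "Outcome" fields described under workflow history.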

Creating a workflow

Navigate to Workflows

From the dashboard, click Workflows in the sidebar, then Create Workflow.

Choose a trigger

Select what event should start this workflow:

Task Failed: When any task transitions to FAILURE state
Task Orphaned: When orphan detection flags an abandoned task
Execution Time Exceeded: When a task runs longer than a threshold
Worker Offline: When a worker stops sending heartbeats

Add conditions (optional)

Narrow when the workflow runs:

# Only trigger for specific task
task_name_pattern: "myapp.tasks.send_email"

# Only trigger for high-priority queue
queue: "priority-high"

# Only trigger for specific errors
error_pattern: "ConnectionError|Timeout"

# Combine multiple conditions
task_name_pattern: "myapp.tasks.*"
queue: "email"
error_pattern: "SMTPException"
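
To reason about how combined conditions narrow a workflow, it can help to model the matching yourself. The sketch below assumes that `task_name_pattern` is a shell-style glob, that `error_pattern` is a regular expression searched anywhere in the error text, and that all supplied conditions must match. These semantics are inferred from the examples above and are not confirmed; the `matches` function and the event fields are hypothetical.

```python
import fnmatch
import re


def matches(conditions, event):
    """Return True only if every condition that is present matches the event."""
    name_pattern = conditions.get("task_name_pattern")
    if name_pattern is not None:
        # Assumption: glob-style matching, so "myapp.tasks.*" covers all tasks in the module.
        if not fnmatch.fnmatchcase(event["task_name"], name_pattern):
            return False

    queue = conditions.get("queue")
    if queue is not None and event["queue"] != queue:
        return False

    error_pattern = conditions.get("error_pattern")
    if error_pattern is not None:
        # Assumption: regex search, so "ConnectionError|Timeout" matches either word.
        if not re.search(error_pattern, event["error"]):
            return False

    return True


event = {
    "task_name": "myapp.tasks.send_email",
    "queue": "email",
    "error": "SMTPException: relay denied",
}
combined = {
    "task_name_pattern": "myapp.tasks.*",
    "queue": "email",
    "error_pattern": "SMTPException",
}
print(matches(combined, event))
```

An empty set of conditions matches every event, which is why leaving conditions out makes a workflow fire on every occurrence of its trigger.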

Configure actions

Choose what happens when the workflow triggers:

Retry Task

Retry with same arguments
Configure delay between attempts
Set maximum retry count
Enable exponential backoff

Send Slack Notification

Configure webhook URL
Customize message template
Include task context (name, args, error)

Call Webhook

POST task details to external endpoint
Include custom headers
Configure timeout and retries

Set circuit breaker limits

Prevent infinite loops:

Max executions: Stop after N workflow runs
Time window: Reset counter after X hours
On circuit open: Send notification instead of executing actions

Save and enable

Review your workflow configuration, then click Save and Enable.

The workflow activates immediately and begins monitoring for matching events.

Circuit breaker protection

Every workflow includes a circuit breaker to prevent runaway automation.

How it works:

Workflow tracks execution count within a rolling time window (default: 1 hour)
When count exceeds threshold (default: 100 executions), circuit opens
While open, workflow sends notifications instead of executing actions
Circuit auto-resets after time window elapses

Example scenario:

Time Window: 1 hour
Max Executions: 50

Hour 1: 45 executions → Circuit CLOSED, actions run normally
Hour 1: 51 executions → Circuit OPEN, notifications sent instead
Hour 2: Counter resets → Circuit CLOSED again

Circuit breakers are critical for production deployments. A misconfigured retry workflow could create thousands of duplicate tasks if not limited.
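
The behaviour in the example scenario can be modelled with a small counter. This is a minimal sketch that assumes the counter resets once the window has elapsed, as the scenario shows. The `CircuitBreaker` class and its `allow` method are hypothetical names, not part of the product.

```python
class CircuitBreaker:
    """Counts workflow runs per time window and opens when the limit is exceeded."""

    def __init__(self, max_executions=100, window_seconds=3600):
        self.max_executions = max_executions
        self.window_seconds = window_seconds
        self.window_start = None
        self.count = 0

    def allow(self, now):
        """Record one run at time `now` (in seconds); return True if actions may run."""
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            # The window has elapsed: start a new one and reset the counter.
            self.window_start = now
            self.count = 0
        self.count += 1
        # Closed while the count is within the limit, open once it is exceeded.
        return self.count <= self.max_executions


breaker = CircuitBreaker(max_executions=50, window_seconds=3600)
first_hour = [breaker.allow(now=second) for second in range(51)]
after_reset = breaker.allow(now=3600)
```

Here the first 50 runs are allowed, the 51st finds the circuit open, and the first run of the next hour is allowed again. When `allow` returns False, a caller would send a notification instead of executing the workflow's actions.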

Slack integration

Native Slack action for workflow notifications.

Create a Slack incoming webhook

Go to Slack API: Incoming Webhooks
Create a new webhook for your workspace
Choose the channel for notifications
Copy the webhook URL

Configure Slack action in workflow

In the workflow editor:

Select Slack Notification action
Paste your webhook URL
Customize the message template

Available variables:

Task {task_name} failed on worker {worker}
Error: {error_message}
Args: {task_args}
Time: {timestamp}
Workflow: {workflow_name}
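
To preview how a template expands before saving it, you can render it locally. The sketch below assumes the `{variable}` placeholders behave like Python's `str.format` fields, which is a guess based on their syntax. The helper names `build_payload` and `send` are hypothetical. The `{"text": ...}` JSON body is the standard format accepted by Slack incoming webhooks.

```python
import json
import urllib.request

TEMPLATE = "Task {task_name} failed on worker {worker}\nError: {error_message}"


def build_payload(template, context):
    """Fill the template and wrap it in the JSON body Slack webhooks expect."""
    message = template.format(**context)
    return json.dumps({"text": message}).encode("utf-8")


def send(webhook_url, payload):
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_payload(
    TEMPLATE,
    {
        "task_name": "myapp.tasks.send_email",
        "worker": "worker-1",
        "error_message": "Timeout",
    },
)
```

Calling `send` with your own webhook URL posts the rendered message. A placeholder that is missing from the context raises `KeyError` in this sketch, which makes typos in variable names easy to spot.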

Test the integration

Use the Test Workflow button to send a sample notification.

Verify it appears in your Slack channel with the correct formatting.

Retry orchestration

Built-in retry action with configurable strategies.

Simple retry (fixed delay):

action:
  type: retry
  delay_seconds: 60        # Wait 60s before retry
  max_attempts: 3          # Stop after 3 retries
  same_arguments: true     # Use original task args

Exponential backoff:

action:
  type: retry
  delay_seconds: 10        # Initial delay
  max_attempts: 5
  exponential_backoff: true
  backoff_multiplier: 2    # Delays: 10s, 20s, 40s, 80s, 160s
  max_delay_seconds: 300   # Cap at 5 minutes
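
The delay sequence in the comment above follows from multiplying the initial delay by the multiplier once per attempt and applying the cap. The function below is an illustrative calculation only; `retry_delays` is a hypothetical helper whose parameter names mirror the configuration keys.

```python
def retry_delays(delay_seconds, max_attempts, backoff_multiplier=2, max_delay_seconds=None):
    """Return the wait before each retry attempt, in seconds."""
    delays = []
    for attempt in range(max_attempts):
        delay = delay_seconds * backoff_multiplier ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)  # never wait longer than the cap
        delays.append(delay)
    return delays


print(retry_delays(10, 5, backoff_multiplier=2, max_delay_seconds=300))
# → [10, 20, 40, 80, 160]
```

With these settings the cap is never reached. Under the same formula, raising `max_attempts` to 7 would give 320s and 640s for the last two attempts, both of which the cap reduces to 300s.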

Conditional retry (only for specific errors):

trigger:
  event: task.failed
conditions:
  error_pattern: "ConnectionError|Timeout|503"
action:
  type: retry
  delay_seconds: 30
  max_attempts: 5

Retry workflows track parent-child task relationships. If a task is retried multiple times, the full retry chain is visible in the task detail view.

Workflow templates

Pre-built workflows for common scenarios.

Auto-retry transient failures:

name: "Auto-retry network errors"
trigger:
  event: task.failed
conditions:
  error_pattern: "ConnectionError|Timeout|ConnectionRefusedError"
action:
  type: retry
  delay_seconds: 60
  max_attempts: 3
circuit_breaker:
  max_executions: 100
  time_window_hours: 1

Alert on critical task failures:

name: "Alert on payment processing failures"
trigger:
  event: task.failed
conditions:
  task_name_pattern: "payments.process_*"
action:
  type: slack
  webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK"
  message: "💳 Payment task {task_name} failed: {error_message}"
circuit_breaker:
  max_executions: 50
  time_window_hours: 1

Orphan task recovery:

name: "Recover orphaned tasks"
trigger:
  event: task.orphaned
action:
  type: retry
  delay_seconds: 300  # Wait 5 minutes before retry
  max_attempts: 1     # Only retry once
circuit_breaker:
  max_executions: 200
  time_window_hours: 24

Escalation on repeated failures:

name: "Escalate after 3 failures"
trigger:
  event: task.failed
conditions:
  failure_count_threshold: 3  # Only trigger after 3rd failure
action:
  type: slack
  webhook_url: "https://hooks.slack.com/services/ONCALL/WEBHOOK"
  message: "🚨 Task {task_name} failed 3 times. Manual intervention needed."

Workflow history and debugging

Every workflow execution is logged with:

Timestamp: When the workflow ran
Trigger: Which event caused it to run
Conditions: Whether conditions matched
Actions: What actions executed and their results
Circuit breaker status: Whether the circuit was open/closed
Outcome: Success, failure, or skipped

To view workflow history, open the Workflows page, click the workflow name, then select the Execution History tab.

Best practices

Start conservative:

# Good: Limited retries with circuit breaker
circuit_breaker:
  max_executions: 50
  time_window_hours: 1
action:
  max_attempts: 3

# Bad: Effectively unlimited retries, no explicit circuit breaker limits
action:
  max_attempts: 999  # Don't do this

Use specific task patterns:

# Good: Narrow scope
task_name_pattern: "myapp.tasks.send_email"

# Bad: Too broad, may trigger unexpectedly
task_name_pattern: "*"

Test workflows before enabling:

Use the Test Workflow button to simulate execution without affecting production tasks.

Monitor workflow executions:

Check execution history regularly to ensure workflows aren't triggering more often than expected.

Combine workflows for escalation:

Create multiple workflows for the same trigger with different conditions:

First failure → Auto-retry
Third failure → Send Slack alert
Fifth failure → Call PagerDuty webhook
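
The tiers above can be checked with a short simulation. This sketch assumes each workflow fires when the failure count equals its threshold exactly. That reading is an assumption, and the action names and helper functions are hypothetical.

```python
# Hypothetical tiers mirroring the list above: failure count -> action name.
ESCALATION_TIERS = {1: "retry", 3: "slack_alert", 5: "pagerduty_webhook"}


def action_for(failure_count):
    """Return the action whose threshold equals this failure count, else None."""
    return ESCALATION_TIERS.get(failure_count)


def escalation_path(total_failures):
    """List the (failure number, action) pairs fired over consecutive failures."""
    return [
        (n, action_for(n))
        for n in range(1, total_failures + 1)
        if action_for(n) is not None
    ]


print(escalation_path(5))
```

Failures two and four fire nothing in this model. If thresholds are instead treated as "at or above", the lower tiers would fire again on later failures, so it is worth confirming the behaviour with the Test Workflow button.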

Limitations

No conditional branching:

Workflows execute actions sequentially. For complex logic (if/else, loops), use webhooks to external services.
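
One way to add branching is to point a Call Webhook action at a small service you control. The sketch below uses only the Python standard library. The payload fields (`error_message`, `retries`, `queue`) and the decision names are assumptions, since the webhook payload format is not described on this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decide(payload):
    """If/else logic that a workflow cannot express on its own."""
    error = payload.get("error_message", "")
    if "Timeout" in error and payload.get("retries", 0) < 3:
        return "retry"
    if payload.get("queue") == "priority-high":
        return "page_oncall"
    return "log_only"


class WorkflowHook(BaseHTTPRequestHandler):
    """Receives the POSTed task details and replies with the chosen decision."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"decision": decide(payload)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve locally, run:
#   HTTPServer(("127.0.0.1", 8080), WorkflowHook).serve_forever()
```

Keeping the branching in a plain function such as `decide` lets you unit test the logic without starting the server.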

No task cancellation:

Workflows can retry or notify, but cannot cancel running tasks. Cancellation must be handled in your Celery application.

No cross-task dependencies:

Workflows operate on individual task events. For complex task chains or DAGs, use Celery's built-in primitives (chain, chord, group).

Rate limits:

Circuit breakers prevent infinite loops, but external services (Slack, webhooks) may enforce their own rate limits. Set max_executions low enough that notification and webhook actions stay within those limits.
