Workflow Blueprint
A workflow blueprint declares a multi-agent pipeline as a YAML file. The modules/workflows Terraform module reads these files and generates AWS Step Functions state machines — one state machine per blueprint. Workflow blueprints support sequential agent steps, parallel branches, choice routing, callback-pattern wait states, and EventBridge triggers.
Required fields: id, version, name. All other fields have safe defaults.
Workflow blueprint files live under blueprints/workflows/*.yaml in a domain repo.
Top-Level Identity Fields
id: research-synthesis-pipeline # Unique workflow identifier. Kebab-case by convention.
name: Research and Synthesis # Human-readable name for dashboards. Required.
version: "1.0.0" # Semantic version. Required.
description: | # Optional description.
Parallel data collection followed by synthesis and
confidence-based routing to the output handler.
timeout_minutes: 60 # Overall workflow timeout. Default: 60. Must be > 0.
trigger: Block
Configures how the workflow is started. Three trigger types are supported.
# Schedule trigger — runs on an EventBridge cron or rate expression.
trigger:
type: schedule
schedule: "cron(0 8 * * ? *)" # EventBridge schedule expression.
timezone: "UTC" # Optional IANA timezone.
# Event trigger — reacts to an EventBridge event pattern.
trigger:
type: event
event_pattern:
source: ["my.application"]
detail-type: ["DataReady"]
# Manual trigger — Step Functions StartExecution or API call only.
trigger:
type: manual
For scheduled workflows, the platform creates an aws_cloudwatch_event_rule and wires an aws_cloudwatch_event_target that injects trigger, scheduled_time, workflow, and environment into the execution input.
states: Block
The states: list defines the workflow DAG. Each entry is one of seven state types:
| Type | Description |
|---|---|
task | Invokes an agent via AgentCore Runtime |
choice | Routes to different states based on a condition |
parallel | Runs multiple branches concurrently |
wait | Pauses for a fixed duration |
wait_for_token | Pauses until an external callback sends a task token (Step Functions callback pattern) |
succeed | Terminal success state |
fail | Terminal error state |
Task State (Sequential Agent Step)
states:
- id: ResearchStep # State name. PascalCase by convention.
type: task
agent_ref: researcher # Agent blueprint ID. Resolved to Runtime ARN by Terraform.
next: SynthesisStep # Next state ID. Omit for terminal states.
prompt: "$.input.query" # JSONPath to the prompt field in execution input.
retry_max: 3 # Maximum retry attempts (>= 1). Shorthand for simple retry.
input_mapping: # Optional. Map state input keys to agent payload keys.
query: "$.input.query"
context: "$.results.previous"
Terraform resolves the agent’s Runtime ARN from var.agent_runtime_arns (passed from module.agents.runtime_arns) and generates a Step Functions SDK integration task using arn:aws:states:::aws-sdk:bedrockagentcore:invokeAgentRuntime. Results are written to $.results.<agent_id> in the execution state.
Parallel State
Runs multiple agents simultaneously. All branches must complete before the workflow continues. Results are merged into $.parallel_results.
- id: ParallelAnalysis
type: parallel
branches:
- states:
- id: AnalystA
type: task
agent_ref: analyst-a
prompt: "$.input.query"
- states:
- id: AnalystB
type: task
agent_ref: analyst-b
prompt: "$.input.query"
next: SynthesisStep
Choice State
Routes to different next states based on the execution context.
- id: RouteByConfidence
type: choice
choices:
- condition:
path: "$.results.researcher.confidence"
op: ">="
value: 0.8
next: HighConfidenceHandler
- condition:
path: "$.results.researcher.confidence"
op: ">="
value: 0.5
next: MediumConfidenceHandler
default: LowConfidenceHandler # Fallback if no condition matches.
Succeed State
Terminal state that records explicit success.
- id: PipelineComplete
type: succeed
Fail State
Terminal state that records an error.
- id: ValidationFailed
type: fail
error: "ValidationError"
cause: "Input did not pass validation checks."
Wait for Token State (Callback Pattern)
Pauses the workflow until an external system calls SendTaskSuccess or SendTaskFailure with the task token. This implements the Step Functions callback pattern.
- id: AwaitHumanReview
type: wait_for_token
heartbeat_seconds: 3600 # Timeout if no heartbeat received within this interval.
next: PostReviewStep
retry: and catch: (per state)
Each task state supports standard Step Functions retry and catch policies alongside the retry_max shorthand:
- id: CriticalStep
type: task
agent_ref: critical-agent
prompt: "$.input"
retry: # Full retry config — list of retry policies.
- ErrorEquals: ["States.TaskFailed"]
IntervalSeconds: 5
MaxAttempts: 3
BackoffRate: 2.0
catch: # Catch config — list of catch policies.
- ErrorEquals: ["States.ALL"]
Next: ErrorHandler
next: NextStep
memory_branching: Block
Configures AgentCore Memory branching for multi-agent workflows. When enabled, each agent step operates on an isolated memory branch.
memory_branching:
enabled: true # Enable per-state memory branches. Default: false.
merge_strategy: union # union | latest | coordinator_wins | none. Default: union.
branch_namespace: "{sessionId}/branches/{stateId}" # Namespace template.
| Merge Strategy | Behaviour |
|---|---|
union | Merge all branch memories into the main namespace |
latest | Keep only the most recent branch’s memories |
coordinator_wins | Coordinator’s memories take precedence on conflicts |
none | No merge — branches remain isolated |
Complete Multi-Agent Pipeline Example
id: research-synthesis-pipeline
name: Research and Synthesis Pipeline
version: "1.1.0"
description: |
Parallel data collection, followed by synthesis and
confidence-based routing to a final output handler.
timeout_minutes: 45
trigger:
type: manual
states:
# Step 1: Validate the incoming request.
- id: ValidateRequest
type: task
agent_ref: validator
next: ParallelCollection
retry_max: 2
prompt: "$.input"
# Step 2: Run three collection agents in parallel.
- id: ParallelCollection
type: parallel
branches:
- states:
- id: WebResearch
type: task
agent_ref: web-researcher
prompt: "$.input.query"
- states:
- id: DatabaseResearch
type: task
agent_ref: database-researcher
prompt: "$.input.query"
- states:
- id: ArchiveResearch
type: task
agent_ref: archive-researcher
prompt: "$.input.query"
next: SynthesisStep
# Step 3: Synthesise parallel results.
- id: SynthesisStep
type: task
agent_ref: synthesizer
next: RouteByConfidence
retry_max: 3
prompt: "$.parallel_results"
# Step 4: Route based on synthesiser confidence.
- id: RouteByConfidence
type: choice
choices:
- condition:
path: "$.results.synthesizer.confidence"
op: ">="
value: 0.85
next: HighQualityOutput
- condition:
path: "$.results.synthesizer.confidence"
op: ">="
value: 0.55
next: ReviewRequired
default: LowQualityFallback
# Step 5a: High-quality path — store result and finish.
- id: HighQualityOutput
type: task
agent_ref: output-formatter
prompt: "$.results.synthesizer"
next: PipelineComplete
# Step 5b: Route to human review via callback pattern.
- id: ReviewRequired
type: wait_for_token
heartbeat_seconds: 86400 # 24-hour review window.
next: PostReviewStep
- id: PostReviewStep
type: task
agent_ref: review-finalizer
prompt: "$.results.synthesizer"
next: PipelineComplete
# Step 5c: Low-quality fallback.
- id: LowQualityFallback
type: task
agent_ref: fallback-handler
prompt: "$.results.synthesizer"
next: PipelineComplete
- id: PipelineComplete
type: succeed
memory_branching:
enabled: true
merge_strategy: coordinator_wins
branch_namespace: "{sessionId}/pipeline/{stateId}"
How Workflows Map to Step Functions
The Terraform module translates the blueprint into an Amazon States Language (ASL) definition:
flowchart TD
YAML["workflow.yaml"] --> TF["modules/workflows\nlocals.tf: yamldecode()"]
TF --> SFN["aws_sfn_state_machine\n(one per blueprint)"]
TF --> LogGroup["aws_cloudwatch_log_group\n/aws/stepfunctions/<id>"]
TF --> IAM["aws_iam_role (sfn)\nbedrock-agentcore:InvokeAgentRuntime"]
TF -->|"trigger.type = schedule"| EventRule["aws_cloudwatch_event_rule"]
EventRule --> EventTarget["aws_cloudwatch_event_target"]
EventTarget --> SFN
SFN --> AgentRuntime["AgentCore Runtimes\n(from modules/agents outputs)"]
The module uses the AWS SDK integration pattern (arn:aws:states:::aws-sdk:bedrockagentcore:invokeAgentRuntime) because no native Step Functions optimised integration exists for AgentCore Runtime at this time.
Cross-Module Wiring
The workflows module receives agent Runtime ARNs from the agents module:
module "workflows" {
source = "git::https://github.com/The-Cloud-Clockwork/tcc-aws-agent-platform.git//modules/workflows"
workflow_dir = "./blueprints/workflows"
agent_runtime_arns = module.agents.runtime_arns # map of agent_id -> Runtime ARN
environment = var.environment
resource_prefix = var.resource_prefix
aws_region = var.aws_region
ssm_root_path = var.ssm_root_path
}
See Deployment Patterns for the full three-module composition.
Schema Reference
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
id | str | Yes | — | Unique workflow identifier (kebab-case recommended) |
name | str | Yes | — | Human-readable display name |
version | str | Yes | — | Semantic version string |
description | str | No | "" | Workflow description |
trigger | TriggerConfig | No | type=schedule | Trigger configuration |
trigger.type | str | No | schedule | schedule | event | manual |
trigger.schedule | str | No | null | EventBridge cron or rate expression |
trigger.timezone | str | No | null | IANA timezone for scheduled triggers |
trigger.event_pattern | dict | No | null | EventBridge event pattern for event triggers |
timeout_minutes | int | No | 60 | Overall workflow timeout. Must be > 0. |
states | list[WorkflowState] | No | [] | State machine definition |
states[].id | str | Yes | — | State name (unique within the workflow) |
states[].type | str | No | task | task | choice | parallel | wait | wait_for_token | succeed | fail |
states[].agent_ref | str | No | null | Agent blueprint ID (task states) |
states[].prompt | str | No | null | JSONPath to prompt field in execution input |
states[].retry_max | int | No | null | Maximum retry attempts shorthand (>= 1) |
states[].heartbeat_seconds | int | No | null | Heartbeat timeout for wait_for_token (> 0) |
states[].next | str | No | null | Next state ID |
states[].choices | list[ChoiceRule] | No | null | Choice rules (choice states) |
states[].default | str | No | null | Default state for unmatched choices |
states[].branches | list[dict] | No | null | Sub-workflow definitions (parallel states) |
states[].retry | list[dict] | No | null | Full Step Functions retry policy list |
states[].catch | list[dict] | No | null | Step Functions catch policy list |
states[].error | str | No | null | Error code (fail states) |
states[].cause | str | No | null | Error description (fail states) |
memory_branching | MemoryBranchConfig | No | null | Per-state memory branch isolation |