gensphere

3. Nesting Workflows

GenSphere lets you build modular, reusable workflows by nesting one workflow inside another. This is done with the yml_flow node type. Nesting helps you break complex tasks into smaller files and reuse the same sub-workflow from several parent workflows.

3.1 Understanding YML Flow Nodes

A yml_flow node refers to another YAML workflow file. When the main workflow is executed, GenSphere incorporates the nodes of the nested workflow, resolves the dependencies between them, and combines the main and nested workflows into a single execution graph.

Example:

# main_workflow.yaml

nodes:
  - name: data_collection
    type: yml_flow
    yml_file: data_collection_workflow.yaml
    params:
      start_date: '2023-01-01'
      end_date: '2023-01-31'
    outputs:
      - collected_data

  - name: data_analysis
    type: function_call
    function: analyze_data_function
    params:
      data: ''  # should reference data_collection.collected_data (see 3.4)
    outputs:
      - analysis_results

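In this example, the data_collection node runs the workflow defined in data_collection_workflow.yaml with the given start_date and end_date and exposes its result as collected_data. The data_analysis node then calls analyze_data_function and produces analysis_results.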

3.2 Creating Nested Workflows

To create a nested workflow:

  1. Define the Sub-Workflow: Create a separate YAML file (e.g., data_collection_workflow.yaml) with its own nodes.

    # data_collection_workflow.yaml
    
    nodes:
      - name: fetch_data
        type: function_call
        function: fetch_data_function
        params:
          start_date: ''  # filled from the params of the yml_flow node
          end_date: ''    # filled from the params of the yml_flow node
        outputs:
          - raw_data
    
      - name: process_data
        type: function_call
        function: process_data_function
        params:
          data: ''  # should reference the raw_data output of fetch_data
        outputs:
          - processed_data
    
  2. Reference the Sub-Workflow: In your main workflow, add a yml_flow node whose yml_file field points to the sub-workflow file, as the data_collection node does in section 3.1.
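
The sub-workflow above names fetch_data_function and process_data_function without showing them. As a minimal sketch, assuming each function receives the node's params as keyword arguments and returns a dictionary keyed by the node's outputs (an assumption about the calling convention, to be checked against the function_call documentation), the functions file might look like this. The function bodies are placeholders that generate dummy records, not real data access.

```python
# functions.py (hypothetical sketch; bodies are placeholders)
from datetime import date, timedelta


def fetch_data_function(start_date: str, end_date: str) -> dict:
    """Return one placeholder record per day in [start_date, end_date]."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    days = (end - start).days + 1
    records = [
        {"date": (start + timedelta(days=i)).isoformat(), "value": i}
        for i in range(days)
    ]
    # Assumption: the returned key matches the node's `outputs` entry.
    return {"raw_data": records}


def process_data_function(data: list) -> dict:
    """Keep only the records whose placeholder value is even."""
    processed = [row for row in data if row["value"] % 2 == 0]
    # Assumption: the returned key matches the node's `outputs` entry.
    return {"processed_data": processed}
```

Keeping the returned keys identical to the names listed under outputs makes it easy to see which node produces which value.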

3.3 Passing Parameters to Nested Workflows

Parameters can be passed to nested workflows using the params field in the yml_flow node. These parameters can then be used within the nested workflow.

Example:

# main_workflow.yaml

nodes:
  - name: sub_workflow
    type: yml_flow
    yml_file: sub_workflow.yaml
    params:
      param1: ''
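
To picture how values passed in params might reach the nodes of the nested workflow, here is a conceptual sketch. It is not GenSphere's implementation: the bind_params helper is hypothetical, and the `{{ name }}` placeholder syntax is an assumption made for illustration.

```python
import re

# Assumption: placeholders in the sub-workflow look like "{{ name }}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def bind_params(sub_params: dict, passed: dict) -> dict:
    """Replace placeholders in a sub-workflow node's params with passed values."""
    bound = {}
    for key, value in sub_params.items():
        if isinstance(value, str):
            # Unknown placeholders are left untouched rather than dropped.
            value = PLACEHOLDER.sub(
                lambda m: str(passed.get(m.group(1), m.group(0))), value
            )
        bound[key] = value
    return bound


passed = {"start_date": "2023-01-01", "end_date": "2023-01-31"}
sub_params = {"start_date": "{{ start_date }}", "end_date": "{{ end_date }}"}
print(bind_params(sub_params, passed))
```

Leaving unknown placeholders in place makes a missing parameter visible in the result instead of silently turning it into an empty value.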

3.4 Accessing Outputs from Nested Workflows

Outputs specified in the outputs field of the yml_flow node are made available to the main workflow.

Example:

- name: data_collection
  type: yml_flow
  yml_file: data_collection_workflow.yaml
  outputs:
    - collected_data

In subsequent nodes, you can reference data_collection.collected_data.
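
References such as data_collection.collected_data are what determine the order in which nodes can run. The sketch below illustrates the idea with a topological sort from Python's standard library. It is an illustration only, not GenSphere's code: the execution_order helper is hypothetical, and it assumes a reference is a plain string of the form node_name.output_name.

```python
from graphlib import TopologicalSorter


def execution_order(nodes: list) -> list:
    """Order node names so each node comes after the nodes it references."""
    names = {node["name"] for node in nodes}
    graph = {}
    for node in nodes:
        deps = set()
        for value in node.get("params", {}).values():
            # Assumption: a reference is a string "node_name.output_name".
            if isinstance(value, str) and "." in value:
                producer = value.split(".", 1)[0]
                if producer in names:
                    deps.add(producer)
        graph[node["name"]] = deps
    return list(TopologicalSorter(graph).static_order())


nodes = [
    {"name": "data_analysis",
     "params": {"data": "data_collection.collected_data"}},
    {"name": "data_collection",
     "params": {"start_date": "2023-01-01", "end_date": "2023-01-31"}},
]
print(execution_order(nodes))  # ['data_collection', 'data_analysis']
```

Although data_analysis is listed first, it is ordered after data_collection because it consumes that node's output.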

3.5 Composing the Combined Workflow

Use the YamlCompose class to combine the main workflow and nested workflows into a single executable workflow.

Example:

from gensphere.yaml_utils import YamlCompose

composer = YamlCompose(
    yaml_file='main_workflow.yaml',
    functions_filepath='functions.py',
    structured_output_schema_filepath='schemas.py'
)
combined_yaml_data = composer.compose(save_combined_yaml=True, output_file='combined.yaml')
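
For intuition about what combining involves, the following conceptual sketch expands yml_flow nodes into one flat node list. It does not reproduce how YamlCompose works internally: the flatten helper is hypothetical, workflows are held as in-memory dictionaries instead of YAML files, and the parent__child naming is an assumption chosen only to keep nested node names unique.

```python
def flatten(nodes: list, load_workflow, prefix: str = "") -> list:
    """Recursively expand yml_flow nodes into a single flat node list."""
    flat = []
    for node in nodes:
        name = prefix + node["name"]
        if node.get("type") == "yml_flow":
            sub_nodes = load_workflow(node["yml_file"])["nodes"]
            # Assumption: nested node names are qualified with the parent name.
            flat.extend(flatten(sub_nodes, load_workflow, prefix=name + "__"))
        else:
            flat.append({**node, "name": name})
    return flat


# In-memory stand-ins for the two YAML files used in this section.
workflows = {
    "main_workflow.yaml": {"nodes": [
        {"name": "data_collection", "type": "yml_flow",
         "yml_file": "data_collection_workflow.yaml"},
        {"name": "data_analysis", "type": "function_call"},
    ]},
    "data_collection_workflow.yaml": {"nodes": [
        {"name": "fetch_data", "type": "function_call"},
        {"name": "process_data", "type": "function_call"},
    ]},
}

combined = flatten(workflows["main_workflow.yaml"]["nodes"], workflows.__getitem__)
print([node["name"] for node in combined])
```

Because the function recurses, a sub-workflow may itself contain yml_flow nodes and still end up in the same flat list.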