GenSphere Tutorial

Welcome to the GenSphere tutorial! In this guide, we’ll walk you through the main functionalities of GenSphere, an AI agent development framework that simplifies the creation and execution of complex workflows involving functions and language models.

By completing this tutorial, you will learn how to:

  1. Define workflows using YAML files.
  2. Use pre-built components from the GenSphere platform.
  3. Nest workflows to create complex pipelines.
  4. Utilize custom functions and schemas, and integrate with LangChain and Composio tools.
  5. Visualize workflows for better understanding.
  6. Push and pull workflows to and from the GenSphere platform.

You can also run this example directly on Google Colab here. Let’s get started!


Table of Contents

  1. Installation
  2. Importing GenSphere
  3. Setting Up Environment Variables
  4. Defining Your Workflow with YAML
  5. Combining Workflows
  6. Running Your Project
  7. Pushing to the Platform
  8. Checking Project Popularity
  9. Conclusion

1. Installation

First, ensure you have Python 3.10 or higher installed on your system. Then, install GenSphere and other required libraries using pip:

pip install gensphere

2. Importing GenSphere

In your Python script or Jupyter notebook, import the necessary modules:

import logging
import os
from gensphere.genflow import GenFlow
from gensphere.yaml_utils import YamlCompose
from gensphere.visualizer import Visualizer
from gensphere.hub import Hub

Set up logging to monitor the execution:

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("app.log", mode='w'),
        logging.StreamHandler()
    ]
)

3. Setting Up Environment Variables

If you haven’t defined your environment variables yet, you can do so now. Replace the placeholders with your actual API keys. You’ll need API keys for OpenAI, Composio, and FireCrawl.

os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'
os.environ['COMPOSIO_API_KEY'] = 'YOUR_COMPOSIO_API_KEY'  # Visit composio.dev to get one
os.environ['FIRECRAWL_API_KEY'] = 'YOUR_FIRECRAWL_API_KEY'  # Visit firecrawl.dev to get one
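
Hardcoding keys in source files is easy to leak. As an alternative, a small helper (not part of GenSphere; the names below are our own) can report which required keys are still unset and prompt for them interactively:

```python
import os
from getpass import getpass

REQUIRED_KEYS = ['OPENAI_API_KEY', 'COMPOSIO_API_KEY', 'FIRECRAWL_API_KEY']

def missing_api_keys(keys=REQUIRED_KEYS):
    """Return the required keys that are not yet set in the environment."""
    return [k for k in keys if not os.environ.get(k)]

def prompt_for_missing_keys():
    """Interactively fill in any unset keys so secrets never live in source."""
    for key in missing_api_keys():
        os.environ[key] = getpass(f"Enter {key}: ")
```

Calling prompt_for_missing_keys() once at the top of your notebook leaves already-set keys untouched.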

4. Defining Your Workflow with YAML

Our goal is to create a workflow that automatically finds the latest product releases on Product Hunt, explores their revenue and traction, and analyzes a new startup idea based on that information.

4.1 Pulling a Base YAML File

We will use a pre-built workflow from the GenSphere open platform that extracts information from Product Hunt. This workflow will be nested into a larger workflow to achieve our objective.

Pulling from the Platform

Use the Hub class to pull the base YAML file, along with its associated functions and schema files:

# Define paths to save the files
path_to_save_yaml_file = 'product_hunt_analyzer.yaml'
path_to_save_functions_file = 'gensphere_functions.py'
path_to_save_schema_file = 'structured_output_schema.py'

# Initialize the Hub
hub = Hub()

# Pull the files using the push_id
hub.pull(
    push_id='de8afbeb-06cb-4f8f-8ead-64d9e6ef5326',
    yaml_filename=path_to_save_yaml_file,
    functions_filename=path_to_save_functions_file,
    schema_filename=path_to_save_schema_file,
    save_to_disk=True
)

Examining the YAML File

The YAML file product_hunt_analyzer.yaml has been saved locally. Here’s the content:

# product_hunt_analyzer.yaml

nodes:
  - name: get_current_date
    type: function_call
    function: get_current_date_function
    outputs:
      - current_date

  - name: get_timewindow
    type: function_call
    function: get_timewindow_function
    outputs:
      - time_window

  - name: product_hunt_scrape
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    tools:
      - COMPOSIO.FIRECRAWL_SCRAPE
    params:
      prompt: |
        You should visit Product Hunt at https://www.producthunt.com/leaderboard/monthly/yyyy/mm
        Today is 
        Substitute yyyy and mm with the year and month you want to search.
        The search time window should be .
        Extract raw content from the HTML pages, which contain information about new product launches, companies, number of upvotes, etc.
        Scroll the page until the end and wait a few milliseconds before scraping.

    outputs:
      - product_hunt_scrape_results

  - name: extract_info_from_search
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    structured_output_schema: StartupInformationList
    params:
      prompt: |
        You are given reports from a search on Product Hunt containing products featured last month:
        .
        Extract accurate information about these new product launches.
        Structure the information with the following dimensions: product name, company name, company URL, number of upvotes, business model, and brief description.

    outputs:
      - structured_search_info

  - name: postprocess_search_results
    type: function_call
    function: postprocess_search_results_function
    params:
      info: ''
    outputs:
      - postprocessed_search_results

  - name: find_extra_info
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    tools:
      - COMPOSIO.TAVILY_TAVILY_SEARCH
    params:
      prompt: |
        Conduct a comprehensive web search about the following entry from Product Hunt:
        .
        Find relevant news about the company, especially related to revenue, valuation, traction, acquisition, number of users, etc.

    outputs:
      - startup_extra_info

4.2 Visualizing Your Project

To better understand the workflow, use the Visualizer class to visualize the project:

viz = Visualizer(
    yaml_file='product_hunt_analyzer.yaml',
    functions_file='gensphere_functions.py',
    schema_file='structured_output_schema.py',
    address='127.0.0.1',
    port=8050
)
viz.start_visualization()

Note: Running the visualization inside environments like Google Colab might be cumbersome. It’s recommended to run it locally and access it through your browser.

4.3 Understanding the YAML Syntax

GenSphere uses a YAML-based syntax to define workflows. There are three types of nodes:

  1. Function Call Nodes (function_call)
  2. LLM Service Nodes (llm_service)
  3. YML Flow Nodes (yml_flow)

Function Call Nodes

These nodes trigger the execution of Python functions defined in a separate .py file. They have params and outputs fields.

Example:

- name: get_current_date
  type: function_call
  function: get_current_date_function
  outputs:
    - current_date

Function Definition (gensphere_functions.py):

# gensphere_functions.py

from datetime import datetime

def get_current_date_function():
    """
    Returns the current date as a string.

    Returns:
        dict: A dictionary with 'current_date' as key and current date as value.
    """
    return {'current_date': datetime.today().strftime('%Y-%m-%d')}

Key Points:

  - The function must be defined in the referenced .py file (here, gensphere_functions.py).
  - The function must return a dictionary whose keys match the names listed under outputs.

LLM Service Nodes

These nodes execute calls to language model APIs. Currently, GenSphere supports OpenAI’s API, including structured outputs and function calling.

Example:

- name: product_hunt_scrape
  type: llm_service
  service: openai
  model: "gpt-4o-2024-08-06"
  tools:
    - COMPOSIO.FIRECRAWL_SCRAPE
  params:
    prompt: |
      You should visit Product Hunt at https://www.producthunt.com/leaderboard/monthly/yyyy/mm
      Today is 
      Substitute yyyy and mm with the year and month you want to search.
      The search time window should be .
      Extract raw content from the HTML pages, which contain information about new product launches, companies, number of upvotes, etc.
      Scroll the page until the end and wait a few milliseconds before scraping.

  outputs:
    - product_hunt_scrape_results

Key Points:

  - Currently, only OpenAI is supported as the service.
  - The tools field exposes external tools (here, Composio’s FireCrawl scraper) to the model via function calling.
  - As with function call nodes, outputs name the values the node produces.

Structured Output Example:

- name: extract_info_from_search
  type: llm_service
  service: openai
  model: "gpt-4o-2024-08-06"
  structured_output_schema: StartupInformationList
  params:
    prompt: |
      You are given reports from a search on Product Hunt containing products featured last month:
      .
      Extract accurate information about these new product launches.
      Structure the information with the following dimensions: product name, company name, company URL, number of upvotes, business model, and brief description.

  outputs:
    - structured_search_info

Schema Definition (structured_output_schema.py):

# structured_output_schema.py

from pydantic import BaseModel, Field
from typing import List

class StartupInformation(BaseModel):
    product_name: str = Field(..., description="The name of the product")
    company_name: str = Field(..., description="The name of the company offering the product")
    url: str = Field(..., description="URL associated with the product")
    number_upvotes: int = Field(..., description="Number of upvotes the product has received")
    business_model: str = Field(..., description="Brief description of the business model")
    brief_description: str = Field(..., description="Brief description of the product")

class StartupInformationList(BaseModel):
    information_list: List[StartupInformation]

Post-Processing Node:

After obtaining the structured output, we want to post-process it. The output is an instance of the class StartupInformationList, which contains a list of StartupInformation instances, as defined in the Pydantic model. We want to extract it as a plain list, so we apply postprocess_search_results_function.

- name: postprocess_search_results
  type: function_call
  function: postprocess_search_results_function
  params:
    info: ''
  outputs:
    - postprocessed_search_results

Function Definition (gensphere_functions.py):

def postprocess_search_results_function(info):
    """
    Processes the structured search information.

    Args:
        info (StartupInformationList): The structured search info.

    Returns:
        dict: A dictionary with 'postprocessed_search_results' as key.
    """
    result = info.model_dump().get('information_list')
    return {'postprocessed_search_results': result}
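
To see the unwrapping step in isolation, here is a minimal sketch that uses a stand-in object in place of the real Pydantic model (FakeStartupInformationList below is hypothetical, mimicking only the model_dump() method):

```python
# Stand-in mimicking a Pydantic model's model_dump(); NOT the real
# StartupInformationList, just enough to illustrate the unwrapping.
class FakeStartupInformationList:
    def __init__(self, items):
        self._items = items

    def model_dump(self):
        return {'information_list': self._items}

def postprocess_search_results_function(info):
    # Extract the inner list from the dumped model.
    result = info.model_dump().get('information_list')
    return {'postprocessed_search_results': result}

sample = FakeStartupInformationList([{'product_name': 'Acme', 'number_upvotes': 42}])
out = postprocess_search_results_function(sample)
```

The node’s downstream consumers then receive a plain list of dictionaries rather than a Pydantic object.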

YML Flow Nodes

These nodes represent entire YAML files themselves, allowing you to nest workflows.

Example:

- name: product_hunt_analyzer
  type: yml_flow
  yml_file: product_hunt_analyzer.yaml
  outputs:
    - postprocessed_search_results
    - startup_extra_info

Key Points:

  - The yml_file field points to the YAML file whose workflow is nested here.
  - The outputs listed must be produced by nodes inside the nested workflow; they then become available to downstream nodes in the parent workflow.

Working with Lists

When the output of a node is a list, you might want to apply the next node to each element individually.
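
Conceptually, this behaves like mapping the node over the list. A plain-Python sketch (not GenSphere syntax; find_extra_info below stands in for the per-element LLM call):

```python
# Each entry from the upstream list is processed independently, and the
# per-element results are collected back into a list.
entries = [{'product_name': 'Acme'}, {'product_name': 'Beta'}]

def find_extra_info(entry):
    # Stand-in for the llm_service call that runs once per element.
    return f"web search results for {entry['product_name']}"

startup_extra_info = [find_extra_info(e) for e in entries]
```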

Example:

- name: find_extra_info
  type: llm_service
  service: openai
  model: "gpt-4o-2024-08-06"
  tools:
    - COMPOSIO.TAVILY_TAVILY_SEARCH
  params:
    prompt: |
      Conduct a comprehensive web search about the following entry from Product Hunt:
      .
      Find relevant news about the company, especially related to revenue, valuation, traction, acquisition, number of users, etc.

  outputs:
    - startup_extra_info

Key Points:

  - When applied element-wise, the node runs once per element of the input list, and its outputs are collected into a list of the same length.


5. Combining Workflows

Now, we’ll embed the product_hunt_analyzer workflow into a larger workflow to analyze a new startup idea.

Defining the New Workflow

Create a new YAML file named startup_idea_evaluator.yaml:

# startup_idea_evaluator.yaml

nodes:
  - name: read_idea
    type: function_call
    function: read_file_as_string
    params:
      file_path: "domains_to_search.txt"
    outputs:
      - domains

  - name: product_hunt_analyzer
    type: yml_flow
    yml_file: product_hunt_analyzer.yaml
    outputs:
      - postprocessed_search_results
      - startup_extra_info

  - name: generate_report
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    params:
      prompt: |
        You are a world-class VC analyst. You are analyzing the following startup idea:
        
        Your task is to analyze this idea in the context of recent launches on Product Hunt.
        Recent launches are:
        
        Additional information about these companies:
        .

        Create a detailed report containing:
        1. An overview of recent launches on Product Hunt. What are the main ideas being explored?
        2. A list of companies from Product Hunt that may become direct competitors to the startup idea. Explain your rationale.
        3. A list of the most promising startups from the Product Hunt launches, based on valuation, revenue, traction, or other relevant metrics.
        4. A table containing all information found from the Product Hunt launches.

        Answer in markdown format.

    outputs:
      - report

Composing the Combined Workflow

Use YamlCompose to create a combined YAML file that resolves all dependencies:

# Assuming 'startup_idea_evaluator.yaml' is in the current directory
composer = YamlCompose(
    yaml_file='startup_idea_evaluator.yaml',
    functions_filepath='gensphere_functions.py',
    structured_output_schema_filepath='structured_output_schema.py'
)
combined_yaml_data = composer.compose(save_combined_yaml=True, output_file='combined.yaml')

Note: Ensure all referenced functions and schemas are available in the specified files.

Visualizing the Combined Workflow

You can visualize the combined workflow to verify that nesting has been handled correctly:

viz = Visualizer(
    yaml_file='combined.yaml',
    functions_file='gensphere_functions.py',
    schema_file='structured_output_schema.py',
    address='127.0.0.1',
    port=8050
)
viz.start_visualization()

6. Running Your Project

Now that you have the combined YAML file and necessary Python files, you can execute the workflow.

Preparing Input Files

The first node read_idea expects a text file named domains_to_search.txt. Create this file with your startup idea:

startup_idea = """
A startup that creates interactive voice agents using generative AI with emphasis on applications like
language tutoring, entertainment, or mental health. The business model would be B2C.
"""

with open("domains_to_search.txt", "w") as text_file:
    text_file.write(startup_idea)
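
The read_file_as_string function referenced by the read_idea node is not shown in this tutorial. Assuming it follows the same convention as the other function-call nodes (returning a dictionary keyed by the node’s declared outputs), a sketch might look like:

```python
import os
import tempfile

def read_file_as_string(file_path):
    """Hypothetical sketch: read a file and return its contents under the
    'domains' key, matching the read_idea node's declared output."""
    with open(file_path, 'r') as f:
        content = f.read()
    return {'domains': content}

# Quick self-contained check against a temporary file.
tmp = tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False)
tmp.write('A voice-agent startup idea.')
tmp.close()
result = read_file_as_string(tmp.name)
os.unlink(tmp.name)
```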

Executing the Workflow

Initialize GenFlow and run the workflow:

flow = GenFlow(
    yaml_file='combined.yaml',
    functions_filepath='gensphere_functions.py',
    structured_output_schema_filepath='structured_output_schema.py'
)
flow.parse_yaml()
flow.run()

Accessing the Outputs

After execution, you can access the results:

# Access all outputs
outputs = flow.outputs

# Get the final report
final_report = outputs.get("generate_report").get("report")

# Display the report in Markdown format
from IPython.display import display, Markdown
display(Markdown(final_report))
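
For reference, flow.outputs maps each node name to that node’s outputs dictionary, which is why the report is accessed with two chained get calls. An illustrative (made-up) shape:

```python
# Stand-in data showing the shape of flow.outputs; not real results.
outputs = {
    'get_current_date': {'current_date': '2024-01-15'},
    'generate_report': {'report': '# Startup idea analysis\n...'},
}

# Node name first, then the output name declared in the YAML.
final_report = outputs.get('generate_report').get('report')
```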

7. Pushing to the Platform

You can push your project to the GenSphere platform, allowing others to pull and use it.

hub = Hub(
    yaml_file='combined.yaml',
    functions_file='gensphere_functions.py',
    schema_file='structured_output_schema.py'
)

result = hub.push(push_name='Workflow to analyze startup idea based on recent Product Hunt launches.')

Retrieve and print the push_id:

print(f"Push ID: {result.get('push_id')}")
print(f"Uploaded Files: {result.get('uploaded_files')}")

8. Checking Project Popularity

Check how many times your project has been pulled from the platform:

# Replace with your actual push_id
push_id = result.get('push_id')

# Get the total number of pulls for the push_id
total_pulls = hub.count_pulls(push_id=push_id)
print(f"Total pulls for push_id {push_id}: {total_pulls}")

9. Conclusion

Congratulations! You’ve successfully:

  - Defined workflows using YAML files.
  - Pulled pre-built components from the GenSphere platform.
  - Nested workflows to build a complex pipeline.
  - Used custom functions, schemas, and Composio tools.
  - Visualized and executed your project.
  - Pushed your project to the platform and checked its popularity.

Additional Resources

  - The GenSphere GitHub repository.
  - The Google Colab notebook version of this tutorial.

Troubleshooting

If you encounter any issues:

  - Check app.log (configured in step 2) for detailed execution logs.
  - Verify that your API keys are set correctly as environment variables.
  - Ensure all referenced functions and schemas exist in the files passed to GenFlow, YamlCompose, and Visualizer.

Feedback

Your feedback is valuable. If you have suggestions or find any issues, please open an issue on the GenSphere GitHub repository.