Welcome to the GenSphere tutorial! In this guide, we’ll walk you through the main functionalities of GenSphere, an AI agent development framework that simplifies the creation and execution of complex workflows involving functions and language models.
By completing this tutorial, you will learn how to:
- Install GenSphere and set up your environment
- Pull a pre-built workflow from the GenSphere open platform
- Visualize workflows and understand GenSphere's YAML syntax
- Compose nested workflows with YamlCompose
- Execute a workflow with GenFlow and retrieve its results
- Push your own project to the platform

You can also run this example directly on Google Colab. Let's get started!
First, ensure you have Python 3.10 or higher installed on your system. Then, install GenSphere and other required libraries using pip:
pip install gensphere
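GenSphere requires Python 3.10+. If you're unsure which interpreter your environment uses, a quick check (plain Python, nothing GenSphere-specific):

import sys
# Sanity check: GenSphere requires Python 3.10 or higher
assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version}"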
In your Python script or Jupyter notebook, import the necessary modules:
import logging
import os
from gensphere.genflow import GenFlow
from gensphere.yaml_utils import YamlCompose
from gensphere.visualizer import Visualizer
from gensphere.hub import Hub
Set up logging to monitor the execution:
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("app.log", mode='w'),
        logging.StreamHandler()
    ]
)
If you haven't defined your environment variables yet, you can do so now. Replace the placeholders with your actual API keys. You'll need API keys for OpenAI, Composio, and FireCrawl.
os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'
os.environ['COMPOSIO_API_KEY'] = 'YOUR_COMPOSIO_API_KEY' # Visit composio.dev to get one
os.environ['FIRECRAWL_API_KEY'] = 'YOUR_FIRECRAWL_API_KEY' # Visit firecrawl.dev to get one
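Alternatively, to avoid hard-coding secrets in your script, you can load them from a local .env file. This assumes the python-dotenv package, which is not installed by GenSphere itself:

# Optional: load keys from a .env file instead of hard-coding them.
# Requires `pip install python-dotenv` (an extra dependency, not part of GenSphere).
from dotenv import load_dotenv
load_dotenv()  # reads OPENAI_API_KEY, COMPOSIO_API_KEY, FIRECRAWL_API_KEY from .env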
Our goal is to create a workflow that automatically finds the latest product releases on Product Hunt, explores their revenue and traction, and analyzes a new startup idea based on that information.
We will use a pre-built workflow from the GenSphere open platform that extracts information from Product Hunt. This workflow will be nested into a larger workflow to achieve our objective.
Use the Hub class to pull the base YAML file, along with its associated functions and schema files:
# Define paths to save the files
path_to_save_yaml_file = 'product_hunt_analyzer.yaml'
path_to_save_functions_file = 'gensphere_functions.py'
path_to_save_schema_file = 'structured_output_schema.py'

# Initialize the Hub
hub = Hub()

# Pull the files using the push_id
hub.pull(
    push_id='de8afbeb-06cb-4f8f-8ead-64d9e6ef5326',
    yaml_filename=path_to_save_yaml_file,
    functions_filename=path_to_save_functions_file,
    schema_filename=path_to_save_schema_file,
    save_to_disk=True
)
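You can quickly confirm that the three files landed in your working directory:

# Verify the pulled files were written to disk
import os
for f in [path_to_save_yaml_file, path_to_save_functions_file, path_to_save_schema_file]:
    print(f, os.path.exists(f))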
The YAML file product_hunt_analyzer.yaml has been saved locally. Here's the content:
# product_hunt_analyzer.yaml
nodes:
  - name: get_current_date
    type: function_call
    function: get_current_date_function
    outputs:
      - current_date

  - name: get_timewindow
    type: function_call
    function: get_timewindow_function
    outputs:
      - time_window

  - name: product_hunt_scrape
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    tools:
      - COMPOSIO.FIRECRAWL_SCRAPE
    params:
      prompt: |
        You should visit Product Hunt at https://www.producthunt.com/leaderboard/monthly/yyyy/mm
        Today is {{ get_current_date.current_date }}.
        Substitute yyyy and mm with the year and month you want to search.
        The search time window should be {{ get_timewindow.time_window }}.
        Extract raw content from the HTML pages, which contain information about new product launches, companies, number of upvotes, etc.
        Scroll the page until the end and wait a few milliseconds before scraping.
    outputs:
      - product_hunt_scrape_results

  - name: extract_info_from_search
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    structured_output_schema: StartupInformationList
    params:
      prompt: |
        You are given reports from a search on Product Hunt containing products featured last month:
        {{ product_hunt_scrape.product_hunt_scrape_results }}.
        Extract accurate information about these new product launches.
        Structure the information with the following dimensions: product name, company name, company URL, number of upvotes, business model, and brief description.
    outputs:
      - structured_search_info

  - name: postprocess_search_results
    type: function_call
    function: postprocess_search_results_function
    params:
      info: '{{ extract_info_from_search.structured_search_info }}'
    outputs:
      - postprocessed_search_results

  - name: find_extra_info
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    tools:
      - COMPOSIO.TAVILY_TAVILY_SEARCH
    params:
      prompt: |
        Conduct a comprehensive web search about the following entry from Product Hunt:
        {{ postprocess_search_results.postprocessed_search_results[i] }}.
        Find relevant news about the company, especially related to revenue, valuation, traction, acquisition, number of users, etc.
    outputs:
      - startup_extra_info
To better understand the workflow, use the Visualizer class to visualize the project:
viz = Visualizer(
    yaml_file='product_hunt_analyzer.yaml',
    functions_file='gensphere_functions.py',
    schema_file='structured_output_schema.py',
    address='127.0.0.1',
    port=8050
)
viz.start_visualization()
Note: Running the visualization inside environments like Google Colab might be cumbersome. It’s recommended to run it locally and access it through your browser.
GenSphere uses a YAML-based syntax to define workflows. There are three types of nodes:
- Function call nodes (function_call)
- LLM service nodes (llm_service)
- YML flow nodes (yml_flow)

Function call nodes trigger the execution of Python functions defined in a separate .py file. They have params and outputs fields.
Example:
- name: get_current_date
  type: function_call
  function: get_current_date_function
  outputs:
    - current_date
Function Definition (gensphere_functions.py):
# gensphere_functions.py
from datetime import datetime

def get_current_date_function():
    """
    Returns the current date as a string.

    Returns:
        dict: A dictionary with 'current_date' as key and the current date as value.
    """
    return {'current_date': datetime.today().strftime('%Y-%m-%d')}
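For illustration, a hypothetical node whose params field supplies a text input and whose outputs list contains word_count would map to a function like this (an invented example, not part of the pulled files):

# Hypothetical illustration -- not part of the pulled gensphere_functions.py.
# The parameter name matches the key under the node's `params`, and the
# returned dictionary key matches the node's declared output.
def count_words_function(text):
    return {'word_count': len(text.split())}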
Key Points:
- The function must return a dictionary whose keys match the outputs defined in the YAML file.
- Inputs are passed to the function through the node's params field.

LLM service nodes execute calls to language model APIs. Currently, GenSphere supports OpenAI's API, including structured outputs and function calling.
Example:
- name: product_hunt_scrape
  type: llm_service
  service: openai
  model: "gpt-4o-2024-08-06"
  tools:
    - COMPOSIO.FIRECRAWL_SCRAPE
  params:
    prompt: |
      You should visit Product Hunt at https://www.producthunt.com/leaderboard/monthly/yyyy/mm
      Today is {{ get_current_date.current_date }}.
      Substitute yyyy and mm with the year and month you want to search.
      The search time window should be {{ get_timewindow.time_window }}.
      Extract raw content from the HTML pages, which contain information about new product launches, companies, number of upvotes, etc.
      Scroll the page until the end and wait a few milliseconds before scraping.
  outputs:
    - product_hunt_scrape_results
Key Points:
- Tools can be Python functions from your .py file, Composio tools (COMPOSIO.tool_name), or LangChain tools (LANGCHAIN.tool_name).
- Use the structured_output_schema field to specify a Pydantic schema for the expected output.

Structured Output Example:
- name: extract_info_from_search
  type: llm_service
  service: openai
  model: "gpt-4o-2024-08-06"
  structured_output_schema: StartupInformationList
  params:
    prompt: |
      You are given reports from a search on Product Hunt containing products featured last month:
      {{ product_hunt_scrape.product_hunt_scrape_results }}.
      Extract accurate information about these new product launches.
      Structure the information with the following dimensions: product name, company name, company URL, number of upvotes, business model, and brief description.
  outputs:
    - structured_search_info
Schema Definition (structured_output_schema.py):
# structured_output_schema.py
from pydantic import BaseModel, Field
from typing import List

class StartupInformation(BaseModel):
    product_name: str = Field(..., description="The name of the product")
    company_name: str = Field(..., description="The name of the company offering the product")
    url: str = Field(..., description="URL associated with the product")
    number_upvotes: int = Field(..., description="Number of upvotes the product has received")
    business_model: str = Field(..., description="Brief description of the business model")
    brief_description: str = Field(..., description="Brief description of the product")

class StartupInformationList(BaseModel):
    information_list: List[StartupInformation]
Post-Processing Node:
After obtaining the structured output, we want to post-process it. The output is an instance of the class StartupInformationList, which holds a list of StartupInformation instances, as defined in the Pydantic model. We want to extract this as a plain list, so we apply the postprocess_search_results_function.
- name: postprocess_search_results
  type: function_call
  function: postprocess_search_results_function
  params:
    info: '{{ extract_info_from_search.structured_search_info }}'
  outputs:
    - postprocessed_search_results
Function Definition (gensphere_functions.py):
def postprocess_search_results_function(info):
    """
    Processes the structured search information.

    Args:
        info (StartupInformationList): The structured search info.

    Returns:
        dict: A dictionary with 'postprocessed_search_results' as key.
    """
    result = info.model_dump().get('information_list')
    return {'postprocessed_search_results': result}
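To see what this step does, here is a quick illustration with made-up values (hypothetical data, not real output from the workflow):

# Hypothetical data for illustration only.
from structured_output_schema import StartupInformation, StartupInformationList
from gensphere_functions import postprocess_search_results_function

example = StartupInformationList(information_list=[
    StartupInformation(
        product_name="Acme AI",
        company_name="Acme",
        url="https://acme.example",
        number_upvotes=120,
        business_model="B2C subscription",
        brief_description="An AI-powered example product",
    )
])
print(postprocess_search_results_function(example))
# -> {'postprocessed_search_results': [{'product_name': 'Acme AI', ...}]}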
YML flow nodes represent entire YAML files themselves, allowing you to nest workflows.
Example:
- name: product_hunt_analyzer
  type: yml_flow
  yml_file: product_hunt_analyzer.yaml
  outputs:
    - postprocessed_search_results
    - startup_extra_info
Key Points:
- The nested workflow's outputs listed under outputs become available to the rest of the parent workflow.
- You can pass inputs to the nested workflow via the params field.

When the output of a node is a list, you might want to apply the next node to each element individually. To do so, reference the list output inside the params or prompt field with [i] appended (e.g. {{ node_name.output_name[i] }}); the node then runs once per element.

Example:
- name: find_extra_info
  type: llm_service
  service: openai
  model: "gpt-4o-2024-08-06"
  tools:
    - COMPOSIO.TAVILY_TAVILY_SEARCH
  params:
    prompt: |
      Conduct a comprehensive web search about the following entry from Product Hunt:
      {{ postprocess_search_results.postprocessed_search_results[i] }}.
      Find relevant news about the company, especially related to revenue, valuation, traction, acquisition, number of users, etc.
  outputs:
    - startup_extra_info
Key Points:
- Because of the [i] reference, this node runs once for each element of postprocessed_search_results individually.

Now, we'll embed the product_hunt_analyzer workflow into a larger workflow to analyze a new startup idea.
Create a new YAML file named startup_idea_evaluator.yaml:
# startup_idea_evaluator.yaml
nodes:
  - name: read_idea
    type: function_call
    function: read_file_as_string
    params:
      file_path: "domains_to_search.txt"
    outputs:
      - domains

  - name: product_hunt_analyzer
    type: yml_flow
    yml_file: product_hunt_analyzer.yaml
    outputs:
      - postprocessed_search_results
      - startup_extra_info

  - name: generate_report
    type: llm_service
    service: openai
    model: "gpt-4o-2024-08-06"
    params:
      prompt: |
        You are a world-class VC analyst. You are analyzing the following startup idea:
        {{ read_idea.domains }}.
        Your task is to analyze this idea in the context of recent launches on Product Hunt.
        Recent launches are:
        {{ product_hunt_analyzer.postprocessed_search_results }}.
        Additional information about these companies:
        {{ product_hunt_analyzer.startup_extra_info }}.
        Create a detailed report containing:
        1. An overview of recent launches on Product Hunt. What are the main ideas being explored?
        2. A list of companies from Product Hunt that may become direct competitors to the startup idea. Explain your rationale.
        3. A list of the most promising startups from the Product Hunt launches, based on valuation, revenue, traction, or other relevant metrics.
        4. A table containing all information found from the Product Hunt launches.
        Answer in markdown format.
    outputs:
      - report
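The read_idea node calls a read_file_as_string function, which must exist in gensphere_functions.py. If it isn't already in the pulled file, a minimal sketch could look like this (an assumption about the implementation, following the convention from the earlier examples that the returned key matches the node's declared output):

# Minimal sketch, assuming the returned dictionary key must match the
# node's declared output (`domains`), as in the get_current_date example.
def read_file_as_string(file_path):
    with open(file_path, 'r') as f:
        return {'domains': f.read()}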
Use YamlCompose to create a combined YAML file that resolves all dependencies:
# Assuming 'startup_idea_evaluator.yaml' is in the current directory
composer = YamlCompose(
    yaml_file='startup_idea_evaluator.yaml',
    functions_filepath='gensphere_functions.py',
    structured_output_schema_filepath='structured_output_schema.py'
)
combined_yaml_data = composer.compose(save_combined_yaml=True, output_file='combined.yaml')
Note: Ensure all referenced functions and schemas are available in the specified files.
You can visualize the combined workflow to verify that nesting has been handled correctly:
viz = Visualizer(
    yaml_file='combined.yaml',
    functions_file='gensphere_functions.py',
    schema_file='structured_output_schema.py',
    address='127.0.0.1',
    port=8050
)
viz.start_visualization()
Now that you have the combined YAML file and necessary Python files, you can execute the workflow.
The first node, read_idea, expects a text file named domains_to_search.txt. Create this file with your startup idea:
startup_idea = """
A startup that creates interactive voice agents using generative AI with emphasis on applications like
language tutoring, entertainment, or mental health. The business model would be B2C.
"""
with open("domains_to_search.txt", "w") as text_file:
    text_file.write(startup_idea)
Initialize GenFlow and run the workflow:
flow = GenFlow(
    yaml_file='combined.yaml',
    functions_filepath='gensphere_functions.py',
    structured_output_schema_filepath='structured_output_schema.py'
)
flow.parse_yaml()
flow.run()
After execution, you can access the results:
# Access all outputs
outputs = flow.outputs
# Print the final report
final_report = outputs.get("generate_report").get("report")
# Display the report in Markdown format
from IPython.display import display, Markdown
display(Markdown(final_report))
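If you're running outside a notebook, you can simply write the report to disk instead (plain Python, nothing GenSphere-specific):

# Save the generated report as a Markdown file for sharing
with open("startup_idea_report.md", "w") as f:
    f.write(final_report)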
You can push your project to the GenSphere platform, allowing others to pull and use it.
hub = Hub(
    yaml_file='combined.yaml',
    functions_file='gensphere_functions.py',
    schema_file='structured_output_schema.py'
)
result = hub.push(push_name='Workflow to analyze startup idea based on recent Product Hunt launches.')
Retrieve and print the push_id:
print(f"Push ID: {result.get('push_id')}")
print(f"Uploaded Files: {result.get('uploaded_files')}")
Check how many times your project has been pulled from the platform:
# Use the push_id returned by your push (or replace with another push_id)
push_id = result.get('push_id')
# Get the total number of pulls for the push_id
total_pulls = hub.count_pulls(push_id=push_id)
print(f"Total pulls for push_id {push_id}: {total_pulls}")
Congratulations! You've successfully:
- Pulled a pre-built workflow from the GenSphere platform
- Visualized it and learned GenSphere's YAML-based node syntax
- Composed a nested workflow with YamlCompose
- Executed it with GenFlow and retrieved the final report
- Pushed your project back to the platform

If you encounter any issues, check the app.log file for detailed error messages.

Your feedback is valuable. If you have suggestions or find any issues, please open an issue on the GenSphere GitHub repository.