SnowflakeExecutor

Use this component to establish a connection to a Snowflake database. You can then query your data in a Snowflake database with a deepset Cloud pipeline.

Basic Information

  • Pipeline type: Used in query pipelines.
  • Type: deepset_cloud_custom_nodes.augmenters.snowflake_executor.DeepsetSnowflakeExecutor
  • Components it can connect with:
    • Generators: You can instruct a Generator to generate SQL from the user query and send it to SnowflakeExecutor. This way, you can ask queries in natural language.
      You can use another Generator to construct a generated answer, including the database table formatted in Markdown and the SQL query. Such a Generator would follow SnowflakeExecutor and construct the answer based on the table SnowflakeExecutor fetched.

Inputs

NameTypeDescription
queryStringA SQL query to execute.

Outputs

NameTypeDescription
tableDictionaryThe database table matching the query.
answersGeneratedAnswerThe answer containing the table in Markdown, the query, and matching documents.

Overview

Snowflake is a cloud-based data warehousing platform that provides an SQL database engine. With SnowflakeExecutor, you can pass all the details needed to query your data in Snowflake with your deepset Cloud pipeline.

You can use SnowflakeExecutor on its own. Then, you need to provide an SQL query to it, and you'll get a database table formatted as Markdown and the query as an output. However, a more common way is to use SnowflakeExecutor preceded by a Generator instructed to turn the user query into SQL followed by another Generator that answers the query based on the database tables SnowflakeExecutor retrieved.

Usage Example

This is an example of a query pipeline with SnowflakeExecutor and two Generators: one to translate a natural language query into SQL and send it to SnowflakeExecutor and another to construct an answer.

components:
  bm25_retriever:
    type: >-
      haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: >-
          haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
      top_k: 20
  query_embedder:
    type: >-
      haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
    init_parameters:
      model: intfloat/e5-base-v2
  embedding_retriever:
    type: >-
      haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: >-
          haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          use_ssl: true
          verify_certs: false
          hosts:
            - ${OPENSEARCH_HOST}
          http_auth:
            - ${OPENSEARCH_USER}
            - ${OPENSEARCH_PASSWORD}
          embedding_dim: 768
          similarity: cosine
      top_k: 20
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: >-
      haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8
      model_kwargs:
        torch_dtype: torch.float16
  sql_prompt_builder: 
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: >-
        You are a SQL expert working with Snowflake.

        Your task is to create a Snowflake SQL query for the given question.

        Refrain from explaining your answer. Your answer must be the SQL query
        in plain text format without using Markdown.

        Here are some relevant tables, a description about it, and their
        columns:

        {% for document in documents %}

        Document[{{ loop.index }}]:

        {{ document.content }}

        {% endfor %}

        User's question: {{ question }}

        Generated SQL query:
  sql_llm: #This Generator converts a query into SQL
    type: >-
      deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-east-1
      max_length: 1000
      model_max_length: 200000
      temperature: 0
  snowflake_executor: #Receives the SQL query and returns a relevant table
    type: >-
      deepset_cloud_custom_nodes.augmenters.snowflake_executor.DeepsetSnowflakeExecutor
    init_parameters:
      user: <SF-USER>
      account: <SF-ACCOUNT>
  replies_to_sql: #Converts the LLM output into a format SnowflakeExecutor can take in
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: '{{ replies[0] }}'
      output_type: str
  display_prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: >-
        You are an expert data analyst.

        Your role is to answer the user's question {query} using the information
        in the table.

        You will base your response solely on the information provided in the
        table.

        Do not rely on your knowledge base; only the data that is in the table.

        Refrain from using the term "table" in your response, but instead, use
        the word "data"

        If the table is blank say:

        "The specific answer can't be found in the database. Try rephrasing your
        question."

        Additionally, you will present the table in a tabular format and provide
        the SQL query used to extract the relevant rows from the database in
        Markdown.

        If the table is larger than 10 rows, display the most important rows up
        to 10 rows. Your answer must be detailed and provide insights based on
        the question and the available data.

        SQL query:

        {{ sql_query }}

        Table:

        {{ table }}

        Answer:
  display_llm: #Generates an answer that includes the SQL query, the table, and a generated answer. 
    type: >-
      deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-east-1
      max_length: 650
      model_max_length: 200000
      temperature: 0
  answer_builder:
    type: >-
      deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
connections:
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: sql_prompt_builder.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: sql_prompt_builder.prompt
    receiver: sql_llm.prompt
  - sender: sql_llm.replies
    receiver: replies_to_sql.replies
  - sender: replies_to_sql.output
    receiver: snowflake_executor.query
  - sender: snowflake_executor.answers
    receiver: display_prompt_builder.table
  - sender: replies_to_sql.output
    receiver: display_prompt_builder.sql_query
  - sender: display_prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: display_prompt_builder.prompt
    receiver: display_llm.prompt
  - sender: display_llm.replies
    receiver: answer_builder.replies
max_loops_allowed: 100
metadata: {}
inputs:
  query:
    - bm25_retriever.query
    - query_embedder.text
    - ranker.query
    - sql_prompt_builder.question
    - answer_builder.query
  filters:
    - bm25_retriever.filters
    - embedding_retriever.filters
outputs:
  documents: ranker.documents
  answers: answer_builder.answers

Init Parameters

ParameterTypePossible valuesDescription
userStringUser's Snowflake login.
Required.
accountStringSnowflake account identifier.
Required.
api_keySecretDefault: Secret.from_env_var("SNOWFLAKE_API_KEY")Snowflake account password.
Required.
databaseString Default: NoneName of the database to use.
Optional.
db_schemaStringDefault: NoneName of the schema to use.
Optional.
warehouseString Default: NoneName of the warehouse to use.
Optional.
login_timeoutInteger Default: 60Timeout in seconds for login. The default is 60 seconds.
Optional.