
Connect to Your Snowflake Database

Query data in your Snowflake database using deepset pipelines.


About this Task

deepset AI Platform provides the SnowflakeTableRetriever component that connects to your Snowflake database. To query your Snowflake data, add this component to your query pipeline. It takes a SQL query as input and returns the matching database table, which the rest of the pipeline then uses to generate an answer.
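
To get a feel for the component's inputs and outputs, here is a minimal Python sketch that runs SnowflakeTableRetriever on its own. It assumes the open-source snowflake-haystack integration; the connection values are placeholders, the ORDERS table is hypothetical, and authentication credentials (handled for you by the Snowflake integration in deepset AI Platform) are omitted.

# Minimal sketch: SnowflakeTableRetriever takes a SQL query and returns the matching table.
# Assumes the open-source snowflake-haystack integration; connection values are placeholders
# and ORDERS is a hypothetical table. Credentials are omitted and depend on your setup.
from haystack_integrations.components.retrievers.snowflake import SnowflakeTableRetriever

retriever = SnowflakeTableRetriever(
    user="<snowflake-user-identifier>",
    account="<snowflake-account-identifier>",
    warehouse="<snowflake-warehouse-identifier>",
    database="<snowflake-database-identifier>",
    db_schema="<snowflake-schema-identifier>",
)

result = retriever.run(query="SELECT * FROM ORDERS LIMIT 10")
print(result["table"])  # the query result rendered as a Markdown table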

Prerequisites

You need the following information about your Snowflake database:

  • Snowflake account identifier
  • Snowflake user login
  • Warehouse name
  • Schema name
  • Database name

Query Your Snowflake Database

First, connect deepset AI Platform to Snowflake through the Integrations page. You can add the integration only for a particular workspace or for the whole organization:

Add Workspace-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Workspace > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in the current workspace.

Add Organization-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Organization > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.

Then, add SnowflakeTableRetriever to your query pipeline or use one of our ready-made Text-to-SQL pipeline templates.

Usage Example

This is an example of a query pipeline with SnowflakeTableRetriever and two Generators: one translates the natural language query into SQL and sends it to SnowflakeTableRetriever, and the other constructs the final answer from the returned table.


components:
  retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever.OpenSearchHybridRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 1024
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0
      embedder:
        type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
        init_parameters:
          normalize_embeddings: true
          model: "BAAI/bge-m3"

  ranker: # Reorders the retrieved documents by relevance to the query
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: "BAAI/bge-reranker-v2-m3"
      top_k: 5

  sql_prompt_builder: # Builds the prompt that asks the LLM to generate a SQL query
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a SQL expert working with Snowflake.

        Your task is to create a Snowflake SQL query for the given question.

        Refrain from explaining your answer. Your answer must be the SQL query
        in plain text format without using Markdown.

        Here are some relevant tables, a description about them, and their
        columns:

        {% for document in documents %}

        Document [{{ loop.index }}]:

        {{ document.content }}

        {% endfor %}

        User's question: {{ question }}

        Generated SQL query:

  sql_llm: # Translates the user's question into a Snowflake SQL query
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["OPENAI_API_KEY"], "strict": false}
      model: "gpt-4o"
      generation_kwargs:
        max_tokens: 650
        temperature: 0
        seed: 0

  snowflake_retriever: # Runs the generated SQL query against your Snowflake database
    type: haystack_integrations.components.retrievers.snowflake.snowflake_table_retriever.SnowflakeTableRetriever
    init_parameters:
      user: "<snowflake-user-identifier>"
      account: "<snowflake-account-identifier>"
      authenticator: "SNOWFLAKE" # Choose from: SNOWFLAKE, SNOWFLAKE_JWT, OAUTH
      warehouse: "<snowflake-warehouse-identifier>"
      database: "<snowflake-database-identifier>"
      db_schema: "<snowflake-schema-identifier>"

  replies_to_sql: # Extracts the SQL string from the LLM's reply
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: '{{ replies[0] }}'
      output_type: str

  display_prompt_builder: # Builds the prompt that asks the LLM to answer the question from the returned table
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are an expert data analyst.

        Your role is to answer the user's question {{ question }} using the information
        in the table.

        You will base your response solely on the information provided in the
        table(s).

        Do not rely on your knowledge base; only the data that is in the table.

        Refrain from using the term "table" in your response, but instead, use
        the word "data".

        If the table is blank say:

        "The specific answer can't be found in the database. Try rephrasing your
        question."

        Additionally, you will present the table in a tabular format and provide
        the SQL query used to extract the relevant rows from the database in
        Markdown.

        If the table is larger than 10 rows, display the most important rows up
        to 10 rows. Your answer must be detailed and provide insights based on
        the question and the available data.

        SQL query:
        {{ sql_query }}

        Table:
        {{ table }}

        Answer:

  display_llm: # Generates the final answer from the table
    type: haystack.components.generators.openai.OpenAIGenerator
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["OPENAI_API_KEY"], "strict": false}
      model: "gpt-4o"
      generation_kwargs:
        max_tokens: 2000
        temperature: 0
        seed: 0

  answer_builder: # Packages the generated reply into the pipeline's answer format
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

connections:
  - sender: retriever.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: sql_prompt_builder.documents
  - sender: ranker.documents
    receiver: answer_builder.documents
  - sender: sql_prompt_builder.prompt
    receiver: sql_llm.prompt
  - sender: sql_llm.replies
    receiver: replies_to_sql.replies
  - sender: replies_to_sql.output
    receiver: snowflake_retriever.query
  - sender: snowflake_retriever.table
    receiver: display_prompt_builder.table
  - sender: replies_to_sql.output
    receiver: display_prompt_builder.sql_query
  - sender: display_prompt_builder.prompt
    receiver: display_llm.prompt
  - sender: display_prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: display_llm.replies
    receiver: answer_builder.replies

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "retriever.query"
    - "ranker.query"
    - "sql_prompt_builder.question"
    - "display_prompt_builder.question"
    - "answer_builder.query"
  filters: # These components will receive a potential query filter as input
    - "retriever.filters_bm25"
    - "retriever.filters_embedding"

outputs: # Defines the output of your pipeline
  documents: "ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers" # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}
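
If you want to experiment with the same Text-to-SQL flow outside deepset AI Platform, the core of the pipeline above can be reproduced with open-source Haystack components. The following is a simplified local sketch under those assumptions: it omits the deepset-specific retriever, ranker, and answer builder, uses placeholder Snowflake connection details, expects an OPENAI_API_KEY environment variable, and the sample question is purely illustrative.

# Simplified local sketch of the Text-to-SQL core shown above:
# question -> SQL generation -> Snowflake query -> printed table.
# Requires the haystack-ai and snowflake-haystack packages, valid Snowflake
# credentials, and an OPENAI_API_KEY environment variable.
from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.converters.output_adapter import OutputAdapter
from haystack.components.generators.openai import OpenAIGenerator
from haystack_integrations.components.retrievers.snowflake import SnowflakeTableRetriever

sql_template = """You are a SQL expert working with Snowflake.
Create a Snowflake SQL query for the question below and return only the query in plain text.
User's question: {{ question }}
Generated SQL query:"""

pipe = Pipeline()
pipe.add_component("sql_prompt_builder", PromptBuilder(template=sql_template))
pipe.add_component("sql_llm", OpenAIGenerator(model="gpt-4o", generation_kwargs={"temperature": 0}))
pipe.add_component("replies_to_sql", OutputAdapter(template="{{ replies[0] }}", output_type=str))
pipe.add_component(
    "snowflake_retriever",
    SnowflakeTableRetriever(
        user="<snowflake-user-identifier>",
        account="<snowflake-account-identifier>",
        warehouse="<snowflake-warehouse-identifier>",
        database="<snowflake-database-identifier>",
        db_schema="<snowflake-schema-identifier>",
    ),
)

# Same wiring as the connections section of the YAML above.
pipe.connect("sql_prompt_builder.prompt", "sql_llm.prompt")
pipe.connect("sql_llm.replies", "replies_to_sql.replies")
pipe.connect("replies_to_sql.output", "snowflake_retriever.query")

result = pipe.run({"sql_prompt_builder": {"question": "How many orders were placed last month?"}})
print(result["snowflake_retriever"]["table"])  # the query result as a Markdown table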