Skip to main content

PgvectorEmbeddingRetriever

Retrieves documents from the PgvectorDocumentStore, based on their dense embeddings.

Basic Information

  • Type: haystack_integrations.pgvector.src.haystack_integrations.components.retrievers.pgvector.embedding_retriever.PgvectorEmbeddingRetriever

Inputs

ParameterTypeDefaultDescription
query_embeddingList[float]Embedding of the query.
filtersOptional[Dict[str, Any]]NoneFilters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_kOptional[int]NoneMaximum number of Documents to return.
vector_functionOptional[Literal['cosine_similarity', 'inner_product', 'l2_distance']]NoneThe similarity function to use when searching for similar embeddings.

Outputs

ParameterTypeDefaultDescription
documentsList[Document]A dictionary with the following keys: - documents: List of Documents that are similar to query_embedding.

Overview

Work in Progress

Bear with us while we're working on adding pipeline examples and most common components connections.

Retrieves documents from the PgvectorDocumentStore, based on their dense embeddings.

Example usage:

from haystack.document_stores import DuplicatePolicy
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder

from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
from haystack_integrations.components.retrievers.pgvector import PgvectorEmbeddingRetriever

# Set an environment variable `PG_CONN_STR` with the connection string to your PostgreSQL database.
# e.g., "postgresql://USER:PASSWORD@HOST:PORT/DB_NAME"

document_store = PgvectorDocumentStore(
embedding_dimension=768,
vector_function="cosine_similarity",
recreate_table=True,
)

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
Document(content="Elephants have been observed to behave in a way that indicates..."),
Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", PgvectorEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

res = query_pipeline.run({"text_embedder": {"text": query}})

assert res['retriever']['documents'][0].content == "There are over 7,000 languages spoken around the world today."

Usage Example

components:
PgvectorEmbeddingRetriever:
type: pgvector.src.haystack_integrations.components.retrievers.pgvector.embedding_retriever.PgvectorEmbeddingRetriever
init_parameters:

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

ParameterTypeDefaultDescription
document_storePgvectorDocumentStoreAn instance of PgvectorDocumentStore.
filtersOptional[Dict[str, Any]]NoneFilters applied to the retrieved Documents.
top_kint10Maximum number of Documents to return.
vector_functionOptional[Literal['cosine_similarity', 'inner_product', 'l2_distance']]NoneThe similarity function to use when searching for similar embeddings. Defaults to the one set in the document_store instance. "cosine_similarity" and "inner_product" are similarity functions and higher scores indicate greater similarity between the documents. "l2_distance" returns the straight-line distance between vectors, and the most similar documents are the ones with the smallest score. Important: if the document store is using the "hnsw" search strategy, the vector function should match the one utilized during index creation to take advantage of the index.
filter_policyUnion[str, FilterPolicy]FilterPolicy.REPLACEPolicy to determine how filters are applied.

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

ParameterTypeDefaultDescription
query_embeddingList[float]Embedding of the query.
filtersOptional[Dict[str, Any]]NoneFilters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_kOptional[int]NoneMaximum number of Documents to return.
vector_functionOptional[Literal['cosine_similarity', 'inner_product', 'l2_distance']]NoneThe similarity function to use when searching for similar embeddings.