PgvectorKeywordRetriever
Retrieve documents from the PgvectorDocumentStore, based on keywords.
Basic Information
- Type: haystack_integrations.pgvector.src.haystack_integrations.components.retrievers.pgvector.keyword_retriever.PgvectorKeywordRetriever
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | String to search in Documents' content. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | Maximum number of Documents to return. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | List of Documents that match the query. |
Overview
PgvectorKeywordRetriever retrieves documents from the PgvectorDocumentStore based on keywords.
To rank the documents, it uses PostgreSQL's ts_rank_cd function, which considers how often the query terms appear in the document, how close together the terms are, and how important the part of the document where they occur is.
For more details, see the Postgres documentation.
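To make the ranking step concrete, here is a minimal sketch of the kind of SQL a keyword search built on ts_rank_cd could issue. It illustrates the PostgreSQL full-text search pattern and is not necessarily the exact query the integration runs. The helper name `build_keyword_query`, the selected columns, and the default table name are assumptions made for this example.

```python
def build_keyword_query(table: str = "haystack_documents", language: str = "english") -> str:
    """Return an illustrative full-text search statement ranked with ts_rank_cd.

    `table` and `language` are assumed to be trusted configuration values.
    The user's search string and the result limit are passed separately as
    query parameters through the two %s placeholders.
    """
    return (
        f"SELECT id, content, ts_rank_cd(to_tsvector('{language}', content), query) AS score "
        f"FROM {table}, plainto_tsquery('{language}', %s) AS query "
        f"WHERE to_tsvector('{language}', content) @@ query "
        "ORDER BY score DESC LIMIT %s"
    )


sql = build_keyword_query()
print(sql)
```

With a PostgreSQL driver, the statement would be executed with two parameters, for example `("languages", 10)`: the search string and the maximum number of rows.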
Usage example:
```python
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
from haystack_integrations.components.retrievers.pgvector import PgvectorKeywordRetriever

# Set an environment variable `PG_CONN_STR` with the connection string to your PostgreSQL database,
# e.g., "postgresql://USER:PASSWORD@HOST:PORT/DB_NAME"
document_store = PgvectorDocumentStore(language="english", recreate_table=True)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates..."),
    Document(content="In certain places, you can witness the phenomenon of bioluminescent waves."),
]

# Keyword retrieval does not use embeddings, so the documents are written as they are.
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)

retriever = PgvectorKeywordRetriever(document_store=document_store)

# run() returns a dictionary with a single "documents" key.
result = retriever.run(query="languages")

assert result["documents"][0].content == "There are over 7,000 languages spoken around the world today."
```
## Usage Example
```yaml
components:
  PgvectorKeywordRetriever:
    type: pgvector.src.haystack_integrations.components.retrievers.pgvector.keyword_retriever.PgvectorKeywordRetriever
    init_parameters:
```
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_store | PgvectorDocumentStore | | An instance of PgvectorDocumentStore. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. |
| top_k | int | 10 | Maximum number of Documents to return. |
| filter_policy | Union[str, FilterPolicy] | FilterPolicy.REPLACE | Policy to determine how filters are applied. |
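The interaction between the init-time `filters` and the `filter_policy` can be sketched as follows. This is a simplified model, assuming that REPLACE lets runtime filters override the init-time filters and that MERGE combines both with a logical AND. The `FilterPolicy` enum below is a stand-in defined so that the sketch runs without Haystack installed, and `resolve_filters` is a hypothetical helper, not part of the library.

```python
from enum import Enum
from typing import Any, Dict, Optional


class FilterPolicy(Enum):
    # Stand-in for Haystack's FilterPolicy, redefined here for a self-contained sketch.
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Simplified model of how init-time and runtime filters are combined."""
    if policy is FilterPolicy.MERGE and init_filters and runtime_filters:
        # Assumption: both filter sets must match, so they are joined with AND.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE, or only one side is set: runtime filters win when present.
    return runtime_filters or init_filters


# Hypothetical metadata fields, used only for illustration.
init_filters = {"field": "meta.language", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2020}

print(resolve_filters(FilterPolicy.REPLACE, init_filters, runtime_filters))
print(resolve_filters(FilterPolicy.MERGE, init_filters, runtime_filters))
```

In this model, REPLACE is the safer default when runtime filters are meant to be complete on their own, while MERGE suits a retriever that should always enforce a base restriction.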
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | String to search in Documents' content. |
| filters | Optional[Dict[str, Any]] | None | Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details. |
| top_k | Optional[int] | None | Maximum number of Documents to return. |
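As a rough illustration of passing these parameters at query time, the sketch below assembles the keyword arguments for a `run()` call. The helper `build_run_kwargs` and the metadata field `meta.topic` are hypothetical. The filter dictionary follows the comparison-filter shape used by Haystack 2.x (`field`, `operator`, `value`).

```python
from typing import Any, Dict, Optional


def build_run_kwargs(
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    top_k: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect run() arguments, omitting the ones left at their None default."""
    kwargs: Dict[str, Any] = {"query": query}
    if filters is not None:
        kwargs["filters"] = filters
    if top_k is not None:
        kwargs["top_k"] = top_k
    return kwargs


run_kwargs = build_run_kwargs(
    "languages",
    filters={"field": "meta.topic", "operator": "==", "value": "linguistics"},
    top_k=3,
)
print(run_kwargs)

# With an initialized retriever (not created in this sketch):
# result = retriever.run(**run_kwargs)
```

Arguments that are omitted keep their `None` default in `run()`, so only the values you want to set at query time need to be supplied.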