SentenceTransformersDiversityRanker
A Diversity Ranker based on Sentence Transformers.
Basic Information
- Type:
haystack_integrations.rankers.sentence_transformers_diversity.SentenceTransformersDiversityRanker
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The search query. |
| documents | List[Document] | | List of Document objects to be ranked. |
| top_k | Optional[int] | None | Optional. An integer to override the top_k set during initialization. |
| lambda_threshold | Optional[float] | None | Override the trade-off parameter between relevance and diversity. Only used when strategy is "maximum_margin_relevance". |
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | List of Document objects selected based on the diversity ranking. |
Overview
Bear with us while we work on adding pipeline examples and the most common component connections.
A Diversity Ranker based on Sentence Transformers.
Applies a document ranking algorithm based on one of two strategies:
- Greedy Diversity Order: Orders documents so as to maximize their overall diversity, based on their similarity to the query. It uses a pre-trained Sentence Transformers model to embed the query and the documents.
- Maximum Margin Relevance: Orders documents by their Maximum Margin Relevance (MMR) scores. An MMR score is calculated for each document from its relevance to the query and its diversity from the documents already selected. The algorithm iteratively selects the document with the highest MMR score, balancing relevance to the query against diversity from the already selected documents. The lambda_threshold parameter controls the trade-off between relevance and diversity.
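The MMR selection loop described above can be sketched in plain Python. This is an illustration, not the component's actual implementation: it assumes the query and document embeddings have already been computed (the real component embeds them with the configured Sentence Transformers model), and `mmr_rank` and the toy vectors are hypothetical names.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def mmr_rank(query_emb, doc_embs, top_k, lambda_threshold=0.5):
    # Iteratively pick the candidate maximizing
    #   lambda * sim(doc, query) - (1 - lambda) * max sim(doc, already selected).
    selected = []
    candidates = list(range(len(doc_embs)))
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cosine(doc_embs[i], query_emb)
            redundancy = max(
                (cosine(doc_embs[i], doc_embs[j]) for j in selected),
                default=0.0,  # no penalty before anything is selected
            )
            return lambda_threshold * relevance - (1 - lambda_threshold) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected  # document indices in selection order
```

At `lambda_threshold=1.0` the loop reduces to pure relevance ranking; lower values increasingly push near-duplicates of already selected documents down the order.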
Usage Example
```yaml
components:
  SentenceTransformersDiversityRanker:
    type: haystack.components.rankers.sentence_transformers_diversity.SentenceTransformersDiversityRanker
    init_parameters:
```
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | sentence-transformers/all-MiniLM-L6-v2 | Local path or name of the model in Hugging Face's model hub, such as 'sentence-transformers/all-MiniLM-L6-v2'. |
| top_k | int | 10 | The maximum number of Documents to return per query. |
| device | Optional[ComponentDevice] | None | The device on which the model is loaded. If None, the default device is automatically selected. |
| token | Optional[Secret] | Secret.from_env_var(['HF_API_TOKEN', 'HF_TOKEN'], strict=False) | The API token used to download private models from Hugging Face. |
| similarity | Union[str, DiversityRankingSimilarity] | cosine | Similarity metric for comparing embeddings. Can be set to "cosine" (default) or "dot_product". |
| query_prefix | str | | A string to add to the beginning of the query text before ranking. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and BGE. |
| query_suffix | str | | A string to add to the end of the query text before ranking. |
| document_prefix | str | | A string to add to the beginning of each Document text before ranking. Can be used to prepend the text with an instruction, as required by some embedding models, such as E5 and BGE. |
| document_suffix | str | | A string to add to the end of each Document text before ranking. |
| meta_fields_to_embed | Optional[List[str]] | None | List of meta fields that should be embedded along with the Document content. |
| embedding_separator | str | \n | Separator used to concatenate the meta fields to the Document content. |
| strategy | Union[str, DiversityRankingStrategy] | greedy_diversity_order | The strategy to use for diversity ranking. Can be either "greedy_diversity_order" or "maximum_margin_relevance". |
| lambda_threshold | float | 0.5 | The trade-off parameter between relevance and diversity. Only used when strategy is "maximum_margin_relevance". |
| model_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoModelForSequenceClassification.from_pretrained when loading the model. Refer to specific model documentation for available kwargs. |
| tokenizer_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoTokenizer.from_pretrained when loading the tokenizer. Refer to specific model documentation for available kwargs. |
| config_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for AutoConfig.from_pretrained when loading the model configuration. |
| backend | Literal['torch', 'onnx', 'openvino'] | torch | The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino". Refer to the Sentence Transformers documentation for more information on acceleration and quantization options. |
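The default strategy, greedy_diversity_order, can also be sketched in plain Python on precomputed embeddings. The exact selection rule used here (start from the document most similar to the query, then repeatedly append the candidate with the lowest mean similarity to the documents already ordered) is an assumption for illustration, and `greedy_diversity_order` below is a hypothetical helper, not the component's API.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def greedy_diversity_order(query_emb, doc_embs):
    # Seed the order with the document most similar to the query.
    order = [max(range(len(doc_embs)), key=lambda i: cosine(doc_embs[i], query_emb))]
    candidates = [i for i in range(len(doc_embs)) if i != order[0]]
    while candidates:
        # Greedily append the candidate least similar (on average)
        # to everything already ordered, maximizing overall diversity.
        nxt = min(
            candidates,
            key=lambda i: sum(cosine(doc_embs[i], doc_embs[j]) for j in order) / len(order),
        )
        order.append(nxt)
        candidates.remove(nxt)
    return order  # document indices in diversity order
```

Unlike MMR, this strategy has no tunable trade-off parameter: relevance only determines the first pick, and every subsequent pick is driven purely by diversity.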
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | | The search query. |
| documents | List[Document] | | List of Document objects to be ranked. |
| top_k | Optional[int] | None | Optional. An integer to override the top_k set during initialization. |
| lambda_threshold | Optional[float] | None | Override the trade-off parameter between relevance and diversity. Only used when strategy is "maximum_margin_relevance". |