FastembedRanker

Rank documents based on their similarity to the query using Fastembed models.

Basic Information

  • Type: haystack_integrations.components.rankers.fastembed.ranker.FastembedRanker
  • Components it can connect with:
    • Retrievers: Receives documents from a Retriever or DocumentJoiner in a query pipeline.
    • Builders: Sends ranked documents to ChatPromptBuilder or AnswerBuilder.

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The input query to compare the documents to. |
| documents | List[Document] | | A list of documents to be ranked. |
| top_k | Optional[int] | None | The maximum number of documents to return. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of documents sorted from most similar to least similar to the query. |

Overview

FastembedRanker ranks documents by their semantic similarity to a query using Fastembed reranking models, sorting them from most to least relevant.

Use this component after a Retriever to improve the quality of retrieved documents before passing them to a Generator. Reranking helps surface the most relevant documents at the top, improving the quality of generated answers.
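Conceptually, reranking boils down to scoring each (query, document) pair with a cross-encoder and keeping the highest-scoring documents. This minimal, stdlib-only sketch illustrates that sort-and-truncate step; the scores are stand-ins for what a model such as Xenova/ms-marco-MiniLM-L-6-v2 would compute, and the `rerank` helper is hypothetical, not part of the FastembedRanker API.

```python
from typing import List, Tuple

def rerank(scored_docs: List[Tuple[str, float]], top_k: int) -> List[str]:
    """Sort documents by relevance score (descending) and keep the top_k.

    `scored_docs` pairs each document's content with the relevance score a
    cross-encoder would assign it for the query.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [content for content, _score in ranked[:top_k]]

# Stand-in scores; a real reranker computes these from (query, document) pairs.
docs = [("budget report", 0.12), ("fastembed guide", 0.87), ("meeting notes", 0.45)]
print(rerank(docs, top_k=2))  # ['fastembed guide', 'meeting notes']
```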

Usage Example

This query pipeline uses FastembedRanker to rerank documents after retrieval:

```yaml
components:
  bm25_retriever:
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'default'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20
      fuzziness: 0

  FastembedRanker:
    type: haystack_integrations.components.rankers.fastembed.ranker.FastembedRanker
    init_parameters:
      model_name: Xenova/ms-marco-MiniLM-L-6-v2
      top_k: 5
      cache_dir:
      threads:
      batch_size: 64
      parallel:
      local_files_only: false
      meta_fields_to_embed:
      meta_data_separator: "\n"

  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
        - role: system
          content: "You are a helpful assistant answering questions based on the provided documents."
        - role: user
          content: "Documents:\n{% for doc in documents %}\n{{ doc.content }}\n{% endfor %}\n\nQuestion: {{ query }}"

  OpenAIChatGenerator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: gpt-4o-mini

  OutputAdapter:
    type: haystack.components.converters.output_adapter.OutputAdapter
    init_parameters:
      template: '{{ replies[0] }}'
      output_type: List[str]

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

connections:
  - sender: bm25_retriever.documents
    receiver: FastembedRanker.documents
  - sender: FastembedRanker.documents
    receiver: ChatPromptBuilder.documents
  - sender: FastembedRanker.documents
    receiver: answer_builder.documents
  - sender: ChatPromptBuilder.prompt
    receiver: OpenAIChatGenerator.messages
  - sender: OpenAIChatGenerator.replies
    receiver: OutputAdapter.replies
  - sender: OutputAdapter.output
    receiver: answer_builder.replies

inputs:
  query:
    - bm25_retriever.query
    - FastembedRanker.query
    - ChatPromptBuilder.query
    - answer_builder.query
  filters:
    - bm25_retriever.filters

outputs:
  documents: FastembedRanker.documents
  answers: answer_builder.answers

max_runs_per_component: 100

metadata: {}
```

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_name | str | Xenova/ms-marco-MiniLM-L-6-v2 | Fastembed model name. Check the list of supported models in the Fastembed documentation. |
| top_k | int | 10 | The maximum number of documents to return. |
| cache_dir | Optional[str] | None | The path to the cache directory. Can be set using the FASTEMBED_CACHE_PATH env variable. Defaults to fastembed_cache in the system's temp directory. |
| threads | Optional[int] | None | The number of threads a single onnxruntime session can use. |
| batch_size | int | 64 | Number of strings to encode at once. |
| parallel | Optional[int] | None | If > 1, data-parallel encoding is used; recommended for offline encoding of large datasets. If 0, use all available cores. If None, don't use data-parallel processing; use default onnxruntime threading instead. |
| local_files_only | bool | False | If True, only use the model files in the cache_dir. |
| meta_fields_to_embed | Optional[List[str]] | None | List of meta fields that should be concatenated with the document content for reranking. |
| meta_data_separator | str | \n | Separator used to concatenate the meta fields to the Document content. |
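To illustrate how meta_fields_to_embed and meta_data_separator interact, here is a hedged, stdlib-only sketch of the likely behavior: the selected meta values are joined with the separator and prepended to the document content before reranking. The `text_to_rerank` helper is hypothetical, for illustration only.

```python
from typing import Dict, List

def text_to_rerank(content: str, meta: Dict[str, str],
                   meta_fields_to_embed: List[str],
                   meta_data_separator: str = "\n") -> str:
    # Collect the requested meta fields that are present and non-empty,
    # then join them with the content using the separator.
    fields = [str(meta[f]) for f in meta_fields_to_embed if meta.get(f) is not None]
    return meta_data_separator.join(fields + [content])

meta = {"title": "Fastembed guide", "author": "Jane"}
print(text_to_rerank("Rerankers sort documents.", meta, ["title"]))
# Fastembed guide
# Rerankers sort documents.
```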

Run Method Parameters

These are the parameters you can configure for the run() method. You can pass these parameters at query time through the API, in Playground, or when running a job.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | | The input query to compare the documents to. |
| documents | List[Document] | | A list of documents to be ranked. |
| top_k | Optional[int] | None | The maximum number of documents to return. |
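Since top_k appears in both the init and run parameters, a run-time value presumably takes precedence over the value set at initialization, which is the usual pattern for Haystack rankers. A tiny sketch of that assumed precedence rule (the `effective_top_k` helper is hypothetical):

```python
from typing import Optional

def effective_top_k(init_top_k: int, run_top_k: Optional[int] = None) -> int:
    # A top_k passed to run() overrides the one set in init_parameters;
    # otherwise the init value applies.
    return run_top_k if run_top_k is not None else init_top_k

print(effective_top_k(10))             # 10 (init value applies)
print(effective_top_k(10, run_top_k=3))  # 3 (run-time override)
```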