FastembedSparseTextEmbedder

FastembedSparseTextEmbedder computes a sparse embedding for a string using Fastembed sparse models.

Basic Information

  • Type: haystack_integrations.fastembed.src.haystack_integrations.components.embedders.fastembed.fastembed_sparse_text_embedder.FastembedSparseTextEmbedder

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | | A string to embed. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sparse_embedding | SparseEmbedding | | The sparse embedding of the input text, a SparseEmbedding object storing the indices and values of its non-zero dimensions. |

Overview

Work in Progress

Bear with us while we work on adding pipeline examples and the most common component connections.

FastembedSparseTextEmbedder computes a sparse embedding for a string using Fastembed sparse models.

Usage example:

from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder

text = ("It clearly says online this will work on a Mac OS system. "
        "The disk comes and it does not, only Windows. Do Not order this if you have a Mac!!")

sparse_text_embedder = FastembedSparseTextEmbedder(
    model="prithivida/Splade_PP_en_v1"
)
# Download and load the model before the first run() call.
sparse_text_embedder.warm_up()

sparse_embedding = sparse_text_embedder.run(text)["sparse_embedding"]
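
Continuing the example above, the returned value is a Haystack SparseEmbedding object. A minimal sketch of inspecting it, assuming the indices, values, and to_dict attributes exposed by haystack.dataclasses.SparseEmbedding:

# `indices` lists the non-zero dimensions and `values` their weights
# (assumed attribute names from haystack.dataclasses.SparseEmbedding).
print(sparse_embedding.indices[:10])
print(sparse_embedding.values[:10])
print(sparse_embedding.to_dict())  # {"indices": [...], "values": [...]}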

Usage Example

components:
  FastembedSparseTextEmbedder:
    type: fastembed.src.haystack_integrations.components.embedders.fastembed.fastembed_sparse_text_embedder.FastembedSparseTextEmbedder
    init_parameters:
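
In a query pipeline, the embedder's sparse_embedding output typically feeds a retriever that accepts sparse queries. The sketch below is a hedged Python equivalent that assumes the Qdrant integration (QdrantDocumentStore with use_sparse_embeddings, QdrantSparseEmbeddingRetriever, and its query_sparse_embedding input); those names belong to a separate integration and are not part of this component's documentation.

from haystack import Pipeline
from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder
from haystack_integrations.components.retrievers.qdrant import QdrantSparseEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

# In-memory store with sparse vectors enabled (assumed parameter name).
document_store = QdrantDocumentStore(":memory:", use_sparse_embeddings=True)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "sparse_text_embedder",
    FastembedSparseTextEmbedder(model="prithivida/Splade_PP_en_v1"),
)
query_pipeline.add_component(
    "sparse_retriever",
    QdrantSparseEmbeddingRetriever(document_store=document_store),
)

# The embedder's sparse_embedding output becomes the retriever's query.
query_pipeline.connect("sparse_text_embedder.sparse_embedding", "sparse_retriever.query_sparse_embedding")

results = query_pipeline.run({"sparse_text_embedder": {"text": "Does this work on macOS?"}})
print(results["sparse_retriever"]["documents"])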

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | prithivida/Splade_PP_en_v1 | Local path or name of the model in Fastembed's model hub, such as prithivida/Splade_PP_en_v1. |
| cache_dir | Optional[str] | None | The path to the cache directory. Can be set using the FASTEMBED_CACHE_PATH environment variable. Defaults to fastembed_cache in the system's temp directory. |
| threads | Optional[int] | None | The number of threads a single onnxruntime session can use. |
| progress_bar | bool | True | If True, displays a progress bar during embedding. |
| parallel | Optional[int] | None | If > 1, uses data-parallel encoding, recommended for offline encoding of large datasets. If 0, uses all available cores. If None, doesn't use data-parallel processing and falls back to default onnxruntime threading. |
| local_files_only | bool | False | If True, uses only the model files in cache_dir. |
| model_kwargs | Optional[Dict[str, Any]] | None | Dictionary containing model parameters such as k, b, avg_len, language. |
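
As an illustration of the parameters above, the sketch below constructs the embedder with non-default values; the path and thread counts are placeholders, not recommendations.

from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder

# Illustrative values only; adjust for your environment.
sparse_text_embedder = FastembedSparseTextEmbedder(
    model="prithivida/Splade_PP_en_v1",
    cache_dir="/tmp/fastembed_cache",  # overrides the FASTEMBED_CACHE_PATH default
    local_files_only=False,            # set True to skip downloads and use the cache only
    parallel=0,                        # 0 = use all available cores for data-parallel encoding
    progress_bar=True,
)
sparse_text_embedder.warm_up()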

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | | A string to embed. |
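
For instance, when the embedder runs inside a pipeline, text can be supplied per request through the pipeline's run data instead of being fixed at build time. A minimal sketch, assuming a Haystack Pipeline that wraps just this component:

from haystack import Pipeline
from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder

pipeline = Pipeline()
pipeline.add_component("sparse_text_embedder", FastembedSparseTextEmbedder())

# The query-time value of `text` is routed to the component by name.
result = pipeline.run({"sparse_text_embedder": {"text": "Does this disk work on macOS?"}})
print(result["sparse_text_embedder"]["sparse_embedding"])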