CohereTextEmbedder
Embed strings using Cohere models. Use this component in query pipelines to transform user queries into vectors for embedding-based retrieval.
Key Features
- Uses Cohere models to embed text strings such as user queries.
- Outputs a float vector embedding suitable for use with embedding retrievers.
- Supports multiple Cohere embedding models. For a full list, see the Cohere documentation.
- The embedding model must match the one used by CohereDocumentEmbedder in the indexing pipeline.
Embedding Models in Query Pipelines and Indexes
The embedding model you use to embed documents in your indexing pipeline must be the same as the embedding model you use to embed the query in your query pipeline.
This means the embedders for your indexing and query pipelines must match. For example, if you use CohereDocumentEmbedder to embed your documents, you should use CohereTextEmbedder with the same model to embed your queries.
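One way to catch a mismatch early is to compare the model names in the two pipeline definitions before deploying. The following is a minimal sketch, assuming both pipelines are already parsed into Python dictionaries shaped like the YAML on this page. The helpers `get_model` and `embedding_models_match` are hypothetical names used for illustration and are not part of Haystack.

```python
def get_model(pipeline: dict, component_name: str) -> str:
    """Return the `model` init parameter of a component in a parsed pipeline."""
    component = pipeline["components"][component_name]
    return component["init_parameters"]["model"]


def embedding_models_match(indexing: dict, doc_embedder: str,
                           query: dict, text_embedder: str) -> bool:
    """Return True if the document embedder and the text embedder use the same model."""
    return get_model(indexing, doc_embedder) == get_model(query, text_embedder)


# Illustrative pipeline fragments containing only the fields the check needs.
indexing_pipeline = {
    "components": {
        "CohereDocumentEmbedder": {"init_parameters": {"model": "embed-english-v2.0"}}
    }
}
query_pipeline = {
    "components": {
        "CohereTextEmbedder": {"init_parameters": {"model": "embed-english-v2.0"}}
    }
}

models_match = embedding_models_match(
    indexing_pipeline, "CohereDocumentEmbedder",
    query_pipeline, "CohereTextEmbedder",
)
print(models_match)
```

A check like this only compares model names. It does not verify that the index was actually built with that model.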
Configuration
- Drag the CohereTextEmbedder component onto the canvas from the Component Library.
- Click on the component to open the configuration panel.
- On the General tab:
  - Select the embedding model to use. Make sure Haystack Platform is connected to your Cohere account. For details, see Use Cohere Models.
  - Set the input_type to search_query for query pipelines.
- Go to the Advanced tab to configure additional settings such as truncate, timeout, embedding_type, and api_base_url.
Connections
CohereTextEmbedder receives the user query as a text string, typically from the Input component. It outputs a float vector through its embedding output, which you connect to an embedding retriever such as OpenSearchEmbeddingRetriever or ElasticsearchEmbeddingRetriever.
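To illustrate what an embedding retriever does with the query vector, here is a brute-force sketch that ranks documents by cosine similarity to the query embedding. It is an assumption-laden toy: the three-dimensional vectors are made up, `retrieve` is a hypothetical helper, and production retrievers such as OpenSearchEmbeddingRetriever delegate the search to the document store instead of scanning documents in Python.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: List[float], documents: List[Dict], top_k: int = 10) -> List[Dict]:
    """Return the top_k documents most similar to the query embedding."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up document embeddings; real ones come from the indexing pipeline.
documents = [
    {"content": "doc about cats", "embedding": [0.9, 0.1, 0.0]},
    {"content": "doc about dogs", "embedding": [0.1, 0.9, 0.0]},
    {"content": "doc about cars", "embedding": [0.0, 0.1, 0.9]},
]

# Made-up query embedding standing in for the embedder's `embedding` output.
query_embedding = [0.8, 0.2, 0.0]

top_documents = retrieve(query_embedding, documents, top_k=2)
print([doc["content"] for doc in top_documents])
```

The comparison is only meaningful when the query and document vectors come from the same model, which is why the embedders in both pipelines must match.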
Source Code
To check this component's source code, open text_embedder.py in the Haystack Core Integrations repository.
Usage Examples
Basic Configuration
CohereTextEmbedder:
  type: haystack_integrations.components.embedders.cohere.text_embedder.CohereTextEmbedder
  init_parameters:
    api_key:
      type: env_var
      env_vars:
        - COHERE_API_KEY
        - CO_API_KEY
      strict: false
    model: embed-english-v2.0
    input_type: search_query
    api_base_url: https://api.cohere.com
    truncate: END
    use_async_client: false
    timeout: 120
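For readers who want to try the same settings outside Pipeline Builder, the following is a hedged Python sketch. It assumes the `cohere-haystack` package is installed and that `COHERE_API_KEY` or `CO_API_KEY` is set in the environment. `embed_query` and `EMBEDDER_KWARGS` are hypothetical names used for illustration. The `use_async_client` setting is left out because its availability may depend on the integration version.

```python
# Keyword arguments mirroring the YAML init_parameters on this page.
EMBEDDER_KWARGS = {
    "model": "embed-english-v2.0",
    "input_type": "search_query",
    "api_base_url": "https://api.cohere.com",
    "truncate": "END",
    "timeout": 120,
}


def embed_query(text: str) -> list:
    """Embed one query string and return its vector.

    The import is deferred so this module loads even when the
    integration package is not installed.
    """
    from haystack_integrations.components.embedders.cohere import CohereTextEmbedder

    # api_key is omitted, so the component falls back to the environment variables.
    embedder = CohereTextEmbedder(**EMBEDDER_KWARGS)
    result = embedder.run(text=text)
    return result["embedding"]
```

Calling `embed_query("What is embedding-based retrieval?")` sends a request to Cohere, so it needs network access and a valid API key.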
Using the Component in a Pipeline
This is an example of a query pipeline with CohereTextEmbedder that receives a query to embed and then sends the embedded query to OpenSearchEmbeddingRetriever to find matching documents.
components:
  CohereTextEmbedder:
    type: haystack_integrations.components.embedders.cohere.text_embedder.CohereTextEmbedder
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - COHERE_API_KEY
          - CO_API_KEY
        strict: false
      model: embed-english-v2.0
      input_type: search_query
      api_base_url: https://api.cohere.com
      truncate: END
      use_async_client: false
      timeout: 120
      embedding_type:
  OpenSearchEmbeddingRetriever:
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      filters:
      top_k: 10
      filter_policy: replace
      custom_query:
      raise_on_failure: true
      efficient_filtering: true
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: Standard-Index-English
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
connections:
  - sender: CohereTextEmbedder.embedding
    receiver: OpenSearchEmbeddingRetriever.query_embedding
max_runs_per_component: 100
metadata: {}
inputs:
  query:
    - CohereTextEmbedder.text
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| text | str | The text to embed. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding | List[float] | | The embedding of the text. |
| meta | Dict[str, Any] | | Metadata about the request. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | Secret | Secret.from_env_var(['COHERE_API_KEY', 'CO_API_KEY']) | The Cohere API key. |
| model | str | embed-english-v2.0 | The name of the model to use. Choose a model from the list on the component card. |
| input_type | str | search_query | Specifies the type of input you're giving to the model. Supported values are "search_document", "search_query", "classification", and "clustering". Not required for embedding models older than v3, but required for v3 and later models. |
| api_base_url | str | https://api.cohere.com | The Cohere API base URL. |
| truncate | str | END | Specifies how to handle inputs that exceed the model's maximum input token length. Supported values are "NONE", "START", and "END". "START" discards the start of the input and "END" discards the end. In both cases, input is discarded until the remaining input is exactly the maximum input token length for the model. With "NONE", an input that exceeds the maximum input token length returns an error. |
| timeout | int | 120 | Request timeout in seconds. |
| embedding_type | Optional[EmbeddingTypes] | None | The type of embeddings to return. Defaults to float embeddings. Note that int8, uint8, binary, and ubinary are only valid for v3 models. |
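The three truncate modes can be pictured with a small sketch. It is an illustration only: actual truncation happens on the Cohere side using the model's own tokenizer, whereas this example assumes whitespace-separated words as stand-in tokens. `truncate_tokens` is a hypothetical helper, not part of Haystack or the Cohere SDK.

```python
from typing import List


def truncate_tokens(tokens: List[str], max_tokens: int, truncate: str = "END") -> List[str]:
    """Mimic the START, END, and NONE truncation modes on a list of tokens."""
    if len(tokens) <= max_tokens:
        return list(tokens)  # input already fits, nothing to discard
    if truncate == "START":
        # Discard the start of the input, keeping the last max_tokens tokens.
        return tokens[len(tokens) - max_tokens:]
    if truncate == "END":
        # Discard the end of the input, keeping the first max_tokens tokens.
        return tokens[:max_tokens]
    if truncate == "NONE":
        raise ValueError("Input exceeds the maximum input token length.")
    raise ValueError(f"Unsupported truncate value: {truncate!r}")


# Whitespace-separated words stand in for model tokens.
tokens = "how do embedding retrievers rank documents".split()

kept_with_start = truncate_tokens(tokens, max_tokens=3, truncate="START")
kept_with_end = truncate_tokens(tokens, max_tokens=3, truncate="END")
print(kept_with_start)
print(kept_with_end)
```

For queries, which are usually short, the choice rarely matters. It becomes relevant when long text is passed as the query.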
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | | The text to embed. |