DeepsetDeepLDocumentTranslator
Translate the content of your documents using the DeepL Python SDK.
Basic Information
- Pipeline type: Indexing and Query
  When used in an indexing pipeline, it translates documents before they're indexed.
  When used in a query pipeline, it translates documents after they're retrieved from a document store.
- Type: deepset_cloud_custom_nodes.converters.deepl_document_translator.DeepsetDeepLDocumentTranslator
- Components it can connect with:
  - Converters: You can use DeepsetDeepLDocumentTranslator after converters to translate the documents the converters return.
  - Retrievers: You can use this component to translate documents fetched by a retriever.
  - PromptBuilder: DeepsetDeepLDocumentTranslator can send the translated documents to a PromptBuilder, which then includes them in the prompt for the LLM.
Inputs
Name | Type | Description |
---|---|---|
documents | List of Document objects | A list of documents to be translated. |
Outputs
Name | Type | Description |
---|---|---|
translated_documents | List of Document objects | A list of translated documents. |
Overview
DeepsetDeepLDocumentTranslator uses the DeepL Python library to translate documents into the languages you specify. For a list of supported languages, see the DeepL documentation. You can translate one set of documents into multiple languages at once by passing the language codes in the target_languages parameter.
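As a minimal sketch of the multi-language case, the fragment below configures the component to translate into German and French. The component name deepl_translator and the type path are taken from this page; choosing FR as the second language is only an illustration, so check the DeepL documentation for the codes you need.

```yaml
components:
  deepl_translator:
    # Type path as listed under Basic Information
    type: deepset_cloud_custom_nodes.converters.deepl_document_translator.DeepsetDeepLDocumentTranslator
    init_parameters:
      # Illustrative choice: translate each document into German and French
      target_languages: ["DE", "FR"]
```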
Authorization
You must have an active DeepL account and a DeepL API key to use this component. Connect DeepL to deepset Cloud on the Connections page:
- Click your initials in the top right corner and select Connections.
- Click Connect next to the provider.
- Enter your API key and submit it.
Once DeepL is connected to deepset Cloud, you can use DeepsetDeepLDocumentTranslator without passing the API key in the pipeline YAML.
Usage Example
This is an example of a query pipeline where DeepsetDeepLDocumentTranslator receives documents from a Ranker and translates them into German. The output of the pipeline is the translated documents:
[Figure: DeepL translator in a query pipeline, where it receives documents to translate from the ranker, translates them, and then sends them to the pipeline output]
To specify the languages you want DeepL to translate into, list their codes in the target_languages parameter.
Here is the pipeline YAML:
components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/multilingual-e5-base
  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          embedding_dim: 768
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return
  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "jeffwan/mmarco-mMiniLMv2-L12-H384-v1"
      top_k: 20
      model_kwargs:
        torch_dtype: "torch.float16"
  deepl_translator:
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    # For more information about DeepL supported languages, see https://developers.deepl.com/docs/resources/supported-languages
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["DEEPL_API_KEY"], "strict": false}
      target_languages: ["DE"] # Translate documents into German
      source_language: # Auto-detects the source language when set to "null"
      preserve_formatting: true # Prevent automatic correction of formatting
      include_score: true # Display relevance score

connections: # Defines how the components are connected
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: deepl_translator.documents

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "query_embedder.text"
    - "ranker.query"
  filters: # These components will receive a potential query filter as input
    - "embedding_retriever.filters"

outputs: # Defines the output of your pipeline
  documents: "deepl_translator.documents" # The output of the pipeline is the retrieved documents translated into German.

max_runs_per_component: 100

metadata: {}
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
Parameter | Type | Possible values | Description |
---|---|---|---|
target_languages | List of strings | For a list of possible options, see DeepL documentation. | Codes of languages you want to translate your documents into. Required. |
source_language | String | Default: None | The language of the documents you want to translate. If None, it's automatically detected. For a list of possible options, see DeepL documentation. Optional. |
api_key | Secret | Default: Secret.from_env_var("DEEPL_API_TOKEN") | The DeepL API key. Required. |
preserve_formatting | Boolean | Default: True | Controls automatic formatting correction. When True, prevents automatic correction of formatting. Optional. |
split_sentences | Literal | 0 1 nonewlines Default: None | Controls whether and how the translation engine splits the input into sentences before translating. Splitting is enabled by default. Possible values: - 0: OFF. No splitting at all; the whole input is treated as one sentence. Use this option if the input text is already split into sentences, to prevent the engine from splitting a sentence unintentionally. - 1: ALL (default). Splits on punctuation and on newlines. - nonewlines: Splits on punctuation only, ignoring newlines. Optional. |
context | String | Default: None | Use this parameter to include additional context that can influence a translation without being translated itself. Providing additional context can potentially improve translation quality, especially for short, low-context source texts such as product names on an e-commerce website, article headlines on a news website, or UI elements. For details, see DeepL API documentation. Optional. |
formality | String | less more Default: None | Controls whether translations lean toward informal or formal language. This works only for these target languages: German (DE), French (FR), Italian (IT), Spanish (ES), Dutch (NL), Polish (PL), Portuguese (PT_BR and PT_PT), Japanese (JA), and Russian (RU). Possible values: - less: Uses more informal language. - more: Uses more polite and formal language. Optional. |
max_retries | Integer | Default: 5 | The maximum number of network retries after a failed HTTP request. Optional. |
glossary | Union | Default: None | Glossary ID to use for translation. Must match the specified source and target languages. Optional. |
tag_handling | Literal | xml html Default: None | Type of tags to parse before translation. Currently supports only XML and HTML. Optional. |
outline_detection | Boolean | True False Default: None | Set to False to disable automatic tag detection. Optional. |
non_splitting_tags | Union | Default: None | XML tags that should not split a sentence. Optional. |
splitting_tags | Union | Default: None | XML tags that should split a sentence. Optional. |
ignore_tags | Union | Default: None | XML tags containing text that should not be translated. Optional. |
include_score | Boolean | True False Default: True | Whether to include the original document score in the translated document. Optional. |
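The optional parameters in the table go under init_parameters alongside target_languages. The fragment below is a sketch only: the component name deepl_translator is borrowed from the usage example, and the values chosen for formality, context, split_sentences, and max_retries are illustrative assumptions rather than recommendations.

```yaml
deepl_translator:
  # type omitted for brevity; use the type shown for this component
  init_parameters:
    target_languages: ["DE"]                  # German supports the formality option
    formality: "more"                         # Assumed value: prefer polite, formal language
    context: "Headlines from a news website"  # Influences the translation but is not translated itself
    split_sentences: "nonewlines"             # Split on punctuation only, ignoring newlines
    max_retries: 3                            # Assumed value: fewer retries than the default of 5
```

Parameters you leave out keep the defaults listed in the table.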
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
Parameter | Type | Description |
---|---|---|
documents | List of Document objects | List of documents to be translated. Required. |