DeepsetDeepLDocumentTranslator

Translate the content of your documents using the DeepL Python SDK.

Basic Information

  • Type: deepset_cloud_custom_nodes.converters.deepl_document_translator.DeepsetDeepLDocumentTranslator
  • Components it can connect with:
    • Converters: You can use DeepsetDeepLDocumentTranslator after converters to translate the documents they return.
    • Retrievers: You can use this component to translate documents fetched by a retriever.
    • PromptBuilder: DeepsetDeepLDocumentTranslator can send the translated documents to a PromptBuilder, which then includes them in the prompt for the LLM.
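
As a minimal sketch of the PromptBuilder connection described above, the fragment below wires the translator's documents output into a Haystack PromptBuilder. The component names, the template text, and the question variable are illustrative assumptions, not taken from this page. The translator type path is the one used in the pipeline example on this page.

```yaml
components:
  deepl_translator:  # illustrative component name
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    init_parameters:
      target_languages: ["DE"]

  prompt_builder:  # illustrative component name
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      # Assumed template: the "documents" variable receives the translated documents
      template: |-
        Answer the question using the documents below.
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
        Question: {{ question }}

connections:
  - sender: deepl_translator.documents
    receiver: prompt_builder.documents  # input name matches the template variable
```

The receiver name depends on the variables used in the template, so adjust it if your template calls the documents something else.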

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | List of Haystack documents to be translated. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | A list of translated documents. |

Overview

DeepsetDeepLDocumentTranslator uses the DeepL Python library to translate documents into the languages you specify. For a list of supported languages, see the DeepL documentation. You can translate one set of documents into multiple languages at once by passing the language codes in the target_languages parameter.
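
As a minimal sketch of the multi-language setup described above, the fragment below lists two codes in target_languages. The component name and the choice of German and French are illustrative. The type path is the one used in the pipeline example on this page.

```yaml
components:
  deepl_translator:  # illustrative component name
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    init_parameters:
      # A translated document is returned for each language listed here
      target_languages: ["DE", "FR"]
```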

Authorization

You must have an active DeepL account and a DeepL API key to use this component. Connect DeepL to deepset on the Integrations page:

Connection Instructions

  1. Click your profile icon in the top right corner and choose Integrations.
    Integrations menu screenshot
  2. Click Connect next to the provider.
  3. Enter your API key and submit it.

Once deepset is connected, you can use DeepsetDeepLDocumentTranslator without passing the API key in the pipeline YAML.

Usage Example

Initializing the Component

components:
  DeepsetDeepLDocumentTranslator:
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    init_parameters:

Using the Component in a Pipeline

This is an example of a query pipeline where DeepsetDeepLDocumentTranslator receives documents from a Ranker and translates them into German. The pipeline outputs the translated documents:

DeepsetDeepLDocumentTranslator in a pipeline

To specify the languages you want DeepL to translate into, list their codes in the target_languages parameter:

Target languages configuration window

Here's the pipeline YAML:

components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/multilingual-e5-base

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          embedding_dim: 768
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "jeffwan/mmarco-mMiniLMv2-L12-H384-v1"
      top_k: 20
      model_kwargs:
        torch_dtype: "torch.float16"

  deepl_translator:
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    # For more information about DeepL supported languages, see https://developers.deepl.com/docs/resources/supported-languages
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["DEEPL_API_KEY"], "strict": false}
      target_languages: ["DE"] # Translate documents into German
      source_language: # Auto-detects the source language when set to "null"
      preserve_formatting: true # Prevent automatic correction of formatting
      include_score: true # Display relevance score

connections: # Defines how the components are connected
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: deepl_translator.documents

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "query_embedder.text"
    - "ranker.query"

  filters: # These components will receive a potential query filter as input
    - "embedding_retriever.filters"

outputs: # Defines the output of your pipeline
  documents: "deepl_translator.documents" # The output of the pipeline is the retrieved documents translated into German.

max_runs_per_component: 100

metadata: {}


Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| target_languages | Union[List[str], str] | | The target language code or a list of target language codes. For the supported codes, refer to target languages. If multiple languages are specified, a translated document is returned for each language. |
| source_language | Optional[str] | None | The source language code. If set to None, the source language is auto-detected. For the supported codes, refer to source languages. |
| api_key | Secret | Secret.from_env_var('DEEPL_API_KEY') | DeepL API key. |
| preserve_formatting | Optional[bool] | None | Controls automatic formatting correction. If set to None, it behaves as True, which prevents automatic correction of formatting. |
| split_sentences | Literal[0, 1, 'nonewlines', None] | None | Controls whether and how the translation engine splits the input into sentences before translating. Splitting is enabled by default. Possible values: 0 (off): no splitting, the whole input is treated as one sentence; use this if the input is already split into sentences, to prevent unintended splitting. 1 (all, default): splits on punctuation and on newlines. 'nonewlines': splits on punctuation only, ignoring newlines. |
| context | Optional[str] | None | Additional context that can influence a translation without being translated itself. Providing context can improve translation quality, especially for short, low-context source texts such as product names on an e-commerce website, article headlines on a news website, or UI elements. For more information and examples, refer to the API documentation. |
| formality | Literal[None, 'less', 'default', 'more', 'prefer_more', 'prefer_less'] | None | Controls whether translations lean toward informal or formal language. This feature currently works only for the following languages: DE (German), FR (French), IT (Italian), ES (Spanish), NL (Dutch), PL (Polish), PT-BR and PT-PT (Portuguese), JA (Japanese), and RU (Russian). Options: 'less': informal language. 'default': the default formality. 'more': formal language. 'prefer_more': formal language if the target language supports formality, otherwise the default formality. 'prefer_less': informal language if the target language supports formality, otherwise the default formality. |
| max_retries | Optional[int] | 5 | Maximum number of network retries after a failed HTTP request. |
| glossary | Union[str, None] | None | (Optional) ID of the glossary to use for translation. Must match the specified source and target languages. |
| tag_handling | Literal[None, 'xml', 'html'] | None | (Optional) Type of tags to parse before translation. Only "xml" and "html" are currently available. |
| outline_detection | Optional[bool] | None | (Optional) Set to False to disable automatic tag detection. |
| non_splitting_tags | Union[str, List[str], None] | None | (Optional) XML tags that should not split a sentence. |
| splitting_tags | Union[str, List[str], None] | None | (Optional) XML tags that should split a sentence. |
| ignore_tags | Union[str, List[str], None] | None | (Optional) XML tags containing text that should not be translated. |
| include_score | bool | True | Whether to include the original document score in the translated document. |
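
To illustrate how several of these parameters combine, here is a hedged sketch with two translator configurations: one tuned for formal German output with extra context, and one that parses XML tags. The component names, the context string, and the code tag are hypothetical values chosen for illustration. The parameter names and allowed values come from the table above, and the type path is the one used in the pipeline example on this page.

```yaml
components:
  formal_translator:  # hypothetical component name
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    init_parameters:
      target_languages: "DE"         # a single code can be passed as a string
      source_language: "EN"          # set explicitly instead of relying on auto-detection
      formality: "prefer_more"       # formal where supported, default formality otherwise
      context: "Product names from an online furniture store"  # illustrative; not translated itself
      split_sentences: "nonewlines"  # split on punctuation only, ignoring newlines
      max_retries: 3

  xml_translator:  # hypothetical component name
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    init_parameters:
      target_languages: ["FR"]
      tag_handling: "xml"            # parse XML tags before translation
      ignore_tags: ["code"]          # text inside <code> tags is left untranslated (illustrative tag)
```

Parameters that are left out keep the defaults listed in the table.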

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| documents | List[Document] | | List of Haystack documents to be translated. |