DeepsetDeepLDocumentTranslator

Translate the content of your documents using the DeepL Python SDK.

Basic Information

  • Pipeline type: Indexing and Query
    When used in an indexing pipeline, it translates documents before they're indexed.
    When used in a query pipeline, it translates documents after they're retrieved from a document store.
  • Type: deepset_cloud_custom_nodes.converters.deepl_document_translator.DeepsetDeepLDocumentTranslator
  • Components it can connect with:
    • Converters: You can use DeepsetDeepLDocumentTranslator after converters to translate the documents they return.
    • Retrievers: You can use this component to translate documents fetched by a retriever.
    • PromptBuilder: DeepsetDeepLDocumentTranslator can send the translated documents to a PromptBuilder, which then includes them in the prompt for the LLM.

Inputs

| Name | Type | Description |
| --- | --- | --- |
| documents | List of Document objects | A list of documents to be translated. |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| translated_documents | List of Document objects | A list of translated documents. |

Overview

DeepsetDeepLDocumentTranslator uses the DeepL Python library to translate documents into the languages you specify. For a list of supported languages, see the DeepL documentation. To translate one set of documents into multiple languages at once, pass all the language codes in the target_languages parameter.
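
As a minimal sketch of translating into several languages at once, the fragment below lists two codes in target_languages. The component name deepl_translator is borrowed from the usage example on this page, the type line is left out for brevity, and DE and FR are only sample codes. Check the DeepL documentation for the codes that apply to you.

```yaml
deepl_translator:
  # type omitted for brevity; use the component type listed under Basic Information
  init_parameters:
    target_languages: ["DE", "FR"]  # German and French
```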

Authorization

You must have an active DeepL account and a DeepL API key to use this component. Connect DeepL to deepset Cloud on the Connections page:

  1. Click your initials in the top right corner and select Connections.

  2. Click Connect next to the provider.

  3. Enter your API key and submit it.

Once DeepL is connected to deepset Cloud, you can use DeepsetDeepLDocumentTranslator without passing the API key in the pipeline YAML.
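
With the connection in place, a component definition could leave out api_key altogether. The fragment below is a sketch under that assumption. The component name deepl_translator is illustrative and the type line is omitted for brevity.

```yaml
deepl_translator:
  # type omitted for brevity; use the component type listed under Basic Information
  init_parameters:
    # No api_key entry here: the key is expected to come from the DeepL connection
    target_languages: ["DE"]
```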

Usage Example

This is an example of a query pipeline where DeepsetDeepLDocumentTranslator receives documents from a Ranker and translates them into German. The output of the pipeline is the translated documents:

[Image: DeepL translator in a query pipeline. It receives documents from the ranker, translates them, and sends them to the pipeline output.]

To specify the languages you want DeepL to translate into, list their codes in the target_languages parameter.

Here is the pipeline YAML:

components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/multilingual-e5-base

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          embedding_dim: 768
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "jeffwan/mmarco-mMiniLMv2-L12-H384-v1"
      top_k: 20
      model_kwargs:
        torch_dtype: "torch.float16"

  deepl_translator:
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    # For more information about DeepL supported languages, see https://developers.deepl.com/docs/resources/supported-languages
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["DEEPL_API_KEY"], "strict": false}
      target_languages: ["DE"] # Translate documents into German
      source_language: null # Auto-detects the source language when set to null
      preserve_formatting: true # Prevent automatic correction of formatting
      include_score: true # Keep the original relevance score in the translated documents

connections:  # Defines how the components are connected
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: deepl_translator.documents

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "query_embedder.text"
  - "ranker.query"

  filters:  # These components will receive a potential query filter as input
  - "embedding_retriever.filters"
outputs:  # Defines the output of your pipeline
  documents: "deepl_translator.documents"  # The output of the pipeline is the retrieved documents translated into German.

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:


| Parameter | Type | Possible values | Description |
| --- | --- | --- | --- |
| target_languages | List of strings | For a list of possible options, see the DeepL documentation. | Codes of the languages you want to translate your documents into. Required. |
| source_language | String | Default: None | The language of the documents you want to translate. If None, it's automatically detected. For a list of possible options, see the DeepL documentation. Optional. |
| api_key | Secret | Default: Secret.from_env_var("DEEPL_API_TOKEN") | The DeepL API key. Required. |
| preserve_formatting | Boolean | Default: True | Controls automatic formatting correction. When True, prevents automatic correction of formatting. Optional. |
| split_sentences | Literal | 0, 1, nonewlines. Default: None | Controls whether and how the translation engine splits the input into sentences before translating it. Splitting is enabled by default. Possible values: 0 (OFF): no splitting at all; the whole input is treated as one sentence. Use this option if the input text is already split into sentences, to prevent the engine from splitting sentences unintentionally. 1 (ALL, default): splits on punctuation and on newlines. nonewlines: splits on punctuation only, ignoring newlines. Optional. |
| context | String | Default: None | Additional context that can influence a translation without being translated itself. Providing context can improve translation quality, especially for short, low-context source texts such as product names on an e-commerce website, article headlines on a news website, or UI elements. For details, see the DeepL API documentation. Optional. |
| formality | String | less, more. Default: None | Controls whether translations lean toward informal or formal language. This works only for these target languages: German (DE), French (FR), Italian (IT), Spanish (ES), Dutch (NL), Polish (PL), Portuguese (PT_BR and PT_PT), Japanese (JA), and Russian (RU). Possible values: less: uses more informal language. more: uses more polite and formal language. Optional. |
| max_retries | Integer | Default: 5 | The maximum number of network retries after a failed HTTP request. Optional. |
| glossary | Union | Default: None | ID of the glossary to use for translation. The glossary must match the specified source_language and target_languages. Optional. |
| tag_handling | Literal | xml, html. Default: None | Type of tags to parse before translation. Currently, only XML and HTML are supported. Optional. |
| outline_detection | Boolean | True, False. Default: None | Set to False to disable automatic tag detection. Optional. |
| non_splitting_tags | Union | Default: None | XML tags that should not split a sentence. Optional. |
| splitting_tags | Union | Default: None | XML tags that should split a sentence. Optional. |
| ignore_tags | Union | Default: None | XML tags containing text that should not be translated. Optional. |
| include_score | Boolean | True, False. Default: True | Whether to include the original document score in the translated document. Optional. |
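
To show how several of the optional parameters might combine, here is a sketch of a translator configured for formal German output. The component name, the context string, and the choice of values are illustrative assumptions, and the type line is omitted for brevity. Only the parameter names and their possible values come from the table above.

```yaml
deepl_translator:
  # type omitted for brevity; use the component type listed under Basic Information
  init_parameters:
    target_languages: ["DE"]
    source_language: "EN"          # Skips auto-detection; assumes the documents are in English
    formality: "more"              # German is one of the target languages that support formality
    context: "Product descriptions for an online electronics store"  # Hypothetical context; it guides the translation but is not translated
    split_sentences: "nonewlines"  # Split on punctuation only, ignoring newlines
    max_retries: 3                 # Fewer network retries than the default of 5
```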

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.


| Parameter | Type | Description |
| --- | --- | --- |
| documents | List of Document objects | List of documents to be translated. Required. |