Basic Information

When used in an index, DeepsetDeepLDocumentTranslator translates documents before they're indexed.
When used in a query pipeline, it translates documents after they're retrieved from a document store.

Type: deepset_cloud_custom_nodes.converters.deepl_document_translator.DeepsetDeepLDocumentTranslator
Components it can connect with:
- Converters: You can use DeepsetDeepLDocumentTranslator after converters to translate the documents that the converters return.
- Retrievers: You can use this component to translate documents fetched by a retriever.
- PromptBuilder: DeepsetDeepLDocumentTranslator can send the translated documents to a PromptBuilder, which then includes them in the prompt for the LLM.
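
As a rough sketch of the converter case, the fragment below wires a converter's output into the translator inside an index. The component name `text_converter` is hypothetical, and the fragment assumes the converter exposes a `documents` output. The `deepl_translator.documents` input is the one listed under Inputs. Components that would follow the translator in a real index are left out.

```yaml
connections:
# text_converter is a hypothetical converter name; its `documents` output is an assumption
- sender: text_converter.documents
  receiver: deepl_translator.documents  # The translator's `documents` input
```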

Inputs

Name	Type	Description
`documents`	List of Document objects	A list of documents to be translated.

Outputs

Name	Type	Description
`translated_documents`	List of Document objects	A list of translated documents.

Overview

DeepsetDeepLDocumentTranslator uses the DeepL Python library to translate documents into the languages you specify. For a list of supported languages, see the DeepL documentation. You can translate one set of documents into multiple languages at once by passing all the language codes in the `target_languages` parameter.
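
For illustration, a component definition that requests several target languages might look like the following. This is a minimal sketch: the component name `deepl_translator` and the type string are copied from the usage example on this page, and the language codes are examples only, so check them against the DeepL list of supported languages.

```yaml
deepl_translator:
  type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
  init_parameters:
    # Example codes only: German, French, and Spanish
    target_languages: ["DE", "FR", "ES"]
```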

Authorization

You must have an active DeepL account and a DeepL API key to use this component. Connect DeepL to deepset on the Integrations page. For details, see Add Integrations.

Once DeepL is connected, you can use DeepsetDeepLDocumentTranslator without passing the API key in the pipeline YAML.

Usage Example

This is an example of a query pipeline where DeepsetDeepLDocumentTranslator receives documents from a Ranker and translates them into German. The output of the pipeline is the translated documents.

To specify the languages you want DeepL to translate into, list their codes in the `target_languages` parameter.

Here is the pipeline YAML:

components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/multilingual-e5-base

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          embedding_dim: 768
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "jeffwan/mmarco-mMiniLMv2-L12-H384-v1"
      top_k: 20
      model_kwargs:
        torch_dtype: "torch.float16"

  deepl_translator:
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    # For more information about DeepL supported languages, see https://developers.deepl.com/docs/resources/supported-languages
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["DEEPL_API_KEY"], "strict": false}
      target_languages: ["DE"] # Translate documents into German
      source_language: null # When null, DeepL auto-detects the source language
      preserve_formatting: true # Prevent automatic correction of formatting
      include_score: true # Keep the original document score in the translated document

connections:  # Defines how the components are connected
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: deepl_translator.documents

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "query_embedder.text"
  - "ranker.query"

  filters:  # These components will receive a potential query filter as input
  - "embedding_retriever.filters"
outputs:  # Defines the output of your pipeline
  documents: "deepl_translator.documents"  # The output of the pipeline is the retrieved documents translated into German.

max_runs_per_component: 100

metadata: {}

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Possible values	Description
`target_languages`	List of strings	For a list of possible options, see DeepL documentation.	Codes of languages you want to translate your documents into. Required.
`source_language`	String	Default: `None`	The language of the documents you want to translate. If `None`, it's automatically detected. For a list of possible options, see DeepL documentation. Optional.
`api_key`	Secret	Default: `Secret.from_env_var("DEEPL_API_TOKEN")`	The DeepL API key. Required.
`preserve_formatting`	Boolean	Default: `True`	Controls automatic formatting correction. When `True`, prevents automatic correction of formatting. Optional.
`split_sentences`	Literal	`0` `1` `nonewlines` Default: `None`	Controls whether and how the translation engine splits the input into sentences before translating. Possible values: - `0`: No splitting. The whole input is treated as one sentence. Use this option if the input text is already split into sentences to prevent the engine from splitting it unintentionally. - `1`: Splits on punctuation and on newlines. This is DeepL's default behavior. - `nonewlines`: Splits on punctuation only, ignoring newlines. Optional.
`context`	String	Default: `None`	Use this parameter to include additional context that can influence a translation without being translated itself. Providing additional context can potentially improve translation quality, especially for short, low-context source texts such as product names on an e-commerce website, article headlines on a news website, or UI elements. For details, see DeepL API documentation. Optional.
`formality`	String	`less` `more` Default: `None`	Controls whether translations lean toward informal or formal language. This works only for these target languages: German (DE), French (FR), Italian (IT), Spanish (ES), Dutch (NL), Polish (PL), Portuguese (PT-BR and PT-PT), Japanese (JA), and Russian (RU). Possible values: - `less`: Uses more informal language. - `more`: Uses more polite and formal language. Optional.
`max_retries`	Integer	Default: `5`	The maximum number of network retries after a failed HTTP request. Optional.
`glossary`	Union	Default: `None`	ID of the glossary to use for translation. The glossary must match the specified `source_language` and `target_languages`. Optional.
`tag_handling`	Literal	`xml` `html` Default: `None`	The type of tags to parse before translation. Currently, only XML and HTML are supported. Optional.
`outline_detection`	Boolean	`True` `False` Default: `None`	Set to False to disable automatic tag detection. Optional.
`non_splitting_tags`	Union	Default: `None`	XML tags that should not split a sentence. Optional.
`splitting_tags`	Union	Default: `None`	XML tags that should split a sentence. Optional.
`ignore_tags`	Union	Default: `None`	XML tags containing text that should not be translated. Optional.
`include_score`	Boolean	`True` `False` Default: `True`	Whether to include the original document score in the translated document. Optional.
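
To show how several of these parameters combine, here is a hedged sketch of a component definition tuned for short English product texts translated into formal German. The parameter names and allowed values come from the table above. The component name and type string are copied from the usage example, and the `context` string and the chosen values are purely illustrative.

```yaml
deepl_translator:
  type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
  init_parameters:
    target_languages: ["DE"]       # Formality is supported for German
    source_language: "EN"          # Skip auto-detection when the source language is known
    formality: "more"              # Prefer polite, formal wording
    split_sentences: "nonewlines"  # Split on punctuation only, ignore newlines
    context: "Product names and short descriptions from an online electronics store."  # Illustrative text; not translated itself
    max_retries: 3                 # Illustrative value; the default is 5
```

Parameters left out of `init_parameters` keep the defaults listed in the table.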

Run Method Parameters

These are the parameters you can configure for the component's `run()` method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

`run()` method parameters take precedence over initialization parameters.

Parameter	Type	Description
`documents`	List of `Document` objects	List of documents to be translated. Required.