DeepsetDeepLDocumentTranslator

Translate the content of your documents using the DeepL Python SDK.

Deprecation Notice

This component is deprecated. It will continue to work in your existing pipelines. You can replace it with the DeepLDocumentTranslator component.

Basic Information

Type: deepset_cloud_custom_nodes.converters.deepl_document_translator.DeepsetDeepLDocumentTranslator
Components it can connect with:
- Converters: You can use DeepsetDeepLDocumentTranslator after converters to translate the documents they return.
- Retrievers: You can use this component to translate documents fetched by a retriever.
- PromptBuilder: DeepsetDeepLDocumentTranslator can send the translated documents to a PromptBuilder, which then includes them in the prompt for the LLM.

Inputs

Parameter	Type	Default	Description
documents	List[Document]		List of Haystack documents to be translated.

Outputs

Parameter	Type	Default	Description
documents	List[Document]		A list of translated documents.

Overview

DeepsetDeepLDocumentTranslator uses the DeepL Python library to translate documents into the languages you specify. For a list of supported languages, see the DeepL documentation. You can translate one set of documents into multiple languages at once by passing the language codes in the `target_languages` parameter.
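The multi-language fan-out described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the component's implementation. `Doc` is a hypothetical stand-in for a Haystack `Document`, and `translate_documents` and `fake_translate` are hypothetical helpers. The output ordering (grouped by language) and the `language` meta key are also assumptions. The sketch also mirrors the `include_score` init parameter by copying or dropping the original score. In a real setup, `translate` could wrap the DeepL Python SDK, for example `deepl.Translator(auth_key).translate_text(text, target_lang=lang).text`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (content, meta, score only)."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)
    score: Optional[float] = None


def translate_documents(
    documents: List[Doc],
    target_languages: Union[List[str], str],
    translate: Callable[[str, str], str],
    include_score: bool = True,
) -> List[Doc]:
    """Return one translated copy of every input document per target language."""
    # A single language code is treated as a one-element list.
    if isinstance(target_languages, str):
        target_languages = [target_languages]
    translated: List[Doc] = []
    # Grouping the output by language is an assumption about ordering.
    for lang in target_languages:
        for doc in documents:
            translated.append(
                Doc(
                    content=translate(doc.content, lang),
                    # Hypothetical meta key recording the target language.
                    meta={**doc.meta, "language": lang},
                    # Mirrors include_score: keep or drop the original score.
                    score=doc.score if include_score else None,
                )
            )
    return translated


def fake_translate(text: str, lang: str) -> str:
    """Placeholder for a DeepL call: it only prefixes the language code."""
    return f"[{lang}] {text}"


docs = [Doc("Hello", score=0.9), Doc("Goodbye", score=0.4)]
out = translate_documents(docs, ["DE", "FR"], fake_translate)
print(len(out))                      # 4
print(out[0].content, out[0].score)  # [DE] Hello 0.9
```

With two input documents and two target languages, the sketch returns four documents. That matches the idea that every document is translated into every requested language.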

Authorization

You must have an active DeepL account and a DeepL API key to use this component. Connect DeepL to deepset on the Integrations page:

Add Workspace-Level Integration

Click your profile icon and choose Settings.
Go to Workspace>Integrations.
Find the provider you want to connect and click Connect next to it.
Enter the API key and any other required details.
Click Connect. You can use this integration in pipelines and indexes in the current workspace.

Add Organization-Level Integration

Click your profile icon and choose Settings.
Go to Organization>Integrations.
Find the provider you want to connect and click Connect next to it.
Enter the API key and any other required details.
Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.

Once DeepL is connected to deepset, you can use DeepsetDeepLDocumentTranslator without passing the API key in the pipeline YAML.
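The default value of `api_key` is `Secret.from_env_var('DEEPL_API_KEY')`, so the key is read from an environment variable instead of being written into the YAML. The sketch below approximates that lookup for an env-var secret such as `{"type": "env_var", "env_vars": ["DEEPL_API_KEY"], "strict": false}`. It returns the value of the first variable that is set. With `strict` disabled, it returns `None` instead of raising. `resolve_env_secret` is a hypothetical helper written for illustration; it is not Haystack's `Secret` API. How the deepset integration makes the key available to the pipeline is an assumption this sketch does not cover.

```python
import os
from typing import List, Optional


def resolve_env_secret(env_vars: List[str], strict: bool = True) -> Optional[str]:
    """Return the value of the first set environment variable in env_vars.

    Hypothetical helper approximating an env-var secret such as
    {"type": "env_var", "env_vars": ["DEEPL_API_KEY"], "strict": false}.
    """
    for name in env_vars:
        value = os.environ.get(name)
        if value is not None:
            return value
    if strict:
        # Strict secrets fail loudly when none of the variables is set.
        raise ValueError(f"None of the environment variables {env_vars} are set")
    # Non-strict secrets resolve to None instead of raising.
    return None


os.environ["DEEPL_API_KEY"] = "example-key"  # placeholder, not a real key
print(resolve_env_secret(["DEEPL_API_KEY"], strict=False))  # example-key
```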

Usage Example

Initializing the Component

components:
  DeepsetDeepLDocumentTranslator:
    type: converters.deepl_translator.DeepsetDeepLDocumentTranslator
    init_parameters:
      target_languages: ["DE"] # Required; this parameter has no default

Using the Component in a Pipeline

This is an example of a query pipeline where DeepsetDeepLDocumentTranslator receives documents from a Ranker and translates them into German. The pipeline outputs the translated documents:

DeepsetDeepLDocumentTranslator in a pipeline

To specify the languages you want DeepL to translate into, list their codes in the target_languages parameter.

Here's the pipeline YAML:

components:
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/multilingual-e5-base

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        init_parameters:
          embedding_dim: 768
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
      top_k: 20 # The number of results to return

  ranker:
    type: haystack.components.rankers.transformers_similarity.TransformersSimilarityRanker
    init_parameters:
      model: "jeffwan/mmarco-mMiniLMv2-L12-H384-v1"
      top_k: 20
      model_kwargs:
        torch_dtype: "torch.float16"

  deepl_translator:
    type: deepset_cloud_custom_nodes.converters.deepl_translator.DeepsetDeepLDocumentTranslator
    # For more information about DeepL supported languages, see https://developers.deepl.com/docs/resources/supported-languages
    init_parameters:
      api_key: {"type": "env_var", "env_vars": ["DEEPL_API_KEY"], "strict": false}
      target_languages: ["DE"] # Translate documents into German
      source_language: null # Auto-detect the source language
      preserve_formatting: true # Prevent automatic correction of formatting
      include_score: true # Keep the original document score in the translated documents

connections:  # Defines how the components are connected
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: deepl_translator.documents

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "query_embedder.text"
  - "ranker.query"

  filters:  # These components will receive a potential query filter as input
  - "embedding_retriever.filters"
outputs:  # Defines the output of your pipeline
  documents: "deepl_translator.documents"  # The output of the pipeline is the retrieved documents translated into German.

max_runs_per_component: 100

metadata: {}
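The `connections` section turns the components into a directed graph in which each receiver runs after its sender. As a rough illustration only, the snippet below uses Python's standard `graphlib` module to derive a valid run order from the sender/receiver pairs in the YAML above. This is not how Haystack or deepset actually schedules a pipeline. The `component` helper is hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# (sender, receiver) pairs copied from the connections section of the YAML.
connections = [
    ("query_embedder.embedding", "embedding_retriever.query_embedding"),
    ("embedding_retriever.documents", "ranker.documents"),
    ("ranker.documents", "deepl_translator.documents"),
]


def component(socket: str) -> str:
    """Hypothetical helper: 'ranker.documents' -> 'ranker'."""
    return socket.split(".", 1)[0]


# Map each receiving component to the set of components it depends on.
graph: Dict[str, Set[str]] = {}
for sender, receiver in connections:
    graph.setdefault(component(receiver), set()).add(component(sender))

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['query_embedder', 'embedding_retriever', 'ranker', 'deepl_translator']
```

The translator comes last in this chain, which is why `deepl_translator.documents` is the pipeline output.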

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

Parameter	Type	Default	Description
target_languages	Union[List[str], str]		The target language code or a list of target language codes. For a list of target languages, see the DeepL supported languages documentation (https://developers.deepl.com/docs/resources/supported-languages). If multiple languages are specified, a translated document is returned for each language.
source_language	Optional[str]	None	The source language code. If set to `None`, the source language is auto-detected. For a list of source languages, see the DeepL supported languages documentation (https://developers.deepl.com/docs/resources/supported-languages).
api_key	Secret	Secret.from_env_var('DEEPL_API_KEY')	DeepL API key.
preserve_formatting	Optional[bool]	None	Controls automatic formatting correction. If set to `None`, it acts as `True` to prevent automatic correction of formatting.
split_sentences	Literal[0, 1, 'nonewlines', None]	None	Controls whether and how the translation engine splits the input into sentences before translating it. Splitting is enabled by default. Possible values are: - 0 (OFF): No splitting; the whole input is treated as one sentence. Use this option if the input text is already split into sentences, to prevent the engine from splitting a sentence unintentionally. - 1 (ALL, default): Splits on punctuation and on newlines. - 'nonewlines': Splits on punctuation only, ignoring newlines.
context	Optional[str]	None	Makes it possible to include additional context that can influence a translation without being translated itself. Providing additional context can potentially improve translation quality, especially for short, low-context source texts such as product names on an e-commerce website, article headlines on a news website, or UI elements. For more information and examples, refer to the API documentation.
formality	Literal[None, 'less', 'default', 'more', 'prefer_more', 'prefer_less']	None	Controls whether translations should lean toward informal or formal language. This feature currently only works for the following languages: DE (German), FR (French), IT (Italian), ES (Spanish), NL (Dutch), PL (Polish), PT-BR and PT-PT (Portuguese), JA (Japanese), and RU (Russian). The available options are: - 'less': Translate using informal language. - 'default': Translate using the default formality. - 'more': Translate using formal language. - 'prefer_more': Translate using formal language if the target language supports formality, otherwise use the default formality. - 'prefer_less': Translate using informal language if the target language supports formality, otherwise use the default formality.
max_retries	Optional[int]	5	Maximum number of network retries after a failed HTTP request.
glossary	Union[str, None]	None	ID of the glossary to use for translation. The glossary's languages must match the specified `source_language` and `target_languages`.
tag_handling	Literal[None, 'xml', 'html']	None	Type of tags to parse before translation. Currently, only "xml" and "html" are supported.
outline_detection	Optional[bool]	None	Set to `False` to disable automatic tag detection.
non_splitting_tags	Union[str, List[str], None]	None	XML tags that should not split a sentence.
splitting_tags	Union[str, List[str], None]	None	XML tags that should split a sentence.
ignore_tags	Union[str, List[str], None]	None	XML tags containing text that should not be translated.
include_score	bool	True	Whether to include the original document score in the translated document.
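DeepL's `translate_text` call takes a single `target_lang`, so a component like this one presumably calls it once per target language and passes the remaining init parameters through. The sketch below shows one plausible mapping from the init parameters in the table to keyword arguments of `deepl.Translator.translate_text`. `build_translate_kwargs` is hypothetical, and the assumption that the component forwards options under these names is ours; this page does not document it. The sketch validates the `Literal` values from the table and treats `preserve_formatting=None` as `True`, as the table describes. It then drops the remaining unset options so DeepL applies its own defaults.

```python
from typing import Any, Dict, List, Optional, Union

# Allowed values, taken from the types in the Init Parameters table.
FORMALITY_VALUES = {None, "less", "default", "more", "prefer_more", "prefer_less"}
SPLIT_VALUES = {None, 0, 1, "nonewlines"}
TAG_HANDLING_VALUES = {None, "xml", "html"}

Tags = Union[str, List[str], None]


def build_translate_kwargs(
    target_language: str,
    source_language: Optional[str] = None,
    preserve_formatting: Optional[bool] = None,
    split_sentences: Union[int, str, None] = None,
    context: Optional[str] = None,
    formality: Optional[str] = None,
    glossary: Optional[str] = None,
    tag_handling: Optional[str] = None,
    outline_detection: Optional[bool] = None,
    non_splitting_tags: Tags = None,
    splitting_tags: Tags = None,
    ignore_tags: Tags = None,
) -> Dict[str, Any]:
    """Hypothetical mapping of init parameters to translate_text() keyword arguments."""
    if formality not in FORMALITY_VALUES:
        raise ValueError(f"unsupported formality: {formality!r}")
    if split_sentences not in SPLIT_VALUES:
        raise ValueError(f"unsupported split_sentences: {split_sentences!r}")
    if tag_handling not in TAG_HANDLING_VALUES:
        raise ValueError(f"unsupported tag_handling: {tag_handling!r}")

    kwargs: Dict[str, Any] = {
        "target_lang": target_language,
        "source_lang": source_language,  # None lets DeepL auto-detect the source
        # Per the table, None behaves like True (no automatic formatting correction).
        "preserve_formatting": True if preserve_formatting is None else preserve_formatting,
        # The DeepL API expects "0", "1", or "nonewlines" as strings.
        "split_sentences": None if split_sentences is None else str(split_sentences),
        "context": context,
        "formality": formality,
        "glossary": glossary,
        "tag_handling": tag_handling,
        "outline_detection": outline_detection,
        "non_splitting_tags": non_splitting_tags,
        "splitting_tags": splitting_tags,
        "ignore_tags": ignore_tags,
    }
    # Drop unset options so DeepL falls back to its own defaults.
    return {key: value for key, value in kwargs.items() if value is not None}


kwargs = build_translate_kwargs("DE", split_sentences="nonewlines")
print(kwargs)
# {'target_lang': 'DE', 'preserve_formatting': True, 'split_sentences': 'nonewlines'}
```

With a real API key, a dict like this could be passed as `translator.translate_text(text, **kwargs)`, once for each code in `target_languages`.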

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

Parameter	Type	Default	Description
documents	List[Document]		List of Haystack documents to be translated.
