DeepsetVLMPDFToDocumentConverter

Convert PDF documents to text using a Vision Language Model (VLM).

Basic Information

  • Pipeline type: Indexing or Query
  • Type: deepset_cloud_custom_nodes.converters.vlm_pdf_to_document.DeepsetVLMPDFToDocumentConverter
  • Components it often connects to:
    • FileTypeRouter: DeepsetVLMPDFToDocumentConverter receives sources from FileTypeRouter and converts them into documents.
    • DocumentJoiner: DeepsetVLMPDFToDocumentConverter can send the converted documents to a DocumentJoiner that joins documents from all Converters in the pipeline.
    • PreProcessors: DeepsetVLMPDFToDocumentConverter can send the converted documents to a PreProcessor for further processing.

Inputs

Required Inputs

| Name | Type | Description |
| --- | --- | --- |
| sources | List of Path and ByteStream objects | The list of PDF sources to convert. |

Optional Inputs

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| meta | Dictionary | None | Metadata or a list of metadata dictionaries. |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| documents | Dictionary with a list of Document objects | The converted documents. |
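
For quick experiments outside of a pipeline, you can also run the converter directly in Python. This is a minimal sketch that assumes the standard run() interface described in the Inputs and Outputs tables above; the file name and metadata values are placeholders.

```python
from pathlib import Path

from deepset_cloud_custom_nodes.converters.vlm_pdf_to_document import (
    DeepsetVLMPDFToDocumentConverter,
)

# Minimal setup: the OpenAI key is read from the OPENAI_API_KEY
# environment variable, which is the documented default.
converter = DeepsetVLMPDFToDocumentConverter(vlm_provider="openai", model="gpt-4o")

# `sources` accepts Path and ByteStream objects. `meta` is optional:
# pass one dictionary for all sources or a list with one entry per source.
result = converter.run(sources=[Path("report.pdf")], meta={"company": "ACME"})

for document in result["documents"]:
    print(document.content[:200])
```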

Overview

DeepsetVLMPDFToDocumentConverter uses a vision language model (VLM) to convert a screenshot of each PDF page into text based on your prompt. Use this converter with PDF files that have:

  • complex layouts
  • a mix of images and text
  • tables
  • handwritten text
  • figures

Through prompting, you can convert tables, images, or figures into a textual representation, which is useful for retrieval or for passing the resulting text to an LLM.

The converter extracts text in a natural reading order from PDFs with complex layouts, without requiring custom post-processing code to restore that order.

🚧

This component can incur high costs with OpenAI or Amazon Bedrock if you use it to convert thousands of PDF pages. For OpenAI, one PDF page equals roughly 1,500 input tokens and produces roughly 800 to 3,000 output tokens.
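
The token figures above let you estimate conversion volume before running a large job. This back-of-envelope sketch only computes token counts from those figures; the per-million-token prices are placeholders that you need to replace with your provider's current rates.

```python
# Per-page figures from the note above: ~1,500 input tokens,
# ~800-3,000 output tokens per converted PDF page.
pages = 10_000

input_tokens = pages * 1_500
output_low, output_high = pages * 800, pages * 3_000

# Placeholder prices per 1M tokens -- substitute your model's actual rates.
input_price, output_price = 2.50, 10.00

cost_low = (input_tokens * input_price + output_low * output_price) / 1_000_000
cost_high = (input_tokens * input_price + output_high * output_price) / 1_000_000

print(f"{input_tokens:,} input tokens, {output_low:,}-{output_high:,} output tokens")
print(f"Estimated cost: ${cost_low:,.2f}-${cost_high:,.2f}")
```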

DeepsetVLMPDFToDocumentConverter supports OpenAI models through the OpenAI API and Anthropic models through Amazon Bedrock. It processes PDFs in parallel, both across files and across the pages of each file.

You can adjust the conversion process by passing a custom prompt or adjusting any of the other parameters. Use the generator_kwargs argument to pass additional parameters to the underlying VLM generator. Check DeepsetOpenAIVisionGenerator or DeepsetAmazonBedrockVisionGenerator to learn about the parameters they accept.
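
For example, in Python you can pass the same generator_kwargs shown in the YAML pipeline below. The keys used here (generation_kwargs and timeout) are taken from that example; which keys are valid ultimately depends on the generator you select.

```python
from deepset_cloud_custom_nodes.converters.vlm_pdf_to_document import (
    DeepsetVLMPDFToDocumentConverter,
)

# generator_kwargs is forwarded to the underlying vision generator.
# Here: deterministic, bounded generation with a 120-second timeout.
converter = DeepsetVLMPDFToDocumentConverter(
    vlm_provider="openai",
    model="gpt-4o",
    generator_kwargs={
        "generation_kwargs": {"temperature": 0, "seed": 0, "max_tokens": 4000},
        "timeout": 120,
    },
)
```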

Usage Example

This is an example indexing pipeline, where DeepsetVLMPDFToDocumentConverter receives PDFs from FileTypeRouter and then sends the converted documents to DocumentJoiner:

```yaml
components:
  file_classifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - text/markdown
        - text/html
        - application/vnd.openxmlformats-officedocument.wordprocessingml.document
        - application/vnd.openxmlformats-officedocument.presentationml.presentation
        - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  text_converter:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
  markdown_converter:
    type: haystack.components.converters.markdown.MarkdownToDocument
    init_parameters: {}
  html_converter:
    type: haystack.components.converters.html.HTMLToDocument
    init_parameters:
      extraction_kwargs:
        output_format: txt
        target_language: null
        include_tables: true
        include_links: false
  docx_converter:
    type: haystack.components.converters.docx.DOCXToDocument
    init_parameters: {}
  pptx_converter:
    type: haystack.components.converters.pptx.PPTXToDocument
    init_parameters: {}
  xlsx_converter:
    type: deepset_cloud_custom_nodes.converters.xlsx.XLSXToDocument
    init_parameters: {}
  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          embedding_dim: 1024
          similarity: cosine
      policy: OVERWRITE
  DeepsetVLMPDFToDocumentConverter:
    type: deepset_cloud_custom_nodes.converters.vlm_pdf_to_document.DeepsetVLMPDFToDocumentConverter
    init_parameters:
      vlm_provider: openai
      max_workers_files: 3
      max_workers_pages: 5
      max_retries: 3
      backoff_factor: 2
      initial_backoff_time: 30
      prompt: |-
        Extract the content from the document below.
        You need to extract the content exactly.
        Format everything as markdown.
        Make sure to retain the reading order of the document.

        **Headers and Footers**
        Remove repeating page headers or footers that disrupt the reading order.
        Place letterheads that appear at the side of a document at the top of the page.


        **Images**
        Do not extract images, drawings or maps.
        Instead, add a caption that describes briefly what you see on the image.
        Enclose each image caption with [img-caption][/img-caption].

        **Tables**
        Make sure to format the table in markdown.
        Add a short caption below the table that describes the table's content.
        Enclose each table caption with [table-caption][/table-caption].
        The caption must be placed below the extracted table.

        **Forms**
        Reproduce checkbox selections with markdown.

        Go ahead and extract!

        Document:
      model: gpt-4o
      max_splits_per_page: 3
      detail: auto
      generator_kwargs:
        generation_kwargs:
          temperature: 0
          seed: 0
          max_tokens: 4000
        timeout: 120
      response_extraction_pattern: null
      progress_bar: true
      page_separator: "\f"
connections:
  - sender: file_classifier.text/plain
    receiver: text_converter.sources
  - sender: file_classifier.text/markdown
    receiver: markdown_converter.sources
  - sender: file_classifier.text/html
    receiver: html_converter.sources
  - sender: file_classifier.application/vnd.openxmlformats-officedocument.wordprocessingml.document
    receiver: docx_converter.sources
  - sender: file_classifier.application/vnd.openxmlformats-officedocument.presentationml.presentation
    receiver: pptx_converter.sources
  - sender: file_classifier.application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    receiver: xlsx_converter.sources
  - sender: text_converter.documents
    receiver: joiner.documents
  - sender: markdown_converter.documents
    receiver: joiner.documents
  - sender: html_converter.documents
    receiver: joiner.documents
  - sender: docx_converter.documents
    receiver: joiner.documents
  - sender: pptx_converter.documents
    receiver: joiner.documents
  - sender: xlsx_converter.documents
    receiver: joiner.documents
  - sender: joiner.documents
    receiver: writer.documents
  - sender: file_classifier.application/pdf
    receiver: DeepsetVLMPDFToDocumentConverter.sources
  - sender: DeepsetVLMPDFToDocumentConverter.documents
    receiver: joiner.documents
max_runs_per_component: 100
metadata: {}
inputs:
  files:
    - file_classifier.sources
```


Init Parameters

| Parameter | Type | Possible values | Description |
| --- | --- | --- | --- |
| vlm_provider | Literal | openai, bedrock. Default: openai | The type of VLM to use. You can choose OpenAI or Bedrock. Required. |
| max_workers_files | Integer | Default: 3 | The maximum number of threads for processing files. Required. |
| max_workers_pages | Integer | Default: 5 | The maximum number of threads for processing pages. Required. |
| max_retries | Integer | Default: 3 | The maximum number of retries for page-level extraction. Required. |
| backoff_factor | Float | Default: 2.0 | The factor for exponential backoff between retries. Required. |
| initial_backoff_time | Float | Default: 30.0 | The initial backoff time in seconds. Required. |
| prompt | String | Default: "Extract the content from this document page. Format everything as markdown to recreate the layout as best as possible. Retain the natural reading order." | The prompt for the VLM. Required. |
| openai_api_key | Secret | Default: Secret.from_env_var("OPENAI_API_KEY") | The API key for OpenAI. Required. |
| model | String | Default: gpt-4o | The name of the model you want to use. Required. |
| max_splits_per_page | Integer | Default: 3 | The maximum number of splits per page. This parameter only applies when using openai as vlm_provider: it detects when the conversion of a page was truncated because of the maximum number of output tokens and prompts the model to continue the extraction where it left off. Check the maximum number of output tokens for your model in the OpenAI documentation. If you select bedrock as vlm_provider, the output of a page is truncated if it exceeds the maximum number of output tokens. Required. |
| detail | Literal | auto, low, high. Default: auto | The level of detail for image processing. Choose high for the best results and low for the lowest inference costs. If you choose auto, the API automatically adjusts the resolution based on the size of the image input. Required. |
| generator_kwargs | Dictionary | Default: None | Additional keyword arguments for the generator. Check DeepsetOpenAIVisionGenerator or DeepsetAmazonBedrockVisionGenerator to learn about the parameters you can pass. Optional. |
| response_extraction_pattern | String | Default: None | A regex pattern to extract text from the generator's response. Optional. |
| progress_bar | Boolean | True, False. Default: True | Shows a progress bar during conversion. Required. |
| page_separator | String | Default: \f | The string used to separate pages. Required. |