RemoteWhisperTranscriber
Transcribe audio files using the OpenAI Whisper API. The component returns each audio file as a document containing the transcribed text.
Key Features
- Transcribes audio files using the OpenAI Whisper API.
- Returns transcribed text as documents you can write into a document store.
- Supports multiple audio formats and languages as defined by the Whisper API.
- Configurable API key, base URL, and organization ID.
- Configurable timeout and retry settings.
- Accepts custom httpx client settings for advanced network configurations.
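Because the component comes from the open-source Haystack library, you can also try it directly in Python before adding it to a pipeline. A minimal sketch, assuming the OPENAI_API_KEY environment variable is set and that meeting_recording.mp3 is a hypothetical local audio file:

```python
from haystack.components.audio import RemoteWhisperTranscriber

# The API key is read from the OPENAI_API_KEY environment variable by default
transcriber = RemoteWhisperTranscriber(model="whisper-1")

# "meeting_recording.mp3" is a hypothetical local audio file
result = transcriber.run(sources=["meeting_recording.mp3"])

# One document per input file, with the transcription as its content
for doc in result["documents"]:
    print(doc.content)
```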
Configuration
You need an OpenAI API key to use this component. Use this key to connect Haystack Platform to OpenAI:
Add Workspace-Level Integration
- Click your profile icon and choose Settings.
- Go to Workspace > Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in the current workspace.
Add Organization-Level Integration
- Click your profile icon and choose Settings.
- Go to Organization > Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.
Add the Component
- Drag the RemoteWhisperTranscriber component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab, select the Whisper model to use.
- Go to the Advanced tab to configure the API key, base URL, organization, timeout, max retries, and HTTP client settings.
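The settings on the Advanced tab correspond to the component's init parameters. As a rough sketch of the same configuration in Python (the 60-second timeout passed through http_client_kwargs is an illustrative value, not a platform default):

```python
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.utils import Secret

transcriber = RemoteWhisperTranscriber(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),  # resolved from the environment, never hardcoded
    model="whisper-1",
    api_base_url=None,   # optional custom endpoint
    organization=None,   # optional OpenAI organization ID
    http_client_kwargs={"timeout": 60.0},  # forwarded to httpx.Client; 60 s is an illustrative value
)
```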
Connections
RemoteWhisperTranscriber accepts a list of file paths or ByteStream objects (sources) containing audio files as input. It outputs a list of documents (documents), one per audio file, each containing the transcribed text.
Typically, RemoteWhisperTranscriber receives audio files from a FileClassifier (or FileTypeRouter) that routes files by type. Its documents output connects to a DocumentJoiner to combine transcriptions with documents from other converters before splitting and indexing.
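If you want to prototype this wiring in code, the open-source Haystack library exposes the same components. A minimal sketch, with component names and the mp4-only routing chosen for illustration:

```python
from haystack import Pipeline
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.components.joiners import DocumentJoiner
from haystack.components.routers import FileTypeRouter

pipeline = Pipeline()
pipeline.add_component("file_classifier", FileTypeRouter(mime_types=["video/mp4"]))
pipeline.add_component("transcriber", RemoteWhisperTranscriber())
pipeline.add_component("joiner", DocumentJoiner())

# Files routed by MIME type go to the transcriber; its documents are merged
# with the output of any other converters connected to the joiner.
pipeline.connect("file_classifier.video/mp4", "transcriber.sources")
pipeline.connect("transcriber.documents", "joiner.documents")
```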
Usage Example
Using the Component in an Index
This is an example of an index that can process PDF, text, and MP4 files. FileClassifier routes each file by its type to the appropriate converter. The resulting documents are then joined and sent to DocumentSplitter.
```yaml
components:
  file_classifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - video/mp4
  text_converter:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
  pdf_converter:
    type: haystack.components.converters.pdfminer.PDFMinerToDocument
    init_parameters:
      line_overlap: 0.5
      char_margin: 2
      line_margin: 0.5
      word_margin: 0.1
      boxes_flow: 0.5
      detect_vertical: true
      all_texts: false
      store_full_path: false
  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      sort_by_score: false
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
      respect_sentence_boundary: true
      language: en
  document_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE
  RemoteWhisperTranscriber:
    type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: whisper-1
      api_base_url:
      organization:
      http_client_kwargs:
connections: # Defines how the components are connected
  - sender: file_classifier.text/plain
    receiver: text_converter.sources
  - sender: file_classifier.application/pdf
    receiver: pdf_converter.sources
  - sender: file_classifier.video/mp4
    receiver: RemoteWhisperTranscriber.sources
  - sender: text_converter.documents
    receiver: joiner.documents
  - sender: pdf_converter.documents
    receiver: joiner.documents
  - sender: RemoteWhisperTranscriber.documents
    receiver: joiner.documents
  - sender: joiner.documents
    receiver: splitter.documents
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents
inputs: # Define the inputs for your pipeline
  files: # This component will receive the files to index as input
    - file_classifier.sources
max_runs_per_component: 100
metadata: {}
```
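To experiment with a similar index locally, the YAML can be deserialized with Haystack's Pipeline.loads. Note that DeepsetNvidiaDocumentEmbedder and the preconfigured OpenSearch store are platform-specific, so a local run would need them replaced with components available in your environment. A minimal sketch, with hypothetical file names:

```python
from haystack import Pipeline

# "index.yml" is a hypothetical local copy of the YAML above, with the
# platform-specific embedder and document store replaced by local equivalents.
with open("index.yml") as f:
    pipeline = Pipeline.loads(f.read())

pipeline.run({"file_classifier": {"sources": ["notes.txt", "report.pdf", "talk.mp4"]}})
```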
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or ByteStream objects containing the audio files to transcribe. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents, one document for each file. The content of each document is the transcribed text. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | Secret | Secret.from_env_var('OPENAI_API_KEY') | OpenAI API key. Paste the API key on the Integrations page. |
| model | str | whisper-1 | Name of the model to use. Currently accepts only whisper-1. |
| organization | Optional[str] | None | Your OpenAI organization ID. See OpenAI's documentation on Setting Up Your Organization. |
| api_base_url | Optional[str] | None | An optional URL to use as the API base. For details, see the OpenAI documentation. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |
| kwargs | Any | | Other optional parameters for the model, sent directly to the OpenAI endpoint. See the OpenAI documentation for details. Supported parameters include: language (the language of the input audio in ISO-639-1 format; setting it improves transcription accuracy and latency), prompt (optional text to guide the model's style or continue a previous audio segment; it should match the audio language), response_format (the format of the transcript output; this component supports only json), and temperature (the sampling temperature, between 0 and 1; higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic; if set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit). |
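Because keyword arguments that are not init parameters are forwarded to the OpenAI endpoint with every request, Whisper-specific options can be set when the component is created. A minimal sketch, with language and temperature as illustrative values:

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Keyword arguments that are not init parameters are sent to the Whisper API
# with every transcription request.
transcriber = RemoteWhisperTranscriber(
    model="whisper-1",
    language="en",    # ISO-639-1 code of the input audio
    temperature=0.2,  # lower values produce more deterministic transcripts
)
```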
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or ByteStream objects containing the audio files to transcribe. |
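Since sources accepts ByteStream objects as well as paths, audio that is already in memory can be transcribed without first writing it to disk. A minimal sketch, with both file names hypothetical:

```python
from pathlib import Path

from haystack.components.audio import RemoteWhisperTranscriber
from haystack.dataclasses import ByteStream

transcriber = RemoteWhisperTranscriber()

# sources can mix plain strings, Path objects, and in-memory ByteStreams
stream = ByteStream.from_file_path(Path("interview.wav"))
result = transcriber.run(sources=["podcast.mp3", stream])
print(result["documents"][0].content)
```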