RemoteWhisperTranscriber

Transcribe audio files using the OpenAI Whisper API.

RemoteWhisperTranscriber uses OpenAI's Whisper API to transcribe audio files and return them as documents for storage in a document store. It's typically used in indexes. For supported audio formats, languages, and other parameters, see the Whisper API documentation.

Key Features

  • Transcribes audio files to text documents using OpenAI's Whisper API.
  • Supports multiple audio formats (see Whisper API documentation for the full list).
  • Returns one document per audio file with the transcribed text as content.

Configuration

You need an OpenAI API key to use this component. Connect deepset AI Platform to OpenAI first:

Add Workspace-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Workspace > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in the current workspace.

Add Organization-Level Integration

  1. Click your profile icon and choose Settings.
  2. Go to Organization > Integrations.
  3. Find the provider you want to connect and click Connect next to it.
  4. Enter the API key and any other required details.
  5. Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.

  1. Drag the RemoteWhisperTranscriber component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab, select the Whisper model to use.
  4. Go to the Advanced tab to configure the API key, base URL, organization, timeout, max retries, and HTTP client settings.

Connections

RemoteWhisperTranscriber accepts a list of file paths or ByteStream objects (sources) containing audio files as input. It outputs a list of documents (documents), one per audio file, each containing the transcribed text.

Typically, RemoteWhisperTranscriber receives audio files from a FileClassifier (or FileTypeRouter) that routes files by type. Its documents output connects to a DocumentJoiner to combine transcriptions with documents from other converters before splitting and indexing.

  1. Drag the RemoteWhisperTranscriber component onto the canvas from the Component Library.
  2. Click on the component to open the configuration panel.
  3. On the General tab:
    • Set the Model to the Whisper model to use. Currently only whisper-1 is supported.
  4. Go to the Advanced tab to configure additional settings:
    • Set the API Key to your OpenAI API key (configured via the Integrations page).
    • Set Organization if you want to specify your OpenAI organization ID.
    • Set API Base URL to use a custom API endpoint.
    • Set HTTP Client Kwargs for custom httpx client configuration.
    • Set kwargs for additional model parameters such as language, prompt, response_format, or temperature.
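
The kwargs setting can be easier to picture in code. The following is a minimal sketch, not the platform's own logic: `whisper_kwargs` and `build_transcriber` are hypothetical helper names, and the checks (two-letter language code, temperature between 0 and 1) are assumptions based on the parameter descriptions on this page and in the OpenAI API documentation. The Haystack import is deferred so the validation helper can be tried without Haystack installed.

```python
import re
from typing import Any, Dict, Optional


def whisper_kwargs(
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Collect optional Whisper parameters, skipping the ones left unset."""
    params: Dict[str, Any] = {}
    if language is not None:
        # Assumption: ISO-639-1 codes are two lowercase letters, such as "en".
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"expected an ISO-639-1 code, got {language!r}")
        params["language"] = language
    if prompt is not None:
        params["prompt"] = prompt
    if temperature is not None:
        # Assumption: the OpenAI API expects a temperature between 0 and 1.
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        params["temperature"] = temperature
    # response_format is left out because only json is supported.
    return params


def build_transcriber(**params: Any):
    """Create the component; assumes Haystack 2.x and OPENAI_API_KEY are available."""
    # Imported lazily so whisper_kwargs works without Haystack installed.
    from haystack.components.audio import RemoteWhisperTranscriber
    from haystack.utils import Secret

    return RemoteWhisperTranscriber(
        api_key=Secret.from_env_var("OPENAI_API_KEY"),
        model="whisper-1",
        **params,
    )


print(whisper_kwargs(language="en", temperature=0.0))
```

With Haystack installed, `build_transcriber(**whisper_kwargs(language="en"))` would pass the language hint through to the OpenAI endpoint.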

Source Code

To check this component's source code, open whisper_remote.py in the Haystack repository.

Connections

RemoteWhisperTranscriber accepts a list of file paths or ByteStream objects through its sources input. It outputs a list of Document objects, one per audio file.

It typically connects with:

  • FileTypeRouter or FileClassifier: receives audio files routed by MIME type.
  • DocumentJoiner: sends transcribed documents to join with documents from other converters.
  • DocumentSplitter: sends transcribed documents for splitting before further processing.
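
To illustrate routing by MIME type, here is a rough sketch using only Python's standard library. It is a stand-in for the idea, not the FileTypeRouter implementation: `route_by_mime_type` is a hypothetical function, and the "unclassified" bucket name is an assumption chosen for this example.

```python
import mimetypes
from collections import defaultdict
from typing import Dict, List


def route_by_mime_type(paths: List[str], mime_types: List[str]) -> Dict[str, List[str]]:
    """Group file paths by guessed MIME type; unmatched files go to 'unclassified'."""
    routed: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        # guess_type looks only at the file extension, not the file contents.
        guessed, _encoding = mimetypes.guess_type(path)
        key = guessed if guessed in mime_types else "unclassified"
        routed[key].append(path)
    return dict(routed)


routes = route_by_mime_type(
    ["notes.txt", "report.pdf", "talk.mp4", "call.mp3"],
    ["text/plain", "application/pdf", "video/mp4"],
)
for mime_type, files in routes.items():
    print(mime_type, files)
```

In this sketch, only files whose MIME type appears in the configured list reach a converter. An MP3 file would land in the unclassified bucket unless its MIME type (audio/mpeg in Python's mapping) is also listed and connected to the transcriber.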

Usage Examples

Basic Configuration

RemoteWhisperTranscriber:
  type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
  init_parameters:
    api_key:
      type: env_var
      env_vars:
        - OPENAI_API_KEY
      strict: false
    model: whisper-1

Using the Component in an Index

This is an example of an index that can process PDF, text, and audio files. The file_classifier component (a FileTypeRouter) routes files by MIME type to the appropriate converter, sending MP4 files to RemoteWhisperTranscriber. The resulting documents are joined and sent to DocumentSplitter.


components:
  file_classifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - video/mp4

  text_converter:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8

  pdf_converter:
    type: haystack.components.converters.pdfminer.PDFMinerToDocument
    init_parameters:
      line_overlap: 0.5
      char_margin: 2
      line_margin: 0.5
      word_margin: 0.1
      boxes_flow: 0.5
      detect_vertical: true
      all_texts: false
      store_full_path: false

  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      sort_by_score: false

  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
      respect_sentence_boundary: true
      language: en

  document_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE

  RemoteWhisperTranscriber:
    type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: whisper-1
      api_base_url:
      organization:
      http_client_kwargs:

connections: # Defines how the components are connected
  - sender: file_classifier.text/plain
    receiver: text_converter.sources
  - sender: file_classifier.application/pdf
    receiver: pdf_converter.sources
  - sender: text_converter.documents
    receiver: joiner.documents
  - sender: pdf_converter.documents
    receiver: joiner.documents
  - sender: joiner.documents
    receiver: splitter.documents
  - sender: document_embedder.documents
    receiver: writer.documents
  - sender: file_classifier.video/mp4
    receiver: RemoteWhisperTranscriber.sources
  - sender: RemoteWhisperTranscriber.documents
    receiver: joiner.documents
  - sender: splitter.documents
    receiver: document_embedder.documents

inputs: # Define the inputs for your pipeline
  files: # This component will receive the files to index as input
    - file_classifier.sources

max_runs_per_component: 100

metadata: {}
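
When editing an index like the one above, a common slip is a connection that names a component missing from the components section. The following sketch shows one way to check for that. `undefined_components` is a hypothetical helper, and the data is a hand-copied subset of the example rather than parsed YAML.

```python
from typing import Dict, List, Set


def undefined_components(components: Set[str], connections: List[Dict[str, str]]) -> Set[str]:
    """Return component names used in connections but not defined as components."""
    missing: Set[str] = set()
    for connection in connections:
        for endpoint in (connection["sender"], connection["receiver"]):
            # Endpoints look like "component.socket"; the socket name may itself
            # contain dots or slashes (for example "video/mp4"), so split only once.
            name = endpoint.split(".", 1)[0]
            if name not in components:
                missing.add(name)
    return missing


components = {
    "file_classifier", "text_converter", "pdf_converter", "joiner",
    "splitter", "document_embedder", "writer", "RemoteWhisperTranscriber",
}
connections = [
    {"sender": "file_classifier.video/mp4", "receiver": "RemoteWhisperTranscriber.sources"},
    {"sender": "RemoteWhisperTranscriber.documents", "receiver": "joiner.documents"},
    {"sender": "joiner.documents", "receiver": "splitter.documents"},
]

print(undefined_components(components, connections))  # set()
```

An empty result means every sender and receiver in the checked connections refers to a defined component. It does not verify that the socket names themselves exist.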

Parameters

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | A list of file paths or ByteStream objects containing the audio files to transcribe. |

Outputs

| Parameter | Type | Description |
| --- | --- | --- |
| documents | List[Document] | A list of documents, one document for each file. The content of each document is the transcribed text. |

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | Secret | Secret.from_env_var('OPENAI_API_KEY') | OpenAI API key. Paste the API key on the Integrations page. |
| model | str | whisper-1 | Name of the model to use. Currently accepts only whisper-1. |
| organization | Optional[str] | None | Your OpenAI organization ID. See OpenAI's documentation on Setting Up Your Organization. |
| api_base | Any | | An optional URL to use as the API base. For details, see the OpenAI documentation. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |
| kwargs | Any | | Other optional parameters for the model. These are sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Supported parameters include: language (ISO-639-1 format), prompt, response_format (only json is supported), and temperature. |
| api_base_url | Optional[str] | None | An optional URL to use as the API base URL. |

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Description |
| --- | --- | --- |
| sources | List[Union[str, Path, ByteStream]] | A list of file paths or ByteStream objects containing the audio files to transcribe. |
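
As a rough illustration of the run() contract described above (sources in, a documents list out), the sketch below uses a stand-in object instead of the real component so that it runs without an API key. `EchoTranscriber` and `transcribe_to_texts` are hypothetical names. The only assumption about the real component is that run(sources=...) returns a dictionary with a documents key whose items expose a content attribute.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class EchoTranscriber:
    """Hypothetical stand-in that mimics the run() contract without calling OpenAI."""

    def run(self, sources: List[str]) -> Dict[str, List[Any]]:
        # One document per source, mirroring "one document for each file".
        documents = [SimpleNamespace(content=f"transcript of {source}") for source in sources]
        return {"documents": documents}


def transcribe_to_texts(transcriber: Any, sources: List[str]) -> List[str]:
    """Run any transcriber-like object and return the transcribed text of each document."""
    result = transcriber.run(sources=sources)
    return [document.content for document in result["documents"]]


print(transcribe_to_texts(EchoTranscriber(), ["a.mp3", "b.mp3"]))
# ['transcript of a.mp3', 'transcript of b.mp3']
```

Passing a real RemoteWhisperTranscriber instance in place of the stand-in should work the same way, assuming Haystack is installed and the OpenAI API key is configured.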