RemoteWhisperTranscriber
Transcribe audio files using the OpenAI Whisper API.
RemoteWhisperTranscriber uses OpenAI's Whisper API to transcribe audio files and returns the transcriptions as documents that you can store in a document store. It's typically used in indexes. For supported audio formats, languages, and other parameters, see the Whisper API documentation.
Key Features
- Transcribes audio files to text documents using OpenAI's Whisper API.
- Supports multiple audio formats (see Whisper API documentation for the full list).
- Returns one document per audio file with the transcribed text as content.
Configuration
You need an OpenAI API key to use this component. Connect deepset AI Platform to OpenAI first:
Add Workspace-Level Integration
- Click your profile icon and choose Settings.
- Go to Workspace>Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in the current workspace.
Add Organization-Level Integration
- Click your profile icon and choose Settings.
- Go to Organization>Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.
- Drag the RemoteWhisperTranscriber component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab, select the Whisper model to use.
- Go to the Advanced tab to configure the API key, base URL, organization, timeout, max retries, and HTTP client settings.
Connections
RemoteWhisperTranscriber accepts a list of file paths or ByteStream objects (sources) containing audio files as input. It outputs a list of documents (documents), one per audio file, each containing the transcribed text.
Typically, RemoteWhisperTranscriber receives audio files from a FileClassifier (or FileTypeRouter) that routes files by type. Its documents output connects to a DocumentJoiner to combine transcriptions with documents from other converters before splitting and indexing.
- Drag the RemoteWhisperTranscriber component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab:
  - Set the Model to the Whisper model to use. Currently, only whisper-1 is supported.
- Go to the Advanced tab to configure additional settings:
  - Set the API Key to your OpenAI API key (configured via the Integrations page).
  - Set Organization if you want to specify your OpenAI organization ID.
  - Set API Base URL to use a custom API endpoint.
  - Set HTTP Client Kwargs for custom httpx client configuration.
  - Set kwargs for additional model parameters such as language, prompt, response_format, or temperature.
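The Advanced settings above could look roughly like this in the pipeline YAML. This is a minimal sketch, not a definitive configuration: it assumes that additional model parameters (kwargs) are written as top-level keys under init_parameters, and the language, temperature, and timeout values are illustrative only.

```yaml
RemoteWhisperTranscriber:
  type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
  init_parameters:
    api_key:
      type: env_var
      env_vars:
      - OPENAI_API_KEY
      strict: false
    model: whisper-1
    # Assumption: extra model parameters (kwargs) sit directly
    # under init_parameters rather than in a nested "kwargs" key.
    language: en       # ISO-639-1 code of the spoken language (illustrative)
    temperature: 0     # illustrative value
    http_client_kwargs:
      timeout: 30      # illustrative; forwarded to the httpx client
```

Leave out any setting you don't need; the component then falls back to its defaults.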
Source Code
To check this component's source code, open whisper_remote.py in the Haystack repository.
Connections
RemoteWhisperTranscriber accepts a list of file paths or ByteStream objects through its sources input. It outputs a list of Document objects, one per audio file.
It typically connects with:
- FileTypeRouter or FileClassifier: receives audio files routed by MIME type.
- DocumentJoiner: sends transcribed documents to join with documents from other converters.
- DocumentSplitter: sends transcribed documents for splitting before further processing.
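For an index that handles only audio, the connections listed above can be reduced to a router, the transcriber, and a splitter. The fragment below is a sketch under assumptions: the component names file_router, transcriber, and splitter are hypothetical, and audio/mpeg is just one example MIME type that would also need to be listed in the router's mime_types.

```yaml
connections:
# Hypothetical component names; adjust them to match your pipeline.
- sender: file_router.audio/mpeg    # router output named after the MIME type
  receiver: transcriber.sources     # RemoteWhisperTranscriber input
- sender: transcriber.documents     # one document per audio file
  receiver: splitter.documents      # DocumentSplitter input
```

When other converters feed the same splitter, route transcriber.documents through a DocumentJoiner instead of connecting it to the splitter directly.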
Usage Examples
Basic Configuration
RemoteWhisperTranscriber:
  type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
  init_parameters:
    api_key:
      type: env_var
      env_vars:
      - OPENAI_API_KEY
      strict: false
    model: whisper-1
Using the Component in an Index
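This is an example of an index that can process PDF, text, and audio files. The file_classifier component, a FileTypeRouter, routes files by MIME type to the matching converter; files with the video/mp4 MIME type go to RemoteWhisperTranscriber. The resulting documents are joined and sent to DocumentSplitter, then embedded and written to the document store.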
components:
  file_classifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
      - text/plain
      - application/pdf
      - video/mp4
  text_converter:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
  pdf_converter:
    type: haystack.components.converters.pdfminer.PDFMinerToDocument
    init_parameters:
      line_overlap: 0.5
      char_margin: 2
      line_margin: 0.5
      word_margin: 0.1
      boxes_flow: 0.5
      detect_vertical: true
      all_texts: false
      store_full_path: false
  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      sort_by_score: false
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
      respect_sentence_boundary: true
      language: en
  document_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE
  RemoteWhisperTranscriber:
    type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
    init_parameters:
      api_key:
        type: env_var
        env_vars:
        - OPENAI_API_KEY
        strict: false
      model: whisper-1
      api_base_url:
      organization:
      http_client_kwargs:

connections: # Defines how the components are connected
- sender: file_classifier.text/plain
  receiver: text_converter.sources
- sender: file_classifier.application/pdf
  receiver: pdf_converter.sources
- sender: text_converter.documents
  receiver: joiner.documents
- sender: pdf_converter.documents
  receiver: joiner.documents
- sender: joiner.documents
  receiver: splitter.documents
- sender: document_embedder.documents
  receiver: writer.documents
- sender: file_classifier.video/mp4
  receiver: RemoteWhisperTranscriber.sources
- sender: RemoteWhisperTranscriber.documents
  receiver: joiner.documents
- sender: splitter.documents
  receiver: document_embedder.documents

inputs: # Define the inputs for your pipeline
  files: # This component will receive the files to index as input
  - file_classifier.sources

max_runs_per_component: 100

metadata: {}
Parameters
Inputs
| Parameter | Type | Description |
|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | A list of file paths or ByteStream objects containing the audio files to transcribe. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | A list of documents, one document for each file. The content of each document is the transcribed text. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | Secret | Secret.from_env_var('OPENAI_API_KEY') | OpenAI API key. Paste the API key on the Integrations page. |
| model | str | whisper-1 | Name of the model to use. Currently accepts only whisper-1. |
| organization | Optional[str] | None | Your OpenAI organization ID. See OpenAI's documentation on Setting Up Your Organization. |
| api_base | Any | | An optional URL to use as the API base. For details, see the OpenAI documentation. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |
| kwargs | Any | | Other optional parameters for the model. These are sent directly to the OpenAI endpoint. See OpenAI documentation for more details. Supported parameters include: language (ISO-639-1 format), prompt, response_format (only json is supported), and temperature. |
| api_base_url | Optional[str] | None | An optional URL to use as the API base URL. |
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Description |
|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | A list of file paths or ByteStream objects containing the audio files to transcribe. |