RemoteWhisperTranscriber
Transcribe audio files using the OpenAI Whisper API. The component returns each audio file as a document containing the transcribed text.
Key Features
- Transcribes audio files using the OpenAI Whisper API.
- Returns transcribed text as documents you can write into a document store.
- Supports multiple audio formats and languages as defined by the Whisper API.
- Configurable API key, base URL, and organization ID.
- Configurable timeout and retry settings.
- Accepts custom httpx client settings for advanced network configurations.
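Because the component comes from the open-source Haystack library, you can also try it directly in Python before adding it to a pipeline. A minimal sketch, assuming the OPENAI_API_KEY environment variable is set and that meeting_recording.mp3 is a hypothetical local audio file:

```python
from haystack.components.audio import RemoteWhisperTranscriber

# The API key is read from the OPENAI_API_KEY environment variable by default
transcriber = RemoteWhisperTranscriber(model="whisper-1")

# "meeting_recording.mp3" is a hypothetical local audio file
result = transcriber.run(sources=["meeting_recording.mp3"])

# One document per input file, with the transcription as its content
for doc in result["documents"]:
    print(doc.content)
```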
Configuration
You need an OpenAI API key to use this component. Use this key to connect Haystack Platform to OpenAI:
Add Workspace-Level Integration
- Click your profile icon and choose Settings.
- Go to Workspace > Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in the current workspace.
Add Organization-Level Integration
- Click your profile icon and choose Settings.
- Go to Organization > Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.
Add the Component
- Drag the RemoteWhisperTranscriber component onto the canvas from the Component Library.
- Click the component to open the configuration panel.
- On the General tab, select the Whisper model to use.
- Go to the Advanced tab to configure the API key, base URL, organization, timeout, max retries, and HTTP client settings.
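The settings on the Advanced tab correspond to the component's init parameters. As a rough sketch of the same configuration in Python (the 60-second timeout passed through http_client_kwargs is an illustrative value, not a platform default):

```python
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.utils import Secret

transcriber = RemoteWhisperTranscriber(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),  # resolved from the environment, never hardcoded
    model="whisper-1",
    api_base_url=None,   # optional custom endpoint
    organization=None,   # optional OpenAI organization ID
    http_client_kwargs={"timeout": 60.0},  # forwarded to httpx.Client; 60 s is an illustrative value
)
```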
Connections
RemoteWhisperTranscriber accepts a list of file paths or ByteStream objects (sources) containing audio files as input. It outputs a list of documents (documents), one per audio file, each containing the transcribed text.
Typically, RemoteWhisperTranscriber receives audio files from a FileClassifier (or FileTypeRouter) that routes files by type. Its documents output connects to a DocumentJoiner to combine transcriptions with documents from other converters before splitting and indexing.
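If you want to prototype this wiring in code, the open-source Haystack library exposes the same components. A minimal sketch, with component names and the mp4-only routing chosen for illustration:

```python
from haystack import Pipeline
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.components.joiners import DocumentJoiner
from haystack.components.routers import FileTypeRouter

pipeline = Pipeline()
pipeline.add_component("file_classifier", FileTypeRouter(mime_types=["video/mp4"]))
pipeline.add_component("transcriber", RemoteWhisperTranscriber())
pipeline.add_component("joiner", DocumentJoiner())

# Files routed by MIME type go to the transcriber; its documents are merged
# with the output of any other converters connected to the joiner.
pipeline.connect("file_classifier.video/mp4", "transcriber.sources")
pipeline.connect("transcriber.documents", "joiner.documents")
```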
Usage Example
Using the Component in an Index
This is an example of an index that can process PDF, text, and MP4 files. FileClassifier routes each file by its type to the appropriate converter. The resulting documents are then joined and sent to DocumentSplitter.
```yaml
components:
  file_classifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - video/mp4
  text_converter:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
  pdf_converter:
    type: haystack.components.converters.pdfminer.PDFMinerToDocument
    init_parameters:
      line_overlap: 0.5
      char_margin: 2
      line_margin: 0.5
      word_margin: 0.1
      boxes_flow: 0.5
      detect_vertical: true
      all_texts: false
      store_full_path: false
  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      sort_by_score: false
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
      respect_sentence_boundary: true
      language: en
  document_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE
  RemoteWhisperTranscriber:
    type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: whisper-1
      api_base_url:
      organization:
      http_client_kwargs:
connections: # Defines how the components are connected
  - sender: file_classifier.text/plain
    receiver: text_converter.sources
  - sender: file_classifier.application/pdf
    receiver: pdf_converter.sources
  - sender: file_classifier.video/mp4
    receiver: RemoteWhisperTranscriber.sources
  - sender: text_converter.documents
    receiver: joiner.documents
  - sender: pdf_converter.documents
    receiver: joiner.documents
  - sender: RemoteWhisperTranscriber.documents
    receiver: joiner.documents
  - sender: joiner.documents
    receiver: splitter.documents
  - sender: splitter.documents
    receiver: document_embedder.documents
  - sender: document_embedder.documents
    receiver: writer.documents
inputs: # Define the inputs for your pipeline
  files: # This component will receive the files to index as input
    - file_classifier.sources
max_runs_per_component: 100
metadata: {}
```
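To experiment with a similar index locally, the YAML can be deserialized with Haystack's Pipeline.loads. Note that DeepsetNvidiaDocumentEmbedder and the preconfigured OpenSearch store are platform-specific, so a local run would need them replaced with components available in your environment. A minimal sketch, with hypothetical file names:

```python
from haystack import Pipeline

# "index.yml" is a hypothetical local copy of the YAML above, with the
# platform-specific embedder and document store replaced by local equivalents.
with open("index.yml") as f:
    pipeline = Pipeline.loads(f.read())

pipeline.run({"file_classifier": {"sources": ["notes.txt", "report.pdf", "talk.mp4"]}})
```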
Parameters
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or ByteStream objects containing the audio files to transcribe. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | A list of documents, one document for each file. The content of each document is the transcribed text. |
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | Secret | Secret.from_env_var('OPENAI_API_KEY') | OpenAI API key. Paste the API key on the Integrations page. |
| model | str | whisper-1 | Name of the model to use. Currently accepts only whisper-1. |
| organization | Optional[str] | None | Your OpenAI organization ID. See OpenAI's documentation on Setting Up Your Organization. |
| api_base_url | Optional[str] | None | An optional URL to use as the API base. For details, see the OpenAI documentation. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |
| kwargs | Any | | Other optional parameters for the model, sent directly to the OpenAI endpoint. See the OpenAI documentation for details. Supported parameters include: language (the language of the input audio in ISO-639-1 format; setting it improves transcription accuracy and latency), prompt (optional text to guide the model's style or continue a previous audio segment; it should match the audio language), response_format (the format of the transcript output; this component supports only json), and temperature (the sampling temperature, between 0 and 1; higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic; if set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit). |
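Because keyword arguments that are not init parameters are forwarded to the OpenAI endpoint with every request, Whisper-specific options can be set when the component is created. A minimal sketch, with language and temperature as illustrative values:

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Keyword arguments that are not init parameters are sent to the Whisper API
# with every transcription request.
transcriber = RemoteWhisperTranscriber(
    model="whisper-1",
    language="en",    # ISO-639-1 code of the input audio
    temperature=0.2,  # lower values produce more deterministic transcripts
)
```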
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | | A list of file paths or ByteStream objects containing the audio files to transcribe. |
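Since sources accepts ByteStream objects as well as paths, audio that is already in memory can be transcribed without first writing it to disk. A minimal sketch, with both file names hypothetical:

```python
from pathlib import Path

from haystack.components.audio import RemoteWhisperTranscriber
from haystack.dataclasses import ByteStream

transcriber = RemoteWhisperTranscriber()

# sources can mix plain strings, Path objects, and in-memory ByteStreams
stream = ByteStream.from_file_path(Path("interview.wav"))
result = transcriber.run(sources=["podcast.mp3", stream])
print(result["documents"][0].content)
```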