RemoteWhisperTranscriber
Transcribe audio files using OpenAI's Whisper API.
Basic Information
- Type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
- Components it can connect with:
  - FileClassifier: RemoteWhisperTranscriber can receive the sources to transcribe from FileClassifier.
  - Preprocessors: RemoteWhisperTranscriber can send the transcribed documents to a preprocessor, like DocumentSplitter, that prepares the documents for search.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | Required | A list of file paths or ByteStream objects containing the audio files to transcribe. |
Outputs
| Parameter | Type | Description |
|---|---|---|
| documents | List[Document] | A list of documents, one for each input file. The content of each document is the transcribed text. |
Overview
RemoteWhisperTranscriber uses OpenAI's Whisper API to transcribe the audio files it receives and returns the transcriptions as documents that you can write to a document store and use for search. It's typically used in indexes. For supported audio formats, languages, and other parameters, see the Whisper API documentation.
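To see what the component produces, here is a minimal sketch that runs the underlying Haystack component on a single file. It assumes the OPENAI_API_KEY environment variable is set; the file name is a placeholder.

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Assumes OPENAI_API_KEY is set in the environment; "interview.mp3" is a placeholder file.
transcriber = RemoteWhisperTranscriber(model="whisper-1")
result = transcriber.run(sources=["interview.mp3"])

# One Document per input file; its content is the transcribed text.
print(result["documents"][0].content)
```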
Authorization
You need an OpenAI API key to use this component. Use this key to connect deepset to OpenAI:
Add Workspace-Level Integration
- Click your profile icon and choose Settings.
- Go to Workspace > Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in the current workspace.
Add Organization-Level Integration
- Click your profile icon and choose Settings.
- Go to Organization > Integrations.
- Find the provider you want to connect and click Connect next to it.
- Enter the API key and any other required details.
- Click Connect. You can use this integration in pipelines and indexes in all workspaces in the current organization.
Usage Example
Initializing the Component
components:
  RemoteWhisperTranscriber:
    type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
    init_parameters: {}
Using the Component in an Index
This is an example of an index that can process PDF, text, and MP4 audio files. FileClassifier routes files by their MIME type: text and PDF files go to the matching converters, and MP4 files go to RemoteWhisperTranscriber. The resulting documents are then joined and sent to DocumentSplitter. A minimal Python sketch of the audio branch follows the YAML.
# If you need help with the YAML format, have a look at https://docs.cloud.deepset.ai/v2.0/docs/create-a-pipeline#create-a-pipeline-using-pipeline-editor.
# This section defines components that you want to use in your pipelines. Each component must have a name and a type. You can also set the component's parameters here.
# The name is up to you; you can give your component a friendly name. You then use components' names when specifying the connections in the pipeline.
# Type is the class path of the component. You can check the type on the component's documentation page.
components:
  file_classifier:
    type: haystack.components.routers.file_type_router.FileTypeRouter
    init_parameters:
      mime_types:
        - text/plain
        - application/pdf
        - video/mp4
  text_converter:
    type: haystack.components.converters.txt.TextFileToDocument
    init_parameters:
      encoding: utf-8
  pdf_converter:
    type: haystack.components.converters.pdfminer.PDFMinerToDocument
    init_parameters:
      line_overlap: 0.5
      char_margin: 2
      line_margin: 0.5
      word_margin: 0.1
      boxes_flow: 0.5
      detect_vertical: true
      all_texts: false
      store_full_path: false
  joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
      sort_by_score: false
  splitter:
    type: haystack.components.preprocessors.document_splitter.DocumentSplitter
    init_parameters:
      split_by: word
      split_length: 250
      split_overlap: 30
      respect_sentence_boundary: true
      language: en
  document_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.document_embedder.DeepsetNvidiaDocumentEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  writer:
    type: haystack.components.writers.document_writer.DocumentWriter
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: ''
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      policy: OVERWRITE
  RemoteWhisperTranscriber:
    type: haystack.components.audio.whisper_remote.RemoteWhisperTranscriber
    init_parameters:
      api_key:
        type: env_var
        env_vars:
          - OPENAI_API_KEY
        strict: false
      model: whisper-1
      api_base_url:
      organization:
      http_client_kwargs:
connections: # Defines how the components are connected
  - sender: file_classifier.text/plain
    receiver: text_converter.sources
  - sender: file_classifier.application/pdf
    receiver: pdf_converter.sources
  - sender: text_converter.documents
    receiver: joiner.documents
  - sender: pdf_converter.documents
    receiver: joiner.documents
  - sender: joiner.documents
    receiver: splitter.documents
  - sender: document_embedder.documents
    receiver: writer.documents
  - sender: file_classifier.video/mp4
    receiver: RemoteWhisperTranscriber.sources
  - sender: RemoteWhisperTranscriber.documents
    receiver: joiner.documents
  - sender: splitter.documents
    receiver: document_embedder.documents
inputs: # Define the inputs for your pipeline
  files: # This component will receive the files to index as input
    - file_classifier.sources
max_runs_per_component: 100
metadata: {}
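For local experimentation, the audio branch of this index can be reproduced with plain Haystack. The sketch below is illustrative: the component names and the input file are placeholders, it assumes OPENAI_API_KEY is set, and it omits the converters, embedder, and document store from the full index.

```python
from haystack import Pipeline
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.routers import FileTypeRouter

pipeline = Pipeline()
pipeline.add_component("file_classifier", FileTypeRouter(mime_types=["video/mp4"]))
pipeline.add_component("transcriber", RemoteWhisperTranscriber(model="whisper-1"))
pipeline.add_component("splitter", DocumentSplitter(split_by="word", split_length=250, split_overlap=30))

# MP4 files go to the transcriber; the transcribed documents go to the splitter.
pipeline.connect("file_classifier.video/mp4", "transcriber.sources")
pipeline.connect("transcriber.documents", "splitter.documents")

result = pipeline.run({"file_classifier": {"sources": ["meeting_recording.mp4"]}})
print(result["splitter"]["documents"][0].content)
```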
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | Secret | Secret.from_env_var('OPENAI_API_KEY') | OpenAI API key. Paste the API key on the Integrations page. |
| model | str | whisper-1 | Name of the model to use. Currently accepts only whisper-1. |
| organization | Optional[str] | None | Your OpenAI organization ID. See OpenAI's documentation on Setting Up Your Organization. |
| api_base_url | Optional[str] | None | An optional URL to use as the API base. For details, see the OpenAI documentation. |
| http_client_kwargs | Optional[Dict[str, Any]] | None | A dictionary of keyword arguments to configure a custom httpx.Client or httpx.AsyncClient. For more information, see the HTTPX documentation. |
| kwargs | Any | | Other optional parameters for the model, sent directly to the OpenAI endpoint. See the OpenAI documentation for details. Some of the supported parameters are: - language: The language of the input audio in ISO-639-1 format. Providing it improves transcription accuracy and latency. - prompt: Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. - response_format: The format of the transcript output. This component only supports json. - temperature: The sampling temperature, between 0 and 1. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit. See the sketch after this table for how to pass these parameters. |
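As a sketch of how the extra Whisper parameters can be set, the underlying Haystack component accepts them as keyword arguments at initialization and forwards them to the OpenAI endpoint with every request. The values below are illustrative.

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Illustrative values; extra keyword arguments are forwarded to the Whisper API.
transcriber = RemoteWhisperTranscriber(
    model="whisper-1",
    language="en",  # ISO-639-1 code of the input audio
    prompt="Vocabulary: deepset, Haystack, OpenSearch.",  # style/terminology hint
    temperature=0.2,  # lower values give more deterministic transcripts
)
```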
Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | List[Union[str, Path, ByteStream]] | Required | A list of file paths or ByteStream objects containing the audio files to transcribe. |
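As an illustration of the ByteStream option, here is a sketch that calls the underlying component's run() method with audio that is already in memory. The file name is a placeholder and OPENAI_API_KEY is assumed to be set.

```python
from pathlib import Path

from haystack.components.audio import RemoteWhisperTranscriber
from haystack.dataclasses import ByteStream

# "podcast_episode.mp3" is a placeholder; any audio format supported by Whisper works.
audio = ByteStream.from_file_path(Path("podcast_episode.mp3"), mime_type="audio/mpeg")
result = RemoteWhisperTranscriber().run(sources=[audio])
print(result["documents"][0].content)
```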