RegexTextExtractor
Extract text from chat messages or strings using a regex pattern.
Basic Information
- Type: haystack.components.extractors.regex_text_extractor.RegexTextExtractor
- Components it can connect with:
  - Any component that produces a string or a list of ChatMessage objects for text_or_messages. It's usually used in query pipelines to extract text from the query it receives from the Input component. You can also use it to extract text from a ChatGenerator's output.
  - Any component that consumes a text string. You can use it in query pipelines to send the extracted text to PromptBuilder, AnswerBuilder, Retrievers, and similar.
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| text_or_messages | Union[str, List[ChatMessage]] | | Either a string or a list of ChatMessage objects to search through. |
Outputs
| Parameter | Type | Default | Description |
|---|---|---|---|
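| captured_text | str | | The matched text if a match is found. If no match is found, the output is an empty string when return_empty_on_no_match=False and an empty dictionary (no captured_text key) when return_empty_on_no_match=True. |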
Overview
The RegexTextExtractor parses input text or ChatMessages using a regular expression pattern you provide. You can configure it to search through all messages or only the last message in a list of ChatMessages.
The pattern should include a capture group to extract the desired text. If the pattern has no capture groups, the component returns the entire match.
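The capture-group rule above can be illustrated with Python's standard `re` module. This is a minimal sketch, not the component's implementation: the helper name `extract_first` is hypothetical, and it assumes the first match in the text is the one of interest.

```python
import re
from typing import Optional


def extract_first(pattern: str, text: str) -> Optional[str]:
    """Hypothetical helper: return group 1 if the pattern defines a
    capture group, otherwise return the entire match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    # match.groups() is an empty tuple when the pattern has no capture groups
    return match.group(1) if match.groups() else match.group(0)


reply = '<issue url="https://github.com/example/repo/issues/123">Description of the issue</issue>'

# With a capture group, only the URL is returned.
with_group = extract_first(r'<issue url="([^"]+)">', reply)

# Without a capture group, the whole matched tag is returned.
without_group = extract_first(r'<issue url="[^"]+">', reply)
```

Here `with_group` holds only the URL, while `without_group` holds the full opening tag, which is usually not what you want to pass downstream.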
Usage Example
This query pipeline uses a ChatGenerator to analyze text and produce structured output, then uses RegexTextExtractor to extract a specific issue URL from the response:
components:
  ChatPromptBuilder:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
    init_parameters:
      template:
        - _content:
            - text: |
                You are a helpful assistant that identifies issues in text.
                When you find an issue, format your response as:
                <issue url="https://github.com/example/repo/issues/123">Description of the issue</issue>
                Analyze the following text and identify any issues:
                {{ query }}
          _role: user
      required_variables:
      variables:
  OpenAIChatGenerator:
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
    init_parameters:
      model: gpt-4o-mini
      generation_kwargs:
        temperature: 0.3
  RegexTextExtractor:
    type: haystack.components.extractors.regex_text_extractor.RegexTextExtractor
    init_parameters:
      regex_pattern: '<issue url="([^"]+)">'
      return_empty_on_no_match: true
  AnswerBuilder:
    type: haystack.components.builders.answer_builder.AnswerBuilder
    init_parameters:
      pattern:
      reference_pattern:
connections:
  - sender: ChatPromptBuilder.prompt
    receiver: OpenAIChatGenerator.messages
  - sender: OpenAIChatGenerator.replies
    receiver: RegexTextExtractor.text_or_messages
  - sender: RegexTextExtractor.captured_text
    receiver: AnswerBuilder.replies
inputs:
  query:
    - ChatPromptBuilder.query
    - AnswerBuilder.query
outputs:
  answers: AnswerBuilder.answers
In this example:
- ChatPromptBuilder creates a prompt asking the LLM to identify issues and format them with a specific XML-like tag.
- OpenAIChatGenerator generates a response containing the structured output.
- RegexTextExtractor uses the pattern <issue url="([^"]+)"> to extract the URL from the response. The capture group ([^"]+) matches any characters except quotes inside the url attribute.
- AnswerBuilder formats the extracted URL as the final answer.
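To see why the example pattern uses `[^"]+` rather than a greedy `.+`, the sketch below compares the two with Python's standard `re` module. It assumes the component follows Python `re` matching semantics, and the two-issue reply string is made up for illustration.

```python
import re

# Hypothetical LLM reply containing two issue tags on one line.
reply = (
    '<issue url="https://github.com/example/repo/issues/1">first</issue> '
    '<issue url="https://github.com/example/repo/issues/2">second</issue>'
)

# Greedy `.+` runs to the last `">` on the line, so the capture spills
# across both tags.
greedy = re.search(r'<issue url="(.+)">', reply).group(1)

# `[^"]+` stops at the first closing quote, so only the first URL is captured.
bounded = re.search(r'<issue url="([^"]+)">', reply).group(1)
```

With a single tag in the reply both patterns capture the same URL; the difference only shows up when the model emits more than one tag or additional quoted text on the same line.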
Parameters
Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| regex_pattern | str | | The regular expression pattern used to extract text. The pattern should include a capture group to extract the desired text. Example: '<issue url="(.+)">' captures the URL from the tag. |
| return_empty_on_no_match | bool | True | If True, returns an empty dictionary when no match is found. If False, returns {"captured_text": ""}. |
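The two no-match behaviors described for return_empty_on_no_match can be sketched as follows. This is an illustration of the documented output shapes using the standard `re` module, not the component's source code; the function name `run_sketch` is hypothetical.

```python
import re
from typing import Dict


def run_sketch(
    pattern: str, text: str, return_empty_on_no_match: bool = True
) -> Dict[str, str]:
    """Hypothetical stand-in that mimics the documented output shapes."""
    match = re.search(pattern, text)
    if match is None:
        # True  -> empty dictionary, so there is no captured_text key at all
        # False -> captured_text key is present and holds an empty string
        return {} if return_empty_on_no_match else {"captured_text": ""}
    captured = match.group(1) if match.groups() else match.group(0)
    return {"captured_text": captured}


pattern = r'<issue url="([^"]+)">'
no_match_default = run_sketch(pattern, "nothing to extract")
no_match_string = run_sketch(pattern, "nothing to extract", return_empty_on_no_match=False)
found = run_sketch(pattern, '<issue url="https://github.com/example/repo/issues/123">')
```

Choosing False can be convenient when a downstream component always expects a captured_text value, since the key is present even when nothing matched.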
Run Method Parameters
These are the parameters you can configure for the component's run() method.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text_or_messages | Union[str, List[ChatMessage]] | | Either a string or a list of ChatMessage objects to search through. |