Custom Components

You can create your own components and add them to your deepset Cloud pipelines.

Why Custom Components?

While we offer a range of pre-built components for common tasks, you might need something more tailored to your use case. This is where custom components come in. With custom components, you can:

  • Adjust the functionality to your specific needs and use cases.
  • Integrate proprietary algorithms or business logic.
  • Extend the pipeline's capabilities.

deepset Cloud uses deepset's open source Haystack framework as the underlying technology. Custom components are based on Haystack's components. To learn more about Haystack, visit the Haystack website. To learn about custom components in Haystack, see Creating Custom Components.

Adding Custom Components

Custom Components Library

We provide a template for creating your custom components, available as a GitHub repository. This template serves as a custom components library for your organization. Components created in the ./dc-custom-component-template/src/dc_custom_component/example_components/ folder and imported into deepset Cloud are the components you can use in your pipelines.

For example, if someone in your organization creates a component called WelcomeTextGenerator and uploads it to deepset Cloud, everyone in the organization can use it. However, if later someone adds two new components, GoodbyeTextGenerator and CharacterSplitter, and deletes WelcomeTextGenerator, only the new components will be available to use in your pipelines. WelcomeTextGenerator will no longer be accessible.

Only the components present in the most recently uploaded template are available for use.

Workflow

Here's the whole workflow, step-by-step:

  1. Fork the GitHub template we provide.
  2. Use the template to create your custom components.
  3. When your components are ready, zip the template package and upload it to deepset Cloud using the API or the commands.
    The package is validated on upload to check if its structure complies with the template.
  4. Check the upload status using the Get Custom Component [private] endpoint. If the component is uploaded successfully, you can use it in your pipelines.
  5. Use the component in your pipelines just like any other component. Add its name and type to the pipeline YAML.

Components support dependencies and have access to the internet.

Currently, you can't delete custom components. To update custom components, upload a new version.

Template Structure

We offer a template for creating custom components that you can access at GitHub. There are two files in the repo that you'll need to modify:

  • ./dc-custom-component-template/src/dc_custom_component/example_components/<custom_component_folder>/<custom_component>.py: This is where the component code resides. The base path ./dc-custom-component-template/src/dc_custom_component/example_components/ must remain unchanged, but you're free to create subfolders within it to organize your custom components. You can create a separate folder with its own .py file for each component, or keep all components in a single folder and a single .py file. Whatever works best for you.
  • ./dc-custom-component-template/src/dc_custom_component/__about__.py: This file holds the package version. Make sure to update it every time you upload a new version. The version applies to all components in the package.
  • ./dc-custom-component-template/pyproject.toml: If your component has any dependencies, you can add them in this file in the dependencies section.
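
For illustration, the dependencies section in pyproject.toml might look like this (the package names below are examples, not requirements of the template):

```toml
[project]
# List your components' dependencies here; they're installed together with your package.
dependencies = [
  "colorama>=0.4",
  "pydantic>=2.0",
]
```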

Versioning

Here's how pipelines use your custom component versions:

  • New pipelines can only use the latest version of your components.
  • Running pipelines use the component version that was the latest at the time they were deployed. To use a new version of a custom component, undeploy the pipelines that use it and deploy them again.

To compare different versions of a component, upload a package with both versions included in the .py file, similar to how you would upload two components at once. Ensure each version has a unique name. After that, create multiple pipelines, each using a different version of the component, and compare their performance.

Using Components In a Pipeline

Once your component is in deepset Cloud, you can add it to your pipelines using the Code editor. Create or edit your pipeline as you normally would and add your custom component in the components section. The component type is its path in the dc-custom-component-template repository, starting from the dc_custom_component directory, with periods instead of slashes or backslashes, without the .py extension, and followed by the class name.

For example, if your component is called KeywordBooster and you saved it in ./dc-custom-component-template/src/dc_custom_component/example_components/rankers/keyword_booster.py, its type is going to be: dc_custom_component.example_components.rankers.keyword_booster.KeywordBooster.
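
To see how a path maps to a type, here's a small hypothetical helper (illustration only, not part of the template) that derives the type string from a component's file path and class name:

```python
from pathlib import PurePosixPath

def component_type(py_path: str, class_name: str) -> str:
    """Build the pipeline YAML `type` for a custom component.

    The type is the file path starting from the `dc_custom_component`
    directory, with periods instead of slashes, the `.py` extension
    dropped, and the class name appended.
    """
    parts = PurePosixPath(py_path).with_suffix("").parts
    start = parts.index("dc_custom_component")
    return ".".join(parts[start:]) + "." + class_name

print(component_type(
    "dc-custom-component-template/src/dc_custom_component/example_components/rankers/keyword_booster.py",
    "KeywordBooster",
))
# dc_custom_component.example_components.rankers.keyword_booster.KeywordBooster
```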

Component's Structure

Each component is a Python class with the following required elements:

  • The from haystack import component import statement.
  • The @component decorator to indicate you're creating a component.
  • The run() method that defines what the component does.
    • The run() method must have the @component.output_types decorator that defines the types of data the component outputs and the names of its output edges. The names and types you define as output types must match the keys and value types of the dictionary the run() method returns. For details, see Examples below.
    • You can specify the input parameters for the run() method.
    • The run() method must return a dictionary with the keys and values you specify.

Optionally, you can:

  • Add the initialization parameters your component requires in the __init__() method. For example, this custom component is initialized with the pydantic_model parameter:

    import json
    import pydantic
    from pydantic import ValidationError
    from typing import Optional, List
    from colorama import Fore
    from haystack import component


    @component
    class OutputValidator:
        """
        Validates if a JSON object complies with the provided Pydantic model. If it doesn't, this component
        returns an error message along with the incorrect object.
        """

        def __init__(self, pydantic_model: pydantic.BaseModel):
            """
            Initialize the OutputValidator component.

            :param pydantic_model: The Pydantic model the JSON object should comply with.
            """
            self.pydantic_model = pydantic_model
            self.iteration_counter = 0

        # Define the component output
        @component.output_types(valid_replies=List[str], invalid_replies=Optional[List[str]], error_message=Optional[str])
        def run(self, replies: List[str]):
            """
            Validate a JSON object.

            :param replies: The LLM output that should be validated.
            """
            self.iteration_counter += 1

            # If the LLM's reply is a valid object, return "valid_replies"
            try:
                output_dict = json.loads(replies[0])
                self.pydantic_model.parse_obj(output_dict)
                print(
                    Fore.GREEN
                    + f"OutputValidator at Iteration {self.iteration_counter}: Valid JSON from LLM - No need for looping: {replies[0]}"
                )
                return {"valid_replies": replies}

            # If the reply is corrupted or invalid, return "invalid_replies" and the "error_message" for the LLM to try again
            except (ValueError, ValidationError) as e:
                print(
                    Fore.RED
                    + f"OutputValidator at Iteration {self.iteration_counter}: Invalid JSON from LLM - Let's try again.\n"
                    f"Output from LLM:\n {replies[0]} \n"
                    f"Error from OutputValidator: {e}"
                )
                return {"invalid_replies": replies, "error_message": str(e)}

    
  • Add docstrings. This is not required, but we recommend adding docstrings to explain the purpose and functionality of your component. These docstrings appear as component tooltips in deepset Studio and also serve as component documentation. To add a docstring, place it under the component name within triple quotes ("""). You can use Markdown formatting:

    @component
    class ComponentName:
        # Add docstrings here like this:
        """
        Description of the component. You can use Markdown formatting.
        """

        # If your component takes parameters, add their explanations like this:
        def method_name(self, param1_name: Optional[type] = default_value, param2_name: type = default_value2):
            """
            Description of what the method does

            :param param1_name:
              Description of the parameter. You can use Markdown formatting.

            :param param2_name:
              Description of the parameter. You can use Markdown formatting.
            """
    
  • Add other methods your component needs.

Examples

Python Components

This is an example of a very basic component called WelcomeTextGenerator with just the run() method. The component takes name as an input parameter and returns a welcome text with the name, converted to upper case, and a note.

from typing import Dict

from haystack import component


@component # decorator
class WelcomeTextGenerator: # component name
    """
    A component generating personal welcome message and making it upper case.

    Example from [Haystack documentation](https://docs.haystack.deepset.ai/docs/custom-components#extended-example).
    """

    @component.output_types(welcome_text=str, note=str) # types of data the component outputs
    def run(self, name: str) -> Dict[str, str]: # the return type matches the output_types (two strings)
        """
        Generate a welcome message and make it upper case.

        :param name: The name of the user to include in the message.
        """
        return {
            "welcome_text": (
                "Hello {name}, welcome to Haystack!".format(name=name)
            ).upper(),
            "note": "welcome message is ready",
        }

# For `name: Jane` the component returns:
# {'welcome_text': 'HELLO JANE, WELCOME TO HAYSTACK!', 'note': 'welcome message is ready'}

Here's an example of a component with initialization parameters. The component is called OutputValidator and is designed to validate if the JSON object an LLM generated complies with the provided Pydantic model. It's initialized with a pydantic_model. At runtime, it expects replies, which is the LLM's output to verify. It then returns the valid objects and invalid objects. If there are invalid objects, it also returns an error message:

import json
import pydantic
from pydantic import ValidationError
from typing import Optional, List
from colorama import Fore
from haystack import component


@component
class OutputValidator:
    """
    Validates if a JSON object complies with the provided Pydantic model. If it doesn't, this component
    returns an error message along with the incorrect object.
    """

    def __init__(self, pydantic_model: pydantic.BaseModel):
        """
        Initialize the OutputValidator component.

        :param pydantic_model: The Pydantic model the JSON object should comply with.
        """
        self.pydantic_model = pydantic_model
        self.iteration_counter = 0

    # Define the component output
    @component.output_types(valid_replies=List[str], invalid_replies=Optional[List[str]], error_message=Optional[str])
    def run(self, replies: List[str]):
        """
        Validate a JSON object.

        :param replies: The LLM output that should be validated.
        """
        self.iteration_counter += 1

        # If the LLM's reply is a valid object, return "valid_replies"
        try:
            output_dict = json.loads(replies[0])
            self.pydantic_model.parse_obj(output_dict)
            print(
                Fore.GREEN
                + f"OutputValidator at Iteration {self.iteration_counter}: Valid JSON from LLM - No need for looping: {replies[0]}"
            )
            return {"valid_replies": replies}

        # If the reply is corrupted or invalid, return "invalid_replies" and the "error_message" for the LLM to try again
        except (ValueError, ValidationError) as e:
            print(
                Fore.RED
                + f"OutputValidator at Iteration {self.iteration_counter}: Invalid JSON from LLM - Let's try again.\n"
                f"Output from LLM:\n {replies[0]} \n"
                f"Error from OutputValidator: {e}"
            )
            return {"invalid_replies": replies, "error_message": str(e)}

For more information and examples, see the Haystack documentation.

Components In a Pipeline

This is an example of an indexing pipeline that uses a custom component called CharacterSplitter. This component splits text into smaller chunks by the number of characters you specify. It takes the split_length parameter, accepts a list of documents as input, and returns a list of split documents.

components:
  ...
  custom_component: # This is a custom name for your component; it's up to you.
    init_parameters: # Here you can set init parameters for your component, if you added any. Otherwise, delete init_parameters.
      split_length: 50
    type: dc_custom_component.components.splitters.character_splitter.CharacterSplitter
    # The type is the path to your custom component. It reflects the template structure starting
    # from the "dc_custom_component" directory, separated with periods. If you changed the path,
    # the type must reflect this. This component's path was
    # "./dc-custom-component-template/src/dc_custom_component/components/splitters/character_splitter.py".

  pptx_converter:
    type: haystack.components.converters.pptx.PPTXToDocument
    init_parameters: {}

  document_embedder:
    type: haystack.components.embedders.sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder
    init_parameters:
      model: "intfloat/e5-base-v2"
      ...

connections:
- receiver: custom_component.documents # Define how to connect your component to other components; make sure the input and output types match.
  sender: pptx_converter.documents
- receiver: document_embedder.documents
  sender: custom_component.documents

inputs: # Define the inputs for your pipeline
  files: # These components will receive files as input
  - ...

max_loops_allowed: 100
metadata: {}
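
For reference, the core splitting logic of a component like CharacterSplitter could look like this. This is a simplified sketch; the actual component works on Haystack Document objects, and its implementation may differ:

```python
from typing import List

def split_by_characters(text: str, split_length: int) -> List[str]:
    """Split text into chunks of at most `split_length` characters,
    illustrating what a CharacterSplitter-style component might do internally."""
    return [text[i:i + split_length] for i in range(0, len(text), split_length)]

print(split_by_characters("abcdefghij", 4))
# ['abcd', 'efgh', 'ij']
```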