deepset Cloud and Haystack

Haystack is deepset's open-source Python framework for creating production-ready AI-based systems. It's also the underlying technology for deepset Cloud.

deepset Cloud uses Haystack's components, pipelines, and data classes under the hood. Let's explain what this means and how you can use it in your day-to-day work.

If you're not familiar with Haystack, check the following resources:

Pipelines

deepset Cloud uses Haystack pipelines, giving you the full flexibility of Haystack features, including loops and multiple branches in your workflows.

Limitations

  • File types: In deepset Cloud, the file types you can upload for your pipeline are limited. For information about currently supported file types, see Upload Files.
  • Initial and final components: The components you can use at the start and end of indexing and query pipelines are restricted:
    • Indexing pipelines must start with FilesInput, which acts as a placeholder for the files you upload to deepset Cloud. There are no restrictions on the final component, though it is typically DocumentWriter.
    • Query pipelines must start with Query and end with Output. Filters is an optional input component and the only other input component available for query pipelines.
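To make these start and end rules concrete, here is a minimal plain-Python sketch that checks them against a simplified pipeline description made up only of the names of its input and output components. The function check_pipeline_io is hypothetical: it is not part of deepset Cloud or Haystack, and real pipelines are graphs rather than simple name lists.

```python
def check_pipeline_io(kind: str, inputs: list[str], outputs: list[str]) -> list[str]:
    """Hypothetical helper: return rule violations; an empty list means the rules are met."""
    input_set, output_set = set(inputs), set(outputs)
    problems: list[str] = []
    if kind == "indexing":
        # FilesInput is the placeholder for uploaded files and must be the only entry point.
        if input_set != {"FilesInput"}:
            problems.append("indexing pipelines must start with FilesInput")
        # The final component is unrestricted (typically DocumentWriter), so outputs aren't checked.
    elif kind == "query":
        if "Query" not in input_set:
            problems.append("query pipelines must start with Query")
        # Filters is the only optional input besides Query.
        extra = sorted(input_set - {"Query", "Filters"})
        if extra:
            problems.append(f"unsupported input components: {extra}")
        if output_set != {"Output"}:
            problems.append("query pipelines must end with Output")
    else:
        raise ValueError(f"unknown pipeline kind: {kind!r}")
    return problems


print(check_pipeline_io("query", ["Query", "Filters"], ["Output"]))
print(check_pipeline_io("query", ["Query", "Text"], ["AnswerBuilder"]))
```

The first call passes all checks. The second reports an unsupported input component and a missing Output.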

For more information, see Pipelines.

Components

deepset Cloud combines Haystack components with components of its own. The names of deepset Cloud-specific components start with Deepset. We introduced these components to support functionality that deepset Cloud needs.

Some components may appear to be duplicates, for example, AnswerBuilder and DeepsetAnswerBuilder. Typically, these "duplicates" extend the original component with functionality that deepset Cloud needs. For instance, we created DeepsetAnswerBuilder to ensure document references display correctly in deepset Cloud's interface. This behavior is controlled by the reference_pattern parameter, which you can configure in DeepsetAnswerBuilder.
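The sketch below illustrates the general idea behind a reference pattern: a regular expression that finds citations such as [1] in a generated answer and maps them back to the documents the answer was based on. It assumes that citations are 1-based document indices and uses \[(\d+)\] as an example pattern. The function referenced_documents is hypothetical and is not how DeepsetAnswerBuilder is implemented.

```python
import re


def referenced_documents(
    answer: str, documents: list[str], reference_pattern: str = r"\[(\d+)\]"
) -> list[str]:
    """Hypothetical helper: return the documents cited in `answer`, in index order, without duplicates."""
    # With one capture group, re.findall returns only the captured digits, e.g. "2" from "[2]".
    cited = sorted({int(index) for index in re.findall(reference_pattern, answer)})
    # Citations are assumed to be 1-based; indices outside the document list are ignored.
    return [documents[i - 1] for i in cited if 1 <= i <= len(documents)]


answer = "Paris is the capital of France[2]. It lies on the Seine[1][2][7]."
print(referenced_documents(answer, ["doc A", "doc B", "doc C"]))
```

Here [7] points past the end of the document list, so only doc A and doc B are returned.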

The list of components is constantly growing. You can also create your own custom components. For more information, see Custom Components.

Data Classes

deepset Cloud uses the following Haystack objects, known as data classes:

  • Answer, including ExtractedAnswer and GeneratedAnswer
  • ByteStream
  • ChatMessage
  • Document
  • StreamingChunk

Each of these objects has properties you can access. For a detailed description of the data classes and their properties, see Haystack's Data Classes.

Data Classes in Jinja2 Templates

Knowing these objects and their attributes is useful when you work with components that use Jinja2 templates: PromptBuilder, OutputAdapter, and ConditionalRouter. In a template, you can access an object's attributes directly, for example:

{% for document in documents %}
Document[{{ loop.index }}]:
{{ document.content }}
{% endfor %}

This loop iterates over all documents and outputs the content of each one. It reads the content attribute of the Document data class through document.content. loop.index is Jinja2's 1-based counter for the current iteration.

You can also use the document's meta attribute to work with metadata. To access a metadata value, use document.meta.metadata_key, replacing metadata_key with the name of your metadata field. For more examples, see Use Metadata in Your Search System.

The same approach applies to other data classes and their attributes.
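To show what these attribute lookups resolve to, here is a plain-Python sketch that builds roughly the same text as the loop above (whitespace handling aside), extended with one metadata line. The Document class is a hypothetical stand-in that models only content and meta, not Haystack's full Document, and file_name is an example metadata key. In Jinja2, document.meta.file_name falls back to a dictionary lookup, which is why it corresponds to a dictionary access here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    # Hypothetical stand-in for Haystack's Document: only the two attributes
    # used in the templates above (content and meta) are modeled.
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def render_documents(documents: list[Document], meta_key: str = "file_name") -> str:
    """Build roughly the text the Jinja2 loop produces, plus one metadata line per document."""
    lines = []
    # enumerate(..., start=1) plays the role of Jinja2's 1-based loop.index.
    for index, document in enumerate(documents, start=1):
        lines.append(f"Document[{index}]:")
        # document.meta.file_name in a template ~ document.meta["file_name"] here;
        # a missing key falls back to "unknown" in this sketch.
        lines.append(f"Source: {document.meta.get(meta_key, 'unknown')}")
        lines.append(document.content)
    return "\n".join(lines)


docs = [
    Document("Berlin is the capital of Germany.", {"file_name": "germany.txt"}),
    Document("Paris is the capital of France."),
]
print(render_documents(docs))
```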

Data Classes as Output and Input Variables

Certain components also accept or output data classes. For example, AnswerBuilders output a list of GeneratedAnswer objects, and Rankers accept a list of Document objects. Keep this in mind when connecting components: an output can only be connected to an input of a compatible type. You can find these details on each component's documentation page. For details, see Pipeline Components.
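Haystack checks types when you connect components in a pipeline. The plain-Python sketch below illustrates the idea with a simple exact-match rule. The stand-in data classes, the socket table, the component names, and can_connect are all hypothetical, and Haystack's own compatibility check is more involved.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for Haystack's data classes; only their type identity matters here.
@dataclass
class Document:
    content: str


@dataclass
class GeneratedAnswer:
    data: str


# Simplified, hypothetical socket table: "Component.socket" -> declared type.
OUTPUT_TYPES = {
    "AnswerBuilder.answers": list[GeneratedAnswer],
    "Retriever.documents": list[Document],
}
INPUT_TYPES = {
    "Ranker.documents": list[Document],
}


def can_connect(sender: str, receiver: str) -> bool:
    """Exact-match rule: the sender's output type must equal the receiver's input type."""
    return OUTPUT_TYPES[sender] == INPUT_TYPES[receiver]


print(can_connect("Retriever.documents", "Ranker.documents"))
print(can_connect("AnswerBuilder.answers", "Ranker.documents"))
```

A Ranker can receive the documents a Retriever outputs, but not the GeneratedAnswer objects an AnswerBuilder outputs.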