deepset Cloud and Haystack
Haystack is deepset's open source Python framework for creating production-ready AI-based systems. It's also the underlying technology for deepset Cloud.
deepset Cloud uses Haystack's components, pipelines, and data classes under the hood. Let's explain what this means and how you can use it in your day-to-day work.
If you're not familiar with Haystack, check the following resources:
Pipelines
deepset Cloud uses Haystack pipelines, giving you the full flexibility of Haystack features, including loops and multiple branches in your workflows.
Limitations
- File types: In deepset Cloud, the file types you can upload for your pipeline are limited. For information about currently supported file types, see Upload Files.
- Initial and final components: The components you can use at the start and end of indexing and query pipelines are restricted:
  - Indexing pipelines must start with FilesInput, which acts as a placeholder for the files you upload to deepset Cloud. There are no restrictions on the final component, though it is typically DocumentWriter.
  - Query pipelines must start with Query and end with Output. Filters is an optional input component and the only other available one for query pipelines.
For more information, see Pipelines.
Components
deepset Cloud combines Haystack components with its own unique components. Components specific to deepset Cloud have names that start with Deepset. We introduced these custom components to support functionality tailored specifically for deepset Cloud.
Some components may appear to be duplicates, for example AnswerBuilder and DeepsetAnswerBuilder. Typically, these "duplicates" extend the original component by adding functionality needed for deepset Cloud. For instance, we created DeepsetAnswerBuilder to ensure document references display correctly in deepset Cloud's interface. This is enabled by an additional parameter, reference_pattern, which you can configure in DeepsetAnswerBuilder.
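To give a rough idea of what resolving document references involves, here is a minimal sketch in plain Python. It is not DeepsetAnswerBuilder's implementation: the regular expression, the extract_references helper, and the sample data are all assumptions made up for illustration. Check the component's documentation for the values reference_pattern actually accepts.

```python
import re
from typing import List

# Hypothetical pattern: matches numeric references such as [1] or [12].
REFERENCE_PATTERN = r"\[(\d+)\]"


def extract_references(answer: str, documents: List[str]) -> List[str]:
    """Return the documents cited in the answer, in order of first mention."""
    cited: List[str] = []
    for match in re.finditer(REFERENCE_PATTERN, answer):
        index = int(match.group(1)) - 1  # references in the text are 1-based
        # Skip references that point outside the document list or repeat.
        if 0 <= index < len(documents) and documents[index] not in cited:
            cited.append(documents[index])
    return cited


docs = ["doc a", "doc b", "doc c"]
cited_docs = extract_references("Paris is the capital [2]. It is large [1][2].", docs)
print(cited_docs)
```

A reference that points past the end of the document list is ignored here instead of raising an error, which is one possible design choice among several.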
The list of components keeps growing, giving you more options. You can also create your own custom components. For more information, see Custom Components.
Data Classes
deepset Cloud uses Haystack's objects, or data classes, namely:
- Answer, including ExtractedAnswer and GeneratedAnswer
- ByteStream
- ChatMessage
- Document
- StreamingChunk
Each of these objects has properties you can access. For a detailed description of the data classes and their properties, see Haystack's Data Classes.
Data Classes in Jinja2 Templates
It's useful to know the objects you can use and their attributes when working with components that use Jinja2 templates: PromptBuilder, OutputAdapter, and ConditionalRouter. You can access an object's attribute in the template, for example:
{% for document in documents %}
Document[{{ loop.index }}]:
{{ document.content }}
{% endfor %}
This template loops over all documents and prints the content of each one. It does so by accessing the content attribute of the Document data class using document.content.
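To see what such a template produces, you can render it locally with the jinja2 package. The sketch below is only an illustration: the Document class is a simplified stand-in for Haystack's data class (an assumption, modeling only the content and meta attributes), and the sample texts are made up.

```python
from dataclasses import dataclass, field

from jinja2 import Template


# Stand-in for Haystack's Document data class (assumption: only the
# attributes used on this page are modeled).
@dataclass
class Document:
    content: str
    meta: dict = field(default_factory=dict)


TEMPLATE = """{% for document in documents %}
Document[{{ loop.index }}]:
{{ document.content }}
{% endfor %}"""

documents = [
    Document(content="Haystack is an open source framework."),
    Document(content="deepset Cloud is built on Haystack."),
]

# loop.index is 1-based, so the first document is labeled Document[1].
rendered = Template(TEMPLATE).render(documents=documents)
print(rendered)
```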
You can also use the document's meta attribute to work with metadata. The syntax for this is: document.meta.metadata_key. For more examples, see Use Metadata in Your Search System.
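The dot syntax works on metadata because Jinja2 falls back to a dictionary key lookup when an attribute with that name does not exist. Here is a small sketch using the jinja2 package. The Document class is again a simplified stand-in for Haystack's data class, and file_name is a hypothetical metadata key chosen for the example.

```python
from dataclasses import dataclass, field

from jinja2 import Template


# Stand-in for Haystack's Document data class (assumption).
@dataclass
class Document:
    content: str
    meta: dict = field(default_factory=dict)


# file_name is a hypothetical metadata key used only in this example.
TEMPLATE = (
    "{% for document in documents %}"
    "{{ document.meta.file_name }}: {{ document.content }}\n"
    "{% endfor %}"
)

documents = [
    Document(content="Revenue grew.", meta={"file_name": "report.pdf"}),
    Document(content="Returns are accepted.", meta={"file_name": "faq.txt"}),
]

rendered = Template(TEMPLATE).render(documents=documents)
print(rendered)
```

Keys that share a name with a dictionary method, such as items or keys, resolve to the method when you use dot syntax. For those, use bracket syntax in the template instead, for example document.meta["items"].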
The same approach applies to other data classes and their attributes.
Data Classes as Output and Input Variables
Additionally, certain components accept or output data classes. For example, AnswerBuilders output a list of GeneratedAnswer objects, and Rankers accept a list of Document objects. Keep this in mind when connecting components to make sure the connections are compatible. You can check the input and output types on each component's documentation page. For details, see Pipeline Components.
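The idea of compatible connections can be sketched with plain Python type hints. Everything below is an illustrative assumption and not how deepset Cloud or Haystack validate connections: the two data classes are simplified stand-ins, and rank, build_answers, and can_connect are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, get_type_hints


# Simplified stand-ins for Haystack's data classes (assumption).
@dataclass
class Document:
    content: str
    meta: dict = field(default_factory=dict)


@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document] = field(default_factory=list)


def rank(documents: List[Document]) -> List[Document]:
    """Hypothetical ranker: puts the longest documents first."""
    return sorted(documents, key=lambda d: len(d.content), reverse=True)


def build_answers(
    query: str, replies: List[str], documents: List[Document]
) -> List[GeneratedAnswer]:
    """Hypothetical answer builder: wraps each reply in a GeneratedAnswer."""
    return [GeneratedAnswer(data=r, query=query, documents=documents) for r in replies]


def can_connect(sender: Callable, receiver: Callable, receiver_input: str) -> bool:
    """True if the sender's return type equals the receiver's input type."""
    return get_type_hints(sender)["return"] == get_type_hints(receiver)[receiver_input]


# A ranker's output (list of Document) fits the builder's documents input.
ranker_to_builder = can_connect(rank, build_answers, "documents")
# A builder's output (list of GeneratedAnswer) does not fit a ranker's input.
builder_to_ranker = can_connect(build_answers, rank, "documents")
print(ranker_to_builder, builder_to_ranker)
```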