Basic Concepts

Answer

An answer object contains all the information about the prediction the Reader model made: the answer string, the model's confidence score, the context around the answer, and the ID and metadata of the document the answer comes from. It also contains the start and end offsets of the answer string, relative to both the full document text and the context window.
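To make the two kinds of offsets concrete, here is a minimal sketch in Python. The class name AnswerSketch, its field names, and the build_answer helper are hypothetical and chosen for illustration only; they are not the actual answer class used by deepset Cloud.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class AnswerSketch:
    """Hypothetical stand-in for an answer object (illustration only)."""
    answer: str
    score: float
    context: str
    document_id: str
    meta: Dict[str, Any] = field(default_factory=dict)
    # (start, end) of the answer relative to the full document text
    offset_in_document: Tuple[int, int] = (0, 0)
    # (start, end) of the answer relative to the context window
    offset_in_context: Tuple[int, int] = (0, 0)


def build_answer(document_text: str, answer_text: str, score: float,
                 document_id: str, window: int = 20) -> AnswerSketch:
    """Locate the answer in the document and cut a context window around it."""
    start = document_text.find(answer_text)
    if start < 0:
        raise ValueError("answer string not found in document text")
    end = start + len(answer_text)
    # The context is a slice of the document around the answer.
    ctx_start = max(0, start - window)
    ctx_end = min(len(document_text), end + window)
    return AnswerSketch(
        answer=answer_text,
        score=score,
        context=document_text[ctx_start:ctx_end],
        document_id=document_id,
        offset_in_document=(start, end),
        offset_in_context=(start - ctx_start, end - ctx_start),
    )


doc = "The capital of Germany is Berlin, a city on the Spree."
ans = build_answer(doc, "Berlin", score=0.93, document_id="doc-1")
```

Slicing the document text with offset_in_document, or the context with offset_in_context, returns the same answer string in both cases.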

Document

Refers to an individual piece of text stored in the document store. Multiple documents may originally come from one file.

Document Store

A component that stores the text documents, their metadata, and (optionally) embeddings.

Evaluation Dataset

Also referred to as "eval set," an annotated set of data held back from your model. An evaluation dataset is required for experiments. It contains the gold answers against which deepset Cloud evaluates the actual answers that your pipeline returns during an experiment run.

To evaluate the groundedness and no answer scores of RAG pipelines, an evaluation dataset can contain just queries, without gold answers.

For more information, see Evaluation Datasets.

Experiment Run

A single run of an experiment to evaluate your pipeline. When you create and start an experiment, it runs all the questions from the evaluation dataset through the pipeline and compares the answers the pipeline returns to the gold answers. After it finishes, it calculates the metrics you can use to tweak your pipeline. To learn more, see About Experiments.
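The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how deepset Cloud computes its metrics: the run_experiment function, the "question" and "gold_answer" keys, and the toy pipeline are all hypothetical, and exact match is used only as a simple stand-in metric.

```python
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def run_experiment(eval_set: List[Dict[str, str]],
                   pipeline: Callable[[str], str]) -> float:
    """Run every question through the pipeline and return the exact match rate."""
    if not eval_set:
        return 0.0
    matches = 0
    for row in eval_set:
        predicted = pipeline(row["question"])
        if normalize(predicted) == normalize(row["gold_answer"]):
            matches += 1
    return matches / len(eval_set)


# Hypothetical evaluation dataset with gold answers.
eval_set = [
    {"question": "What is the capital of France?", "gold_answer": "Paris"},
    {"question": "Who wrote Hamlet?", "gold_answer": "William Shakespeare"},
    {"question": "How many legs does a spider have?", "gold_answer": "eight"},
]

# Toy pipeline that returns canned answers; a real pipeline would query documents.
canned = {
    "What is the capital of France?": "paris",
    "Who wrote Hamlet?": "Christopher Marlowe",
    "How many legs does a spider have?": " Eight ",
}


def toy_pipeline(question: str) -> str:
    return canned[question]


exact_match = run_experiment(eval_set, toy_pipeline)
```

Here the toy pipeline answers two of the three questions correctly after normalization, so the exact match rate is two thirds.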

File

Refers to the raw file you upload to deepset Cloud (for example, a PDF). When an indexing pipeline runs, files get converted, cleaned, and split into documents, which contain the actual text and are then used for finding the best answer to a query.

Groundedness Observability

A dashboard where you can check the groundedness score of your RAG pipelines. This score tells you how well the LLM's answers are grounded in your documents.

Indexing

The process of preprocessing your files, turning them into documents, and storing those documents in the document store. Indexing happens after you deploy a pipeline. The indexing pipeline defines the exact indexing steps (for example, the size of the documents created from a file).
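As a rough illustration of these steps, the sketch below cleans the text of one file, splits it into fixed-size documents, and writes them to a store. Everything in it is an assumption made for the example: the function names, the words_per_document parameter, and the plain dictionary standing in for a document store. It also assumes the text has already been extracted from the file.

```python
import re
from typing import Any, Dict, List


def clean(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_documents(text: str, words_per_document: int) -> List[str]:
    """Split text into chunks of at most words_per_document words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_document])
        for i in range(0, len(words), words_per_document)
    ]


def index_file(file_name: str, raw_text: str,
               document_store: Dict[str, Dict[str, Any]],
               words_per_document: int = 100) -> int:
    """Clean, split, and store one file; return the number of documents written."""
    chunks = split_into_documents(clean(raw_text), words_per_document)
    for position, chunk in enumerate(chunks):
        document_id = f"{file_name}-{position}"
        document_store[document_id] = {
            "content": chunk,
            "meta": {"file_name": file_name},
        }
    return len(chunks)


# A plain dict stands in for the document store in this sketch.
document_store: Dict[str, Dict[str, Any]] = {}
raw_text = "word " * 250  # placeholder for text extracted from a file
count = index_file("report.pdf", raw_text, document_store, words_per_document=100)
```

With 250 words and 100 words per document, one file yields three documents, which is why several documents can trace back to the same file.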

Node

A pipeline component. Nodes are the processing steps in a pipeline. They act like building blocks that you can mix and match or replace.

Organization

An entity that manages workspaces. When you invite users to an organization, they automatically gain access to all the workspaces within that organization.

Pipeline

Pipelines define the processing steps for executing a query and indexing your files. These steps are pipeline nodes. Nodes in pipelines are connected in series so that the output of one node is used by the next node. You can mix and match the nodes in a pipeline.
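The idea of nodes connected in series can be sketched in a few lines of Python. This is a simplified illustration, not the deepset Cloud pipeline implementation: run_pipeline and the three example nodes are hypothetical names, and each node is modeled as a plain function.

```python
from typing import Any, Callable, List

# In this sketch, a node is any function that takes one input and returns one output.
Node = Callable[[Any], Any]


def run_pipeline(nodes: List[Node], data: Any) -> Any:
    """Pass the data through each node in order; each output feeds the next node."""
    for node in nodes:
        data = node(data)
    return data


def strip_whitespace(text: str) -> str:
    return text.strip()


def lowercase(text: str) -> str:
    return text.lower()


def tokenize(text: str) -> List[str]:
    return text.split()


result = run_pipeline([strip_whitespace, lowercase, tokenize], "  What Is RAG  ")
```

Because every node only depends on the output of the previous one, you can swap a node for another with the same input and output types without touching the rest of the pipeline.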

Prompt Studio

A sandbox environment for testing out prompts. It comes with a library of curated prompts you can use.

Retrieval Augmented Generation (RAG)

A pipeline that passes your documents to a large language model in a prompt to make sure it generates answers based on these documents. This approach makes the system more reliable and less prone to hallucinations.
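The step of passing documents to the model in a prompt might look roughly like the sketch below. The build_rag_prompt function and the prompt wording are hypothetical; a real RAG pipeline would first retrieve the documents and then send the resulting prompt to an LLM, neither of which is shown here.

```python
from typing import List


def build_rag_prompt(question: str, documents: List[str]) -> str:
    """Combine retrieved documents and the user question into a single prompt."""
    numbered = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer the question using only the documents below. "
        "If the documents do not contain the answer, say so.\n\n"
        f"Documents:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "Workspaces belong to an organization.",
    "Data and pipelines are not shared across workspaces.",
]
prompt = build_rag_prompt("Are pipelines shared across workspaces?", retrieved)
```

Instructing the model to rely only on the supplied documents is what ties the generated answer to your data.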

Workspace

In deepset Cloud, you work in workspaces, where you upload your data and create and maintain your pipelines. Data and pipelines are not shared across workspaces.

All workspaces belong to an organization. When you invite people to your organization, they automatically receive access to all workspaces within this organization. You cannot limit access to a workspace.