The groundedness score is a retrieval-augmented generation (RAG) pipeline metric that measures how well the generated answers are grounded in your documents. For RAG pipelines, it's essential that the answers the LLM generates are grounded in your data. This ensures the generated content is based on information you can rely on and verify. It's especially important in apps where accuracy is critical. Users are also more likely to trust a system that consistently provides grounded and accurate information.
You can monitor your RAG pipelines' groundedness score using the Groundedness Observability Dashboard. The score ranges from 0 (poor groundedness) to 1 (very good; all answers are grounded in the data). It's calculated using a cross-encoder model.
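As a rough illustration of how such a score aggregates, here is a minimal sketch. It is not deepset Cloud's actual implementation: the dashboard uses a cross-encoder model to score answers against documents, while this sketch substitutes a simple word-overlap function as a stand-in so the aggregation logic stays visible.

```python
from statistics import mean

def statement_grounding(statement: str, documents: list[str]) -> float:
    """Stand-in for the cross-encoder: score a statement against each
    document by word overlap and keep the best match. The real dashboard
    uses a trained cross-encoder model instead of this heuristic."""
    words = set(statement.lower().replace(".", "").split())
    if not words:
        return 0.0
    best = 0.0
    for doc in documents:
        doc_words = set(doc.lower().replace(".", "").split())
        best = max(best, len(words & doc_words) / len(words))
    return best

def groundedness_score(answers: list[str], documents: list[str]) -> float:
    """Average the per-answer grounding scores into a single 0-to-1 score,
    where 1 means every answer is fully supported by the documents."""
    if not answers:
        return 0.0
    return mean(statement_grounding(a, documents) for a in answers)

docs = ["The warranty covers repairs for two years after purchase"]
print(groundedness_score(["The warranty covers repairs for two years"], docs))
print(groundedness_score(["The warranty lasts a decade"], docs))
```

The first answer scores 1.0 because every word is supported by the document; the second scores low because most of its content has no support, which is exactly the kind of gap the dashboard surfaces over time.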
- Log in to deepset Cloud and go to Groundedness.
- Choose the pipeline whose groundedness you want to check. The dashboard then displays its data.
At the top of the dashboard, you can check the overall groundedness score for your pipeline (1). The graph in the Groundedness Score section shows how the score changed over time. Changes in groundedness can happen if the data, the model, or the pipeline is updated. By hovering your mouse over any point on the graph, you can see the average groundedness score for answers at that point in time (2).
You can choose the time range for the data and switch between pipelines. Groundedness score is available only for retrieval augmented generation (RAG) pipelines.
The Documents Referenced section shows you how a document's ranking correlates with its reference frequency. The ranking comes from the pipeline (from the last node that ranks documents, typically a Ranker or a Retriever). Beneath each rank, you can see a percentage representing that document's share of all references.
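The percentage beneath each rank is each rank's share of all observed references. A small sketch of that calculation (the input format is made up for illustration; it is not a deepset Cloud API):

```python
from collections import Counter

def reference_share_by_rank(referenced_ranks: list[int]) -> dict[int, float]:
    """Given the rank of the document behind each reference observed in
    generated answers, return each rank's share of all references as a
    percentage, keyed by rank."""
    counts = Counter(referenced_ranks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rank: 100 * n / total for rank, n in sorted(counts.items())}

# Example: 10 observed references, mostly to the top-ranked document.
shares = reference_share_by_rank([1, 1, 1, 1, 1, 2, 2, 3, 2, 1])
print(shares)
```

In this example, rank 1 accounts for 60% of references and rank 4 for none, which is the pattern the next section suggests acting on.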
Understanding these metrics can help in several ways:
- It's an indication of your retriever's performance. If lower-ranked documents are referenced more often than higher-ranked ones, the retriever's ranking could be improved.
- It's an opportunity to save costs. By identifying and excluding documents that are rarely used as references, you can reduce the number of tokens sent to the model in the prompt. For example, if documents ranked at 4 are never referenced, you can set the pipeline's `top_k` to 3. This way, only documents ranked 1 to 3 are sent in the prompt as the context for generating answers. (Tip: Modify the `top_k` parameter of the node that sends documents to PromptNode. In a RAG pipeline, this is typically the Retriever.)
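The `top_k` change above might look like this in a pipeline definition. This is an illustrative Haystack-style YAML fragment, not a complete pipeline; the component names and types are placeholders for whatever your pipeline actually uses.

```yaml
# Illustrative fragment only; component names and types are placeholders.
components:
  - name: Retriever
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore
      top_k: 3  # only documents ranked 1 to 3 are passed on to PromptNode
```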