
Monitor Pipeline Performance

If you're wondering how many requests your search system can send to Haystack Enterprise Platform, or how fast you can expect responses to be, this page has your answers. Use built-in metrics, logs, and KPI dashboards to monitor your pipeline performance.


Scaling

Unexpected surges in traffic are a challenge, but Haystack Enterprise Platform seamlessly handles scalability in response to increased demand. Haystack Enterprise Platform uses autoscaling, which automatically adjusts the infrastructure based on usage. This ensures there are no disruptions in service, regardless of the number of concurrent requests. The system dynamically reallocates resources to maintain optimal performance. Scaling is entirely automated and doesn't require any manual adjustments from you.

Speed

Several factors influence the speed of your search system:

  • Model size
    Pipelines with large models are slower. With LLMs, it’s often a tradeoff between speed and performance. Large models, like GPT-5, may give better answers than smaller models, but they’re also much slower because of their size.
  • Where your model runs
    You can run models locally, on your machine, or remotely, through an API. Small models used with Retrievers or Readers are faster to run locally because they don't need much power. Large models, like ChatGPT or GPT-4, need dedicated and optimized hardware, and running them remotely is usually faster.
  • Pipeline configuration
    Some components or their settings can slow your pipeline. For example, a Ranker improves the results but also slows the system down. The same goes for a Reader with a high top_k value. It’s often about finding the balance between speed and performance.
  • The length of generated responses
    For generative QA pipelines, the number of tokens you want the model to generate as an answer influences your system's speed. The longer the answer, the slower the system.

Optimizing the speed of your pipeline is all about finding the balance among these factors. It involves choosing the right-sized model, running it where it’s fastest, and finding a configuration with optimal performance.
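Before tuning any of these factors, it helps to establish a baseline. A minimal sketch that times queries client-side; the `run_pipeline` stub is hypothetical, so substitute your own call to the platform:

```python
import statistics
import time

def run_pipeline(query: str) -> str:
    """Stand-in for a real pipeline call; replace with your own client code."""
    time.sleep(0.01)  # simulate model inference latency
    return f"answer to: {query}"

def time_queries(queries):
    """Return per-query latencies in seconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        run_pipeline(query)
        latencies.append(time.perf_counter() - start)
    return latencies

latencies = time_queries(["what is RAG?", "how do rankers work?"])
print(f"avg: {statistics.mean(latencies):.3f}s, max: {max(latencies):.3f}s")
```

Re-run the same query set after each configuration change (a different model, a lower top_k, a removed Ranker) to see which factor dominates your latency.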

If your pipeline contains components that work better on a GPU, turn on GPU acceleration. For details, see Enable GPU Acceleration for Pipelines.

Pipeline Details Dashboard

The Haystack Enterprise Platform dashboard gives you basic information about your pipeline, such as the average response time or the number of searches run. To check what queries were asked, the top answers, and how long it took to find them, click the pipeline's name on the Pipelines page. This brings you to Pipeline Details, where you can view all the information about your pipeline.

Overview

The Pipeline Overview shows a KPI dashboard with key performance metrics in an easy-to-read format:

  • Total queries: Total number of queries your pipeline processed. This metric helps you understand usage patterns and traffic volume.
  • Documents: Total number of documents indexed for this pipeline.
  • Feedback coverage: Percentage of queries that received user feedback. Higher feedback coverage gives you better insights into pipeline performance. If this shows "N/A", it means no feedback was given yet.
  • Average response time: Average time it took to generate a response. Monitor this metric to identify performance issues or improvements after configuration changes.
  • Minimum inference time: The fastest response time recorded across all queries. This helps you understand your pipeline's best-case performance.
  • Maximum inference time: The slowest response time recorded across all queries. Use this metric to identify potential performance bottlenecks or outliers.
  • Query volume: Number of queries your pipeline processed over the last 7 days.
  • Feedback distribution: Distribution of feedback received for the pipeline's responses.
  • Query Activity Heatmap: Shows when your pipeline is most active by displaying query volume across hours and days. This helps you understand usage patterns, identify peak activity periods, or plan maintenance windows.
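To make these definitions concrete, here is a sketch of how such KPIs could be computed from raw query records. The record shape is illustrative, not the platform's actual export format:

```python
from statistics import mean

# Hypothetical query log records; the real data comes from the platform.
records = [
    {"query": "reset password", "response_time": 0.42, "feedback": "positive"},
    {"query": "pricing tiers", "response_time": 1.87, "feedback": None},
    {"query": "export data", "response_time": 0.95, "feedback": "negative"},
]

times = [r["response_time"] for r in records]
with_feedback = [r for r in records if r["feedback"] is not None]

kpis = {
    "total_queries": len(records),
    "avg_response_time": round(mean(times), 2),
    "min_inference_time": min(times),
    "max_inference_time": max(times),
    # Feedback coverage is the share of queries with any feedback at all.
    "feedback_coverage": round(100 * len(with_feedback) / len(records), 1),
}
print(kpis)
```

Note that the minimum and maximum are single observations, so one outlier (a cold start, an unusually long generated answer) can dominate them; the average is the more stable signal to track over time.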

Search History

For detailed performance analysis, use the Search History tab to review individual queries and responses. You can customize the table to show different columns and view the conversation history for each query.

Use this information to identify slow queries, improve answer quality, and understand user needs. Download the search history as a CSV file for further analysis.
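Once you've downloaded the CSV, a few lines of Python are enough to surface slow queries. A sketch using the standard library; the column names here are illustrative and may differ from the actual export:

```python
import csv
import io

# Inline sample standing in for the downloaded search-history CSV.
csv_text = """query,response_time
reset password,0.42
pricing tiers,1.87
export data,0.95
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Flag queries slower than a chosen threshold (1 second here), slowest first.
slow = sorted(
    (r for r in rows if float(r["response_time"]) > 1.0),
    key=lambda r: float(r["response_time"]),
    reverse=True,
)
for r in slow:
    print(r["query"], r["response_time"])
```

For a real export, replace the inline string with `open("search_history.csv")` and adjust the column names to match the file.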

Logs

Check pipeline logs to see what happened since the pipeline was deployed. You can view the logs on the Pipeline Details page (just click the pipeline name to get there). Expand messages for more details and possible actions.

Traces

Haystack Platform supports tracing with Langfuse and Weights & Biases' Weave. To learn how to set up tracing, see Trace Your Pipelines.