Trace with Langfuse

Monitor your deepset pipelines with Langfuse.

About This Task

Langfuse is a powerful tool for observability and tracing in complex AI workflows. It captures detailed traces of your pipeline operations, making it easier to debug, monitor, and improve your applications.

The deepset AI Platform integrates with Langfuse through the LangfuseConnector component. To get started, connect the platform to Langfuse with your API keys. Then, add LangfuseConnector to the pipelines you want to trace; it collects traces and sends them to Langfuse.

Prerequisites

You need the Langfuse public and secret API keys, which you can obtain from your Langfuse project settings. If you no longer have the secret key saved, create new keys: the secret key can only be viewed and copied once, during creation.

Use Langfuse

  1. Connect deepset AI Platform to Langfuse by creating two Langfuse secrets:

    1. In deepset AI Platform, click your initials in the top right corner and choose Secrets.

      [Screenshot: the Secrets option on the menu]
    2. Click Add New Secret.

    3. Copy the secret key from your Langfuse project and paste it into the Secret field.

    4. Type LANGFUSE_SECRET_KEY as the secret name and save the secret.

    5. Click Add New Secret.

    6. Copy the public key from your Langfuse project and paste it into the Secret field.

    7. Type LANGFUSE_PUBLIC_KEY as the secret name and save the secret.

  2. Add the LangfuseConnector component to the pipeline you want to trace, but don't connect it to any other component.

  3. Set the name parameter in LangfuseConnector to your pipeline name and save the pipeline. (A minimal sketch of the connector configuration follows these steps.)
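
Here's a minimal sketch of the connector's configuration, mirroring the full example below. The name value is a placeholder; use your pipeline's name, and the secret names must match the ones you created in step 1:

components:
  LangfuseConnector:
    type: haystack_integrations.components.connectors.langfuse.langfuse_connector.LangfuseConnector
    init_parameters:
      name: my-pipeline # Placeholder; set this to your pipeline's name
      secret_key:
        type: env_var
        env_vars:
        - LANGFUSE_SECRET_KEY
        strict: false
      public_key:
        type: env_var
        env_vars:
        - LANGFUSE_PUBLIC_KEY
        strict: false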

When you run a query with this pipeline, its traces will appear in your Langfuse project under Traces.

[Screenshot: the pipeline traces in Langfuse]

Example

This is an example of a RAG pipeline with Langfuse tracing enabled. LangfuseConnector is in the pipeline but it's not connected to any other component:

[Screenshot: LangfuseConnector in a pipeline]
YAML configuration

components:
  bm25_retriever: # Selects the most relevant documents from the document store using keyword (BM25) matching
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0

  query_embedder: # Calculates the query embedding used by the embedding retriever
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  embedding_retriever: # Selects the most similar documents from the document store based on embedding similarity
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return

  document_joiner: # Combines the documents from both retrievers into one list
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker: # Re-ranks the combined documents by relevance to the query and keeps the top 8
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker: # Groups the documents by file and sorts them by split to keep passages from the same file together
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  prompt_builder: # Combines the query and the retrieved documents into a prompt for the LLM
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3].
        Never name the documents, only enter a number in square brackets as a reference.
        The reference must only refer to the number that comes in square brackets after the document.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.

        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}] :
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
        {% endif %}

        Question: {{ question }}
        Answer:

      required_variables: "*"

  llm: # Generates the answer based on the prompt
    type: deepset_cloud_custom_nodes.generators.deepset_amazon_bedrock_generator.DeepsetAmazonBedrockGenerator
    init_parameters:
      model: anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_region_name: us-west-2
      max_length: 650
      temperature: 0

  answer_builder: # Builds the final answer, resolving document references in the LLM reply
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  LangfuseConnector: # Sends traces to Langfuse; deliberately not connected to any other component
    type: haystack_integrations.components.connectors.langfuse.langfuse_connector.LangfuseConnector
    init_parameters:
      name: RAG-QA-Claude-3.5-Sonnet-en # The name under which the traces appear in Langfuse
      public: false # Keep the resulting traces private to your Langfuse project
      public_key:
        type: env_var
        env_vars:
        - LANGFUSE_PUBLIC_KEY
        strict: false
      secret_key:
        type: env_var
        env_vars:
        - LANGFUSE_SECRET_KEY
        strict: false
      httpx_client:
      span_handler:

connections:  # Defines how the components are connected
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
  receiver: prompt_builder.documents
- sender: meta_field_grouping_ranker.documents
  receiver: answer_builder.documents
- sender: prompt_builder.prompt
  receiver: llm.prompt
- sender: prompt_builder.prompt
  receiver: answer_builder.prompt
- sender: llm.replies
  receiver: answer_builder.replies
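# Note: LangfuseConnector is deliberately absent from this list; it traces the run without being connected to other components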

inputs:  # Define the inputs for your pipeline
  query:  # These components will receive the query as input
  - "bm25_retriever.query"
  - "query_embedder.text"
  - "ranker.query"
  - "prompt_builder.question"
  - "answer_builder.query"

  filters:  # These components will receive a potential query filter as input
  - "bm25_retriever.filters"
  - "embedding_retriever.filters"

outputs:  # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents"  # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers"  # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}