# SagemakerGenerator

Generate text using large language models deployed on Amazon SageMaker.
## Basic Information

- Type: `haystack_integrations.components.generators.amazon_sagemaker.sagemaker.SagemakerGenerator`
- Components it can connect with:
  - PromptBuilder: `SagemakerGenerator` can receive the prompt for the model from `PromptBuilder`.
  - DeepsetAnswerBuilder: `SagemakerGenerator` can send the generated replies to `DeepsetAnswerBuilder`.
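In a pipeline YAML, these connections look like this (excerpted from the full example further down this page):

```yaml
connections:
  - sender: prompt_builder.prompt
    receiver: SagemakerGenerator.prompt
  - sender: SagemakerGenerator.replies
    receiver: answer_builder.replies
```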
## Inputs

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | The prompt with instructions for the model. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters potentially override the parameters set in the pipeline configuration. |
## Outputs

| Parameter | Type | Description |
|---|---|---|
| replies | List[str] | A list of strings containing the generated responses. |
| meta | List[Dict[str, Any]] | A list of dictionaries containing the metadata for each response. |
## Overview

With `SagemakerGenerator`, you can use LLMs hosted and deployed on an Amazon SageMaker Inference Endpoint. For guidance on how to deploy a model to SageMaker, see the SageMaker JumpStart foundation models documentation.

You can pass additional text generation parameters to your model using the `generation_kwargs` parameter. If your model requires custom attributes, pass them as a dictionary in the `aws_custom_attributes` parameter. For example, Llama 2 models must be initialized with `{"accept_eula": True}`.
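For example, these two parameters might be set in the component's `init_parameters` like this (the values are illustrative, not defaults):

```yaml
init_parameters:
  # Llama 2 endpoints require accepting the EULA via a custom attribute
  aws_custom_attributes:
    accept_eula: true
  # Model-specific generation settings; supported keys depend on your model
  generation_kwargs:
    max_new_tokens: 256
    temperature: 0.7
```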
## Authentication

To use this component, connect deepset with Amazon Bedrock first.

### Connection Instructions

- Click your profile icon in the top right corner and choose Integrations.
- Click Connect next to the provider.
- Enter your API key and submit it.

For a detailed explanation, see Use Amazon Bedrock and SageMaker Models.
## Usage Example

### Initializing the Component
```yaml
components:
  SagemakerGenerator:
    type: haystack_integrations.components.generators.amazon_sagemaker.sagemaker.SagemakerGenerator
    init_parameters:
```
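A version with typical parameters filled in, matching the pipeline example below, might look like this (the endpoint name and region setup are taken from that example):

```yaml
components:
  SagemakerGenerator:
    type: haystack_integrations.components.generators.amazon_sagemaker.sagemaker.SagemakerGenerator
    init_parameters:
      # Name of your SageMaker inference endpoint
      model: jumpstart-dft-meta-textgenerationneuron-llama-2-7b
      # Read the AWS region from the environment; strict: false makes it optional
      aws_region_name:
        type: env_var
        env_vars:
          - AWS_DEFAULT_REGION
        strict: false
      # Llama 2 endpoints require accepting the EULA via a custom attribute
      aws_custom_attributes:
        accept_eula: true
```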
### Using the Component in a Pipeline

This is a RAG pipeline that uses `SagemakerGenerator` with a Llama 2 model:
```yaml
components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0
  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2
  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate
  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8
  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id
  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3].
        Never name the documents, only enter a number in square brackets as a reference.
        The reference must only refer to the number that comes in square brackets after the document.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.
        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}]:
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
        {% endif %}
        Question: {{ question }}
        Answer:
      required_variables: "*"
  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm
  LangfuseConnector:
    type: haystack_integrations.components.connectors.langfuse.langfuse_connector.LangfuseConnector
    init_parameters:
      name: RAG-QA-Claude-3.5-Sonnet-en
      public: false
      public_key:
        type: env_var
        env_vars:
          - LANGFUSE_PUBLIC_KEY
        strict: false
      secret_key:
        type: env_var
        env_vars:
          - LANGFUSE_SECRET_KEY
        strict: false
      httpx_client:
      span_handler:
  SagemakerGenerator:
    type: haystack_integrations.components.generators.amazon_sagemaker.sagemaker.SagemakerGenerator
    init_parameters:
      model: jumpstart-dft-meta-textgenerationneuron-llama-2-7b
      aws_access_key_id:
        type: env_var
        env_vars:
          - AWS_ACCESS_KEY_ID
        strict: false
      aws_secret_access_key:
        type: env_var
        env_vars:
          - AWS_SECRET_ACCESS_KEY
        strict: false
      aws_session_token:
        type: env_var
        env_vars:
          - AWS_SESSION_TOKEN
        strict: false
      aws_region_name:
        type: env_var
        env_vars:
          - AWS_DEFAULT_REGION
        strict: false
      aws_profile_name:
        type: env_var
        env_vars:
          - AWS_PROFILE
        strict: false
      aws_custom_attributes:
        accept_eula: true
      generation_kwargs:

connections: # Defines how the components are connected
  - sender: bm25_retriever.documents
    receiver: document_joiner.documents
  - sender: query_embedder.embedding
    receiver: embedding_retriever.query_embedding
  - sender: embedding_retriever.documents
    receiver: document_joiner.documents
  - sender: document_joiner.documents
    receiver: ranker.documents
  - sender: ranker.documents
    receiver: meta_field_grouping_ranker.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: prompt_builder.documents
  - sender: meta_field_grouping_ranker.documents
    receiver: answer_builder.documents
  - sender: prompt_builder.prompt
    receiver: answer_builder.prompt
  - sender: prompt_builder.prompt
    receiver: SagemakerGenerator.prompt
  - sender: SagemakerGenerator.replies
    receiver: answer_builder.replies

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
    - "bm25_retriever.query"
    - "query_embedder.text"
    - "ranker.query"
    - "prompt_builder.question"
    - "answer_builder.query"
  filters: # These components will receive a potential query filter as input
    - "bm25_retriever.filters"
    - "embedding_retriever.filters"

outputs: # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers" # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}
```
## Parameters

### Init Parameters
These are the parameters you can configure in Pipeline Builder:
| Parameter | Type | Default | Description |
|---|---|---|---|
| aws_access_key_id | Optional[Secret] | Secret.from_env_var(['AWS_ACCESS_KEY_ID'], strict=False) | The Secret for AWS access key ID. |
| aws_secret_access_key | Optional[Secret] | Secret.from_env_var(['AWS_SECRET_ACCESS_KEY'], strict=False) | The Secret for AWS secret access key. |
| aws_session_token | Optional[Secret] | Secret.from_env_var(['AWS_SESSION_TOKEN'], strict=False) | The Secret for AWS session token. |
| aws_region_name | Optional[Secret] | Secret.from_env_var(['AWS_DEFAULT_REGION'], strict=False) | The Secret for AWS region name. If not provided, the default region will be used. |
| aws_profile_name | Optional[Secret] | Secret.from_env_var(['AWS_PROFILE'], strict=False) | The Secret for AWS profile name. If not provided, the default profile will be used. |
| model | str | | The name of the SageMaker model endpoint. |
| aws_custom_attributes | Optional[Dict[str, Any]] | None | Custom attributes to be passed to SageMaker, for example `{"accept_eula": True}` in the case of Llama 2 models. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. For a list of supported parameters, see your model's documentation page, for example https://huggingface.co/blog/sagemaker-huggingface-llm#4-run-inference-and-chat-with-our-model for Hugging Face models. Specifically, Llama 2 models support the following inference payload parameters:<br>- `max_new_tokens`: The model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.<br>- `temperature`: Controls the randomness of the output. Higher temperature results in an output sequence with low-probability words; lower temperature results in an output sequence with high-probability words. If `temperature=0`, the result is greedy decoding. If specified, it must be a positive float.<br>- `top_p`: In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.<br>- `return_full_text`: If True, the input text is part of the generated output text. If specified, it must be a boolean. The default value is False. |
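For example, a `generation_kwargs` block for a Llama 2 endpoint using the parameters documented above might look like this (the values are illustrative):

```yaml
generation_kwargs:
  max_new_tokens: 256     # stop after 256 generated tokens
  temperature: 0.7        # moderate randomness
  top_p: 0.9              # nucleus sampling over the top 90% of probability mass
  return_full_text: false # return only the generated continuation, not the prompt
```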
### Run Method Parameters
These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | | The string prompt to use for text generation. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the `__init__` method. |
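As a sketch, assuming the request shape described in Modify Pipeline Parameters at Query Time, where run-time parameters are keyed by component name under `params`, a per-request override could look like this (shown as YAML for readability; the actual API request body is JSON):

```yaml
# Hypothetical search request; the component name must match your pipeline YAML
query: "How do I deploy a JumpStart model?"
params:
  SagemakerGenerator:
    generation_kwargs:
      max_new_tokens: 128
      temperature: 0.2
```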