
SagemakerGenerator

Generate text using large language models deployed on Amazon SageMaker.

Key Features

  • Connects to LLMs hosted on a SageMaker Inference Endpoint.
  • Accepts string prompts and returns string replies.
  • Supports configurable generation parameters via generation_kwargs.
  • Supports custom attributes for models that require special initialization, such as Llama-2 models that require accept_eula: True.
  • Requires AWS credentials set via environment variables.
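
Because the component reads its credentials from environment variables, it can help to confirm they are set before deploying a pipeline. The following is a minimal sketch using only the Python standard library. The helper name `missing_aws_env_vars` is hypothetical and not part of the component, and the choice of which variables count as required is an assumption (a session token or profile is often optional).

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_aws_env_vars(
    required: Iterable[str] = (
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_DEFAULT_REGION",
    ),
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or empty (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Missing AWS settings:", ", ".join(missing))
else:
    print("All required AWS settings are present.")
```

Passing an explicit `env` mapping makes the check easy to try out without touching the real environment.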

Configuration

Before you start, connect the platform to Amazon Bedrock. For a detailed explanation, see Use Amazon Bedrock and SageMaker Models.

  1. Drag the SagemakerGenerator component onto the canvas from the Component Library.
  2. Click the component to open the configuration panel.
  3. On the General tab:
    1. Enter the SageMaker Model Endpoint name, such as jumpstart-dft-meta-textgenerationneuron-llama-2-7b.
  4. Go to the Advanced tab to configure AWS credentials, custom attributes, and generation kwargs.

Connections

SagemakerGenerator receives a prompt string, typically from the prompt output of PromptBuilder. It outputs replies (a list of generated strings) and meta (a list of metadata dictionaries, one per reply). Connect its replies output to AnswerBuilder or DeepsetAnswerBuilder.

Source Code

To check this component's source code, open sagemaker.py in the Haystack Core Integrations repository.

Usage Examples

Basic Configuration

  SagemakerGenerator:
    type: haystack_integrations.components.generators.amazon_sagemaker.sagemaker.SagemakerGenerator
    init_parameters:
      model: jumpstart-dft-meta-textgenerationneuron-llama-2-7b
      aws_access_key_id:
        type: env_var
        env_vars:
        - AWS_ACCESS_KEY_ID
        strict: false
      aws_secret_access_key:
        type: env_var
        env_vars:
        - AWS_SECRET_ACCESS_KEY
        strict: false
      aws_session_token:
        type: env_var
        env_vars:
        - AWS_SESSION_TOKEN
        strict: false
      aws_region_name:
        type: env_var
        env_vars:
        - AWS_DEFAULT_REGION
        strict: false
      aws_profile_name:
        type: env_var
        env_vars:
        - AWS_PROFILE
        strict: false
      aws_custom_attributes:
        accept_eula: true

This is a RAG pipeline that uses SagemakerGenerator with a Llama 2 model:

components:
  bm25_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.bm25_retriever.OpenSearchBM25Retriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return
      fuzziness: 0

  query_embedder:
    type: deepset_cloud_custom_nodes.embedders.nvidia.text_embedder.DeepsetNvidiaTextEmbedder
    init_parameters:
      normalize_embeddings: true
      model: intfloat/e5-base-v2

  embedding_retriever: # Selects the most similar documents from the document store
    type: haystack_integrations.components.retrievers.opensearch.embedding_retriever.OpenSearchEmbeddingRetriever
    init_parameters:
      document_store:
        type: haystack_integrations.document_stores.opensearch.document_store.OpenSearchDocumentStore
        init_parameters:
          hosts:
          index: 'Standard-Index-English'
          max_chunk_bytes: 104857600
          embedding_dim: 768
          return_embedding: false
          method:
          mappings:
          settings:
          create_index: true
          http_auth:
          use_ssl:
          verify_certs:
          timeout:
      top_k: 20 # The number of results to return

  document_joiner:
    type: haystack.components.joiners.document_joiner.DocumentJoiner
    init_parameters:
      join_mode: concatenate

  ranker:
    type: deepset_cloud_custom_nodes.rankers.nvidia.ranker.DeepsetNvidiaRanker
    init_parameters:
      model: intfloat/simlm-msmarco-reranker
      top_k: 8

  meta_field_grouping_ranker:
    type: haystack.components.rankers.meta_field_grouping_ranker.MetaFieldGroupingRanker
    init_parameters:
      group_by: file_id
      subgroup_by:
      sort_docs_by: split_id

  prompt_builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      template: |-
        You are a technical expert.
        You answer questions truthfully based on provided documents.
        If the answer exists in several documents, summarize them.
        Ignore documents that don't contain the answer to the question.
        Only answer based on the documents provided. Don't make things up.
        If no information related to the question can be found in the document, say so.
        Always use references in the form [NUMBER OF DOCUMENT] when using information from a document, e.g. [3] for Document [3] .
        Never name the documents, only enter a number in square brackets as a reference.
        The reference must only refer to the number that comes in square brackets after the document.
        Otherwise, do not use brackets in your answer and reference ONLY the number of the document without mentioning the word document.

        These are the documents:
        {%- if documents|length > 0 %}
        {%- for document in documents %}
        Document [{{ loop.index }}] :
        Name of Source File: {{ document.meta.file_name }}
        {{ document.content }}
        {% endfor -%}
        {%- else %}
        No relevant documents found.
        Respond with "Sorry, no matching documents were found, please adjust the filters or try a different question."
        {% endif %}

        Question: {{ question }}
        Answer:

      required_variables: "*"

  answer_builder:
    type: deepset_cloud_custom_nodes.augmenters.deepset_answer_builder.DeepsetAnswerBuilder
    init_parameters:
      reference_pattern: acm

  LangfuseConnector:
    type: haystack_integrations.components.connectors.langfuse.langfuse_connector.LangfuseConnector
    init_parameters:
      name: RAG-QA-Claude-3.5-Sonnet-en
      public: false
      public_key:
        type: env_var
        env_vars:
        - LANGFUSE_PUBLIC_KEY
        strict: false
      secret_key:
        type: env_var
        env_vars:
        - LANGFUSE_SECRET_KEY
        strict: false
      httpx_client:
      span_handler:

  SagemakerGenerator:
    type: haystack_integrations.components.generators.amazon_sagemaker.sagemaker.SagemakerGenerator
    init_parameters:
      model: jumpstart-dft-meta-textgenerationneuron-llama-2-7b
      aws_access_key_id:
        type: env_var
        env_vars:
        - AWS_ACCESS_KEY_ID
        strict: false
      aws_secret_access_key:
        type: env_var
        env_vars:
        - AWS_SECRET_ACCESS_KEY
        strict: false
      aws_session_token:
        type: env_var
        env_vars:
        - AWS_SESSION_TOKEN
        strict: false
      aws_region_name:
        type: env_var
        env_vars:
        - AWS_DEFAULT_REGION
        strict: false
      aws_profile_name:
        type: env_var
        env_vars:
        - AWS_PROFILE
        strict: false
      aws_custom_attributes:
        accept_eula: true
      generation_kwargs:

connections: # Defines how the components are connected
- sender: bm25_retriever.documents
  receiver: document_joiner.documents
- sender: query_embedder.embedding
  receiver: embedding_retriever.query_embedding
- sender: embedding_retriever.documents
  receiver: document_joiner.documents
- sender: document_joiner.documents
  receiver: ranker.documents
- sender: ranker.documents
  receiver: meta_field_grouping_ranker.documents
- sender: meta_field_grouping_ranker.documents
  receiver: prompt_builder.documents
- sender: meta_field_grouping_ranker.documents
  receiver: answer_builder.documents
- sender: prompt_builder.prompt
  receiver: answer_builder.prompt
- sender: prompt_builder.prompt
  receiver: SagemakerGenerator.prompt
- sender: SagemakerGenerator.replies
  receiver: answer_builder.replies

inputs: # Define the inputs for your pipeline
  query: # These components will receive the query as input
  - "bm25_retriever.query"
  - "query_embedder.text"
  - "ranker.query"
  - "prompt_builder.question"
  - "answer_builder.query"

  filters: # These components will receive a potential query filter as input
  - "bm25_retriever.filters"
  - "embedding_retriever.filters"

outputs: # Defines the output of your pipeline
  documents: "meta_field_grouping_ranker.documents" # The output of the pipeline is the retrieved documents
  answers: "answer_builder.answers" # The output of the pipeline is the generated answers

max_runs_per_component: 100

metadata: {}

Parameters

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | The prompt with instructions for the model. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters can override the parameters passed in the pipeline configuration. |

Outputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| replies | List[str] | | A list of strings containing the generated responses. |
| meta | List[Dict[str, Any]] | | A list of dictionaries containing the metadata for each response. |
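
To illustrate how replies and meta relate, here is a rough sketch of turning a decoded endpoint response into those two outputs. It assumes the endpoint returns either one JSON object or a list of them, and that the generated text sits under a key such as `generated_text` or `generation`. Both are assumptions about typical text-generation endpoints, and the function `split_replies_and_meta` is hypothetical rather than the component's actual code.

```python
import json
from typing import Any, Dict, List, Tuple

# Assumed keys under which a text-generation endpoint returns the generated text.
CANDIDATE_REPLY_KEYS = ("generated_text", "generation")


def split_replies_and_meta(raw_body: str) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Split a JSON response body into replies and per-reply metadata (hypothetical helper)."""
    parsed = json.loads(raw_body)
    # Normalize: some endpoints return a single object, others a list of objects.
    items: List[Dict[str, Any]] = parsed if isinstance(parsed, list) else [parsed]

    replies: List[str] = []
    meta: List[Dict[str, Any]] = []
    for item in items:
        reply_key = next((key for key in CANDIDATE_REPLY_KEYS if key in item), None)
        if reply_key is None:
            raise ValueError(f"No generated text found; keys present: {sorted(item)}")
        replies.append(item[reply_key])
        # Everything except the generated text is kept as metadata for that reply.
        meta.append({key: value for key, value in item.items() if key != reply_key})
    return replies, meta
```

In this sketch the two lists always have the same length, so `replies[i]` and `meta[i]` describe the same generation.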

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| aws_access_key_id | Optional[Secret] | Secret.from_env_var(['AWS_ACCESS_KEY_ID'], strict=False) | The Secret for AWS access key ID. |
| aws_secret_access_key | Optional[Secret] | Secret.from_env_var(['AWS_SECRET_ACCESS_KEY'], strict=False) | The Secret for AWS secret access key. |
| aws_session_token | Optional[Secret] | Secret.from_env_var(['AWS_SESSION_TOKEN'], strict=False) | The Secret for AWS session token. |
| aws_region_name | Optional[Secret] | Secret.from_env_var(['AWS_DEFAULT_REGION'], strict=False) | The Secret for AWS region name. If not provided, the default region is used. |
| aws_profile_name | Optional[Secret] | Secret.from_env_var(['AWS_PROFILE'], strict=False) | The Secret for AWS profile name. If not provided, the default profile is used. |
| model | str | | The name of the SageMaker Model Endpoint. |
| aws_custom_attributes | Optional[Dict[str, Any]] | None | Custom attributes to pass to SageMaker, for example {"accept_eula": True} for Llama-2 models. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. For a list of supported parameters, see your model's documentation page. |
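
SageMaker's InvokeEndpoint operation takes custom attributes as a single string. As a sketch of how a dictionary such as {"accept_eula": True} might be flattened into that string, the hypothetical helper below joins key=value pairs with semicolons and lowercases booleans. The exact format your endpoint expects is an assumption to verify against your model's documentation.

```python
from typing import Any, Dict


def format_custom_attributes(attributes: Dict[str, Any]) -> str:
    """Flatten a dict into a 'key=value;key=value' string (hypothetical helper)."""

    def render(value: Any) -> str:
        # Booleans are lowercased so that True becomes "true".
        return str(value).lower() if isinstance(value, bool) else str(value)

    return ";".join(f"{key}={render(value)}" for key, value in attributes.items())


print(format_custom_attributes({"accept_eula": True}))  # accept_eula=true
```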

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | str | | The string prompt to use for text generation. |
| generation_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method. |
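
The override behavior can be pictured with a small sketch. It assumes that run-time generation_kwargs, when provided, replace the init-time ones entirely instead of being merged key by key, and that the endpoint accepts a JSON body with `inputs` and `parameters` fields. Both points are assumptions to check against the component source and your model's documentation. The function name `build_request_body` and the parameter names `max_new_tokens` and `temperature` are illustrative only.

```python
import json
from typing import Any, Dict, Optional


def build_request_body(
    prompt: str,
    init_kwargs: Optional[Dict[str, Any]] = None,
    run_kwargs: Optional[Dict[str, Any]] = None,
) -> str:
    """Build a JSON request body, preferring run-time kwargs (hypothetical helper)."""
    # Assumption: non-empty run-time kwargs replace the init-time kwargs as a whole.
    effective = run_kwargs or init_kwargs or {}
    return json.dumps({"inputs": prompt, "parameters": effective})


init_kwargs = {"max_new_tokens": 256, "temperature": 0.7}
# No run-time kwargs: the init-time values are used.
print(build_request_body("Hello", init_kwargs))
# Run-time kwargs given: they replace the init-time values, so temperature is dropped.
print(build_request_body("Hello", init_kwargs, {"max_new_tokens": 32}))
```

Under these assumptions, a query-time override needs to repeat every generation parameter you still want applied, not only the one you are changing.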