DeepsetGitHubRepositoryViewer

Navigate and fetch content from GitHub repositories using DeepsetGitHubRepositoryViewer.

Basic Information

  • Type: deepset_cloud_custom_nodes.github.github_repo_viewer.DeepsetGitHubRepositoryViewer
  • Components it can connect with:
    • You can use this component as an Agent's tool.
    • Any component that accepts a list of documents as input, for example PromptBuilder or Ranker.

Inputs

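| Parameter | Type | Default | Description |
|---|---|---|---|
| repo | Optional[str] | None | Repository in the format "owner/repo". |
| path | str | "" | Path within the repository. An empty path points to the repository root. |
| ref | Any | | Git reference (branch, tag, or commit) to use. |
| branch | Optional[str] | None | The Git branch to use. |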

Outputs

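| Parameter | Type | Default | Description |
|---|---|---|---|
| documents | List[Document] | | The repository content as a list of documents: one per directory item, a single file, or a single error message. |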

Overview

DeepsetGitHubRepositoryViewer browses GitHub repositories and fetches their content as a list of documents.

For directories:

  • Returns a list of Documents, one for each item in the directory.
  • Each Document's content is the item name.
  • Stores the item's full path and other metadata in Document.meta.

For files:

  • Returns a single Document.
  • The Document's content is the file content.
  • Stores the file's full path and other metadata in Document.meta.

For errors (when raise_on_failure is False; otherwise the component raises an exception):

  • Returns a single Document.
  • The Document's content is the error message.
  • The Document's meta contains type="error".
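To show how these three output shapes can be told apart downstream, here is a sketch of a components-and-connections fragment that renders the viewer's documents with a PromptBuilder, branching on meta.type. It assumes file documents carry type "file_content" and a path key in their meta (the agentic pipeline on this page relies on the same keys) and that error documents carry type "error", which requires raise_on_failure: false. Anything else is treated as a directory entry and printed by its content, the item name. The names repo_viewer and builder are placeholders, and you still need to add your pipeline's own inputs and outputs.

```yaml
components:
  repo_viewer:
    type: deepset_cloud_custom_nodes.github.github_repo_viewer.DeepsetGitHubRepositoryViewer
    init_parameters:
      raise_on_failure: false   # return API errors as documents instead of raising
  builder:
    type: haystack.components.builders.prompt_builder.PromptBuilder
    init_parameters:
      required_variables: "*"
      template: |-
        {% for doc in documents %}
        {% if doc.meta.type == "error" %}
        Error: {{ doc.content }}
        {% elif doc.meta.type == "file_content" %}
        <file path="{{ doc.meta.path }}">
        {{ doc.content }}
        </file>
        {% else %}
        - {{ doc.content }}
        {% endif %}
        {% endfor %}

connections:
- receiver: builder.documents
  sender: repo_viewer.documents
```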

Usage Example

Initializing the Component

components:
  DeepsetGitHubRepositoryViewer:
    type: deepset_cloud_custom_nodes.github.github_repo_viewer.DeepsetGitHubRepositoryViewer
    init_parameters: {}
Using the Component in a Pipeline

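This example shows an agentic pipeline in which DeepsetGitHubRepositoryViewer, wrapped in a SuperComponent, serves as an Agent's tool. Inside the tool, the viewer sends its documents to a PromptBuilder ("builder"), whose rendered prompt becomes the tool result, and to a BranchJoiner ("joiner"), whose output is written to the Agent's documents state: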

components:
  adapter:
    init_parameters:
      custom_filters: {}
      output_type: List[str]
      template: '{{ [message] }}'
      unsafe: false
    type: haystack.components.converters.output_adapter.OutputAdapter
  agent:
    init_parameters:
      chat_generator:
        init_parameters:
          api_key:
            env_vars:
            - ANTHROPIC_API_KEY
            strict: false
            type: env_var
          generation_kwargs:
            max_tokens: 8000
          ignore_tools_thinking_messages: true
          model: claude-sonnet-4-20250514
          streaming_callback:
          tools:
        type: haystack_integrations.components.generators.anthropic.chat.chat_generator.AnthropicChatGenerator
      exit_conditions:
      - text
      max_agent_steps: 100
      raise_on_tool_invocation_failure: false
      state_schema:
        documents:
          type: List[haystack.Document]
      streaming_callback:
      system_prompt: |-
        The assistant is Haystack-Agent, created by deepset.
        Haystack-Agent helps developers to develop software by participating in GitHub issue discussions.

        Haystack-Agent receives a GitHub issue and all current comments.
        Haystack-Agent participates in the discussion by:
        - helping users find answers to their questions
        - analyzing bug reports and proposing a fix when necessary
        - analyzing feature requests and proposing an implementation
        - being a sounding board in architecture discussions and proposing alternative solutions

        **Style**
        Haystack-Agent uses Markdown formatting. When using Markdown, Haystack-Agent always follows best practices for clarity
        and consistency.
        It always uses a single space after hash symbols for headers (e.g., "# Header 1") and leaves a blank line before and
        after headers, lists, and code blocks. For emphasis, Haystack-Agent uses asterisks or underscores consistently
        (e.g., italic or bold). When creating lists, it aligns items properly and uses a single space after the list marker.
        For nested bullets in bullet point lists, Haystack-Agent uses two spaces before the asterisk (*) or hyphen (-) for each
        level of nesting. For nested bullets in numbered lists, Haystack-Agent uses three spaces before the number and period
        (e.g., "1.") for each level of nesting. When writing code, Haystack-Agent uses Markdown-blocks with appropriate language
        annotation.

        **Software Engineering**
        Haystack-Agent creates high-quality code that is easy to understand, performant, secure, easy to test, and maintainable.
        Haystack-Agent finds the right level of abstraction and complexity.
        When working with other developers on an issue, Haystack-Agent generally adapts to the code, architecture, and
        documentation patterns that are already being used in the codebase.
        Haystack-Agent may propose better code style, documentation, or architecture when appropriate.
        Haystack-Agent needs context on the code being discussed before responding with a comment.
        Haystack-Agent does not craft any comments without knowing the code being discussed.
        Haystack-Agent can explore any repository on GitHub and view its contents.

        **Exploring Repositories**
        Haystack-Agent uses the `view_repository` tool to explore GitHub repositories before crafting a comment.
        Haystack-Agent explores more than one repository when the GitHub discussion mentions multiple relevant repositories.

        **Thinking**
        Haystack-Agent is a rigorous thinker. It uses <thinking></thinking>-blocks to gather thoughts, reflect on the issue at
        hand, and relate its learnings to it. It is not afraid of a lengthy thought process, because it knows that Software
        Engineering is a challenging discipline.
        Haystack-Agent takes notes on the <scratchpad></scratchpad>. The scratchpad holds important pieces of information that
        Haystack-Agent wants to reference later.

        **Comments**
        Haystack-Agent is friendly, uses accessible language and keeps comments as simple as possible.
        When developers address Haystack-Agent directly, it follows their instructions and finds the best response to their
        comment. Haystack-Agent is happy to revise its code when a developer asks for it.
        Haystack-Agent may disagree with a developer, when the changes being asked for clearly don't help to resolve the issue
        or when Haystack-Agent has found a better approach to solving it.
        Haystack-Agent writes its comment as its final response. Before writing a comment, Haystack-Agent reflects on
        the issue and any learnings from the code analysis. Haystack-Agent only responds when ready.


        Haystack-Agent, this is IMPORTANT:
        - DO NOT START WRITING YOUR RESPONSE UNTIL YOU HAVE COMPLETED THE ENTIRE EXPLORATION PHASE
        - VIEWING DIRECTORY LISTINGS IS NOT ENOUGH - YOU MUST EXAMINE FILE CONTENTS

        Haystack-Agent will now receive its tools, including instructions, and will then participate in a GitHub issue discussion.
      tools:
      - data:
          component:
            init_parameters:
              input_mapping:
              output_mapping:
              pipeline:
                components:
                  builder:
                    init_parameters:
                      required_variables: "*"
                      template: |-
                        {% for doc in documents %}
                          {% if doc.content %}
                            {%- if doc.meta.type == "file_content" -%}
                            <file path="{{doc.meta.path}}">

                            {%- endif -%}
                            {{ doc.content|truncate(100000) }}
                            {%- if doc.meta.type == "file_content" -%}

                            </file path="{{doc.meta.path}}">
                            {%- endif -%}
                          {% endif %}
                        {% endfor %}
                      variables:
                    type: haystack.components.builders.prompt_builder.PromptBuilder
                  joiner:
                    init_parameters:
                      type_: list[haystack.Document]
                    type: haystack.components.joiners.branch.BranchJoiner
                  repo_viewer:
                    init_parameters:
                      github_token:
                        env_vars:
                        - GITHUB_TOKEN
                        strict: false
                        type: env_var
                      max_file_size: 1000000
                      raise_on_failure: false
                    type: deepset_cloud_custom_nodes.github.github_repo_viewer.DeepsetGitHubRepositoryViewer
                connection_type_validation: true
                connections:
                - receiver: builder.documents
                  sender: repo_viewer.documents
                - receiver: joiner.value
                  sender: repo_viewer.documents
                max_runs_per_component: 100
                metadata: {}
            type: haystack.core.super_component.super_component.SuperComponent
          description: |-
            Haystack-Agent uses this tool to browse GitHub repositories.
            Haystack-Agent can view directories and files with this tool.

            <usage>
            Pass a `repo` string for the repository that you want to view.
            It is required to pass `repo` to use this tool.
            The structure is "owner/repo-name".

            Pass a `path` string for the directory or file that you want to view.
            If you pass an empty path, you will view the root directory of the repository.

            Examples:

            - {"repo": "pandas-dev/pandas", "path": ""}
              - will show you the root of the pandas repository
            - {"repo": "pandas-dev/pandas", "path": "pyproject.toml"}
              - will show you the "pyproject.toml"-file of the pandas repository
            - {"repo": "huggingface/transformers", "path": "src/transformers/models/albert"}
              - will show you the "albert"-directory in the transformers repository
            - {"repo": "huggingface/transformers", "path": "src/transformers/models/albert/modeling_albert.py"}
              - will show you the source code for the albert model in the transformers repository
            </usage>

            Haystack-Agent uses the `view_repository` tool to view relevant code.
            Haystack-Agent starts at the root of the repository.
            Haystack-Agent navigates one level at a time using directory listings.
            Haystack-Agent views all relevant code, testing, configuration, or documentation files on a level.
            It never skips a directory level or guesses full paths.

            Haystack-Agent thinks deeply about the content of a repository. Before Haystack-Agent uses the tool, it reasons about
            next steps:

            <thinking>
            - What am I looking for in this location?
            - Why is this path potentially relevant?
            - What specific files might help solve the issue?
            - What patterns or implementations should I look for?
            </thinking>

            After viewing the contents of a file or directory, Haystack-Agent reflects on its observations before moving on:
            <thinking>
            - What did I learn from these files?
            - What else might be related?
            - Where should I look next and why?
            </thinking>

            IMPORTANT
            Haystack-Agent views the content of relevant files because it knows that exploring the directory structure is not enough.
            Haystack-Agent needs to read the code to understand it properly.
            To view a file, Haystack-Agent passes the full path of the file to the `view_repository` tool.
            Haystack-Agent never guesses a file or directory path.

            Haystack-Agent takes notes after viewing code:
            <scratchpad>
            - extract important code snippets
            - document key functions, classes or configurations
            - note key architecture patterns
            - relate findings to the original issue
            - relate findings to other code that was already viewed
            - note down file paths as a reference
            </scratchpad>
          inputs_from_state:
          name: view_repository
          outputs_to_state:
            documents:
              source: value
          outputs_to_string:
            source: prompt
          parameters:
            properties:
              path:
                description: Path to directory or file to view. Defaults to repository root.
                type: string
              repo:
                description: The owner/repository_name that you want to view.
                type: string
            required:
            - repo
            type: object
        type: haystack.tools.component_tool.ComponentTool
    type: haystack.components.agents.agent.Agent
  answer_builder:
    init_parameters:
      pattern:
      reference_pattern:
    type: haystack.components.builders.answer_builder.AnswerBuilder
  answer_formatter:
    init_parameters:
      required_variables: "*"
      template: |-
        {{ (messages|last).text }}

        _Viewed Files:_
        {% if documents is not none %}
        {% for document in documents -%}
        {% if document.meta.type == "file_content" %}
        [{{document.meta.path}}]({{ document.meta.url }})
        {% endif -%}
        {%- endfor -%}
        {% endif %}
      variables:
    type: haystack.components.builders.prompt_builder.PromptBuilder
  history_parser:
    init_parameters: {}
    type: dc_custom_component.components.parsers.chat_history_parser.DeepsetChatHistoryParser
  issue_builder:
    init_parameters:
      required_variables:
      - url
      - documents
      template:
      - content:
        - text: |-
            Issue from: {{ url }}
            {% for document in documents %}
            {% if loop.index == 1 %}
            **Title: {{ document.meta.title }}**
            {% endif %}
            <issue-comment>
            {{document.content}}
            </issue-comment>
            {% endfor %}
        meta: {}
        name:
        role: user
      variables:
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
  issue_fetcher:
    init_parameters:
      github_token:
        env_vars:
        - GITHUB_TOKEN
        strict: false
        type: env_var
      raise_on_failure: true
      retry_attempts: 2
    type: deepset_cloud_custom_nodes.github.github_issue_viewer.DeepsetGitHubIssueViewer
  issue_parser:
    init_parameters:
      consider_all_messages: false
      regex_pattern: https://github\.com/[^/]+/[^/]+/issues/\d+
      return_all_matches: false
      return_empty_on_no_match: true
    type: deepset_cloud_custom_nodes.parsers.regex_parser.DeepsetRegexParser
  joiner:
    init_parameters:
      list_type_: list[haystack.dataclasses.ChatMessage]
    type: dc_custom_component.components.joiners.list_joiner.ListJoiner

connections:
- receiver: joiner.values
  sender: history_parser.messages
- receiver: issue_parser.text_or_messages
  sender: history_parser.messages
- receiver: issue_fetcher.url
  sender: issue_parser.captured_text
- receiver: issue_builder.url
  sender: issue_parser.captured_text
- receiver: issue_builder.documents
  sender: issue_fetcher.documents
- receiver: joiner.values
  sender: issue_builder.prompt
- receiver: agent.messages
  sender: joiner.values
- receiver: answer_formatter.messages
  sender: agent.messages
- receiver: answer_formatter.documents
  sender: agent.documents
- receiver: answer_builder.replies
  sender: adapter.output
- receiver: adapter.message
  sender: answer_formatter.prompt

inputs:
  query:
  - answer_builder.query
  - history_parser.history_and_query

outputs:
  answers: answer_builder.answers

pipeline_output_type: chat

max_runs_per_component: 100

metadata: {}
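The view_repository tool above exposes only repo and path to the Agent. Because the component's run method also accepts branch, you could let the Agent browse a specific branch by adding it to the tool's parameters schema. This is a sketch that assumes the SuperComponent, with input_mapping left empty as above, exposes the viewer's branch input under the same name; verify this before relying on it. Indent the block to match the tool entry it replaces.

```yaml
# Replacement for the parameters block of the view_repository tool
parameters:
  properties:
    branch:
      description: Git branch to view (optional).
      type: string
    path:
      description: Path to directory or file to view. Defaults to repository root.
      type: string
    repo:
      description: The owner/repository_name that you want to view.
      type: string
  required:
  - repo
  type: object
```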

Parameters

Init Parameters

These are the parameters you can configure in Pipeline Builder:

| Parameter | Type | Default | Description |
|---|---|---|---|
| github_token | Optional[Secret] | Secret.from_env_var('GITHUB_TOKEN', strict=False) | GitHub personal access token for API authentication, configured as a Secret. Add the token on the Secrets page under the name GITHUB_TOKEN. For detailed instructions, see Add Secrets. |
| raise_on_failure | bool | True | If True, raises exceptions on API errors. If False, returns the error as a document whose meta contains type="error". |
| max_file_size | int | 1000000 | Maximum file size to fetch, in bytes (default: 1 MB). |
| repo | Optional[str] | None | The repository to browse, in the format "owner/repo". |
| branch | Optional[str] | None | The Git branch to use. |
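Because the default secret is declared with strict=False, a missing GITHUB_TOKEN does not cause an error when the secret is resolved. If you would rather have the component fail when the token is not set, here is a sketch of the same environment-variable secret with strict enabled, using the serialization format from the pipeline example above (the component name repo_viewer is a placeholder):

```yaml
components:
  repo_viewer:
    type: deepset_cloud_custom_nodes.github.github_repo_viewer.DeepsetGitHubRepositoryViewer
    init_parameters:
      github_token:
        type: env_var
        env_vars:
        - GITHUB_TOKEN
        strict: true   # raise an error if GITHUB_TOKEN is not set instead of resolving to None
```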

Run Method Parameters

These are the parameters you can configure for the component's run() method. This means you can pass these parameters at query time through the API, in Playground, or when running a job. For details, see Modify Pipeline Parameters at Query Time.

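| Parameter | Type | Default | Description |
|---|---|---|---|
| repo | Optional[str] | None | Repository in the format "owner/repo". |
| path | str | "" | Path within the repository. An empty path points to the repository root. |
| ref | Any | | Git reference (branch, tag, or commit) to use. |
| branch | Optional[str] | None | Git branch to use. |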