# Haystack Enterprise Platform Documentation Knowledge Graph

This knowledge graph represents the structure and relationships between documentation topics in the dc-docs repository (Haystack Enterprise Platform documentation).

## Main Documentation Areas

```mermaid
graph TB
    Root[Haystack Enterprise Platform Documentation]
    
    Root --> GettingStarted[Getting Started]
    Root --> Learn[Learn]
    Root --> Concepts[Concepts]
    Root --> HowTo[How-To Guides]
    Root --> Tutorials[Tutorials]
    Root --> API[API Reference]
    Root --> Builder[Builder]
    
    style Root fill:#2563eb,stroke:#1e40af,color:#fff
    style GettingStarted fill:#059669,stroke:#047857,color:#fff
    style Learn fill:#dc2626,stroke:#b91c1c,color:#fff
    style Concepts fill:#7c3aed,stroke:#6d28d9,color:#fff
    style HowTo fill:#ea580c,stroke:#c2410c,color:#fff
    style Tutorials fill:#0891b2,stroke:#0e7490,color:#fff
    style API fill:#4b5563,stroke:#374151,color:#fff
    style Builder fill:#65a30d,stroke:#4d7c0f,color:#fff
```

## Core Concepts Relationships

```mermaid
graph LR
    subgraph Core["Core Concepts"]
        Pipelines[Pipelines]
        Components[Pipeline Components]
        Indexes[Indexes]
        DocStores[Document Stores]
        Data[Data Flow]
        LLMs[Language Models]
        Jobs[Jobs]
        Agents[AI Agents]
    end
    
    Pipelines -->|contain| Components
    Pipelines -->|use| Indexes
    Pipelines -->|connect to| DocStores
    Pipelines -->|use| LLMs
    Pipelines -->|run as| Jobs
    Pipelines -->|can include| Agents
    
    Indexes -->|write to| DocStores
    Indexes -->|process| Data
    
    Components -->|read from| DocStores
    Components -->|use| LLMs
    
    Agents -->|use| Components
    Agents -->|call| Tools[Agent Tools]
    Agents -->|maintain| Memory[Agent Memory]
    
    style Pipelines fill:#2563eb,stroke:#1e40af,color:#fff
    style Components fill:#7c3aed,stroke:#6d28d9,color:#fff
    style Indexes fill:#059669,stroke:#047857,color:#fff
    style DocStores fill:#dc2626,stroke:#b91c1c,color:#fff
    style Agents fill:#ea580c,stroke:#c2410c,color:#fff
```

## Pipeline Components Ecosystem

```mermaid
graph TB
    subgraph ComponentTypes["Component Types"]
        Embedders[Embedders]
        Generators[Generators]
        Retrievers[Retrievers]
        Rankers[Rankers]
        Builders[Builders]
        Converters[Converters]
        Preprocessors[Preprocessors]
        Joiners[Joiners]
        Routers[Routers]
        Writers[Writers]
    end
    
    subgraph Providers["Model Providers"]
        OpenAI[OpenAI]
        Anthropic[Anthropic]
        Cohere[Cohere]
        AmazonBedrock[Amazon Bedrock]
        GoogleVertex[Google Vertex]
        HuggingFace[Hugging Face]
        Nvidia[Nvidia]
        Ollama[Ollama]
    end
    
    Embedders -->|powered by| Providers
    Generators -->|powered by| Providers
    Rankers -->|powered by| Providers
    
    Converters --> Preprocessors
    Preprocessors --> Embedders
    Embedders --> Retrievers
    Retrievers --> Rankers
    Rankers --> Builders
    Builders --> Generators
    Generators --> Writers
    
    style ComponentTypes fill:#f3f4f6,stroke:#9ca3af
    style Providers fill:#fef3c7,stroke:#fbbf24
```

## Document Stores Relationships

```mermaid
graph TB
    subgraph DocStores["Document Stores"]
        OpenSearch[OpenSearch<br/>Core Store]
        Elasticsearch[Elasticsearch]
        Pinecone[Pinecone]
        Weaviate[Weaviate]
        Qdrant[Qdrant]
        MongoDB[MongoDB Atlas]
        PgVector[PgVector]
        Snowflake[Snowflake]
    end
    
    subgraph Components["Components That Use Stores"]
        DocWriter[DocumentWriter]
        BM25[BM25 Retriever]
        Embedding[Embedding Retriever]
        Hybrid[Hybrid Retriever]
    end
    
    OpenSearch -->|core managed| DocWriter
    OpenSearch -->|retrieve from| BM25
    OpenSearch -->|retrieve from| Embedding
    
    Elasticsearch --> DocWriter
    Pinecone --> DocWriter
    Weaviate --> DocWriter
    Qdrant --> DocWriter
    MongoDB --> DocWriter
    PgVector --> DocWriter
    
    style OpenSearch fill:#2563eb,stroke:#1e40af,color:#fff
    style DocWriter fill:#059669,stroke:#047857,color:#fff
```

## Data Flow

```mermaid
graph LR
    subgraph Upload["Data Upload"]
        Files[Files]
        S3[AWS S3]
        VPC[Private VPC]
    end
    
    subgraph Indexing["Indexing Process"]
        Index[Index]
        Preprocess[Preprocessing]
        Documents[Documents]
    end
    
    subgraph Storage["Storage Layer"]
        DocStore[Document Store]
        Database[SQL Database]
    end
    
    subgraph Query["Query Pipeline"]
        UserQuery[User Query]
        Retriever[Retriever]
        GenOrLLM[Generator or LLM]
        AnswerBuilder[Answer builder]
        Answer[Answer]
    end
    
    Files --> S3
    Files --> VPC
    S3 --> Index
    VPC --> Index
    
    Index --> Preprocess
    Preprocess --> Documents
    Documents --> DocStore
    Files -->|metadata| Database
    
    UserQuery --> Retriever
    DocStore --> Retriever
    Retriever --> GenOrLLM
    GenOrLLM --> AnswerBuilder
    AnswerBuilder --> Answer
    Answer -->|results| Database
    
    style Upload fill:#dbeafe,stroke:#3b82f6
    style Indexing fill:#dcfce7,stroke:#22c55e
    style Storage fill:#fce7f3,stroke:#ec4899
    style Query fill:#fef3c7,stroke:#f59e0b
```

## AI Agents Architecture

```mermaid
graph TB
    subgraph Agent["AI Agent System"]
        AgentComp[Agent Component]
        LLM[Language Model]
        Memory[Agent Memory]
        Tools[Agent Tools]
    end
    
    subgraph ToolTypes["Tool Types"]
        Pipelines[Pipelines]
        CustomFunc[Custom Functions]
        MCP[MCP Servers]
        WebSearch[Web Search]
    end
    
    UserInput[User Input] --> AgentComp
    AgentComp --> LLM
    LLM -->|decides| Tools
    Tools -->|execute| ToolTypes
    ToolTypes -->|results| LLM
    LLM -->|output| Memory
    Memory -->|context| LLM
    LLM --> Answer[Answer]
    
    style AgentComp fill:#ea580c,stroke:#c2410c,color:#fff
    style Tools fill:#7c3aed,stroke:#6d28d9,color:#fff
    style Memory fill:#059669,stroke:#047857,color:#fff
```

## How-To Guides Organization

```mermaid
graph TB
    HowTo[How-To Guides]
    
    HowTo --> BuildingAgents[Building Agents]
    HowTo --> DesigningPipelines[Designing Pipelines]
    HowTo --> WorkingWithIndexes[Working with Indexes]
    HowTo --> WorkingWithData[Working with Data]
    HowTo --> Searching[Searching]
    HowTo --> Evaluating[Evaluating]
    HowTo --> Optimizing[Optimizing]
    HowTo --> Productionizing[Productionizing]
    HowTo --> ManagingAccess[Managing Access]
    HowTo --> WorkingWithJobs[Working with Jobs]
    HowTo --> UsingSDK[Using SDK]
    
    DesigningPipelines --> CreatePipeline[Create Pipeline]
    DesigningPipelines --> DeployPipeline[Deploy Pipeline]
    DesigningPipelines --> EditPipeline[Edit Pipeline]
    DesigningPipelines --> WorkWithLLMs[Work with LLMs]
    DesigningPipelines --> CustomComponents[Custom Components]
    DesigningPipelines --> HostedModels[Hosted Models]
    
    BuildingAgents --> ConfigureAgent[Configure Agent]
    BuildingAgents --> AdvancedConfig[Advanced Config]
    BuildingAgents --> MinimalAgent[Minimal Agent]
    BuildingAgents --> Troubleshooting[Troubleshooting]
    
    style HowTo fill:#ea580c,stroke:#c2410c,color:#fff
```

## Learning Path

```mermaid
graph LR
    subgraph Learn["Learn Section"]
        Basics[5-Step Guide]
        AppComponents[App Components]
        DocRetrieval[Document Retrieval]
        Extractive[Extractive QA]
        RAG[RAG QA]
        LLMOverview[LLM Overview]
        PromptEng[Prompt Engineering]
        MCP[Model Context Protocol]
    end
    
    Basics --> AppComponents
    AppComponents --> DocRetrieval
    DocRetrieval --> Extractive
    Extractive --> RAG
    RAG --> LLMOverview
    LLMOverview --> PromptEng
    
    style Basics fill:#059669,stroke:#047857,color:#fff
    style RAG fill:#dc2626,stroke:#b91c1c,color:#fff
```

## Tutorials Journey

```mermaid
graph TB
    Tutorials[Tutorials]
    
    subgraph Basics["Learn the Basics"]
        FirstSearch[First Document Search]
        FirstQA[First QA App]
        RobustRAG[Robust RAG System]
        DataCleaning[Data Cleaning Agent]
        PII[PII Masking]
        MongoDB[MongoDB RAG]
        AutoTagging[Auto-tagging with LLM]
        ConnectUI[Connect to UI]
        UploadCLI[Upload with CLI]
    end
    
    subgraph Advanced["Learn Advanced Features"]
        CustomComponent[Custom Component]
        DemoApp[Demo Your App]
        PythonUpload[Upload with Python]
    end
    
    subgraph RESTAPI["REST API Tutorials"]
        ChatApp[Chat App API]
        FeedbackAPI[Feedback API]
    end
    
    Tutorials --> Basics
    Tutorials --> Advanced
    Tutorials --> RESTAPI
    
    FirstSearch --> FirstQA
    FirstQA --> RobustRAG
    
    style Tutorials fill:#0891b2,stroke:#0e7490,color:#fff
```

## Cross-Cutting Concerns

```mermaid
graph TB
    subgraph CrossCutting["Cross-Cutting Concerns"]
        Security[Secrets & Integrations]
        Roles[User Roles & Permissions]
        Workspaces[Workspaces]
        Organizations[Organizations]
        Settings[Settings]
        Status[Platform Status]
    end
    
    subgraph AllAreas["Affects All Areas"]
        Pipelines2[Pipelines]
        Indexes2[Indexes]
        Data2[Data]
        Agents2[Agents]
    end
    
    Security -.->|secures| AllAreas
    Roles -.->|controls access| AllAreas
    Workspaces -.->|contains| AllAreas
    Organizations -.->|manages| Workspaces
    
    style CrossCutting fill:#f3f4f6,stroke:#9ca3af
```

## Component Providers and Integrations

```mermaid
graph TB
    subgraph Haystack["Haystack Components"]
        HaystackCore[Core Components]
        Embedders2[Embedders]
        Generators2[Generators]
        Retrievers2[Retrievers]
        Preprocessors2[Preprocessors]
        Builders2[Builders]
    end
    
    subgraph DeepsetCustom["deepset Custom Nodes"]
        Augmenters[Augmenters]
        Code[Code]
        Crawler[Firecrawl]
        DeepsetGen[deepset Generators]
        DeepsetConv[deepset Converters]
    end
    
    subgraph ThirdParty["Third-Party Integrations"]
        OpenAI2[OpenAI]
        Anthropic2[Anthropic]
        Cohere2[Cohere]
        Bedrock2[Amazon Bedrock]
        Vertex2[Google Vertex]
        Nvidia2[Nvidia]
        Jina2[Jina]
        Voyage2[Voyage]
        Mistral2[Mistral]
    end
    
    HaystackCore --> Embedders2
    HaystackCore --> Generators2
    HaystackCore --> Retrievers2
    HaystackCore --> Preprocessors2
    HaystackCore --> Builders2
    
    ThirdParty -.->|powers| HaystackCore
    ThirdParty -.->|powers| DeepsetCustom
    
    style Haystack fill:#2563eb,stroke:#1e40af,color:#fff
    style DeepsetCustom fill:#059669,stroke:#047857,color:#fff
    style ThirdParty fill:#fef3c7,stroke:#fbbf24
```

## Documentation Structure Summary

### Top-Level Organization

File counts are MDX pages under `docs/` (approximate; rerun `find` when restructuring).

1. **Getting Started** (~37 files)
   - Basic concepts
   - Quick start guide
   - What's new (releases)
   - Platform status
   - Settings management
   - Working in Haystack Enterprise Platform

2. **Learn** (~9 files)
   - 5-step guide to prototyping
   - Document retrieval
   - Extractive QA
   - RAG QA
   - LLM overview
   - Prompt engineering
   - Model Context Protocol

3. **Concepts** (~26 files under `docs/concepts/`)
   - Pipelines (including examples and multimodal topics)
   - AI Agents
   - Document stores
   - Indexes
   - Data in the platform
   - Language models
   - Jobs
   - Roles, secrets, and integrations (cross-cutting)

4. **Reference — Pipeline components** (~234 files under `docs/reference/pipeline-components/`)
   - AI components (for example, `Agent`, `LLM`)
   - Knowledge retrieval, data processing, logic and flow
   - Third-party integrations (provider-specific components)
   - Custom `Code` and workspace custom components
   - Legacy and deprecated components
   - Input, output, and overview pages

5. **How-To Guides** (~107 files)
   - Building agents
   - Designing pipelines (including smart connections and hosted models)
   - Working with indexes and data
   - Searching, evaluating, optimizing
   - Productionizing, managing access, jobs
   - Using the SDK and REST API workflows

6. **Tutorials** (~14 files)
   - Learn the basics (9 tutorials)
   - Learn advanced features (3 tutorials)
   - REST API tutorials (2 tutorials)

7. **API Reference** (~227 files)
   - Main REST API (OpenAPI-generated pages)
   - Jobs API and related endpoints

8. **Builder** (1 file)
   - Deploy with Hayhooks

## Key Relationships Summary

### Primary Dependencies
- **Pipelines** depend on Components, Indexes, Document Stores, and LLMs
- **Indexes** depend on Document Stores and process Data
- **Components** depend on Document Stores and LLMs
- **AI Agents** depend on Components, Tools, and maintain Memory
- **Query pipelines** depend on enabled Indexes
- **Document Stores** are used by both Indexes (write) and Retrievers (read)

### Data Flow Path
1. Files → Upload to S3/VPC
2. Files → Index (preprocessing)
3. Index → Documents → Document Store
4. User Query → Retriever → Document Store
5. Retriever → Documents → Generator or LLM
6. Generator or LLM → Answer builder (`DeepsetAnswerBuilder` or Haystack `AnswerBuilder`) → Answer

### Agent Workflow
1. User Input → Agent Component
2. Agent → LLM (with tools list)
3. LLM → Tool Call OR Direct Answer
4. Tool → Execution → Result
5. Result → Check Exit Condition
6. Continue loop or return answer

### Component Pipeline
Files → Converters → Preprocessors → Embedders → (Storage) → Retrievers → Rankers → Prompt builders → Generators or LLM → Answer builders → Output

Smart connections can merge compatible lists (for example, multiple retrievers into one `documents` input) and convert between some types (for example, `string` and `ChatMessage`), which reduces the need for joiners and adapters in many pipelines.

## Cross-References and Integration Points

- **Security & Access**: Applies to all pipelines, indexes, and data
- **Workspaces**: Contain pipelines, indexes, and files
- **Organizations**: Contain multiple workspaces
- **Jobs**: Can run any pipeline type
- **LLMs**: Used by Generators, ChatGenerators, the `LLM` component, and Agents
- **Document Stores**: Central to both indexing and querying
- **Embedders**: Must use same model in indexing and query pipelines
- **Agents**: Can use pipelines as tools

## Special Component Combinations

Common patterns documented:
- Retriever + Ranker
- PromptBuilder + Generator
- ChatPromptBuilder + ChatGenerator
- Embedder + Retriever
- Joiner + Ranker (often replaceable with smart connections)
- Generator or LLM + `DeepsetAnswerBuilder` (RAG answers with references in the UI) or Haystack `AnswerBuilder`
- `Input` (`messages`) + `Agent` (minimal agent pipelines)
- Router + Multiple paths
- Validator + Loop
